Paper study for class presentation on Nov. 16, 2005; slides by 陳奕先


1 tPatternHunter: gapped, fast and sensitive translated homology search
Derek Kisman, Ming Li, Bin Ma, Li Wang. Bioinformatics, 21(4), February 2005.

2 tPatternHunter: "t" for translated search
What issues arise when we try to apply the PatternHunter technique to translated search?
Proteins have 20 different letters, far more than DNA's 4 letters.
Three DNA letters make a codon, so at the hit-extension stage a DNA gap may cause a frameshift.

3 Proteins have 20 different letters, far more than DNA's 4 letters
For the same seed weight, the hash table is significantly larger for protein than for DNA sequences.
PatternHunter used weight-11 seeds for DNA sequences. How large should the seeds be for protein?
Comparing table sizes with base-10 logarithms: 11 * log 4 = 6.62 and 5 * log 20 = 6.51.
tPH uses weight-5 spaced seeds (the default seed is )
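
The seed-weight comparison on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and "table size" here simply means the number of distinct seed keys (alphabet size raised to the seed weight), which is an assumption about what the slide is counting.

```python
import math

DNA_ALPHABET = 4       # A, C, G, T
PROTEIN_ALPHABET = 20  # the 20 standard amino acids


def table_size(alphabet_size: int, weight: int) -> int:
    """Number of distinct keys a weight-`weight` seed can produce."""
    return alphabet_size ** weight


def log10_table_size(alphabet_size: int, weight: int) -> float:
    """Base-10 logarithm of the key count, as compared on the slide."""
    return weight * math.log10(alphabet_size)


if __name__ == "__main__":
    print("DNA, weight 11:    ", table_size(DNA_ALPHABET, 11))
    print("protein, weight 5: ", table_size(PROTEIN_ALPHABET, 5))
    print("protein, weight 6: ", table_size(PROTEIN_ALPHABET, 6))
```

A weight-5 protein seed gives 3,200,000 keys, slightly fewer than the 4,194,304 keys of a weight-11 DNA seed, while weight 6 would already give 64,000,000.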

4 Only the five letters at the "1" positions are checked for hits.
BLOSUM62 scores are used to evaluate each checked position.
A "hit": all five positions have a score of at least 0, and the total score is above a threshold T.
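
The hit rule above could be sketched roughly as follows. Several things here are assumptions for illustration only: the seed pattern `110111` is hypothetical (the slide's default seed is not shown), the score table is a small excerpt of BLOSUM62 rather than the full matrix, the threshold of 20 is the default quoted later in these slides, and the strict comparison follows the slide's wording "above".

```python
# Small excerpt of BLOSUM62, keyed by alphabetically sorted residue pairs.
# Only the pairs used in the examples are listed; a real search needs the
# full 20 x 20 matrix.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("C", "C"): 9, ("F", "F"): 6, ("H", "H"): 8,
    ("W", "W"): 11, ("I", "L"): 2, ("K", "R"): 2, ("D", "E"): 2,
    ("C", "W"): -2,
}

SEED = "110111"  # hypothetical weight-5 spaced seed, not tPH's actual default


def score(x: str, y: str) -> int:
    """Look up the substitution score of two residues (symmetric)."""
    return BLOSUM62_EXCERPT[tuple(sorted((x, y)))]


def is_hit(a: str, b: str, seed: str = SEED, threshold: int = 20) -> bool:
    """Apply the slide's hit rule to two aligned windows.

    Only positions where the seed has '1' are scored. Each must score
    at least 0, and the sum must be above the threshold.
    """
    total = 0
    for flag, x, y in zip(seed, a, b):
        if flag != "1":
            continue  # "0" positions are don't-care positions
        s = score(x, y)
        if s < 0:
            return False
        total += s
    return total > threshold


if __name__ == "__main__":
    print(is_hit("WCAHWF", "WCGHWF"))  # mismatch sits on the "0" position
    print(is_hit("ILVKDA", "LIIREA"))  # all scores >= 0, but total too low
    print(is_hit("WCAHWF", "CCAHWF"))  # W/C scores below 0
```

In the first pair the only mismatch falls on the seed's "0" position, so it is never scored; the second pair passes the per-position test but totals only 12; the third fails at once because W against C scores -2.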

5 BLOSUM62 Scoring Matrix

6 What about the frameshift issue?
When performing DNA-protein or DNA-DNA search, tPH treats the DNA sequence as a sequence of overlapping codons.
For example, T T T G C A gives the overlapping codons TTT, TTG, TGC, GCA, which translate to F, L, C, A.
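
The overlapping-codon view can be illustrated with a short Python sketch. The function names are hypothetical and this is not tPH's own code; the codon table is the standard genetic code written in the usual compact TCAG ordering.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def overlapping_codons(dna: str) -> list[str]:
    """Return every codon starting at every position, not just every third."""
    return [dna[i:i + 3] for i in range(len(dna) - 2)]


def translate_overlapping(dna: str) -> str:
    """Translate each overlapping codon to one amino-acid letter."""
    return "".join(CODON_TABLE[c] for c in overlapping_codons(dna.upper()))


if __name__ == "__main__":
    print(overlapping_codons("TTTGCA"))     # ['TTT', 'TTG', 'TGC', 'GCA']
    print(translate_overlapping("TTTGCA"))  # FLCA
```

Letters at positions i, i+3, i+6, ... of the translated string belong to the same reading frame, so a one-base DNA gap simply shifts the alignment to a neighbouring frame within the same string.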

7

8 To improve sensitivity, we can use more than one seed.
By default, tPH uses four weight-5 seeds (length 6 or 7) and threshold T = 20 for BLOSUM62.
How fast and how sensitive is tPH?
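
To see why several seeds help, here is a minimal sketch in which a pair of windows counts as a hit if any seed in a set hits. Everything specific in it is an assumption: the four seed patterns are hypothetical weight-5 patterns of length 6 or 7 (not tPH's defaults), and the scoring is a toy scheme (5 for identical letters, -1 otherwise) standing in for BLOSUM62.

```python
# Hypothetical weight-5 seeds of length 6 or 7; not tPH's actual seed set.
SEEDS = ["111011", "110111", "1101011", "1110011"]


def toy_score(x: str, y: str) -> int:
    """Toy substitution score standing in for BLOSUM62 (an assumption)."""
    return 5 if x == y else -1


def seed_hits(a: str, b: str, seed: str, threshold: int = 20) -> bool:
    """Hit rule: every '1' position scores >= 0 and the total is above T."""
    total = 0
    for flag, x, y in zip(seed, a, b):
        if flag != "1":
            continue
        s = toy_score(x, y)
        if s < 0:
            return False
        total += s
    return total > threshold


def hitting_seeds(a: str, b: str, seeds=SEEDS, threshold: int = 20) -> list[int]:
    """Return the indices of all seeds that report a hit for this window pair."""
    return [i for i, seed in enumerate(seeds) if seed_hits(a, b, seed, threshold)]


if __name__ == "__main__":
    # One mismatch at offset 3 (E vs W).
    print(hitting_seeds("ACDEFGH", "ACDWFGH"))
```

With the single mismatch at offset 3, the seeds `110111` and `1101011` miss because they check that position, while `111011` and `1110011` skip it and still hit, so the seed set finds a match that one seed alone could miss.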

9

10

11

