Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,

1 Database Searches BLAST

2 Basic Local Alignment Search Tool
–Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990)
–Altschul, Madden, Schaffer, Zhang, Zhang, Miller, Lipman, Nucleic Acids Res. 25 (1997)
Main ideas:
–Increase search speed by finding fewer, but better, hot spots during the initial screening phase
–Use longer word sizes
–Integrate the scoring matrix into the first phase
Compare with FASTA, which requires exact matches

3 BLAST Terminology
Segment pair: a pair of equal-length substrings, one from sequence S1 and one from sequence S2, aligned without gaps
Locally maximal segment pair: segment pair whose alignment score cannot be improved by extending or shortening it
Maximal segment pair (MSP): segment pair with the maximum score over all segment pairs in the sequences S1 and S2
High-scoring segment pair (HSP): a segment pair with score higher than some cutoff score s
Parameters: w is the word length; t is the word-score threshold
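The MSP definition above can be made concrete with a brute-force sketch. It walks every diagonal of the S1 x S2 comparison and keeps the best-scoring run on each one (a Kadane-style maximum-subarray scan). This is only an illustration of the definition, not how BLAST searches; the function names and the +5/-4 nucleotide scoring are assumptions made for the example.

```python
def toy_score(x, y):
    """Assumed toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def maximal_segment_pair(s1, s2, score):
    """Return (score, start1, start2, length) of the best ungapped segment pair."""
    best = (0, 0, 0, 0)
    # A diagonal is identified by d = (position in s2) - (position in s1).
    for d in range(-(len(s1) - 1), len(s2)):
        i, j = (-d, 0) if d < 0 else (0, d)
        run, start, k = 0, 0, 0
        while i + k < len(s1) and j + k < len(s2):
            if run <= 0:
                # A non-positive prefix can never help: restart the segment here.
                run, start = 0, k
            run += score(s1[i + k], s2[j + k])
            if run > best[0]:
                best = (run, i + start, j + start, k - start + 1)
            k += 1
    return best


msp = maximal_segment_pair("ACGTTACG", "ACGTACG", toy_score)
print(msp)
```

Two segment pairs tie for the maximum here (ACGT/ACGT and TACG/TACG, four matches each); the sketch reports the first one it encounters.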

4 BLAST: Hits
A hit is a w-length word in the database that aligns with a word from the query sequence with score > t
BLAST looks for hits instead of exact matches
–Allows word size to be kept high for speed, without sacrificing sensitivity
Typically, w = 3-5 for amino acids, ~11-12 for DNA
t is the most critical parameter:
–↑ t → ↓ “background” hits (faster)
–↓ t → ↑ ability to detect more distant relationships (at the cost of increased noise)
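A minimal sketch of the hit definition, assuming a toy +5/-4 nucleotide scoring and a naive all-against-all word comparison (real BLAST uses a precomputed word lookup rather than this double loop). The names `find_hits` and `toy_score` are hypothetical. It shows how lowering t admits inexact words as hits.

```python
def toy_score(x, y):
    """Assumed toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def find_hits(query, subject, w, t, score):
    """Return (query_pos, subject_pos, word_score) for every word pair scoring > t."""
    hits = []
    for i in range(len(query) - w + 1):
        for j in range(len(subject) - w + 1):
            s = sum(score(a, b) for a, b in zip(query[i:i + w], subject[j:j + w]))
            if s > t:
                hits.append((i, j, s))
    return hits


# High threshold: only the exact word match survives.
strict_hits = find_hits("ACGTAC", "TTACGTTT", w=4, t=15, score=toy_score)
# Lower threshold: words with one mismatch (score 11) also count as hits.
loose_hits = find_hits("ACGTAC", "TTACGTTT", w=4, t=10, score=toy_score)
print(strict_hits)
print(loose_hits)
```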

5 Hits
For each word, evaluate the score of a match (exact or not) according to BLOSUM62
–E.g., for PQG, score is 7+5+6 = 18
There are 20^w possible w-length words, but considering only those with score > t greatly reduces the number of matches
–E.g., there are 20^3 = 8000 possible matches to PQG, but only 50 achieve score > t = 13
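The pruning effect of the threshold can be sketched by enumerating all 20^3 candidate words and keeping those that score above t. To stay self-contained, the sketch below does not embed BLOSUM62; it assumes a toy scoring of +5 for identity and -1 otherwise, so the counts differ from the slide's figure of 50. Only the three diagonal values quoted on the slide (P=7, Q=5, G=6) are taken from the source.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(x, y):
    """Assumed toy scoring (NOT BLOSUM62): +5 for identity, -1 otherwise."""
    return 5 if x == y else -1


def neighborhood(word, alphabet, score, t):
    """All words of the same length whose alignment with `word` scores > t."""
    result = []
    for candidate in product(alphabet, repeat=len(word)):
        if sum(score(a, b) for a, b in zip(word, candidate)) > t:
            result.append("".join(candidate))
    return result


# Self-score of PQG using the BLOSUM62 diagonal values quoted on the slide.
slide_self_score = sum({"P": 7, "Q": 5, "G": 6}[c] for c in "PQG")

total_words = len(AMINO_ACIDS) ** 3
neighbors = neighborhood("PQG", AMINO_ACIDS, toy_score, t=8)
print(slide_self_score, total_words, len(neighbors))
```

Under this toy scoring only the exact word and its single-substitution variants pass the threshold. With a real substitution matrix the surviving set depends on which substitutions are conservative.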

6 BLAST

7 Extending a hit
After locating a hit, BLAST attempts to extend it in both directions, until the score drops more than X below the maximum score yet attained.
The extension step typically accounts for > 90% of execution time.
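The X-drop rule can be sketched as follows. Each direction is extended one residue pair at a time, and extension stops once the running score falls more than X below the best score seen so far; the alignment is then trimmed back to that best point. The function names and the +5/-4 scoring are assumptions for illustration.

```python
def toy_score(x, y):
    """Assumed toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def extend_one_direction(query, subject, qi, sj, step, x_drop, score):
    """Extend from (qi, sj) in direction `step`; return (best gain, residues kept)."""
    best = running = 0
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= sj < len(subject):
        running += score(query[qi], subject[sj])
        length += 1
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # dropped more than X below the maximum attained so far
        qi += step
        sj += step
    return best, best_len


def extend_hit(query, subject, qi, sj, w, x_drop, score):
    """Ungapped extension of a w-length hit; returns (score, q_start, s_start, length)."""
    seed = sum(score(a, b) for a, b in zip(query[qi:qi + w], subject[sj:sj + w]))
    right, rlen = extend_one_direction(query, subject, qi + w, sj + w, 1, x_drop, score)
    left, llen = extend_one_direction(query, subject, qi - 1, sj - 1, -1, x_drop, score)
    return seed + left + right, qi - llen, sj - llen, llen + w + rlen


hsp = extend_hit("TTACGTACGTT", "GGACGTACGCC", qi=4, sj=4, w=3, x_drop=5, score=toy_score)
print(hsp)
```

In this example the seed GTA grows to the seven-residue segment ACGTACG; the two mismatches on each side push the running score 8 below the maximum, which exceeds X = 5 and ends the extension.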

8 Extending a hit

9 Improvement: 2-hit method
Do extensions only when there are two hits on the same diagonal within some distance A of each other (e.g., A = 40)
Reduces sensitivity (ability to detect distantly related sequences)
–To compensate, use a lower t value (e.g., 11 rather than 13)
Since we only extend when there are two nearby hits, many fewer regions are extended
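A sketch of the two-hit trigger, assuming hits are given as (query_pos, subject_pos) pairs. Hits are grouped by diagonal (subject_pos - query_pos), and an extension is triggered only when a hit follows an earlier one on the same diagonal within distance A. The requirement that the two hits not overlap is an added assumption here (the slide does not state it), as is the function name.

```python
from collections import defaultdict


def two_hit_triggers(hits, w, a):
    """Return (diagonal, first_query_pos, second_query_pos) for each qualifying pair."""
    by_diagonal = defaultdict(list)
    for query_pos, subject_pos in hits:
        by_diagonal[subject_pos - query_pos].append(query_pos)

    triggers = []
    for diagonal, positions in by_diagonal.items():
        last = None
        for pos in sorted(positions):
            if last is not None and pos - last < w:
                continue  # overlaps the previous hit: ignore it (assumption)
            if last is not None and pos - last <= a:
                triggers.append((diagonal, last, pos))
            last = pos
    return triggers


hits = [(0, 10), (20, 30), (5, 50), (100, 110), (1, 11)]
triggers = two_hit_triggers(hits, w=3, a=40)
print(triggers)
```

Of the five hits, only the pair at query positions 0 and 20 on diagonal 10 triggers an extension; the hit at 100 is too far from its predecessor and the hit on diagonal 45 is isolated.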

10 Gapped BLAST
Allows local alignments with indels (similar to FASTA)
Local alignments from different diagonals are merged into a single alignment: one local alignment, followed by some indels, followed by a second local alignment, etc.
–Equivalent to a path through the dynamic programming matrix composed of alternating diagonal sections and paths connecting them

11 Gapped BLAST
Original BLAST implicitly handled gaps by finding several distinct HSPs and calculating a statistical assessment of the combined result
–Two or more HSPs, each below the cutoff value, might in combination rise to statistical significance
Gapped BLAST extends hits by allowing gaps when hits are promising (score exceeds s_g):
–Advantage: we can afford to miss some HSPs as long as at least one is found
Use dynamic programming, starting from the center of each high-scoring region, if s > s_g
–s_g is chosen such that gapped alignment is triggered in about 1/50 of the sequences compared
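To illustrate why allowing gaps can join two ungapped segments into one higher-scoring alignment, here is a plain Smith-Waterman local alignment score with a linear gap penalty. This is a swapped-in, simpler technique: gapped BLAST runs a bounded dynamic programming extension outward from a seed rather than filling the whole matrix. The scoring values and function names are assumptions.

```python
def toy_score(x, y):
    """Assumed toy scoring: +5 for a match, -4 for a mismatch."""
    return 5 if x == y else -4


def local_alignment_score(a, b, score, gap):
    """Best local alignment score (Smith-Waterman, linear gap penalty `gap`)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            value = max(
                0,                                            # start a new alignment
                previous[j - 1] + score(a[i - 1], b[j - 1]),  # diagonal step
                previous[j] - gap,                            # gap in b
                current[j - 1] - gap,                         # gap in a
            )
            current.append(value)
            best = max(best, value)
        previous = current
    return best


gapped = local_alignment_score("ACGTTACG", "ACGTACG", toy_score, gap=6)
# A prohibitive gap penalty reduces the result to the best ungapped segment.
ungapped = local_alignment_score("ACGTTACG", "ACGTACG", toy_score, gap=100)
print(gapped, ungapped)
```

With a gap penalty of 6, the two four-residue segments on neighboring diagonals combine into one alignment of seven matches and one indel (35 - 6 = 29), compared with 20 for the best ungapped segment.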

12 PSI-BLAST
Position-Specific Iterated BLAST
Generates a multiple alignment from statistically significant alignments produced by BLAST
Produces a position-specific score matrix (PSSM)
–Can search the database using the PSSM
–Match sequences to profile
–Generate new profiles
–Repeat (iteration)
–Search gradually extends to increasingly divergent sequences
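A rough sketch of how a PSSM can be derived from a multiple alignment and then used to score a segment. It assumes a uniform background frequency, a fixed pseudocount of 1 per residue, and equal sequence weights; PSI-BLAST itself uses sequence weighting and a more careful pseudocount scheme, so treat this only as an illustration of the idea. All names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_sequences, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background
    pssm = []
    for column in zip(*aligned_sequences):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for residue in column:
            if residue in counts:  # gap characters such as '-' are ignored
                counts[residue] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pssm


def score_against_pssm(pssm, segment):
    """Sum the column-specific scores for an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


pssm = build_pssm(["PQG", "PQG", "PEG", "PQA"])
print(round(score_against_pssm(pssm, "PQG"), 2))
print(round(score_against_pssm(pssm, "WWW"), 2))
```

Unlike a fixed substitution matrix, the score for a residue depends on the column: P is rewarded in the first column, where it is conserved, and penalized elsewhere.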

13 Flavors of BLAST
BLASTP - protein query against protein DB
BLASTN - DNA/RNA query against GenBank (DNA)
BLASTX - 6-frame translated DNA query against protein DB
TBLASTN - protein query against 6-frame GenBank translation
TBLASTX - 6-frame translated DNA query against 6-frame GenBank translation
PSI-BLAST - protein ‘profile’ query against protein DB
PHI-BLAST - protein pattern against protein DB
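The "6-frame translation" used by BLASTX, TBLASTN and TBLASTX can be sketched as translating three reading frames of the DNA sequence and three of its reverse complement. The sketch assumes the standard genetic code, written in the usual compact TCAG ordering, and hypothetical function names; trailing bases that do not fill a codon are dropped.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Reverse-complement an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Frames +1, +2, +3 followed by frames -1, -2, -3."""
    reverse = reverse_complement(dna)
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


frames = six_frame_translations("ATGGCC")
print(frames)
```

Each of the six protein strings is then compared like an ordinary protein sequence, which is why the translated flavors cost several times more than BLASTP on the same input.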

