1 Lecture 7, CS566: Heuristic PSA (pairwise sequence alignment)
“Words” to describe dot-matrix analysis
Approaches
– FASTA
– BLAST
Searching databases for sequence similarities
– PSA
– Alternative strategies
    Iterative searching
    Reverse searching

2 “Words” for dot-matrix analysis
Useful ideas from dot-matrix (DM) alignment
– A diagonal represents a local match
– A broken diagonal = an intervening mismatch
– Displaced diagonals = matches separated by gaps
Advantage of word-based alignment
– Faster algorithm: comparing word lists is faster than comparing whole sequences residue by residue
– Hashes are used for rapid comparison of words
“The devil is in the details”
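The word-based idea above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the lecture: the function names `word_hits` and `by_diagonal` and the example sequences are hypothetical. The words of one sequence are hashed once, so each word of the other sequence costs one dictionary lookup, and grouping the hits by diagonal number i - j shows how a gap displaces a diagonal.

```python
from collections import defaultdict


def word_hits(seq_a: str, seq_b: str, k: int) -> list[tuple[int, int]]:
    """Word-based dot matrix: all (i, j) with seq_a[i:i+k] == seq_b[j:j+k]."""
    # Hash every k-letter word of seq_b once: word -> start positions.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Each word of seq_a is then a dictionary lookup, not a scan of seq_b.
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            hits.append((i, j))
    return hits


def by_diagonal(hits: list[tuple[int, int]]) -> dict[int, list[int]]:
    """Group hits by diagonal number i - j; a gap shifts later hits to another diagonal."""
    diagonals = defaultdict(list)
    for i, j in hits:
        diagonals[i - j].append(i)
    return dict(diagonals)


hits = word_hits("MKVLAWHPQRST", "MKVLAGGWHPQRST", 3)
print(by_diagonal(hits))  # {0: [0, 1, 2], -2: [5, 6, 7, 8, 9]}
```

The two-residue insertion (GG) in the second sequence moves every hit after the insertion from diagonal 0 to diagonal -2, which is the "displaced diagonals" pattern listed above.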

3 FASTA (Fast-All)
Motivation: a rapid PSA method was needed to search databases for matches to a query sequence (1:n comparisons)
ktup-based alignment (k-tuple, i.e. a word of length k)
– Create hash tables for the sequences
– Find matching ktups (“hot-spots”, i.e. short diagonals) in a pair of sequences
– ktup size = 2 for protein (6 for DNA)
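A minimal sketch of the ktup lookup for a 1:n search, assuming exact word matches as in FASTA. The query's ktup table is built once and reused for every database sequence. The function names, the toy database and the ranking by raw hot-spot count are hypothetical simplifications; FASTA itself goes on to score diagonal runs, init1 and opt.

```python
from collections import defaultdict


def build_ktup_table(query: str, ktup: int) -> dict[str, list[int]]:
    """Hash table: every ktup-letter word of the query -> its start positions."""
    table = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        table[query[i:i + ktup]].append(i)
    return dict(table)


def hot_spots(table: dict[str, list[int]], subject: str, ktup: int) -> list[tuple[int, int]]:
    """Exact ktup matches (i in query, j in subject), found by table lookup."""
    spots = []
    for j in range(len(subject) - ktup + 1):
        for i in table.get(subject[j:j + ktup], ()):
            spots.append((i, j))
    return spots


def search(query: str, database: dict[str, str], ktup: int = 2) -> list[str]:
    """Build the query table once, scan every database entry, rank by hot-spot count.

    Ranking by raw count is a crude, hypothetical stand-in for FASTA's later stages.
    """
    table = build_ktup_table(query, ktup)
    counts = {name: len(hot_spots(table, seq, ktup)) for name, seq in database.items()}
    return sorted(counts, key=counts.get, reverse=True)


toy_db = {"s1": "HEAGAWGHEE", "s2": "PAWHEAE", "s3": "KKKKK"}
print(search("PAWHEAE", toy_db))
```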

4 FASTA
Find the 10 best “diagonal runs”
– Group hot-spots by the diagonal (i - j) on which they lie
    The main diagonal is numbered 0; positive diagonals lie above the main diagonal, negative ones below
– Diagonal run = a set of consecutive (not necessarily contiguous) hot-spots on one diagonal, penalized by the size of the intervening mismatches
– Save the top 10 diagonal runs
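The diagonal-run step can be sketched as a running-sum scan along each diagonal. The scoring scheme is an assumption made for illustration (a fixed `reward` per hot-spot and a linear `gap_penalty` per unmatched residue between consecutive hot-spots), not FASTA's actual parameters; i indexes the first sequence and j the second.

```python
import heapq
from collections import defaultdict


def diagonal_runs(spots, ktup, reward=5.0, gap_penalty=1.0, top=10):
    """Best run of hot-spots on each diagonal (i - j); keep the `top` best runs.

    Each hot-spot adds `reward`; each unmatched residue between two consecutive
    hot-spots on the same diagonal costs `gap_penalty` (hypothetical scoring).
    Returns (score, diagonal, first_i, last_i) tuples, best first.
    """
    by_diag = defaultdict(list)
    for i, j in spots:
        by_diag[i - j].append(i)

    runs = []
    for diag, starts in by_diag.items():
        starts.sort()
        best = None
        score, run_start, prev = 0.0, None, None
        for i in starts:
            if prev is None:
                score, run_start = reward, i
            else:
                # Residues between the end of the previous word and this one.
                gap = max(0, i - (prev + ktup))
                extended = score + reward - gap_penalty * gap
                if extended >= reward:
                    score = extended
                else:  # cheaper to start a new run at this hot-spot
                    score, run_start = reward, i
            prev = i
            if best is None or score > best[0]:
                best = (score, diag, run_start, i)
        runs.append(best)
    return heapq.nlargest(top, runs)


spots = [(0, 0), (1, 1), (2, 2), (5, 7), (6, 8), (7, 9), (8, 10), (9, 11)]
print(diagonal_runs(spots, ktup=3))
```

A distant hot-spot on the same diagonal starts a new run instead of extending the old one once the mismatch penalty outweighs the reward, which is how "penalized by the size of the intervening mismatches" behaves in this sketch.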

5 FASTA
Find init1
– init1 = the best contiguous (ungapped) subsequence within the top 10 diagonal runs, scored with an amino acid substitution (AAS) matrix (default BLOSUM50)
Define a local search space around init1
– Include (32 / ktup) +/- diagonals around init1 in the search space
    For ktup = 2, 16 diagonals around init1
Perform Smith-Waterman PSA in the reduced space
– Report the resulting alignment as opt
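A sketch of these two FASTA steps under stated assumptions: a hypothetical `simple_score` (+2 for identity, -1 otherwise) replaces BLOSUM50, gaps cost a linear -4 per position, and the band half-width is a free parameter rather than FASTA's exact (32 / ktup) rule. init1 is found with a Kadane-style maximum-sum scan along each candidate diagonal; Smith-Waterman then visits only the cells inside the band around that diagonal.

```python
def simple_score(x: str, y: str) -> int:
    """Hypothetical stand-in for BLOSUM50: +2 for identity, -1 otherwise."""
    return 2 if x == y else -1


def init1_on_diagonal(a, b, diag, score=simple_score):
    """Best contiguous (ungapped) segment score along diagonal i - j = diag."""
    best = run = 0
    i, j = max(diag, 0), max(-diag, 0)
    while i < len(a) and j < len(b):
        run = max(0, run + score(a[i], b[j]))  # Kadane-style running sum
        best = max(best, run)
        i, j = i + 1, j + 1
    return best


def banded_smith_waterman(a, b, center, width, score=simple_score, gap=-4):
    """Smith-Waterman score restricted to diagonals center - width .. center + width."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # cells outside the band stay 0
    best = 0
    for i in range(1, n + 1):
        # Only visit j with |(i - j) - center| <= width.
        for j in range(max(1, i - center - width), min(m, i - center + width) + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


a, b = "MKVLAWHPQRST", "MKVLAGGWHPQRST"
init1_diag = max([0, -2], key=lambda d: init1_on_diagonal(a, b, d))
print(init1_diag, banded_smith_waterman(a, b, center=init1_diag, width=4))
print(banded_smith_waterman(a, b, center=0, width=1))
```

Here init1 lies on diagonal -2 (score 14), and the banded alignment around it recovers the gapped alignment spanning diagonals 0 and -2 (score 16). With a band too narrow to reach diagonal -2 the score stays at 10, showing how restricting the search space can miss alignments that lie outside the band.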

6 BLAST (Basic Local Alignment Search Tool)
Builds on ideas from FASTA and adds new elements
For every word in the query, generate a set of similar words
– Use the AAS matrix to score the query word against all possible words of the same size
– Include every word whose score exceeds the cut-off
– Example: for the word DED and threshold 0, the set includes DED, DDD, EEE, EDE, etc.
For every query word, find hot-spots using its set of similar words
Then merge contiguous hits along the same diagonal (à la FASTA) to form high-scoring segment pairs (HSPs)
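A sketch of the neighborhood-word step. To stay self-contained it uses only a small excerpt of BLOSUM62 (residues D, E, N and Q), which is an assumption; a real search scores against all 20 amino acids. Here a word is kept when its score is at least the threshold (>=). With threshold 0 every word over this four-letter alphabet qualifies, which covers the DED, DDD, EEE and EDE example; the word size 3 and threshold 11 used as defaults in the hypothetical `blast_hits` function are illustrative choices.

```python
from itertools import product

# Hypothetical excerpt: BLOSUM62 scores for D, E, N and Q only.
# A real search would use the full 20 x 20 matrix.
BLOSUM62_EXCERPT = {
    ("D", "D"): 6, ("D", "E"): 2, ("D", "N"): 1, ("D", "Q"): 0,
    ("E", "E"): 5, ("E", "N"): 0, ("E", "Q"): 2,
    ("N", "N"): 6, ("N", "Q"): 0,
    ("Q", "Q"): 5,
}
ALPHABET = "DENQ"


def pair_score(x: str, y: str) -> int:
    """Symmetric lookup in the excerpt."""
    return BLOSUM62_EXCERPT.get((x, y), BLOSUM62_EXCERPT.get((y, x), 0))


def neighborhood(word: str, threshold: int) -> set[str]:
    """All same-length words scoring at least `threshold` against `word`."""
    return {
        "".join(cand)
        for cand in product(ALPHABET, repeat=len(word))
        if sum(pair_score(x, y) for x, y in zip(word, cand)) >= threshold
    }


def blast_hits(query: str, subject: str, w: int = 3, threshold: int = 11):
    """Hot-spots (i, j): the subject word at j is in the neighborhood of the query word at i."""
    table: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        for nb in neighborhood(query[i:i + w], threshold):
            table.setdefault(nb, []).append(i)
    return [(i, j)
            for j in range(len(subject) - w + 1)
            for i in table.get(subject[j:j + w], [])]


print(sorted(neighborhood("DED", 11)))
print(blast_hits("NDEDQ", "QQDDDQQ"))
```

For the query NDEDQ, the subject QQDDDQQ gives hits (1, 2) and (2, 3) even though no query word occurs in it exactly; both hits lie on diagonal i - j = -1, so they are candidates for merging into a single HSP.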

7 FASTA versus BLAST
– Word matching is exact in FASTA but inexact (AAS-based) in BLAST
– BLAST uses a larger word size (by default 3 for protein and 11 for DNA, versus FASTA's 2 and 6)
– FASTA is more sensitive (why?) but slower (why?)
– BLAST handles “low-complexity” regions inline
    The programs SEG (protein) and DUST (DNA) are used to filter sequences
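Low-complexity filtering can be illustrated with a sliding-window entropy mask. This is a swapped-in simplification, not SEG or DUST, which use their own complexity measures; the window length of 12, the 2.2-bit cut-off and the masking character X are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of `window`."""
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in Counter(window).values())


def mask_low_complexity(seq: str, window: int = 12,
                        min_entropy: float = 2.2, mask_char: str = "X") -> str:
    """Mask every residue covered by a window whose entropy is below min_entropy.

    Simplified stand-in for SEG/DUST-style filtering, not their actual algorithms.
    """
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for k in range(start, start + window):
                masked[k] = True
    return "".join(mask_char if m else res for res, m in zip(seq, masked))


flank = "MKVLAWHPERST"
print(mask_low_complexity(flank + "Q" * 12 + flank))
```

The poly-Q stretch is masked together with a few flanking residues, because windows that straddle the boundary also fall below the cut-off; the masked residues then no longer produce spurious word hits.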

8 Variations on BLAST-based searching
Mapping the query to a different alphabet
– Protein query versus DNA database
– DNA query versus protein database (multiple reading frames)
PSI-BLAST: Position-Specific Iterated BLAST
– Use the query to find hits
– Assemble the hits on the fly into a position-specific scoring matrix (PSSM)
– Search again with the PSSM, and repeat
RPS-BLAST: Reverse Position-Specific BLAST
– The query is the search space
– A database of PSSMs is searched for matches
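A simplified sketch of turning aligned hits into a PSSM and using it to score a new sequence. PSI-BLAST's actual construction is more involved (sequence weighting, substitution-matrix-based pseudocounts, non-uniform background frequencies). Here the background is assumed uniform (1/20), the pseudocount is flat, the hits are assumed gap-free and of equal length, and the hit list itself is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length, gap-free aligned hit segments.

    Simplifying assumptions: uniform background 1/20, a flat pseudocount
    for every residue, and no sequence weighting.
    """
    background = 1 / len(AMINO_ACIDS)
    total = len(aligned_hits) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(len(aligned_hits[0])):
        counts = Counter(hit[col] for hit in aligned_hits)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def best_window(pssm: list[dict[str, float]], sequence: str) -> tuple[float, int]:
    """Slide the PSSM along `sequence`; return (best score, start position)."""
    width = len(pssm)
    return max((sum(column[aa] for column, aa in zip(pssm, sequence[s:s + width])), s)
               for s in range(len(sequence) - width + 1))


hits = ["DEKW", "DEKW", "DQKW", "NEKW"]  # hypothetical aligned hits from a first round
pssm = build_pssm(hits)
print(best_window(pssm, "AAADEKWAAA"))
```

In an iterated search, the best-scoring segments found with the PSSM would be added to the hit list and the matrix rebuilt; in RPS-BLAST the roles are reversed, and a query is scored against a stored collection of such matrices.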

