


1 FASTA and BLAST Chitta Baral

2 FASTA: Basic Steps

Step 1:
– Set a word size (usually 6 for DNA and 2 for proteins).
– Make a plot of the word matches between the two sequences.
– Find the long diagonals (the high-scoring regions).

Step 2:
– Score the 10 best diagonal runs using a scoring matrix (allowing mismatches, end extensions, and joining of two diagonals, but no gaps).
– init1: the single best sub-alignment found in this stage.

Step 3:
– Merge non-overlapping diagonal runs to allow gaps (insertions/deletions).
– Score of joined regions = sum of individual scores - penalty.
– The score of the highest-scoring region at the end of this step is called initn.

Step 4:
– Use a variant of the Smith-Waterman algorithm on a narrow band around initn and construct an optimal alignment of this region.

Modifications:
– In Step 4, use a band around init1.
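Step 1 of FASTA can be illustrated with a minimal Python sketch. It is not the FASTA implementation: the function names (`kmer_index`, `diagonal_hits`, `best_diagonals`) are hypothetical, and a diagonal is ranked here simply by how many word matches fall on it, whereas the real program rescores the runs with a scoring matrix in Step 2.

```python
from collections import defaultdict


def kmer_index(seq, k):
    """Map each k-letter word to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, target, k):
    """Count word matches per diagonal, where diagonal = target_pos - query_pos."""
    index = kmer_index(query, k)
    counts = defaultdict(int)
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], ()):
            counts[j - i] += 1
    return dict(counts)


def best_diagonals(query, target, k, top=10):
    """Return up to `top` (diagonal, hit_count) pairs, most hits first.

    Simplification (assumption): ranking is by raw hit count only.
    """
    counts = diagonal_hits(query, target, k)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


if __name__ == "__main__":
    # Word size 4 is used only to keep this toy example small.
    print(best_diagonals("ACGTACGT", "TTACGTACGTTT", k=4))
```

Matches that lie on the same diagonal share the same offset between the two sequences, so a diagonal with many hits corresponds to a long ungapped region of similarity.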

3 BLAST: Basic Steps

Step 1:
– Set a word size (3 for protein and 11 for DNA) and create a word list for the query sequence.
– E.g. (using two-letter words for illustration): qlnfsagw → {ql, ln, nf, fs, sa, ag, gw}
– Expand the list (using a threshold T, say 8):
  ql: ql, qm, hl, zl
  ln: ln, lb
  nf: nf, af, ny, df, qf, ef, gf, hf, kf, sf, tf, bf, zf
  fs: fs, fa, fn, fd, fg, fp, ft, fb, ys
  sa: none
  ag: ag
  gw: gw, aw, rw, nw, dw, qw, ew, hw, iw, kw, mw, pw, sw, tw, vw, bw, zw, xw

Step 2:
– Scan through the string, and whenever a word in the list is found, try to extend it in both directions (no gaps) to reach a score beyond a threshold S. While extending, use a parameter L that defines how long an extension will be tried to raise the score over S.

Modifications of Step 2:
– Original BLAST: extension is continued as long as the score continues to increase.
– Another version: extension is stopped when the accumulated score stops increasing and has just begun to fall a certain amount below the best score found.
– Blast2 (gapped BLAST):
  A lower value of T is used.
  After extension, try to combine the extended hits (allowing gaps).
  Find the maximal scoring segment.
  Use the Smith-Waterman algorithm on a band around this segment (as in FASTA).
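The two BLAST steps can be sketched in Python as follows. This is a hedged illustration, not the BLAST code: `toy_score` is a hypothetical stand-in for a real substitution matrix (so the neighborhoods it produces will not match the expanded list above), and `words`, `neighborhood`, `extend_hit` and the `x_drop` parameter are names invented for this example. The extension follows the "another version" rule: it stops once the running score has fallen more than `x_drop` below the best score seen.

```python
from itertools import product


def toy_score(a, b):
    """Hypothetical scoring: stand-in for a substitution matrix."""
    return 4 if a == b else -1


def words(query, w):
    """All overlapping words of length w in the query (Step 1 word list)."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def neighborhood(word, alphabet, threshold, score=toy_score):
    """All words of the same length that score >= threshold against word."""
    result = []
    for cand in product(alphabet, repeat=len(word)):
        if sum(score(a, b) for a, b in zip(word, cand)) >= threshold:
            result.append("".join(cand))
    return result


def _extend(query, target, q, t, step, x_drop, score):
    """Walk in one direction from (q, t); return (best_gain, length_used)."""
    best = run = 0
    used = n = 0
    while 0 <= q < len(query) and 0 <= t < len(target):
        run += score(query[q], target[t])
        n += 1
        if run > best:
            best, used = run, n
        elif best - run > x_drop:
            break  # score has dropped too far below the best seen
        q += step
        t += step
    return best, used


def extend_hit(query, target, qi, ti, w, x_drop, score=toy_score):
    """Extend a word hit in both directions without gaps (Step 2).

    Returns (score, q_start, q_end) with q_end exclusive.
    """
    seed = sum(score(query[qi + n], target[ti + n]) for n in range(w))
    right, r_len = _extend(query, target, qi + w, ti + w, 1, x_drop, score)
    left, l_len = _extend(query, target, qi - 1, ti - 1, -1, x_drop, score)
    return seed + left + right, qi - l_len, qi + w + r_len


if __name__ == "__main__":
    print(words("qlnfsagw", 2))
    print(neighborhood("ab", "abc", 3))
    print(extend_hit("GGACGTAA", "TTACGTAACC", 2, 2, 2, x_drop=2))
```

An extended hit would be reported only if its score exceeds the threshold S; that comparison is left to the caller here.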

4 Homework (due 3/31/03): Compare BLAST and FASTA. (Hint: Read the external pointers on the class notes page.)

