
1 Multiple Sequence Alignment Dynamic Programming

2 Multiple Sequence Alignment
VTISCTGSSSNIGAG  NHVKWYQQLPG
VTISCTGTSSNIGS  ITVNWYQQLPG
LRLSCSSSGFIFSS  YAMYWVRQAPG
LSLTCTVSGTSFDD  YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG
ATLVCLISDFYPGA  VTVAWKADS
AALGCLVKDYFPEP  VTVSWNSG-
VSLTCLVKGFYPSD  IAVEWESNG-
Goal: Bring the greatest number of similar characters into the same column of the alignment. Similar to alignment of two sequences.

3 CLUSTALW MSA
MSA of four oxidoreductase NAD binding domain protein sequences. Red: AVFPMILW. Blue: DE. Magenta: RHK. Green: STYHCNGQ. Grey: all others. Residue ranges are shown after sequence names.
Chenna et al., Nucleic Acids Research, 2003, Vol. 31, No. 13, 3497-3500

4 Multiple Sequence Alignment: Motivation
Correspondence: find out which parts “do the same thing”
–Similar genes are conserved across widely divergent species, often performing similar functions
Structure prediction
–Use knowledge of the structure of one or more members of a protein MSA to predict the structure of other members
–Structure is more conserved than sequence
Create “profiles” for protein families
–Allow us to search for other members of the family
Genome assembly: automated reconstruction of “contig” maps of genomic fragments such as ESTs
MSA is the starting point for phylogenetic analysis

5 Multiple Sequence Alignment: Approaches
Optimal global alignments: dynamic programming
–Generalization of Needleman-Wunsch
–Find the alignment that maximizes a score function
–Computationally expensive: time grows as the product of the sequence lengths
Global progressive alignments: match closely related sequences first, using a guide tree
Global iterative alignments: multiple re-building attempts to find the best alignment
Local alignments
–Profiles, Blocks, Patterns

6 Scoring a multiple alignment
Three schemes: sum of pairs, star, tree.
[Figure: three diagrams, one per scheme, each drawn over the residues A, A, A, C, C.]

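7 Sum of Pairs
Five aligned sequences (rows AAA, AAA, AAC, ACC, AAA) give 10 pairs per column. Each matching pair contributes α and each mismatching pair contributes -β:
Column A A A A A: 10α
Column A A A C A: 6α - 4β
Column A A C C A: 4α - 6β
Total: 20α - 10β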

8 Sum-of-Pairs Scoring Function
Score of multiple alignment = ∑_{i<j} score(S_i, S_j), where score(S_i, S_j) = score of the induced pairwise alignment of S_i and S_j
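The scoring function can be sketched in a few lines of Python. This is only an illustration: the function name `sp_score`, the default values of α and β, and the gap handling (a residue against a gap costs `gap`, a gap against a gap scores 0) are assumptions, not part of the slides. With a gap-gap score of 0, summing pair scores column by column gives the same total as summing the scores of the induced pairwise alignments.

```python
from itertools import combinations

def sp_score(rows, alpha=1, beta=1, gap=1):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    alpha: reward per matching pair, beta: penalty per mismatching pair,
    gap: penalty for a residue paired with '-' (assumed; not from the slides).
    """
    assert len({len(r) for r in rows}) == 1, "rows must have equal length"
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue              # gap against gap: assumed to score 0
            if a == "-" or b == "-":
                total -= gap          # residue against gap
            elif a == b:
                total += alpha        # matching pair
            else:
                total -= beta         # mismatching pair
    return total

# Columns AAAAA, AAACA, AACCA from the sum-of-pairs example: 20*alpha - 10*beta
rows = ["AAA", "AAA", "AAC", "ACC", "AAA"]
print(sp_score(rows, alpha=1, beta=1))
```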

9 Induced Pairwise Alignment
Multiple alignment:
S_1  S - T I S C T G - S - N I
S_2  L - T I - C N G S S - N I
S_3  L R T I S C S G F S Q N I
Induced pairwise alignment of S_1, S_2 (columns where both rows have a gap are dropped):
S_1  S T I S C T G - S N I
S_2  L T I - C N G S S N I
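A minimal sketch of the projection, assuming the alignment is stored as a list of equal-length strings with "-" as the gap character (the helper name `induced_pairwise` is hypothetical):

```python
def induced_pairwise(msa, i, j):
    """Project rows i and j out of a multiple alignment.

    Columns in which both rows hold a gap carry no information about the
    pair, so they are dropped.
    """
    kept = [(a, b) for a, b in zip(msa[i], msa[j])
            if not (a == "-" and b == "-")]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)

msa = [
    "S-TISCTG-S-NI",  # S_1
    "L-TI-CNGSS-NI",  # S_2
    "LRTISCSGFSQNI",  # S_3
]
s1, s2 = induced_pairwise(msa, 0, 1)
print(s1)  # STISCTG-SNI
print(s2)  # LTI-CNGSSNI
```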

10 MSA: Dynamic Programming
The two-sequence alignment algorithm can be generalized to any number of sequences.
E.g., for three sequences X, Y, W define C[i,j,k] = score of the optimum alignment among X[1..i], Y[1..j], W[1..k]
As for two sequences, divide possible alignments into different classes, depending on how they end.
–Use these classes to devise recurrence relations for C[i,j,k]
–C[i,j,k] is the maximum over all possibilities

11 MSA: 7 ways alignment can end for 3 sequences
Prefixes being aligned: X_1 ... X_{i-1} X_i, Y_1 ... Y_{j-1} Y_j, W_1 ... W_{k-1} W_k
Possible last columns (rows X, Y, W):
X_i  Y_j  W_k
 -   Y_j  W_k
X_i   -   W_k
X_i  Y_j   -
 -    -   W_k
 -   Y_j   -
X_i   -    -
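The seven endings are simply every choice of "residue or gap" per row, excluding the all-gap column. A short sketch (the function name `column_endings` is hypothetical) enumerates them for any number of sequences:

```python
from itertools import product

def column_endings(k):
    """All possible last columns for k sequences.

    Each entry is a tuple of flags: 1 means the row contributes its last
    residue, 0 means it contributes a gap. The all-gap column is excluded,
    leaving 2**k - 1 endings.
    """
    return [flags for flags in product((1, 0), repeat=k) if any(flags)]

names = ["Xi", "Yj", "Wk"]
for flags in column_endings(3):
    print(" ".join(name if used else "-" for name, used in zip(names, flags)))
```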

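12 Dynamic programming for three sequences
[Figure: three-dimensional dynamic programming matrix for the sequences VSNS, SNA and AS, with one alignment drawn as a path from the Start corner.]
Each alignment is a path through the dynamic programming matrix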

13 Dynamic Programming for Three Sequences
For 3 sequences of length n, time is proportional to n^3
There are 7 ways to get to C[i,j,k], for example from C[i-1,j-1,k-1] or from C[i-1,j,k-1]
Enumerate all possibilities and choose the best one
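The recurrence can be sketched as a direct three-dimensional table fill. This is a minimal illustration under assumptions not stated on the slides: columns are scored by sum of pairs with match reward 1, mismatch penalty 1, residue-gap penalty 1 and gap-gap score 0. The names `pair_score` and `align3` are hypothetical, and only the optimal score is returned (no traceback).

```python
from itertools import combinations, product

def pair_score(a, b, alpha=1, beta=1, gap=1):
    """Score one pair of symbols in a column (assumed scoring scheme)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -gap
    return alpha if a == b else -beta

def align3(x, y, w):
    """Optimal sum-of-pairs score for aligning three sequences."""
    # The 7 possible last columns: residue (1) or gap (0) per sequence.
    moves = [m for m in product((1, 0), repeat=3) if any(m)]
    n1, n2, n3 = len(x), len(y), len(w)
    neg_inf = float("-inf")
    # C[i][j][k] = best score for prefixes x[:i], y[:j], w[:k]
    C = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    C[0][0][0] = 0
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for di, dj, dk in moves:
                    pi, pj, pk = i - di, j - dj, k - dk
                    if pi < 0 or pj < 0 or pk < 0:
                        continue  # predecessor lies outside the table
                    column = (x[pi] if di else "-",
                              y[pj] if dj else "-",
                              w[pk] if dk else "-")
                    s = sum(pair_score(a, b)
                            for a, b in combinations(column, 2))
                    C[i][j][k] = max(C[i][j][k], C[pi][pj][pk] + s)
    return C[n1][n2][n3]

print(align3("VSNS", "SNA", "AS"))
```

The table has (n+1)^3 cells and each cell examines at most 7 predecessors, which is where the n^3 growth comes from.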

14 Dynamic Programming MSA: General Case
For k sequences of length n, the dynamic programming algorithm does (2^k - 1) n^k operations
–Example: 6 sequences of length 100 require about 6.3 × 10^13 calculations
Space for the table is n^k
Implementations (e.g., WashU MSA 2.1) use tricks and only search a subset of the dynamic programming table
–Even this is expensive. E.g., the Baylor CM Search Launcher limits MSA to 8 sequences of 800 characters and 10 minutes of processing time
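The growth is easy to check numerically. The sketch below (function names are hypothetical) just evaluates the two formulas; for 6 sequences of length 100 the operation count is 63 × 10^12, the same order as the figure quoted on the slide.

```python
def dp_operations(k, n):
    """Operations for k sequences of length n: (2^k - 1) * n^k."""
    return (2**k - 1) * n**k

def dp_table_cells(k, n):
    """Cells in the k-dimensional table: n^k."""
    return n**k

print(f"{dp_operations(6, 100):.2e}")   # 6.30e+13
print(f"{dp_table_cells(6, 100):.2e}")  # 1.00e+12
```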

15 Problems with SP scoring
Pair-wise comparisons can over-score evolutionarily distant pairs.
Reason: For 3 or more sequences, SP scoring does not correspond to any evolutionary tree
[Figure: two diagrams contrasted under the label "But not:"; not preserved in the transcript.]

16 Overcoming problems with SP scoring
Use weights to incorporate evolution in sum-of-pairs scoring:
–Some pair-wise alignments are more important than others. E.g., it is more important to have a good alignment between mouse and human sequences than between mouse and bird
–Assign different weights to different pair-wise alignments. Weight decreases with evolutionary distance.
Use the star tree approach
–One sequence is assigned as the ancestor and all others are compared with it.
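The weighting idea can be sketched as a small change to sum-of-pairs scoring: each pair's score is multiplied by a weight before summing. Everything specific here is an assumption for illustration: the function name `weighted_sp_score`, the scoring parameters, and the weight values, which in practice would be derived from evolutionary distances.

```python
from itertools import combinations

def weighted_sp_score(rows, weights, alpha=1, beta=1, gap=1):
    """Weighted sum-of-pairs score.

    weights maps an index pair (i, j) with i < j to the weight of that
    pairwise alignment; closely related pairs would get larger weights.
    """
    total = 0.0
    for (i, row_i), (j, row_j) in combinations(enumerate(rows), 2):
        pair_total = 0
        for a, b in zip(row_i, row_j):
            if a == "-" and b == "-":
                continue              # gap against gap: assumed to score 0
            if a == "-" or b == "-":
                pair_total -= gap
            elif a == b:
                pair_total += alpha
            else:
                pair_total -= beta
        total += weights[(i, j)] * pair_total
    return total

rows = ["AA", "AC", "CC"]
# Hypothetical weights: rows 0 and 1 are treated as the closest pair.
weights = {(0, 1): 1.0, (0, 2): 0.5, (1, 2): 0.25}
print(weighted_sp_score(rows, weights))
```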

17 Star Alignments
Construct multiple alignments using pair-wise alignment relative to a fixed sequence
Out of a set S = {S_1, S_2, ..., S_r} of sequences, pick the sequence S_c that maximizes
star_score(c) = ∑ {sim(S_c, S_i) : 1 ≤ i ≤ r, i ≠ c}
where sim(S_i, S_j) is the optimal score of a pair-wise alignment between S_i and S_j

18 Algorithm
1. Compute sim(S_i, S_j) for every pair (i,j)
2. Compute star_score(i) for every i
3. Choose the index c that maximizes star_score(c) and make it the center of the star
4. Produce a multiple alignment M such that, for every i, the induced pairwise alignment of S_c and S_i is the same as the optimum alignment of S_c and S_i.
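Steps 1 to 3 can be sketched as follows. The slides do not fix a scoring scheme, so `sim` here is assumed to be a Needleman-Wunsch global alignment score with match +1, mismatch -1 and a linear gap penalty of 1; `choose_center` is a hypothetical helper name. Because sim is a similarity (larger is better), the center is the index with the largest star_score, in line with the definition of star_score.

```python
def sim(a, b, alpha=1, beta=1, gap=1):
    """Optimal global pairwise alignment score (Needleman-Wunsch, linear gaps)."""
    prev = [-gap * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-gap * i]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (alpha if a[i - 1] == b[j - 1] else -beta)
            cur.append(max(diag, prev[j] - gap, cur[j - 1] - gap))
        prev = cur
    return prev[-1]

def choose_center(seqs):
    """Return (c, scores) where scores[i] = star_score(i) and c maximizes it."""
    scores = [
        sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]
    c = max(range(len(seqs)), key=scores.__getitem__)
    return c, scores

c, scores = choose_center(["AAAA", "AAAA", "CCCC"])
print(c, scores)
```

For brevity the sketch recomputes each pairwise score twice; caching sim(S_i, S_j) per unordered pair would halve the work.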

19 Step 4: Detail
Optimum alignment of S_c and S_1:
S_c  AA--CCTT
S_1  AATGCC--
Optimum alignment of S_c and S_2:
S_c  A-ACC-TT
S_2  AGACCGT-
Resulting multiple alignment:
S_c  A-A--CC-TT
S_1  A-ATGCC---
S_2  AGA--CCGT-
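One way to carry out this merge is to walk the current center row and the new pairwise center row in parallel, opening a gap column wherever only one of them has a gap. The sketch below is an illustration under that assumption, not necessarily the procedure the slides intend; `merge_star` is a hypothetical name, and the input is a list of (center row, other row) pairwise alignments that all share the same ungapped center.

```python
def merge_star(pairs):
    """Merge pairwise alignments against a common center into one MSA.

    Returns the rows [center, S_1, S_2, ...] as equal-length strings.
    """
    rows = list(pairs[0])
    for c_aln, s_aln in pairs[1:]:
        cur = rows[0]                      # center row of the MSA so far
        new_rows = [[] for _ in rows]
        new_seq = []
        p = q = 0                          # positions in cur and c_aln
        while p < len(cur) or q < len(c_aln):
            cur_gap = p < len(cur) and cur[p] == "-"
            new_gap = q < len(c_aln) and c_aln[q] == "-"
            if cur_gap and not new_gap:
                # Gap only in the existing center: new sequence gets a gap.
                for out, row in zip(new_rows, rows):
                    out.append(row[p])
                new_seq.append("-")
                p += 1
            elif new_gap and not cur_gap:
                # Gap only in the new pairwise center: open a new column.
                for out in new_rows:
                    out.append("-")
                new_seq.append(s_aln[q])
                q += 1
            else:
                # Same center residue (or a gap in both): share the column.
                for out, row in zip(new_rows, rows):
                    out.append(row[p])
                new_seq.append(s_aln[q])
                p += 1
                q += 1
        rows = ["".join(r) for r in new_rows] + ["".join(new_seq)]
    return rows

pairs = [("AA--CCTT", "AATGCC--"), ("A-ACC-TT", "AGACCGT-")]
for row in merge_star(pairs):
    print(row)
```

On the two pairwise alignments from this slide the sketch reproduces the three-row alignment shown above.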

