2 Multiple Alignment versus Pairwise Alignment Up until now we have only tried to align two sequences.What about more than two? And what for?A faint similarity between two sequences becomes significant if present in manyMultiple alignments can reveal subtle similarities that pairwise alignments do not reveal
3 Generalizing the Notion of Pairwise Alignment Alignment of 2 sequences is represented as a2-row matrixIn a similar way, we represent alignment of 3 sequences as a 3-row matrixA T - G C G -A _ C G T - AA T C A C - AScore: more conserved columns, better alignment
4 Aligning Three Sequences sourceSame strategy as aligning two sequencesUse a 3-D “Manhattan Cube”, with each axis representing a sequence to alignFor global alignments, go from source to sinksink
5 2-D cell versus 2-D Alignment Cell In 2-D, 3 edges in each unit squareIn 3-D, 7 edges in each unit cube
6 Architecture of 3-D Alignment Cell (i-1,j,k-1)(i-1,j-1,k-1)(i-1,j-1,k)(i-1,j,k)(i,j,k-1)(i,j-1,k-1)(i,j,k)(i,j-1,k)
7 Multiple Alignment: Dynamic Programming cube diagonal: no indelssi,j,k = max(x, y, z) is an entry in the 3-D scoring matrixsi-1,j-1,k-1 + (vi, wj, uk)si-1,j-1,k + (vi, wj, _ )si-1,j,k (vi, _, uk)si,j-1,k (_, wj, uk)si-1,j,k + (vi, _ , _)si,j-1,k + (_, wj, _)si,j,k (_, _, uk)face diagonal: one indeledge diagonal: two indels
8 Multiple Alignment: Running Time For 3 sequences of length n, the run time is 7n3; O(n3)For k sequences, build a k-dimensional Manhattan, with run time (2k-1)(nk); O(2knk)Conclusion: dynamic programming approach for alignment between two sequences is easily extended to k sequences but it is impractical due to exponential running time
9 Practically speaking Multiple alignment is a hard problem Yet, it is of extreme practical importanceComparing several species is a very common taskDoing this for the entire genome is an increasingly common demand!Several heuristic-based algorithms have been developed that employ greedy, divide-and-conquer, dynamic programming approaches in various combinationsThe algorithms we will see today are actually used by current multiple aligners
11 Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments Given 3 arbitrary pairwise alignments:x: ACGCTGG-C; x: AC-GCTGG-C; y: AC-GC-GAGy: ACGC--GAC; z: GCCGCA-GAG; z: GCCGCAGAGcan we construct a multiple alignment that inducesthem?
12 Reverse Problem: Constructing Multiple Alignment from Pairwise Alignments Given 3 arbitrary pairwise alignments:x: ACGCTGG-C; x: AC-GCTGG-C; y: AC-GC-GAGy: ACGC--GAC; z: GCCGCA-GAG; z: GCCGCAGAGcan we construct a multiple alignment that inducesthem?NOT ALWAYSPairwise alignments may be inconsistent
13 Multiple Alignment: Greedy Approach Choose most similar pair of strings and combine into a profile , thereby reducing alignment of k sequences to an alignment of of k-1 sequences/profiles. RepeatThis is a heuristic greedy methodu1= ACg/tTACg/tTACg/cT…u2 = TTAATTAATTAA……uk = CCGGCCGGCCGG…u1= ACGTACGTACGT…u2 = TTAATTAATTAA…u3 = ACTACTACTACT……uk = CCGGCCGGCCGGk-1k
14 Progressive Alignment Progressive alignment is a variation of greedy algorithm with a somewhat more intelligent strategy for choosing the order of alignments.
15 Clustal Popular multiple alignment tool today Uses “progressive alignment”Three-step process1.) Construct pairwise alignments2.) Build Guide Tree3.) Progressive Alignment guided by the tree
16 Step 1: Pairwise Alignment Aligns each sequence again each other giving a similarity matrixSimilarity = exact matches / sequence length (percent identity)v1 v2 v3 v4v1 -vvv
17 Step 2: Guide Tree Create Guide Tree using the similarity matrix ClustalW uses the “neighbor-joining method”Guide tree roughly reflects evolutionary relations
19 Step 3: Progressive Alignment Start by aligning the two most similar sequencesFollowing the guide tree, add in the next sequences, aligning to the existing alignmentAn alignment is stored as a “consensus sequence”, to be aligned with other sequences or alignments laterConsensus sequence: Residue a if 75% of aligned sequences have an a at that position.Otherwise “X”.
20 Evaluation Evaluating alignment programs is very difficult What is a benchmark here ?We haven’t witnessed the process of evolution, so we cannot say for certain what the true alignment of “extant” sequences should beOne approach: “simulate” evolution
21 Simulating evolutionGenerate a random sequence and introduce realistic evolutionary changes to it, along branches of an assumed phylogenySubstitutions, insertions, deletions, insertion & deletion rates, duplications, introduction of repeat elements, etc.
22 Evaluating alignmentOnce simulation done, take all the sequences at the leaf nodes of the phylogeny (started with root)Align these sequences using softwareCompare computed alignment and known (“true”) alignmentsensitivity and specificity