"Hardness" of (multi-) sequence alignment Align 2 sequences of length N allowing gaps. ACCAC-ACA ACCACACA ::x::x:x: :xxxxxx: AC-ACCATA, A-----CACCATA, etc. 2N gap positions, gap lengths of 0 to N each: A naïve algorithm might scale by O(N 2N ). For N= 3x10 9 this is rather large. Now, what about k>2 sequences? or rearrangements other than gaps?
What is dynamic programming? A dynamic programming algorithm solves every subsubproblem just once and then saves its answer in a table, avoiding the work of recomputing the answer every time the subsubproblem is encountered. -- Cormen et al. "Introduction to Algorithms", The MIT Press.
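A minimal sketch of the "solve once, save in a table" idea, using the longest common subsequence as a stand-in problem (an assumption; the slide does not name a specific problem). The table here is the cache kept by `functools.lru_cache`, and `lcs_length` is a hypothetical name.

```python
from functools import lru_cache


def lcs_length(s, t):
    """Return (LCS length, number of distinct subproblems actually solved)."""

    @lru_cache(maxsize=None)  # the cache plays the role of the table
    def solve(i, j):
        # Subproblem: LCS of the suffixes s[i:] and t[j:].
        if i == len(s) or j == len(t):
            return 0
        if s[i] == t[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))

    length = solve(0, 0)
    # Each cache miss is one subproblem solved; repeats are cache hits.
    return length, solve.cache_info().misses


if __name__ == "__main__":
    print(lcs_length("AIMS", "AMOS"))
```

Without the cache the same suffix pairs would be recomputed many times; with it, at most (len(s) + 1) * (len(t) + 1) subproblems are ever solved.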
Pairwise sequence alignment by the dynamic programming algorithm. The algorithm involves finding the optimal path in the path matrix (a), which is equivalent to searching for the optimal solution in the search tree (b), with pruning by an optimization function. [Figure: (a) Path Matrix; (b) Search Tree. Sequences AIMS and AMOS; alignment AIM-S / A-MOS.]
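The path-matrix view can be sketched as follows: fill the matrix, then trace the optimal path back from the last cell. The scoring values (match +1, mismatch -1, gap -1) and the function name `needleman_wunsch` are assumptions for illustration; the slide does not give a scoring scheme.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment by filling a path matrix and tracing back."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + w,  # diagonal: pair s[i-1], t[j-1]
                          D[i - 1][j] + gap,    # vertical: gap in t
                          D[i][j - 1] + gap)    # horizontal: gap in s
    # Trace the optimal path back from the bottom-right corner.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if D[i][j] == D[i - 1][j - 1] + w:
                a.append(s[i - 1]); b.append(t[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and D[i][j] == D[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-")
            i -= 1
        else:
            a.append("-"); b.append(t[j - 1])
            j -= 1
    return D[n][m], "".join(reversed(a)), "".join(reversed(b))


if __name__ == "__main__":
    print(needleman_wunsch("AIMS", "AMOS"))
```

Under these assumed scores, the sequences from the figure give the alignment AIM-S / A-MOS with score 1.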
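Methods for computing the optimal score in the dynamic programming algorithm: (a) the gap penalty is a constant d; (b) the gap penalty is a linear function of the gap length. [Figure: (a) D_{i,j} is reached from D_{i-1,j-1} with weight w_{s(i),t(j)}, and from D_{i-1,j} and D_{i,j-1} with penalty d. (b) In addition to D_{i-1,j}, D_{i,j-1} and w_{s(i),t(j)}, the panel uses three quantities D_{i,j}^{(1)}, D_{i,j}^{(2)}, D_{i,j}^{(3)} and edges labelled b.]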
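A hedged sketch of case (b), assuming the usual three-state formulation in which a gap of length x costs a + b*x. The opening term `a`, the state names `M`, `X`, `Y`, and the function name `affine_score` are assumptions not taken from the slide; only `b` and the idea of three quantities per cell appear in the figure. Setting a = 0 reduces it to case (a) with d = b.

```python
NEG = float("-inf")


def affine_score(s, t, match=1, mismatch=-1, a=2, b=1):
    """Global alignment score when a gap of length x costs a + b*x."""
    n, m = len(s), len(t)
    # M: last column is a pair; X: gap in t; Y: gap in s.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(a + b * i)
    for j in range(1, m + 1):
        Y[0][j] = -(a + b * j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + w
            # Extending an open gap costs b; opening a new one costs a + b.
            X[i][j] = max(M[i - 1][j] - a - b, X[i - 1][j] - b, Y[i - 1][j] - a - b)
            Y[i][j] = max(M[i][j - 1] - a - b, Y[i][j - 1] - b, X[i][j - 1] - a - b)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_score("ACGT", "AT"))          # one gap of length 2
    print(affine_score("AIMS", "AMOS", a=0))   # behaves like a constant penalty
```

With a > 0, one long gap is cheaper than several short ones of the same total length, which is the practical reason for preferring a length-dependent penalty.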
Concepts of global and local optimality in pairwise sequence alignment. The distinction lies in how the initial values are assigned to the path matrix. [Figure: (a) Global vs. Global; (b) Local vs. Global; (c) Local vs. Local. Zeros mark the initial values in the path matrix.]
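The three cases can be sketched with one routine whose only differences are the initial values and where the final score is read. The mode names, the scoring values, and the function name `dp_score` are assumptions for illustration; in the "fit" mode, s is treated globally and t locally.

```python
def dp_score(s, t, mode="global", match=1, mismatch=-1, gap=-1):
    """Alignment score for mode in {"global", "fit", "local"}."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # Skipping a prefix of s is free only in local mode.
        D[i][0] = 0 if mode == "local" else i * gap
    for j in range(1, m + 1):
        # Skipping a prefix of t is free unless both sequences are global.
        D[0][j] = j * gap if mode == "global" else 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            v = max(D[i - 1][j - 1] + w, D[i - 1][j] + gap, D[i][j - 1] + gap)
            if mode == "local":
                v = max(v, 0)  # a local alignment may restart anywhere
            D[i][j] = v
            best = max(best, v)
    if mode == "global":
        return D[n][m]
    if mode == "fit":
        return max(D[n])  # all of s used, free end position in t
    return best


if __name__ == "__main__":
    for mode in ("global", "fit", "local"):
        print(mode, dp_score("ACG", "TTACGTT", mode))
```

The recurrence is identical in all three modes; only the border values and the cell from which the answer is taken change.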
The dynamic programming algorithm can be applied to limited areas, rather than to the entire matrix, after rapidly searching for the diagonals that contain candidate markers. [Figure: n x m path matrix with row index i and column index j; the diagonals are indexed by l, from 1 to n + m - 1.]
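A rough sketch of this two-stage idea, assuming that shared k-letter words serve as the candidate markers and that a fixed-width band around the best diagonal is the limited area. Both choices, and the names `best_diagonal` and `banded_local_score`, are assumptions rather than the method on the slide.

```python
from collections import Counter


def best_diagonal(s, t, k=4):
    """Return the diagonal offset i - j with the most shared k-mers, or None."""
    positions = {}
    for j in range(len(t) - k + 1):
        positions.setdefault(t[j:j + k], []).append(j)
    hits = Counter()
    for i in range(len(s) - k + 1):
        for j in positions.get(s[i:i + k], []):
            hits[i - j] += 1
    return hits.most_common(1)[0][0] if hits else None


def banded_local_score(s, t, center, width, match=1, mismatch=-1, gap=-1):
    """Local alignment score using only cells with |(i - j) - center| <= width."""
    D = {}  # sparse matrix: cells outside the band are never stored
    best = 0
    for i in range(1, len(s) + 1):
        lo = max(1, i - center - width)
        hi = min(len(t), i - center + width)
        for j in range(lo, hi + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            v = max(0,
                    D.get((i - 1, j - 1), 0) + w,
                    D.get((i - 1, j), 0) + gap,
                    D.get((i, j - 1), 0) + gap)
            D[(i, j)] = v
            best = max(best, v)
    return best


if __name__ == "__main__":
    s, t = "GGGACGTACGT", "ACGTACGT"
    d = best_diagonal(s, t)
    print(d, banded_local_score(s, t, d, width=1))
```

The band holds roughly (2 * width + 1) * n cells instead of n * m, at the price of missing alignments that stray outside it.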
Time and Space Complexity of Computing Alignments
The order of computing matrix elements in the path matrix, which is suitable for (a) sequential processing and (b) parallel processing. [Figure: (a) cells (i-1, j-1), (i-1, j), (i, j-1), (i, j), (i+1, j-1), (i+1, j); (b) cells (i-1, j-1), (i-1, j), (i, j-2), (i, j-1), (i, j), (i+1, j-2), (i+1, j-1).]
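A sketch comparing the two orders, assuming that the parallel variant proceeds along anti-diagonals (all cells with the same i + j), which is one common choice and may not be exactly what panel (b) depicts. Cells on one anti-diagonal depend only on earlier anti-diagonals, so the inner loop of `wavefront_score` could be distributed across processors; here it runs serially to show that the result is unchanged. Function names and scores are hypothetical.

```python
def _init(n, m, gap):
    """Path matrix with global-alignment border values."""
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    return D


def _cell(D, s, t, i, j, match, mismatch, gap):
    w = match if s[i - 1] == t[j - 1] else mismatch
    return max(D[i - 1][j - 1] + w, D[i - 1][j] + gap, D[i][j - 1] + gap)


def row_order_score(s, t, match=1, mismatch=-1, gap=-1):
    """(a) Sequential order: row by row, left to right."""
    n, m = len(s), len(t)
    D = _init(n, m, gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = _cell(D, s, t, i, j, match, mismatch, gap)
    return D[n][m]


def wavefront_score(s, t, match=1, mismatch=-1, gap=-1):
    """(b) Anti-diagonal order: cells with equal i + j are independent."""
    n, m = len(s), len(t)
    D = _init(n, m, gap)
    for k in range(2, n + m + 1):
        # Every iteration of this inner loop could run in parallel.
        for i in range(max(1, k - m), min(n, k - 1) + 1):
            j = k - i
            D[i][j] = _cell(D, s, t, i, j, match, mismatch, gap)
    return D[n][m]


if __name__ == "__main__":
    print(row_order_score("AIMS", "AMOS"), wavefront_score("AIMS", "AMOS"))
```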
Time and Space Problems. Comparing two one-megabase genomes. Space: an entry takes 4 bytes, so the table takes 4 * 10^6 * 10^6 = 4 * 10^12 bytes = 4 T bytes of memory. Time: a 1000 MHz CPU handles about 1M entries/second, so 10^12 entries take 1M seconds, about 11.6 days.
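The arithmetic above can be checked directly. The second half of the sketch shows one standard way to ease the space problem when only the score is needed: keep two rows instead of the whole table. That technique is not stated on the slide, and the name `two_row_score` and the scoring values are assumptions.

```python
n = 10**6                      # length of each genome, in bases
bytes_per_entry = 4
table_bytes = bytes_per_entry * n * n
entries_per_second = 10**6     # rate assumed on the slide
seconds = (n * n) // entries_per_second
days = seconds / 86400


def two_row_score(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment score using O(len(t)) memory instead of O(n * m)."""
    m = len(t)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + w, prev[j] + gap, cur[j - 1] + gap)
        prev = cur  # the older row is no longer needed
    return prev[m]


if __name__ == "__main__":
    print(f"table: {table_bytes:.1e} bytes")
    print(f"time:  {seconds:.1e} s = {days:.1f} days")
    print(two_row_score("AIMS", "AMOS"))
```

The two-row version does not reduce the running time, and it returns only the score, not the alignment itself.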