Download presentation
Presentation is loading. Please wait.
Published byFlorence Mills Modified over 9 years ago
1
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU
2
Why learn about alignment algorithms? All publicly available alignment programs do the same thing –Sequence alignment using amino acids substitution matrices and affine gap penalties This is fast but not optimal –Protein alignment is done much more accurate using sequence profiles and position specific gap penalties (price for gaps depends on the structure) Must implement your own alignment algorithm to do this
3
Outline What you have been told is not entirely true :-) –Alignment algorithms are more complex The true sequence alignment algorithm story –The slow algorithm (O3) –The fast algorithm (O2)
4
Sequence alignment The old Story
5
Pairwise alignment: the solution ” Dynamic programming ” (the Needleman-Wunsch algorithm) Match score = 1 Mismatch score -1 Gap penalty = -2
6
Alignment depicted as path in matrix T C G C A T C A T C G C A T C A TCGCA TC-CA TCGCA T-CCA Score=2 Score=0 Match score = 1 Mismatch score = -1 Gap penalty = -2
7
Alignment depicted as path in matrix T C G C A T C A x Meaning of point in matrix: all residues up to this point have been aligned (but there are many different possible paths). Position labeled “x”: TC aligned with TC --TC-TCTC TC--T-CTC
8
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities.
9
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. score(x,y) = max score(x,y-1) - gap-penalty
10
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. score(x,y) = max score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y)
11
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. score(x,y) = max score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y) score(x-1,y) - gap-penalty
12
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities. Each new score is found by choosing the maximum of three possibilities. For each square in matrix: keep track of where best score came from. Fill in scores one row at a time, starting in upper left corner of matrix, ending in lower right corner. score(x,y) = max score(x,y-1) - gap-penalty score(x-1,y-1) + substitution-score(x,y) score(x-1,y) - gap-penalty
13
Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2
14
Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2
15
Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2
16
Dynamic programming: example A C G T A 1 -1 -1 -1 C -1 1 -1 -1 G -1 -1 1 -1 T -1 -1 -1 1 Gaps: -2
17
Dynamic programming: example T C G C A : : T C - C A 1+1-2+1+1 = 2
18
A now the truth
19
Dynamic programming: computation of scores T C G C A T C A x Any given point in matrix can only be reached from three possible positions (you cannot “align backwards”). => Best scoring alignment ending in any given point in the matrix can be found by choosing the highest scoring of the three possibilities.
20
Dynamic programming: example What about j-2 and a gap extension?
21
And now the true algorithm V LI L P V L L P n m Start from here
22
How does it work (score matrix d)? 5 V L I L P V L L P 141-3 1525-4 1525 -3-4-3-410 Blosum 50 matrix d matrix
23
How does it work? (The slow way O3) V LI L P V L L P 93 17104 155 4510 0 0 0 0 000000 Start from here! Gap-open W 1 = -5 Gap-extension U = -1 d matrix D matrix
24
How does it work? (The slow way O3) V LI L P V L L P 17104 155 4510 0 0 0 0 000000 Check all positions in (green) row and column to check score for gap extension. CPU intensive (O3) Gap-open W 1 = -5 Gap-extension U = -1 d matrix D matrix
25
And now you!
26
How does it work? Fill out the D matrix 20 V LI L P V L L P 143 1117104 9 155 2 4510 0 0 0 0 000000 Check all positions in (green) row and column to check score for gap extension. CPU intensive (O3) Gap-open W 1 = -5 Gap-extension U = -1 d matrixD matrix
27
How does it work (The slow way O3) 20 VLI L P V L L P 181493 111517104 89 155 2 34510 0 0 0 0 000000 Check all positions in (green) row and column to check score for gap extension. CPU intensive (O3) Gap-open W 1 = -5 Gap-extension U = -1 d matrix D matrix
28
And now the fast algorithm (O2) Database (m) Query (n) Open a gapExtending a gap P Q Affine gap penalties
29
And now the fast algorithm (O2) Database (m) Query (n) Open a gapExtending a gap P Q
30
And now the true algorithm (cont.) Database (m) Query (n) P Q
31
How does it work (D,Q, and P-matrices) D P m n V LI L P V L L P 5 0 0 0 0 000000 V LI L P V L L P 5 234510 0 0 0 0 000000 W 1 = -5 U = -1
32
How does it work (D,Q,P, and E-matrices) D P m n V LI L P V L L P 0 5 0 0 0 0 000000 V LI L P V L L P 5 234510 0 0 0 0 000000 W 1 = -5 U = -1
33
How does it work (D,Q, and P-matrices) D Q m n V L I L P V L L P 5 0 0 000000 V LI L P V L L P 5 234510 0 0 0 0 000000 W 1 = -5 U = -1
34
How does it work (D,Q,P, and E-matrices) D Q m n V L I L P V L L P 05 0 0 000000 V LI L P V L L P 5 234510 0 0 0 0 000000 W 1 = -5 U = -1
35
How does it work (D, Q, and P-matrices) D Q m n V L I L P V L L P 05 0 0 000000 V LI L P V L L P 5 234510 0 0 0 0 000000 P V LI L P V L L P 0 5 0 0 0 0 000000
36
How does it work (D, Q, and P-matrices) D Q m n V L I L P V L L P 05 0 0 000000 V LI L P V L L P 155 234510 0 0 0 0 000000 P V LI L P V L L P 0 5 0 0 0 0 000000
37
How does it work. Eij-matrix. (Keeping track on the path) eij = 1 match eij = 2 gap-opening database (move vertical) eij = 3 gap-extension database (move vertical) eij = 4 gap-opening query (move horizontal) eij = 5 gap-extension query (move horizontal) V LI L P V L L P 2 41 0 0 0 0 000000
38
The fast algorithm (cont.) Database (m) Query (n) P Q eij = 1 match eij = 2 gap-opening database eij = 3 gap-extension database eij = 4 gap-opening query eij = 5 gap-extension query E14523E14523
39
How does it work (D,Q,P, and E-matrices) D Q m n V L I L P V L L P 05 0 0 000000 V LI L P V L L P 155 234510 0 0 0 0 000000 P V LI L P V L L P 0 5 0 0 0 0 000000 E V LI L P V L L P 12 41 0 0 0 0 000000
40
How does it work (D,Q,P, and E-matrices) D Q P m n E 6 V L I L P V L L P 101293 345104 -2 05 0 0 0 0 000000 1 V LI L P V L L P 1133 51113 51412 55541 0 0 0 0 000000 13 V LI L P V L L P 94-2 11125 89100 2345 0 0 0 0 000000 20 V LI L P V L L P 181493 111517104 89 155 234510 0 0 0 0 000000
41
And the alignment D m n 20 V LI L P V L L P 181493 111517104 89 155 234510 0 0 0 0 000000 E 1 V LI L P V L L P 1133 51113 51412 55541 0 0 0 0 000000 VLILP VL-LP
42
And the alignment. Gap extensions D-matrix m n 19 V LI P L V L L P 131793 91412104 789155 123510 0 0 0 0 000000 E 1 V LI L P V L L P 1133 11113 51512 55541 0 0 0 0 000000 A 13 10 4 0 A 1 1 4 5 0 E-matrix
43
And the alignment D m n 19 V LI P L V L L P 131793 91412104 789155 123510 0 0 0 0 000000 E 1 V LI L P V L L P 1133 11113 51512 55541 0 0 0 0 000000 A 13 10 4 0 A 1 1 4 5 0 15 -5 -1 = 9?
44
And the alignment D m n 19 V LI P L V L L P 131793 91412104 789155 123510 0 0 0 0 000000 E 1 V LI L P V L L P 1133 11113 51512 55541 0 0 0 0 000000 A 13 10 4 0 A 1 1 4 5 0 5 -5 -1 = 9?
45
And the alignment D m n 19 V LI P L V L L P 131793 91412104 789155 123510 0 0 0 0 000000 E 1 V LI L P V L L P 1133 11113 51512 55541 0 0 0 0 000000 A 13 10 4 0 A 1 1 4 5 0 VLIALP VL--LP 15 -5 -1 = 9? *** 5 -5 -1 = 9?
46
And now you!
47
Summary Alignment is more complicated than what you have been told. Simple algorithmic tricks allow for alignment in O2 time More heuristics to improve speed –Limit gap length –Look for high scoring regions
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.