
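Global Alignment Summary
- If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

$$
\mathrm{Score}(i, j) = \max
\begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
\qquad
\mathrm{Score}(i, 0) = i \cdot g, \quad \mathrm{Score}(0, j) = j \cdot g
$$

- The actual alignment is identified by tracing back the pointers, starting at the lower-right corner.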

Global Alignment Algorithm
To compute GLOBAL ALIGNMENT given two sequences:
1. create a matrix with (length of the first sequence + 1) rows and (length of the second sequence + 1) columns; row 0 and column 0 stand for the empty prefix
   # initialize the cells of row 0 and column 0 only
2. for each column c, set cell(0, c) to c * gap
3. for each row r, set cell(r, 0) to r * gap
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1 = cell(r-1, c) + gap, option2 = cell(r, c-1) + gap, option3 = cell(r-1, c-1) + (match or mismatch score)
7.         set cell(r, c) to the largest of option1, option2, option3
8. return the matrix (the global alignment score is its lower-right cell)
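A minimal Python sketch of the steps above, assuming the scores used in this deck's examples as defaults (g = -2, s = -1, m = 2). The function name `global_matrix` is illustrative, not from the slides. Python strings are 0-indexed, so `a[r - 1]` plays the role of A[r].

```python
from typing import List


def global_matrix(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> List[List[int]]:
    """Fill the global alignment matrix; rows follow a, columns follow b."""
    rows, cols = len(a) + 1, len(b) + 1  # extra row/column for the empty prefix
    cell = [[0] * cols for _ in range(rows)]
    # Steps 2-3: row 0 and column 0 hold the cost of leading gaps.
    for c in range(cols):
        cell[0][c] = c * g
    for r in range(rows):
        cell[r][0] = r * g
    # Steps 4-7: every other cell keeps the best of the three options.
    for r in range(1, rows):
        for c in range(1, cols):
            option1 = cell[r - 1][c] + g  # a[r - 1] aligned with a gap
            option2 = cell[r][c - 1] + g  # b[c - 1] aligned with a gap
            pair = m if a[r - 1] == b[c - 1] else s
            option3 = cell[r - 1][c - 1] + pair  # a[r - 1] aligned with b[c - 1]
            cell[r][c] = max(option1, option2, option3)
    return cell


matrix = global_matrix("CACTAG", "GATTACA")
print(matrix[-1][-1])  # prints 1, the global score for CACTAG vs GATTACA
```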

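Global Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2:

       -    G    A    T    T    A    C    A
  -    0   -2   -4   -6   -8  -10  -12  -14
  C   -2   -1   -3   -5   -7   -9   -8  -10
  A   -4   -3    1   -1   -3   -5   -7   -6
  C   -6   -5   -1    0   -2   -4   -3   -5
  T   -8   -7   -3    1    2    0   -2   -4
  A  -10   -9   -5   -1    0    4    2    0
  G  -12   -8   -7   -3   -2    2    3    1

The global alignment score is the lower-right entry, 1.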
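The matrix gives the score; the alignment itself comes from the traceback. This sketch (hypothetical name `global_align`, deck's scores as defaults) re-derives each pointer from the stored scores instead of keeping a separate pointer matrix. On ties it prefers the diagonal, then the cell above, then the cell to the left; a different tie-breaking order can return a different alignment with the same score.

```python
from typing import Tuple


def global_align(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> Tuple[int, str, str]:
    """Global alignment of a (rows) against b (columns), with traceback."""
    n, k = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return m if a[i - 1] == b[j - 1] else s

    score = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        score[i][0] = i * g
    for j in range(k + 1):
        score[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(score[i - 1][j] + g,
                              score[i][j - 1] + g,
                              score[i - 1][j - 1] + pair(i, j))

    # Trace back from the lower-right corner to cell (0, 0).
    top, bottom = [], []
    i, j = n, k
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + g:
            top.append(a[i - 1])  # letter of a against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")  # letter of b against a gap
            bottom.append(b[j - 1])
            j -= 1
    return score[n][k], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("CACTAG", "GATTACA"))  # (1, 'CACTA-G', 'GATTACA')
```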

Semi-Global Alignment
- Motivation:

  CAGCACTTGGATTCTCGG
  CAGC----G--T----GG     (global alignment)

  CAGCA-CTTGGATTCTCGG
  ---CAGCGTGG--------    (semi-global alignment)

- The second alignment may be preferable despite its lower score under global scoring.
- Modify the algorithm so that terminal gaps (gaps at either end) are not penalized.
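To make "lower score" concrete, here is a sketch that scores a finished alignment column by column. The slide gives no scores, so this assumes the m = 2, s = -1, g = -2 used in the deck's worked examples; `alignment_score` and its `free_end_gaps` flag are hypothetical names. A gap counts as terminal when it lies before the first or after the last letter of its own row.

```python
from typing import Tuple


def _residue_span(row: str) -> Tuple[int, int]:
    """Positions of the first and last non-gap characters in one aligned row."""
    positions = [p for p, ch in enumerate(row) if ch != "-"]
    return positions[0], positions[-1]


def alignment_score(top: str, bottom: str, g: int = -2, s: int = -1, m: int = 2,
                    free_end_gaps: bool = False) -> int:
    """Score an existing alignment column by column; '-' marks a gap."""
    spans = (_residue_span(top), _residue_span(bottom))
    total = 0
    for p, (x, y) in enumerate(zip(top, bottom)):
        if x == "-" or y == "-":
            first, last = spans[0] if x == "-" else spans[1]
            terminal = p < first or p > last  # gap before or after that row's letters
            if not (free_end_gaps and terminal):
                total += g
        else:
            total += m if x == y else s
    return total


global_pair = ("CAGCACTTGGATTCTCGG", "CAGC----G--T----GG")
semi_pair = ("CAGCA-CTTGGATTCTCGG", "---CAGCGTGG--------")
for name, (top, bottom) in (("global", global_pair), ("semi-global", semi_pair)):
    print(name, alignment_score(top, bottom), alignment_score(top, bottom, free_end_gaps=True))
# global -4 -4
# semi-global -13 9
```

With every gap penalized, the global alignment scores higher (-4 against -13); once terminal gaps are free, the semi-global alignment comes out ahead with 9.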


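Semi-Global Alignment Summary
- If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

$$
\mathrm{Score}(i, j) = \max
\begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
\qquad
\mathrm{Score}(i, 0) = 0, \quad \mathrm{Score}(0, j) = 0
$$

- The gap cost g is set to 0 for the last row and the last column.
- The actual alignment is identified the same way as for global alignment.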

Semi-Global Alignment Algorithm
To compute SEMI-GLOBAL ALIGNMENT given two sequences:
1. create a matrix with (length of the first sequence + 1) rows and (length of the second sequence + 1) columns
   # initialize to 0 the cells of row 0 and column 0
2. for each column c, set cell(0, c) to 0 (no gap penalty)
3. for each row r, set cell(r, 0) to 0 (no gap penalty)
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1, option2, option3, using a gap penalty of 0 in the last row and in the last column
7.         set cell(r, c) to the largest of option1, option2, option3
8. return the matrix (the semi-global alignment score is its lower-right cell)
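A sketch of these steps, again assuming the deck's scores as defaults; `semi_global_matrix` is a hypothetical name. Free terminal gaps come from two places: the zero-filled row 0 and column 0 (leading gaps) and the zero-cost moves along the last row and last column (trailing gaps).

```python
from typing import List


def semi_global_matrix(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> List[List[int]]:
    """Fill the semi-global alignment matrix; rows follow a, columns follow b."""
    n, k = len(a), len(b)
    cell = [[0] * (k + 1) for _ in range(n + 1)]  # steps 1-3: row 0 and column 0 are 0
    for r in range(1, n + 1):
        for c in range(1, k + 1):
            down_gap = 0 if c == k else g   # vertical moves are free in the last column
            right_gap = 0 if r == n else g  # horizontal moves are free in the last row
            pair = m if a[r - 1] == b[c - 1] else s
            cell[r][c] = max(cell[r - 1][c] + down_gap,
                             cell[r][c - 1] + right_gap,
                             cell[r - 1][c - 1] + pair)
    return cell


# Rows ATTA, columns GACTATGA, as in the GACTATGA / ATTA exercise.
matrix = semi_global_matrix("ATTA", "GACTATGA")
print(matrix[-1][-1])  # prints 5
```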

Semi-Global Alignment Example
Align GACTATGA and ATTA using g = -2, s = -1, m = 2, in a matrix whose columns are labeled -, G, A, C, T, A, T, G, A and whose rows are labeled -, A, T, T, A.
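Recovering the alignment for this exercise works as in the global case, except that the traceback has to use the same zero gap costs: moves along row 0, column 0, the last row and the last column are free. A sketch with the hypothetical name `semi_global_align`, the deck's scores as defaults, and ties broken in the order diagonal, up, left:

```python
from typing import Tuple


def semi_global_align(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> Tuple[int, str, str]:
    """Semi-global alignment of a (rows) against b (columns), with traceback."""
    n, k = len(a), len(b)

    def down_cost(j: int) -> int:   # moving down: a letter of a against a gap
        return 0 if j == k else g   # free in the last column

    def right_cost(i: int) -> int:  # moving right: a letter of b against a gap
        return 0 if i == n else g   # free in the last row

    def pair(i: int, j: int) -> int:
        return m if a[i - 1] == b[j - 1] else s

    score = [[0] * (k + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(score[i - 1][j] + down_cost(j),
                              score[i][j - 1] + right_cost(i),
                              score[i - 1][j - 1] + pair(i, j))

    top, bottom = [], []
    i, j = n, k
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + down_cost(j)):
            top.append(a[i - 1])  # column 0: leading gaps in b are free
            bottom.append("-")
            i -= 1
        else:
            top.append("-")  # row 0: leading gaps in a are free
            bottom.append(b[j - 1])
            j -= 1
    return score[n][k], "".join(reversed(top)), "".join(reversed(bottom))


print(semi_global_align("ATTA", "GACTATGA"))  # (5, '----ATTA', 'GACTATGA')
```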

Local Alignment
- Goal: find two substrings (common regions), one from each sequence, whose global alignment score is the highest, e.g. for

  AAAACCCCCGGGGTTA
  TTCCCGGGAACCAACC

- Similar to the previous two methods, but a sub-alignment is not extended once its score would become negative: the score is reset to 0 and a new sub-alignment can start.

Local Alignment
- Modify the algorithm to identify a high-scoring common fragment.

Local Alignment Example
Align CACTAG and GATTACA using g = -2, s = -1, m = 2. Row 0 and column 0 are first initialized to 0; the filled matrix is:

       -    G    A    T    T    A    C    A
  -    0    0    0    0    0    0    0    0
  C    0    0    0    0    0    0    2    0
  A    0    0    2    0    0    2    0    4
  C    0    0    0    1    0    0    4    2
  T    0    0    0    2    3    1    2    3
  A    0    0    2    0    1    5    3    4
  G    0    2    0    1    0    3    4    2

The highest entry, 5, is the starting point for the traceback.

Local Alignment Example
Rows: AAAACCCCCGGGGTTA; columns: TCCCCTGGAACCAACC.

       -  T  C  C  C  C  T  G  G  A  A  C  C  A  A  C  C
   -  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   A  0  0  0  0  0  0  0  0  0  2  2  0  0  2  2  0  0
   A  0  0  0  0  0  0  0  0  0  2  4  2  0  2  4  2  0
   A  0  0  0  0  0  0  0  0  0  2  4  3  1  2  4  3  1
   A  0  0  0  0  0  0  0  0  0  2  4  3  2  3  4  3  2
   C  0  0  2  2  2  2  0  0  0  0  2  6  5  3  2  6  5
   C  0  0  2  4  4  4  2  0  0  0  0  4  8  6  4  4  8
   C  0  0  2  4  6  6  4  2  0  0  0  2  6  7  5  6  6
   C  0  0  2  4  6  8  6  4  2  0  0  2  4  5  6  7  8
   C  0  0  2  4  6  8  7  5  3  1  0  2  4  3  4  8  9
   G  0  0  0  2  4  6  7  9  7  5  3  1  2  3  2  6  7
   G  0  0  0  0  2  4  5  9 11  9  7  5  3  1  2  4  5
   G  0  0  0  0  0  2  3  7 11 10  8  6  4  2  0  2  3
   G  0  0  0  0  0  0  1  5  9 10  9  7  5  3  1  0  1
   T  0  2  0  0  0  0  2  3  7  8  9  8  6  4  2  0  0
   T  0  2  1  0  0  0  2  1  5  6  7  8  7  5  3  1  0
   A  0  0  1  0  0  0  0  1  3  7  8  6  7  9  7  5  3

Local Alignment Summary
- If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

$$
\mathrm{Score}(i, j) = \max
\begin{cases}
\mathrm{Score}(i-1, j) + g & \text{align } A[i] \text{ with a gap} \\
\mathrm{Score}(i, j-1) + g & \text{align } B[j] \text{ with a gap} \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j] \\
0 & \text{start a new sub-alignment}
\end{cases}
\qquad
\mathrm{Score}(i, 0) = 0, \quad \mathrm{Score}(0, j) = 0
$$

- Recovering the alignment: find the entry with the highest value anywhere in the matrix and use it as the starting point for tracing back until a 0 is found.
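A sketch of this recovery rule (hypothetical name `local_align`, deck's scores as defaults): fill the matrix with 0 as a fourth option, start at the highest entry (the first one in row-major order if there are ties), and follow predecessors until a cell holding 0 is reached.

```python
from typing import Tuple


def local_align(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> Tuple[int, str, str]:
    """Local alignment of a (rows) against b (columns), with traceback."""
    n, k = len(a), len(b)

    def pair(i: int, j: int) -> int:
        return m if a[i - 1] == b[j - 1] else s

    score = [[0] * (k + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, k + 1):
            score[i][j] = max(0,
                              score[i - 1][j] + g,
                              score[i][j - 1] + g,
                              score[i - 1][j - 1] + pair(i, j))
            if score[i][j] > best:  # remember the first highest entry
                best, bi, bj = score[i][j], i, j

    top, bottom = [], []
    i, j = bi, bj
    while score[i][j] > 0:  # a positive cell never came from the 0 option
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + g:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("CACTAG", "GATTACA"))  # (5, 'ACTA', 'ATTA')
```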

Local Alignment Algorithm
To compute LOCAL ALIGNMENT given two sequences:
1. create a matrix with (length of the first sequence + 1) rows and (length of the second sequence + 1) columns
   # initialize to 0 the cells of row 0 and column 0
2. for each column c, set cell(0, c) to 0
3. for each row r, set cell(r, 0) to 0
4. for each row r in the matrix, starting at 1:
5.     for each column c in the matrix, starting at 1:
6.         calculate option1, option2, option3
7.         set cell(r, c) to the largest of 0, option1, option2, option3
8. return the matrix (the local alignment score is its highest entry, wherever it occurs)
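The local steps differ from the global ones only in the zero boundary and the extra 0 option in step 7. A sketch (hypothetical name `local_matrix`, deck's scores as defaults) that also reports the highest entry:

```python
from typing import List, Tuple


def local_matrix(a: str, b: str, g: int = -2, s: int = -1, m: int = 2) -> Tuple[List[List[int]], int]:
    """Fill the local alignment matrix (rows follow a, columns follow b).

    Returns the matrix and its highest entry, which is the local alignment score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    cell = [[0] * cols for _ in range(rows)]  # steps 1-3: row 0 and column 0 are 0
    for r in range(1, rows):
        for c in range(1, cols):
            pair = m if a[r - 1] == b[c - 1] else s
            # Step 7: the 0 lets a sub-alignment restart instead of going negative.
            cell[r][c] = max(0, cell[r - 1][c] + g, cell[r][c - 1] + g, cell[r - 1][c - 1] + pair)
    highest = max(max(row) for row in cell)
    return cell, highest


matrix, highest = local_matrix("CACTAG", "GATTACA")
print(highest)  # prints 5
```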

References
- Global alignment: Needleman SB, Wunsch CD (1970). "A general method applicable to the search for similarities in the amino acid sequence of two proteins". J Mol Biol 48(3): 443–453.
- Local alignment: Smith TF, Waterman MS (1981). "Identification of Common Molecular Subsequences". J Mol Biol 147: 195–197.
- Images from UMN CS5481.

Gap Penalty Revisited
- So far we have used a uniform gap penalty: k gaps cost k * g.
- Another possibility is to use two types of gap penalty:
  - gap opening penalty (go): for starting a gapped region
  - gap extension penalty (ge): for continuing a gapped region
- Typically the gap opening penalty is set higher (biased against opening gaps) and the gap extension penalty lower (once a gap region has started, it is fine to extend it).
- The gap penalty G for k gaps now becomes G(k) = go + (k-1) * ge (also called an affine gap penalty).
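A small sketch comparing the two penalty schemes; `linear_gap` and `affine_gap` are hypothetical names, and go = -5, ge = -1 are made-up values chosen only to show the trend (the slide gives no numbers). Penalties are written as negative scores, matching g = -2 elsewhere in the deck.

```python
def linear_gap(k: int, g: int = -2) -> int:
    """Uniform penalty: a run of k gaps costs k * g."""
    return k * g


def affine_gap(k: int, go: int, ge: int) -> int:
    """Affine penalty G(k) = go + (k - 1) * ge for a run of k >= 1 gaps."""
    if k < 1:
        raise ValueError("a gap run has at least one position")
    return go + (k - 1) * ge


# Hypothetical values: opening a gap costs -5, each further position -1.
for k in (1, 2, 5):
    print(k, linear_gap(k), affine_gap(k, go=-5, ge=-1))
# 1 -2 -5
# 2 -4 -6
# 5 -10 -9

# One run of 4 gaps is cheaper than two separate runs of 2.
print(affine_gap(4, go=-5, ge=-1), 2 * affine_gap(2, go=-5, ge=-1))  # -8 -12
```

Short gaps cost more than under the uniform scheme and long gaps cost less, which is exactly the bias the slide describes.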

Affine Gap Penalty
- Modify the algorithm to support gap opening/extension penalties (matrix rows: CACTAG; columns: GATTACA).

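Global Alignment, Affine Gap
- If Score(i, j) denotes the best score for aligning A[1 : i] and B[1 : j], then

$$
\mathrm{Score}(i, j) = \max
\begin{cases}
\mathrm{Score}(i-k, j) + G(k), & 1 \le k \le i \\
\mathrm{Score}(i, j-k) + G(k), & 1 \le k \le j \\
\mathrm{Score}(i-1, j-1) + m & \text{if } A[i] = B[j] \\
\mathrm{Score}(i-1, j-1) + s & \text{if } A[i] \neq B[j]
\end{cases}
\qquad
\mathrm{Score}(0, 0) = 0, \quad \mathrm{Score}(i, 0) = G(i), \quad \mathrm{Score}(0, j) = G(j)
$$

- Horizontally and vertically, every earlier cell in the same row or column must now be tried as the possible start of the gap.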
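A direct transcription of this recurrence into Python, with the hypothetical name `affine_global_score` and the deck's s = -1, m = 2 as defaults. Because each cell scans every earlier cell in its row and column, it takes time proportional to |A| * |B| * (|A| + |B|); Gotoh's three-matrix formulation (a different technique, not shown on the slide) brings this down to |A| * |B|. With go = ge = g the recurrence reduces to the uniform-penalty global alignment, which gives a quick sanity check.

```python
def affine_global_score(a: str, b: str, go: int, ge: int, s: int = -1, m: int = 2) -> int:
    """Global alignment score with affine gaps G(k) = go + (k - 1) * ge."""

    def G(k: int) -> int:
        return go + (k - 1) * ge

    len_a, len_b = len(a), len(b)
    score = [[0] * (len_b + 1) for _ in range(len_a + 1)]  # Score(0, 0) = 0
    for i in range(1, len_a + 1):
        score[i][0] = G(i)
    for j in range(1, len_b + 1):
        score[0][j] = G(j)
    for i in range(1, len_a + 1):
        for j in range(1, len_b + 1):
            pair = m if a[i - 1] == b[j - 1] else s
            best = score[i - 1][j - 1] + pair
            # Vertical: the last k letters of A[1 : i] face a single run of k gaps.
            for k in range(1, i + 1):
                best = max(best, score[i - k][j] + G(k))
            # Horizontal: the last k letters of B[1 : j] face a single run of k gaps.
            for k in range(1, j + 1):
                best = max(best, score[i][j - k] + G(k))
            score[i][j] = best
    return score[len_a][len_b]


print(affine_global_score("CACTAG", "GATTACA", go=-2, ge=-2))  # prints 1, same as uniform g = -2
print(affine_global_score("AAA", "A", go=-3, ge=-1))  # prints -2: one match plus one run of 2 gaps
```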
