Presentation is loading. Please wait.

Presentation is loading. Please wait.

Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.

Similar presentations


Presentation on theme: "Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003."— Presentation transcript:

1 Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003

2 Developing Pairwise Sequence Alignment Algorithms2 Outline Group assignments for project Overview of global and local alignment References for sequence alignment algorithms Discussion of Needleman-Wunsch iterative approach to global alignment Discussion of Smith-Waterman recursive approach to local alignment Discussion Discussion of LCS Algorithm and how it can be extended for Global alignment (Needleman-Wunsch) Local alignment (Smith-Waterman) Affine gap penalties

3 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms3 Project Group Members Group 1: Ahmed and Jake Group 2: Ram and Ting Group 3: Andy and Margarita Group 4: Ali and Enrique

4 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms4 Overview of Pairwise Sequence Alignment Dynamic Programming Applied to optimization problems Useful when Problem can be recursively divided into sub-problems Sub-problems are not independent Needleman-Wunsch is a global alignment technique that uses an iterative algorithm and no gap penalty (could extend to fixed gap penalty). Smith-Waterman is a local alignment technique that uses a recursive algorithm and can use alternative gap penalties (such as affine). Smith-Waterman’s algorithm is an extension of Longest Common Substring (LCS) problem and can be generalized to solve both local and global alignment. Note: Needleman-Wunsch is usually used to refer to global alignment regardless of the algorithm used.

5 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms5 Project References http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignme nts.html http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignme nts.html Lecture: Database search (4/15) Computational Molecular Biology – An Algorithmic Approach, Pavel Pevzner Introduction to Computational Biology – Maps, sequences, and genomes, Michael Waterman Algorithms on Strings, Trees, and Sequences – Computer Science and Computational Biology, Dan Gusfield

6 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms6 Classic Papers Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://poweredge.stanford.edu/BioinformaticsArchive/Cla ssicArticlesArchive/needlemanandwunsch1970.pdf) Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981.(http://poweredge.stanford.edu/BioinformaticsArchive/Clas sicArticlesArchive/smithandwaterman1981.pdf) Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArt iclesArchive/smith1990.pdf) Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990.

7 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms7 Needleman-Wunsch (1 of 3) Match = 1 Mismatch = 0 Gap = 0

8 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms8 Needleman-Wunsch (2 of 3)

9 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms9 Needleman-Wunsch (3 of 3) From page 446: It is apparent that the above array operation can begin at any of a number of points along the borders of the array, which is equivalent to a comparison of N-terminal residues or C-terminal residues only. As long as the appropriate rules for pathways are followed, the maximum match will be the same. The cells of the array which contributed to the maximum match, may be determined by recording the origin of the number that was added to each cell when the array was operated upon.

10 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms10 Smith-Waterman (1 of 3) Algorithm The two molecular sequences will be A=a 1 a 2... a n, and B=b 1 b 2... b m. A similarity s(a,b) is given between sequence elements a and b. Deletions of length k are given weight W k. To find pairs of segments with high degrees of similarity. we set up a matrix H. First set H k0 = H ol = 0 for 0 <= k <= n and 0 <= l <= m. Preliminary values of H have the interpretation that H i j is the maximum similarity of two segments ending in a i and b j. respectively. These values are obtained from the relationship H ij =max{H i-1,j-1 + s(a i,b j ), max {H i-k,j – W k }, max{H i,j-l - W l }, 0} ( 1 ) k >= 1 l >= 1 1 <= i <= n and 1 <= j <= m.

11 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms11 Smith-Waterman (2 of 3) The formula for H ij follows by considering the possibilities for ending the segments at any a i and b j. (1)If a i and b j are associated, the similarity is H i-l,j-l + s(a i,b j ). (2) If a i is at the end of a deletion of length k, the similarity is H i – k, j - W k. (3) If b j is at the end of a deletion of length 1, the similarity is H i,j-l - W l. (typo in paper) (4) Finally, a zero is included to prevent calculated negative similarity, indicating no similarity up to a i and b j.

12 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms12 Smith-Waterman (3 of 3) The pair of segments with maximum similarity is found by first locating the maximum element of H. The other matrix elements leading to this maximum value are than sequentially determined with a traceback procedure ending with an element of H equal to zero. This procedure identifies the segments as well as produces the corresponding alignment. The pair of segments with the next best similarity is found by applying the traceback procedure to the second largest element of H not associated with the first traceback.

13 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms13 Longest Common Subsequence (LCS) Problem Reference: Pevzner Can have insertion and deletions but no substitutions (no mismatches) Ex: V: ATCTGAT W:TGCATA LCS:TCTA

14 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms14 LCS Problem (cont.) Similarity score s i-1,j s i,j = max { s i,j-1 s i-1,j-1 + 1, if vi = wj On board example: Pevzner Fig 6.1

15 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms15 Indels – insertions and deletions (e.g., gaps) alignment of V and W V = columns of similarity matrix (horizontal) W = rows of similarity matrix (vertical) Space (gap) in V  (UP) insertion Space (gap) in W  (LEFT) deletion Match (no mismatch in LCS) (DIAG)

16 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms16 LCS(V,W) Algorithm for i = 1 to n si,0 = 0 for j = 1 to n s0,j = 0 for i = 1 to n for j = 1 to m if vi = wj si,j = si-1,j-1 + 1; bi,j = DIAG else if si-1,j >= si,j-1 si,j = si-1,j; bi,j = UP else si,j = si,j-1; bi,j = LEFT

17 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms17 Print-LCS(b,V,i,j) if i = 0 or j = 0 return if bi,j = DIAG PRINT-LCS(b, V, i-1, j-1) print vi else if bi,j = UP PRINT-LCS(b, V, i-1, j) else PRINT-LCS(b, V, I, j-1)

18 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms18 Extend LCS to Global Alignment si-1,j +  (vi, -) si,j= max {si,j-1 +  (-, wj) si-1,j-1 +  (vi, wj)  (vi, -) =  (-, wj) = -  = extend gap penalty  (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM Modify LCS and PRINT-LCS algorithms to support global alignment (On board discussion)

19 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms19 Extend to Local Alignment 0(no negative scores) si-1,j +  (vi, -) si,j= max {si,j-1 +  (-, wj) si-1,j-1 +  (vi, wj)  (vi, -) =  (-, wj) = -  = extend gap penalty  (vi, wj) = score for match or mismatch – can be fixed, from PAM or BLOSUM

20 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms20 Discussion on adding affine gap penalties Affine gap penalty Score for a gap of length x -(  +  x) Where  > 0 is the insert gap penalty  > 0 is the extend gap penalty On board example from http://www.sbc.su.se/~arne/kurser/swell/pairwise_ali gnments.html http://www.sbc.su.se/~arne/kurser/swell/pairwise_ali gnments.html

21 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms21 Alignment with Gap Penalties Can apply to global or local (w/ zero) algorithms  si,j= max {  si-1,j -  si-1,j - (  +  )  si,j= max {  si1,j-1 -  si,j-1 - (  +  ) si-1,j-1 +  (vi, wj) si,j= max {  si,j  si,j Note: keeping with traversal order in Figure 6.1,  is replaced by , and  is replaced by 

22 May 20, 2003 Developing Pairwise Sequence Alignment Algorithms22 Implementing Global Alignment Program in C/C++ Keeping it simple (e.g., without classes or structures) Score matrix Traceback matrix Simple algorithm: Read in two sequences Compute score and traceback matrices (modified LCS) Print alignment score = score[n][m] Print each aligned sequence (modified PRINT-LCS) using traceback For debugging – can also print the score and traceback matrices


Download ppt "Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003."

Similar presentations


Ads by Google