Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002.

1 Developing Sequence Alignment Algorithms in C++ Dr. Nancy Warter-Perez May 21, 2002

2 Outline
  Hand out project
  Group assignments
  References for sequence alignment algorithms
  Board example of Needleman-Wunsch
  Discussion of LCS algorithm and how it can be extended for global alignment (Needleman-Wunsch)
  Extensions: local alignment (Smith-Waterman) and gap penalties

3 Project Group Members
  Group 1: Bonnie, Eduardo, Sara
  Group 2: Thi, Edain
  Group 3: Michael, Hardik, Daisy
  Group 4: Dennis, Ivonne, Patrick
  Group 5: Chuck, Ronny

4 Project References
  http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
  http://www.sbc.su.se/~per/molbioinfo2001/dynprog/dynamic.html
  Lectures: Database search (4/16) and Rationale for DB Searching (5/16)
  Computational Molecular Biology: An Algorithmic Approach, Pavel Pevzner
  Introduction to Computational Biology: Maps, Sequences, and Genomes, Michael Waterman
  Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield

5 Classic Papers
  Needleman, S.B. and Wunsch, C.D. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins. J. Mol. Biol., 48, pp. 443-453, 1970. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/needlemanandwunsch1970.pdf)
  Smith, T.F. and Waterman, M.S. Identification of Common Molecular Subsequences. J. Mol. Biol., 147, pp. 195-197, 1981. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smithandwaterman1981.pdf)
  Smith, T.F. The History of the Genetic Sequence Databases. Genomics, 6, pp. 701-707, 1990. (http://poweredge.stanford.edu/BioinformaticsArchive/ClassicArticlesArchive/smith1990.pdf)

6 Longest Common Subsequence (LCS) Problem
  Can have insertions and deletions but no substitutions
  Ex: V:   ATCTGAT
      W:   TGCATA
      LCS: TCTA

7 LCS Problem (cont.)
  Similarity score:
\[ s_{i,j} = \max \begin{cases} s_{i-1,j} \\ s_{i,j-1} \\ s_{i-1,j-1} + 1, & \text{if } v_i = w_j \end{cases} \]

8 Indels
  Indels: insertions and deletions (i.e., gaps) in an alignment of V and W
  Alignment A is a 2 x l matrix (l >= n, m), where n and m are the lengths of V and W
  First row of A contains the characters of V interspersed with l-n spaces
  Second row of A contains the characters of W interspersed with l-m spaces
  Space in first row = insertion (LEFT)
  Space in second row = deletion (UP)
  Match (no mismatch in LCS) (DIAG)

9 LCS(V,W) Algorithm
  for i = 0 to n
      s_{i,0} = 0
  for j = 0 to m
      s_{0,j} = 0
  for i = 1 to n
      for j = 1 to m
          if v_i = w_j
              s_{i,j} = s_{i-1,j-1} + 1; b_{i,j} = DIAG
          else if s_{i-1,j} >= s_{i,j-1}
              s_{i,j} = s_{i-1,j}; b_{i,j} = UP
          else
              s_{i,j} = s_{i,j-1}; b_{i,j} = LEFT
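As a rough illustration, the LCS(V,W) pseudocode above could be written in C++ as follows. This is only a sketch: the function name lcsLength, the Direction enum, and the use of std::vector for the two matrices are choices made here, not something the slides prescribe.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Traceback directions, named as on the slide. NONE marks border cells.
enum Direction { NONE, DIAG, UP, LEFT };

// Fills the traceback matrix b and returns the LCS length s[n][m].
// lcsLength is a hypothetical name chosen for this sketch.
int lcsLength(const std::string& v, const std::string& w,
              std::vector<std::vector<Direction>>& b) {
    const std::size_t n = v.size();
    const std::size_t m = w.size();
    // Row 0 and column 0 start at 0, which is the base case of the recurrence.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    b.assign(n + 1, std::vector<Direction>(m + 1, NONE));

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // The slide's v_i is v[i - 1] because C++ strings are 0-indexed.
            if (v[i - 1] == w[j - 1]) {
                s[i][j] = s[i - 1][j - 1] + 1;
                b[i][j] = DIAG;
            } else if (s[i - 1][j] >= s[i][j - 1]) {
                s[i][j] = s[i - 1][j];
                b[i][j] = UP;
            } else {
                s[i][j] = s[i][j - 1];
                b[i][j] = LEFT;
            }
        }
    }
    return s[n][m];
}
```

For the earlier example, calling lcsLength("ATCTGAT", "TGCATA", b) returns 4, the length of TCTA.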

10 PRINT-LCS(b,V,i,j)
  if i = 0 or j = 0
      return
  if b_{i,j} = DIAG
      PRINT-LCS(b, V, i-1, j-1)
      print v_i
  else if b_{i,j} = UP
      PRINT-LCS(b, V, i-1, j)
  else
      PRINT-LCS(b, V, i, j-1)
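The recursive traceback can be sketched in C++ along the following lines. To keep the block self-contained it also fills the traceback matrix, and it appends characters to a string instead of printing them so the result is easy to check. The names fillTraceback, collectLcs, and lcsString are hypothetical and introduced only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum Direction { NONE, DIAG, UP, LEFT };
using TraceMatrix = std::vector<std::vector<Direction>>;

// Builds the traceback matrix b exactly as the LCS(V,W) pseudocode does.
TraceMatrix fillTraceback(const std::string& v, const std::string& w) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    TraceMatrix b(n + 1, std::vector<Direction>(m + 1, NONE));
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            if (v[i - 1] == w[j - 1]) {
                s[i][j] = s[i - 1][j - 1] + 1; b[i][j] = DIAG;
            } else if (s[i - 1][j] >= s[i][j - 1]) {
                s[i][j] = s[i - 1][j];         b[i][j] = UP;
            } else {
                s[i][j] = s[i][j - 1];         b[i][j] = LEFT;
            }
        }
    }
    return b;
}

// Mirrors PRINT-LCS: recurse first, then emit v_i, so characters come out
// in left-to-right order. Appends to out rather than printing.
void collectLcs(const TraceMatrix& b, const std::string& v,
                std::size_t i, std::size_t j, std::string& out) {
    if (i == 0 || j == 0) return;
    if (b[i][j] == DIAG) {
        collectLcs(b, v, i - 1, j - 1, out);
        out += v[i - 1];  // v_i in the slide's 1-based notation
    } else if (b[i][j] == UP) {
        collectLcs(b, v, i - 1, j, out);
    } else {
        collectLcs(b, v, i, j - 1, out);
    }
}

// Convenience wrapper: compute the traceback and walk it from (n, m).
std::string lcsString(const std::string& v, const std::string& w) {
    const TraceMatrix b = fillTraceback(v, w);
    std::string out;
    collectLcs(b, v, v.size(), w.size(), out);
    return out;
}
```

With the tie-breaking rule from the pseudocode (UP preferred over LEFT), lcsString("ATCTGAT", "TGCATA") yields TCTA.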

11 Extend LCS to Global Alignment
\[ s_{i,j} = \max \begin{cases} s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases} \]
  \( \delta(v_i, -) = \delta(-, w_j) = -\sigma \), where \( \sigma \) is the extend gap penalty
  \( \delta(v_i, w_j) \) = score for a match or mismatch; can be fixed or taken from PAM or BLOSUM
  Modify the LCS and PRINT-LCS algorithms to support global alignment (on-board discussion)
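One possible way to modify the LCS scoring loop for global alignment is sketched below. It assumes a fixed match/mismatch score rather than a PAM or BLOSUM lookup, and it assumes the usual Needleman-Wunsch border initialization in which a leading gap of length k costs k times the gap penalty. The names globalScore, match, mismatch, and sigma are illustrative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Global alignment score with a linear gap penalty (sigma per gap character).
// match and mismatch are fixed scores assumed for this sketch; a PAM or
// BLOSUM table could take their place.
int globalScore(const std::string& v, const std::string& w,
                int match, int mismatch, int sigma) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));

    // Unlike LCS, the borders are not zero: aligning a prefix against
    // nothing costs one gap penalty per character.
    for (std::size_t i = 1; i <= n; ++i) s[i][0] = -sigma * static_cast<int>(i);
    for (std::size_t j = 1; j <= m; ++j) s[0][j] = -sigma * static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag = s[i - 1][j - 1] +
                             (v[i - 1] == w[j - 1] ? match : mismatch);
            const int up   = s[i - 1][j] - sigma;  // v_i against a gap
            const int left = s[i][j - 1] - sigma;  // w_j against a gap
            s[i][j] = std::max({diag, up, left});
        }
    }
    return s[n][m];
}
```

For instance, with match = 1, mismatch = -1, and sigma = 2, aligning "AC" against "A" scores -1: one match plus one gap character.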

12 Extend to Local Alignment
\[ s_{i,j} = \max \begin{cases} 0 & \text{(no negative scores)} \\ s_{i-1,j} + \delta(v_i, -) \\ s_{i,j-1} + \delta(-, w_j) \\ s_{i-1,j-1} + \delta(v_i, w_j) \end{cases} \]
  \( \delta(v_i, -) = \delta(-, w_j) = -\sigma \), where \( \sigma \) is the extend gap penalty
  \( \delta(v_i, w_j) \) = score for a match or mismatch; can be fixed or taken from PAM or BLOSUM
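The local variant differs from the global one in two small ways, which the following sketch tries to make explicit: every cell is floored at 0, and the reported score is the maximum over the whole matrix rather than the bottom-right cell. As before, fixed match/mismatch scores and the names localScore and sigma are assumptions of this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Best local alignment score with a linear gap penalty sigma.
int localScore(const std::string& v, const std::string& w,
               int match, int mismatch, int sigma) {
    const std::size_t n = v.size(), m = w.size();
    // Borders stay at 0: a local alignment may start anywhere for free.
    std::vector<std::vector<int>> s(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag = s[i - 1][j - 1] +
                             (v[i - 1] == w[j - 1] ? match : mismatch);
            // The extra 0 lets the alignment restart instead of going negative.
            s[i][j] = std::max({0, diag, s[i - 1][j] - sigma, s[i][j - 1] - sigma});
            best = std::max(best, s[i][j]);  // the best cell can be anywhere
        }
    }
    return best;
}
```

A traceback for the local case would start at the cell holding the maximum and stop at the first cell whose score is 0.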

13 Discussion on Adding Affine Gap Penalties
  Affine gap penalty: the score for a gap of length x is \( -(\rho + \sigma x) \)
  where \( \rho > 0 \) is the insert gap penalty and \( \sigma > 0 \) is the extend gap penalty
  On-board example from http://www.sbc.su.se/~arne/kurser/swell/pairwise_alignments.html
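A small sketch may help show why the affine model favors one long gap over several short ones. Here rho and sigma are names assumed for the insert and extend gap penalties, and both helper functions are hypothetical.

```cpp
#include <vector>

// Score of a single gap of length x under the affine model: -(rho + sigma * x).
// rho is paid once per gap, sigma once per gap character.
int affineGapScore(int x, int rho, int sigma) {
    return -(rho + sigma * x);
}

// Total score of several separate gaps; each one pays rho again.
int totalGapScore(const std::vector<int>& gapLengths, int rho, int sigma) {
    int total = 0;
    for (int x : gapLengths) total += affineGapScore(x, rho, sigma);
    return total;
}
```

With rho = 10 and sigma = 1, one gap of length 3 scores -13, while three separate gaps of length 1 score -33, so the same number of gap characters is penalized much more heavily when scattered.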

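14 Alignment with Gap Penalties
  Can apply to global or local (with zero) algorithms
\[ s^{\downarrow}_{i,j} = \max \begin{cases} s^{\downarrow}_{i-1,j} - \sigma \\ s_{i-1,j} - (\rho + \sigma) \end{cases} \]
\[ s^{\rightarrow}_{i,j} = \max \begin{cases} s^{\rightarrow}_{i,j-1} - \sigma \\ s_{i,j-1} - (\rho + \sigma) \end{cases} \]
\[ s_{i,j} = \max \begin{cases} s_{i-1,j-1} + \delta(v_i, w_j) \\ s^{\downarrow}_{i,j} \\ s^{\rightarrow}_{i,j} \end{cases} \]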
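A three-matrix formulation of global alignment with affine gap penalties might look like the sketch below. It assumes fixed match/mismatch scores, uses rho and sigma as names for the insert and extend gap penalties, and stands in for "minus infinity" with a large negative constant; none of these details come from the slides.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Global alignment score where a gap of length x scores -(rho + sigma * x).
int affineGlobalScore(const std::string& v, const std::string& w,
                      int match, int mismatch, int rho, int sigma) {
    // Half of INT_MIN leaves room to subtract penalties without overflow.
    const int NEG_INF = std::numeric_limits<int>::min() / 2;
    const std::size_t n = v.size(), m = w.size();
    using Matrix = std::vector<std::vector<int>>;
    Matrix s(n + 1, std::vector<int>(m + 1, NEG_INF));      // best score overall
    Matrix down(n + 1, std::vector<int>(m + 1, NEG_INF));   // ends with v_i against a gap
    Matrix right(n + 1, std::vector<int>(m + 1, NEG_INF));  // ends with w_j against a gap

    s[0][0] = 0;
    for (std::size_t i = 1; i <= n; ++i) {  // one leading gap of length i
        down[i][0] = -(rho + sigma * static_cast<int>(i));
        s[i][0] = down[i][0];
    }
    for (std::size_t j = 1; j <= m; ++j) {  // one leading gap of length j
        right[0][j] = -(rho + sigma * static_cast<int>(j));
        s[0][j] = right[0][j];
    }

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Either extend an existing gap (pay sigma) or open a new one
            // (pay rho + sigma).
            down[i][j]  = std::max(down[i - 1][j] - sigma,
                                   s[i - 1][j] - (rho + sigma));
            right[i][j] = std::max(right[i][j - 1] - sigma,
                                   s[i][j - 1] - (rho + sigma));
            const int diag = s[i - 1][j - 1] +
                             (v[i - 1] == w[j - 1] ? match : mismatch);
            s[i][j] = std::max({diag, down[i][j], right[i][j]});
        }
    }
    return s[n][m];
}
```

For the local version, the same loops would apply with an extra 0 in the final maximum and the result taken as the largest entry of s.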

15 Implementing Global Alignment Program in C++
  Keeping it simple (e.g., without classes or structures)
      Score matrix
      Traceback matrix
  Simple algorithm:
      Read in two sequences
      Compute score and traceback matrices (modified LCS)
      Print alignment score = score[n][m]
      Print each aligned sequence (modified PRINT-LCS) using traceback
  For debugging, can also print the score and traceback matrices
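The steps above could be combined roughly as follows. This sketch keeps to plain functions and vectors, uses a linear gap penalty with fixed match/mismatch scores, and returns the aligned sequences through reference parameters so a caller can print them. The function name globalAlign, the 'D'/'U'/'L' traceback codes, and the tie-breaking order (diagonal, then up, then left) are all assumptions of this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Computes the score and traceback matrices, then walks the traceback to
// build both aligned sequences. Returns score[n][m].
int globalAlign(const std::string& v, const std::string& w,
                int match, int mismatch, int sigma,
                std::string& alignedV, std::string& alignedW) {
    const std::size_t n = v.size(), m = w.size();
    std::vector<std::vector<int>> score(n + 1, std::vector<int>(m + 1, 0));
    std::vector<std::vector<char>> trace(n + 1, std::vector<char>(m + 1, ' '));

    for (std::size_t i = 1; i <= n; ++i) {
        score[i][0] = -sigma * static_cast<int>(i);
        trace[i][0] = 'U';
    }
    for (std::size_t j = 1; j <= m; ++j) {
        score[0][j] = -sigma * static_cast<int>(j);
        trace[0][j] = 'L';
    }
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int diag = score[i - 1][j - 1] +
                             (v[i - 1] == w[j - 1] ? match : mismatch);
            const int up   = score[i - 1][j] - sigma;
            const int left = score[i][j - 1] - sigma;
            if (diag >= up && diag >= left) { score[i][j] = diag; trace[i][j] = 'D'; }
            else if (up >= left)            { score[i][j] = up;   trace[i][j] = 'U'; }
            else                            { score[i][j] = left; trace[i][j] = 'L'; }
        }
    }

    // Iterative traceback from (n, m); characters are collected back to front.
    alignedV.clear();
    alignedW.clear();
    std::size_t i = n, j = m;
    while (i > 0 || j > 0) {
        if (trace[i][j] == 'D') {        // v_i aligned with w_j
            alignedV += v[--i]; alignedW += w[--j];
        } else if (trace[i][j] == 'U') { // v_i aligned with a gap
            alignedV += v[--i]; alignedW += '-';
        } else {                         // w_j aligned with a gap
            alignedV += '-';    alignedW += w[--j];
        }
    }
    std::reverse(alignedV.begin(), alignedV.end());
    std::reverse(alignedW.begin(), alignedW.end());
    return score[n][m];
}
```

A main function would read the two sequences (for example with std::cin), call globalAlign, and print the returned score followed by the two aligned strings. With match = 1, mismatch = -1, and sigma = 2, aligning "ACGT" with "ACT" gives score 1 and the rows ACGT and AC-T.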

