
1 Sequence Alignment II CIS 667 Spring 2004

2 Optimal Alignments
- So we know how to compute the similarity between two sequences.
  - How do we construct an alignment that gives that similarity?
- We will use the (already computed) array a from the previous algorithm, Similarity.
  - Start at entry (m, n) and retrace the choices that produced the similarity score.
  - Note that sometimes more than one choice gave the same optimal score.

3 Optimal Alignments
- Each choice gives one column of the alignment.
- If two or three choices give the same score, we systematically choose one of them (Algorithm Align tests the cases in a fixed order).
- We will use a recursive algorithm.
- The algorithm produces two arrays, align-s and align-t.
  - Each element of these arrays is either a space (written -) or a symbol from the corresponding sequence.

4 Algorithm Align

```
Algorithm Align(i, j, len)
input:  indices i, j; array a given by algorithm Similarity
output: alignment in align-s, align-t, and its length in len

if i = 0 and j = 0 then
    len ← 0
else if i > 0 and a[i, j] = a[i - 1, j] + g then
    Align(i - 1, j, len)
    len ← len + 1
    align-s[len] ← s[i]
    align-t[len] ← -
else if i > 0 and j > 0 and a[i, j] = a[i - 1, j - 1] + p(i, j) then
    Align(i - 1, j - 1, len)
    len ← len + 1
    align-s[len] ← s[i]
    align-t[len] ← t[j]
else  // j > 0 and a[i, j] = a[i, j - 1] + g
    Align(i, j - 1, len)
    len ← len + 1
    align-s[len] ← -
    align-t[len] ← t[j]
```

Here g is the gap score, p(i, j) is the score for aligning s[i] with t[j], and len is an output parameter shared by all recursive calls.
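To make slides 2-4 concrete, here is a minimal Python sketch of the Similarity array followed by an iterative version of Align. It assumes scores of match +1, mismatch -1, gap -2 (hypothetical defaults, not given on these slides), and the names `similarity` and `align` are illustrative only. Python strings are 0-based, so `s[i - 1]` corresponds to s[i] in the pseudocode. The traceback tests the three cases in the same order as Align, which fixes how ties are broken.

```python
# Sketch only: assumed scores (match +1, mismatch -1, gap -2) are hypothetical
# defaults; s[i - 1] here is s[i] in the 1-based pseudocode.

def similarity(s: str, t: str, match: int = 1, mismatch: int = -1,
               gap: int = -2) -> list[list[int]]:
    """Return a, where a[i][j] is the best score for s[:i] versus t[:j]."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column: s[:i] against spaces
        a[i][0] = i * gap
    for j in range(1, n + 1):          # first row: spaces against t[:j]
        a[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j] + gap,
                          a[i - 1][j - 1] + p,
                          a[i][j - 1] + gap)
    return a


def align(s: str, t: str, a: list[list[int]], match: int = 1,
          mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Iterative Align: walk back from (m, n), then reverse the columns."""
    i, j = len(s), len(t)
    cols_s: list[str] = []
    cols_t: list[str] = []
    while i > 0 or j > 0:
        if i > 0 and a[i][j] == a[i - 1][j] + gap:
            cols_s.append(s[i - 1])    # s[i] against a space
            cols_t.append("-")
            i -= 1
        elif (i > 0 and j > 0 and a[i][j] == a[i - 1][j - 1]
              + (match if s[i - 1] == t[j - 1] else mismatch)):
            cols_s.append(s[i - 1])    # s[i] against t[j]
            cols_t.append(t[j - 1])
            i -= 1
            j -= 1
        else:
            cols_s.append("-")         # a space against t[j]
            cols_t.append(t[j - 1])
            j -= 1
    return "".join(reversed(cols_s)), "".join(reversed(cols_t))


if __name__ == "__main__":
    a = similarity("ACGT", "AGT")
    print(a[4][3])                     # 1
    print(align("ACGT", "AGT", a))     # ('ACGT', 'A-GT')
```

Building the columns in reverse and flipping them at the end replaces the recursion, which avoids Python's recursion limit on long sequences while producing the same columns as the recursive version.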

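5 Algorithm Complexity
- The first algorithm (Similarity) has four loops: one initializing the first column, one initializing the first row, and two nested loops filling the remaining entries.
  - Their costs are O(m), O(n), and O(mn).
  - So the complexity is O(m) + O(n) + O(mn) = O(mn), which is O(n^2) when m = n.
- The second algorithm (Align) is O(len) = O(m + n), since each call decreases i, j, or both, and len ≤ m + n.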
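The counting argument can be written out as follows, assuming (as the slide describes) that Similarity initializes column 0 and row 0 in separate loops and spends constant time c₃ on each of the remaining m·n entries; the constants c₁, c₂, c₃ are introduced here only for illustration:

```latex
\begin{aligned}
T_{\mathrm{Similarity}}(m,n) &= \underbrace{c_1 (m+1)}_{\text{column } 0}
  + \underbrace{c_2 (n+1)}_{\text{row } 0}
  + \underbrace{c_3\, m n}_{\text{nested loops}} = O(mn),
  \qquad m = n \;\Rightarrow\; O(n^2),\\
\#\,\text{calls to Align} &\le (m + n) + 1,
  \quad \text{since each call lowers } i + j \text{ by 1 or 2, from } m + n \text{ down to } 0,\\
T_{\mathrm{Align}}(m,n) &= O(m + n),
  \qquad \max(m,n) \le \mathit{len} \le m + n .
\end{aligned}
```

The bounds on len hold because every column of the alignment consumes at least one symbol and at most one symbol from each sequence.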

6 Local Comparison
- A local alignment between s and t is an alignment between a substring of s and a substring of t.
- We want to find the highest-scoring local alignment between two sequences.
- Modify the original algorithm so that each entry (i, j) of the matrix holds the highest score of an alignment between a suffix of s[1..i] and a suffix of t[1..j].

7 Local Comparison
- The first row and column are initialized to 0.
- We fill in the other entries of a as before, but now choose the maximum of 4 values:
  - the previous three choices, plus
  - a fourth choice, 0, which is always available because we can align the two empty suffixes.
- To find the alignment, start at the largest value in the array.
  - Trace back the same way as before, but stop when we reach an entry with value 0.
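The steps on slides 6-7 can be sketched in Python as below. This is a minimal illustration, assuming the scores used in the slide 8 example (match +1, mismatch -1, gap -1); the function names `local_matrix` and `local_align` are hypothetical. The only differences from the global version are the zero first row and column, the extra 0 in the maximum, starting the traceback at the largest entry, and stopping at the first 0.

```python
# Sketch only: scores default to the slide 8 example (match +1, mismatch -1,
# gap -1); function names are hypothetical. s[i - 1] here is s[i] on the slides.

def local_matrix(s: str, t: str, match: int = 1, mismatch: int = -1,
                 gap: int = -1) -> list[list[int]]:
    """a[i][j] = best score of a suffix of s[:i] against a suffix of t[:j]."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j] + gap,      # s[i] against a space
                          a[i - 1][j - 1] + p,    # s[i] against t[j]
                          a[i][j - 1] + gap,      # a space against t[j]
                          0)                      # two empty suffixes
    return a


def local_align(s: str, t: str, match: int = 1, mismatch: int = -1,
                gap: int = -1) -> tuple[int, str, str]:
    """Return (best score, aligned part of s, aligned part of t)."""
    a = local_matrix(s, t, match, mismatch, gap)
    # Start at the largest value in the array (ties: largest row, then column).
    best, i, j = max((a[r][c], r, c)
                     for r in range(len(s) + 1) for c in range(len(t) + 1))
    cols_s: list[str] = []
    cols_t: list[str] = []
    # Trace back as in Align, stopping at the first entry equal to 0.
    # Any entry > 0 has i > 0 and j > 0, because row 0 and column 0 are 0.
    while a[i][j] > 0:
        p = match if s[i - 1] == t[j - 1] else mismatch
        if a[i][j] == a[i - 1][j] + gap:
            cols_s.append(s[i - 1])
            cols_t.append("-")
            i -= 1
        elif a[i][j] == a[i - 1][j - 1] + p:
            cols_s.append(s[i - 1])
            cols_t.append(t[j - 1])
            i -= 1
            j -= 1
        else:
            cols_s.append("-")
            cols_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(cols_s)), "".join(reversed(cols_t))


if __name__ == "__main__":
    # Rows of the slide 8 matrix are s, columns are t.
    print(local_align("GCGATATA", "AACCTATAGCT"))   # (4, 'TATA', 'TATA')
```

Running `local_matrix("GCGATATA", "AACCTATAGCT")` reproduces the slide 8 matrix row by row, which is a convenient way to check hand-filled entries.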

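8 Local Alignment with match: +1, mismatch: -1, gap: -1

s = GCGATATA runs down the rows and t = AACCTATAGCT across the columns:

```
       A  A  C  C  T  A  T  A  G  C  T
    0  0  0  0  0  0  0  0  0  0  0  0
 G  0  0  0  0  0  0  0  0  0  1  0  0
 C  0  0  0  1  1  0  0  0  0  0  2  1
 G  0  0  0  0  0  0  0  0  0  1  1  1
 A  0  1  1  0  0  0  1  0  1  0  0  0
 T  0  0  0  0  0  1  0  2  1  0  0  1
 A  0  1  1  0  0  0  2  1  3  2  1  0
 T  0  0  0  0  0  1  1  3  2  2  1  2
 A  0  1  1  0  0  0  2  2  4  3  2  1
```

The largest entry, 4, is in the last row under the eighth column; tracing back until reaching 0 gives the local alignment TATA / TATA.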

9 Semiglobal Comparisons
- The basic algorithm compares two sequences in their entirety.
  - A gap penalty is assessed whether the gap is in the middle or at the end of one or both sequences.
  - This is not always desirable.
- Suppose we want to search for the short sequence ACGT within the longer sequence AAACACGTGTCC:

```
AAACACGTGTCC
----ACGT----
```

10 Semiglobal Comparisons
- We don’t want to penalize gaps at the ends as we do those in the middle, since end gaps usually have no biological significance.
  - They usually result from incomplete data acquisition.
- This approach is known as semiglobal alignment.
- We can modify the basic algorithm for this type of alignment.

11 Semiglobal Comparisons
- Suppose we don’t want to charge for spaces after the last character of s.
- Consider an optimal alignment:
  - Spaces after the end of s are matched with a suffix of t.
  - Removing this final part of the alignment leaves an alignment between s and a prefix of t.
- So we need the optimal alignment between s and a prefix of t, but these scores are already computed in the last row of a!
  - So take the maximum value in the last row of a.

12 Semiglobal Comparisons
- Suppose we don’t want to charge for spaces after the last character of t.
- Consider an optimal alignment:
  - Spaces after the end of t are matched with a suffix of s.
  - Removing this final part of the alignment leaves an alignment between t and a prefix of s.
- So we need the optimal alignment between t and a prefix of s, but these scores are already computed in the last column of a!
  - So take the maximum value in the last column of a.
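Slides 11 and 12 change only where the answer is read from. The Python sketch below illustrates this, assuming hypothetical default scores (match +1, mismatch -1, gap -2) and hypothetical function names; the first row and column are still charged exactly as in the basic algorithm, so only trailing spaces become free.

```python
# Sketch only: hypothetical scores and names. Only the spaces after the end
# of s (last row) or after the end of t (last column) are free here.

def global_matrix(s: str, t: str, match: int = 1, mismatch: int = -1,
                  gap: int = -2) -> list[list[int]]:
    """Basic global similarity array with charged first row and column."""
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        a[i][0] = i * gap
    for j in range(1, n + 1):
        a[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j] + gap, a[i - 1][j - 1] + p,
                          a[i][j - 1] + gap)
    return a


def free_spaces_after_s(s: str, t: str, **scores: int) -> int:
    """All of s against a prefix of t: maximum of the last row."""
    a = global_matrix(s, t, **scores)
    return max(a[len(s)])


def free_spaces_after_t(s: str, t: str, **scores: int) -> int:
    """All of t against a prefix of s: maximum of the last column."""
    a = global_matrix(s, t, **scores)
    return max(row[len(t)] for row in a)


if __name__ == "__main__":
    print(global_matrix("ACGT", "ACGTGTCC")[4][8])     # -4 (trailing gaps charged)
    print(free_spaces_after_s("ACGT", "ACGTGTCC"))     # 4
    print(free_spaces_after_t("ACGTGTCC", "ACGT"))     # 4
```

With these scores the four trailing spaces cost -8 in the basic algorithm, turning a perfect prefix match worth 4 into -4; reading the maximum from the last row removes that charge.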

13 Semiglobal Comparisons
- What about spaces at the beginning of s and t?
  - Their costs are represented by the values in the first row and column of a.
  - So, if we don’t want to charge for them, just initialize this row and column to all 0.
- So the changes to the basic algorithm are:
  - Initialize the first row and the first column to zero.
  - Look for the maximum in the last row, the last column, or both, depending on which trailing spaces should be free.
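Both changes together can be sketched as follows. This minimal Python version assumes hypothetical default scores (match +1, mismatch -1, gap -2) and a hypothetical name `semiglobal_score`; it leaves row 0 and column 0 at zero and returns the maximum over both the last row and the last column, so spaces at either end of either sequence are free. The usage line runs the ACGT search from slide 9.

```python
# Sketch only: hypothetical scores and name. Free spaces at both ends of both
# sequences: zero first row/column, maximum over last row and last column.

def semiglobal_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> int:
    m, n = len(s), len(t)
    a = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j] + gap,
                          a[i - 1][j - 1] + p,
                          a[i][j - 1] + gap)
    last_row = a[m]
    last_col = [a[i][n] for i in range(m + 1)]
    return max(max(last_row), max(last_col))


if __name__ == "__main__":
    # ----ACGT---- under AAACACGTGTCC: the end spaces cost nothing.
    print(semiglobal_score("AAACACGTGTCC", "ACGT"))   # 4
```

Because the recurrence itself is unchanged, the traceback from the maximizing entry works as in Align; only the starting point and the boundary values differ.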

