Sequence Alignment Tutorial #2

1 Sequence Alignment Tutorial #2
© Ydo Wexler & Dan Geiger

2 Sequence Comparison
Much of bioinformatics involves sequences: DNA sequences, RNA sequences and protein sequences. We can think of these sequences as strings of letters. For DNA and RNA, |alphabet| = 4; for proteins, |alphabet| = 20.

3 Global Alignment
Input: two sequences over the same alphabet.
Output: an alignment of the two sequences.
Example: GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
A possible alignment:
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
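A minimal Python sketch of what "an alignment of the two sequences" means, assuming '-' as the gap symbol (as in the example) and the usual convention, not stated on the slide, that no column may contain a gap in both rows. The function name is_alignment is hypothetical.

```python
GAP = "-"  # gap symbol, as in the slide's example alignment


def is_alignment(row_s: str, row_t: str, s: str, t: str) -> bool:
    """Return True if the two gapped rows form a global alignment of s and t."""
    # Both rows must have the same number of columns.
    if len(row_s) != len(row_t):
        return False
    # Deleting the gaps must give back the original sequences.
    if row_s.replace(GAP, "") != s or row_t.replace(GAP, "") != t:
        return False
    # Assumed convention: no column may contain a gap in both rows.
    return not any(x == GAP and y == GAP for x, y in zip(row_s, row_t))


print(is_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A",
                   "GCGCATGGATTGAGCGA", "TGCGCCATTGATGACCA"))  # True
```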

4 Global Alignment
Example (cont):
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
Three elements: perfect matches, mismatches, and insertions & deletions (indels).
[Figure labels: Best biological explanation; Biological data; Hypotheses space; Symmetric view of evolution]

5 Global Alignment scoring scheme
Score each position independently: Match: +1, Mismatch: -1, Indel: -2.
The score of an alignment is the sum of its position scores. Example:
    -GCGC-ATGGATTGAGCGA
    TGCGCCATTGAT-GACC-A
Score: (+1x13) + (-1x2) + (-2x4) = 3
    ------GCGCATGGATTGAGCGA
    TGCGCC----ATTGATGACCA--
Score: (+1x5) + (-1x6) + (-2x12) = -25
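The scoring rule above can be sketched in Python as follows. The function name score_alignment and the '-' gap symbol are our own choices; the default parameters are the slide's Match +1, Mismatch -1, Indel -2.

```python
GAP = "-"


def score_alignment(row_s: str, row_t: str,
                    match: int = 1, mismatch: int = -1, indel: int = -2) -> int:
    """Sum of independent per-column scores (slide 5 scheme by default)."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_s, row_t):
        if x == GAP or y == GAP:
            total += indel      # insertion or deletion
        elif x == y:
            total += match      # perfect match
        else:
            total += mismatch   # substitution
    return total


print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))          # 3
print(score_alignment("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--"))  # -25
```

Counting the second alignment column by column gives 5 matches, 6 mismatches and 12 indels over 23 columns, which is why the function returns -25 for it.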

6 Sequence Alignment Variants
Two basic variants of sequence alignment: global alignment (Needleman-Wunsch) and local alignment (Smith-Waterman).
Today we’ll see overlap alignment and affine cost for gaps. We’ll use the ideas of dynamic programming presented in the lecture.
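As a reminder of the dynamic-programming idea the variants build on, here is a minimal sketch of the Needleman-Wunsch recurrence for the global alignment score with the linear scheme of slide 5. V[i][j] is the best score of aligning S[1..i] with T[1..j]. The function name and the default parameters (+1/-1/-2) are assumptions for illustration, and the sketch returns only the score, not the alignment.

```python
def global_alignment_score(s: str, t: str,
                           match: int = 1, mismatch: int = -1, indel: int = -2) -> int:
    """Needleman-Wunsch score with a linear (per-unit) gap penalty."""
    n, m = len(s), len(t)
    # V[i][j] = best score of aligning s[:i] with t[:j].
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = i * indel          # s[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        V[0][j] = j * indel          # t[:j] aligned entirely to gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,   # s[i] against t[j]
                          V[i - 1][j] + indel,     # s[i] against a gap
                          V[i][j - 1] + indel)     # t[j] against a gap
    return V[n][m]


print(global_alignment_score("GCGCATGGATTGAGCGA", "TGCGCCATTGATGACCA"))
```

Because the recurrence maximizes over all alignments, the printed value for the slide-3 pair can be no lower than 3, the score of the alignment shown on slide 5.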

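7 Overlap Alignment
Consider the following problem: find the most significant overlap between two sequences S and T.
Possible overlap relations: a. b.
[Figure: possible overlap relations, panels a and b]
Difference from local alignment: here we require the alignment to reach the endpoints of the two sequences. It must start at the beginning of one sequence and end at the end of one (possibly the same) sequence, whereas a local alignment may start and end anywhere.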

8 Overlap Alignment
Formally: given S[1..n] and T[1..m], find i, j such that
d = max{ D(S[1..i], T[j..m]), D(S[i..n], T[1..j]), D(S[1..n], T[i..j]), D(S[i..j], T[1..m]) }
is maximal, where D(·,·) is the global alignment score.
Solution: same as global alignment, except that we do not penalise overhanging ends.

9 Overlap Alignment
Initialization: V[i,0] = 0, V[0,j] = 0
Recurrence: as in global alignment
Score: the maximum value in the bottom row or the rightmost column
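A minimal Python sketch of the overlap recurrence, using the slide-10 scheme (Match +4, Mismatch -1, Indel -5) as defaults: the first row and column are zero, the matrix is filled as in global alignment, and the score is read from the best cell in the bottom row or rightmost column. The traceback, the tie-breaking among equal cells and the function name overlap_alignment are our own assumptions.

```python
def overlap_alignment(s: str, t: str,
                      match: int = 4, mismatch: int = -1, indel: int = -5):
    """Best overlap of s and t: overhanging ends of either sequence are free."""
    n, m = len(s), len(t)
    # Initialization V[i][0] = V[0][j] = 0: a leading overhang costs nothing.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
    # A trailing overhang is also free: take the best cell in the bottom row
    # or the rightmost column (ties broken by Python's tuple ordering).
    cells = ([(V[n][j], n, j) for j in range(m + 1)]
             + [(V[i][m], i, m) for i in range(n + 1)])
    best, i, j = max(cells)
    # Trailing overhang: whichever sequence still has letters faces gaps.
    tail_s = s[i:] + "-" * (m - j)
    tail_t = "-" * (n - i) + t[j:]
    # Trace back until one of the sequences is exhausted.
    a, b = [], []
    while i > 0 and j > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if V[i][j] == V[i - 1][j - 1] + sub:
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif V[i][j] == V[i - 1][j] + indel:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    # Leading overhang: the remaining prefix of one sequence faces gaps.
    head_s = s[:i] + "-" * j
    head_t = "-" * i + t[:j]
    row_s = head_s + "".join(reversed(a)) + tail_s
    row_t = head_t + "".join(reversed(b)) + tail_t
    return best, row_s, row_t


print(overlap_alignment("PAWHEAE", "HEAGAWGHEE"))
```

On the slide-10 example this sketch returns (11, 'PAWHEAE------', '---HEAGAWGHEE'): the suffix HEAE of S is aligned with the prefix HEAG of T (three matches and one mismatch, 3x4 - 1 = 11), while PAW and AWGHEE are free overhangs.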

10 Overlap Alignment (Example)
S = PAWHEAE
T = HEAGAWGHEE
Scoring scheme: Match: +4, Mismatch: -1, Indel: -5

13 Overlap Alignment (Example)
The best overlap is:
    PAWHEAE------
    ---HEAGAWGHEE
Pay attention! A different scoring scheme could yield a different result, such as:
    ---PAW-HEAE
    HEAGAWGHEE-
Scoring scheme: Match: +4, Mismatch: -1, Indel:

14 Affine gap scores
Observation: insertions and deletions often occur in blocks longer than a single nucleotide.
Consequence: the current scoring scheme gives a constant penalty per gap unit, so one gap of length g is penalized exactly as much as g separate gaps of length 1. This does not model the above phenomenon well.
Question: how do we modify the scheme to incorporate this?

15 Alignment with affine gap scores
Penalty score for a gap of length g: γ(g) = -d - (g-1)e, where d is the penalty for introducing a gap and e is the penalty for elongating the gap by one unit. Typically d > e.
Problem: when aligning S[i] to a gap, we do not know whether to penalize by d or by e.
Solution: we compute 3 matrices simultaneously:
M(i,j) - the best score of aligning S[1..i] with T[1..j] when S[i] is aligned to T[j]
IS(i,j) - the best score of aligning S[1..i] with T[1..j] when S[i] is aligned to a gap
IT(i,j) - the best score of aligning S[1..i] with T[1..j] when T[j] is aligned to a gap
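To see why a constant per-unit penalty cannot reward block indels, compare one gap of length 3 with three separate gaps of length 1. The sketch assumes the affine form -d - (g-1)e with illustrative values d = 5, e = 1, and the slide-5 penalty of 2 per unit for the linear case; the function names are hypothetical.

```python
def linear_gap_score(g: int, indel: int = 2) -> int:
    """Slide-5 style scheme: each gap position costs the same."""
    return -indel * g


def affine_gap_score(g: int, d: int, e: int) -> int:
    """Affine scheme: pay d to open a gap, then e per extra position."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


d, e = 5, 1  # illustrative values with d > e
print(linear_gap_score(3), 3 * linear_gap_score(1))              # -6 -6
print(affine_gap_score(3, d, e), 3 * affine_gap_score(1, d, e))  # -7 -15
```

Under the linear scheme both options score the same, while the affine scheme clearly prefers the single longer gap.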

16 Affine gap scores
Initialization: depending on the problem (global, local,…)
Recurrence: uses already known values M(i’,j’), IS(i’,j’), IT(i’,j’) at the neighboring cells (i-1,j-1), (i-1,j) and (i,j-1).
We assume that a deletion will not be followed directly by an insertion. This can be obtained by using
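A minimal sketch of the three-matrix computation for global alignment, assuming the standard affine recurrences (in the style of Gotoh's algorithm) under the slide's assumption that a deletion is never directly followed by an insertion:
M(i,j) = s(S[i],T[j]) + max{M(i-1,j-1), IS(i-1,j-1), IT(i-1,j-1)}
IS(i,j) = max{M(i-1,j) - d, IS(i-1,j) - e}
IT(i,j) = max{M(i,j-1) - d, IT(i,j-1) - e}
The global initialization, the function name and the default parameters (+1/-1, d = 3, e = 1) are assumptions for illustration.

```python
NEG_INF = float("-inf")  # marks states that cannot be reached


def affine_global_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                        d: int = 3, e: int = 1):
    """Global alignment score with gap score -d - (g - 1) * e (three matrices)."""
    n, m = len(s), len(t)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # s[i] aligned to t[j]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # s[i] aligned to a gap
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # t[j] aligned to a gap
    M[0][0] = 0  # empty prefixes; other border cells are reached only via gaps
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                sub = match if s[i - 1] == t[j - 1] else mismatch
                M[i][j] = sub + max(M[i - 1][j - 1], IS[i - 1][j - 1], IT[i - 1][j - 1])
            if i > 0:
                # Open a gap after a match column (cost d) or extend one (cost e).
                # IS is never entered from IT: no insertion right after a deletion.
                IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            if j > 0:
                IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)
    return max(M[n][m], IS[n][m], IT[n][m])


print(affine_global_score("AAAA", "AA"))
```

For AAAA against AA the best alignment keeps both matches and uses one gap of length 2, scoring 2 - (3 + 1) = -2; splitting it into two gaps of length 1 would score 2 - 6 = -4.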

17 Affine gap scores
Simplification: Why are two matrices enough?

