Alignment II Dynamic Programming


1 Alignment II Dynamic Programming

2 Pair-wise sequence alignments
Idea: Display one sequence above another, with spaces inserted in both to reveal similarity.
A: C A T - T C A - C
   |   |     | |   |
B: C - T C G C A G C

3 Two types of alignment
S = CTGTCGCTGCACG
T = TGCCGTG
Global alignment:
CTGTCG-CTGCACG
-TGC-CG-TG----
Local alignment:
CTGTCGCTGCACG--
       TGC-CGTG

4 Global alignment: Scoring
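CTGTCG-CTGCACG
-TGC-CG-TG----
Each match earns a reward; each mismatch and each space incurs a penalty.
score(A) = (match reward) × w - (mismatch penalty) × x - (space penalty) × y
where w = number of matches, x = number of mismatches, y = number of spaces.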

5 Global alignment: Scoring
Reward for matches: 10. Mismatch penalty: 2. Space penalty: 5.
C T G T C G - C T G C
- T G C - C G - T G -
4 matches, 2 mismatches, 5 spaces: Total = 4 × 10 - 2 × 2 - 5 × 5 = 11
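The scoring rule above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function name `score_alignment` and its parameter names are assumptions chosen for illustration, with the slide's values (10, 2, 5) as defaults.

```python
def score_alignment(row_a, row_b, match=10, mismatch=2, space=5, gap="-"):
    """Score two equal-length alignment rows column by column.

    match is a reward; mismatch and space are penalties (positive numbers),
    as on the slide. All names here are illustrative assumptions.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two spaces")
        if a == gap or b == gap:
            total -= space        # one letter lined up with a space
        elif a == b:
            total += match        # identical letters
        else:
            total -= mismatch     # two different letters
    return total


# The 11-column alignment shown on the slide.
print(score_alignment("CTGTCG-CTGC", "-TGC-CG-TG-"))  # 11
```

Scoring the full 14-column global alignment from the previous slide with the same values adds three more spaces, so its total drops by 15.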

6 Optimum Alignment
The score of an alignment is a measure of its quality.
Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score.
The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y.

7 Alignment algorithms
Global: Needleman-Wunsch (NW)
Local: Smith-Waterman (SW)
NW and SW both use dynamic programming.
Variations: gap penalty functions, scoring matrices.

8 Global Alignment: Algorithm

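9 Theorem. C(i,j) satisfies the following relationships:
Here C(i,j) is the optimum score for aligning S1...Si with T1...Tj, and w(a,b) is the score of one column ("-" denotes a space).
Initial conditions:
C(0,0) = 0
C(i,0) = C(i-1,0) + w(Si,-) for 1 ≤ i ≤ n
C(0,j) = C(0,j-1) + w(-,Tj) for 1 ≤ j ≤ m
Recurrence relation: For 1 ≤ i ≤ n, 1 ≤ j ≤ m:
C(i,j) = max { C(i-1,j-1) + w(Si,Tj), C(i-1,j) + w(Si,-), C(i,j-1) + w(-,Tj) }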

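10 Justification
The last column of an optimum alignment of S1...Si with T1...Tj is one of three cases:
Si aligned with Tj: S1 S2 . . . Si-1 Si over T1 T2 . . . Tj-1 Tj, giving C(i-1,j-1) + w(Si,Tj)
Si aligned with a space: S1 S2 . . . Si-1 Si over T1 T2 . . . Tj -, giving C(i-1,j) + w(Si,-)
Tj aligned with a space: S1 S2 . . . Si - over T1 T2 . . . Tj-1 Tj, giving C(i,j-1) + w(-,Tj)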

11 Example
In every case, i marks the final C of S and j marks the final G of T.
Case 1: Line up Si with Tj
S: C A T T C A C
T: C - T T C A G
Case 2: Line up Si with space
S: C A T T C A - C
T: C - T T C A G -
Case 3: Line up Tj with space
S: C A T T C A C -
T: C - T T C A - G

12 Computation Procedure
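Each entry C(i,j) is computed from its three neighbours C(i-1,j-1), C(i-1,j) and C(i,j-1). The table is filled until C(n,m), which holds the optimum score.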

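13 [Table: columns labelled λ C T C G C A G C, rows labelled λ C A T T C A C. The border holds multiples of the space penalty (-5, -10, -15, -20, -25, -30, -35, ...); the first interior entries are 10 and 5.]
+10 for match, -2 for mismatch, -5 for space.
We first compute C(i, j) for the smallest possible values of i and j, then for increasing values of i and j.
This is usually performed with a table of size (n + 1) × (m + 1).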
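The fill order described here can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's code: the function name `nw_table` and the use of signed scores (+10, -2, -5) as parameters are assumptions. Rows correspond to CATTCAC and columns to CTCGCAGC, as in the table on the slide.

```python
def nw_table(s, t, match=10, mismatch=-2, space=-5):
    """Fill the (n + 1) x (m + 1) global-alignment table row by row."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    # Border: aligning a prefix with the empty string costs one space per letter.
    for i in range(1, n + 1):
        C[i][0] = C[i - 1][0] + space
    for j in range(1, m + 1):
        C[0][j] = C[0][j - 1] + space
    # Interior: each entry depends only on its diagonal, upper and left neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,   # Si lined up with Tj
                          C[i - 1][j] + space,   # Si lined up with a space
                          C[i][j - 1] + space)   # Tj lined up with a space
    return C


table = nw_table("CATTCAC", "CTCGCAGC")
for row in table:
    print(" ".join(f"{value:4d}" for value in row))
```

The bottom-right entry of the printed table is the optimum global score for the two sequences.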

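14 [Completed table for S = CATTCAC (rows) and T = CTCGCAGC (columns). First row: -5 -10 -15 -20 -25 -30 -35 -40. Entries shown: 10 5 8 3 -2 -7 15 13 -4 20 18 28 23 26 33; the last entry, 33 (marked *), is the optimum score.]
Traceback can yield both optimum alignments.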
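A traceback walks back from C(n,m) to C(0,0), at each step choosing a neighbour that explains the current entry. The sketch below is an assumption-laden illustration: the name `needleman_wunsch` and the tie-breaking order (diagonal, then up, then left) are my own choices, so it returns only one of the optimum alignments.

```python
def needleman_wunsch(s, t, match=10, mismatch=-2, space=-5):
    """Return (score, aligned_s, aligned_t) for one optimum global alignment."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        C[i][0] = C[i - 1][0] + space
    for j in range(1, m + 1):
        C[0][j] = C[0][j - 1] + space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)

    # Traceback: prefer the diagonal, then the upper, then the left neighbour.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            w = match if s[i - 1] == t[j - 1] else mismatch
            if C[i][j] == C[i - 1][j - 1] + w:
                row_s.append(s[i - 1])
                row_t.append(t[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and C[i][j] == C[i - 1][j] + space:
            row_s.append(s[i - 1])
            row_t.append("-")
            i -= 1
        else:
            row_s.append("-")
            row_t.append(t[j - 1])
            j -= 1
    return C[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


score, aligned_s, aligned_t = needleman_wunsch("CATTCAC", "CTCGCAGC")
print(score)
print(aligned_s)
print(aligned_t)
```

With this tie-breaking order the result is the alignment shown on slide 2. Preferring the left neighbour over the diagonal at ties would produce the other optimum alignment.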

15 End-gap free alignment
Gaps at the start or end of the alignment are not penalized.
Match: +2. Mismatch and space: -1.
Best global alignment: Score = 1
Best end-gap free alignment: Score = 9

16 Motivation: Shotgun assembly
Shotgun assembly produces a large set of partially overlapping subsequences from many copies of one unknown DNA sequence.
Problem: Use the overlapping sections to "paste" the subsequences together.
Overlapping pairs will have a low global alignment score, but a high end-gap free score because of the overlap.

17 Motivation: Shotgun assembly

18 Algorithm Same as global alignment, except:
Initialize the first row and first column with zeros (free gaps at start).
Locate the maximum in the last row/column (free gaps at end).
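The two changes relative to global alignment can be sketched as follows. This is an illustration under stated assumptions: the function name `end_gap_free_score` is hypothetical, the default scores follow slide 15 (+2 match, -1 mismatch and space), and the test sequences are made up for the example because the slide's own sequences are not in the transcript.

```python
def end_gap_free_score(s, t, match=2, mismatch=-1, space=-1):
    """Best alignment score when gaps at either end of the alignment are free."""
    n, m = len(s), len(t)
    # Change 1: the first row and column stay at zero (free gaps at the start).
    C = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
    # Change 2: take the maximum over the last row and last column
    # (free gaps at the end) instead of reading C[n][m].
    return max(max(C[n]), max(row[m] for row in C))


# Hypothetical fragments: the suffix "CGT" of the first overlaps
# the prefix "CGT" of the second.
print(end_gap_free_score("ACGT", "CGTA"))  # 6
```

In this example the three overlapping letters contribute 3 × 2 = 6, and the unmatched letters at the two ends cost nothing.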

19 [Table: columns labelled λ C T C G C A G C, rows labelled λ C A T T C A G.]
+10 for match, -2 for mismatch, -5 for gap.
We first compute C(i, j) for the smallest possible values of i and j, then for increasing values of i and j.
This is usually performed with a table of size (n + 1) × (m + 1).

20 Local Alignment: Motivation
Ignoring stretches of non-coding DNA:
Non-coding regions are more likely to be subjected to mutations than coding regions.
Local alignment between two sequences is likely to be between two exons.
Locating protein domains:
Proteins of different kinds and of different species often exhibit local similarities.
Local similarities may indicate "functional subunits".

21 Local alignment: Example
S = g g t c t g a g
T = a a a c g a
Match: +2. Mismatch and space: -1.
Best local alignment (columns 4 to 7, c t g a over c - g a):
g g t c t g a g
a a a c - g a -
Score = 2 - 1 + 2 + 2 = 5

22 Local Alignment: Algorithm
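C[i, j] = score of the best alignment of a suffix of s[1..i] with a suffix of t[1..j].
Initialize top row and leftmost column to zero.
The empty alignment (score 0) is always a candidate, so no entry falls below zero; the answer is the largest entry anywhere in the table.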
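A score-only sketch of this local-alignment table follows. It is an illustration rather than the slides' own code: the function name `local_alignment_score` is an assumption, and the extra 0 in the maximum is the standard Smith-Waterman convention (the empty suffix alignment), which the extracted slide text does not spell out. The default scores are those of the earlier example (+2 match, -1 mismatch and space).

```python
def local_alignment_score(s, t, match=2, mismatch=-1, space=-1):
    """Best local alignment score of s and t (Smith-Waterman style)."""
    n, m = len(s), len(t)
    C = [[0] * (m + 1) for _ in range(n + 1)]  # top row and left column are zero
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            C[i][j] = max(0,                      # start a fresh alignment here
                          C[i - 1][j - 1] + w,
                          C[i - 1][j] + space,
                          C[i][j - 1] + space)
            best = max(best, C[i][j])             # the answer can sit anywhere
    return best


print(local_alignment_score("ggtctgag", "aaacga"))  # 5, as in the example slide
```

Recovering the aligned substrings themselves would need a traceback from the maximal entry back to the first zero entry, which this sketch omits.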

23 [Table: columns labelled λ C T C G C A G C, rows labelled λ C A T T C A C; entries shown: 1, 2.]
+1 for a match, -1 for a mismatch, -5 for a space.

24 Some Results
Most pairwise sequence alignment problems can be solved in O(mn) time.
The space requirement can be reduced to O(m+n) while keeping the O(mn) run time [Myers88].
Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].

25 Reducing space requirements
O(mn) tables are often the limiting factor in computing large alignments.
There is a linear-space technique that only doubles the time required [Hirschberg77].

26 [Table: columns labelled λ C T C G C A G C, rows labelled λ C A T T C A G.]
We first compute C(i, j) for the smallest possible values of i and j, then for increasing values of i and j.
Usually performed with a table of size (n + 1) × (m + 1).
IDEA: We only need the previous row to calculate the next.
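The "previous row only" idea can be sketched directly. The function name `nw_score_two_rows` is an assumption, and the scores default to the earlier global example (+10, -2, -5). Note that this sketch returns only the optimum score; recovering the alignment itself in linear space needs the divide-and-conquer technique cited on the previous slide, which is not shown here.

```python
def nw_score_two_rows(s, t, match=10, mismatch=-2, space=-5):
    """Global alignment score using O(len(t)) memory instead of a full table."""
    m = len(t)
    prev = [j * space for j in range(m + 1)]      # row 0 of the table
    for i in range(1, len(s) + 1):
        curr = [i * space] + [0] * m              # column 0 of row i
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + w,        # diagonal neighbour
                          prev[j] + space,        # upper neighbour
                          curr[j - 1] + space)    # left neighbour
        prev = curr                               # row i becomes the "previous" row
    return prev[m]


print(nw_score_two_rows("CATTCAC", "CTCGCAGC"))  # 33
```

Putting the shorter sequence in the role of `t` keeps the two rows as small as possible.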

27 Linear-space Alignments
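Each level of the divide-and-conquer recursion works on half the table area of the level above, so the total work is:
mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2mn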

28 Affine Gap Penalty Functions
Gap penalty = h + gk, where
k = length of gap
h = gap opening penalty
g = gap continuation penalty
Can also be solved in O(nm) time using dynamic programming.
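One common way to get O(nm) time with affine gaps is to keep three tables, one per kind of final column (often attributed to Gotoh). The slides do not say which formulation they use, so the sketch below is an assumption throughout: the function name `affine_gap_score`, the table names M, X and Y, and the default scores are illustrative; only h and g come from the slide.

```python
NEG_INF = float("-inf")


def affine_gap_score(s, t, match=1, mismatch=-1, h=2, g=1):
    """Global alignment score where a gap of length k costs h + g * k."""
    n, m = len(s), len(t)
    # M: last column pairs Si with Tj; X: Si with a space; Y: a space with Tj.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(h + g * i)     # one gap of length i
    for j in range(1, m + 1):
        Y[0][j] = -(h + g * j)     # one gap of length j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            M[i][j] = w + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - h - g,   # open a new gap
                          X[i - 1][j] - g,       # extend the current gap
                          Y[i - 1][j] - h - g)   # switch gap direction
            Y[i][j] = max(M[i][j - 1] - h - g,
                          Y[i][j - 1] - g,
                          X[i][j - 1] - h - g)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_gap_score("AAAA", "AA"))  # -2: two matches minus one gap of length 2
```

With these defaults, one gap of length 2 costs 2 + 2 × 1 = 4, while two separate gaps of length 1 would cost 6, so the affine model prefers to group the spaces together.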

