Sequence Alignment Kun-Mao Chao (趙坤茂)


1 Sequence Alignment Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:

2 GenBank 200.0

3 GenBank 215.0

4 GenBank 220.0

5 orz’s sequence evolution
orz (kid)
OTZ (adult)
Orz (big head)
Crz (motorcycle driver)
on_ (soldier)
or2 (bottom up)
oΩ (back high)
STO (the other way around)
Oroz (me)
The origin? Their evolutionary relationships? Their putative functional relationships?

6 What? The truth is more important than the facts. THETR UTHIS MOREI

7 Dot Matrix

8

9 Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT (Sequence A)
CGGATCA--T (Sequence B)

10 Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT
CGGATCA--T
Labels on the slide: Match, Mismatch, Insertion gap, Deletion gap.

11 Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: a grid graph with the symbols of A and B along its two axes; the alignment C---TTAACT / CGGATCA--T corresponds to one path through the grid.]

12 A simple scoring scheme
Match: +8 (w(x, y) = 8, if x = y)
Mismatch: -5 (w(x, y) = -5, if x ≠ y)
Each gap symbol: -3 (w(-,x) = w(x,-) = -3)
C - - - T T A A C T
C G G A T C A - - T
8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
Alignment score: +12
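The scoring scheme on this slide can be checked mechanically. The sketch below is only an illustration: the names `w` and `alignment_score` are hypothetical, and the two rows are written with `-` as the gap symbol.

```python
def w(x: str, y: str) -> int:
    """Score one alignment column: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def alignment_score(row_a: str, row_b: str) -> int:
    """Add up the column scores of two gapped rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

Four matches (+32), one mismatch (-5) and five gap symbols (-15) give the +12 shown on the slide.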

13 An optimal alignment -- the alignment of maximum score
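Let A = a1a2…am and B = b1b2…bn.
Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj.
With proper initializations, Si,j can be computed as follows.
\[ S_{i,j} = \max \begin{cases} S_{i-1,j} + w(a_i, -) \\ S_{i,j-1} + w(-, b_j) \\ S_{i-1,j-1} + w(a_i, b_j) \end{cases} \]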

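14 Computing Si,j
[Figure: the (m+1) x (n+1) table. Cell (i, j) is reached from above with w(ai,-), from the left with w(-,bj), and diagonally with w(ai,bj); the optimal score is Sm,n in the bottom-right corner.]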
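The tabular computation can be sketched as follows. This is a minimal illustration, assuming each cell takes the best of the three moves labelled on the slide and that row 0 and column 0 hold cumulative gap scores; the function name and parameters are made up for the example.

```python
def global_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Fill the (m+1) x (n+1) table S, where S[i][j] is the best score of
    aligning a[:i] with b[:j] under a linear gap penalty."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # column 0: a[:i] against gaps only
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):          # row 0: b[:j] against gaps only
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j - 1] + pair,  # a_i aligned with b_j
                S[i - 1][j] + gap,       # a_i aligned with a gap
                S[i][j - 1] + gap,       # b_j aligned with a gap
            )
    return S


table = global_alignment_table("CTTAACT", "CGGATCAT")
print(table[-1][-1])  # 14
```

The table has (m+1)(n+1) cells and each costs constant work, so the fill takes O(mn) time and space.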

15 Initializations
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    C    G    G    A    T    C    A    T
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3
T     -6
T     -9
A    -12
A    -15
C    -18
T    -21

16 S3,5 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    C    G    G    A    T    C    A    T
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3    8    5    2   -1   -4   -7  -10  -13
T     -6    5    3    0   -3    7    4    1   -2
T     -9    2    0   -2   -5    ?
A    -12
A    -15
C    -18
T    -21

17 S3,5 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
The three candidates for S3,5 (a3 = T, b5 = T):
from above: S2,5 + w(T, -) = 7 - 3 = 4
from the diagonal: S2,4 + w(T, T) = -3 + 8 = 5
from the left: S3,4 + w(-, T) = -5 - 3 = -8

18 S3,5 = 5
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    C    G    G    A    T    C    A    T
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3    8    5    2   -1   -4   -7  -10  -13
T     -6    5    3    0   -3    7    4    1   -2
T     -9    2    0   -2   -5    5    2   -1    9
A    -12   -1   -3   -5    6    3    0   10    7
A    -15   -4   -6   -8    3    1   -2    8    5
C    -18   -7   -9  -11    0   -2    9    6    3
T    -21  -10  -12  -14   -3    8    6    4   14
Optimal score: S7,8 = 14

19
C T T A A C - T
C G G A T C A T
8 -5 -5 +8 -5 +8 -3 +8 = 14
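An optimal alignment is recovered by walking back from Sm,n to S0,0, at each cell following a move that produced its score. The sketch below is one way to do that; the name `global_align` and the tie-breaking order (diagonal first, then up, then left) are assumptions for illustration, so when several optimal alignments exist it returns just one of them.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, row_a, row_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)

    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + pair:
                top.append(a[i - 1]); bottom.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")   # gap in B's row
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])   # gap in A's row
            j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


score, row_a, row_b = global_align("CTTAACT", "CGGATCAT")
print(score)
print(row_a)
print(row_b)
```

The traceback visits at most m + n cells, so it adds only O(m + n) time on top of the O(mn) table fill.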

20 Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal alignment?

21 Initializations
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    G    A    A    T    C    T    G    C
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3
A     -6
A     -9
T    -12
T    -15
G    -18
A    -21

22 S4,2 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    G    A    A    T    C    T    G    C
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3   -5   -8  -11  -14   -4   -7  -10  -13
A     -6   -8    3    0   -3   -6   -9  -12  -15
A     -9  -11    0   11    8    5    2   -1   -4
T    -12  -14    ?
T    -15
G    -18
A    -21

23 S4,2 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
The three candidates for S4,2 (a4 = T, b2 = A):
from above: S3,2 + w(T, -) = 0 - 3 = -3
from the diagonal: S3,1 + w(T, A) = -11 - 5 = -16
from the left: S4,1 + w(-, A) = -14 - 3 = -17

24 S5,5 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    G    A    A    T    C    T    G    C
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3   -5   -8  -11  -14   -4   -7  -10  -13
A     -6   -8    3    0   -3   -6   -9  -12  -15
A     -9  -11    0   11    8    5    2   -1   -4
T    -12  -14   -3    8   19   16   13   10    7
T    -15  -17   -6    5   16    ?
G    -18
A    -21

25 S5,5 = ?
Match: 8 Mismatch: -5 Gap symbol: -3
The three candidates for S5,5 (a5 = T, b5 = C):
from above: S4,5 + w(T, -) = 16 - 3 = 13
from the diagonal: S4,4 + w(T, C) = 19 - 5 = 14
from the left: S5,4 + w(-, C) = 16 - 3 = 13

26 S5,5 = 14
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    G    A    A    T    C    T    G    C
-      0   -3   -6   -9  -12  -15  -18  -21  -24
C     -3   -5   -8  -11  -14   -4   -7  -10  -13
A     -6   -8    3    0   -3   -6   -9  -12  -15
A     -9  -11    0   11    8    5    2   -1   -4
T    -12  -14   -3    8   19   16   13   10    7
T    -15  -17   -6    5   16   14   24   21   18
G    -18   -7   -9    2   13   11   21   32   29
A    -21  -10    1   -1   10    8   18   29   27
Optimal score: S7,8 = 27

27
C A A T - T G A
G A A T C T G C
-5 +8 +8 +8 -3 +8 +8 -5 = 27

28

29 Global Alignment vs. Local Alignment

30 Maximum-sum interval
Given a sequence of real numbers a1a2…an, find a consecutive subsequence with the maximum sum.
[Example sequence on the slide; only part of it survives: 9, –3, 1, 7, …, –4, 2, –7, 6, …]
For each position, we can compute the maximum-sum interval ending at that position in O(n) time. Therefore, a naive algorithm runs in O(n^2) time.

31 Computing a segment sum in O(1) time?
Input: a sequence of real numbers a1a2…an
Query: the sum of ai ai+1…aj

32 Computing a segment sum in O(1) time
prefix-sum(i) = a1+a2+…+ai
All n prefix sums are computable in O(n) time.
sum(i, j) = prefix-sum(j) – prefix-sum(i-1)
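A short sketch of the prefix-sum trick, keeping the slide's 1-based indices by storing prefix-sum(0) = 0 at position 0. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
from itertools import accumulate


def prefix_sums(a):
    """Return p with p[i] = a1 + ... + ai (1-based i) and p[0] = 0."""
    return [0, *accumulate(a)]


def segment_sum(p, i, j):
    """Sum of a_i..a_j (1-based, inclusive) in O(1) time."""
    return p[j] - p[i - 1]


values = [3, -1, 4, -2]          # illustrative data, not from the slides
p = prefix_sums(values)          # O(n) preprocessing
print(segment_sum(p, 2, 3))      # -1 + 4 = 3
```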

33 Maximizing sum(i, j)
O(n)-time Method 1
sum(i, j) = prefix-sum(j) – prefix-sum(i-1)
For each location j, prefix-sum(j) is fixed. The maximum-sum interval ending at position j is therefore found by taking the minimum prefix sum before position j, that is, the minimum of prefix-sum(i-1) over 1 <= i <= j.
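Method 1 can be sketched as a single left-to-right pass that keeps the smallest prefix sum seen so far. The function name and the sample data are assumptions for illustration; intervals are reported with 1-based inclusive endpoints and are required to be non-empty.

```python
def max_sum_interval_prefix(a):
    """Return (best_sum, i, j) for the maximum-sum interval a_i..a_j,
    or None for an empty input. Runs in O(n) time."""
    best = None
    prefix = 0       # prefix-sum(j)
    min_prefix = 0   # minimum of prefix-sum(0), ..., prefix-sum(j-1)
    min_index = 0    # index k at which that minimum is attained
    for j, value in enumerate(a, start=1):
        prefix += value
        candidate = prefix - min_prefix      # best interval ending at j
        if best is None or candidate > best[0]:
            best = (candidate, min_index + 1, j)
        if prefix < min_prefix:              # update the minimum for later j
            min_prefix, min_index = prefix, j
    return best


print(max_sum_interval_prefix([2, -5, 3, 4, -1]))  # (7, 3, 4)
```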

34 Maximum-sum interval (The recurrence relation)
O(n)-time Method 2
Define S(i) to be the maximum sum of the intervals ending at position i.
\[ S(i) = a_i + \max\{ S(i-1),\, 0 \} \]
If S(i-1) < 0, concatenating ai with its previous interval gives a smaller sum than ai itself.
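The rule stated on this slide (extend the previous interval only when S(i-1) is not negative) can be sketched as below. The function name and the sample data are illustrative; the brute-force check is included only to show that the two agree on that input.

```python
def ending_sums(a):
    """Return [S(1), ..., S(n)], where S(i) is the maximum sum of an
    interval ending at position i."""
    sums = []
    previous = 0                       # acts as S(0) = 0
    for value in a:
        previous = value + max(previous, 0)
        sums.append(previous)
    return sums


data = [2, -5, 3, 4, -1]               # illustrative data, not from the slides
S = ending_sums(data)
print(S)                                # [2, -3, 3, 7, 6]

# O(n^2) brute force over all non-empty intervals, for comparison.
brute = max(sum(data[i:j]) for i in range(len(data))
            for j in range(i + 1, len(data) + 1))
print(max(S) == brute)                  # True
```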

35 Maximum-sum interval (Tabular computation)
[Table: the example sequence with S(i) filled in from left to right; the largest S(i) is the maximum sum.]

36 Maximum-sum interval (Traceback)
[Table: the same S(i) row; tracing back from the largest S(i) recovers the maximum-sum interval.]
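One way to do the traceback is to start at the position with the largest S(i) and move left while the previous S value is positive, since that is exactly when the interval was extended. This is a sketch under that assumption; the function name and sample data are hypothetical, and endpoints are 1-based and inclusive.

```python
def max_sum_interval(a):
    """Return (best_sum, start, end) using S(i) = a_i + max(S(i-1), 0)
    followed by a traceback. Requires a non-empty sequence."""
    sums = []
    previous = 0
    for value in a:
        previous = value + max(previous, 0)
        sums.append(previous)

    end = max(range(len(sums)), key=sums.__getitem__)   # first largest S(i)
    start = end
    while start > 0 and sums[start - 1] > 0:            # interval was extended
        start -= 1
    return sums[end], start + 1, end + 1


print(max_sum_interval([2, -5, 3, 4, -1]))  # (7, 3, 4)
```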

37 An optimal local alignment
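Si,j: the score of an optimal local alignment ending at (i, j) between a1a2…ai and b1b2…bj.
With proper initializations, Si,j can be computed as follows.
\[ S_{i,j} = \max \begin{cases} 0 \\ S_{i-1,j} + w(a_i, -) \\ S_{i,j-1} + w(-, b_j) \\ S_{i-1,j-1} + w(a_i, b_j) \end{cases} \]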
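A minimal sketch of the local-alignment table, assuming the usual form in which every cell is additionally compared with 0 (the empty alignment) and row 0 and column 0 are all zeros. The function name and parameters are illustrative.

```python
def local_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Fill S, where S[i][j] is the best score of a local alignment
    ending at (i, j); a score never drops below 0."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                0,                       # start a fresh alignment here
                S[i - 1][j - 1] + pair,  # a_i aligned with b_j
                S[i - 1][j] + gap,       # a_i aligned with a gap
                S[i][j - 1] + gap,       # b_j aligned with a gap
            )
    return S


table = local_alignment_table("CTTAACT", "CGGATCAT")
print(max(max(row) for row in table))  # 18
```

Unlike the global case, the best score is the largest entry anywhere in the table, not necessarily the bottom-right corner.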

38 local alignment
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    C    G    G    A    T    C    A    T
-      0    0    0    0    0    0    0    0    0
C      0    8    5    2    0    0    8    5    2
T      0    5    3    0    0    8    5    3   13
T      0    2    0    0    0    8    5    2   11
A      0    0    0    0    8    5    3    ?
A      0
C      0
T      0

39 local alignment
Match: 8 Mismatch: -5 Gap symbol: -3
The candidates for the marked cell S4,7 (a4 = A, b7 = A), besides 0:
from above: S3,7 + w(A, -) = 2 - 3 = -1
from the diagonal: S3,6 + w(A, A) = 5 + 8 = 13
from the left: S4,6 + w(-, A) = 3 - 3 = 0

40 local alignment
Match: 8 Mismatch: -5 Gap symbol: -3
A\B    -    C    G    G    A    T    C    A    T
-      0    0    0    0    0    0    0    0    0
C      0    8    5    2    0    0    8    5    2
T      0    5    3    0    0    8    5    3   13
T      0    2    0    0    0    8    5    2   11
A      0    0    0    0    8    5    3   13   10
A      0    0    0    0    8    5    2   11    8
C      0    8    5    2    5    3   13   10    7
T      0    5    3    0    2   13   10    8   18
The best score: 18

41
A - C - T
A T C A T
8 -3 +8 -3 +8 = 18
The best score: 18

42 Now try this example in class
Sequence A: CAATTGA
Sequence B: GAATCTGC
Their optimal local alignment?

43 Did you get it right?
A\B    -    G    A    A    T    C    T    G    C
-      0    0    0    0    0    0    0    0    0
C      0    0    0    0    0    8    5    2    8
A      0    0    8    8    5    5    3    0    5
A      0    0    8   16   13   10    7    4    2
T      0    0    5   13   24   21   18   15   12
T      0    0    2   10   21   19   29   26   23
G      0    8    5    7   18   16   26   37   34
A      0    5   16   13   15   13   23   34   32

44
A A T - T G
A A T C T G
8 +8 +8 -3 +8 +8 = 37
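The local alignment itself is recovered by starting at the cell with the best score and walking back until a cell with score 0 is reached. The sketch below assumes ties are broken diagonal first, then up, then left, and that the first best cell in row-major order is used; the name `local_align` is hypothetical.

```python
def local_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, row_a, row_b) for one optimal local alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(0,
                          S[i - 1][j - 1] + pair,
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            if S[i][j] > best:
                best, best_i, best_j = S[i][j], i, j

    # Traceback from the best cell; stop at the first cell scoring 0.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and S[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if S[i][j] == S[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("CAATTGA", "GAATCTGC"))  # (37, 'AAT-TG', 'AATCTG')
```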

45 Osamu Gotoh

46 Affine gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: -3 (w(-,b) = w(a,-) = -3)
Each gap is charged an extra gap-open penalty: -4.
C - - - T T A A C T
C G G A T C A - - T
8 -3 -3 -3 +8 -5 +8 -3 -3 +8 = +12
The alignment contains two gaps, each charged -4.
Alignment score: 12 – 4 – 4 = 4
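The affine score of a given alignment can be checked by charging the gap-open penalty once per run of consecutive gap symbols in the same row. This is an illustrative sketch; the function name and parameter names are made up, and the defaults are the values on this slide.

```python
def affine_alignment_score(row_a, row_b, match=8, mismatch=-5,
                           gap_symbol=-3, gap_open=-4):
    """Score two gapped rows; every maximal run of gap symbols in one row
    pays gap_open once, plus gap_symbol per symbol."""
    if len(row_a) != len(row_b):
        raise ValueError("both rows of an alignment must have the same length")
    score = 0
    previous = None            # row holding the gap in the previous column
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            current = "a" if x == "-" else "b"
            score += gap_symbol
            if current != previous:    # a new gap starts in this column
                score += gap_open
            previous = current
        else:
            score += match if x == y else mismatch
            previous = None
    return score


print(affine_alignment_score("C---TTAACT", "CGGATCA--T"))  # 4
```

Setting `gap_symbol=0` turns the same function into a per-gap constant penalty, which is a convenient way to compare the two schemes on one alignment.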

47 Affine gap penalties
A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
This is the same as the scoring scheme that penalizes the first symbol of a gap x + y and each extended symbol y.
Three cases for alignment endings:
1. ...x / ...x  an aligned pair
2. ...x / ...-  a deletion
3. ...- / ...x  an insertion

48 Affine gap penalties
Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.
Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

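49 Affine gap penalties (A gap of length k is penalized x + k·y.)
\[ D(i,j) = \max\{ D(i-1,j) - y,\; S(i-1,j) - x - y \} \]
\[ I(i,j) = \max\{ I(i,j-1) - y,\; S(i,j-1) - x - y \} \]
\[ S(i,j) = \max\{ S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \} \]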

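50 Affine gap penalties
[Figure: each cell holds three nodes S, D and I. Staying in D or in I costs -y (extending a gap); entering D or I from S costs -x-y (opening a gap); the diagonal edge into S carries w(ai,bj).]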
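The three-state computation can be sketched as below. This assumes the standard Gotoh-style recurrences that match the edge labels above (-y to extend, -x-y to open, w(ai,bj) on the diagonal); the boundary values and the name `affine_global_score` are assumptions for illustration, with x and y given as positive penalties.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, x=4, y=3, match=8, mismatch=-5):
    """Best global alignment score when a gap of length k costs x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]  # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # a[:i] against one long gap
        D[i][0] = -x - i * y
        S[i][0] = D[i][0]
    for j in range(1, n + 1):          # b[:j] against one long gap
        I[0][j] = -x - j * y
        S[0][j] = I[0][j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("CTTAACT", "CGGATCAT"))       # 10
print(affine_global_score("CTTAACT", "CGGATCAT", x=0))  # 14
```

With x = 0 the scheme reduces to the linear gap penalty, so the second call reproduces the optimal global score 14 from the earlier example. With x = 4 every alignment of these two sequences needs at least one gap, and the one-gap alignment scoring 14 drops to 10.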

51 Constant gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: 0 (w(-,b) = w(a,-) = 0)
Each gap is charged a constant penalty: -4.
C - - - T T A A C T
C G G A T C A - - T
8 0 0 0 +8 -5 +8 0 0 +8 = +27
The alignment contains two gaps, each charged -4.
Alignment score: 27 – 4 – 4 = 19

52 Constant gap penalties
Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.
Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.
Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

53 Constant gap penalties
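The formulas for this slide did not survive in the transcript. As a hedged reconstruction, one plausible form follows from treating a constant gap penalty as the affine scheme x + k·y with y = 0, where x denotes the constant per-gap penalty (x = 4 in the example above). The symbol x is an assumption here, since the slide text only gives the value -4.

```latex
\begin{align*}
D(i,j) &= \max\{\, D(i-1,j),\; S(i-1,j) - x \,\} \\
I(i,j) &= \max\{\, I(i,j-1),\; S(i,j-1) - x \,\} \\
S(i,j) &= \max\{\, S(i-1,j-1) + w(a_i,b_j),\; D(i,j),\; I(i,j) \,\}
\end{align*}
```

Extending an existing gap is free, and the penalty x is paid only on the step that opens a gap from S.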

54 Restricted affine gap penalties
A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c.
Five cases for alignment endings:
1. ...x / ...x  an aligned pair
2. ...x / ...-  a deletion
3. ...- / ...x  an insertion
4. and 5. for long gaps

55 Restricted affine gap penalties

56 D(i, j) vs. D’(i, j)
Case 1: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length <= c. Then D(i, j) >= D’(i, j).
Case 2: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length >= c. Then D(i, j) <= D’(i, j).

57 Max{S(i,j)-x-ky, S(i,j)-x-cy}
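The expression on this slide can be read as follows: for a non-negative y, taking the larger of "subtract x + k·y" and "subtract x + c·y" is the same as subtracting x + min(k, c)·y, which is the restricted affine penalty x + f(k)·y. The sketch below checks that identity numerically; the function name and the sample values of x, y and c are hypothetical.

```python
def restricted_affine_penalty(k, x, y, c):
    """Penalty x + f(k)*y with f(k) = k for k <= c and f(k) = c for k > c."""
    return x + min(k, c) * y


x, y, c = 4, 3, 5      # illustrative values, not from the slides
score = 20             # stands in for some S(i, j)
for k in range(1, 9):
    via_max = max(score - x - k * y, score - x - c * y)
    via_f = score - restricted_affine_penalty(k, x, y, c)
    print(k, via_max, via_f)   # the last two columns agree for every k
```

Once a gap is longer than c its penalty stops growing, which is why long gaps need their own cases in the recurrence.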

