Sequence Alignment Kun-Mao Chao (趙坤茂)


1 Sequence Alignment Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:

2 Useful Websites
MIT Biology Hypertextbook
The International Society for Computational Biology:
National Center for Biotechnology Information (NCBI, NIH):
European Bioinformatics Institute (EBI):
DNA Data Bank of Japan (DDBJ):

3 orz’s sequence evolution
orz (kid), OTZ (adult), Orz (big head), Crz (motorcycle driver), on_ (soldier), or2 (bottom up), oΩ (back high), STO (the other way around), Oroz (me)
The origin? Their evolutionary relationships? Their putative functional relationships?

4 What? The truth is more important than the facts. THETR UTHIS MOREI

5 Dot Matrix
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: dot matrix of A against B, with B (C G G A T C A T) labeling the columns.]
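A dot matrix marks cell (i, j) whenever the i-th letter of A equals the j-th letter of B, so runs of dots along a diagonal hint at similar regions. A minimal Python sketch; the function names are illustrative, not from the slides:

```python
def dot_matrix(a: str, b: str) -> list[list[bool]]:
    """M[i][j] is True exactly when a[i] == b[j] (0-based indices)."""
    return [[x == y for y in b] for x in a]


def render(a: str, b: str) -> str:
    """Text rendering: rows follow a, columns follow b, '*' marks a dot."""
    lines = ["  " + " ".join(b)]
    for x, row in zip(a, dot_matrix(a, b)):
        lines.append(x + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("CTTAACT", "CGGATCAT"))
```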

6 Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT (Sequence A)
CGGATCA--T (Sequence B)

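7 Pairwise Alignment
Sequence A: CTTAACT
Sequence B: CGGATCAT
An alignment of A and B:
C---TTAACT
CGGATCA--T
Columns with equal letters (C/C, T/T, A/A, T/T) are matches, and T/C is a mismatch. The three gap symbols in A's row form an insertion gap (B's GGA has no counterpart in A), and the two gap symbols in B's row form a deletion gap (A's AC has no counterpart in B).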

8 Alignment Graph
Sequence A: CTTAACT
Sequence B: CGGATCAT
[Figure: the alignment C---TTAACT / CGGATCA--T drawn as a path through a grid with B (C G G A T C A T) along the top and A (C T T A A C T) down the side.]
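In the alignment graph an alignment is a path from the top-left to the bottom-right corner: an aligned pair moves diagonally, a deletion moves down (using a letter of A), and an insertion moves right (using a letter of B). A minimal Python sketch that lists the grid nodes (i, j) such a path visits, assuming A indexes the rows and B the columns; the function name is illustrative:

```python
def alignment_path(top: str, bottom: str) -> list[tuple[int, int]]:
    """Grid nodes (i, j) visited by an alignment of A (top row) and B (bottom row).

    i counts letters of A used so far, j counts letters of B.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    i = j = 0
    path = [(0, 0)]
    for x, y in zip(top, bottom):
        if x != "-":
            i += 1   # a letter of A is used: move down
        if y != "-":
            j += 1   # a letter of B is used: move right
        path.append((i, j))
    return path


print(alignment_path("C---TTAACT", "CGGATCA--T"))
```

The path always ends at (m, n) = (7, 8), one node per column plus the start.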

9 A simple scoring scheme
Match: +8 (w(x, y) = 8, if x = y)
Mismatch: -5 (w(x, y) = -5, if x ≠ y)
Each gap symbol: -3 (w(-,x) = w(x,-) = -3)
C---TTAACT
CGGATCA--T
Alignment score = +12
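The score of a given alignment can be checked mechanically. A minimal Python sketch, assuming '-' marks a gap symbol and both rows are given as equal-length strings (the function names are illustrative):

```python
def column_score(x: str, y: str, match: int = 8, mismatch: int = -5, gap: int = -3) -> int:
    """Score one column of an alignment under the simple scheme."""
    if x == "-" or y == "-":
        return gap                      # w(-, y) = w(x, -) = -3
    return match if x == y else mismatch


def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of two equal-length aligned rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return sum(column_score(x, y) for x, y in zip(top, bottom))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```

Four matches, one mismatch and five gap symbols give 4·8 - 5 - 5·3 = 12.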

10 An optimal alignment -- the alignment of maximum score
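Let A = a1a2…am and B = b1b2…bn. Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj. With proper initializations (S0,0 = 0, Si,0 = Si-1,0 + w(ai,-) and S0,j = S0,j-1 + w(-,bj)), Si,j can be computed as follows:
$$S_{i,j} = \max\{\, S_{i-1,j-1} + w(a_i, b_j),\ S_{i-1,j} + w(a_i, -),\ S_{i,j-1} + w(-, b_j) \,\}$$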

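11 Computing Si,j
Si,j is the best of three moves into cell (i, j): from (i-1, j-1) adding w(ai,bj), from (i-1, j) adding w(ai,-), or from (i, j-1) adding w(-,bj). The optimal score is Sm,n.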
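The recurrence translates directly into a table-filling loop. A minimal Python sketch under the simple scoring scheme (match +8, mismatch -5, gap symbol -3); S[i][j] holds Si,j, and strings are 0-indexed, so ai is a[i - 1]:

```python
def w(x: str, y: str) -> int:
    """Simple scoring scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def global_table(a: str, b: str) -> list[list[int]]:
    """Return S where S[i][j] is the optimal score of a[:i] against b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):                 # first column: a[:i] against gaps
        S[i][0] = S[i - 1][0] + w(a[i - 1], "-")
    for j in range(1, n + 1):                 # first row: gaps against b[:j]
        S[0][j] = S[0][j - 1] + w("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),  # ai aligned with bj
                S[i - 1][j] + w(a[i - 1], "-"),           # ai against a gap
                S[i][j - 1] + w("-", b[j - 1]),           # bj against a gap
            )
    return S


S = global_table("CTTAACT", "CGGATCAT")
print(S[3][5], S[7][8])  # 5 14
```

Filling the (m+1) × (n+1) table takes O(mn) time.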

12 Initializations
          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3
T   -6
T   -9
A  -12
A  -15
C  -18
T  -21

13 S3,5 = ?
[Table: for A = CTTAACT (rows) and B = CGGATCAT (columns), rows 1 and 2 are filled in, row 3 begins -9, 2, 0, -2, -5, and the next cell is S3,5.]

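14 S3,5 = 5
          C    G    G    A    T    C    A    T
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3    8    5    2   -1   -4   -7  -10  -13
T   -6    5    3    0   -3    7    4    1   -2
T   -9    2    0   -2   -5    5    2   -1    9
A  -12   -1   -3   -5    6    3    0   10    7
A  -15   -4   -6   -8    3    1   -2    8    5
C  -18   -7   -9  -11    0   -2    9    6    3
T  -21  -10  -12  -14   -3    8    6    4   14
The bottom-right cell, S7,8 = 14, is the optimal score.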

15 Tracing back from S7,8 = 14 gives an optimal alignment:
CTTAAC-T
CGGATCAT
+8 -5 -5 +8 -5 +8 -3 +8 = 14
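The alignment itself is recovered by tracing back from Sm,n, at each cell choosing a predecessor whose value plus the step's score reproduces the cell. A minimal Python sketch; the tie-breaking order (diagonal, then deletion, then insertion) is an assumption, and other orders can return a different alignment with the same optimal score:

```python
def w(x: str, y: str) -> int:
    """Simple scoring scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def global_align(a: str, b: str) -> tuple[int, str, str]:
    """Return (optimal score, aligned a, aligned b) via table fill plus traceback."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + w(a[i - 1], "-")
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + w("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i - 1][j] + w(a[i - 1], "-"),
                          S[i][j - 1] + w("-", b[j - 1]))
    # Traceback from (m, n) to (0, 0), building the columns right to left.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            top.append(a[i - 1])      # aligned pair
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + w(a[i - 1], "-"):
            top.append(a[i - 1])      # deletion: ai against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # insertion: bj against a gap
            bottom.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("CTTAACT", "CGGATCAT"))  # (14, 'CTTAAC-T', 'CGGATCAT')
```

With the same tie-breaking, the in-class example CAATTGA / GAATCTGC returns (27, 'CAAT-TGA', 'GAATCTGC').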

16 Now try this example in class
Sequence A: CAATTGA Sequence B: GAATCTGC Their optimal alignment?

17 Initializations
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3
A   -6
A   -9
T  -12
T  -15
G  -18
A  -21

18 S4,2 = ?
[Table: for A = CAATTGA (rows) and B = GAATCTGC (columns), rows 1 to 3 are filled in, row 4 begins -12, -14, and the next cell is S4,2.]

19 S5,5 = ?
[Table: rows 1 to 4 are filled in (S4,2 = -3), row 5 begins -15, -17, -6, 5, 16, and the next cell is S5,5.]

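20 S5,5 = 14
          G    A    A    T    C    T    G    C
     0   -3   -6   -9  -12  -15  -18  -21  -24
C   -3   -5   -8  -11  -14   -4   -7  -10  -13
A   -6   -8    3    0   -3   -6   -9  -12  -15
A   -9  -11    0   11    8    5    2   -1   -4
T  -12  -14   -3    8   19   16   13   10    7
T  -15  -17   -6    5   16   14   24   21   18
G  -18   -7   -9    2   13   11   21   32   29
A  -21  -10    1   -1   10    8   18   29   27
The bottom-right cell, S7,8 = 27, is the optimal score.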

21 Tracing back from S7,8 = 27 gives an optimal alignment:
CAAT-TGA
GAATCTGC
-5 +8 +8 +8 -3 +8 +8 -5 = 27

22 Global Alignment vs. Local Alignment

23 Maximum-sum interval
Given a sequence of real numbers a1a2…an, find a consecutive subsequence with the maximum sum.
9 –3 1 7 – –4 2 –7 6 –
For each position, we can compute the maximum-sum interval starting at that position in O(n) time. Repeating this for all n positions, a naive algorithm runs in O(n^2) time.
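The naive method is two nested loops: the outer loop fixes the starting position and the inner loop extends a running sum, so each start costs O(n). A minimal Python sketch; the input list in the usage line is a made-up example, not the sequence on the slide:

```python
def max_sum_interval_naive(a: list[float]) -> tuple[float, int, int]:
    """Return (best sum, start, end) with 0-based inclusive indices, in O(n^2) time."""
    best = (a[0], 0, 0)
    for start in range(len(a)):
        total = 0
        for end in range(start, len(a)):
            total += a[end]                  # sum of a[start..end]
            if total > best[0]:
                best = (total, start, end)
    return best


print(max_sum_interval_naive([3, -5, 4, -1, 2, -6, 5]))  # (5, 2, 4)
```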

24 Maximum-sum interval (The recurrence relation)
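Define S(i) to be the maximum sum of the intervals ending at position i. Then S(1) = a1 and
$$S(i) = a_i + \max\{\, S(i-1),\ 0 \,\}$$
If S(i-1) < 0, concatenating ai with its previous interval gives a smaller sum than ai alone, so the interval restarts at position i.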

25 Maximum-sum interval (Tabular computation)
[Table: the example sequence with S(i) computed beneath each entry; the largest S(i) is the maximum sum.]

26 Maximum-sum interval (Traceback)
[Table: the example sequence and its S(i) values, traced back from the largest S(i) to recover the maximum-sum interval.]
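The recurrence, tabular computation and traceback fit in a few lines and run in O(n) time. A minimal Python sketch, again on a made-up input; it takes the first position with the largest S(i) and walks left while the previous S value is non-negative, since those are exactly the positions where the interval was extended rather than restarted:

```python
def max_sum_interval(a: list[float]) -> tuple[float, int, int]:
    """Return (best sum, start, end) with 0-based inclusive indices, in O(n) time."""
    # Tabular computation: S[i] = a[i] + max(S[i-1], 0)
    S = [a[0]]
    for x in a[1:]:
        S.append(x + max(S[-1], 0))
    # The maximum sum is the largest S(i); max() returns the first such index.
    end = max(range(len(a)), key=S.__getitem__)
    # Traceback: the interval ending at `end` extends left while S(k-1) >= 0.
    start = end
    while start > 0 and S[start - 1] >= 0:
        start -= 1
    return S[end], start, end


print(max_sum_interval([3, -5, 4, -1, 2, -6, 5]))  # (5, 2, 4)
```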

27 An optimal local alignment
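Si,j: the score of an optimal local alignment ending at ai and bj. With proper initializations (Si,0 = S0,j = 0), Si,j can be computed as follows:
$$S_{i,j} = \max\{\, 0,\ S_{i-1,j-1} + w(a_i, b_j),\ S_{i-1,j} + w(a_i, -),\ S_{i,j-1} + w(-, b_j) \,\}$$
The best local score is the largest Si,j in the table.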

28 local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
[Table: the local alignment table for A = CTTAACT (rows) and B = CGGATCAT (columns), partly filled in, with the next cell marked ?.]

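29 local alignment
Match: 8, Mismatch: -5, Gap symbol: -3
          C    G    G    A    T    C    A    T
     0    0    0    0    0    0    0    0    0
C    0    8    5    2    0    0    8    5    2
T    0    5    3    0    0    8    5    3   13
T    0    2    0    0    0    8    5    2   11
A    0    0    0    0    8    5    3   13   10
A    0    0    0    0    8    5    2   11    8
C    0    8    5    2    5    3   13   10    7
T    0    5    3    0    2   13   10    8   18
The best score is 18, in cell (7, 8).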

30 Tracing back from the best score until a 0 is reached gives the optimal local alignment:
A-C-T
ATCAT
8 - 3 + 8 - 3 + 8 = 18
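A minimal Python sketch of the local alignment recurrence (known as the Smith-Waterman algorithm) under the same scoring scheme. It records the cell holding the best score and traces back from it until it reaches a cell whose value is 0; as in the global case, the diagonal-first tie-breaking is an assumption:

```python
def w(x: str, y: str) -> int:
    """Simple scoring scheme: match +8, mismatch -5, gap symbol -3."""
    if x == "-" or y == "-":
        return -3
    return 8 if x == y else -5


def local_align(a: str, b: str) -> tuple[int, str, str]:
    """Return (best local score, aligned piece of a, aligned piece of b)."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # first row and column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(0,
                          S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i - 1][j] + w(a[i - 1], "-"),
                          S[i][j - 1] + w("-", b[j - 1]))
            if S[i][j] > best:
                best, bi, bj = S[i][j], i, j
    # Traceback from the best cell until a cell with value 0 is reached.
    top, bottom = [], []
    i, j = bi, bj
    while S[i][j] > 0:
        if S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif S[i][j] == S[i - 1][j] + w(a[i - 1], "-"):
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_align("CTTAACT", "CGGATCAT"))  # (18, 'A-C-T', 'ATCAT')
```

For the in-class example CAATTGA / GAATCTGC it returns (37, 'AAT-TG', 'AATCTG').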

31 Now try this example in class
Sequence A: CAATTGA Sequence B: GAATCTGC Their optimal local alignment?

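32 Did you get it right?
          G    A    A    T    C    T    G    C
     0    0    0    0    0    0    0    0    0
C    0    0    0    0    0    8    5    2    8
A    0    0    8    8    5    5    3    0    5
A    0    0    8   16   13   10    7    4    2
T    0    0    5   13   24   21   18   15   12
T    0    0    2   10   21   19   29   26   23
G    0    8    5    7   18   16   26   37   34
A    0    5   16   13   15   13   23   34   32
The best score is 37, in cell (6, 7).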

33 Tracing back from the best score gives the optimal local alignment:
AAT-TG
AATCTG
8 + 8 + 8 - 3 + 8 + 8 = 37

34 Affine gap penalties
C---TTAACT
CGGATCA--T
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: -3 (w(-,b) = w(a,-) = -3)
Each gap is charged an extra gap-open penalty: -4.
The columns sum to +12; with -4 for each of the two gaps, the alignment score is 12 - 4 - 4 = 4.
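Scoring an alignment under affine penalties only requires noticing where each gap starts: a gap is a maximal run of gap symbols in the same row. A minimal Python sketch (the function and parameter names are illustrative):

```python
def affine_score(top: str, bottom: str, match: int = 8, mismatch: int = -5,
                 gap_open: int = -4, gap_symbol: int = -3) -> int:
    """Score an alignment; each gap pays gap_open once plus gap_symbol per symbol."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score, prev = 0, "pair"
    for x, y in zip(top, bottom):
        if y == "-":
            state = "deletion"     # ai against a gap
        elif x == "-":
            state = "insertion"    # bj against a gap
        else:
            state = "pair"
        if state == "pair":
            score += match if x == y else mismatch
        else:
            score += gap_symbol
            if state != prev:      # first symbol of a new gap
                score += gap_open
        prev = state
    return score


print(affine_score("C---TTAACT", "CGGATCA--T"))  # 4
```

Passing gap_symbol=0 scores the constant-penalty scheme (27 - 4 - 4 = 19), and gap_open=0 recovers the simple scheme's 12.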

35 Affine gap penalties
A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.
Three cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion

36 Affine gap penalties Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

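37 Affine gap penalties (A gap of length k is penalized x + k·y.)
$$D(i,j) = \max\{\, D(i-1,j) - y,\ S(i-1,j) - x - y \,\}$$
$$I(i,j) = \max\{\, I(i,j-1) - y,\ S(i,j-1) - x - y \,\}$$
$$S(i,j) = \max\{\, S(i-1,j-1) + w(a_i,b_j),\ D(i,j),\ I(i,j) \,\}$$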

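38 Affine gap penalties
[Figure: the three tables S, I and D drawn as layers; a diagonal step within S scores w(ai,bj), a step from S into D or I scores -x-y, and a step within D or I scores -y.]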
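The three recurrences can be filled in together, one cell at a time (this three-table scheme is often attributed to Gotoh). A minimal Python sketch for the global score only, with x = 4 and y = 3 as defaults to match the earlier example; the boundary values (D(i,0) = S(i,0) = -x - i·y, I(0,j) = S(0,j) = -x - j·y, and minus infinity where an alignment cannot end in that state) are our assumption of what the initializations look like:

```python
NEG_INF = float("-inf")


def w(x: str, y: str) -> int:
    """Pair score: match +8, mismatch -5 (gaps are handled by x and y)."""
    return 8 if x == y else -5


def affine_global(a: str, b: str, x: int = 4, y: int = 3) -> int:
    """Optimal global score when a gap of length k is penalized x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):
        D[i][0] = S[i][0] = -x - i * y               # a1..ai against one gap
    for j in range(1, n + 1):
        I[0][j] = S[0][j] = -x - j * y               # b1..bj against one gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]), D[i][j], I[i][j])
    return S[m][n]


print(affine_global("CTTAACT", "CGGATCAT"))  # 10
```

Setting x = 0 reduces to the simple scheme (affine_global("CTTAACT", "CGGATCAT", x=0) gives 14), and setting y = 0 charges each gap a constant x regardless of its length, as in the constant gap penalties.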

39 Constant gap penalties
Match: +8 (w(a, b) = 8, if a = b)
Mismatch: -5 (w(a, b) = -5, if a ≠ b)
Each gap symbol: 0 (w(-,b) = w(a,-) = 0)
Each gap is charged a constant penalty: -4.
C---TTAACT
CGGATCA--T
The columns sum to +27; with -4 for each of the two gaps, the alignment score is 27 - 4 - 4 = 19.

40 Constant gap penalties
Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion. Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion. Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

41 Constant gap penalties

42 Restricted affine gap penalties
A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c.
Five cases for alignment endings:
1. ...x over ...x: an aligned pair
2. ...x over ...-: a deletion
3. ...- over ...x: an insertion
4. and 5. the deletion and insertion cases for long gaps
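Under restricted affine penalties the cost of a gap stops growing once its length reaches c. A minimal Python sketch that scores a given alignment (it does not compute the optimum); the slides leave c unspecified, so the value c = 2 used below is an arbitrary illustration, and the function name is hypothetical:

```python
from itertools import groupby


def restricted_affine_score(top: str, bottom: str, c: int, match: int = 8,
                            mismatch: int = -5, x: int = 4, y: int = 3) -> int:
    """Score an alignment where a gap of length k costs x + f(k)*y, f(k) = min(k, c)."""
    def kind(col: tuple[str, str]) -> str:
        t, b = col
        if b == "-":
            return "deletion"
        if t == "-":
            return "insertion"
        return "pair"

    score = 0
    # groupby splits the columns into maximal runs of the same kind,
    # so each deletion or insertion run is exactly one gap.
    for k, run in groupby(zip(top, bottom), key=kind):
        cols = list(run)
        if k == "pair":
            score += sum(match if t == b else mismatch for t, b in cols)
        else:
            score -= x + min(len(cols), c) * y
    return score


print(restricted_affine_score("C---TTAACT", "CGGATCA--T", c=2))  # 7
```

With c at least as large as the longest gap, the result equals the ordinary affine score of 4.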

43 Restricted affine gap penalties

44 D(i, j) vs. D’(i, j)
Case 1: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length <= c. Then D(i, j) >= D’(i, j).
Case 2: the best alignment ending at (i, j) with a deletion at the end has the last deletion gap of length >= c. Then D(i, j) <= D’(i, j).

45 Max{S(i,j)-x-ky, S(i,j)-x-cy}

