
1 Global alignment algorithm CS 6890 Zheng Lu

2 Introduction Global alignments find the best match over the total length of both sequences. We do global alignment to determine if the two sequences are homologous. Homologous sequences are sometimes similar over some regions but different over other regions.

3 Problem description Human and mouse DNA sequences are similar over exon regions but different over intron regions. The overall similarity of two homologous sequences is much lower when the different regions are much longer than the similar regions. So we need a generalized global alignment that can handle sequences with intermittent similarities.

4 Paper Review "A generalized global alignment algorithm" by Xiaoqiu Huang and Kun-Mao Chao, Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames

5 Outline The alignment model and the modified dynamic programming recurrences used to compute the score. Construction of a space-efficient algorithm.

6 Alignment Model The two sequences are A = a1a2...am and B = b1b2...bn. A generalized global alignment consists of substitutions, gaps, and difference blocks. There are 3 types of difference block.

7 Alignment model

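8 Definition Let Ai = a1a2...ai and Bj = b1b2...bj denote prefixes of A and B.
S(i,j): the maximum score of general alignments of Ai and Bj.
H(i,j): the maximum score of general alignments of Ai and Bj that end with a difference block.
D(i,j): the maximum score of general alignments of Ai and Bj that end with a deletion gap.
I(i,j): the maximum score of general alignments of Ai and Bj that end with an insertion gap.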

9 Dynamic Programming Base conditions and recurrence for S, where σ(ai,bj) is the score of aligning ai with bj:
S(0,0) = 0
S(i,0) = max{D(i,0), H(i,0)} for i > 0
S(0,j) = max{I(0,j), H(0,j)} for j > 0
S(i,j) = max{S(i−1,j−1) + σ(ai,bj), D(i,j), I(i,j), H(i,j)} for i > 0 and j > 0

10 Dynamic Programming Recurrences for D and I, where q is the gap-open penalty and r is the gap-extension penalty:
D(0,j) = S(0,j) − q for j ≥ 0
D(i,0) = D(i−1,0) − r for i > 0
D(i,j) = max{D(i−1,j) − r, S(i−1,j) − q − r} for i > 0 and j > 0
I(i,0) = S(i,0) − q for i ≥ 0
I(0,j) = I(0,j−1) − r for j > 0
I(i,j) = max{I(i,j−1) − r, S(i,j−1) − q − r} for i > 0 and j > 0

11 Dynamic Programming Recurrence for H, where d is the penalty charged once for each difference block:
H(i,j) = −d for i = 0 or j = 0
H(i,j) = max{H(i,j−1), H(i−1,j), S(i,j−1) − d, S(i−1,j) − d} for i > 0 and j > 0
S(m,n) is the score of an optimal general alignment of A and B.
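The recurrences on slides 9 to 11 can be sketched directly in Python. This is a minimal illustration, not the authors' implementation: it stores full (m+1) × (n+1) matrices instead of working in linear space, and the names `general_alignment_score` and `sigma` are hypothetical. The boundary case for I(0,j) is assumed to be I(0,j−1) − r.

```python
def sigma(x, y):
    # Hypothetical substitution score used only for this example.
    return 1 if x == y else -1


def general_alignment_score(a, b, sigma, q, r, d):
    """Return S(m, n) for sequences a and b.

    q: gap-open penalty, r: gap-extension penalty,
    d: difference-block penalty (all assumed non-negative).
    """
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    I = [[0] * (n + 1) for _ in range(m + 1)]
    # H(i, j) = -d whenever i = 0 or j = 0; interior cells are overwritten.
    H = [[-d] * (n + 1) for _ in range(m + 1)]

    # Boundary conditions for row 0 and column 0.
    D[0][0] = I[0][0] = S[0][0] - q
    for j in range(1, n + 1):
        I[0][j] = I[0][j - 1] - r
        S[0][j] = max(I[0][j], H[0][j])
        D[0][j] = S[0][j] - q
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] - r
        S[i][0] = max(D[i][0], H[i][0])
        I[i][0] = S[i][0] - q

    # Interior cells, filled row by row.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - r, S[i - 1][j] - q - r)
            I[i][j] = max(I[i][j - 1] - r, S[i][j - 1] - q - r)
            H[i][j] = max(H[i][j - 1], H[i - 1][j],
                          S[i][j - 1] - d, S[i - 1][j] - d)
            S[i][j] = max(S[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),
                          D[i][j], I[i][j], H[i][j])
    return S[m][n]


if __name__ == "__main__":
    # Similar flanks around a dissimilar middle region.
    a = "ACGT" + "AAAAAA" + "TGCA"
    b = "ACGT" + "CCCCCCCC" + "TGCA"
    print(general_alignment_score(a, b, sigma, q=2, r=1, d=2))
```

In the example, one candidate alignment matches both four-letter flanks and covers the dissimilar middle with a single difference block, for a score of 4 − 2 + 4.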

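12 Parameter Problem The problem is how to choose the parameter d. Let T be an optimal alignment, t a similar region of T lying between difference blocks, and T' the alignment obtained by merging t into those blocks. Then score(T) = score(T') + score(t) − d. Because T is optimal, score(T) ≥ score(T'), and therefore score(t) ≥ d. So d should be set to the minimum score that a local alignment must reach to be kept as a similar region.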

13

14 Algorithm Analysis The advantage of this algorithm is space efficiency: it runs in linear space. To achieve linear space, we determine a middle pair of positions (imid, jmid), then recursively compute the alignment before and after that position.

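15 Middle pair position Let H'(imid,j) be the maximum score of alignments of the remaining parts of A and B (after positions imid and j) that start with a difference block.
hk = max{H(imid,j) + H'(imid,j) + d | 0 ≤ j ≤ n}
jh is the position j at which the maximum score hk is obtained, so hk = H(imid,jh) + H'(imid,jh) + d. The term +d is added back because the penalty of the block that spans the middle is charged in both halves. (imid, jh) is the middle pair position for group 1.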

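16 Middle pair position Let D'(imid,j) and S'(imid,j) be the corresponding scores for the remaining parts of A and B after positions imid and j: D' for alignments that start with a deletion gap, S' for all alignments.
df = max{D(imid,j) + D'(imid,j) + q | 0 ≤ j ≤ n}
jd: the position at which the maximum score df is obtained.
st = max{S(imid,j) + S'(imid,j) | 0 ≤ j ≤ n}
js: the position at which the maximum score st is obtained.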

17 Optimal general alignment The score of an optimal general alignment of A and B is max{df, hk, st}. jmid is whichever of jd, jh, and js corresponds to that maximum. After finding the middle pair position, we can compute the alignment recursively on the parts before and after it.
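The middle-pair computation on slides 15 to 17 can be checked with a small Python sketch. Several choices here are assumptions made for illustration: the scores for the second half are obtained by running the same recurrences on the reversed sequences, imid is taken as m // 2, full matrices are kept for readability (a linear-space version would keep only the rows meeting at imid), and the names `forward_matrices`, `middle_pair`, and `sigma` are hypothetical.

```python
def sigma(x, y):
    # Hypothetical substitution score used only for this example.
    return 1 if x == y else -1


def forward_matrices(a, b, sigma, q, r, d):
    """Fill S, D, I, H with the slide recurrences; return S, D, H."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    I = [[0] * (n + 1) for _ in range(m + 1)]
    H = [[-d] * (n + 1) for _ in range(m + 1)]
    D[0][0] = I[0][0] = -q
    for j in range(1, n + 1):
        I[0][j] = I[0][j - 1] - r
        S[0][j] = max(I[0][j], H[0][j])
        D[0][j] = S[0][j] - q
    for i in range(1, m + 1):
        D[i][0] = D[i - 1][0] - r
        S[i][0] = max(D[i][0], H[i][0])
        I[i][0] = S[i][0] - q
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - r, S[i - 1][j] - q - r)
            I[i][j] = max(I[i][j - 1] - r, S[i][j - 1] - q - r)
            H[i][j] = max(H[i][j - 1], H[i - 1][j],
                          S[i][j - 1] - d, S[i - 1][j] - d)
            S[i][j] = max(S[i - 1][j - 1] + sigma(a[i - 1], b[j - 1]),
                          D[i][j], I[i][j], H[i][j])
    return S, D, H


def middle_pair(a, b, sigma, q, r, d):
    """Return (imid, jmid, best, full), where best = max{df, hk, st}
    and full = S(m, n) from the plain forward pass."""
    m, n = len(a), len(b)
    imid = m // 2
    S, D, H = forward_matrices(a, b, sigma, q, r, d)
    # Scores for the second halves, computed on the reversed sequences.
    Sr, Dr, Hr = forward_matrices(a[::-1], b[::-1], sigma, q, r, d)
    row = m - imid
    candidates = []
    for j in range(n + 1):
        col = n - j
        candidates.append((S[imid][j] + Sr[row][col], j))      # st term
        candidates.append((D[imid][j] + Dr[row][col] + q, j))  # df term
        candidates.append((H[imid][j] + Hr[row][col] + d, j))  # hk term
    best, jmid = max(candidates)
    return imid, jmid, best, S[m][n]


if __name__ == "__main__":
    a = "ACGT" + "AAAAAA" + "TGCA"
    b = "ACGT" + "CCCCCCCC" + "TGCA"
    imid, jmid, best, full = middle_pair(a, b, sigma, q=2, r=1, d=2)
    print(imid, jmid, best, full)
```

The value `best` should agree with the score from the plain forward pass, which is a convenient sanity check before recursing on the two halves split at (imid, jmid).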

18 Result The algorithm was implemented in a program named GAP3. Test sequences: GenBank accession U47924 (222,930 bp) and GenBank accession AC002397 (227,538 bp). Parameters: d = 300, ms = −20, q = 60, r = 2. The alignment contains 154 similar regions with a total length of 43,445 and an average identity of 79%.

19 Result Test sequences: an H. influenzae protein (1409 residues) and an E. coli protein (1306 residues). Parameters: σ = BLOSUM62, q = 10, r = 2, and d = 40. The alignment contains 6 similar regions with a total length of 581 and an average identity of 30%.

20 Discussion Needleman-Wunsch global alignment is fast but needs highly similar regions. The proposed algorithm is much slower but can find regions of lower similarity. It is best used to compare short genomic regions with lower similarity. It can also be used as a general pairwise alignment program for HMM-based methods such as HMMER.

