# Eugene W. Myers and Webb Miller


Eugene W. Myers and Webb Miller

Outline: Introduction; Gotoh's algorithm; O(N)-space Gotoh's algorithm; Main algorithm; Implementation; Conclusion

Introduction: the concern is space, not time. Hirschberg's algorithm maximizes the similarity score of an alignment; Gotoh's algorithm minimizes the difference score of a conversion. This work gives a linear-space version for affine gap penalties. With one megabyte of memory, Myers and Miller can align sequences of length 62500, while Altschul and Erickson can align only sequences of length < 1070.

Transformation (1/2): Hirschberg's algorithm and Gotoh's algorithm; aligned pairs; affine gap penalties.

Transformation (2/2): Match = 8, Mismatch = -5, Gap symbol = -3, Gap-open = -4.

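Example (1/2): the same weights under each algorithm.

| | Hirschberg's algorithm (similarity) | Gotoh's algorithm (cost) |
|---|---|---|
| Match | 8 | 0 |
| Mismatch | -5 | 13 |
| Gap-open | -4 | 4 |
| Gap symbol | -3 | 7 |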

Example (2/2): sequence pairs scored by Hirschberg's algorithm (similarity) and by Gotoh's algorithm (minimum cost C):

1. A: ACGGTTCAAG, B: ACGGTTCAAG
2. A: ACGGTTCAAG, B: ACGGATCAAG
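As a minimal sketch of how the two scoring schemes relate, the following Python scores a gapped alignment both ways. The conversion rule is an assumption inferred from the Example (1/2) weights, not quoted from the slides: cost(mismatch) = match - mismatch, per-symbol gap cost h = match/2 - gap symbol, and gap-open cost g = -gap-open, which reproduces the 0 / 13 / 4 / 7 column. Under that rule, similarity = match · (m + n)/2 - cost for sequences of lengths m and n. The function name `score` is hypothetical.

```python
# Similarity weights from the Transformation (2/2) slide.
MATCH, MISMATCH, GAP_OPEN, GAP_SYMBOL = 8, -5, -4, -3

# Assumed conversion to Gotoh-style costs (reproduces 0, 13, 4, 7).
COST_MISMATCH = MATCH - MISMATCH   # 13
COST_G = -GAP_OPEN                 # gap-open cost g = 4
COST_H = MATCH / 2 - GAP_SYMBOL    # per-symbol gap cost h = 7.0


def score(top: str, bottom: str) -> tuple[float, float]:
    """Return (similarity, cost) of an alignment given as two equal-length rows.

    '-' marks a gap; a maximal run of '-' in the same row counts as one gap.
    """
    if len(top) != len(bottom):
        raise ValueError("rows must have equal length")
    sim, cost = 0.0, 0.0
    prev_gap_row = None  # row that held the gap in the previous column, if any
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gap_row = "top" if x == "-" else "bottom"
            if gap_row != prev_gap_row:  # a new gap opens here: pay the open penalty once
                sim += GAP_OPEN
                cost += COST_G
            sim += GAP_SYMBOL
            cost += COST_H
            prev_gap_row = gap_row
        else:
            prev_gap_row = None
            if x == y:
                sim += MATCH
            else:
                sim += MISMATCH
                cost += COST_MISMATCH
    return sim, cost


print(score("ACGGTTCAAG", "ACGGTTCAAG"))  # (80.0, 0.0)
print(score("ACGGTTCAAG", "ACGGATCAAG"))  # (67.0, 13.0)
```

For the two pairs above this gives similarity 80 with cost 0, and similarity 67 with cost 13; in both cases 8 · 20/2 - cost equals the similarity.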

R99922005 黃博平

Some notations: $A_i = a_1 a_2 \cdots a_i$ is the i-symbol prefix of A; $B_j = b_1 b_2 \cdots b_j$ is the j-symbol prefix of B; $C(i, j)$ is the minimum cost of a conversion of $A_i$ to $B_j$.

Simple gap (1/4): a gap of length k costs $\mathrm{gap}(k) = hk$.

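Simple gap (2/4): storing the full table of C(i, j) for A = AGTAC (rows) and B = AAG (columns) takes Space = $O(n^2)$.

| | | A | A | G |
|---|---|---|---|---|
| | 0.0 | 2.5 | 3.0 | 3.5 |
| A | 2.5 | 0.0 | 2.5 | 3.0 |
| G | 3.0 | 2.5 | 1.0 | 2.5 |
| T | 3.5 | 3.0 | 3.5 | 2.0 |
| A | 4.0 | 3.5 | 3.0 | 4.5 |
| C | 4.5 | 4.0 | 4.5 | 4.0 |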
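To make the quadratic space concrete, here is a minimal Python sketch of the full-table dynamic program for simple gaps. It uses the standard recurrence C(i, j) = min{C(i-1, j) + h, C(i, j-1) + h, C(i-1, j-1) + w(a_i, b_j)}; the names `simple_gap_table` and `unit`, the default h = 1, and the 0/1 replacement costs are illustrative assumptions.

```python
from typing import Callable


def unit(x: str, y: str) -> float:
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def simple_gap_table(a: str, b: str, h: float = 1.0,
                     w: Callable[[str, str], float] = unit) -> list[list[float]]:
    """Fill the full (m+1) x (n+1) table C for gap(k) = h*k; O(mn) space."""
    m, n = len(a), len(b)
    C = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = h * i            # delete a_1..a_i
    for j in range(1, n + 1):
        C[0][j] = h * j            # insert b_1..b_j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = min(C[i - 1][j] + h,                          # delete a_i
                          C[i][j - 1] + h,                          # insert b_j
                          C[i - 1][j - 1] + w(a[i - 1], b[j - 1]))  # replace
    return C


print(simple_gap_table("kitten", "sitting")[-1][-1])  # 3.0
```

With h = 1 and unit replacement costs the bottom-right entry is the edit distance; every one of the (m+1)(n+1) entries stays in memory.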

Simple gap (3/4): split A at row m/2.

Simple gap (4/4): combine forward scores (top half) with backward scores (bottom half); Space: O(m + n).
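The forward/backward split can be sketched in Python as follows. This is Hirschberg's divide-and-conquer scheme written in cost form, not the authors' code; all names, h = 1, and the 0/1 replacement costs are hypothetical choices. Forward scores for the top half and backward scores for the bottom half (computed on the reversed strings) each need one row of O(n) values; the split column minimizes their sum, and both halves are solved recursively.

```python
from typing import Callable


def unit(x: str, y: str) -> float:
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def last_row(a: str, b: str, h: float, w: Callable[[str, str], float]) -> list[float]:
    """Costs C(len(a), j) for j = 0..len(b), keeping only two rows."""
    prev = [h * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, 1):
        cur = [h * i] + [0.0] * len(b)
        for j, y in enumerate(b, 1):
            cur[j] = min(prev[j] + h, cur[j - 1] + h, prev[j - 1] + w(x, y))
        prev = cur
    return prev


def hirschberg(a: str, b: str, h: float = 1.0,
               w: Callable[[str, str], float] = unit) -> tuple[str, str]:
    """Optimal alignment for gap(k) = h*k using O(m + n) working space."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    n = len(b)
    if len(a) == 1:
        # Either replace a by its cheapest partner b[j], or delete a and insert all of b.
        j = min(range(n), key=lambda k: w(a, b[k]))
        if w(a, b[j]) + h * (n - 1) <= h * (n + 1):
            return "-" * j + a + "-" * (n - j - 1), b
        return a + "-" * n, "-" + b
    mid = len(a) // 2
    fwd = last_row(a[:mid], b, h, w)              # forward scores, top half
    bwd = last_row(a[mid:][::-1], b[::-1], h, w)  # backward scores, bottom half
    k = min(range(n + 1), key=lambda j: fwd[j] + bwd[n - j])
    top1, bot1 = hirschberg(a[:mid], b[:k], h, w)
    top2, bot2 = hirschberg(a[mid:], b[k:], h, w)
    return top1 + top2, bot1 + bot2


def alignment_cost(top: str, bottom: str, h: float = 1.0,
                   w: Callable[[str, str], float] = unit) -> float:
    """Cost of a gapped alignment: h per gap column, w(x, y) per aligned pair."""
    return sum(h if "-" in (x, y) else w(x, y) for x, y in zip(top, bottom))


top, bottom = hirschberg("kitten", "sitting")
print(top, bottom, alignment_cost(top, bottom))
```

The exact gap placement depends on tie-breaking, but the alignment cost matches the full-table result (3.0 for kitten/sitting).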

Affine gap (1/8): a gap of length k costs $g + kh$. A - - - T A A C T C G A A T C - - T

Affine gap (2/8): $C(i, j)$ is the minimum cost of a conversion of $A_i$ to $B_j$; $D(i, j)$ is the minimum cost of a conversion of $A_i$ to $B_j$ that deletes $a_i$; $I(i, j)$ is the minimum cost of a conversion of $A_i$ to $B_j$ that inserts $b_j$.

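Affine gap (3/8):

$$
C(i,j) = \begin{cases}
\min\{D(i,j),\ I(i,j),\ C(i-1,j-1) + w(a_i, b_j)\} & \text{if } i > 0 \text{ and } j > 0 \\
g + hj & \text{if } i = 0 \text{ and } j > 0 \\
g + hi & \text{if } i > 0 \text{ and } j = 0 \\
0 & \text{if } i = 0 \text{ and } j = 0
\end{cases}
$$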

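Affine gap (4/8):

$$
D(i,j) = \begin{cases}
\min\{D(i-1,j),\ C(i-1,j) + g\} + h & \text{if } i > 0 \text{ and } j > 0 \\
C(0,j) + g & \text{if } i = 0 \text{ and } j > 0
\end{cases}
$$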

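Affine gap (5/8):

$$
I(i,j) = \begin{cases}
\min\{I(i,j-1),\ C(i,j-1) + g\} + h & \text{if } i > 0 \text{ and } j > 0 \\
C(i,0) + g & \text{if } i > 0 \text{ and } j = 0
\end{cases}
$$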
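To check these recurrences against the worked example in Affine gap (7/8), here is a minimal quadratic-space Python sketch. The names `gotoh_tables` and `unit` are hypothetical, and the weights g = 2.0, h = 0.5 with replacement cost 0 for a match and 1 for a mismatch are taken from (or inferred from) that example. Undefined entries, marked * on the slides, are stored as infinity.

```python
from typing import Callable

INF = float("inf")  # stands for the undefined entries marked * on the slides


def unit(x: str, y: str) -> float:
    """Replacement cost inferred from the example: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def gotoh_tables(a: str, b: str, g: float, h: float,
                 w: Callable[[str, str], float] = unit):
    """Return the full C, D, I matrices (O(MN) space) for gap cost g + k*h."""
    m, n = len(a), len(b)
    C = [[INF] * (n + 1) for _ in range(m + 1)]
    D = [[INF] * (n + 1) for _ in range(m + 1)]
    I = [[INF] * (n + 1) for _ in range(m + 1)]
    C[0][0] = 0.0
    for j in range(1, n + 1):
        C[0][j] = g + h * j
        D[0][j] = C[0][j] + g
    for i in range(1, m + 1):
        C[i][0] = g + h * i
        I[i][0] = C[i][0] + g
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j], C[i - 1][j] + g) + h   # ends by deleting a_i
            I[i][j] = min(I[i][j - 1], C[i][j - 1] + g) + h   # ends by inserting b_j
            C[i][j] = min(D[i][j], I[i][j],
                          C[i - 1][j - 1] + w(a[i - 1], b[j - 1]))
    return C, D, I


C, D, I = gotoh_tables("AGTAC", "AAG", g=2.0, h=0.5)
print(C[5][3])  # 4.0
```

For A = AGTAC and B = AAG the optimal conversion cost C(5, 3) comes out as 4.0, and every row agrees with the C, D, and I matrices on the slides.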

Affine gap(6/8)

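Affine gap (7/8): the C, D, and I matrices for A = AGTAC (rows) and B = AAG (columns), with g = 2.0, h = 0.5, and replacement cost 0 for a match and 1 for a mismatch; * marks an undefined entry.

C:

| | | A | A | G |
|---|---|---|---|---|
| | 0.0 | 2.5 | 3.0 | 3.5 |
| A | 2.5 | 0.0 | 2.5 | 3.0 |
| G | 3.0 | 2.5 | 1.0 | 2.5 |
| T | 3.5 | 3.0 | 3.5 | 2.0 |
| A | 4.0 | 3.5 | 3.0 | 4.5 |
| C | 4.5 | 4.0 | 4.5 | 4.0 |

D:

| | | A | A | G |
|---|---|---|---|---|
| | * | 4.5 | 5.0 | 5.5 |
| A | * | 5.0 | 5.5 | 6.0 |
| G | * | 2.5 | 5.0 | 5.5 |
| T | * | 3.0 | 3.5 | 5.0 |
| A | * | 3.5 | 4.0 | 4.5 |
| C | * | 4.0 | 4.5 | 5.0 |

I:

| | | A | A | G |
|---|---|---|---|---|
| | * | * | * | * |
| A | 4.5 | 5.0 | 2.5 | 3.0 |
| G | 5.0 | 5.5 | 5.0 | 3.5 |
| T | 5.5 | 6.0 | 5.5 | 6.0 |
| A | 6.0 | 6.5 | 6.0 | 5.5 |
| C | 6.5 | 7.0 | 6.5 | 7.0 |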

Affine gap (8/8): the same C, D, and I matrices as in (7/8).

R99922041 陳彥璋

Observation: the i-th row of C and D depends only on rows i and i-1; the i-th row of I depends only on row i.

Linear Space: use two one-dimensional arrays (CC and DD) and three scalar variables.
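A minimal Python sketch of the linear-space cost computation follows. The name `linear_space_cost` is hypothetical, and the update order is derived from the C, D, I recurrences rather than transcribed from the authors' pseudocode. CC and DD each hold one row; the scalars s, c, and e carry the diagonal value C(i-1, j-1), the current C value, and the current I value along the row, and t tracks g + ih (2.0 initially and 4.5 at i = 5 in the walkthrough with g = 2.0, h = 0.5).

```python
from typing import Callable


def unit(x: str, y: str) -> float:
    """Replacement cost inferred from the example: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def linear_space_cost(a: str, b: str, g: float, h: float,
                      w: Callable[[str, str], float] = unit) -> float:
    """Optimal conversion cost with affine gaps using two arrays CC, DD (O(N) space)."""
    n = len(b)
    CC = [0.0] * (n + 1)
    DD = [0.0] * (n + 1)
    t = g
    for j in range(1, n + 1):        # row 0
        t += h
        CC[j] = t                    # C(0, j) = g + j*h
        DD[j] = t + g                # D(0, j) = C(0, j) + g
    t = g
    for i in range(1, len(a) + 1):
        s = CC[0]                    # C(i-1, 0): diagonal predecessor of column 1
        t += h
        c = t                        # C(i, 0) = g + i*h
        CC[0] = c
        e = t + g                    # I(i, 0) = C(i, 0) + g
        for j in range(1, n + 1):
            e = min(e, c + g) + h                         # I(i, j)
            DD[j] = min(DD[j], CC[j] + g) + h             # D(i, j); CC[j] is still C(i-1, j)
            c = min(DD[j], e, s + w(a[i - 1], b[j - 1]))  # C(i, j)
            s = CC[j]                                     # C(i-1, j) for the next diagonal
            CC[j] = c
    return CC[n]


print(linear_space_cost("AGTAC", "AAG", g=2.0, h=0.5))  # 4.0
```

On the worked example it returns 4.0, the same optimal cost as the full C matrix, while storing only O(N) values.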

Linear Space

Algorithm

Algorithm walkthrough on A = AGTAC and B = AAG with g = 2.0 and h = 0.5 (the same C, D, and I matrices): the arrays CC and DD start from row 0, with t = 2.0.


Walkthrough frames for row i = 5 (t = 4.5) at column j = 1: the scalars s, c, and e are updated in turn while CC and DD hold the current row (g = 2.0, h = 0.5).

Final frame: CC holds the optimal conversion cost, 4.0 for A = AGTAC and B = AAG.

What is the conversion of AGTAC into AAG?

B95902077 王柏易

Midpoint: Hirschberg (1975) uses recursive divide-and-conquer, combining backward computing and forward computing.

Gap Penalty: cell (i, j) is computed from cells (i-1, j-1), (i, j-1), and (i-1, j).

Gap Penalty: $CC(j)$ = minimum cost of a conversion of $A_{i^*}$ to $B_j$; $DD(j)$ = minimum cost of a conversion of $A_{i^*}$ to $B_j$ that ends with a delete.

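Gap Penalty: $RR(N - j)$ = minimum cost of a conversion of $A^T_{i^*}$ to $B^T_j$, where $A^T_{i^*} = a_M a_{M-1} \cdots a_{i^*+1}$ and $B^T_j = b_N b_{N-1} \cdots b_{j+1}$ are the reversed suffixes; $SS(N - j)$ = minimum cost of a conversion of $A^T_{i^*}$ to $B^T_j$ that begins with a delete.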

Find Midpoint with Gap Penalty: combine backward computing and forward computing. How do we compute the midpoint?

R99922035 李政緯

Midpoint: the difficulty is that when we concatenate two sub-conversions into one, two gaps may coalesce into one, so the gap-open penalty g is paid once instead of twice. We must therefore consider $\min\{CC + RR,\ DD + SS - g,\ II + JJ - g\}$, where II and JJ are the insertion counterparts of DD and SS.

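Midpoint: the algorithm above does not keep II and JJ, and it does not need them. An insertion gap lies entirely within one row, so we can take j to be the last column at which the optimal conversion touches row $i^*$; the second half then never begins with an insertion. The minimum reduces to $\min\{CC + RR,\ DD + SS - g\}$.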

Midpoint: remember that we look for $\min_{j \in [0, N]} \min\{CC(j) + RR(N-j),\ DD(j) + SS(N-j) - g,\ II(j) + JJ(N-j) - g\}$.

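Midpoint: type 1 and type 2 recurrences. Type 1: the optimal conversion passes through $(i^*, j^*)$, so we recurse on $a_1 \cdots a_{i^*}$ vs. $b_1 \cdots b_{j^*}$ and on $a_{i^*+1} \cdots a_M$ vs. $b_{j^*+1} \cdots b_N$. Type 2: a single deletion gap covers $a_{i^*}$ and $a_{i^*+1}$, so we recurse on $a_1 \cdots a_{i^*-1}$ vs. $b_1 \cdots b_{j^*}$ and on $a_{i^*+2} \cdots a_M$ vs. $b_{j^*+1} \cdots b_N$, with the two deleted symbols placed in between.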

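Example: A = agtac, B = aag, i* = 2. The optimal conversion aligns agtac with a--ag: the deletion gap covering g and t crosses the midpoint (type 2, j* = 1). Recursive calls on (a, a) and (ac, ag).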
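The midpoint search for this example can be sketched in Python as follows. The names `forward` and `find_midpoint` are hypothetical, only the top-level case (leading and trailing gaps opened at cost g) is handled, and the weights g = 2.0, h = 0.5, 0/1 replacement costs are the ones from the worked example. The backward pass runs the same forward recurrences on the reversed suffix of A and the reversed B, so RR[N - j] and SS[N - j] line up with CC[j] and DD[j].

```python
from typing import Callable

INF = float("inf")


def unit(x: str, y: str) -> float:
    """Replacement cost inferred from the example: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def forward(a: str, b: str, g: float, h: float,
            w: Callable[[str, str], float]) -> tuple[list[float], list[float]]:
    """Return CC[j] = C(len(a), j) and DD[j] = D(len(a), j) in O(len(b)) space."""
    n = len(b)
    CC = [0.0] * (n + 1)
    DD = [0.0] * (n + 1)
    t = g
    for j in range(1, n + 1):
        t += h
        CC[j] = t
        DD[j] = t + g
    t = g
    for i in range(1, len(a) + 1):
        s, t = CC[0], t + h
        c = CC[0] = t
        e = t + g
        for j in range(1, n + 1):
            e = min(e, c + g) + h
            DD[j] = min(DD[j], CC[j] + g) + h
            c = min(DD[j], e, s + w(a[i - 1], b[j - 1]))
            s, CC[j] = CC[j], c
    # Column 0: converting A_i to the empty string is all deletions, so D equals C there.
    DD[0] = CC[0] if a else INF
    return CC, DD


def find_midpoint(a: str, b: str, g: float, h: float,
                  w: Callable[[str, str], float] = unit):
    """Return (i*, j*, type, cost) minimizing CC + RR (type 1) or DD + SS - g (type 2)."""
    n, mid = len(b), len(a) // 2
    CC, DD = forward(a[:mid], b, g, h, w)              # forward computing, top half
    RR, SS = forward(a[mid:][::-1], b[::-1], g, h, w)  # backward computing, bottom half
    best = None
    for j in range(n + 1):
        for kind, value in ((1, CC[j] + RR[n - j]), (2, DD[j] + SS[n - j] - g)):
            if best is None or value < best[3]:
                best = (mid, j, kind, value)
    return best


print(find_midpoint("AGTAC", "AAG", g=2.0, h=0.5))  # (2, 1, 2, 4.0)
```

For A = AGTAC and B = AAG (uppercase in the sketch) it reports i* = 2, j* = 1, a type 2 midpoint, and total cost 4.0, consistent with the alignment above.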

R99922062 涂宗瑋

Implementation: storage requirement; memory vs. sequence length; comparison with the classic dynamic programming algorithm.

Storage Requirement (1/4): the vectors CC, DD, RR, and SS take 4N words of space; an optimal conversion takes M + N words (with M = N).

Storage Requirement (2/4): a replacement-cost table w over all 128 ASCII characters is a 128 × 128 matrix with entries W(i, j) for the pair (ASCII[i], ASCII[j]), i, j = 1, ..., 128, and takes 16384 words.

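Storage Requirement (3/4): for DNA, the replacement-cost table w is 4 × 4 and takes only 16 words.

| w | A | T | C | G |
|---|---|---|---|---|
| A | W(A,A) | W(A,T) | W(A,C) | W(A,G) |
| T | W(T,A) | W(T,T) | W(T,C) | W(T,G) |
| C | W(C,A) | W(C,T) | W(C,C) | W(C,G) |
| G | W(G,A) | W(G,T) | W(G,C) | W(G,G) |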

Storage Requirement (4/4): the sequences A and B take M + N bytes. If A and B are DNA sequences, they can be compressed so that only 2(M + N) bits are necessary.
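A minimal Python sketch of the 2-bit compression and the 4 × 4 table lookup follows. The code assignment A = 0, C = 1, G = 2, T = 3, the helper names, and the 0/1 weights are assumptions for illustration. Four bases fit in one byte, so a sequence of length n takes ceil(2n / 8) bytes, and the replacement cost is looked up directly from the packed codes.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base
BASES = "ACGT"                           # inverse of CODE

# 4 x 4 replacement-cost table indexed by codes (0/1 weights assumed for illustration).
W = [[0 if x == y else 1 for y in range(4)] for x in range(4)]


def pack(seq: str) -> bytes:
    """Pack a DNA string into 2 bits per base, four bases per byte."""
    out = bytearray((2 * len(seq) + 7) // 8)
    for k, base in enumerate(seq):
        out[k // 4] |= CODE[base] << (2 * (k % 4))
    return bytes(out)


def code_at(data: bytes, k: int) -> int:
    """Return the 2-bit code of base k without unpacking the whole sequence."""
    return (data[k // 4] >> (2 * (k % 4))) & 3


def unpack(data: bytes, length: int) -> str:
    """Recover the original string; the length must be stored separately."""
    return "".join(BASES[code_at(data, k)] for k in range(length))


a, b = pack("ACGGTTCAAG"), pack("ACGGATCAAG")
print(len(a), unpack(a, 10), W[code_at(a, 4)][code_at(b, 4)])  # 3 ACGGTTCAAG 1
```

Ten bases occupy 3 bytes instead of 10, and position 4 (T against A) is looked up as a mismatch.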

Memory vs. Sequence length: the maximum length of sequences that can be aligned in a given amount of memory, compared with Altschul and Erickson's 7MN-bit approach.

| Memory (bytes) | Linear space (cost only) | Linear space (with optimal conversion) | Altschul and Erickson |
|---|---|---|---|
| 64K | 4000 | 2666 | 270 |
| 128K | 8000 | 5333 | 382 |
| 256K | 16000 | 10666 | 540 |
| 1000K | 62500 | 41666 | 1069 |
| Formula | N = Memory / (4 × 4) | N = Memory / (6 × 4) | N = sqrt(Memory × 8 / 7) |
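The table's entries can be recomputed from its formulas with a small Python helper (the name `max_lengths` is hypothetical). Taking 1K = 1000 bytes and 4-byte words is an assumption, but it is what reproduces the printed numbers, with each result rounded down.

```python
import math


def max_lengths(memory_bytes: int) -> tuple[int, int, int]:
    """Longest N (with M = N) that fits in memory for each approach in the table."""
    word_bytes = 4                                      # assumed 4-byte words
    linear = memory_bytes // (4 * word_bytes)          # CC, DD, RR, SS: 4N words
    linear_op = memory_bytes // (6 * word_bytes)       # plus M + N = 2N words for the conversion
    altschul = math.isqrt(memory_bytes * 8 // 7)       # 7MN = 7N^2 bits
    return linear, linear_op, altschul


for kb in (64, 128, 256, 1000):
    print(f"{kb}K", max_lengths(kb * 1000))
```

The linear-space columns grow linearly with memory, while the 7MN-bit approach grows only with its square root, which is why the gap widens from 4000 vs. 270 at 64K to 62500 vs. 1069 at 1000K.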

Compared with the classic dynamic programming algorithm (Wagner and Fischer, 1974).

Compared with the classic dynamic programming algorithm: in space, the classic algorithm needs O(MN) while the linear-space algorithm needs O(N + lg M). In time, both are O(MN), but in practice the linear-space algorithm is slower, with a running-time ratio of 2.84 : 1 (linear space : classic DP).

R99945020 林澤豪

Reduce problem: full score table with columns C G G A T C A T and rows C T T A A C T; the first row is 0, -3, -6, -9, -12, -15, -18, -21, -24.

Reduce problem (cont.)

Reduce problem (cont.): partition line at row m/2.
