Space-Saving Strategies for Computing Δ-points


Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan
http://www.csie.ntu.edu.tw/~kmchao

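Δ-points
S-(i, j): the best score of a path from (0, 0) to (i, j).
S+(i, j): the best score of a path from (i, j) to (M, N).
Δ-points: the grid points (i, j) with S-(i, j) + S+(i, j) >= Δ. Since S-(i, j) + S+(i, j) is the best score of any path through (i, j), these are exactly the points that lie on some path of score at least Δ.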
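The definitions can be checked on small inputs by computing both matrices in full (presumably what Method 1 below does). This is a minimal Python sketch, assuming a linear gap penalty and the example scores used in these slides (match 8, mismatch -5, gap -3); the names `s_minus`, `s_plus`, and `delta_points` are hypothetical. String `a` labels the columns and `b` the rows, so M = len(b) and N = len(a).

```python
# Minimal sketch: full S- / S+ matrices and Δ-points (O(MN) time and space).
# Assumed scoring (the slides' example): match 8, mismatch -5, gap -3.
MATCH, MISMATCH, GAP = 8, -5, -3


def score(x, y):
    return MATCH if x == y else MISMATCH


def s_minus(a, b):
    """S-[i][j]: best score of a path from (0, 0) to (i, j); rows follow b."""
    m, n = len(b), len(a)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                cands.append(S[i - 1][j - 1] + score(b[i - 1], a[j - 1]))
            if i > 0:
                cands.append(S[i - 1][j] + GAP)
            if j > 0:
                cands.append(S[i][j - 1] + GAP)
            S[i][j] = max(cands)
    return S


def s_plus(a, b):
    """S+[i][j]: best score of a path from (i, j) to (M, N).

    A path from (i, j) to (M, N) is a path from the origin in the grid of the
    reversed strings, so S+ is S- of the reversals, read back to front.
    """
    m, n = len(b), len(a)
    R = s_minus(a[::-1], b[::-1])
    return [[R[m - i][n - j] for j in range(n + 1)] for i in range(m + 1)]


def delta_points(a, b, delta):
    """All grid points (i, j) with S-(i, j) + S+(i, j) >= delta, row by row."""
    Sm, Sp = s_minus(a, b), s_plus(a, b)
    return [(i, j) for i in range(len(b) + 1) for j in range(len(a) + 1)
            if Sm[i][j] + Sp[i][j] >= delta]


a, b = "CGGATCAT", "CTTAACT"   # columns, rows (the slides' example)
print(s_minus(a, b)[-1][-1])   # optimal score: 14
print(delta_points(a, b, 14))  # the grid points on the optimal path
```

The reversal trick keeps the sketch short; a production version would fill S+ directly from the bottom-right corner instead of building a second reversed matrix.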

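Example: the S- matrix
Match: 8, mismatch: -5, gap symbol: -3. Rows are labeled by CTTAACT and columns by CGGATCAT; S- is filled row by row from (0, 0), and the bottom-right entry is the optimal score, 14.
            C    G    G    A    T    C    A    T
        0   -3   -6   -9  -12  -15  -18  -21  -24
   C   -3    8    5    2   -1   -4   -7  -10  -13
   T   -6    5    3    0   -3    7    4    1   -2
   T   -9    2    0   -2   -5    5    2   -1    9
   A  -12   -1   -3   -5    6    3    0   10    7
   A  -15   -4   -6   -8    3    1   -2    8    5
   C  -18   -7   -9  -11    0   -2    9    6    3
   T  -21  -10  -12  -14   -3    8    6    4   14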

Tracing back from the bottom-right corner gives an optimal alignment:
    C T T A A C - T
    C G G A T C A T
    8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
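The column-by-column sum above can be reproduced with a few lines of Python. This sketch assumes '-' as the gap character (the slide's dash) and the same example scores; `alignment_score` is a hypothetical helper name.

```python
# Score a gapped pairwise alignment column by column.
# Assumed scoring (the slides' example): match 8, mismatch -5, gap -3 per symbol.
MATCH, MISMATCH, GAP = 8, -5, -3


def alignment_score(top, bottom):
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += GAP        # a gap symbol in either row
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(alignment_score("CTTAAC-T", "CGGATCAT"))  # 14
```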


S+ matrix
S+ is filled in the reverse direction, starting from S+(M, N) = 0 at the bottom-right corner; the last row and the last column hold multiples of the gap penalty (-3, -6, ..., -24). The top-left entry S+(0, 0) is again the optimal score, 14.
            C    G    G    A    T    C    A    T
       14    3    6    8   10   12    1  -10  -21
   C    3    6    8   11   13    2    4   -7  -18
   T    5    8   11   13   16    5    7   -4  -15
   T    7   10   13   16    5    8   10   -1  -12
   A    9   12   15   18    8   10   13    2   -9
   A   -2    1    4    7   10   13    3    5   -6
   C  -13  -10   -7   -4   -1    2    5    8   -3
   T  -24  -21  -18  -15  -12   -9   -6   -3    0

S- and S+ matrices (match: 8, mismatch: -5, gap symbol: -3): every grid point (i, j) now carries both S-(i, j) and S+(i, j).

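S- + S+ matrix (match: 8, mismatch: -5, gap symbol: -3)
Each entry S-(i, j) + S+(i, j) is the best score of a path through (i, j).
            C    G    G    A    T    C    A    T
       14    0    0   -1   -2   -3  -17  -31  -45
   C    0   14   13   13   12   -2   -3  -17  -31
   T   -1   13   14   13   13   12   11   -3  -17
   T   -2   12   13   14    0   13   12   -2   -3
   A   -3   11   12   13   14   13   13   12   -2
   A  -17   -3   -2   -1   13   14    1   13   -1
   C  -31  -17  -16  -15   -1    0   14   14    0
   T  -45  -31  -30  -29  -15   -1    0    1   14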

Δ = 14: with Δ equal to the optimal score, the Δ-points are exactly the grid points on an optimal path; in this example they are (0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), and (7, 8).

Δ = 13: lowering Δ to 13 adds every grid point whose S- + S+ entry is 13, giving 21 Δ-points in total.

The leftmost/rightmost Δ-paths
A Δ-path is a path whose score is at least Δ. For simple scoring schemes, finding the leftmost Δ-path and the rightmost Δ-path is easy. For affine gap penalties, it is more complicated.
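For a linear gap penalty, one way to realize the "easy" case is a greedy walk from (0, 0) that takes a step only if the path can still be completed with total score at least Δ, which S+ answers in O(1). The Python below is a sketch under that assumption, not necessarily the construction the slides intend. Here "leftmost" is taken to mean preferring vertical, then diagonal, then horizontal steps (rows follow `b`), and `extreme_delta_path` is a hypothetical name. It stores all of S+, so it uses O(MN) space.

```python
# Greedy leftmost / rightmost Δ-path, assuming a linear gap penalty.
# Assumed scoring (the slides' example): match 8, mismatch -5, gap -3.
MATCH, MISMATCH, GAP = 8, -5, -3


def score(x, y):
    return MATCH if x == y else MISMATCH


def s_plus(a, b):
    """S+[i][j]: best score of a path from (i, j) to (M, N); rows follow b."""
    m, n = len(b), len(a)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m and j == n:
                continue
            cands = []
            if i < m and j < n:
                cands.append(S[i + 1][j + 1] + score(b[i], a[j]))
            if i < m:
                cands.append(S[i + 1][j] + GAP)
            if j < n:
                cands.append(S[i][j + 1] + GAP)
            S[i][j] = max(cands)
    return S


def extreme_delta_path(a, b, delta, leftmost=True):
    """Walk from (0, 0) to (M, N), taking the first allowed step that still
    leaves a completion of total score >= delta; None if no Δ-path exists."""
    m, n = len(b), len(a)
    Sp = s_plus(a, b)
    if Sp[0][0] < delta:
        return None
    # (di, dj) steps: vertical, diagonal, horizontal (reversed for rightmost).
    order = [(1, 0), (1, 1), (0, 1)]
    if not leftmost:
        order.reverse()
    i = j = prefix = 0
    path = [(0, 0)]
    while (i, j) != (m, n):
        for di, dj in order:
            ni, nj = i + di, j + dj
            if ni > m or nj > n:
                continue
            w = score(b[i], a[j]) if di and dj else GAP
            # Invariant: prefix + Sp[i][j] >= delta, so some step always qualifies.
            if prefix + w + Sp[ni][nj] >= delta:
                i, j, prefix = ni, nj, prefix + w
                path.append((i, j))
                break
    return path


a, b = "CGGATCAT", "CTTAACT"
print(extreme_delta_path(a, b, 13, leftmost=True))
print(extreme_delta_path(a, b, 13, leftmost=False))
```

The greedy never backtracks: the invariant guarantees that the best continuation from the current point is always available as one of the three steps.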

Note: two alignments might not intersect!

Method 1: O(MN) time; O(MN) space

Method 2: O(M^2 N) time; O(N) space
For each row i, recompute S+ from row M up to row i, so each row takes O(MN) time. In total, O(M) × O(MN) = O(M^2 N).
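Method 2 can be sketched as follows: sweep S- downward one row at a time and, for each row i, rebuild S+ row i from scratch starting at row M, so only O(N) values are alive at any moment. This Python sketch assumes a linear gap penalty and the slides' example scores; all function names are hypothetical.

```python
# Method 2 sketch: O(M^2 N) time, O(N) space. Rows follow b, columns follow a.
# Assumed scoring (the slides' example): match 8, mismatch -5, gap -3.
MATCH, MISMATCH, GAP = 8, -5, -3


def score(x, y):
    return MATCH if x == y else MISMATCH


def next_sminus_row(prev, bc, a):
    """S- row i from S- row i-1; bc = b[i-1] is the i-th row character."""
    row = [prev[0] + GAP]
    for j in range(1, len(a) + 1):
        row.append(max(prev[j - 1] + score(bc, a[j - 1]),
                       prev[j] + GAP,
                       row[j - 1] + GAP))
    return row


def prev_splus_row(nxt, bc, a):
    """S+ row i from S+ row i+1; bc = b[i] is the (i+1)-th row character."""
    n = len(a)
    row = [0] * (n + 1)
    row[n] = nxt[n] + GAP
    for j in range(n - 1, -1, -1):
        row[j] = max(nxt[j + 1] + score(bc, a[j]),
                     nxt[j] + GAP,
                     row[j + 1] + GAP)
    return row


def splus_row(a, b, i):
    """Rebuild S+ row i by sweeping up from row M: O((M - i) N) time."""
    row = [GAP * (len(a) - j) for j in range(len(a) + 1)]  # S+ row M
    for k in range(len(b) - 1, i - 1, -1):
        row = prev_splus_row(row, b[k], a)
    return row


def delta_points_method2(a, b, delta):
    sm = [GAP * j for j in range(len(a) + 1)]  # S- row 0
    points = []
    for i in range(len(b) + 1):
        if i > 0:
            sm = next_sminus_row(sm, b[i - 1], a)
        sp = splus_row(a, b, i)  # recomputed from scratch for every row
        points += [(i, j) for j in range(len(a) + 1) if sm[j] + sp[j] >= delta]
    return points


print(delta_points_method2("CGGATCAT", "CTTAACT", 14))
```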

Method 3: O(MN) time; O(N) space

Method 4: O(MN log M) time; O(N log M) space

Method 4: O(MN log M) time; O(N log M) space (cont’d)
[Figure: O(log M) layers over the M rows, each storing O(N) values.]
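One way to obtain O(log M) layers of O(N) values each is recursive checkpointing. To produce S+ rows top-down, sweep up from the bottom row of an interval to its middle row, handle the upper half recursively from that middle row, then handle the lower half from the bottom row. Each recursion level touches O(M) rows in total (O(MN) time), and each level keeps only a couple of rows alive. This Python sketch is our reading of the layer figure, assuming a linear gap penalty and the slides' example scores; the function names are hypothetical.

```python
# Method 4 sketch: O(MN log M) time, O(N log M) space. Rows follow b.
# Assumed scoring (the slides' example): match 8, mismatch -5, gap -3.
MATCH, MISMATCH, GAP = 8, -5, -3


def score(x, y):
    return MATCH if x == y else MISMATCH


def next_sminus_row(prev, bc, a):
    """S- row i from S- row i-1; bc = b[i-1]."""
    row = [prev[0] + GAP]
    for j in range(1, len(a) + 1):
        row.append(max(prev[j - 1] + score(bc, a[j - 1]),
                       prev[j] + GAP, row[j - 1] + GAP))
    return row


def prev_splus_row(nxt, bc, a):
    """S+ row i from S+ row i+1; bc = b[i]."""
    n = len(a)
    row = [0] * (n + 1)
    row[n] = nxt[n] + GAP
    for j in range(n - 1, -1, -1):
        row[j] = max(nxt[j + 1] + score(bc, a[j]),
                     nxt[j] + GAP, row[j + 1] + GAP)
    return row


def splus_rows_in_order(a, b, lo, hi, row_hi):
    """Yield S+ rows lo, lo+1, ..., hi (top-down), given S+ row hi."""
    if lo == hi:
        yield row_hi
        return
    mid = (lo + hi) // 2
    row = row_hi
    for k in range(hi - 1, mid - 1, -1):  # sweep up from row hi to row mid
        row = prev_splus_row(row, b[k], a)
    yield from splus_rows_in_order(a, b, lo, mid, row)         # upper half
    yield from splus_rows_in_order(a, b, mid + 1, hi, row_hi)  # lower half


def delta_points_method4(a, b, delta):
    m, n = len(b), len(a)
    bottom = [GAP * (n - j) for j in range(n + 1)]  # S+ row M
    sm = [GAP * j for j in range(n + 1)]            # S- row 0
    points = []
    for i, sp in enumerate(splus_rows_in_order(a, b, 0, m, bottom)):
        if i > 0:
            sm = next_sminus_row(sm, b[i - 1], a)
        points += [(i, j) for j in range(n + 1) if sm[j] + sp[j] >= delta]
    return points


print(delta_points_method4("CGGATCAT", "CTTAACT", 14))
```

The generator chain has depth O(log M), and each live frame holds at most two rows, which is where the O(N log M) space bound comes from.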

The computation of S-(i, j) and S+(i, j) inside a block

Method 5: O(MN log min {M, N}) time; O(M+N) space


Method 5: O(MN log min {M, N}) time; O(M+N) space (cont’d)
[Figure: O(log min {M, N}) layers over the M rows, labeled 4(M+N), 2(M+N), M+N, (M+N)/2, (M+N)/4, ...]

Method 6: O(MN log log min {M, N}) time; O(M+N) space
[Figure: real sizes of the sub-blocks, labeled with fractions of M and N such as 1/2^2, 1/2^3, 1/2^5, 1/2^9, 1/2^10, and 1/2^19.]

Method 6 (cont’d)
The width at layer i is M/2^(2^i + i - 1).
    Layer   Width                   Partition lines   Number of cuts
    0       M                       4N                2^2
    1       M/2^2                   2N                2/(1/2^2) = 2^3
    2       M/2^2/2^3 = M/2^5       N                 1/(1/2^5) = 2^5
    3       M/2^5/2^5 = M/2^10      N/2               (1/2)/(1/2^10) = 2^9
    4       M/2^10/2^9 = M/2^19     N/2^2             (1/2^2)/(1/2^19) = 2^17
    5       M/2^19/2^17 = M/2^36    N/2^3             (1/2^3)/(1/2^36) = 2^33
    6       M/2^36/2^33 = M/2^69    N/2^4             (1/2^4)/(1/2^69) = 2^65
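The table's arithmetic follows a simple recurrence. The partition-line budget halves at each layer (4N, 2N, N, ...), the number of cuts is (lines/N)/(width/M), and the next width is width/cuts. The Python sketch below tracks base-2 exponents only (width = M/2^w, cuts = 2^c) and checks them against the closed form M/2^(2^i + i - 1). Because that exponent grows like 2^i, roughly log log M layers bring the width down to one row. The helper names are hypothetical.

```python
# Reproduce the exponents in the Method 6 table.
# Width at layer i is M / 2**w, partition lines are N * 2**(2 - i),
# and the number of cuts is (lines / N) / (width / M) = 2**c.


def layer_exponents(layers):
    rows, w = [], 0           # layer 0 has width M = M / 2**0
    for i in range(layers):
        c = (2 - i) + w       # log2(2**(2 - i) / 2**(-w))
        rows.append((i, w, c))
        w += c                # next width = width / cuts
    return rows


def layers_needed(log2_m):
    """Smallest i with 2**i + i - 1 >= log2(M), i.e. the width reaches one row."""
    i = 0
    while 2 ** i + i - 1 < log2_m:
        i += 1
    return i


for i, w, c in layer_exponents(7):
    assert w == 2 ** i + i - 1 and c == 2 ** i + 1  # closed forms
    print(f"layer {i}: width M/2^{w}, cuts 2^{c}")

print(layers_needed(20), layers_needed(64))  # 5 6
```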

Method 7: O((1/ε)MN) time; O((1/ε)M^ε N) space
Here we use ε = 1/2 to illustrate the idea: the M rows are cut into strips of M^(1/2) rows, and each M^(1/2) × N subproblem is solved in turn using S- and S+.

Method 8: O((1/ε)MN) time; O((1/ε)M^(1+ε) + N) space
Here we use ε = 1/2 to illustrate the idea: solve each M^(1/2) × M subproblem using S- and S+.

Methods
Method 1: O(MN) time; O(MN) space
Method 2: O(M^2 N) time; O(N) space
Method 3: O(MN) time; O(N) space
Method 4: O(MN log M) time; O(N log M) space
Method 5: O(MN log min {M, N}) time; O(M+N) space
Method 6: O(MN log log min {M, N}) time; O(M+N) space
Method 7: O((1/ε)MN) time; O((1/ε)M^ε N) space
Method 8: O((1/ε)MN) time; O((1/ε)M^(1+ε) + N) space

Bonus points
O(MN) time; O(M+N) space
o(MN log log min {M, N}) time; O(M+N) space
O((1/ε)MN) time; o((1/ε)M^(1+ε) + N) space