Space-Saving Strategies for Computing Δ-points

Slides:



Advertisements
Similar presentations
Eugene W.Myers and Webb Miller. Outline Introduction Gotoh's algorithm O(N) space Gotoh's algorithm Main algorithm Implementation Conclusion.
Advertisements

Sequence Alignment Tutorial #2
Global Alignment: Dynamic Progamming Table s 1 : acagagtaac s 2 : acaagtgatc -acaagtgatc - a c a g a g t a a c j s2s2 i s1s1 Scores: match=1, mismatch=-1,
Sequence Alignment Tutorial #2
Space Efficient Alignment Algorithms and Affine Gap Penalties
Space Efficient Alignment Algorithms Dr. Nancy Warter-Perez June 24, 2005.
Sequence Alignment Cont’d. Sequence Alignment -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC Definition Given two strings.
Sequence Alignment Cont’d. Evolution Scoring Function Sequence edits: AGGCCTC  Mutations AGGACTC  Insertions AGGGCCTC  Deletions AGG.CTC Scoring Function:
Sequence Alignment II CIS 667 Spring Optimal Alignments So we know how to compute the similarity between two sequences  How do we construct an.
Sequence Alignment Oct 9, 2002 Joon Lee Genomics & Computational Biology.
Tree edit distance1 Tree Edit Distance.  Minimum edits to transform one tree into another Tree edit distance2 TED.
Alignment II Dynamic Programming
Sequence similarity. Motivation Same gene, or similar gene Suffix of A similar to prefix of B? Suffix of A similar to prefix of B..Z? Longest similar.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Space Efficient Alignment Algorithms Dr. Nancy Warter-Perez.
Space-Saving Strategies for Computing Δ-points Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University,
Sequence Alignment.
CS 5263 Bioinformatics Lecture 4: Global Sequence Alignment Algorithms.
Sequence Alignment Algorithms Morten Nielsen Department of systems biology, DTU.
Comp. Genomics Recitation 2 12/3/09 Slides by Igor Ulitsky.
Pairwise Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 22, 2005 ChengXiang Zhai Department of Computer Science University.
M.M. Dalkilic, PhD Monday, September 08, 2008 Class V Indiana University, Bloomington, IN Sequence Homology 1 Sequence Similiarty (Computation) M.M. Dalkilic,
Chapter 3 Computational Molecular Biology Michael Smith
Sequence Alignment Tanya Berger-Wolf CS502: Algorithms in Computational Biology January 25, 2011.
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
Lecture 15 Algorithm Analysis
Algorithms for the Maximum Subarray Problem Based on Matrix Multiplication Authours : Hisao Tamaki & Takeshi Tokuyama Speaker : Rung-Ren Lin.
Space-Saving Strategies for Analyzing Biomolecular Sequences Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan.
Introduction to Sequence Alignment. Why Align Sequences? Find homology within the same species Find clues to gene function Practical issues in experiments.
Learning to Align: a Statistical Approach
CS502: Algorithms in Computational Biology
Homology Search Tools Kun-Mao Chao (趙坤茂)
Online Courses A note given in BCC class on May 10, 2016
Sequence Alignment Kun-Mao Chao (趙坤茂)
Homology Search Tools Kun-Mao Chao (趙坤茂)
A note given in BCC class on March 15, 2016
Dynamic-Programming Strategies for Analyzing Biomolecular Sequences
Homology Search Tools Kun-Mao Chao (趙坤茂)
Sequence Alignment Using Dynamic Programming
Minimum Spanning Trees
Using Dynamic Programming To Align Sequences
SMA5422: Special Topics in Biotechnology
Shortest-Paths Trees Kun-Mao Chao (趙坤茂)
Intro to Alignment Algorithms: Global and Local
Sequence Alignment Kun-Mao Chao (趙坤茂)
Lecture 14 Algorithm Analysis
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
Sequence Alignment Kun-Mao Chao (趙坤茂)
A Note on Useful Algorithmic Strategies
Sequence Alignment Algorithms Morten Nielsen BioSys, DTU
Sequence Alignment Kun-Mao Chao (趙坤茂)
Space-Saving Strategies for Analyzing Biomolecular Sequences
Multiple Sequence Alignment
Minimum Spanning Trees
Space-Saving Strategies for Computing Δ-points
Space-Saving Strategies for Computing Δ-points
Space-Saving Strategies for Computing Δ-points
Space-Saving Strategies for Analyzing Biomolecular Sequences
Sequence Alignment (I)
Trees Kun-Mao Chao (趙坤茂)
Space-Saving Strategies for Computing Δ-points
A Note on Useful Algorithmic Strategies
A Note on Useful Algorithmic Strategies
Homology Search Tools Kun-Mao Chao (趙坤茂)
Minimum Spanning Trees
Multiple Sequence Alignment
Space-Saving Strategies for Computing Δ-points
Space-Saving Strategies for Computing Δ-points
Presentation transcript:

Space-Saving Strategies for Computing Δ-points Kun-Mao Chao (趙坤茂) Department of Computer Science and Information Engineering National Taiwan University, Taiwan http://www.csie.ntu.edu.tw/~kmchao

Δ-points S-(i, j): the best score of a path from (0, 0) to (i, j). S+(i, j): the best score of a path from (i, j) to (M, N). Δ-points: S-(i, j) + S+(i, j) >= Δ S - S +

C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 Match: 8 Mismatch: -5 Gap symbol: -3 C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 7 4 1 -2 -5 9 6 10 -8 -11 -14 14 C T T A A C T optimal score

C T T A A C – T C G G A T C A T 8 – 5 –5 +8 -5 +8 -3 +8 = 14 8 – 5 –5 +8 -5 +8 -3 +8 = 14 C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 7 4 1 -2 -5 9 6 10 -8 -11 -14 14 C T T A A C T

C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 Match: 8 Mismatch: -5 Gap symbol: -3 S- Matrix C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 7 4 1 -2 -5 9 6 10 -8 -11 -14 14 C T T A A C T

-21 C G G A T C A T -18 -15 C T T A A C T -12 -9 -6 -3 -24 Match: 8 Mismatch: -5 Gap symbol: -3 S+ Matrix C G G A T C A T -21 -18 -15 -12 -9 -6 -3 -24 C T T A A C T

Match: 8 Mismatch: -5 Gap symbol: -3 S+ Matrix C G G A T C A T 14 3 6 8 10 12 1 -10 -21 11 13 2 4 -7 -18 5 16 7 -4 -15 -1 -12 9 15 18 -9 -2 -6 -13 -3 -24 C T T A A C T

C G G A T C A T C T T A A C T Match: 8 Mismatch: -5 Gap symbol: -3 S- and S+ Matrix C G G A T C A T 14 -3 3 -6 6 -9 8 -12 10 -15 12 -18 1 -21 -10 -24 5 2 11 -1 13 -4 -7 4 -13 16 7 -2 -5 9 15 18 -8 -11 -14 C T T A A C T

C G G A T C A T C T T A A C T Match: 8 Mismatch: -5 S- and S+ Matrix Gap symbol: -3 S- and S+ Matrix C G G A T C A T 14 -3 3 -6 6 -9 8 -12 10 -15 12 -18 1 -21 -10 -24 5 2 11 -1 13 -4 -7 4 -13 16 7 -2 -5 9 15 18 -8 -11 -14 C T T A A C T

Match: 8 Mismatch: -5 Gap symbol: -3 S- + S+ Matrix C G G A T C A T 14 -1 -2 -3 -17 -31 -45 13 12 11 1 -16 -15 -30 -29 C T T A A C T

Match: 8 Mismatch: -5 Gap symbol: -3 S- + S+ Matrix Δ = 14 C G G A T C A T 14 -1 -2 -3 -17 -31 -45 13 12 11 1 -16 -15 -30 -29 C T T A A C T

Match: 8 Mismatch: -5 Gap symbol: -3 S- + S+ Matrix Δ = 13 C G G A T C A T 14 -1 -2 -3 -17 -31 -45 13 12 11 1 -16 -15 -30 -29 C T T A A C T

The leftmost/rightmost Δ-paths For simple scoring schemes, finding the leftmost Δ-path and the rightmost Δ-path is easy. For affine gap penalties, it is more complicated.

Two alignments may not intersect!

Method 1: O(MN) time; O(MN) space

Method 2: O(M2N) time; O(N) space Each row takes O(MN) time. In total, O(M) x O(MN) = O(M2N) S + M

Method 3: O(MN) time; O(N) space

Method 4: O(MN log M) time; O(N log M) space

Method 4: O(MN log M) time; O(N log M) space (cont’d) … O(log M) layers M O(N) O(N) O(N) O(N) O(N)

The computation of S-(i, j) and S+(i, j) inside a block

Method 5: O(MN log min {M, N}) time; O(M+N) space

C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 Match: 8 Mismatch: -5 Gap symbol: -3 S- Matrix C G G A T C A T -3 -6 -9 -12 -15 -18 -21 -24 8 5 2 -1 -4 -7 -10 -13 3 7 4 1 -2 -5 9 6 10 -8 -11 -14 14 C T T A A C T

Method 5: O(MN log min {M, N}) time; O(M+N) space (cont’d) … O(log min {M, N}) layers M 4(M+N) 2(M+N) M+N 1/2(M+N) 1/4(M+N)

Method 6: O(MN log log min {M, N}) time; O(M+N) space Real Size 1/25 1/23 N 1/210 1/25 1/22 M 1/29 1/219

Method 7: O(1/ε MN) time; O(1/ε MεN) space Here we use ε= 1/2 to illustrate the idea. Solve each M1/2N problem M1/2 S - S + M

Method 8: O(1/εMN) time; O(1/ε M1+ε+ N) space Here we use ε= 1/2 to illustrate the idea. O(N) M Solve each M1/2M problem M1/2 S - S + M

Methods Method 1: O(MN) time; O(MN) space Method 2: O(M2N) time; O(M) space Method 3: O(MN) time; O(M) space Method 4: O(MN log M) time; O(N log M) space Method 5: O(MN log min {M, N}) time; O(M+N) space Method 6: O(MN log log min {M, N}) time; O(M+N) space Method 7: O(1/εMN) time; O(1/ ε MεN) space Method 8: O(1/εMN) time; O(1/ε M1+ε+ N) space

Bonus points O(MN) time; O(M+N) space o(MN log log min {M, N}) time; O(M+N) space O(1/εMN) time; o(1/ε M1+ε+N) space