Eugene W. Myers and Webb Miller. Outline: Introduction; Gotoh's algorithm; O(N)-space Gotoh's algorithm; Main algorithm; Implementation; Conclusion.

Presentation transcript:


Introduction
Space, not time
Hirschberg's Algorithm: maximizing the similarity score of an alignment
Gotoh's Algorithm: minimizing the difference score of a conversion
Linear-space version for affine gap penalties
For a megabyte of memory:
Myers and Miller: sequences of length [value missing]
Altschul and Erickson: sequences of length < 1070

Transformation (1/2)
Columns: Hirschberg's Algorithm, Gotoh's Algorithm
Rows: Aligned Pair, Affine Gap Penalties

Transformation (2/2)
Match = 8, Mismatch = -5, Gap Symbol = -3, Gap-open = -4

Example (1/2)

              Hirschberg's Algorithm   Gotoh's Algorithm
Match         8                        0
Mismatch      -5                       13
Gap-open      -4                       4
Gap Symbol    -3                       7

Example (2/2)
1. A: ACGGTTCAAG, B: ACGGTTCAAG
2. A: ACGGTTCAAG, B: ACGGATCAAG
3.
Compared per example: the Cost under Hirschberg's Algorithm and the minimum C under Gotoh's Algorithm
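The formulas on the Transformation slides did not survive in the transcript, but the four value pairs in the Example table are consistent with one simple mapping. The sketch below assumes that mapping: replacement cost = Wmax - score, gap-open cost = -(gap-open score), gap-symbol cost = Wmax/2 - (gap-symbol score). It checks the mapping on the two gap-free sequence pairs of the example. All function names are hypothetical, and the identity similarity = Wmax*(M+N)/2 - cost is a property of the assumed mapping, not something stated on the slides.

```python
def to_costs(match, mismatch, gap_open, gap_symbol):
    """Map similarity scores to difference costs (assumed transformation)."""
    w_max = match  # largest replacement score; assumed to be the match score
    return {
        "match": w_max - match,
        "mismatch": w_max - mismatch,
        "gap_open": -gap_open,
        "gap_symbol": w_max / 2 - gap_symbol,
    }


def gapless_similarity(a, b, match, mismatch):
    """Similarity score of two equal-length sequences aligned without gaps."""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def gapless_cost(a, b, costs):
    """Difference cost of the same gap-free alignment."""
    return sum(costs["match"] if x == y else costs["mismatch"]
               for x, y in zip(a, b))


def similarity_from_cost(cost, w_max, m, n):
    """Recover the similarity score from the cost under the assumed mapping."""
    return w_max * (m + n) / 2 - cost


costs = to_costs(match=8, mismatch=-5, gap_open=-4, gap_symbol=-3)
a, b = "ACGGTTCAAG", "ACGGATCAAG"   # example 2: a single mismatch
print(costs)
print(gapless_similarity(a, b, 8, -5), gapless_cost(a, b, costs))
```

Under this mapping, maximizing the similarity score and minimizing the cost pick out the same alignments, because the two quantities differ by a constant that depends only on the sequence lengths.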

R 黃博平

Some notations
Ai: the i-symbol prefix of A
Bj: the j-symbol prefix of B
C(i, j): minimum cost of a conversion of Ai to Bj

Simple gap (1/4)
gap(k) = h*k

Simple gap (2/4)
[Figure: dynamic programming table for A = AGTAC and B = AAG]
Space = O(n^2)

Simple gap (3/4)
[Figure: the table split at row m/2]

Simple gap (4/4)
Forward score and backward score
Space: O(m+n)
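The forward/backward idea for the simple gap model can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes unit replacement cost for a mismatch and zero for a match, h = 1 by default, and the helper names `last_row` and `split_column` are hypothetical. Only two rows of the table are alive at any time, which is where the O(m+n) bound comes from.

```python
def mismatch_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def last_row(a, b, h=1.0, w=mismatch_cost):
    """Cost of converting all of `a` to every prefix of `b`, keeping two rows."""
    prev = [h * j for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [h * i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + h,             # delete x
                           cur[j - 1] + h,          # insert y
                           prev[j - 1] + w(x, y)))  # replace x by y
        prev = cur
    return prev


def split_column(a, b, h=1.0, w=mismatch_cost):
    """Column where an optimal path crosses row m/2, and the optimal cost."""
    mid = len(a) // 2
    n = len(b)
    forward = last_row(a[:mid], b, h, w)
    backward = last_row(a[mid:][::-1], b[::-1], h, w)
    totals = [forward[j] + backward[n - j] for j in range(n + 1)]
    j_star = min(range(n + 1), key=totals.__getitem__)
    return j_star, totals[j_star]


print(last_row("AGTAC", "AAG")[-1])
print(split_column("AGTAC", "AAG"))
```

Recursing on the two halves (`a[:mid]` with `b[:j_star]`, and `a[mid:]` with `b[j_star:]`) would recover the conversion itself while never holding more than a couple of rows.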

Affine gap (1/8)
A gap of length k: cost = g + k*h
[Figure: example alignment with a gap] A T A A C T C G A A T C - - T

Affine gap (2/8)
C(i, j): minimum cost of a conversion of Ai to Bj
D(i, j): minimum cost of a conversion of Ai to Bj that deletes a_i
I(i, j): minimum cost of a conversion of Ai to Bj that inserts b_j

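Affine gap (3/8)
C(i, j) =
  min{D(i, j), I(i, j), C(i-1, j-1) + w(a_i, b_j)}   if i > 0 and j > 0
  gap(j)                                              if i = 0 and j > 0
  gap(i)                                              if i > 0 and j = 0
  0                                                   if i = 0 and j = 0
where gap(k) = g + k*h and w(a_i, b_j) is the cost of replacing a_i by b_j

Affine gap (4/8)
D(i, j) =
  min{D(i-1, j), C(i-1, j) + g} + h   if i > 0 and j > 0
  C(0, j) + g                         if i = 0 and j > 0

Affine gap (5/8)
I(i, j) =
  min{I(i, j-1), C(i, j-1) + g} + h   if i > 0 and j > 0
  C(i, 0) + g                         if i > 0 and j = 0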
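A direct, quadratic-space rendering of the C, D and I recurrences may help when reading the table figures. It is a sketch under stated assumptions: the recurrences are the standard Gotoh ones (D(i, j) = min{D(i-1, j), C(i-1, j) + g} + h, I(i, j) = min{I(i, j-1), C(i, j-1) + g} + h, C(i, j) the minimum of D, I and the diagonal step), the replacement cost is 0 for a match and 1 for a mismatch, and g = 2.0, h = 0.5 are borrowed from the trace slides. The function name `gotoh_tables` is hypothetical.

```python
import math


def mismatch_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def gotoh_tables(a, b, g=2.0, h=0.5, w=mismatch_cost):
    """Fill the full C, D and I tables; entries never defined stay at infinity."""
    m, n = len(a), len(b)
    C = [[0.0] * (n + 1) for _ in range(m + 1)]
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    I = [[math.inf] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        C[0][j] = g + h * j        # gap(j)
        D[0][j] = C[0][j] + g
    for i in range(1, m + 1):
        C[i][0] = g + h * i        # gap(i)
        I[i][0] = C[i][0] + g
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j], C[i - 1][j] + g) + h
            I[i][j] = min(I[i][j - 1], C[i][j - 1] + g) + h
            C[i][j] = min(D[i][j], I[i][j],
                          C[i - 1][j - 1] + w(a[i - 1], b[j - 1]))
    return C, D, I


C, D, I = gotoh_tables("AGTAC", "AAG")
for row in C:
    print(row)
```

The bottom-right entry of C is the cost of an optimal conversion; the three tables together take O(MN) space, which is what the linear-space version avoids.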

Affine gap(6/8)

Affine gap (7/8)
[Figure: the C, D and I tables for A = AGTAC and B = AAG]

Affine gap (8/8)
[Figure: the I, D and C tables for A = AGTAC and B = AAG]

R 陳彥璋

Observation
The i-th row of C and D depends only on rows i and i-1.
The i-th row of I depends only on row i.

Linear Space
Use two one-dimensional arrays (CC and DD) and three variables.

Linear Space

Algorithm
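The algorithm slide itself is an image, so the following is only a sketch of a cost-only pass that fits the description: two vectors CC and DD, plus the scalars s, c and e that appear in the trace slides (t carries the boundary value gap(i)). The replacement cost (0 for a match, 1 for a mismatch) is an assumption, as is the function name `gotoh_cost`; g = 2.0 and h = 0.5 are the values shown in the trace.

```python
def mismatch_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def gotoh_cost(a, b, g=2.0, h=0.5, w=mismatch_cost):
    """Minimum conversion cost of `a` to `b` using O(len(b)) space."""
    n = len(b)
    cc = [0.0] * (n + 1)   # cc[j] holds C(i, j) for the current row i
    dd = [0.0] * (n + 1)   # dd[j] holds D(i, j); dd[0] is never read
    t = g
    for j in range(1, n + 1):
        t += h
        cc[j] = t          # C(0, j) = gap(j)
        dd[j] = t + g      # D(0, j) = C(0, j) + g
    t = g
    for x in a:
        s = cc[0]          # s holds C(i-1, j-1)
        t += h
        c = cc[0] = t      # C(i, 0) = gap(i)
        e = t + g          # e holds I(i, j-1), starting from I(i, 0)
        for j in range(1, n + 1):
            e = min(e, c + g) + h                   # I(i, j)
            dd[j] = min(dd[j], cc[j] + g) + h       # D(i, j)
            c = min(dd[j], e, s + w(x, b[j - 1]))   # C(i, j)
            s = cc[j]                               # save C(i-1, j) for the next column
            cc[j] = c
    return cc[n]


print(gotoh_cost("AGTAC", "AAG"))
```

With these parameters t reaches 4.5 on the fifth row, which agrees with the value shown in the trace for i = 5.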

[Figure sequence: step-by-step trace of the algorithm on A = AGTAC and B = AAG with g = 2.0 and h = 0.5. Each frame shows the C, D and I tables next to the vectors CC and DD and the scalars s, c and e. The trace starts with t = 2.0; in the frames for row i = 5 and column j = 1, t = 4.5. In the last frame the final entry of CC is the optimal conversion cost.]

What is the optimal conversion of AGTAC to AAG?

B 王柏易

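Midpoint
Hirschberg (1975): recursive divide-and-conquer
[Figure: forward computing from the top and backward computing from the bottom meet at the middle row]

Gap Penalty
[Figure: the four neighbouring cells (i-1, j-1), (i, j-1), (i-1, j) and (i, j)]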

Gap Penalty
CC(j) = minimum cost of a conversion of Ai* to Bj
DD(j) = minimum cost of a conversion of Ai* to Bj that ends with a delete

Gap Penalty
Ai*^T and Bj^T denote the suffixes of A and B that follow positions i* and j.
RR(N - j) = minimum cost of a conversion of Ai*^T to Bj^T
SS(N - j) = minimum cost of a conversion of Ai*^T to Bj^T that begins with a delete

Find Midpoint with Gap Penalty
[Figure: forward computing and backward computing meeting at row i*]
How to compute the midpoint?

R 李政緯

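Midpoint
The difficulty in calculating the midpoint is that when we concatenate the conversions of the two halves, two gaps may coalesce into one, and the gap-open penalty g would then be charged twice.
This means that we may need to consider min{CC + RR, DD + SS - g, II + JJ - g}.

Midpoint
Recall that the algorithm above does not store II and JJ.
An insertion gap runs along row i* itself, so it starts and ends at nodes of that row and is already covered by CC + RR.
The expression therefore reduces to min{CC + RR, DD + SS - g}.

Midpoint
Remember that the minimum must be taken over every column j of row i*:
min over j in [0, N] of min{CC + RR, DD + SS - g, II + JJ - g}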

Midpoint
Type 1 recurrence: the optimal path passes through the node (i*, j*).
Type 2 recurrence: a deletion gap crosses row i* at column j*.

Example
A = agtac, B = aag, i* = 2
agtac
a__ag
Recursive call on (a, a) and (ac, ag)
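The midpoint search for the example can be sketched as below. It is an illustration under assumptions, not the presenters' implementation: it handles only the top-level call (the real recursion also passes boundary gap-open values to the subproblems), it assumes a replacement cost of 0 for a match and 1 for a mismatch with g = 2.0 and h = 0.5, and the names `forward_vectors` and `midpoint` are hypothetical. RR and SS are obtained by running the same forward pass on the reversed suffixes.

```python
def mismatch_cost(x, y):
    """Assumed replacement cost: 0 for a match, 1 for a mismatch."""
    return 0.0 if x == y else 1.0


def forward_vectors(a, b, g, h, w):
    """Return (CC, DD) for converting all of `a` to each prefix of `b`."""
    n = len(b)
    cc = [0.0] + [g + h * j for j in range(1, n + 1)]
    dd = [c + g for c in cc]
    for x in a:
        s = cc[0]
        dd[0] = min(dd[0], cc[0] + g) + h   # all-delete conversion: gap(i)
        c = cc[0] = dd[0]
        e = c + g
        for j in range(1, n + 1):
            e = min(e, c + g) + h
            dd[j] = min(dd[j], cc[j] + g) + h
            c = min(dd[j], e, s + w(x, b[j - 1]))
            s = cc[j]
            cc[j] = c
    return cc, dd


def midpoint(a, b, g=2.0, h=0.5, w=mismatch_cost):
    """Return (i*, j*, type, cost) for the split at row i* = len(a) // 2."""
    i_star = len(a) // 2
    n = len(b)
    cc, dd = forward_vectors(a[:i_star], b, g, h, w)
    rr, ss = forward_vectors(a[i_star:][::-1], b[::-1], g, h, w)
    best = None
    for j in range(n + 1):
        candidates = ((1, cc[j] + rr[n - j]),        # type 1: path through (i*, j)
                      (2, dd[j] + ss[n - j] - g))    # type 2: deletion gap crosses row i*
        for kind, cost in candidates:
            if best is None or cost < best[0]:
                best = (cost, j, kind)
    cost, j_star, kind = best
    return i_star, j_star, kind, cost


print(midpoint("agtac", "aag"))
```

For the example this reports a type 2 midpoint at column 1, so a_2 and a_3 ("gt") are deleted and the recursion continues on (a, a) and (ac, ag), matching the slide.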

R 涂宗瑋

Implementation
Storage requirement
Memory vs. sequence length
Comparison with the classic dynamic programming algorithm

Storage Requirement (1/4)
Vectors: CC, DD, RR, and SS
Space: 4N words
M + N words for an optimal conversion
M = N = 38 40

Storage Requirement (2/4)
128*128 words for the table (w) of replacement costs

w            ASCII[1]   ASCII[2]   ASCII[3]   ASCII[4]   ASCII[…]   ASCII[128]
ASCII[1]     W1,1       W1,2       W1,3       W1,4       W1,…       W1,128
ASCII[2]     W2,1       W2,2       W2,3       W2,4       W2,…       W2,128
ASCII[3]     W3,1       W3,2       W3,3       W3,4       W3,…       W3,128
ASCII[4]     W4,1       W4,2       W4,3       W4,4       W4,…       W4,128
ASCII[…]     W…,1       W…,2       W…,3       W…,4       W…,…       W…,128
ASCII[128]   W128,1     W128,2     W128,3     W128,4     W128,…     W128,128

Storage Requirement (3/4)
16 words for the table (w) of replacement costs: 4*4

     A        T        C        G
A    W(A,A)   W(A,T)   W(A,C)   W(A,G)
T    W(T,A)   W(T,T)   W(T,C)   W(T,G)
C    W(C,A)   W(C,T)   W(C,C)   W(C,G)
G    W(G,A)   W(G,T)   W(G,C)   W(G,G)

Storage Requirement (4/4)
M + N bytes for the sequences A and B.
A and B could be compressed: for DNA sequences, only 2(M + N) bits are necessary.
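The 2(M + N)-bit figure follows from the four-letter DNA alphabet: each base needs two bits, so four bases fit in a byte. The sketch below shows one possible packing; the bit layout, the code assignment A=0, C=1, G=2, T=3 and the names `pack` and `unpack` are illustrative assumptions, not taken from the slides.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}   # assumed 2-bit code per base
BASES = "ACGT"


def pack(seq):
    """Pack a DNA string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for k, base in enumerate(seq):
        out[k // 4] |= CODE[base] << (2 * (k % 4))
    return bytes(out)


def unpack(data, length):
    """Recover the first `length` bases from packed bytes."""
    return "".join(BASES[(data[k // 4] >> (2 * (k % 4))) & 3]
                   for k in range(length))


packed = pack("ACGGTTCAAG")
print(len(packed), unpack(packed, 10))
```

The length has to be stored separately, since the last byte may be only partly used.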

Memory vs. Sequence length
Maximum length of sequences that can be aligned in a given amount of memory
Altschul and Erickson: 7MN-bit approach

Memory (bytes)   Linear Space (w/o op.)   Linear Space (with op.)   Altschul and Erickson
64K              [value lost]             [value lost]              [value lost]
Formula          N = Memory / (4*4)       N = Memory / (6*4)        N = sqrt(Memory * 8 / 7)
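The three formulas can be evaluated directly. The sketch below reads "4*4" as four vectors of 4-byte words and "6*4" as those four vectors plus M + N words for the conversion with M = N; both readings, the 4-byte word size, and the name `max_lengths` are assumptions. The table entries themselves are not recoverable from the transcript, so the printed numbers are computed from the formulas, not quoted from the slide.

```python
import math


def max_lengths(memory_bytes, word_bytes=4):
    """Largest N (with M = N) that fits in memory under each approach."""
    return {
        # four vectors CC, DD, RR, SS of N words each
        "linear_without_conversion": memory_bytes // (4 * word_bytes),
        # plus M + N = 2N words to store the conversion
        "linear_with_conversion": memory_bytes // (6 * word_bytes),
        # 7*M*N bits in total, so N = sqrt(8 * memory / 7)
        "altschul_erickson": math.isqrt(memory_bytes * 8 // 7),
    }


print(max_lengths(64 * 1024))
print(max_lengths(1_000_000))
```

With 1,000,000 bytes the Altschul and Erickson formula gives 1069, which is consistent with the "< 1070" figure quoted in the introduction.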

Compared with classic dynamic programming algorithm
Classic dynamic programming algorithm (Wagner and Fischer, 1974).

Compared with classic dynamic programming algorithm
Space: classic dynamic programming algorithm O(MN); linear-space algorithm O(N + lg M)
Time: both O(MN)
In practice, the linear-space algorithm is slower than the classic dynamic programming algorithm.
Running time, linear-space : classic DP = 2.84 : 1

R 林澤豪

Reduce problem
[Figure: alignment grid for the sequences CGGATCAT and CTTAACT]

Reduce problem(cont.)

Reduce problem (cont.)
[Figure: partition line at row m/2]