Aligning Alignments Exactly John Kececioglu and Dean Starrett Proceedings of the eighth annual international conference on Computational molecular biology.

Slides:



Advertisements
Similar presentations
Multiplying Matrices Two matrices, A with (p x q) matrix and B with (q x r) matrix, can be multiplied to get C with dimensions p x r, using scalar multiplications.
Advertisements

Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007.
Longest Common Subsequence
Michael Alves, Patrick Dugan, Robert Daniels, Carlos Vicuna
Multiple Sequence Alignment
S. J. Shyu Chap. 1 Introduction 1 The Design and Analysis of Algorithms Chapter 1 Introduction S. J. Shyu.
Molecular Evolution Revised 29/12/06
Computability and Complexity 13-1 Computability and Complexity Andrei Bulatov The Class NP.
BNFO 602 Multiple sequence alignment Usman Roshan.
Hidden Markov Models Pairwise Alignments. Hidden Markov Models Finite state automata with multiple states as a convenient description of complex dynamic.
Space Efficient Alignment Algorithms and Affine Gap Penalties
Sequencing and Sequence Alignment
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date:
Expected accuracy sequence alignment
Computational Genomics Lecture #3a Much of this class has been edited from Nir Friedman’s lecture which is available at Changes.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Sequence Alignment Variations Computing alignments using only O(m) space rather than O(mn) space. Computing alignments with bounded difference Exclusion.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Aligning Alignments Soni Mukherjee 11/11/04. Pairwise Alignment Given two sequences, find their optimal alignment Score = (#matches) * m - (#mismatches)
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Dynamic Programming and Biological Sequence Comparison Part I.
Sequence Alignment II CIS 667 Spring Optimal Alignments So we know how to compute the similarity between two sequences  How do we construct an.
Multiple Sequence alignment Chitta Baral Arizona State University.
Achieving Minimum Coverage Breach under Bandwidth Constraints in Wireless Sensor Networks Maggie X. Cheng, Lu Ruan and Weili Wu Dept. of Comput. Sci, Missouri.
Sequence Alignment III CIS 667 February 10, 2004.
BNFO 602 Multiple sequence alignment Usman Roshan.
Chapter 11: Limitations of Algorithmic Power
Aligning Alignments Exactly By John Kececioglu, Dean Starrett CS Dept. Univ. of Arizona Appeared in 8 th ACM RECOME 2004, Presented by Jie Meng.
Dynamic Programming Reading Material: Chapter 7 Sections and 6.
Finding the optimal pairwise alignment We are interested in finding the alignment of two sequences that maximizes the similarity score given an arbitrary.
. Sequence Alignment Tutorial #3 © Ydo Wexler & Dan Geiger.
Chapter 11 Limitations of Algorithm Power Copyright © 2007 Pearson Addison-Wesley. All rights reserved.
The Complexity of Algorithms and the Lower Bounds of Problems
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Multiple Sequence Alignment
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
. Multiple Sequence Alignment Tutorial #4 © Ilan Gronau.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
Multiple Sequence Alignment S 1 = AGGTC S 2 = GTTCG S 3 = TGAAC Possible alignment A-TA-T GGGGGG G--G-- TTATTA -TA-TA CCCCCC -G--G- AG-AG- GTTGTT GTGGTG.
Chapter 11 Limitations of Algorithm Power. Lower Bounds Lower bound: an estimate on a minimum amount of work needed to solve a given problem Examples:
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
1 Generalized Tree Alignment: The Deferred Path Heuristic Stinus Lindgreen
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Toward Efficient Flow-Sensitive Induction Variable Analysis and Dependence Testing for Loop Optimization Yixin Shou, Robert A. van Engelen, Johnnie Birch,
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
TECH Computer Science NP-Complete Problems Problems  Abstract Problems  Decision Problem, Optimal value, Optimal solution  Encodings  //Data Structure.
1 Lower Bounds Lower bound: an estimate on a minimum amount of work needed to solve a given problem Examples: b number of comparisons needed to find the.
Fragment assembly of DNA A typical approach to sequencing long DNA molecules is to sample and then sequence fragments from them.
Multiple Sequence Alignment Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University, Taiwan WWW:
CrossWA: A new approach of combining pairwise and three-sequence alignments to improve the accuracy for highly divergent sequence alignment Che-Lun Hung,
Chapter 3 Computational Molecular Biology Michael Smith
Multiple sequence comparison (MSC) Reading: Setubal/Meidanis, 3.4 Gusfield, Algorithms on Strings, Trees and Sequences, chapter 14.
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space Author: Azzedine Boukerche, Jan M. Correa, Alba.
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance Andrew I. Jewett, Conrad C. Huang and Thomas.
Sequence alignment CS 394C: Fall 2009 Tandy Warnow September 24, 2009.
1 P and NP. 2 Introduction The Traveling Salesperson problem and thousands of other problems are equally hard in the sense that if we had an efficient.
Example 2 You are traveling by a canoe down a river and there are n trading posts along the way. Before starting your journey, you are given for each 1
Aligning Genomes Genome Analysis, 12 Nov 2007 Several slides shamelessly stolen from Chr. Storm.
Piecewise linear gap alignment.
Sequence comparison: Dynamic programming
Computability and Complexity
Sequence Alignment Using Dynamic Programming
Sequence Alignment with Traceback on Reconfigurable Hardware
CS 581 Tandy Warnow.
Chapter 11 Limitations of Algorithm Power
Multiple Sequence Alignment
Computational Genomics Lecture #3a
Presentation transcript:

Aligning Alignments Exactly John Kececioglu and Dean Starrett Proceedings of the eighth annual international conference on Computational molecular biology pp , 2004

Abstract A basic computational problem that arises in both the construction and local-search phases of the best heuristics for multiple sequence alignment is that of aligning the columns of two multiple alignments. When the scoring function is the sum-of-pairs objective and induced pairwise alignments are evaluated using linear gap-costs, we call this problem Aligning Alignments. While seemingly a straightforward extension of two-sequence alignment, we prove it is actually NP-complete.

Abstract (cont.) As explained in the paper, this provides the first demonstration that minimizing linear gap-costs, in the context of multiple sequence alignment, is inherently hard. We also develop an exact algorithm for Aligning Alignments that is remarkably efficient in practice, both in time and space. Even though the problem is NP-complete, computational experiments on both biological and simulated data show we can compute optimal alignments for all benchmark instances in two standard datasets, and solve very large random instances with highly-gapped sequences.

Definition: Input: –multiple alignments A and B –substitution cost function δ –gap initiation cost γ, and gap extension cost λ Output –An alignment of the columns of A versus the columns of B that minimizes the sum-of-pairs objective with linear gap costs.

The alignments A and B are viewed as two sequences. The problem is to align these two sequences by inserting, deleting, and substituting columns. The cost for a gap of length x is γ+ λx –Whenλ=0, it can be solved in polynomial time. –Whenλ>0, it is NP-complete.

Shape To evaluate the cost of an alignment with linear gap-costs we need to determine the number of gaps initiated by a column. A shape s for an alignment with k rows is an ordered partition s = (s 1, s 2, s 3, …, s p ) where 1 ≦ p ≦ k.

Example AC-TTC-- ACGTT--- -CATTC-- A-ACTCGG s = ( {2}, {1, 3}, {4} ) Furthermore we denote the flat shape by φ= ({1, 2, 3, …, k})

Recurrence S(i, j) is the set of shapes of all alignments of a prefix of the input. For a shape s and a column c, the shape obtained by concatenating c onto any alignment ending in s is denoted s ◦ c.

Time Complexity To align two alignments,each having k sequences and n columns, the exact algorithm takes worst-case time