Www.bioalgorithms.infoAn Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date: 2004.12.03.

Slides:



Advertisements
Similar presentations
Longest Common Subsequence
Advertisements

DYNAMIC PROGRAMMING ALGORITHMS VINAY ABHISHEK MANCHIRAJU.
Introduction to Bioinformatics Algorithms Divide & Conquer Algorithms.
Introduction to Bioinformatics Algorithms Divide & Conquer Algorithms.
Introduction to Bioinformatics Algorithms Divide & Conquer Algorithms.
Dynamic Programming: Sequence alignment
1 A simple, practical and complete O( n 3 /log(n))-time Algorithm for RNA folding using the Four-Russians Speedup Yelena Frid and Dan Gusfield Department.
Chapter 7 Dynamic Programming.
Outline The power of DNA Sequence Comparison The Change Problem
Inexact Matching of Strings General Problem –Input Strings S and T –Questions How distant is S from T? How similar is S to T? Solution Technique –Dynamic.
Refining Edits and Alignments Υλικό βασισμένο στο κεφάλαιο 12 του βιβλίου: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University.
1 Reduction between Transitive Closure & Boolean Matrix Multiplication Presented by Rotem Mairon.
Sabegh Singh Virdi ASC Processor Group Computer Science Department
Dynamic Programming: Edit Distance
Dynamic Programming Reading Material: Chapter 7..
Space Efficient Alignment Algorithms and Affine Gap Penalties
Introduction to Bioinformatics Algorithms Dynamic Programming: Edit Distance.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
Sequence Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Sequence Alignment Bioinformatics. Sequence Comparison Problem: Given two sequences S & T, are S and T similar? Need to establish some notion of similarity.
Sequence Alignment Variations Computing alignments using only O(m) space rather than O(mn) space. Computing alignments with bounded difference Exclusion.
CSC401 – Analysis of Algorithms Lecture Notes 12 Dynamic Programming
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Introduction To Bioinformatics Tutorial 2. Local Alignment Tutorial 2.
Dynamic Programming1. 2 Outline and Reading Matrix Chain-Product (§5.3.1) The General Technique (§5.3.2) 0-1 Knapsack Problem (§5.3.3)
1 State Reduction: Row Matching Example 1, Section 14.3 is reworked, setting up enough states to remember the first three bits of every possible input.
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Introduction to Bioinformatics Algorithms Multiple Alignment.
Class 2: Basic Sequence Alignment
Dynamic Programming Introduction to Algorithms Dynamic Programming CSE 680 Prof. Roger Crawfis.
Introduction to Bioinformatics Algorithms Clustering and Microarray Analysis.
Case Study. DNA Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known.
Dynamic Programming I Definition of Dynamic Programming
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
Brandon Andrews.  Longest Common Subsequences  Global Sequence Alignment  Scoring Alignments  Local Sequence Alignment  Alignment with Gap Penalties.
Space-Efficient Sequence Alignment Space-Efficient Sequence Alignment Bioinformatics 202 University of California, San Diego Lecture Notes No. 7 Dr. Pavel.
Pairwise Sequence Alignment (I) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 22, 2005 ChengXiang Zhai Department of Computer Science University.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Dynamic Programming: Sequence alignment CS 466 Saurabh Sinha.
An Introduction to Bioinformatics 2. Comparing biological sequences: sequence alignment.
Dynamic Programming.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
CSC401: Analysis of Algorithms CSC401 – Analysis of Algorithms Chapter Dynamic Programming Objectives: Present the Dynamic Programming paradigm.
1 CPSC 320: Intermediate Algorithm Design and Analysis July 28, 2014.
Dynamic Programming: Manhattan Tourist Problem Lecture 17.
Intro to Alignment Algorithms: Global and Local Intro to Alignment Algorithms: Global and Local Algorithmic Functions of Computational Biology Professor.
Introduction to Bioinformatics Algorithms Divide & Conquer Algorithms.
The Manhattan Tourist Problem Shane Wood 4/29/08 CS 329E.
Space Efficient Alignment Algorithms and Affine Gap Penalties Dr. Nancy Warter-Perez.
Computer Sciences Department1.  Property 1: each node can have up to two successor nodes (children)  The predecessor node of a node is called its.
Dr Nazir A. Zafar Advanced Algorithms Analysis and Design Advanced Algorithms Analysis and Design By Dr. Nazir Ahmad Zafar.
Divide & Conquer Algorithms
Bioinformatics: The pair-wise alignment problem
CSE 5290: Algorithms for Bioinformatics Fall 2011
Sequence Alignment Using Dynamic Programming
Sequence Alignment with Traceback on Reconfigurable Hardware
Intro to Alignment Algorithms: Global and Local
Dynamic Programming-- Longest Common Subsequence
Bioinformatics Algorithms and Data Structures
CSE 5290: Algorithms for Bioinformatics Fall 2009
Presentation transcript:

Introduction to Bioinformatics Algorithms Block Alignment and the Four-Russians Speedup Presenter: Yung-Hsing Peng Date:

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Outline Block Alignment Four-Russians Speedup Constructing LCS in Sub-quadratic Time

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Is It Possible to Align Sequences in Subquadratic Time? Dynamic Programming takes O(n 2 ) for global alignment Can we do better? Yes, use Four-Russians Speedup

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Blocks partition n n/tn/t n/tn/t t t n

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Block Alignment: Examples validinvalid

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Block Alignment Problem Goal: Find the longest block path through an edit graph Input: Two sequences, u and v partitioned into blocks of size t. This is equivalent to an n x n edit graph partitioned into t x t subgrids Output: The block alignment of u and v with the maximum score (longest block path through the edit graph

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Constructing Alignments within Blocks n/tn/t Block pair represented by each small square Solve mini-alignmnent problems

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Block Alignment: Dynamic Programming Let s i,j denote the optimal block alignment score between the first i blocks of u and first j blocks of v s i,j = max s i-1,j -  block s i,j-1 -  block s i-1,j-1 +  i,,j  block is the penalty for inserting or deleting an entire block  i,j is score of block in row i, column j.

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Block Alignment Runtime (cont ’ d) Computing all  i,j requires solving (n/t)*(n/t) mini block alignments, each of size (t*t) So computing  i,j takes O([n/t]*[n/t]*t*t) = O(n 2 ) time This is the same as dynamic programming How do we speed this up?

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Look-up Table for Four Russians Technique Lookup table “Score” AAAAAA AAAAAC AAAAAG AAAAAT AAAACA … AAAAAA AAAAAC AAAAAG AAAAAT AAAACA … each sequence has t nucleotides size is only n, instead of (n/t)*(n/t)

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info New Recurrency The new lookup table Score is indexed by a pair of t-nucleotide strings, so s i,j = max s i-1,j -  block s i,j-1 -  block s i-1,j-1 + Score(i th block of v, j th block of u

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Four Russians Technique Let t = log(n), where t is block size, n is sequence size. Instead of having (n/t)*(n/t) mini-alignments, construct 4 t x 4 t mini-alignments for all pairs of strings of t nucleotides (huge size), and put in a lookup table. However, size of lookup table is not rreally that huge if t is small. Let t = (logn)/4. Then 4 t x 4 t = n

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Four Russians Speedup Runtime Each access takes O(logn) time, because we have to transfer the little string into an index. Overall running time: O( [n 2 /t 2 ]*logn ) Since t = logn, substitute in: O( [n 2 /{logn} 2 ]*logn) > O( n 2 /logn )

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info So Far … We know how to speed up when we are solving block alignment In order to speed up the mini-alignment calculations to under n 2, we create a lookup table of size n, which consists of all scores for all t-nucleotide pairs Running time goes from quadratic, O(n 2 ), to subquadratic: O(n 2 /logn)

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Block Alignment vs. LCS Unlike the block partitioned graph, the LCS path does not have to pass through the vertices of the blocks. block alignment longest common subsequence

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Points of Interest block alignment has (n/t)*(n/t) = (n 2 /t 2 ) points of interest LCS alignment has O(n 2 /t) points of interest

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Traversing Blocks for LCS (cont ’ d) If we used this to compute the grid, it would take quadratic, O(n 2 ) time, but we want to do better. we know these scores we can calculate these scores t x t block

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Four Russians Speedup Build a lookup table for all possible values of the four variables: 1.subsequence of sequence u in this block (4 t possibilities) 2.subsequence of sequence v in this block (4 t possibilities) 3.all pairs of possible scores for the first row s *,j 4.all pairs of possible scores for the first column s *,j For each quadruple we store the value of the score for the last row and last column. This will be a huge table, but we can eliminate alignments scores that don’t make sense

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Efficient Encoding of Alignment Scores Instead of recording numbers that correspond to the index in the sequences u and v, we can use binary to encode the differences between the alignment scores original encoding binary encoding

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info An Example ----CTTCGATGA T T A C G T G C A

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info An Example ----CTTCGATGA T0/01001/0000 T011 A001 C110 G0/10111/10001/0 T001 G001 C000 A0/ /1

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Reducing Lookup Table Size 2 t possible scores (t = # blocks) 4 t possible strings Lookup table size is (2 t * 2 t )*(4 t * 4 t ) = 2 6t Let t = (logn)/4; Table size is: 2 6((logn)/4) = n (6/4) = n (3/2) Time = O( [n 2 /t 2 ]*logn ) O( [n 2 /{logn} 2 ]*logn) > O( n 2 /logn )

An Introduction to Bioinformatics Algorithmswww.bioalgorithms.info Summary We take advantage of the fact that for each block of t = log(n), we can pre-compute all possible scores and store them in a lookup table of size n (3/2) We used the Four Russian speedup to go from a quadratic running time for LCS to subquadratic running time: O(n 2 /logn)