Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007.

Presentation transcript:

Improved Algorithms for Inferring the Minimum Mosaic of a Set of Recombinants Yufeng Wu and Dan Gusfield UC Davis CPM 2007

Recombination
Recombination is one of the principal genetic forces shaping sequence variation within species. Two equal-length sequences generate a new sequence of the same length: a prefix of one sequence joined at a breakpoint to a suffix of the other.
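
The single-crossover operation described on this slide can be sketched in a few lines. This is only an illustration; the function name `recombine` and the parameter `cut` are hypothetical and do not come from the talk.

```python
def recombine(prefix_parent: str, suffix_parent: str, cut: int) -> str:
    """Join a prefix of one parent to a suffix of the other at position `cut`."""
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have equal length")
    # Columns [0, cut) come from the first parent, columns [cut, end) from the second.
    return prefix_parent[:cut] + suffix_parent[cut:]
```

For example, `recombine("00000", "11111", 2)` takes two columns from the first parent and three from the second.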

Founders and Mosaic
Current sequences are descendants of a small number of founders.
– A current sequence is composed of blocks from the founders, due to recombination.
– No mutations have occurred since the formation of the founders.
(Figure: founders, sampled sequences in the current population, and the resulting mosaic with its breakpoints.)

The Minimum Mosaic Problem
Given a set of aligned binary sequences from the current population, and assuming the number of founders is known to be K_f, find a set of founders and a mosaic with the minimum number of breakpoints.
(Figure: example with K_f = 3. The three founders shown give four breakpoints, the minimum over all possible sets of three founders.)

Status of the Minimum Mosaic Problem
First studied by E. Ukkonen (WABI 2002).
– Dynamic programming method. Not practical when the number of rows is more than 20 and K_f > 2.
No polynomial-time algorithm was known, even when K_f is small. No NP-completeness result is known.
Our results:
– A simple polynomial-time algorithm for the K_f = 2 case.
– An exact and practical method for medium-sized data when K_f ≥ 3.

The Two-Founder Case
Remove uniform columns, then study pairs of neighboring columns.
Key: at columns 1 and 2, the founders are either 00/11 or 01/10. There are two rows with 00/11 and three rows with 01/10. So at least two breakpoints are needed between columns 1 and 2, attained with founders 01/10.
(Figure: founders with undetermined entries, annotated with 2 breakpoints between c1 and c2 and 2 breakpoints between c2 and c3.)

The Two-Founder Case (Cont.)
No matter which founder states are chosen for the previous column, we can always choose the needed founders for the current column.
(Figure: columns c1 to c7, with the number of breakpoints between each pair of neighboring columns and the locally optimal founders.)
Summing the per-pair minimums, at least 11 breakpoints are needed. On the other hand, we can construct two founders that use the same locally optimal founders at every column pair, so 11 breakpoints is the global optimum.
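
The two-founder argument can be sketched in Python as follows. This is a reading of the slides, not the authors' code: the function name `two_founder_mosaic` is hypothetical, sequences are assumed to be equal-length strings over "0" and "1", and the orientation of the first non-uniform column is fixed arbitrarily.

```python
def two_founder_mosaic(rows):
    """Return (min_breakpoints, founder_a, founder_b) for binary strings `rows`."""
    n_cols = len(rows[0])
    founder_a = list(rows[0])  # uniform columns: both founders copy the shared value
    founder_b = list(rows[0])
    flip = {"0": "1", "1": "0"}
    total = 0
    prev = None  # index of the previous non-uniform column
    for c in range(n_cols):
        if len({r[c] for r in rows}) < 2:
            continue  # a uniform column never forces a breakpoint
        if prev is None:
            founder_a[c] = "0"  # arbitrary orientation for the first mixed column
        else:
            same = sum(r[prev] == r[c] for r in rows)  # rows reading 00 or 11
            diff = len(rows) - same                    # rows reading 01 or 10
            if diff <= same:
                founder_a[c] = founder_a[prev]         # founders 00/11: the 01/10 rows break
                total += diff
            else:
                founder_a[c] = flip[founder_a[prev]]   # founders 01/10: the 00/11 rows break
                total += same
        founder_b[c] = flip[founder_a[c]]  # at a mixed column the founders must differ
        prev = c
    return total, "".join(founder_a), "".join(founder_b)
```

On the slide's first column pair (two rows reading 00/11 and three reading 01/10), the sketch picks founders 01/10 and reports two breakpoints. Each column pair is decided independently, which is why the local minimums add up to a global optimum.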

Three or More Founders: Assuming Known Founders
With known founders, we can minimize the breakpoints for each sequence separately, and thus also minimize the total number of breakpoints.
For each input sequence, starting from the left, insert a breakpoint at the end of the longest segment that matches one founder, then repeat from there.
Founder mapping: for each position c in an input sequence s, the founder from which s[c] takes its value.
(Figure: three founders, input sequences, breakpoints, and the founder mapping.)
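
A minimal sketch of the left-to-right greedy rule for known founders, assuming sequences and founders are equal-length strings over "0" and "1". The names `greedy_mapping` and `count_breakpoints` are illustrative, not taken from the authors' software.

```python
def greedy_mapping(seq, founders):
    """Founder mapping for one sequence: the founder index used at each position."""
    mapping = []
    pos = 0
    while pos < len(seq):
        best_end, best_founder = pos, None
        for idx, founder in enumerate(founders):
            end = pos
            # Extend the match with this founder as far right as possible.
            while end < len(seq) and founder[end] == seq[end]:
                end += 1
            if end > best_end:
                best_end, best_founder = end, idx
        if best_founder is None:
            raise ValueError(f"no founder matches position {pos}")
        mapping.extend([best_founder] * (best_end - pos))
        pos = best_end  # a breakpoint falls here unless the sequence is finished
    return mapping


def count_breakpoints(mapping):
    """Number of positions where the mapping switches founders."""
    return sum(a != b for a, b in zip(mapping, mapping[1:]))
```

Summing `count_breakpoints(greedy_mapping(s, founders))` over all input sequences gives the total for a fixed founder set.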

Enumerating Founders for the Founder-Unknown Case
In reality, the founders are not known. A straightforward approach is to enumerate all possible sets of founders and run the previous method on each to find the minimum mosaic.
At each column there are 2^{K_f} - 2 founder settings. With m columns, fully enumerating all possible sets of founders takes time on the order of 2^{m K_f}, which is infeasible when m or K_f is large.
More ideas are needed for a practical method. First, we organize the enumeration as search paths in a search tree.
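
To make the blowup concrete, here is a brute-force sketch that tries every ordered tuple of K_f binary founders and scores each one with the greedy rule for known founders. It is only usable on toy inputs; `row_breakpoints` and `brute_force_mosaic` are hypothetical names, and the enumeration deliberately does not skip uniform founder settings or reordered founder sets.

```python
from itertools import product


def row_breakpoints(seq, founders):
    """Greedy breakpoint count for one row, or None if some position is unmatched."""
    pos, breaks = 0, 0
    while pos < len(seq):
        reach = pos
        for founder in founders:
            end = pos
            while end < len(seq) and founder[end] == seq[end]:
                end += 1
            reach = max(reach, end)
        if reach == pos:
            return None  # no founder carries this value at this column
        pos = reach
        if pos < len(seq):
            breaks += 1
    return breaks


def brute_force_mosaic(rows, k):
    """Try all 2**(m*k) ordered founder tuples; return (min_breakpoints, founders)."""
    m = len(rows[0])
    candidates = ["".join(bits) for bits in product("01", repeat=m)]
    best = None
    for founders in product(candidates, repeat=k):
        costs = [row_breakpoints(r, founders) for r in rows]
        if None in costs:
            continue  # this founder set cannot generate every row
        total = sum(costs)
        if best is None or total < best[0]:
            best = (total, founders)
    return best
```

Even at m = 2 and K_f = 3 this already visits 64 founder tuples, and the count doubles with every extra column or founder bit.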

Search Paths and Search Tree
Assume three founders. (Figure: search tree over columns c1, c2, c3. Each node is a founder setting at one column, labeled with the total number of breakpoints up to that column; a search path gives the founder settings up to the current column, e.g. column 3.)
On-line computation: compute the partial solution up to the current column for speedup. The founder-known method can be run with partially known founders!
It works, but the number of search paths blows up exponentially. The obvious idea for reducing the search space is branch and bound (compute a lower bound and …). But we found a different idea to be more useful.
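
One way to realize the on-line computation is a per-row table that is updated each time a search path is extended by one founder setting. The sketch below is an assumption about how such bookkeeping could look, not the authors' implementation; `extend_path` and `path_cost` are hypothetical names, and `costs[r][j]` holds the fewest breakpoints for the prefix of row r that ends on founder j.

```python
INF = float("inf")


def extend_path(costs, rows, col, setting):
    """Extend a search path by the founder setting chosen for column `col`.

    `setting[j]` is founder j's value at that column. Returns the updated
    table, or None if some row finds no founder carrying its value.
    """
    new_costs = []
    for row, cost in zip(rows, costs):
        cheapest = min(cost)
        updated = [
            # Stay on founder j for free, or switch to it for one breakpoint.
            min(cost[j], cheapest + 1) if setting[j] == row[col] else INF
            for j in range(len(setting))
        ]
        if min(updated) == INF:
            return None
        new_costs.append(updated)
    return new_costs


def path_cost(costs):
    """Total breakpoints of the partial solution up to the current column."""
    return sum(min(cost) for cost in costs)
```

Starting from an all-zero table (one row of K_f zeros per input sequence) and calling `extend_path` once per column reproduces the founder-known total for a complete founder set, while every intermediate table gives the partial total used to label a node of the search tree.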

Dropping Search Paths that are Beaten by Another Search Path
Assume three founders. P1 and P2 are two search paths up to column 2, each with its own founder configuration. Can we say P1 is better than P2? Not really, because P2 may lead to fewer breakpoints later on.
But suppose the number of input sequences is 5. We can then say P1 beats P2 (and so drop P2). Why?
(Figure: P2 has 40 breakpoints and continues along an optimal search path with >= 0 further breakpoints; P1 can switch onto that path with <= 5 extra breakpoints, one per input sequence, for a total of <= 39.)
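
The counting argument amounts to a one-line test: a path can be dropped if some other path, even after paying one extra breakpoint per input row, is no more expensive. A small sketch, where `prune_beaten` is a hypothetical helper and the breakpoint counts in the usage note are made up for illustration:

```python
def prune_beaten(paths, n_rows):
    """Keep only search paths not beaten under the simple rule.

    `paths` is a list of (label, breakpoints_so_far) for paths that end at
    the same column. A path is beaten when the cheapest path plus one
    breakpoint per row still costs no more than it does.
    """
    cheapest = min(cost for _, cost in paths)
    return [(label, cost) for label, cost in paths if cost < cheapest + n_rows]
```

With five rows, `prune_beaten([("P1", 34), ("P2", 40), ("P3", 38)], 5)` drops P2 but keeps P3, whose lead over P1 plus five breakpoints cannot be ruled out. Using "no more than" rather than "strictly less than" is safe when only one optimal solution is needed.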

A More Powerful Beaten Rule
Still five input rows. Now we cannot say P1 beats P2 by counting rows alone. But remember we have the founder mapping…
Rows 2 and 4 have the same founder mappings under P1 and P2, so they need no extra breakpoints: if such a row has no breakpoint when continuing from P2, it has none when continuing from P1 either.
So P1 beats P2, since at most 3 rows need extra breakpoints to get onto a path from P2, and P2 uses 4 more breakpoints than P1.
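
The refined rule charges an extra breakpoint only to rows whose founder mappings differ between the two paths. In the sketch below each row is summarized by the set of founder indices it can end on with its minimum breakpoint count, and a row counts as matched when every founder available under P2 is also available under P1. That subset test is an assumption made here; the slide only says that matched rows have the same founder mappings. The name `beats_with_mapping` is hypothetical.

```python
def beats_with_mapping(cost_p1, ends_p1, cost_p2, ends_p2):
    """True if P1 beats P2 under the mapping-aware rule.

    `ends_p1[r]` and `ends_p2[r]` are sets of founder indices on which row r
    can end at the current column with its minimum number of breakpoints.
    """
    unmatched = sum(
        1 for e1, e2 in zip(ends_p1, ends_p2)
        if not e2 <= e1  # row may need one extra breakpoint to follow P2's future
    )
    return cost_p1 + unmatched <= cost_p2
```

In the slide's situation, rows 2 and 4 match, three rows do not, and P2 is four breakpoints behind, so the test succeeds even though the five-row counting rule alone would not allow dropping P2.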

How Practical Is Our Method?
Source of data and image: UNC Chapel Hill.
Data: five founders, 20 rows, 36 columns.
UNC's heuristic solution: 54 breakpoints.
Enumerating founder states is impossible!
Our method takes 5 minutes to find the optimal solution: 53 breakpoints.
It is also practical for a 50x50 matrix with four founders.

Open Problems and Software
Is the minimum mosaic problem NP-complete?
Is there a polynomial-time algorithm for the minimum mosaic problem for a small number of founders (say three to ten)?
Software available at:
Thank you.