Sorting by reversals. Bogdan Pasaniuc, Dept. of Computer Science & Engineering.

Overview: biological background; definitions; unsigned permutations (approximation algorithm); sorting signed permutations (simplified algorithm).

What is the evolutionary path? What is the ancestor chromosome? Chromosomes → lists of genes → permutations. [Figure: unknown ancestor, human X chromosome, mouse X chromosome]

Mutations at the chromosome level: inversion, transposition, translocation.
Inversions are known as reversals. They are the most common rearrangement and most often reflect the differences between and within species.
What is the minimum number of reversals required to transform one permutation into another? The reversal distance is a good approximation of the evolutionary distance.

Reversals. Genes (blocks): 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Reversals. After reversing the block 4, …, 8: 1, 2, 3, 8, 7, 6, 5, 4, 9, 10

Reversals and breakpoints: 1, 2, 3 | 8, 7, 6, 5, 4 | 9, 10 (breakpoints between 3 and 8 and between 4 and 9)
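The reversal in the example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `reverse_block` and the 0-based, inclusive index convention are assumptions made here, not part of the slides.

```python
def reverse_block(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


genes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Reverse the block holding genes 4..8 (indices 3..7).
print(reverse_block(genes, 3, 7))  # [1, 2, 3, 8, 7, 6, 5, 4, 9, 10]
```

Applying the same reversal a second time restores the original order, so a reversal is its own inverse.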

Sorting by Reversals. Given a permutation π, find a minimum-length series of reversals ρ_1, …, ρ_t such that π ∘ ρ_1 ∘ ρ_2 ∘ … ∘ ρ_t = (1, 2, …, n). In 1997 A. Caprara proved that this problem is NP-hard.

Breakpoint: a pair of adjacent positions (i, i+1) such that |π_i − π_{i+1}| ≠ 1, i.e. the values π_i and π_{i+1} are not consecutive. If |π_i − π_{i+1}| = 1, the values π_i and π_{i+1} are adjacent.
Introduce π_0 = 0 and π_{n+1} = n+1. Then (0, 1) is a breakpoint if π_1 ≠ 1, and (n, n+1) is a breakpoint if π_n ≠ n.
A reversal affects the breakpoints only at its endpoints, so any reversal can remove or induce at most 2 breakpoints.
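A small sketch of the breakpoint definition, assuming the permutation is given as a Python list without the framing elements; the helper name `breakpoints` is hypothetical. The function adds π_0 = 0 and π_{n+1} = n+1 itself and reports the position pairs (i, i+1) of the extended permutation.

```python
def breakpoints(perm):
    """List the position pairs (i, i+1) of the framed permutation that are breakpoints."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]  # pi_0 = 0 and pi_{n+1} = n + 1
    return [(i, i + 1) for i in range(n + 1) if abs(ext[i] - ext[i + 1]) != 1]


print(breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # [(3, 4), (8, 9)]
print(breakpoints([1, 2, 3, 4]))                     # []
```

The two breakpoints found for the example sit exactly at the endpoints of the reversed block, which matches the observation that a reversal only affects breakpoints at its endpoints.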

Strip: a maximal run of increasing (or decreasing) consecutive elements, i.e. a maximal run that contains no breakpoint.
The identity permutation has no breakpoints, and any other permutation has at least one breakpoint.
Greedy: at each step remove the maximum number of breakpoints. Let Ф(π) = number of breakpoints in π.
While Ф(π) > 0: choose a reversal that removes the maximum number of breakpoints (if there is a tie, favor a reversal that leaves a decreasing strip).
Greedy ends in at most Ф(π) steps.
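The greedy loop can be sketched as a brute-force search over all reversals. Several things here are assumptions rather than part of the slides: the function names, the 0-based inclusive indices, and the convention that a single-element strip counts as decreasing unless it is one of the framing elements 0 or n+1. The sketch favors clarity over speed and re-counts breakpoints for every candidate.

```python
def extend(perm):
    """Frame the permutation with 0 and n + 1."""
    return [0] + list(perm) + [len(perm) + 1]


def count_breakpoints(perm):
    ext = extend(perm)
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def has_decreasing_strip(perm):
    """True if some strip is decreasing (isolated interior elements count as decreasing)."""
    ext = extend(perm)
    for k in range(1, len(ext) - 1):
        isolated = abs(ext[k - 1] - ext[k]) != 1 and abs(ext[k] - ext[k + 1]) != 1
        descending = ext[k + 1] == ext[k] - 1
        if isolated or descending:
            return True
    return False


def greedy_sort(perm):
    """Return the list of reversals (i, j) chosen by the greedy rule."""
    perm = list(perm)
    steps = []
    while count_breakpoints(perm) > 0:
        current = count_breakpoints(perm)
        best_key, best_move, best_perm = None, None, None
        for i in range(len(perm) - 1):
            for j in range(i + 1, len(perm)):
                cand = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                # Primary criterion: breakpoints removed; tie-break: a decreasing strip remains.
                key = (current - count_breakpoints(cand), has_decreasing_strip(cand))
                if best_key is None or key > best_key:
                    best_key, best_move, best_perm = key, (i, j), cand
        steps.append(best_move)
        perm = best_perm
    return steps


print(greedy_sort([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # [(3, 7)]
print(greedy_sort([3, 4, 1, 2]))                     # [(0, 2), (1, 3)]
```

In the second example the permutation has 3 breakpoints and the sketch uses 2 reversals, within the bound of Ф(π) steps.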

Quality of approximation.
Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.
Proof: Consider the decreasing strip containing π_i, the smallest element in any decreasing strip. Then π_i − 1 must be in an increasing strip that lies to the left or to the right, and the reversal that brings π_i − 1 next to π_i removes a breakpoint. [Figure: the breakpoint that will be removed]
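The construction in the proof of Lemma 1 can be written out directly. The sketch below is an illustration under assumptions: the name `lemma1_reversal` is made up here, indices are 0-based and inclusive, and isolated interior elements are treated as decreasing strips of length one.

```python
def lemma1_reversal(perm):
    """Return a reversal (i, j) that removes a breakpoint, or None if no decreasing strip exists."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]

    def in_decreasing_strip(k):
        left_break = abs(ext[k - 1] - ext[k]) != 1
        right_break = abs(ext[k] - ext[k + 1]) != 1
        descending = ext[k - 1] == ext[k] + 1 or ext[k + 1] == ext[k] - 1
        return descending or (left_break and right_break)

    candidates = [ext[k] for k in range(1, n + 1) if in_decreasing_strip(k)]
    if not candidates:
        return None
    smallest = min(candidates)
    p = ext.index(smallest)      # position of the smallest element of a decreasing strip
    q = ext.index(smallest - 1)  # position of its predecessor, at the end of an increasing strip
    lo, hi = (q + 1, p) if q < p else (p + 1, q)
    return lo - 1, hi - 1        # convert framed positions to 0-based indices in perm


print(lemma1_reversal([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # (3, 7)
print(lemma1_reversal([1, 4, 3, 2]))                     # (1, 3)
print(lemma1_reversal([3, 4, 1, 2]))                     # None
```

In the first example the smallest element of the decreasing strip is 4 and its predecessor 3 lies to the left, so the reversal places 3 and 4 side by side.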

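Lemma 2: Let π have a decreasing strip. If every reversal that removes one breakpoint leaves a permutation with no decreasing strips, then π has a reversal that removes two breakpoints.
Proof: Consider the decreasing strip containing π_i, the smallest element in any decreasing strip. The increasing strip containing π_i − 1 must lie to the left; let ρ_i be the reversal that brings π_i − 1 and π_i together.
Consider the decreasing strip containing π_j, the largest element in any decreasing strip. The increasing strip containing π_j + 1 must lie to the right; let ρ_j be the reversal that brings π_j and π_j + 1 together.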

Fact 1: ρ_i and ρ_j must overlap.
π_j must lie in ρ_i: if it does not, then π ∘ ρ_i still has the decreasing strip that contains π_j.
π_i must lie in ρ_j: if it does not, then π ∘ ρ_j still has the decreasing strip that contains π_i.

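Fact 2: ρ_i = ρ_j.
Suppose ρ_i − ρ_j ≠ ∅. If ρ_i − ρ_j contains an increasing strip, then π ∘ ρ_i has a decreasing strip, because ρ_i reverses that strip. If ρ_i − ρ_j contains a decreasing strip, then π ∘ ρ_j has a decreasing strip, because ρ_j leaves that strip untouched. Either case contradicts the hypothesis of Lemma 2.
Hence ρ = ρ_i = ρ_j, and ρ removes 2 breakpoints.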

Lemma 3: Greedy sorts a permutation π that has a decreasing strip in at most Ф(π) − 1 reversals.
Observation: if the permutation obtained after step i − 1 has no decreasing strip, then the reversal at step i − 1 removed 2 breakpoints. We can then use one reversal to create a decreasing strip, after which there exists a reversal that removes at least one breakpoint.
Theorem 1: Greedy sorts every permutation π in at most Ф(π) reversals.
If π has a decreasing strip, at most Ф(π) − 1 reversals are needed.
If π has no decreasing strip, every reversal induces a decreasing strip, so after one step we can apply Lemma 3, giving at most Ф(π) reversals.

Corollary: Greedy is a 2-approximation algorithm. Every reversal removes at most 2 breakpoints, so OPT(π) ≥ Ф(π)/2 ≥ Greedy(π)/2, and therefore Greedy(π) ≤ 2·OPT(π).
Runtime: the number of steps is O(n). At each step we need to analyze O(n²) reversals, so the total runtime is O(n³). Analyzing only the reversals that remove breakpoints reduces the total to O(n²).
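For very small permutations the bounds Ф(π)/2 ≤ OPT(π) ≤ Ф(π) can be checked against an exact answer. The sketch below computes OPT by breadth-first search over all reversals; this exhaustive search is a check introduced here for illustration (it is exponential and not an algorithm from the slides), and the function names are hypothetical.

```python
from collections import deque
from math import ceil


def count_breakpoints(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def reversal_distance(perm):
    """Exact unsigned reversal distance by breadth-first search (tiny inputs only)."""
    start, target = tuple(perm), tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(len(cur) - 1):
            for j in range(i + 1, len(cur)):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)


perm = [3, 4, 1, 2]
phi = count_breakpoints(perm)
opt = reversal_distance(perm)
print(phi, opt)                     # 3 2
print(ceil(phi / 2) <= opt <= phi)  # True
```

Here Ф(π) = 3, so at least 2 reversals are needed, and 2 reversals do suffice.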

Signed permutations: reversals change the sign of the reversed elements: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10) → (1, 2, 3, −8, −7, −6, −5, −4, 9, 10).
Problem: Given a signed permutation, find the minimum-length series of reversals that transforms it into the identity permutation.
There is a polynomial algorithm (Hannenhalli & Pevzner ’95). It relies on several intermediary constructions. These constructions have since been simplified, leading to the first completely elementary treatment of the problem (Bergeron ’05).

Oriented pair: a pair of elements whose absolute values are consecutive integers and whose signs differ.
Example: (0, 3, 1, 6, 5, −2, 4, 7) has the oriented pairs (3, −2) and (1, −2).
An oriented pair indicates a reversal that creates an adjacency, i.e. two neighboring elements x, x+1:
(3, −2): (0, 3, 1, 6, 5, −2, 4, 7) → (0, −5, −6, −1, −3, −2, 4, 7)
(1, −2): (0, 3, 1, 6, 5, −2, 4, 7) → (0, 3, 1, 2, −5, −6, 4, 7)
Oriented reversal: a reversal that creates such an adjacency.
Score of a reversal: the number of oriented pairs in the resulting permutation.
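A sketch of oriented pairs and the reversals they induce. It assumes the permutation is stored with its framing elements 0 and n+1, and that the induced reversal is chosen so that the two elements of the pair end up adjacent in the order x, x+1 (the convention in Bergeron's presentation); this can differ from a reversal that merely places the two values side by side. All function names are hypothetical.

```python
def oriented_pairs(perm):
    """Index pairs (i, j), i < j, with consecutive absolute values and opposite signs."""
    pairs = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            opposite = (perm[i] < 0) != (perm[j] < 0)
            if opposite and abs(abs(perm[i]) - abs(perm[j])) == 1:
                pairs.append((i, j))
    return pairs


def signed_reversal(perm, a, b):
    """Reverse positions a..b (inclusive) and flip the sign of every reversed element."""
    return perm[:a] + [-x for x in reversed(perm[a:b + 1])] + perm[b + 1:]


def induced_reversal(perm, i, j):
    """Reversal that turns the oriented pair at positions i < j into an adjacency x, x+1."""
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)  # the sum is -1 in this case


perm = [0, 3, 1, 6, 5, -2, 4, 7]
print(oriented_pairs(perm))          # [(1, 5), (2, 5)]
print(induced_reversal(perm, 1, 5))  # [0, -5, -6, -1, -3, -2, 4, 7]
print(induced_reversal(perm, 2, 5))  # [0, 3, 1, 2, -5, -6, 4, 7]
```

The first result contains the adjacency −3, −2 and the second contains 1, 2.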

Algorithm 1: As long as π has an oriented pair, choose the oriented reversal that has the maximal score.
The output will be a permutation with positive elements: π_0 and π_{n+1} are positive, so if there is a negative element there exists an oriented pair.
Claim 1: If Algorithm 1 applies k reversals to π, yielding π′, then d(π) = d(π′) + k.
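Algorithm 1 can be sketched as a short loop. The assumptions: the permutation includes its framing elements 0 and n+1, the score of a reversal is taken to be the number of oriented pairs in the resulting permutation, ties are broken by taking the first maximal candidate, and every function name is made up for this illustration.

```python
def oriented_pairs(perm):
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if (perm[i] < 0) != (perm[j] < 0)
            and abs(abs(perm[i]) - abs(perm[j])) == 1]


def signed_reversal(perm, a, b):
    return perm[:a] + [-x for x in reversed(perm[a:b + 1])] + perm[b + 1:]


def induced_reversal(perm, i, j):
    # Chosen so that the pair becomes an adjacency x, x+1.
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)


def algorithm1(perm):
    """Apply maximal-score oriented reversals until no oriented pair is left."""
    perm = list(perm)
    applied = 0
    while True:
        pairs = oriented_pairs(perm)
        if not pairs:
            return perm, applied
        results = [induced_reversal(perm, i, j) for i, j in pairs]
        # Score = number of oriented pairs after the reversal; max() keeps the first maximum.
        perm = max(results, key=lambda p: len(oriented_pairs(p)))
        applied += 1


final, k = algorithm1([0, 3, 1, 6, 5, -2, 4, 7])
print(final, k)  # [0, 1, 2, 3, 4, 5, 6, 7] 5
```

For this input the loop ends at the identity after 5 reversals. In general it may stop at a positive permutation that is not yet sorted, which is the case handled by the hurdle operations.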

Sorting positive permutations.
π is a signed permutation with positive elements, taken in circular order: 0 is the successor of n+1.
π is reduced if it does not contain consecutive elements.
Framed interval in π: an interval i π_{j+1} π_{j+2} … π_{j+k−1} i+k such that i < π_{j+1}, π_{j+2}, …, π_{j+k−1} < i+k.
Hurdle: a framed interval that contains no shorter framed interval.

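Idea: create oriented pairs and then apply Algorithm 1.
Operations on hurdles:
Hurdle cutting: in a hurdle i π_{j+1} π_{j+2} … i+1 … π_{j+k−1} i+k, reverse the segment between i and i+1.
Hurdle merging: for two hurdles i … i+k … i′ … i′+k′, reverse the segment between the endpoints i+k and i′.
Simple hurdle: a hurdle whose cutting decreases the number of hurdles.
Super hurdle: a hurdle that is not simple, i.e. cutting it does not decrease the number of hurdles.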

Algorithm 2:
If π has 2k hurdles, merge any two non-consecutive hurdles.
If π has 2k+1 hurdles, cut one simple hurdle (if it has none, merge any two non-consecutive hurdles).
Claim 2: Algorithm 1 + Algorithm 2 optimally sort any signed permutation.

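Proof of claims: the breakpoint graph.
1. Replace each positive element x by 2x−1, 2x and each negative element −x by 2x, 2x−1.
2. Join the elements 2i and 2i+1 by an arc. [Figure: the transformed permutation and its arcs]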

Arcs are oriented if they span an odd number of elements.
Arc overlap graph: the vertices are the arcs of the breakpoint graph; two vertices are joined by an edge if the corresponding arcs overlap.
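The arcs and their orientation can be computed for a concrete permutation. The sketch rests on assumptions that follow Bergeron's elementary presentation rather than anything stated explicitly here: the framing elements map to 0 and 2n+1, arc i joins the values 2i and 2i+1, and two arcs overlap when their spans intersect without one containing the other. Function names are hypothetical, and the input is the signed permutation without its frame.

```python
def unsigned_image(signed):
    """Map x > 0 to (2x-1, 2x) and -x to (2x, 2x-1); frame the result with 0 and 2n+1."""
    image = [0]
    for x in signed:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    return image + [2 * len(signed) + 1]


def arcs(signed):
    """For each i, return (i, left, right, oriented) for the arc joining 2i and 2i+1."""
    image = unsigned_image(signed)
    pos = {value: k for k, value in enumerate(image)}
    result = []
    for i in range(len(signed) + 1):
        a, b = sorted((pos[2 * i], pos[2 * i + 1]))
        oriented = (b - a + 1) % 2 == 1  # spans an odd number of elements
        result.append((i, a, b, oriented))
    return result


def overlap_edges(signed):
    """Edges of the arc overlap graph: spans intersect but neither contains the other."""
    spans = [(a, b) for _, a, b, _ in arcs(signed)]
    edges = set()
    for u in range(len(spans)):
        for v in range(u + 1, len(spans)):
            (a, b), (c, d) = spans[u], spans[v]
            if a < c < b < d or c < a < d < b:
                edges.add((u, v))
    return edges


signed = [3, 1, 6, 5, -2, 4]
print(unsigned_image(signed))
# [0, 5, 6, 1, 2, 11, 12, 9, 10, 4, 3, 7, 8, 13]
print([i for i, _, _, oriented in arcs(signed) if oriented])  # [1, 2]
```

The two oriented arcs, i = 1 (joining 2 and 3) and i = 2 (joining 4 and 5), line up with the two pairs of elements {1, −2} and {−2, 3} that have consecutive absolute values and opposite signs.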

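Every oriented vertex corresponds to an oriented pair.
Fact 2: The score of an oriented reversal (oriented vertex v) is T + U − O − 1, where T = number of oriented vertices, U = number of unoriented vertices adjacent to v, and O = number of oriented vertices adjacent to v. The reversal flips the orientation of every vertex adjacent to v and leaves v itself unoriented.
Oriented component: a component that contains an oriented vertex.
Safe reversal: a reversal that does not create new unoriented components.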
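The score formula can be evaluated on the arc overlap graph of a concrete permutation. This sketch uses the form T + U − O − 1, which agrees with a direct count of oriented pairs on the example (0, 3, 1, 6, 5, −2, 4, 7). The graph construction assumes that arc i joins the values 2i and 2i+1 and that the frame maps to 0 and 2n+1; the function names are hypothetical and the input omits the frame.

```python
def overlap_graph(signed):
    """Return (set of oriented vertices, adjacency dict) of the arc overlap graph."""
    n = len(signed)
    image = [0]
    for x in signed:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    pos = {value: k for k, value in enumerate(image)}
    spans = [sorted((pos[2 * i], pos[2 * i + 1])) for i in range(n + 1)]
    # An arc spans an odd number of elements exactly when its endpoints are an even distance apart.
    oriented = {i for i, (a, b) in enumerate(spans) if (b - a) % 2 == 0}
    neighbours = {i: set() for i in range(n + 1)}
    for u in range(n + 1):
        for v in range(u + 1, n + 1):
            (a, b), (c, d) = spans[u], spans[v]
            if a < c < b < d or c < a < d < b:
                neighbours[u].add(v)
                neighbours[v].add(u)
    return oriented, neighbours


def score(signed, v):
    """Score of the oriented reversal at vertex v, computed as T + U - O - 1."""
    oriented, neighbours = overlap_graph(signed)
    t = len(oriented)
    o = len(neighbours[v] & oriented)
    u = len(neighbours[v]) - o
    return t + u - o - 1


signed = [3, 1, 6, 5, -2, 4]
print(score(signed, 2))  # 4, the reversal for the pair (3, -2)
print(score(signed, 1))  # 2, the reversal for the pair (1, -2)
```

For vertex 2 the graph gives T = 2, U = 4, O = 1, hence 2 + 4 − 1 − 1 = 4.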

Theorem (Hannenhalli & Pevzner). Any sequence of oriented safe reversals is optimal.
Theorem. An oriented reversal of maximal score is safe. Hence Claim 1 holds. Claim 2 is proven in a similar manner.

J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement.
A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory.
A. Caprara. Sorting by reversals is difficult.
S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.