 # Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering.


Overview
- Biological background
- Definitions
- Unsigned permutations: approximation algorithm
- Sorting signed permutations: simplified algorithm

What is the evolutionary path? What is the ancestor chromosome? Chromosomes → lists of genes → permutations. (Figure: unknown ancestor, human X chromosome, mouse X chromosome.)

Mutations at the chromosome level:
- Inversion: (1 2 3 4 5 6 7) → (1 4 3 2 5 6 7)
- Transposition: (1 2 3 4 5 6 7) → (1 5 6 2 3 4 7)
- Translocation: (1 2 3 4 5 6 7) → (1 2 3 4 5 2 3 4 6 7)

Inversions:
- are known as reversals
- are the most common
- most often reflect the differences between and within species

What is the minimum number of reversals required to transform one permutation into another? The reversal distance is a good approximation for the evolutionary distance.
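The inversion above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `apply_reversal` and the 0-based inclusive indexing are assumptions made for illustration, not notation from the slides.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with the block perm[i..j] reversed (0-based, inclusive)."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


genes = [1, 2, 3, 4, 5, 6, 7]
print(apply_reversal(genes, 1, 3))  # [1, 4, 3, 2, 5, 6, 7]
```

Applying the same reversal twice restores the original order, which is why the reversal distance is symmetric.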

Reversals (figure): genes (blocks) along the chromosome, in the order 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.

Reversals (figure): reversing the block 4 ... 8 gives 1, 2, 3, 8, 7, 6, 5, 4, 9, 10.

Breakpoints (figure): in 1, 2, 3, 8, 7, 6, 5, 4, 9, 10 the breakpoints lie between 3 and 8 and between 4 and 9.

Sorting by Reversals: given a permutation $\pi$, find a minimum-length series of reversals $\rho_1, \dots, \rho_t$ such that $\pi \circ \rho_1 \circ \rho_2 \circ \dots \circ \rho_t = (1, 2, \dots, n)$. In 1997 A. Caprara proved that this problem is NP-hard.

Breakpoint: a pair of adjacent positions $(i, i+1)$ such that $|\pi_i - \pi_{i+1}| \neq 1$, i.e. the values $\pi_i$ and $\pi_{i+1}$ are not consecutive. If $|\pi_i - \pi_{i+1}| = 1$, the values $\pi_i$ and $\pi_{i+1}$ are adjacent.

Introduce $\pi_0 = 0$ and $\pi_{n+1} = n+1$:
- $(0, 1)$ is a breakpoint if $\pi_1 \neq 1$
- $(n, n+1)$ is a breakpoint if $\pi_n \neq n$

A reversal affects the breakpoints only at its endpoints, so any reversal can remove or induce at most 2 breakpoints.
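The breakpoint definition translates directly into code. The sketch below assumes a permutation of 1..n given as a Python list; the name `breakpoints` is illustrative, and positions are reported in the extended permutation that starts with 0 and ends with n+1.

```python
def breakpoints(perm):
    """List the position pairs (i, i+1) of the extended permutation that are breakpoints."""
    ext = [0] + list(perm) + [len(perm) + 1]  # add pi_0 = 0 and pi_{n+1} = n+1
    return [(i, i + 1) for i in range(len(ext) - 1)
            if abs(ext[i] - ext[i + 1]) != 1]


print(breakpoints([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))  # [(3, 4), (8, 9)]
print(breakpoints([1, 2, 3, 4]))                      # []
```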

Strip: a maximal run of increasing (decreasing) elements that contains no breakpoint. The identity permutation has no breakpoints, and any other permutation has at least one breakpoint.

Greedy: at each step remove the maximum number of breakpoints. Let $\Phi(\pi)$ = number of breakpoints in $\pi$.

While $\Phi(\pi) > 0$: choose a reversal that removes the maximum number of breakpoints (if there is a tie, favor a reversal that leaves a decreasing strip).

Greedy ends in at most $\Phi(\pi)$ steps.
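A brute-force sketch of the greedy loop is given below. It tries every reversal at each step, so it is slower than the bound quoted later in the talk and is meant only for small examples. Two points are assumptions not stated on the slide: a strip consisting of a single element (other than 0 and n+1) is treated as decreasing, and among equally good reversals the first one found is kept. All function names are illustrative.

```python
def count_breakpoints(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def has_decreasing_strip(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    # A descent between adjacent values is a decreasing strip of length >= 2.
    if any(b == a - 1 for a, b in zip(ext, ext[1:])):
        return True
    # Assumption: an isolated element (breakpoints on both sides) counts as decreasing.
    return any(abs(ext[k] - ext[k - 1]) != 1 and abs(ext[k + 1] - ext[k]) != 1
               for k in range(1, len(perm) + 1))


def greedy_sort(perm):
    """Return (sorted permutation, list of reversals as 0-based inclusive index pairs)."""
    perm = list(perm)
    reversals = []
    while count_breakpoints(perm) > 0:
        best = None
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                cand = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                removed = count_breakpoints(perm) - count_breakpoints(cand)
                key = (removed, has_decreasing_strip(cand))  # tie-break on decreasing strip
                if best is None or key > best[0]:
                    best = (key, (i, j), cand)
        reversals.append(best[1])
        perm = best[2]
    return perm, reversals


print(greedy_sort([1, 2, 3, 8, 7, 6, 5, 4, 9, 10]))
# ([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [(3, 7)])
```

On the running example a single reversal removes both breakpoints, so the loop stops after one step.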

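Quality of approximation

Lemma 1: Every permutation with a decreasing strip has a reversal that removes one breakpoint.

Proof: consider the decreasing strip containing $\pi_i$, the smallest element found in any decreasing strip. Then $\pi_i - 1$ must be in an increasing strip that lies to the left or to the right, and the reversal that brings $\pi_i - 1$ next to $\pi_i$ removes a breakpoint. (Figure: the breakpoint that will be removed.)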

Lemma 2: Suppose $\pi$ has a decreasing strip. If every reversal that removes one breakpoint leaves a permutation with no decreasing strips, then $\pi$ has a reversal that removes two breakpoints.

Proof:
- Consider the decreasing strip with $\pi_i$ being the smallest element. The increasing strip containing $\pi_i - 1$ must be to the left. Call the corresponding reversal $\rho_i$.
- Consider the decreasing strip with $\pi_j$ being the largest element. The increasing strip containing $\pi_j + 1$ must be to the right. Call the corresponding reversal $\rho_j$.

Fact 1: $\rho_i$ and $\rho_j$ must overlap.
- $\pi_j$ must lie in $\rho_i$: if it doesn't, then $\pi \circ \rho_i$ has the decreasing strip that contains $\pi_j$.
- $\pi_i$ must lie in $\rho_j$: if it doesn't, then $\pi \circ \rho_j$ has the decreasing strip that contains $\pi_i$.

Fact 2: $\rho_i = \rho_j$. If $\rho_i - \rho_j \neq \emptyset$ then:
- if $\rho_i - \rho_j$ contains an increasing strip, $\pi \circ \rho_j$ has a decreasing strip
- if $\rho_i - \rho_j$ contains a decreasing strip, $\pi \circ \rho_i$ has a decreasing strip

Then $\rho = \rho_i = \rho_j$ removes 2 breakpoints.

Lemma 3: Greedy sorts a permutation with a decreasing strip in at most $\Phi(\pi) - 1$ reversals.

Observation: if $\pi_i$ (the permutation at step $i$) has no decreasing strip, then the reversal at step $i-1$ removed 2 breakpoints, so we can use one reversal to create a decreasing strip; after that there exists a reversal that removes at least one breakpoint.

Theorem 1: Greedy sorts every permutation in at most $\Phi(\pi)$ reversals.
- If $\pi$ has a decreasing strip: at most $\Phi(\pi) - 1$ reversals.
- If $\pi$ has no decreasing strip: every reversal induces a decreasing strip, so after one step we can apply Lemma 3, giving at most $\Phi(\pi)$ reversals.

Corollary: Greedy is a 2-approximation algorithm.
- Every reversal removes at most 2 breakpoints, so $\mathrm{OPT}(\pi) \geq \Phi(\pi)/2 \geq \mathrm{Greedy}(\pi)/2$.
- Hence $\mathrm{Greedy}(\pi) \leq 2 \cdot \mathrm{OPT}(\pi)$.

Runtime:
- Number of steps: $O(n)$.
- At each step we need to analyze $O(n^2)$ reversals.
- Total runtime: $O(n^3)$.
- Analyzing only reversals that remove breakpoints: $O(n^2)$.
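The two bounds behind the corollary, $\Phi(\pi)/2 \leq \mathrm{OPT}(\pi) \leq \Phi(\pi)$, can be checked exhaustively for small $n$. The sketch below is not part of the talk: it computes exact reversal distances by breadth-first search from the identity (feasible only for tiny permutations) and compares them with the breakpoint count. The function names are illustrative.

```python
from collections import deque


def count_breakpoints(perm):
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def all_reversal_distances(n):
    """Exact reversal distance to the identity for every permutation of 1..n (BFS)."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for i in range(n):
            for j in range(i + 1, n):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return dist


dist = all_reversal_distances(5)
violations = [p for p, d in dist.items()
              if not (count_breakpoints(p) + 1) // 2 <= d <= count_breakpoints(p)]
print(len(dist), len(violations))
```

Because a reversal is its own inverse, the distance from the identity to a permutation equals the distance needed to sort it, so one search covers all permutations of a given size.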

Signed permutations: reversals change the sign, e.g. (1,2,3,4,5,6,7,8,9,10) → (1,2,3,-8,-7,-6,-5,-4,9,10).

Problem: given a signed permutation, find the minimum-length series of reversals that transforms it into the identity permutation.
- Polynomial algorithm (Hannenhalli & Pevzner ’95)
- It relies on several intermediary constructions
- These constructions have been simplified
- First completely elementary treatment of the problem (Bergeron ’05)
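A signed reversal differs from the unsigned one only in that every element of the reversed block also changes sign. A minimal sketch, with the illustrative name `signed_reversal` and 0-based inclusive indices as assumptions:

```python
def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each element in it."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


print(signed_reversal([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3, 7))
# [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```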

Oriented pair: a pair of integers that are consecutive in absolute value and have different signs. Example: (0,3,1,6,5,-2,4,7) has the oriented pairs (3,-2) and (1,-2).

Each oriented pair gives a reversal that creates consecutive integers:
- (3,-2): (0,3,1,6,5,-2,4,7) → (0,-5,-6,-1,-3,-2,4,7)
- (1,-2): (0,3,1,6,5,-2,4,7) → (0,3,1,2,-5,-6,4,7)

Oriented reversal: a reversal that creates consecutive integers.

Score of a reversal: the number of oriented pairs in the resulting permutation.

Algorithm 1: As long as $\pi$ has an oriented pair, choose the oriented reversal that has the maximal score.
- The output will be a permutation with positive elements: $\pi_0$ and $\pi_{n+1}$ are positive, so if there is a negative element there exists an oriented pair.

Claim 1: If Algorithm 1 applies $k$ reversals to $\pi$, yielding $\pi'$, then $d(\pi) = d(\pi') + k$.
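Algorithm 1 can be sketched as follows. The rule used to turn an oriented pair $(\pi_i, \pi_j)$ into a reversal (reverse positions $i..j-1$ when $\pi_i + \pi_j = +1$, and $i+1..j$ when $\pi_i + \pi_j = -1$) is an assumption taken from the usual elementary presentation, not from the slide text, and the score is counted as the number of oriented pairs in the resulting permutation. The permutation is assumed to include the frame elements 0 and n+1. Ties are broken by taking the first maximal candidate.

```python
def oriented_pairs(perm):
    """Index pairs (i, j), i < j, whose values are consecutive in absolute value with opposite signs."""
    return [(i, j)
            for i in range(len(perm))
            for j in range(i + 1, len(perm))
            if abs(abs(perm[i]) - abs(perm[j])) == 1
            and (perm[i] < 0) != (perm[j] < 0)]


def signed_reversal(perm, i, j):
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def oriented_reversal(perm, i, j):
    """Reversal induced by the oriented pair at positions i < j (assumed rule, see lead-in)."""
    if perm[i] + perm[j] == 1:
        return signed_reversal(perm, i, j - 1)
    return signed_reversal(perm, i + 1, j)


def score(perm):
    return len(oriented_pairs(perm))


def algorithm1(perm):
    """Apply maximal-score oriented reversals until no oriented pair is left."""
    perm = list(perm)
    steps = 0
    while True:
        pairs = oriented_pairs(perm)
        if not pairs:
            return perm, steps
        perm = max((oriented_reversal(perm, i, j) for i, j in pairs), key=score)
        steps += 1


start = [0, 3, 1, 6, 5, -2, 4, 7]
print(oriented_reversal(start, 1, 5))  # [0, -5, -6, -1, -3, -2, 4, 7]
print(algorithm1(start))               # ([0, 1, 2, 3, 4, 5, 6, 7], 5)
```

On this example the first step has scores 4 and 2 for the two oriented pairs, so the pair (3,-2) is chosen.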

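Sorting positive permutations:
- $\pi$ is a signed permutation with positive elements.
- Circular order: 0 is the successor of $n+1$.
- $\pi$ is reduced if it does not contain consecutive elements.
- Framed interval in $\pi$: $i\ \pi_{j+1}\ \pi_{j+2} \dots \pi_{j+k-1}\ i+k$ such that $i < \pi_{j+1}, \pi_{j+2}, \dots, \pi_{j+k-1} < i+k$. Example: in (0 2 5 4 3 6 1 7), the interval 2 5 4 3 6 is framed by 2 and 6.
- Hurdle: a framed interval that contains no shorter framed interval. Example: in (0 2 5 4 3 6 1 7), the framed interval 2 5 4 3 6 contains no shorter framed interval.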
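The definitions of framed interval and hurdle can be explored with the sketch below. It encodes one reading of the slide, under stated assumptions: the permutation includes 0 and n+1, positions and values are both taken modulo the length (so 0 follows n+1), only intervals with $k \geq 2$ are considered, and "contains" means containment of position sets. The names `framed_intervals` and `hurdles` are illustrative.

```python
def framed_intervals(perm):
    """Return (start position, k) for every circular framed interval with k >= 2."""
    m = len(perm)
    found = []
    for a in range(m):
        for k in range(2, m):
            # The right frame must hold the value i + k (circularly).
            if perm[(a + k) % m] != (perm[a] + k) % m:
                continue
            inside = [perm[(a + t) % m] for t in range(1, k)]
            # Every inner value must lie strictly between i and i + k (circularly).
            if all(0 < (x - perm[a]) % m < k for x in inside):
                found.append((a, k))
    return found


def hurdles(perm):
    """Framed intervals that do not properly contain another framed interval."""
    m = len(perm)
    spans = {(a, k): {(a + t) % m for t in range(k + 1)}
             for a, k in framed_intervals(perm)}
    return [iv for iv, s in spans.items()
            if not any(other < s for other in spans.values())]


print(hurdles([0, 2, 5, 4, 3, 6, 1, 7]))  # [(1, 4), (5, 4)]
```

Under these assumptions the example has two hurdles: 2 5 4 3 6, and the wrap-around interval 6 1 7 0 2.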

Idea: create oriented pairs and then apply Algorithm 1.

Operations on hurdles:
- Hurdle cutting: $i\ \pi_{j+1}\ \pi_{j+2} \dots i+1 \dots \pi_{j+k-1}\ i+k$. Example: (0 1 4 3 2 5) → (0 -3 -4 -1 2 5).
- Hurdle merging: $i \dots i+k \dots i' \dots i'+k'$. Example: (0 2 5 4 3 6 1 7).
- Simple hurdle: cutting it decreases the number of hurdles.
- Super hurdle: cutting it increases the number of hurdles.

(0 2 5 4 3 -6 1 7)

Algorithm 2:
- If $\pi$ has $2k$ hurdles: merge any two non-consecutive hurdles.
- If $\pi$ has $2k+1$ hurdles: cut one simple hurdle (if it has none, merge any two non-consecutive hurdles).

Claim 2: Algorithm 1 + Algorithm 2 optimally sort any signed permutation.

Proof of claims: breakpoint graph.

1. Each positive element $x$ → $2x-1, 2x$ and each negative element $-x$ → $2x, 2x-1$. Example: (0 -1 3 5 4 6 -2 7) → (0 2 1 5 6 9 10 7 8 11 12 4 3 13). (Figure: arcs.)
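The element-doubling step can be written as a short function. This sketch assumes the input already includes the frame elements 0 and n+1, which are mapped to 0 and 2n+1 as in the example; the name `unsigned_image` is illustrative.

```python
def unsigned_image(signed):
    """Map a framed signed permutation (0, ..., n+1) to its unsigned image on 0..2n+1."""
    n = len(signed) - 2
    image = [0]
    for x in signed[1:-1]:
        if x > 0:
            image += [2 * x - 1, 2 * x]      # positive x -> 2x-1, 2x
        else:
            image += [2 * -x, 2 * -x - 1]    # negative -x -> 2x, 2x-1
    image.append(2 * n + 1)
    return image


print(unsigned_image([0, -1, 3, 5, 4, 6, -2, 7]))
# [0, 2, 1, 5, 6, 9, 10, 7, 8, 11, 12, 4, 3, 13]
```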

Arcs are oriented if they span an odd number of elements.

Arc overlap graph:
- Vertices: the arcs of the breakpoint graph.
- Edges: join arcs that overlap.

Every oriented vertex corresponds to an oriented pair.

Fact 2: The score of an oriented reversal (oriented vertex $v$) is $T + U - O - 1$, where
- $T$ = number of oriented vertices,
- $U$ = number of unoriented vertices adjacent to $v$,
- $O$ = number of oriented vertices adjacent to $v$.

Oriented component: a component that contains an oriented vertex.

Safe reversal: a reversal that does not create new unoriented components.
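The count in Fact 2 can be checked on the earlier example (0,3,1,6,5,-2,4,7), whose two oriented reversals have scores 2 and 4. The sketch below rests on assumptions not spelled out in the transcript: arc $i$ joins the elements $2i$ and $2i+1$ of the unsigned image, an arc is oriented when the number of elements it spans (endpoints included) is odd, two arcs overlap when their position intervals intersect without one containing the other, and the score is computed as $T + U - O - 1$. All names are illustrative.

```python
def unsigned_image(signed):
    n = len(signed) - 2
    image = [0]
    for x in signed[1:-1]:
        image += [2 * x - 1, 2 * x] if x > 0 else [2 * -x, 2 * -x - 1]
    return image + [2 * n + 1]


def arcs(signed):
    """Arc i is the position interval between elements 2i and 2i+1 (assumed convention)."""
    pos = {v: p for p, v in enumerate(unsigned_image(signed))}
    n = len(signed) - 2
    return [tuple(sorted((pos[2 * i], pos[2 * i + 1]))) for i in range(n + 1)]


def is_oriented(arc):
    return (arc[1] - arc[0]) % 2 == 0  # odd number of elements, endpoints included


def overlap(a, b):
    return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]


def predicted_scores(signed):
    """Map each oriented arc index to T + U - O - 1."""
    all_arcs = arcs(signed)
    total_oriented = sum(is_oriented(a) for a in all_arcs)
    scores = {}
    for idx, v in enumerate(all_arcs):
        if is_oriented(v):
            neighbours = [a for a in all_arcs if overlap(v, a)]
            o = sum(is_oriented(a) for a in neighbours)
            u = len(neighbours) - o
            scores[idx] = total_oriented + u - o - 1
    return scores


print(predicted_scores([0, 3, 1, 6, 5, -2, 4, 7]))  # {1: 2, 2: 4}
```

Arc 1 corresponds to the oriented pair (1,-2) and arc 2 to the pair (3,-2), so the predicted values match the scores obtained by counting oriented pairs after each reversal.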

Theorem (Hannenhalli & Pevzner). Any sequence of oriented safe reversals is optimal.

Theorem. An oriented reversal of maximal score is safe.

Hence Claim 1 holds. Claim 2 is proven in a similar manner.

- J. Kececioglu and D. Sankoff. Exact and approximation algorithms for sorting by reversals, with application to genome rearrangement. 1995.
- A. Bergeron. A very elementary presentation of the Hannenhalli-Pevzner theory. 2005.
- A. Caprara. Sorting by reversals is difficult. 1997.
- S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. 1999.
