Genomic Sorting with Length-Weighted Reversals. Ron Y. Pinter (Technion) and Steve Skiena. Computer Science Department, Technion – Israel Institute of Technology.


1 Genomic Sorting with Length-Weighted Reversals
Ron Y. Pinter (Technion), Steve Skiena (SUNY Stony Brook)
Computer Science Department, Technion – Israel Institute of Technology

2 Genome Rearrangement
events
– duplication
– translocation
– reversal (inversion)
occur primarily during reproduction
allow large-scale genomic comparisons

3 Sorting by Reversals
genome represented as a permutation on 1, 2, …, n
– n = # homologous genes among species
assumptions
– can identify genes
– genes are distinct
operation: reversal of a subsequence (of genes)
– models inversion (occurs during crossover)
one of the permutations can be 1, 2, …, n
– appropriately relabel others

4 Example
6 reversals in our model (for f(l) = l): cost = 18
4 3 2 8 7 1 5 6 11 10 9
4 3 2 1 7 8 5 6 9 10 11
1 2 3 4 8 7 6 5 9 10 11
1 2 3 4 5 6 7 8 9 10 11
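The example can be checked mechanically. The sketch below is only one reading of the slide: it assumes the starting permutation is 4 3 2 8 7 1 5 6 11 10 9 and that the six reversals are the segments listed in `steps` (0-based, end-exclusive indices chosen here so that the intermediate rows match the slide). The helper name `apply_reversals` is hypothetical.

```python
def apply_reversals(perm, reversals, f=lambda l: l):
    """Apply each (start, end) reversal (0-based, end exclusive) and sum f(length)."""
    perm = list(perm)
    total = 0
    for start, end in reversals:
        perm[start:end] = perm[start:end][::-1]
        total += f(end - start)
    return perm, total


# Assumed decomposition of the slide's three rows into six reversals.
start = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
steps = [
    (3, 6),   # 8 7 1   -> 1 7 8    (length 3)
    (8, 11),  # 11 10 9 -> 9 10 11  (length 3)
    (0, 4),   # 4 3 2 1 -> 1 2 3 4  (length 4)
    (4, 6),   # 7 8     -> 8 7      (length 2)
    (6, 8),   # 5 6     -> 6 5      (length 2)
    (4, 8),   # 8 7 6 5 -> 5 6 7 8  (length 4)
]
final, cost = apply_reversals(start, steps)
print(final, cost)
```

With f(l) = l the lengths 3 + 3 + 4 + 2 + 2 + 4 add up to the stated cost of 18; passing a different `f` re-prices the same reversal sequence.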

5 Our Model
unsigned
cost of reversal of subsequence of length l is f(l)
total sorting cost (or distance) is $\sum_j f(\mathrm{length}(s_j))$
– $s_j$ are the reversed subsequences

6 Cost Functions
additive: f(x+y) = f(x) + f(y)
subadditive: f(x+y) < f(x) + f(y)
superadditive: f(x+y) > f(x) + f(y)
other
– e.g. bitonic f(l)
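The three classes can be told apart numerically on small lengths. This is a minimal sketch, assuming integer-valued cost functions and a finite test range; the function name `classify` and the `max_len` bound are illustrative choices, and a finite check is evidence, not a proof.

```python
def classify(f, max_len=30):
    """Compare f(x+y) with f(x)+f(y) for all x, y >= 1 with x + y <= max_len."""
    signs = set()
    for x in range(1, max_len):
        for y in range(1, max_len - x + 1):
            lhs, rhs = f(x + y), f(x) + f(y)
            signs.add((lhs > rhs) - (lhs < rhs))  # -1, 0 or +1
    if signs == {0}:
        return "additive"
    if signs == {-1}:
        return "subadditive"
    if signs == {1}:
        return "superadditive"
    return "other"


print(classify(lambda l: l))      # linear cost
print(classify(lambda l: 1))      # unit cost
print(classify(lambda l: l * l))  # quadratic cost
```

A function that is subadditive on some pairs and superadditive on others falls into the "other" bucket.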

7 Problems
algorithm to sort any permutation
– worst-case min cost
approximate min cost for a given permutation

8 Extremal Costs
highly subadditive: e.g. unit cost, f(l) = 1
– NP-complete [Caprara, ’97]
– series of approximation ratios: 2, 1.75, 1.375
highly superadditive: $f(l) > l^2$
– essentially bubblesort
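To illustrate the superadditive extreme: when long reversals are very expensive, sorting with length-2 reversals (adjacent swaps) is the natural strategy, which is plain bubblesort. The sketch below assumes f(l) = l^3 as one example of a cost with f(l) > l^2; the function name `bubble_sort_cost` and the sample permutation are illustrative.

```python
def bubble_sort_cost(perm, f=lambda l: l ** 3):
    """Sort with adjacent swaps only; each swap is a reversal of length 2."""
    perm = list(perm)
    swaps = 0
    for end in range(len(perm) - 1, 0, -1):
        for i in range(end):
            if perm[i] > perm[i + 1]:
                perm[i], perm[i + 1] = perm[i + 1], perm[i]
                swaps += 1
    return perm, swaps, swaps * f(2)


result, swaps, cost = bubble_sort_cost([4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9])
print(result, swaps, cost)
```

The number of swaps equals the number of inversions of the input (16 for this permutation), so the total cost is 16 · f(2).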

9 Our Results
additive cost function
– specifically f(l) = l
QuickSort-like algorithm for worst-case
– complexity: $O(n \lg^2 n)$
min cost approximation ratio of $O(\lg^2 n)$

10 MedianEject(a,b)
find r maximal blocks of wrong-sided elements with respect to median
for lg r do:
– flip every other pair of blocks of wrong-sided and adjacent blocks
move wrong-sided blocks to median boundary
reverse left and right blocks

11 Sample Run
complexity: O((b-a) lg r)

12 ReversalSort(a,b)
MedianEject(a,b);
ReversalSort(a, …);
ReversalSort(…, b);
Complexity
$T(n) = 2 \cdot T(n/2) + O(f(n) \lg n)$
$O(f(n) \lg^2 n) = O(n \lg^2 n)$ for $f(n) \sim n$
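The QuickSort-like structure can be sketched in runnable form. The code below is not the authors' MedianEject: it swaps in a simpler divide-and-conquer partition (split each half recursively, then fix the middle with one reversal), which also costs O(n lg n) per call under f(l) = l and so gives the same recurrence shape. All names (`reverse`, `partition`, `reversal_sort`) and the half-open index convention are assumptions made for illustration.

```python
def reverse(perm, i, j, cost):
    """Reverse perm[i:j] in place and charge f(l) = l to cost[0]."""
    perm[i:j] = perm[i:j][::-1]
    cost[0] += j - i


def partition(perm, lo, hi, pivot, cost):
    """Using reversals only, move elements <= pivot to the front of perm[lo:hi].

    Returns k such that perm[lo:k] <= pivot < perm[k:hi].
    """
    if hi - lo == 0:
        return lo
    if hi - lo == 1:
        return hi if perm[lo] <= pivot else lo
    mid = (lo + hi) // 2
    k1 = partition(perm, lo, mid, pivot, cost)  # small | large in left half
    k2 = partition(perm, mid, hi, pivot, cost)  # small | large in right half
    if k1 < mid < k2:
        # Middle looks like large...large small...small: one reversal fixes it.
        reverse(perm, k1, k2, cost)
    return k1 + (k2 - mid)


def reversal_sort(perm, lo, hi, cost):
    """Partition around the median, then recurse on both sides."""
    if hi - lo <= 1:
        return
    pivot = sorted(perm[lo:hi])[(hi - lo - 1) // 2]  # median value (lookup is free)
    k = partition(perm, lo, hi, pivot, cost)
    reversal_sort(perm, lo, k, cost)
    reversal_sort(perm, k, hi, cost)


perm = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
cost = [0]
reversal_sort(perm, 0, len(perm), cost)
print(perm, cost[0])
```

Each `partition` call satisfies P(n) = 2·P(n/2) + O(n), hence O(n lg n), and plugging that into T(n) = 2·T(n/2) + O(n lg n) gives O(n lg^2 n) total reversal cost for this sketch as well.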

13 Algorithmic Improvements
I simplify “short” phases
II merge 2 last steps of MedianEject when possible (2p+q vs. 3p+q)
III apply II recursively
(figure: adjacent blocks of lengths p, q, p)

14 Approximation Ratio
M(p) is the maximal total distance between pairs of out-of-order elements
Lemma 4: min cost is $\Omega(M(p))$
but
Lemma 6: # of out-of-order elts < $3 \cdot M(p)$ + …
Lemma 7: MedianEject touches only elements within linear range from out-of-order elements
yields:
– each round of MedianEject takes $O(M(p) \cdot \lg^2 n)$
– ReversalSort costs $O(M(p) \cdot \lg^2 n)$
– ReversalSort is at most $O(\lg^2 n)$ times optimal
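The last step is a division of the upper bound by the lower bound. As a sketch, writing $\mathrm{OPT}(p)$ for the minimum sorting cost of permutation p (a symbol introduced here, not on the slide) and assuming the Lemma 4 bound is a lower bound of order M(p):

```latex
\[
\frac{\mathrm{cost}\bigl(\mathrm{ReversalSort}(p)\bigr)}{\mathrm{OPT}(p)}
\;\le\;
\frac{c_1 \, M(p) \, \lg^2 n}{c_2 \, M(p)}
\;=\;
\frac{c_1}{c_2} \, \lg^2 n
\;=\;
O(\lg^2 n),
\]
```

where $c_1$ and $c_2$ are the constants hidden in the upper bound on ReversalSort and in the lower bound on the minimum cost, respectively.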

15 Bioinformatic “Validation”
use our cost (= distance) to build phylogenetic trees
4 plants (chloroplastic genes): Cyanophora, Cyanidium, Guillardia, Porphyra
consistent with [Martin et al., PNAS Sept ‘02]
work in progress [M. Shoham]

16 Open Problems: Algorithmic
weighted genes
tighter approximation ratio
– close to O(lg n)
– can get to constant?
other cost functions (incl. bitonic)
the signed case

17 Open Problems: Modeling
chromosomal ordering
what is the right cost function?
– consider $\mathrm{cost}(l) = l^d$
combine with constant-based models
– restricted regions
– “undesired” reversal sequences
deal with duplication and translocation events

