1 Computer Science Department Technion – Israel Institute of Technology Genomic Sorting with Length-Weighted Reversals Ron Y. Pinter Technion Steve Skiena.

Presentation transcript:

1 Computer Science Department Technion – Israel Institute of Technology Genomic Sorting with Length-Weighted Reversals Ron Y. Pinter Technion Steve Skiena SUNY Stony Brook

2 Genome Rearrangement
events
  – duplication
  – translocation
  – reversal (inversion)
occur primarily during reproduction
allow large-scale genomic comparisons

3 Sorting by Reversals
genome represented as a permutation on 1, 2, …, n
  – n = # homologous genes among species
assumptions
  – can identify genes
  – genes are distinct
operation: reversal of a subsequence (of genes)
  – models inversion (occurs during crossover)
one of the permutations can be 1, 2, …, n
  – appropriately relabel others

4 Example
6 reversals in our model (for f(l) = l): cost = 18

5 Our Model
unsigned
cost of reversal of a subsequence of length l is f(l)
total sorting cost (or distance) is $\sum_j f(\mathrm{length}(s_j))$
  – the $s_j$ are the reversed subsequences
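The cost model above can be illustrated with a minimal sketch. The function name `sorting_cost`, the half-open `(i, j)` index convention, and the sample permutation are assumptions made for illustration, not taken from the slides.

```python
def sorting_cost(perm, reversals, f=lambda l: l):
    """Apply each reversal (i, j), meaning "reverse perm[i:j]", in order.

    Returns the resulting list and the total cost, i.e. the sum of
    f(length) over all reversed subsequences. The default f(l) = l is the
    additive cost function; pass another f to try other models.
    """
    p = list(perm)
    total = 0
    for i, j in reversals:
        p[i:j] = p[i:j][::-1]   # reverse the subsequence in place
        total += f(j - i)       # charge f(length of the reversed block)
    return p, total


# Hypothetical example: two reversals of lengths 3 and 2.
result, cost = sorting_cost([3, 2, 1, 5, 4], [(0, 3), (3, 5)])
print(result, cost)  # [1, 2, 3, 4, 5] 5
```

Under unit cost, `f=lambda l: 1`, the same two reversals would cost 2 instead of 5.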

6 Cost Functions
additive: f(x+y) = f(x) + f(y)
subadditive: f(x+y) < f(x) + f(y)
superadditive: f(x+y) > f(x) + f(y)
other
  – e.g. bitonic f(l)
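As a rough illustration of these families, the sketch below checks a cost function numerically against the three (in)equalities on a finite range of lengths. The helper name `classify` and the bound `max_len` are assumptions; a finite check like this can only suggest which family a function belongs to, it does not prove it.

```python
def classify(f, max_len=20):
    """Label f by testing every pair of lengths 1 <= x, y < max_len."""
    pairs = [(x, y) for x in range(1, max_len) for y in range(1, max_len)]
    if all(f(x + y) == f(x) + f(y) for x, y in pairs):
        return "additive"
    if all(f(x + y) < f(x) + f(y) for x, y in pairs):
        return "subadditive"
    if all(f(x + y) > f(x) + f(y) for x, y in pairs):
        return "superadditive"
    return "other"


print(classify(lambda l: l))      # additive
print(classify(lambda l: 1))      # subadditive (unit cost)
print(classify(lambda l: l * l))  # superadditive
```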

7 Problems
algorithm to sort any permutation
  – worst-case min cost
approximate min cost for a given permutation

8 Extremal Costs
highly subadditive: e.g. unit cost, f(l) = 1
  – NP-complete [Caprara, ’97]
  – series of approximation ratios: 2, 1.75, …
highly superadditive: $f(l) > l^2$
  – essentially bubblesort
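To make the bubblesort remark concrete, here is a small sketch that sorts using only reversals of length 2 (adjacent swaps) and reports the total cost. The function name and the default $f(l) = l^2$ are assumptions chosen as a representative superadditive cost; the slides do not give this code.

```python
def bubble_sort_by_reversals(perm, f=lambda l: l * l):
    """Sort with length-2 reversals only; return (sorted list, total cost).

    Every swap of an adjacent out-of-order pair is a reversal of length 2,
    so the total cost is (number of swaps) * f(2).
    """
    p = list(perm)
    swaps = 0
    for end in range(len(p) - 1, 0, -1):
        for i in range(end):
            if p[i] > p[i + 1]:
                p[i:i + 2] = p[i:i + 2][::-1]  # reversal of length 2
                swaps += 1
    return p, swaps * f(2)


print(bubble_sort_by_reversals([3, 1, 2]))  # ([1, 2, 3], 8)
```

The intuition is that when long reversals are expensive enough, many short reversals are cheaper than a few long ones.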

9 Our Results
additive cost function
  – specifically f(l) = l
QuickSort-like algorithm for worst-case
  – complexity: $O(n \lg^2 n)$
min cost approximation ratio of $O(\lg^2 n)$

10 MedianEject(a,b)
find r maximal blocks of wrong-sided elements with respect to median
for lg r do:
  flip every other pair of blocks of wrong-sided and adjacent blocks
move wrong-sided blocks to median boundary
reverse left and right blocks

11 Sample Run
complexity: O((b-a) lg r)

12 ReversalSort(a,b)
MedianEject(a,b);
ReversalSort(a, (a+b)/2);
ReversalSort((a+b)/2, b);
Complexity
$T(n) = 2\,T(n/2) + O(f(n) \lg n)$
$O(f(n) \lg^2 n) = O(n \lg^2 n)$ for f(n) ~ n
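The recursive structure can be sketched in runnable form. Note that `partition` below is a simplified stand-in, not the slides' MedianEject: it partitions each half around the median recursively and then fixes the middle with a single reversal, which also costs O(n lg n) per call under f(l) = l. All function names are hypothetical, and the elements are assumed distinct, as in a permutation.

```python
def reverse(p, i, j):
    """Reverse p[i:j] in place; return its cost under f(l) = l."""
    p[i:j] = p[i:j][::-1]
    return j - i


def partition(p, a, b, pivot):
    """Move values < pivot in p[a:b] ahead of the rest, using reversals only.

    Returns (split index, cost). Stand-in for MedianEject, not the same step.
    """
    if b - a <= 1:
        return (b if b > a and p[a] < pivot else a), 0
    mid = (a + b) // 2
    s1, c1 = partition(p, a, mid, pivot)   # p[a:s1] low, p[s1:mid] high
    s2, c2 = partition(p, mid, b, pivot)   # p[mid:s2] low, p[s2:b] high
    cost = c1 + c2
    if s1 < mid < s2:
        # One reversal turns "high block, low block" into "low, high".
        cost += reverse(p, s1, s2)
    return s1 + (s2 - mid), cost


def reversal_sort(p, a=0, b=None):
    """QuickSort-like sort by reversals; returns the total cost."""
    if b is None:
        b = len(p)
    if b - a < 2:
        return 0
    pivot = sorted(p[a:b])[(b - a) // 2]   # median value of the segment
    s, cost = partition(p, a, b, pivot)
    return cost + reversal_sort(p, a, s) + reversal_sort(p, s, b)


p = [3, 1, 4, 2]
print(reversal_sort(p), p)  # 6 [1, 2, 3, 4]
```

In this sketch each recursion level of `reversal_sort` spends at most n lg n on partitioning and there are about lg n levels, which matches the shape of the recurrence above.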

13 Algorithmic Improvements
I. simplify “short” phases
II. merge 2 last steps of MedianEject when possible (2p+q vs. 3p+q)
III. apply II recursively
[figure: blocks labeled p, q, p]

14 Approximation Ratio
M(p) is the maximal total distance between pairs of out-of-order elements
Lemma 4: min cost is $\Omega(M(p))$
but
  Lemma 6: # of out-of-order elements $< 3 \cdot M(p)$
  + Lemma 7: MedianEject touches only elements within linear range from out-of-order elements
yields:
  – each round of MedianEject takes $O(M(p) \cdot \lg^2 n)$
  – ReversalSort costs $O(M(p) \cdot \lg^2 n)$
  – ReversalSort is at most $O(\lg^2 n)$ times optimal

15 Bioinformatic “Validation”
use our cost (= distance) to build phylogenetic trees
4 plants (chloroplastic genes)
  – taxa: Cyanophora, Cyanidium, Guillardia, Porphyra
consistent with [Martin et al., PNAS Sept ‘02]
work in progress [M. Shoham]

16 Open Problems: Algorithmic
weighted genes
tighter approximation ratio
  – close to O(lg n)
  – can get to constant?
other cost functions (incl. bitonic)
the signed case

17 Open Problems: Modeling
chromosomal ordering
what is the right cost function?
  – consider $\mathrm{cost}(l) = l^d$
combine with constant-based models
  – restricted regions
  – “undesired” reversal sequences
deal with duplication and translocation events