Genome Rearrangement SORTING BY REVERSALS Ankur Jain Hoda Mokhtar CS290I – SPRING 2003.

Comparative Genomics
Comparative genomics is the practice of analyzing and comparing the genetic material of different species in order to study evolution, gene function and inherited diseases. Chromosome breakage and mistakes in repair, along with a number of other processes, give rise to changes in gene order. These changes have important consequences for the evolution of species.

Problem Definition
During biological evolution, inter- and intra-chromosomal exchanges of chromosomal fragments disrupt the order of genes on a chromosome. The genome rearrangement approach uses combinatorial optimization techniques to infer a sequence of rearrangement events that accounts for the differences among the genomes.

Outline
- Problem definition
- Genome comparison
- Possible chromosomal changes
- Sorting by reversals: previous work, definitions, duality theorem
- Our technique: bit vector method
- Experimental results: synthetic datasets, real datasets, breakpoints technique
- Conclusions and future work

Genome Comparison
In the late 1980s, Jeffrey Palmer and his colleagues discovered a remarkable and novel pattern of evolutionary change in plant organelles. They compared the mitochondrial genomes of cabbage and turnip, which are very closely related: the molecules are almost identical in gene sequence but differ dramatically in gene order {Sridhar, Pevzner 1995}. This discovery and many subsequent studies showed that genome rearrangements are a common mode of molecular evolution.

Cabbage and Turnip Gene orientation

Single Chromosome Operations
- Reversal: a section of a chromosome is excised, reversed in orientation, and re-inserted (abc1c2c3c4de -> ab-c4-c3-c2-c1de).
- Transposition: a section of a chromosome is excised and inserted at a new position in the chromosome, without changing orientation (abcd -> cdab).
- Inverted transposition: exactly like transposition, except that the transposed segment changes orientation (abcd -> -d-cab).
- Gene duplication: a section of a chromosome is duplicated, so that multiple copies exist of every gene in that section (abc -> abcb, abc -> abbc).
- Gene loss: a section of a chromosome is excised and lost (abc -> ac).
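A minimal Python sketch of these five operations, assuming a chromosome is a list of gene names in which a leading "-" marks reversed orientation (a representation chosen for illustration, not taken from the slides). Indices are Python-style half-open slices, and all function names are hypothetical.

```python
def flip(gene):
    """Switch the orientation of one gene: "c1" <-> "-c1"."""
    return gene[1:] if gene.startswith("-") else "-" + gene


def reversal(chrom, i, j):
    """Excise chrom[i:j], reverse its order and orientation, re-insert it."""
    return chrom[:i] + [flip(g) for g in reversed(chrom[i:j])] + chrom[j:]


def transposition(chrom, i, j, k):
    """Swap the adjacent blocks chrom[i:j] and chrom[j:k]; orientation unchanged."""
    return chrom[:i] + chrom[j:k] + chrom[i:j] + chrom[k:]


def inverted_transposition(chrom, i, j, k):
    """Like transposition, but the moved block chrom[j:k] is also reversed."""
    return chrom[:i] + reversal(chrom[j:k], 0, k - j) + chrom[i:j] + chrom[k:]


def tandem_duplication(chrom, i, j):
    """Insert a second copy of chrom[i:j] right after the original."""
    return chrom[:j] + chrom[i:j] + chrom[j:]


def deletion(chrom, i, j):
    """Excise chrom[i:j] and lose it (gene loss)."""
    return chrom[:i] + chrom[j:]


print(reversal(["a", "b", "c1", "c2", "c3", "c4", "d", "e"], 2, 6))
# ['a', 'b', '-c4', '-c3', '-c2', '-c1', 'd', 'e']
print(inverted_transposition(list("abcd"), 0, 2, 4))
# ['-d', '-c', 'a', 'b']
```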

Operations on Two Chromosomes
- Translocation: the end of one chromosome is broken off and attached to the end of another chromosome.
- Fusion: two chromosomes merge.
- Fission: one chromosome splits into two chromosomes.
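The same list representation extends to two chromosomes. In this sketch (hypothetical helpers, genes as plain list items), translocation is modelled as a reciprocal exchange of chromosome tails; that is an assumption, since the slide only says that the broken end is attached to another chromosome.

```python
def translocation(c1, c2, i, j):
    """Reciprocal translocation: exchange the tails c1[i:] and c2[j:]."""
    return c1[:i] + c2[j:], c2[:j] + c1[i:]


def fusion(c1, c2):
    """Two chromosomes merge end to end into one."""
    return c1 + c2


def fission(c, i):
    """One chromosome splits into c[:i] and c[i:]."""
    return c[:i], c[i:]


print(translocation([1, 2, 3], [4, 5, 6], 2, 1))  # ([1, 2, 5, 6], [4, 3])
```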

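Genomic Sorting Problem
Given genomes $\pi$ and $\sigma$, the genomic sorting problem is to find a series of reversals $\rho_1, \rho_2, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = \sigma$ and $t$ is minimal. We call $t$ the genomic distance between $\pi$ and $\sigma$.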

Sorting by Reversals
Genome rearrangements can be modeled by the combinatorial problem of sorting by reversals.
[Figure: "Break and Invert", a reversal applied to a DNA sequence]

Sorting by Reversals (Cont.)
- Minimum Sorting by Reversals: given a permutation $\pi$, what is the shortest sequence $\rho_1, \rho_2, \ldots, \rho_t$ of reversals that sorts $\pi$? Its complexity was long open; it is NP-hard {Caprara '97}.
- Minimum Signed Sorting by Reversals: given a signed permutation $\pi$, what is the shortest sequence $\rho_1, \rho_2, \ldots, \rho_t$ of reversals that sorts $\pi$? Solvable in polynomial time.
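To make the signed problem concrete, the sketch below computes the exact reversal distance of a small signed permutation (a tuple of nonzero integers) by breadth-first search over all reversals. This brute force is exponential in n and only illustrates the definition; it is not the polynomial-time Hannenhalli-Pevzner algorithm.

```python
from collections import deque


def signed_reversal_distance(perm):
    """Exact distance from a signed permutation (tuple) to the identity, by BFS."""
    n = len(perm)
    target = tuple(range(1, n + 1))
    dist = {perm: 0}
    queue = deque([perm])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(n):
            for j in range(i + 1, n + 1):
                # reverse the segment p[i:j] and flip the sign of each element in it
                q = p[:i] + tuple(-x for x in reversed(p[i:j])) + p[j:]
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("every signed permutation can be sorted; not reached")


print(signed_reversal_distance((2, 1)))  # 3
```

Only very small n (say up to 6 or 7) is practical, since the search space has $2^n n!$ states.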

Sorting of Signed Permutations: Previous Work
- Transforming Cabbage into Turnip {Hannenhalli, S., and Pevzner, P. '95}: polynomial algorithm for sorting signed permutations by reversals.
- A Very Elementary Presentation of the Hannenhalli-Pevzner Theory {A. Bergeron '01}: polynomial algorithm for sorting signed permutations, efficiently implemented using bit vectors.
- Experiments in Computing Sequences of Reversals {A. Bergeron and F. Strasbourg '01}: polynomial algorithm for sorting signed permutations.
- Fast Sorting by Reversal {Berman, P., Hannenhalli, S. '96}: exploits a few combinatorial properties of the cycle graph of a permutation to obtain a polynomial algorithm.
- A Faster and Simpler Algorithm for Sorting Signed Permutations by Reversals {Kaplan, H., Shamir, R., and Tarjan, R. '99}: $O(n^2)$ algorithm using hurdles, cycles and fortresses.
- A Linear-Time Algorithm for Computing Inversion Distance between Signed Permutations with an Experimental Study {Bader, Moret, and Yan '01}: computes the reversal distance (without actually sorting) in $O(n)$ time, finding the connected components with a stack rather than Union-Find (GRAPPA program; see also {Hannenhalli-Pevzner '96}).

What is a Permutation?
- Permutation $\pi$: an ordered arrangement of the set $\{1, 2, \ldots, n\}$.
- Signed permutation $\pi$: a permutation whose elements are oriented (carry a sign); a reversal switches the orientation of every element in the reversed segment.

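Let $i \sim j$ if $|i - j| = 1$. Extend the permutation $\pi = \pi_1 \pi_2 \ldots \pi_n$ by adding $\pi_0 = 0$ and $\pi_{n+1} = n + 1$. We call a pair of elements $(\pi_i, \pi_{i+1})$, $0 \le i \le n$, of $\pi$ an adjacency if $\pi_i \sim \pi_{i+1}$, and a breakpoint if $\pi_i \not\sim \pi_{i+1}$.
[Figure: a framed permutation with $\pi_0 = 0$ and $\pi_{n+1} = n + 1$, marking adjacencies ($\sim$) and a breakpoint]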
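Counting breakpoints follows directly from this definition. A minimal sketch for unsigned permutations given as Python lists (the function name is hypothetical):

```python
def breakpoints(perm):
    """Count breakpoints of an unsigned permutation of 1..n.

    The permutation is framed by pi_0 = 0 and pi_{n+1} = n + 1; a consecutive
    pair is an adjacency when the values differ by exactly 1, else a breakpoint.
    """
    framed = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


print(breakpoints([3, 2, 1]))  # 2: (0, 3) and (1, 4) are breakpoints
```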

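What is a Breakpoint Graph?
The breakpoint graph of a permutation $\pi$ is an edge-colored graph $G(\pi)$ with $n + 2$ vertices $\{\pi_0, \pi_1, \ldots, \pi_n, \pi_{n+1}\}$. We join vertices $\pi_i$ and $\pi_{i+1}$ by a black edge for $0 \le i \le n$, and we join vertices $\pi_i$ and $\pi_j$ by a gray edge if $\pi_i \sim \pi_j$.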

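Breakpoint Graph: Signed Case
For a signed permutation, each element $x$ is replaced by the pair $2x - 1, 2x$ if $x$ is positive and by $2|x|, 2|x| - 1$ if it is negative; the result is framed by $0$ and $2n + 1$, giving $2n + 2$ vertices.
- Straight (black) edges join every other pair of consecutive elements.
- Curved (gray) edges join every other pair of consecutive integers ($2i$ and $2i + 1$).
Every vertex has exactly one black and one gray edge, so every connected component of the graph is a cycle.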
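A sketch of the signed construction, assuming $\pi$ is a Python list of nonzero integers: it builds the $2n + 2$ vertex image, joins image positions $2i, 2i + 1$ by black edges and values $2i, 2i + 1$ by gray edges, and counts the alternating cycles $c(\pi)$. Because $h(\pi) \ge 0$, the duality theorem implies $d(\pi) \ge n + 1 - c(\pi)$, which the second (hypothetical) helper returns.

```python
def breakpoint_graph_cycles(perm):
    """Number of alternating cycles c(pi) in the signed breakpoint graph."""
    n = len(perm)
    image = [0]
    for x in perm:
        # +x -> (2x - 1, 2x); -x -> (2|x|, 2|x| - 1)
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    # black edges join the elements at image positions 2i and 2i + 1
    black = [0] * (2 * n + 2)
    for k in range(0, 2 * n + 2, 2):
        a, b = image[k], image[k + 1]
        black[a], black[b] = b, a
    seen = [False] * (2 * n + 2)
    cycles = 0
    for start in range(2 * n + 2):
        if seen[start]:
            continue
        cycles += 1
        v = start
        while not seen[v]:
            seen[v] = True
            w = black[v]   # follow the black edge
            seen[w] = True
            v = w ^ 1      # then the gray edge {2i, 2i + 1}
    return cycles


def cycle_lower_bound(perm):
    """Lower bound d(pi) >= n + 1 - c(pi)."""
    return len(perm) + 1 - breakpoint_graph_cycles(perm)


print(breakpoint_graph_cycles([1, 2, 3]))  # 4: the identity has n + 1 cycles
```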

Correlation between Breakpoints and Reversal Distance
A correlation exists between the reversal distance and the number of breakpoints: sorting by reversals corresponds to eliminating breakpoints. Every reversal can eliminate at most 2 breakpoints {Shamir, 95}, so the reversal distance satisfies $d(\pi) \ge b(\pi)/2$, where $b(\pi)$ is the number of breakpoints of $\pi$.
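The claim can be checked exhaustively for small n. The sketch below (unsigned reversals, hypothetical helper names) confirms that no reversal changes the breakpoint count by more than 2, and compares the true distances, found by BFS from the identity, against the bound $\lceil b(\pi)/2 \rceil$.

```python
from collections import deque
from itertools import permutations


def breakpoints(perm):
    framed = [0, *perm, len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if abs(a - b) != 1)


def reversals(perm):
    """Every tuple reachable from perm by one unsigned reversal of length >= 2."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 2, n + 1):
            yield perm[:i] + perm[i:j][::-1] + perm[j:]


def max_breakpoint_change(n):
    return max(abs(breakpoints(q) - breakpoints(p))
               for p in permutations(range(1, n + 1)) for q in reversals(p))


def reversal_distances(n):
    """BFS from the identity; reversals are involutions, so this yields d(pi)."""
    start = tuple(range(1, n + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        for q in reversals(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


dist = reversal_distances(5)
print(all(d >= (breakpoints(p) + 1) // 2 for p, d in dist.items()))  # True
```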

Hurdles
- Hurdle: an unoriented component whose elements are consecutive.
- Simple hurdle: a hurdle whose deletion decreases the number of hurdles.
- Superhurdle: a hurdle that is not simple.

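Duality Theorem for Sorting Signed Permutations
Hannenhalli and Pevzner: for every signed permutation $\pi$,
$$d(\pi) = \begin{cases} n + 1 - c(\pi) + h(\pi) + 1 & \text{if } \pi \text{ is a fortress,} \\ n + 1 - c(\pi) + h(\pi) & \text{otherwise,} \end{cases}$$
where $c(\pi)$ is the number of cycles in the breakpoint graph, $h(\pi)$ is the number of hurdles, and a fortress is a permutation with an odd number of hurdles, all of which are superhurdles.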
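As a hand-worked illustration of the theorem (our own example, using the hurdle definitions above), take $\pi = (+2\ {+1})$, so $n = 2$:

```latex
% Unsigned image: (0, 3, 4, 1, 2, 5); black edges {0,3}, {4,1}, {2,5};
% gray edges {0,1}, {2,3}, {4,5}  =>  a single alternating cycle, c(pi) = 1.
% No two consecutive elements have opposite signs, so the one component is
% unoriented: h(pi) = 1. Deleting it leaves no hurdles, so it is simple and
% pi is not a fortress.
\begin{align*}
d(\pi) &= n + 1 - c(\pi) + h(\pi) = 2 + 1 - 1 + 1 = 3, \\
(+2\ {+1}) &\to (-2\ {+1}) \to (-1\ {+2}) \to (+1\ {+2}).
\end{align*}
```

The cycle bound alone would give 2; the hurdle accounts for the extra reversal.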

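Safe Reversal
A reversal $\rho$ is safe if it decreases the reversal distance, that is, if $d(\pi\rho) = d(\pi) - 1$.
[Figure: example breakpoint graphs with c = 3, h = 1 and c = 5, h = 2]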

Our Approach
Finding hurdles and fortresses in a graph is difficult and expensive {Kaplan, H., Shamir, R., and Tarjan, R. '99}. We use an oriented sort to remove the oriented components of the graph and then apply the breakpoint approach to perform the remaining reversals. We used the bit-vector approach to perform the oriented sort.

Oriented Sort
Among the several candidate reversals, choose a safe one, that is, a reversal that decreases the reversal distance.
Theorem: the reversal that maximizes the number of oriented vertices is safe {A. Bergeron '01}.

Basic Sorting: Oriented Pairs
An oriented pair is a pair of elements whose absolute values are consecutive integers and whose signs are opposite.
Example: in (0 3 1 6 5 -2 4 7), the oriented pairs are (1, -2) and (3, -2).
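A minimal sketch of the definition, assuming a framed permutation (0 ... n+1) stored as a Python list, with 0 treated as positive; each pair is reported in left-to-right position order, and the function name is hypothetical.

```python
def oriented_pairs(perm):
    """Oriented pairs of a framed signed permutation, in position order."""
    pos = {abs(x): k for k, x in enumerate(perm)}
    pairs = []
    for v in range(len(perm) - 1):        # candidate values v and v + 1
        a, b = perm[pos[v]], perm[pos[v + 1]]
        if (a >= 0) != (b >= 0):          # opposite signs
            pairs.append((a, b) if pos[v] < pos[v + 1] else (b, a))
    return pairs


print(oriented_pairs([0, 3, 1, 6, 5, -2, 4, 7]))  # [(1, -2), (3, -2)]
```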

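Reversal Score
The score of a reversal is the number of oriented pairs in the permutation that results from it.
Example: in (0 3 1 6 5 -2 4 7), the pair (3, -2) induces a reversal giving (0 -5 -6 -1 -3 -2 4 7), with score 4; the pair (1, -2) induces a reversal giving (0 3 1 2 -5 -6 4 7), with score 2.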

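Algorithm
As long as $\pi$ has an oriented pair, choose the oriented reversal that has maximal score. Example:
(0 3 1 6 5 -2 4 7)
(3, -2): (0 -5 -6 -1 -3 -2 4 7)
(-3, 4): (0 -5 -6 -1 2 3 4 7)
(-1, 2): (0 -5 -6 1 2 3 4 7)
(-6, 7): (0 -5 -4 -3 -2 -1 6 7)
(-5, 6): (0 1 2 3 4 5 6 7)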
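The greedy loop can be sketched in Python as follows. Assumptions: the permutation is framed by 0 and n + 1, induced reversals follow the rule $\rho(i, j-1)$ if $\pi_i + \pi_j = +1$ and $\rho(i+1, j)$ if $\pi_i + \pi_j = -1$ (for $i < j$), and ties in score go to the first candidate, since the slides do not specify tie-breaking. All names are hypothetical.

```python
def oriented_positions(perm):
    """Position pairs (i, j), i < j, of elements forming an oriented pair."""
    pos = {abs(x): k for k, x in enumerate(perm)}
    result = []
    for v in range(len(perm) - 1):
        i, j = sorted((pos[v], pos[v + 1]))
        if (perm[i] >= 0) != (perm[j] >= 0):
            result.append((i, j))
    return result


def induced_reversal(perm, i, j):
    """Apply the oriented reversal induced by the pair at positions i < j."""
    lo, hi = (i, j - 1) if perm[i] + perm[j] == 1 else (i + 1, j)
    return perm[:lo] + [-x for x in reversed(perm[lo:hi + 1])] + perm[hi + 1:]


def score(perm):
    """Number of oriented pairs in perm."""
    return len(oriented_positions(perm))


def greedy_oriented_sort(perm):
    """While an oriented pair exists, apply the induced reversal of maximal score."""
    steps = []
    while True:
        candidates = [induced_reversal(perm, i, j) for i, j in oriented_positions(perm)]
        if not candidates:
            return perm, steps
        perm = max(candidates, key=score)  # max() keeps the first of equal scores
        steps.append(perm)


result, steps = greedy_oriented_sort([0, 3, 1, 6, 5, -2, 4, 7])
print(result, len(steps))  # [0, 1, 2, 3, 4, 5, 6, 7] 5
```

On (0 2 1 3), which has no oriented pair, the loop stops immediately with the permutation unsorted: this is the hurdle case that the breakpoint step has to finish.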

Oriented Edge
Let $(\pi_i, \pi_j)$ be a gray edge incident to the black edges $(\pi_k, \pi_i)$ and $(\pi_j, \pi_l)$. Then $(\pi_i, \pi_j)$ is oriented if and only if $i - k = j - l$.
Example: the edge is oriented (it spans an odd number, 3, of vertices); with i = 20, j = 21, k = 22, l = 23, we get i - k = -2 = j - l. {Bergeron; Pevzner}

Oriented Reversals
If $(\pi_i, \pi_j)$, with $i < j$, is an oriented pair, it induces the reversal $\rho(i, j-1)$ if $\pi_i + \pi_j = +1$, and $\rho(i+1, j)$ if $\pi_i + \pi_j = -1$. Reversals that create consecutive integers are always induced by oriented pairs; such reversals are called oriented reversals.
Example: the pair (1, -2) induces the reversal (0 3 1 6 5 -2 4 7) -> (0 3 1 2 -5 -6 4 7).
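The induction rule can be sketched as a hypothetical helper that takes the pair by value; $\rho(x, y)$ is read as "reverse positions x..y inclusive and flip their signs", and the permutation is assumed to be framed by 0 and n + 1.

```python
def induced_reversal(perm, a, b):
    """Apply the reversal induced by the oriented pair of values (a, b)."""
    if abs(abs(a) - abs(b)) != 1 or (a >= 0) == (b >= 0):
        raise ValueError("not an oriented pair")
    i, j = sorted((perm.index(a), perm.index(b)))
    # rho(i, j - 1) when the pair sums to +1, rho(i + 1, j) when it sums to -1
    lo, hi = (i, j - 1) if perm[i] + perm[j] == 1 else (i + 1, j)
    return perm[:lo] + [-x for x in reversed(perm[lo:hi + 1])] + perm[hi + 1:]


print(induced_reversal([0, 3, 1, 6, 5, -2, 4, 7], 1, -2))
# [0, 3, 1, 2, -5, -6, 4, 7]
```

In both cases the two values of the pair end up side by side as consecutive integers, which is why these reversals are useful.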

Interleaving Graph
Two components are adjacent if they overlap but neither of them contains the other.
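A sketch of the adjacency test, assuming each component (or gray edge) is summarized by the interval (start, end) of positions it spans, with start < end and all endpoints distinct, as they are for positions in a breakpoint graph. The function name is hypothetical.

```python
def interleaving_graph(intervals):
    """Edges (u, v), u < v, between intervals that overlap without nesting."""
    edges = set()
    for u, (a, b) in enumerate(intervals):
        for v in range(u + 1, len(intervals)):
            c, d = intervals[v]
            if a < c < b < d or c < a < d < b:
                edges.add((u, v))
    return edges


print(sorted(interleaving_graph([(0, 4), (2, 6), (1, 3)])))  # [(0, 1), (1, 2)]
```

Interval (1, 3) lies inside (0, 4), so those two are not adjacent even though they intersect.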

Constructing the Bit Matrix
Consider the sequence P = (0 3 1 6 5 -2 4 7). Represent each element $P_i$ by $2P_i - 1, 2P_i$ if $P_i$ is positive, and by $2|P_i|, 2|P_i| - 1$ otherwise (when $P_i$ is negative).
[Figure: Bit Matrix, Parity and Scores for P]
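Below is a sketch of how the bit matrix, parity and scores could be computed, using Python integers as bit vectors. Vertex v stands for the pair of values v, v + 1 (the gray edge 2v, 2v + 1 of the image); its parity is 1 when that edge spans an odd number of positions, which coincides with the pair having opposite signs. The score uses the rule, as we understand Bergeron's bit-vector method, that the reversal induced by an oriented vertex v flips the orientation of v and of every neighbour of v. On P it gives 4 for vertex 2, the pair (3, -2), and 2 for vertex 1, the pair (1, -2). Function names are hypothetical.

```python
def bit_matrix(perm):
    """Overlap graph of a framed signed permutation as bit-vector rows.

    Bit u of rows[v] marks an edge v-u; the diagonal bit v holds v's parity.
    Also returns the parity vector (bit v set = vertex v is oriented).
    """
    n = len(perm) - 2
    image = [0]
    for x in perm[1:-1]:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    pos = {value: k for k, value in enumerate(image)}
    spans = [sorted((pos[2 * v], pos[2 * v + 1])) for v in range(n + 1)]
    rows, parity = [0] * (n + 1), 0
    for v, (a, b) in enumerate(spans):
        if (b - a) % 2 == 0:                    # odd number of positions spanned
            parity |= 1 << v
            rows[v] |= 1 << v
        for u, (c, d) in enumerate(spans):
            if a < c < b < d or c < a < d < b:  # overlap without nesting
                rows[v] |= 1 << u
    return rows, parity


def score(rows, parity, v):
    """Oriented vertices left after the reversal of oriented vertex v."""
    return bin(parity ^ rows[v]).count("1")


rows, parity = bit_matrix([0, 3, 1, 6, 5, -2, 4, 7])
print(score(rows, parity, 2), score(rows, parity, 1))  # 4 2
```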

The Algorithm
Step 1. Select the vertex $v_i$ with the maximum score and apply the corresponding reversal to the bit matrix; repeat until the parity of every vertex is zero.
Step 2. If the sequence is not completely sorted, apply the breakpoint technique to complete the sorting.
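A compact sketch of Step 1, under the same assumptions as a bit-vector construction with parities on the diagonal: applying the reversal of vertex v complements the subgraph on v and its neighbours, which becomes one XOR per affected row. Any rows still nonzero at the end are unoriented components left for Step 2. Names are hypothetical, and ties go to the lowest-numbered vertex.

```python
def bit_matrix(perm):
    """Rows of the overlap graph (diagonal bit = parity) and the parity vector."""
    n = len(perm) - 2
    image = [0]
    for x in perm[1:-1]:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    pos = {value: k for k, value in enumerate(image)}
    spans = [sorted((pos[2 * v], pos[2 * v + 1])) for v in range(n + 1)]
    rows, parity = [0] * (n + 1), 0
    for v, (a, b) in enumerate(spans):
        if (b - a) % 2 == 0:
            parity |= 1 << v
            rows[v] |= 1 << v
        for u, (c, d) in enumerate(spans):
            if a < c < b < d or c < a < d < b:
                rows[v] |= 1 << u
    return rows, parity


def oriented_sort(perm):
    """Greedy max-score oriented sort on the bit matrix; returns chosen vertices."""
    rows, parity = bit_matrix(perm)
    chosen = []
    while parity:
        oriented = [v for v in range(len(rows)) if parity >> v & 1]
        v = max(oriented, key=lambda u: bin(parity ^ rows[u]).count("1"))
        chosen.append(v)
        closed = rows[v]              # v plus its neighbours (v is oriented)
        for u in range(len(rows)):
            if closed >> u & 1:
                rows[u] ^= closed     # toggle edges and parity inside `closed`
        parity ^= closed
    return chosen, rows


print(oriented_sort([0, 3, 1, 6, 5, -2, 4, 7])[0])  # [2, 3, 1, 4, 0]
```

The five chosen vertices correspond to five reversals, and every row ends at zero; for (0 2 1 3) nothing is chosen and a nonzero triangle of unoriented vertices remains.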

Experimental Settings
1- Synthetic datasets: random signed permutations of different lengths and evolution rates, generated with the GRIMM permutation generation module.
2- Real datasets:
- GRAPPA test sets for different species of Campanulaceae (flowering plants)
- MGR (multiple genome rearrangement) human-mouse gene order data
- Herpes viruses that affect humans (Genome.org)

Experiment 1 - Synthetic
1- Generated files of random permutations of different lengths (50, 100, 200, 400, 800, 1600), each file containing 50 permutations.
2- We computed the number of correctly sorted permutations.
3- Evolution rate varies: 20, 30, 40.

Experiment 2 - Synthetic
1- Generated files of random permutations of different lengths (50, 100, 200, 400, 800, 1600), each file containing 50 permutations.
2- We computed the time needed to obtain the correctly sorted permutations.
3- Evolution rate varies: 20, 30, 40.

Experiment 3 - Synthetic
1- Generated files of random permutations of length …
2- We computed the time needed to obtain the correctly sorted permutations.
3- Evolution rate varies in increments of 100.
Observation: a saturation state is reached as the evolution rate approaches 1000.

Experiment 1 - Real
Considered the herpes simplex virus (HSV), Epstein-Barr virus (EBV) and cytomegalovirus (CMV) gene orders (Hannenhalli et al. 1995), as well as the identity gene order (A).
Observation: our reversal results matched those of the optimal evolutionary scenario recovered by MGR-MEDIAN.

Experiment 2 - Real
1- Considered Campanulaceae species.
2- Obtained reversals for Cyananthus (11 reversals), Triodanis (13 reversals) and Symphyandra (12 reversals) versus tobacco, but failed to sort Platycodon, Legousia and Codonopsis.
Observation: the species we sorted were sorted with the same number of reversals as GRIMM.

Experiment 3 - Real
1- Considered the human-mouse gene order from MGR (mouse genome, with human as the identity).
Result: the mouse gene order was transformed into the identity with 40 reversals; GRIMM sorts the permutation in 41 reversals.

Conclusions
- We implemented a technique that integrates the bit-matrix oriented sorting technique with the greedy breakpoint reversal technique.
- The technique was tested on both real and synthetic data and was able to sort signed permutations in a fair share of the test data.
- We think that such an integration can yield good results while remaining simple and relatively fast.
- However, the oriented sort algorithm fails to sort permutations that have hurdles; in those cases we have to apply the breakpoint approach.

Future Work
- We think the technique we implemented can provide good results; further experiments can strengthen this claim.
- We started implementing the algorithm proposed in {Kaplan, H., Shamir, R., and Tarjan, R. '99} but did not complete the implementation. Having that technique implemented under the same conditions as ours would provide a good source of comparative results and give more confidence in what we propose.
- Applying the technique to different datasets, including exon order rather than gene order.
- Considering different species, computing their reversal distances, and using them to confirm phylogenetic trees.

Oriented Pairs
An oriented pair $(\pi_i, \pi_j)$ is a pair of elements whose absolute values are consecutive integers and whose signs are opposite.
Example: for $\pi$ = (0 3 1 6 5 -2 4 7), the oriented pairs are (1, -2) and (3, -2).

Reversal Distance Estimation
This estimate of the reversal distance is very inaccurate. Bafna and Pevzner (1996) showed that another hidden parameter, "hurdles", estimates the reversal distance with much greater accuracy.

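Proper Reversal
For every permutation $\pi$ and reversal $\rho$, $c(\pi\rho) - c(\pi) \le 1$. Given an arbitrary reversal $\rho$, denote $\Delta c = c(\pi\rho) - c(\pi)$ (the increase in the size of the cycle decomposition). Then for every permutation $\pi$ and reversal $\rho$, $\Delta c \in \{-1, 0, 1\}$. We call a reversal $\rho$ proper if $\Delta c = 1$.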
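The definition can be tested with the cycle count of the signed breakpoint graph. This sketch (hypothetical helpers, permutations as lists of nonzero integers, `reverse` taking a half-open slice) lists the proper reversals of a small signed permutation and checks that $\Delta c$ only takes the values -1, 0 and 1 over every reversal of every signed permutation with n ≤ 3.

```python
from itertools import permutations, product


def cycles(perm):
    """c(pi): alternating cycles of the signed breakpoint graph."""
    n = len(perm)
    image = [0]
    for x in perm:
        image += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    image.append(2 * n + 1)
    black = [0] * (2 * n + 2)
    for k in range(0, 2 * n + 2, 2):
        black[image[k]], black[image[k + 1]] = image[k + 1], image[k]
    seen, count = [False] * (2 * n + 2), 0
    for start in range(2 * n + 2):
        if not seen[start]:
            count += 1
            v = start
            while not seen[v]:
                seen[v] = seen[black[v]] = True
                v = black[v] ^ 1          # black edge, then gray edge
    return count


def reverse(perm, i, j):
    """Reverse perm[i:j] and flip the signs in it."""
    return perm[:i] + [-x for x in reversed(perm[i:j])] + perm[j:]


def proper_reversals(perm):
    """Slices (i, j) whose reversal increases the number of cycles by one."""
    c, n = cycles(perm), len(perm)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)
            if cycles(reverse(perm, i, j)) - c == 1]


def all_cycle_changes(max_n):
    changes = set()
    for n in range(1, max_n + 1):
        for order in permutations(range(1, n + 1)):
            for signs in product((1, -1), repeat=n):
                perm = [s * x for s, x in zip(signs, order)]
                c = cycles(perm)
                for i in range(n):
                    for j in range(i + 1, n + 1):
                        changes.add(cycles(reverse(perm, i, j)) - c)
    return changes


print(proper_reversals([2, 1]))  # []: the hurdle in (+2 +1) admits no proper reversal
```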

Oriented Pairs
Oriented pairs are useful because they indicate reversals that create consecutive elements of the permutation.
Example: the pair (1, -2) induces the reversal (0 3 1 6 5 -2 4 7) -> (0 3 1 2 -5 -6 4 7).