Genome Rearrangements CIS 667 April 13, 2004. Genome Rearrangements We have seen how differences in genes at the sequence level can be used to infer evolutionary.

Presentation transcript:

Genome Rearrangements CIS 667 April 13, 2004

Genome Rearrangements We have seen how differences in genes at the sequence level can be used to infer evolutionary relations among species. Differences in the sequences of one or more genes result from point mutations (insertions, deletions, substitutions). These are not the only type of change that can occur in the genome.

Genome Rearrangements Repair of broken chromosomes is an important process, but mistakes can occur. Mistakes can also occur during crossover. These mistakes cause changes in gene order: a large piece of a chromosome can be moved or copied to another location, or it can move from one chromosome to another. We call these movements genome rearrangements.

Crossover

Chromosome Repair

Genome Rearrangements These have important (usually fatal) consequences for the organism and its evolution. Alignments do not capture genome rearrangements: two species may have nearly the same gene sequences, but in a different order (why would the two species then be different?)

Genome Rearrangements We need some other way to compare entire genomes (i.e., compare at a higher level). Rather than by simple point mutations, one genome is obtained from another by a number of rearrangements of a special kind: reversals. We use the number of reversals needed to transform one genome into another to measure evolutionary distance.

The Method Use combinatorial optimization techniques in an attempt to infer a most economical sequence of rearrangement operations to account for differences among the genomes  Compare with character-based methods for phylogenetics (parsimony)

Reversals Consider the genome of a species as a sequence of blocks  A block is some sequence of the genome (possibly containing more than one gene) transcribed as a unit  Blocks are oriented since they can be transcribed from either strand of DNA  Give homologous blocks the same label

Reversals Relation between chloroplast genomes of alfalfa and garden pea:

Reversals Reversal operation for oriented blocks:  Inverts the order of affected blocks and changes their orientation (arrow)  Affects a contiguous segment of blocks What sequence of reversal operations could have changed alfalfa into garden pea?  Would like to have a polynomial time algorithm to find the shortest sequence

Genome Comparison vs. Gene Comparison In the late 1980s, J. Palmer and his colleagues studied the mitochondrial genomes of cabbage and turnips  The gene sequences are very similar (some genes are 99% equal)  Gene order, however, differs dramatically  Genome rearrangements are now considered to be a common mode of molecular evolution

Genome Comparison vs. Gene Comparison Extreme conservation of genes on X chromosomes across mammalian species provides an opportunity to study the evolutionary history of the X chromosome independently of the rest of the genome. According to Ohno's law, the gene content of the X chromosome has barely changed throughout mammalian development in the last 125 million years. However, the order of genes on X chromosomes has been disrupted several times.

Human and Mouse X Chromosomes

Genome Comparison vs. Gene Comparison The traditional molecular evolutionary technique is a gene comparison to construct a phylogenetic tree. In the "cabbage and turnip" case this is hardly suitable, since the rate of point mutations in their mitochondrial genes is so low that their genes are almost identical. Genome comparison (i.e., comparison of gene orders) is the method of choice in the case of very slowly evolving genomes. Another application is the case where genomes evolve very rapidly (genes not very similar).

Genome Comparison Only about 178 ± 39 genome rearrangements have happened since human and mouse diverged 80 million years ago. Mouse and human genomes can be viewed as a collection of about 200 fragments which are shuffled in mice as compared to humans. A comparative mouse-human genetic map gives the position of a human gene given the location of a related mouse gene.

Man-Mouse Comparative Physical Map

Definitions A signed permutation α over the set of labels L = {1, 2, …, n} is a permutation such that α(i) = +a or –a, where a ∈ L. Example: +3, –2, –1 is a signed permutation over L = {1, 2, 3}. Note that no label may appear twice in the permutation. A reversal [i,j] is an operation that transforms one signed permutation into another, reversing the order of a contiguous portion and flipping the signs.

Definitions α′ = α[i,j] = α(1), …, α(i – 1), –α(j), …, –α(i), α(j + 1), …, α(n). We are interested in the problem of sorting by reversals: given two signed permutations α and β, find the minimum number of reversals ρ_1, …, ρ_t that will transform α into β, i.e., αρ_1 … ρ_t = β. The reversal distance is d_β(α) = t.
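The reversal operation defined on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `apply_reversal` and the choice of a Python list of signed integers to hold the permutation are assumptions made here.

```python
def apply_reversal(perm, i, j):
    """Apply the reversal [i, j] (1-based, inclusive) to a signed permutation.

    The affected segment is written in reverse order and every sign in it
    is flipped; elements outside the segment are left untouched.
    """
    if not (1 <= i <= j <= len(perm)):
        raise ValueError("need 1 <= i <= j <= n")
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


alpha = [+3, -2, -1]
step1 = apply_reversal(alpha, 1, 3)   # reverse the whole permutation
step2 = apply_reversal(step1, 3, 3)   # flip the last element only
print(step1)  # [1, 2, -3]
print(step2)  # [1, 2, 3]
```

A reversal with i = j only flips one sign, which is why signed permutations can need a reversal even when the labels are already in order.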

Definitions Note that the reversal operation does not directly correspond to the biological operations (inversion, translocation, fission, fusion). Given α and β, can we always transform α into β using only the reversal operation? If so, how many reversals are required in the worst case?

Breakpoints A breakpoint is a point between consecutive labels in the initial permutation that must necessarily be separated by at least one reversal to reach the target permutation  The two consecutive labels are not consecutive in the target, or their orientations are not the same in a relative sense

Breakpoints To formalize the idea of breakpoint, we introduce the extended version of α. Let α = α(1), …, α(n). Then the extended version of α is (L, α(1), …, α(n), R). For example, let extended α be (L, –2, –3, +1, +6, –5, –4, R) and let extended β be (L, +1, +2, +3, +4, +5, +6, R). The breakpoints are: (L,–2), (–2,–3), (–3,+1), (+1,+6), (+6,–5), (–4,R).

Breakpoints The number of breakpoints of a permutation α is denoted by b(α). In the example, b(α) = 6. Can you characterize the situations where L is involved in a breakpoint? When R is involved in a breakpoint?
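The breakpoint definition can be checked mechanically. The sketch below is an illustration under assumptions made here: the sentinels L and R are encoded as 0 and n + 1, and the name `breakpoints` is hypothetical. An adjacent pair (x, y) of the extended initial permutation is treated as a breakpoint unless (x, y) or its reversed form (–y, –x) is adjacent in the extended target.

```python
def extended(perm):
    """Extended permutation, with 0 standing in for L and n + 1 for R."""
    return [0] + list(perm) + [len(perm) + 1]


def breakpoints(alpha, beta):
    """List the breakpoints of alpha with respect to the target beta."""
    ext_b = extended(beta)
    adjacent = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacent.add((x, y))      # same order, same orientation
        adjacent.add((-y, -x))    # the same adjacency read from the other strand
    ext_a = extended(alpha)
    return [(x, y) for x, y in zip(ext_a, ext_a[1:]) if (x, y) not in adjacent]


alpha = [-2, -3, +1, +6, -5, -4]
beta = [+1, +2, +3, +4, +5, +6]
found = breakpoints(alpha, beta)
print(found)       # [(0, -2), (-2, -3), (-3, 1), (1, 6), (6, -5), (-4, 7)]
print(len(found))  # 6
```

The pair (–5, –4) is not reported because it is the adjacency (+4, +5) of the target read in the opposite direction.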

A Lower Bound A reversal can remove at most two breakpoints, since it cuts the permutation in exactly two places. So, if αρ_1ρ_2 … ρ_t = β, then b(α) – b(αρ_1) ≤ 2, b(αρ_1) – b(αρ_1ρ_2) ≤ 2, …, b(αρ_1 … ρ_{t–1}) – b(αρ_1 … ρ_t) ≤ 2. Adding these t inequalities and using b(β) = 0 gives b(α) ≤ 2t. If t = d(α), then b(α)/2 ≤ d(α).
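For very small permutations the breakpoint lower bound can be compared with the true reversal distance by exhaustive search. The following is a brute-force sketch, not the polynomial-time method these slides build toward: it runs a breadth-first search over all signed permutations, so it is only practical for a handful of labels. The function names are assumptions made here, and L and R are encoded as 0 and n + 1.

```python
from collections import deque


def count_breakpoints(alpha, beta):
    """Number of breakpoints of alpha with respect to beta."""
    def extended(perm):
        return [0] + list(perm) + [len(perm) + 1]

    ext_b = extended(beta)
    adjacent = set()
    for x, y in zip(ext_b, ext_b[1:]):
        adjacent.add((x, y))
        adjacent.add((-y, -x))
    ext_a = extended(alpha)
    return sum((x, y) not in adjacent for x, y in zip(ext_a, ext_a[1:]))


def reversal_distance_bfs(alpha, beta):
    """Exact reversal distance by breadth-first search (exponential state space)."""
    start, goal = tuple(alpha), tuple(beta)
    if start == goal:
        return 0
    n = len(start)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, dist = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                flipped = tuple(-x for x in reversed(perm[i:j + 1]))
                nxt = perm[:i] + flipped + perm[j + 1:]
                if nxt == goal:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("beta is not a signed permutation of the labels of alpha")


alpha, beta = [+3, -2, -1], [+1, +2, +3]
b = count_breakpoints(alpha, beta)
d = reversal_distance_bfs(alpha, beta)
print(b, d)        # 3 2
print(b / 2 <= d)  # True
```

Here b(α)/2 = 1.5 and the distance is 2, so the bound holds but is not an equality.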

Reality and Desire Diagram The lower bound found is not very tight. We can derive a better lower bound based on a structure called the reality-desire diagram of a permutation with respect to another. To draw the diagram, we represent +a with the tuple (–a +a) and –a with the tuple (+a –a). The orientation is given by the rightmost member of the tuple.

Reality and Desire Diagram A permutation is a sequence of adjacent tuples: +3, –2, –1, +4, –5 can be represented as L---(–3 +3)---(+2 –2)---(+1 –1)---(–4 +4)---(+5 –5)---R, and the identity as L---(–1 +1)---(–2 +2)---(–3 +3)---(–4 +4)---(–5 +5)---R.

Reality and Desire Diagram Now we will draw a graph to represent α = (L, +3, –2, –1, +4, –5, R). [Figure: the reality diagram, drawn from L to R]

Reality and Desire Diagram Suppose that β is the identity (L, +1, +2, +3, +4, +5, R). We will add desire edges to the previous graph to represent β. [Figure: the reality diagram from L to R with desire edges added]

Reality and Desire Diagram α is the reality; β is what is desired. The diagram (a multigraph) shows both reality and desire; call it RD(α). We can rearrange the nodes of the graph to make it easier to understand.

Reality and Desire Diagram [Figure: the rearranged diagram from L to R, with a legend distinguishing reality edges from desire edges]

Properties of RD(α) Each vertex has degree 2: each node is incident to one edge from A, the set of reality edges, and one edge from B, the set of desire edges. The connected components of the graph are alternating cycles (edges alternate between reality, drawn in blue, and desire, drawn in red). Each cycle has an even number of edges, half reality and half desire.

Properties of RD(α) The number of cycles of RD(α) is denoted by c_β(α). Note that c_β(β) = n + 1, since β has no breakpoints: all cycles are two parallel edges between the same pair of nodes, and we have 2n + 2 nodes, so n + 1 cycles. β is the only permutation for which c_β(α) = n + 1.
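The properties above suggest a direct way to count cycles: every node has exactly one reality neighbour and one desire neighbour, so following the two kinds of edge alternately must return to the starting node. The sketch below assumes the tuple representation from the earlier slides (+a as the node pair –a, +a and –a as +a, –a), uses the strings "L" and "R" for the end nodes, and introduces the hypothetical names `edge_list` and `count_cycles`.

```python
def edge_list(perm):
    """Edges induced by a permutation: reality edges for alpha, desire edges for beta."""
    nodes = ["L"]
    for x in perm:
        nodes += [-x, x]          # +a becomes (-a +a), -a becomes (+a -a)
    nodes.append("R")
    # Consecutive nodes at positions (0,1), (2,3), ... are joined by an edge.
    return [(nodes[k], nodes[k + 1]) for k in range(0, len(nodes), 2)]


def count_cycles(alpha, beta):
    """Number of alternating cycles in the reality-desire diagram of alpha w.r.t. beta."""
    reality, desire = {}, {}
    for u, v in edge_list(alpha):
        reality[u], reality[v] = v, u
    for u, v in edge_list(beta):
        desire[u], desire[v] = v, u
    unvisited = set(reality)
    cycles = 0
    while unvisited:
        start = node = next(iter(unvisited))
        while True:               # walk reality edge, then desire edge, until closed
            unvisited.discard(node)
            node = reality[node]
            unvisited.discard(node)
            node = desire[node]
            if node == start:
                break
        cycles += 1
    return cycles


alpha = [+3, -2, -1, +4, -5]
identity = [+1, +2, +3, +4, +5]
print(edge_list(alpha))
# [('L', -3), (3, 2), (-2, 1), (-1, -4), (4, 5), (-5, 'R')]
print(count_cycles(alpha, identity))     # 3
print(count_cycles(identity, identity))  # 6
```

For the identity every reality edge coincides with a desire edge, giving the n + 1 = 6 two-edge cycles mentioned on the slide.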

Properties of RD(α) So transforming α into β can be seen as transforming RD(α) into a graph with as many cycles as possible, namely n + 1. Now we need to see how a reversal affects the cycles of RD(α). Note that a reversal is characterized by the two points where it cuts the current permutation, each of which corresponds to a reality edge.

Reversals and RD(α) Let ρ be a reversal defined by two reality edges (s,t) and (u,v). Then RD(αρ) differs from RD(α) as follows: reality edges (s,t) and (u,v) are replaced by (s,u) and (t,v); vertices u, …, t are reversed; desire edges remain unchanged. See the example on the following slide.

Example [Figure: the diagram before and after a reversal; some nodes/edges omitted]

Orientation of Cycles How many cycles are affected by a reversal? First we define convergent and divergent edges  Two reality edges on the same cycle converge if they are traversed in the same direction (clockwise or counterclockwise on the circle in the diagram) on the cycle  Otherwise they diverge

Orientation of Cycles [Figure: diagram from L to R] Convergent: (+3,+2) and (-1,-4). Divergent: (L,-3) and (+3,+2).

Reversals and #Cycles Let ρ be a reversal acting on two reality edges e and f. If e and f belong to different cycles, c(αρ) = c(α) – 1. If e and f belong to the same cycle and converge, c(αρ) = c(α). If e and f belong to the same cycle and diverge, c(αρ) = c(α) + 1.

First Case If e and f belong to different cycles, c(αρ) = c(α) – 1

Second Case If e and f belong to the same cycle and converge, c(αρ) = c(α)

Third Case If e and f belong to the same cycle and diverge, c(αρ) = c(α) + 1

Reversals and #Cycles Note that the number of cycles changes by at most one with each reversal. Use that to find another lower bound for reversal distance. Suppose we have αρ_1ρ_2 … ρ_t = β. We know that c(β) = n + 1 and we have: c(αρ_1) – c(α) ≤ 1, c(αρ_1ρ_2) – c(αρ_1) ≤ 1, …, c(αρ_1ρ_2 … ρ_t) – c(αρ_1ρ_2 … ρ_{t–1}) ≤ 1. Adding and cancelling terms we get n + 1 – c(α) ≤ t. If ρ_1ρ_2 … ρ_t is optimal then t = d(α), so n + 1 – c(α) ≤ d(α).
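Both claims on this slide can be checked numerically on a small example: that any single reversal changes the cycle count by –1, 0 or +1, and that n + 1 – c(α) is therefore a lower bound. The sketch below is an illustration only. It repeats a compact cycle counter so that it runs on its own, and the names `count_cycles`, `cycle_lower_bound` and `cycle_changes` are assumptions made here.

```python
def count_cycles(alpha, beta):
    """Alternating cycles of the reality-desire diagram ('L'/'R' are the end nodes)."""
    def partners(perm):
        nodes = ["L"]
        for x in perm:
            nodes += [-x, x]
        nodes.append("R")
        match = {}
        for k in range(0, len(nodes), 2):
            u, v = nodes[k], nodes[k + 1]
            match[u], match[v] = v, u
        return match

    reality, desire = partners(alpha), partners(beta)
    unvisited = set(reality)
    cycles = 0
    while unvisited:
        start = node = next(iter(unvisited))
        while True:
            unvisited.discard(node)
            node = reality[node]
            unvisited.discard(node)
            node = desire[node]
            if node == start:
                break
        cycles += 1
    return cycles


def cycle_lower_bound(alpha, beta):
    """The bound n + 1 - c(alpha) on the reversal distance."""
    return len(alpha) + 1 - count_cycles(alpha, beta)


def cycle_changes(alpha, beta):
    """Set of values c(alpha rho) - c(alpha) over every possible reversal rho."""
    base = count_cycles(alpha, beta)
    n = len(alpha)
    changes = set()
    for i in range(n):
        for j in range(i, n):
            after = alpha[:i] + [-x for x in reversed(alpha[i:j + 1])] + alpha[j + 1:]
            changes.add(count_cycles(after, beta) - base)
    return changes


alpha = [+3, -2, -1, +4, -5]
identity = [+1, +2, +3, +4, +5]
print(cycle_lower_bound(alpha, identity))               # 3
print(cycle_changes(alpha, identity) <= {-1, 0, 1})     # True
```

For this permutation the cycle bound (3) equals the breakpoint bound rounded up (5 breakpoints, so at least 3 reversals), but in general the cycle bound is the stronger of the two.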

Interleaving Graph This new lower bound, n + 1 – c(α), is better than the old one, b(α)/2. For most signed permutations it is close to the actual distance; however, it is not always attained (we can't always choose two divergent edges). We can classify the cycles of RD(α) as good or bad: a cycle is good if it has two divergent reality edges; otherwise it is bad.

Interleaving Graph The classification only applies to proper cycles (those with at least four edges). Cycles with two edges don't need to be touched, since reality = desire there. If we have only good cycles in a permutation, then the lower bound previously given holds with equality: we sort, increasing the number of cycles by one per reversal.

Interleaving Graph If a desire edge from one cycle crosses some desire edge from another cycle, we say that the two cycles interleave. Interleaved cycles allow us to change a bad cycle into a good one while breaking another cycle. This good cycle can then be broken in the next step. To find interleaving cycles, we construct an interleaving graph.

Interleaving Graph

Nodes in the interleaving graph are cycles. There is an edge between two nodes if the cycles interleave. The connected components of the graph are called bad components if they consist entirely of bad cycles; otherwise a component is a good component.

Interleaving Graph What is the interleaving graph of the previous example? Suppose that F and C are good cycles.  Which components of the interleaving graph are good and which are bad?

Sorting Good Components We need to choose two divergent edges in the same cycle to define a reversal that increases the number of cycles. A reversal characterized by two divergent edges of the same cycle is a sorting reversal if and only if it does not lead to the creation of bad components.

Bad Components Having used this criterion to sort all of the good components, we must now sort the bad ones. We define a hierarchy of bad components. We say a component B separates components A and C if all chords in RD(α) that link a terminal in A to a terminal in C cross a desire edge of B.

Diagram with no Good Components

Bad Components A reversal through reality edges in different components A and C results in every component B that separates A and C being twisted. A bad component becomes good when twisted; a good component can stay good or become bad when twisted. So twist only when there are no good components.

Hierarchy of Bad Components A hurdle is a bad component that does not separate any other two bad components  If a bad component separates others, then it is a nonhurdle A hurdle A protects a nonhurdle B when removal of A would cause B to become a hurdle  B is protected by A when every time B separates two bad components, A is one of them

Hierarchy of Bad Components A hurdle A is called a superhurdle if it protects some nonhurdle B; otherwise it is called a simple hurdle. Bad components are either nonhurdles or hurdles; hurdles are either simple hurdles or superhurdles.

Fortress A signed permutation α is called a fortress iff RD(α) has an odd number of hurdles and all of them are superhurdles.

Reversal Distance The reversal distance of oriented permutations is given by: d(α) = n + 1 – c(α) + h(α) + f(α), where c(α) is the number of cycles (proper and non-proper), h(α) is the number of hurdles, and f(α) is 1 if α is a fortress and 0 otherwise. The term n + 1 – c(α) accounts for good components and for bad components which become good during the sort; h(α) accounts for the extra reversals that bad components require; f(α) is the extra reversal for a fortress.
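As a small arithmetic check of the formula d(α) = n + 1 – c(α) + h(α) + f(α), the sketch below evaluates it from counts that are supplied by the caller. Computing the number of hurdles and the fortress test is the hard part and is not attempted here; the function name `reversal_distance_from_counts` and its parameters are assumptions made for this illustration.

```python
def reversal_distance_from_counts(n, cycles, hurdles, is_fortress):
    """Evaluate d = n + 1 - c + h + f from precomputed diagram statistics.

    n           -- number of labels in the permutation
    cycles      -- c, the number of cycles in the reality-desire diagram
    hurdles     -- h, the number of hurdles
    is_fortress -- True if the permutation is a fortress (f = 1), else False
    """
    if is_fortress and hurdles % 2 == 0:
        raise ValueError("a fortress has an odd number of hurdles")
    return n + 1 - cycles + hurdles + (1 if is_fortress else 0)


# Example: a diagram over n = 5 labels with c = 3 cycles and no hurdles.
print(reversal_distance_from_counts(5, 3, 0, False))  # 3
# A sorted permutation has c = n + 1 cycles and no hurdles.
print(reversal_distance_from_counts(5, 6, 0, False))  # 0
```

When there are no hurdles the distance collapses to the cycle lower bound n + 1 – c(α), which matches the earlier remark that the bound is an equality when only good components are present.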

Algorithm If we don’t have a good cycle we must use either a reversal on two convergent edges or a reversal on edges in different cycles  In first case, number of cycles is constant  In second case, number of cycles decreases by one  Choose case one on a hurdle  Transforms bad component into good  Number of cycles remains constant

Algorithm Getting rid of a nonhurdle doesn't change the number of hurdles or the fortress status, so the distance remains the same. If we reverse a superhurdle, the nonhurdle it protects becomes a hurdle, so h remains constant. We call a reversal on some cycle in a hurdle hurdle cutting.

Algorithm In order not to increase f(α), use hurdle cutting only when h(α) is odd. Using a reversal on edges in two different cycles decreases c(α) by one, which increases n + 1 – c(α). However, d(α) will still decrease if we can decrease h(α) by two. Choose edges from two different hurdles: this is called hurdle merging. The two hurdles, as well as any nonhurdle separating them, become good components.

Algorithm We have to be careful that hurdle merging doesn't transform a nonhurdle into a hurdle. A and B are called opposite hurdles when we find the same number of hurdles walking the circle clockwise from A to B as we do walking counterclockwise. This can only happen if h(α) is even. Choosing opposite hurdles, we don't turn a nonhurdle into a hurdle.

Algorithm To avoid creating a fortress where we don't have one, we choose opposite hurdles when they exist. If h(α) is odd and we have a simple hurdle, do hurdle cutting to avoid a fortress. If neither case is possible, we already have a fortress, so f(α) doesn't increase with any hurdle merging.

Algorithm
Algorithm Sorting Reversal
  input: distinct permutations α and β
  output: a sorting reversal for α with target β
  if there is a good component in RD_β(α) then
    pick 2 divergent edges e and f in this component, making sure the corresponding reversal does not create any bad components
    return the reversal characterized by e and f
  else if h(α) is even then
    return merging of two opposite hurdles
  else if h(α) is odd and there is a simple hurdle then
    return a reversal cutting this hurdle
  else // fortress
    return merging of any two hurdles
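The case analysis of the Sorting Reversal algorithm can be written down as a small decision function. This is only a sketch of the branching: it assumes the caller has already analysed the diagram and supplies three facts (whether a good component exists, the hurdle count, and whether a simple hurdle exists), and it returns a label for the kind of move rather than an actual reversal. The function name, its parameters and the returned labels are all hypothetical.

```python
def choose_move(has_good_component, hurdles, has_simple_hurdle):
    """Return which kind of sorting reversal the algorithm would pick next."""
    if has_good_component:
        # Two divergent edges of one cycle, chosen so that no bad component appears.
        return "good-component reversal"
    if hurdles % 2 == 0:
        return "merge opposite hurdles"
    if has_simple_hurdle:
        return "cut simple hurdle"
    # Odd number of hurdles, all of them superhurdles: the fortress case.
    return "merge any two hurdles"


print(choose_move(True, 2, False))    # good-component reversal
print(choose_move(False, 4, True))    # merge opposite hurdles
print(choose_move(False, 3, True))    # cut simple hurdle
print(choose_move(False, 3, False))   # merge any two hurdles
```

The order of the tests mirrors the slide: good components are always handled first, and hurdles are only twisted or merged once none remain.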