A Simplified View of DCJ-Indel Distance
Phillip Compeau
University of California-San Diego, Department of Mathematics

Abstract
Braga et al., 2010: solved the problem of DCJ-indel sorting in linear time.
Goals:
1. "Hardwire" DCJ sorting into DCJ-indel sorting.
2. Characterize the solution space for DCJ-indel sorting. (The DCJ solution space is already known; Braga and Stoye, 2010.)

Section 1: Preliminaries
Outline:
1. Preliminaries
2. Encoding Indels as DCJs
3. DCJ-Indel Sorting
4. The Solution Space of DCJ-Indel Sorting
5. Conclusion

The Discrete Genome
Genome (Π): formed of two matchings:
- genes g(Π): each numbered gene has a head and a tail;
- adjacencies a(Π): a blue matching on V(g(Π)).

The Discrete Genome (continued)
Chromosome: a connected component of Π (an alternating path or cycle); it is linear or circular according to whether the component is a path or a cycle.
Telomere: an endpoint of a path of Π; it has the null adjacency {v, Ø}.
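A minimal Python sketch of this encoding, under conventions that are our own rather than the talk's: a chromosome is a list of signed gene names plus an is-circular flag, an extremity is a pair such as ("b", "t") (tail) or ("b", "h") (head), and `build_genome` is a hypothetical helper. It returns g(Π), the adjacency matching a(Π), and the telomeres (the vertices whose adjacency is the null {v, Ø}).

```python
from typing import FrozenSet, List, Set, Tuple

Extremity = Tuple[str, str]  # (gene name, "t" for tail or "h" for head)


def traversal(signed_gene: str) -> Tuple[Extremity, Extremity]:
    """Return the (entry, exit) extremities of a signed gene such as 'b' or '-b'."""
    if signed_gene.startswith("-"):
        g = signed_gene[1:]
        return (g, "h"), (g, "t")  # reversed gene: enter at its head, leave at its tail
    return (signed_gene, "t"), (signed_gene, "h")


def build_genome(chromosomes: List[Tuple[List[str], bool]]):
    """Encode non-empty chromosomes (signed genes, is_circular) as genes, adjacencies, telomeres."""
    genes: Set[str] = set()
    adjacencies: Set[FrozenSet[Extremity]] = set()
    telomeres: Set[Extremity] = set()
    for signed_genes, circular in chromosomes:
        ends = [traversal(x) for x in signed_genes]
        genes.update(entry[0] for entry, _ in ends)
        # consecutive genes: the exit of one gene is adjacent to the entry of the next
        for (_, exit_), (entry, _) in zip(ends, ends[1:]):
            adjacencies.add(frozenset((exit_, entry)))
        if circular:
            adjacencies.add(frozenset((ends[-1][1], ends[0][0])))
        else:
            telomeres.update((ends[0][0], ends[-1][1]))  # these carry the null adjacency {v, Ø}
    # a(Π) must be a matching: no extremity may lie in two adjacencies
    covered = [v for a in adjacencies for v in a]
    if len(covered) != len(set(covered)):
        raise ValueError("adjacencies do not form a matching")
    return genes, adjacencies, telomeres
```

For example, `build_genome([(["a", "-b"], False), (["c"], True)])` yields the adjacencies {a_h, b_h} and {c_h, c_t} and the telomeres a_t and b_t.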

The Double-Cut-and-Join Operation
Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): "cuts" the genome in two places and rejoins the resulting adjacencies.
DCJ distance d_DCJ(Π, Γ): the minimum number of DCJs required to transform Π into Γ (where Π and Γ have the same genes).
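One way to make the operation concrete, as a sketch: represent a genome by its set of adjacencies (frozensets of extremities, with `None` standing in for Ø in a telomere's null adjacency). A DCJ removes two adjacencies {p, q} and {r, s} and adds either {p, r}, {q, s} or {p, s}, {q, r}. The function name `dcj` and the `cross` flag are illustrative choices, not an established API.

```python
from typing import FrozenSet, Optional, Set, Tuple

Extremity = Tuple[str, str]      # (gene, "t" | "h")
End = Optional[Extremity]        # None plays the role of Ø
Adjacency = FrozenSet[End]


def dcj(adjacencies: Set[Adjacency], cut1: Tuple[End, End], cut2: Tuple[End, End],
        cross: bool = False) -> Set[Adjacency]:
    """Cut {p, q} and {r, s}; rejoin as {p, r}, {q, s} (default) or {p, s}, {q, r}."""
    (p, q), (r, s) = cut1, cut2
    old = {frozenset(cut1), frozenset(cut2)}
    if len(old) != 2 or not old <= adjacencies:
        raise ValueError("cut adjacencies must be two distinct members of the genome")
    new_pairs = [(p, s), (q, r)] if cross else [(p, r), (q, s)]
    result = set(adjacencies) - old
    for u, v in new_pairs:
        if u is not None or v is not None:  # {Ø, Ø} is not an adjacency, so drop it
            result.add(frozenset((u, v)))
    return result
```

On the chromosome a b c d, cutting {a_h, b_t} and {c_h, d_t} and rejoining as {a_h, c_h}, {b_t, d_t} reverses the segment b c, while the crossed rejoining {a_h, d_t}, {b_t, c_h} excises b c as a circular chromosome.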

The DCJ Incorporates Many Operations

The Breakpoint Graph
B(Π, Γ) is formed from the adjacencies of Π and Γ. Like a genome, B(Π, Γ) comprises alternating red-blue paths and cycles.
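A sketch of how B(Π, Γ) splits into alternating paths and cycles, assuming adjacency sets that omit the null adjacencies of telomeres and counting path length in edges. Because every extremity has at most one Π-edge and one Γ-edge, each component can be traced by alternating colours; `decompose` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Extremity = Tuple[str, str]


def decompose(vertices: Iterable[Extremity],
              pi_adj: Set[FrozenSet[Extremity]],
              gamma_adj: Set[FrozenSet[Extremity]]) -> Tuple[List[int], List[int]]:
    """Return (cycle lengths, path lengths) of B(Pi, Gamma), lengths counted in edges."""
    nbr: Dict[Extremity, Dict[str, Extremity]] = defaultdict(dict)
    for color, adjs in (("pi", pi_adj), ("gamma", gamma_adj)):
        for adjacency in adjs:
            u, v = tuple(adjacency)
            nbr[u][color] = v
            nbr[v][color] = u

    seen: Set[Extremity] = set()

    def walk(start: Extremity, color: str) -> int:
        """Follow alternating edges from `start`, beginning with `color`; return #edges."""
        v, length = start, 0
        while color in nbr[v]:
            v = nbr[v][color]
            length += 1
            if v == start:  # came back around: the component is a cycle
                break
            seen.add(v)
            color = "gamma" if color == "pi" else "pi"
        return length

    order = sorted(vertices)
    cycles, paths = [], []
    for v in order:  # paths start at vertices of degree < 2
        if v not in seen and len(nbr[v]) < 2:
            seen.add(v)
            paths.append(walk(v, next(iter(nbr[v]), "pi")))
    for v in order:  # every vertex not on a path lies on a cycle
        if v not in seen:
            seen.add(v)
            cycles.append(walk(v, "pi"))
    return cycles, paths
```

For Π = (+a +b) and Γ = (+a −b), with Π-adjacency {a_h, b_t} and Γ-adjacency {a_h, b_h}, the graph has no cycles and two paths, of lengths 2 and 0.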

DCJ Distance Formula
Bergeron et al., 2006: if Π and Γ share the same genes, then the DCJ distance is given by the following formula:
$$d_{DCJ}(\Pi, \Gamma) = N - \left[c(\Pi, \Gamma) + \frac{p_{even}(\Pi, \Gamma)}{2}\right]$$
where N = # of genes, c(Π, Γ) = # of cycles in B(Π, Γ), and p_even(Π, Γ) = # of even paths in B(Π, Γ).
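As a small worked check of the Bergeron et al. formula (our own example, with path length counted in edges): take Π = (+a +b) and Γ = (+a −b), both linear. Π has the single adjacency {a_h, b_t} and Γ has {a_h, b_h}, so B(Π, Γ) consists of the path b_t, a_h, b_h (length 2) and the isolated vertex a_t (length 0). Both paths are even and there are no cycles:

```latex
\[
N = 2, \qquad c(\Pi,\Gamma) = 0, \qquad p_{even}(\Pi,\Gamma) = 2,
\]
\[
d_{DCJ}(\Pi,\Gamma) = N - \left[c(\Pi,\Gamma) + \frac{p_{even}(\Pi,\Gamma)}{2}\right]
                    = 2 - \left[0 + \frac{2}{2}\right] = 1,
\]
```

which agrees with the single reversal of b that transforms Π into Γ.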

Indels and the DCJ-Indel Distance
Indel: the insertion or deletion of a chromosome or chromosomal interval (consecutive genes).
Assumption: we cannot remove a gene common to Π and Γ.
DCJ-indel distance d_DCJ^ind(Π, Γ): the minimum number of DCJs and indels required to transform Π into Γ.
Braga et al., 2010: solved DCJ-indel sorting in linear time, but with lots of cases. Can we simplify it?

Section 2: Encoding Indels as DCJs

Deletion as a DCJ Creating a Circular Chromosome
Ma et al., 2009: view a deletion as the formation and removal of a circular chromosome.
Idea: indel = DCJ creating a circular chromosome.
Wait… what about the deletion of circular chromosomes?
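The deletion-as-DCJ idea can be sketched on adjacencies. Assuming a linear chromosome of distinct, positively oriented genes and an interior interval genes[i:j] (a terminal interval would need a DCJ involving a null adjacency, which this sketch does not cover), one DCJ cuts the two adjacencies flanking the interval and rejoins them so that the interval closes into a circular chromosome, which is then discarded. The helper names are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Adjacency = FrozenSet[Tuple[str, str]]


def adjacencies_of_linear(genes: List[str]) -> Set[Adjacency]:
    """Adjacencies {x_h, y_t} of a linear chromosome of positively oriented genes."""
    return {frozenset({(x, "h"), (y, "t")}) for x, y in zip(genes, genes[1:])}


def delete_interval_via_dcj(genes: List[str], i: int, j: int):
    """Delete genes[i:j] with one DCJ that excises it as a circular chromosome.

    Returns (adjacencies of the shortened chromosome, adjacencies of the excised circle).
    """
    if not 0 < i < j < len(genes):
        raise ValueError("this sketch only handles intervals away from the telomeres")
    adjacencies = adjacencies_of_linear(genes)
    left, first, last, right = genes[i - 1], genes[i], genes[j - 1], genes[j]
    # the DCJ: cut the two adjacencies flanking the interval ...
    adjacencies -= {frozenset({(left, "h"), (first, "t")}),
                    frozenset({(last, "h"), (right, "t")})}
    # ... and rejoin: one new adjacency heals the chromosome, the other closes a circle
    adjacencies |= {frozenset({(left, "h"), (right, "t")}),
                    frozenset({(last, "h"), (first, "t")})}
    excised = set(genes[i:j])
    circle = {a for a in adjacencies if all(gene in excised for gene, _ in a)}
    return adjacencies - circle, circle  # discarding `circle` completes the deletion
```

For a b c d e with the interval b c, the shortened chromosome is a d e and the excised circle is (b c).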

Apparent Exceptions
Apparent Exception #1: two deleted circular chromosomes are created by a single DCJ.
[Figure: genes a b c d handled in 1 operation, versus a DCJ yielding the circular chromosomes (b c) and (a d), 3 operations in total.]

Apparent Exceptions (continued)
Apparent Exception #2: a deleted circular chromosome is never involved in a DCJ.
Circular singleton of Π: a circular chromosome of Π that shares no genes with Γ.
Question: can we delete all circular singletons first? Yes!

Handling Circular Singletons
Proposition: when transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ.
Corollary 1: if Π* is formed from Π by removing a circular singleton, then d_DCJ^ind(Π*, Γ) = d_DCJ^ind(Π, Γ) − 1.
Let sing(Π, Γ) = the total number of circular singletons of Π and of Γ.
Corollary 2: if Π_0 and Γ_0 are formed by removing all circular singletons from Π and Γ, then d_DCJ^ind(Π, Γ) = d_DCJ^ind(Π_0, Γ_0) + sing(Π, Γ).
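Corollary 2 suggests a preprocessing pass. A minimal sketch, assuming chromosomes are given as (signed gene list, is_circular) pairs: it drops every circular chromosome that shares no gene with the other genome and reports sing(Π, Γ). The function names are illustrative.

```python
from typing import List, Set, Tuple

Chromosome = Tuple[List[str], bool]  # (signed gene names, is_circular)


def gene_set(chromosomes: List[Chromosome]) -> Set[str]:
    """Unsigned gene names occurring in a genome."""
    return {g.lstrip("-") for genes, _ in chromosomes for g in genes}


def strip_circular_singletons(pi: List[Chromosome], gamma: List[Chromosome]):
    """Return (Pi_0, Gamma_0, sing(Pi, Gamma)) with all circular singletons removed."""
    pi_genes, gamma_genes = gene_set(pi), gene_set(gamma)

    def is_singleton(chrom: Chromosome, other_genes: Set[str]) -> bool:
        genes, circular = chrom
        # circular and sharing no gene with the other genome
        return circular and not ({g.lstrip("-") for g in genes} & other_genes)

    pi0 = [c for c in pi if not is_singleton(c, gamma_genes)]
    gamma0 = [c for c in gamma if not is_singleton(c, pi_genes)]
    sing = (len(pi) - len(pi0)) + (len(gamma) - len(gamma0))
    return pi0, gamma0, sing
```

With Π = {a b, (x −y), (c w)} and Γ = {a −b, (z), c}, the circular chromosomes (x −y) and (z) are singletons while (c w) is kept because it shares c with Γ, so sing(Π, Γ) = 2 and, by Corollary 2, d_DCJ^ind(Π, Γ) = d_DCJ^ind(Π_0, Γ_0) + 2.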

A Novel View of DCJ-Indel Distance
WLOG, we may henceforth assume that sing(Π, Γ) = 0.
A completion of Π is a genome Π' such that:
- g(Π') = g(Π) ∪ g(Γ);
- a(Π') = a(Π) ∪ a perfect matching on V(Π') − V(Π).
The new chromosomes of Π' are circular: they are the indels of Π'. (Completions Γ' of Γ are defined symmetrically.)
Theorem:
$$d^{ind}_{DCJ}(\Pi, \Gamma) = \min_{(\Pi', \Gamma')} d_{DCJ}(\Pi', \Gamma'),$$
where the minimum ranges over all completions Π' of Π and Γ' of Γ. An optimal completion (Π*, Γ*) achieves this minimum.
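To see the search space the theorem minimizes over, here is a brute-force sketch (not the author's method, which avoids enumeration) that lists every completion Π' of Π: it adds a perfect matching on the extremities of the genes of Γ that are missing from Π. The names and the (gene, "t" | "h") representation are assumptions.

```python
from typing import FrozenSet, Iterator, List, Set, Tuple

Extremity = Tuple[str, str]
Matching = List[FrozenSet[Extremity]]


def perfect_matchings(vertices: List[Extremity]) -> Iterator[Matching]:
    """Yield every perfect matching of `vertices` (len(vertices) must be even)."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for matching in perfect_matchings(remaining):
            yield [frozenset((first, partner))] + matching


def completions(pi_genes: Set[str], pi_adj: Set[FrozenSet[Extremity]],
                gamma_genes: Set[str]) -> Iterator[Tuple[Set[str], Set[FrozenSet[Extremity]]]]:
    """Yield (g(Pi'), a(Pi')) for every completion Pi' of Pi with respect to Gamma."""
    new_vertices = sorted((g, end) for g in gamma_genes - pi_genes for end in "th")
    for matching in perfect_matchings(new_vertices):
        # new genes only touch new vertices, so they close into circular chromosomes
        yield pi_genes | gamma_genes, pi_adj | set(matching)
```

With m missing genes there are (2m − 1)!! completions (the product of the odd numbers up to 2m − 1), so this is only usable for tiny inputs; the sorting algorithm of Section 3 constructs an optimal completion directly.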

Section 3: DCJ-Indel Sorting

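Open Vertices
π-open vertex: a vertex not found in Π (so it must be matched in Π'); γ-open vertices are defined analogously.
Every path endpoint in B(Π, Γ) must be π-open, γ-open, or a telomere (or both).
Define {π, π}-paths (both endpoints π-open), {π, γ}-paths (one endpoint π-open, the other γ-open), and π-paths (one endpoint π-open, the other a telomere) in B(Π, Γ); {γ, γ}-paths and γ-paths are defined symmetrically.
Idea: construct B(Π*, Γ*) from B(Π, Γ) by matching open vertices.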
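A sketch of the endpoint classification, under our own conventions: V(Π) and V(Γ) are the extremity sets of each genome, lengths are counted in edges, and a path end that lacks a Π-edge is labelled π-open if its vertex is absent from Π and a Π-telomere otherwise (symmetrically for Γ). An isolated vertex is a length-0 path lacking both edges. Besides the classes named on the slide, paths with two telomere ends are reported as "telomere-path". All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Extremity = Tuple[str, str]
OTHER = {"pi": "gamma", "gamma": "pi"}
# sorted endpoint labels -> path class ("T" marks a telomere end)
NAMES = {("pi", "pi"): "{pi,pi}", ("gamma", "gamma"): "{gamma,gamma}",
         ("gamma", "pi"): "{pi,gamma}", ("T", "pi"): "pi-path",
         ("T", "gamma"): "gamma-path", ("T", "T"): "telomere-path"}


def classify_paths(v_pi: Set[Extremity], v_gamma: Set[Extremity],
                   pi_adj: Set[FrozenSet[Extremity]],
                   gamma_adj: Set[FrozenSet[Extremity]]) -> List[Tuple[str, int]]:
    """Return sorted (path class, length in edges) for every path of B(Pi, Gamma)."""
    nbr: Dict[Extremity, Dict[str, Extremity]] = defaultdict(dict)
    for color, adjs in (("pi", pi_adj), ("gamma", gamma_adj)):
        for adjacency in adjs:
            u, v = tuple(adjacency)
            nbr[u][color] = v
            nbr[v][color] = u

    def label(vertex: Extremity, missing: str) -> str:
        """An end lacking a `missing`-coloured edge is open if the vertex is absent
        from that genome, and a telomere of that genome otherwise."""
        present = v_pi if missing == "pi" else v_gamma
        return "T" if vertex in present else missing

    seen: Set[Extremity] = set()
    result = []
    for s in sorted(v_pi | v_gamma):
        if s in seen or len(nbr[s]) == 2:  # already visited, or interior/cycle vertex
            continue
        seen.add(s)
        if not nbr[s]:  # isolated vertex: a path of length 0
            ends, length = [label(s, "pi"), label(s, "gamma")], 0
        else:
            color = next(iter(nbr[s]))
            ends, v, length = [label(s, OTHER[color])], s, 0
            while color in nbr[v]:
                v = nbr[v][color]
                seen.add(v)
                length += 1
                color = OTHER[color]
            ends.append(label(v, color))  # `color` is now the one missing at v
        result.append((NAMES[tuple(sorted(ends))], length))
    return sorted(result)
```

For Π = a (linear) and Γ = a b (linear), the gene b is missing from Π, and the sketch reports two π-paths (lengths 0 and 1) and one telomere path of length 0.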

Necessary Conditions for B(Π*, Γ*)
Lemma 1: if (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k − 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).
[Figure: B(Π', Γ') versus B(Π'', Γ'), in which the {π, π}-path is closed into a cycle; d_DCJ(Π'', Γ') < d_DCJ(Π', Γ').]
Remaining components of B(Π*, Γ*):
- bracelet: a cycle linking {π, γ}-paths;
- chain: a path linking π-paths/γ-paths via intermediate {π, γ}-paths.
[Figure: examples of a 2-bracelet, a 2-chain, and a 3-chain.]

Necessary Conditions for B(Π*, Γ*) (continued)
Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains.
[Figure: paths P1 and P2 linked in B(Π', Γ') versus B(Π'', Γ'), which contains a cycle; d_DCJ(Π'', Γ') < d_DCJ(Π', Γ').]

Necessary Conditions for B(Π*, Γ*) (continued)
Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths.
[Figure: odd π-paths P1, P2 and even π-paths P3, P4. B(Π', Γ') joins P1 with P2 and P3 with P4, while B(Π'', Γ') re-pairs them across parities to form even paths; d_DCJ(Π'', Γ') < d_DCJ(Π', Γ').]
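The parity bookkeeping behind Lemma 3 (and steps 4 and 5 of the algorithm) is simple arithmetic. A sketch, assuming lengths in edges: joining two π-paths by one new adjacency gives a path of length l_1 + l_2 + 1, which is even exactly when l_1 and l_2 have opposite parity; a 3-chain joins a π-path, a {π, γ}-path, and a γ-path with two new adjacencies, and since a {π, γ}-path always has even length, the 3-chain is even exactly when its π-path and γ-path have the same parity. The function names are hypothetical.

```python
from typing import List


def two_chain_length(l1: int, l2: int) -> int:
    """Two pi-paths joined at their pi-open ends by one new adjacency."""
    return l1 + l2 + 1


def three_chain_length(l_pi: int, l_pi_gamma: int, l_gamma: int) -> int:
    """pi-path + {pi,gamma}-path + gamma-path joined by two new adjacencies."""
    return l_pi + l_pi_gamma + l_gamma + 2


def max_even_2chains(lengths: List[int]) -> int:
    """An even 2-chain needs one odd and one even pi-path, so at most min(#odd, #even)."""
    odd = sum(1 for length in lengths if length % 2 == 1)
    return min(odd, len(lengths) - odd)
```

With odd paths of lengths 1 and 3 and even paths of lengths 0 and 2, pairing like with like yields two odd 2-chains (lengths 5 and 3), whereas swapping partners yields two even 2-chains (lengths 2 and 6), which by the DCJ formula lowers the distance by one.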

Sorting Algorithm
1. Remove all circular singletons of Π and Γ.
2. By Lemma 1, close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*).
3. Form a maximum set of 2-bracelets (afterward, only chains remain to be formed).
4. Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity (Lemma 3).
5. If p_{π,γ} (the number of {π, γ}-paths) is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path.
6. Arbitrarily link pairs of remaining π-paths, all of which have the same parity. Do the same for any remaining γ-paths.
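The counting consequences of steps 2 to 6 can be sketched directly from the path statistics of B(Π, Γ) after step 1. This is our own tally, not code from the talk: it assumes lengths in edges, N = |g(Π) ∪ g(Γ)|, and the Bergeron et al. formula d_DCJ = N − [c + p_even/2] applied to the completed graph B(Π*, Γ*). It returns the distance rather than the completion itself, and the argument names are hypothetical.

```python
from fractions import Fraction
from typing import List


def dcj_indel_distance(n_genes: int, cycles: int, even_telomere_paths: int,
                       p_pi_pi: int, p_gamma_gamma: int, p_pi_gamma: int,
                       pi_paths: List[int], gamma_paths: List[int]) -> Fraction:
    """Distance implied by steps 2-6, given path statistics of B(Pi, Gamma)
    after circular singletons have been removed (step 1)."""
    c = cycles + p_pi_pi + p_gamma_gamma  # step 2: each such path closes into a cycle
    c += p_pi_gamma // 2                   # step 3: each 2-bracelet is one cycle
    evens = even_telomere_paths            # even paths between telomeres stay as they are
    majority = {}
    for side, lengths in (("pi", pi_paths), ("gamma", gamma_paths)):
        odd = sum(1 for length in lengths if length % 2 == 1)
        even = len(lengths) - odd
        evens += min(odd, even)            # step 4: opposite parities give even 2-chains
        majority[side] = None if odd == even else odd > even
    if p_pi_gamma % 2 == 1:                # step 5: one 3-chain
        if majority["pi"] is None or majority["gamma"] is None:
            raise ValueError("odd p_pi_gamma requires an odd number of pi- and gamma-paths")
        # the 3-chain has the parity of l_pi + l_gamma: even iff the leftovers agree (delta = 1)
        evens += int(majority["pi"] == majority["gamma"])
    # step 6: leftover same-parity pairs form odd 2-chains, which do not count
    return n_genes - c - Fraction(evens, 2)
```

For instance, inserting gene b into the linear chromosome a gives one telomere path of length 0 and π-paths of lengths 1 and 0, so `dcj_indel_distance(2, 0, 1, 0, 0, 0, [1, 0], [])` returns 1: a single insertion.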

DCJ-Indel Distance
Theorem: the preceding algorithm solves DCJ-indel sorting in linear time, and it implies a closed-form DCJ-indel distance formula containing a correction term δ. Writing p_π^odd (p_π^even) for the number of odd (even) π-paths, and likewise for γ-paths, δ = 1 only if p_{π,γ} is odd and either:
1. p_π^odd > p_π^even and p_γ^odd > p_γ^even; or
2. p_π^odd < p_π^even and p_γ^odd < p_γ^even.
Otherwise, δ = 0.
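The formula itself did not survive in this transcript. Tallying the cycles and even paths that steps 2 to 6 of the sorting algorithm produce, and applying the DCJ distance formula to the completed graph, gives the expression below. This is our own reconstruction under stated conventions, not a transcription of the slide, and the original may group the terms differently. Here N is the number of genes in g(Π_0) ∪ g(Γ_0), all counts refer to B(Π_0, Γ_0), c is its number of cycles, p_even counts even paths whose two endpoints are both telomeres, and path lengths are counted in edges.

```latex
\[
d^{ind}_{DCJ}(\Pi,\Gamma) = N - \Big[\, c + p_{\pi,\pi} + p_{\gamma,\gamma}
  + \Big\lfloor \tfrac{p_{\pi,\gamma}}{2} \Big\rfloor
  + \tfrac{1}{2}\big( p_{even} + \min\{p^{odd}_{\pi}, p^{even}_{\pi}\}
  + \min\{p^{odd}_{\gamma}, p^{even}_{\gamma}\} + \delta \big) \Big] + sing(\Pi,\Gamma)
\]
```

Reading the terms against the algorithm: steps 2 and 3 contribute p_{π,π} + p_{γ,γ} + ⌊p_{π,γ}/2⌋ new cycles, step 4 contributes min{p_π^odd, p_π^even} + min{p_γ^odd, p_γ^even} even 2-chains, step 5 contributes one even 3-chain exactly when δ = 1, step 6 adds only odd paths, and sing(Π, Γ) is added back by Corollary 2.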

Section 4: The Solution Space of DCJ-Indel Sorting

Encompassing All Possible Cases
The solution space is known for DCJ sorting (Braga and Stoye, 2010). Thus, we only need to find all optimal completions; the specific operations then follow from the known DCJ solution space.

Handling Circular Singletons
The circular singletons of Π must be removed in sing(Π) steps. We have two options:
1. Delete all the circular singletons of Π.
2. Perform k "fusion" DCJs followed by sing(Π) − k chromosome deletions.
This poses a straightforward (yet tedious) counting problem.

Adding Necessary Conditions on B(Π*, Γ*)
Proposition 1: every π-path embedding into a 3-chain of an optimal completion must have the same parity.
Proposition 2: if p_{π,γ} is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains.
The proofs are slightly more involved.

Finishing the Job
Four cases, depending on path statistics:
1. p_{π,γ} is odd:
a) p_π^odd > p_π^even and p_γ^odd > p_γ^even (or vice versa): δ = 1
b) p_π^odd > p_π^even and p_γ^odd < p_γ^even (or vice versa): δ = 0
2. p_{π,γ} is even:
a) p_π^odd > p_π^even and p_γ^odd > p_γ^even (or vice versa): δ = 0
b) p_π^odd > p_π^even and p_γ^odd < p_γ^even (or vice versa): δ = 0
These cases are tedious but straightforward, and each can be handled similarly.

Section 5: Conclusion

Future Work
Correspondence with Braga et al., 2010?
Varying the indel cost? Charge an indel cost ≤ the DCJ cost and take the minimum total cost. Most of the simplifying sorting lemmas still hold, but actually computing the minimum cost appears difficult in this model.
The problem is solved! (under the framework of Braga et al., 2010)

Questions?

Shameless Plug
A novel education website that teaches bioinformatics through programming exercises. It has a "professor" environment for assigning programming exercises to your bioinformatics classes.