1 A Simplified View of DCJ-Indel Distance
Phillip Compeau
University of California-San Diego, Department of Mathematics

2 Abstract
Braga et al., 2010: solved the problem of DCJ-indel sorting in linear time.
Goals:
1. “Hardwire” DCJ sorting into DCJ-indel sorting.
2. Characterize the solution space for DCJ-indel sorting. The DCJ solution space is known (Braga and Stoye, 2010).

3 Section 1: Preliminaries
1. Preliminaries
2. Encoding Indels as DCJs
3. DCJ-Indel Sorting
4. The Solution Space of DCJ-Indel Sorting
5. Conclusion

4 The Discrete Genome
Genome (Π): formed of two matchings:
genes (g(Π)): each numbered gene has a head and a tail.
adjacencies (a(Π)): a blue matching on V(g(Π)).
[Figure: example genomes Γ and Π]

5 The Discrete Genome
Chromosome: a component of Π (an alternating path or cycle). It is linear or circular according to whether it is a path or a cycle of Π.
Telomere: a path endpoint of Π; it has a null adjacency {v, Ø}.
[Figure: example genomes Γ and Π]
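The two-matching view of a genome can be sketched in a few lines of Python. This is only an illustration under assumptions that are not on the slides: gene extremities are encoded as `(gene, "h")` / `(gene, "t")` tuples, telomeres are simply extremities with no adjacency, and the function name `chromosomes` is hypothetical.

```python
def chromosomes(genes, adjacencies):
    """Split a genome into chromosomes and label each linear or circular.

    genes: iterable of gene labels.
    adjacencies: iterable of pairs of extremities, e.g. ((1, "h"), (2, "t")).
    Encoding is an assumption made for this sketch.
    """
    # Each extremity has at most one adjacency partner (a matching).
    partner = {}
    for u, v in adjacencies:
        partner[u], partner[v] = v, u

    unvisited = set(genes)
    result = []
    while unvisited:
        start = min(unvisited)
        component, stack = set(), [start]
        while stack:
            g = stack.pop()
            if g in component:
                continue
            component.add(g)
            # Follow the adjacency at the head and at the tail of gene g.
            for end in ("h", "t"):
                nxt = partner.get((g, end))
                if nxt is not None:
                    stack.append(nxt[0])
        unvisited -= component
        # A telomere is an extremity with no adjacency; any telomere means a path.
        is_linear = any((g, e) not in partner for g in component for e in "ht")
        result.append(("linear" if is_linear else "circular", sorted(component)))
    return result


example = chromosomes(
    [1, 2, 3, 4],
    [((1, "h"), (2, "t")), ((3, "h"), (4, "t")), ((4, "h"), (3, "t"))],
)
```

In this example genes 1 and 2 form a linear chromosome (both ends are telomeres), while genes 3 and 4 close up into a circular one.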

6 The Double-Cut-and-Join Operation
Double-cut-and-join operation (DCJ; Yancopoulos et al., 2005): “cuts” the genome in two places and rejoins the adjacencies.
DCJ distance (d_DCJ(Π, Γ)): the minimum number of DCJs required to transform Π into Γ (which must have the same genes).
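A minimal sketch of a single DCJ, assuming adjacencies are stored as ordered pairs of extremity labels and a telomere is stored as a null adjacency `(v, None)`. The function name `dcj`, the string labels, and the `crossed` flag are hypothetical choices for illustration, not notation from the talk.

```python
def dcj(adjacencies, first, second, crossed=False):
    """Cut adjacencies `first` = (p, q) and `second` = (r, s), then rejoin.

    Rejoins as (p, r), (q, s) by default, or as (p, s), (q, r) if `crossed`.
    A pair (None, None) would join two "empty" ends, so it is dropped.
    """
    if first == second or first not in adjacencies or second not in adjacencies:
        raise ValueError("need two distinct adjacencies of the genome")
    p, q = first
    r, s = second
    rejoined = [(p, s), (q, r)] if crossed else [(p, r), (q, s)]
    kept = [a for a in adjacencies if a not in (first, second)]
    return kept + [pair for pair in rejoined if pair != (None, None)]


# Linear chromosome 1 2 3, with explicit null adjacencies at both telomeres.
genome = [("1t", None), ("1h", "2t"), ("2h", "3t"), ("3h", None)]

# Reversal of gene 2: the chromosome becomes 1 -2 3.
reversed_genome = dcj(genome, ("1h", "2t"), ("2h", "3t"))

# Excision: gene 2 becomes a circular chromosome, leaving the linear 1 3.
excised_genome = dcj(genome, ("1h", "2t"), ("2h", "3t"), crossed=True)
```

The same two cuts give either a reversal or an excision depending on how the four loose ends are rejoined, which is one way a single DCJ covers several classical operations.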

7 The DCJ Incorporates Many Operations

8 The Breakpoint Graph
B(Π, Γ) is formed from the adjacencies of Π and Γ.
B(Π, Γ) likewise comprises alternating red-blue paths and cycles.

9 DCJ Distance Formula
Bergeron et al., 2006: if Π and Γ share the same genes, then the DCJ distance is given by the following formula:
d_DCJ(Π, Γ) = N − [c(Π, Γ) + p_even(Π, Γ)/2]
N = number of genes
c(Π, Γ) = number of cycles in B(Π, Γ)
p_even(Π, Γ) = number of even paths in B(Π, Γ)
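The distance can be computed directly from the breakpoint graph. The sketch below assumes the standard form of the formula, d_DCJ = N − (c + p_even/2), and assumes that null adjacencies contribute no edges, so an extremity that is a telomere in both genomes is a path of length 0 and counts as even. The encoding of extremities and the name `dcj_distance` are hypothetical.

```python
def dcj_distance(genes, adj_pi, adj_gamma):
    """DCJ distance via cycles and even paths of the breakpoint graph.

    adj_pi / adj_gamma: pairs of extremities such as ((1, "h"), (2, "t")).
    Both genomes are assumed to have exactly the genes in `genes`.
    """
    def partner_map(adjs):
        m = {}
        for u, v in adjs:
            m[u], m[v] = v, u
        return m

    pi, gamma = partner_map(adj_pi), partner_map(adj_gamma)
    seen = set()
    cycles = even_paths = 0
    for start in ((g, e) for g in genes for e in ("h", "t")):
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            for m in (pi, gamma):
                if v in m:
                    stack.append(m[v])
        seen |= component
        # Every vertex has degree at most 2, so the component is a path or a cycle.
        edges = sum((v in pi) + (v in gamma) for v in component) // 2
        if edges == len(component):
            cycles += 1
        elif edges % 2 == 0:
            even_paths += 1
    # p_even is even when the two genomes share the same genes.
    return len(genes) - (cycles + even_paths // 2)


# Pi = 1 2 3 and Gamma = 1 -2 3 (one reversal apart), both linear.
pi_adj = [((1, "h"), (2, "t")), ((2, "h"), (3, "t"))]
gamma_adj = [((1, "h"), (2, "h")), ((2, "t"), (3, "t"))]
```

Here the breakpoint graph has one cycle of length 4 and two paths of length 0, so the sketch returns 3 − (1 + 2/2) = 1, matching the single reversal that separates the genomes.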

10 Indels and the DCJ-Indel Distance
Indel: the insertion or deletion of a chromosome or chromosomal interval (consecutive genes).
Assumption: we cannot remove a gene common to Π and Γ.
DCJ-indel distance (d^ind_DCJ(Π, Γ)): the minimum number of DCJs and indels required to transform Π into Γ.
Braga et al., 2010: solved DCJ-indel sorting in linear time. Lots of cases… can we simplify it?
[Figure: example indels on genes a, b, c, d]

11 Section 2: Encoding Indels as DCJs

12 Deletion as a DCJ Creating a Circular Chromosome
Ma et al., 2009: view a deletion as the formation and removal of a circular chromosome.
Idea: indel = DCJ creating a circular chromosome.
Wait… what about the deletion of circular chromosomes?
[Figure: the indel examples on genes a, b, c, d, redrawn with a DCJ]
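To make the idea concrete, here is a deliberately simplified sketch in which a linear chromosome is just a Python list of genes and gene orientation is ignored. Under that assumption, the DCJ that excises an interval can be modeled by list slicing; the helper name `excise_interval` is hypothetical.

```python
def excise_interval(chromosome, start, stop):
    """Model one DCJ that cuts out chromosome[start:stop] as a circle.

    Returns (linear_remainder, circular_chromosome). The deletion itself is
    then the removal of the circular chromosome.
    """
    if not 0 <= start < stop <= len(chromosome):
        raise ValueError("interval must be non-empty and inside the chromosome")
    circular = chromosome[start:stop]
    linear = chromosome[:start] + chromosome[stop:]
    return linear, circular


# Deleting the interval b c from the chromosome a b c d:
linear, circular = excise_interval(["a", "b", "c", "d"], 1, 3)
```

The call leaves the linear chromosome a d and the circular chromosome b c; discarding the circle completes the deletion of the interval.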

13 Apparent Exceptions
Apparent Exception #1: two deleted circular chromosomes are created from a single DCJ.
[Figure: a DCJ on genes a, b, c, d producing the chromosomes bc and ad; labeled “3 Operations”]

14 Apparent Exceptions
Apparent Exception #1: two deleted circular chromosomes are created from a single DCJ.
[Figure: a DCJ on genes a, b, c, d producing the chromosomes bc and ad, labeled “3 Operations”, next to an alternative on a, b, c, d labeled “1 Operation”]

15 Apparent Exceptions
Apparent Exception #2: a deleted circular chromosome is never involved in a DCJ.
Circular singleton of Π: a circular chromosome of Π that shares no genes with Γ.
Question: can we delete all circular singletons first?

16 Apparent Exceptions
Apparent Exception #2: a deleted circular chromosome is never involved in a DCJ.
Circular singleton of Π: a circular chromosome of Π that shares no genes with Γ.
Question: can we delete all circular singletons first? YES!

17 Handling Circular Singletons
Proposition: when transforming Π into Γ via a minimum collection of DCJs and indels, no gene belonging to a circular singleton of Π can ever appear in the same chromosome as a gene of Γ.
Corollary 1: if Π* is formed from Π by removing a circular singleton, then d^ind_DCJ(Π*, Γ) = d^ind_DCJ(Π, Γ) − 1.
Let sing(Π, Γ) = number of circular singletons of Π and Γ.
Corollary 2: if Π_0 and Γ_0 are formed by removing all circular singletons from Π and Γ, then d^ind_DCJ(Π, Γ) = d^ind_DCJ(Π_0, Γ_0) + sing(Π, Γ).
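Corollary 2 suggests a simple preprocessing pass: strip the circular singletons, count them, and add the count back to the distance of what remains. The sketch below assumes each chromosome is given as a `(kind, genes)` pair with `kind` equal to `"linear"` or `"circular"`; that representation and the helper name are hypothetical.

```python
def remove_circular_singletons(chromosomes, other_genes):
    """Drop circular chromosomes sharing no gene with the other genome.

    Returns (remaining_chromosomes, number_removed).
    """
    other = set(other_genes)
    kept, removed = [], 0
    for kind, genes in chromosomes:
        if kind == "circular" and not set(genes) & other:
            removed += 1          # a circular singleton: one deletion
        else:
            kept.append((kind, genes))
    return kept, removed


pi = [("linear", [1, 2]), ("circular", [3]), ("circular", [4, 5]), ("circular", [6])]
gamma = [("linear", [1, 2, 4]), ("circular", [7])]

pi_0, sing_pi = remove_circular_singletons(pi, [1, 2, 4, 7])
gamma_0, sing_gamma = remove_circular_singletons(gamma, [1, 2, 3, 4, 5, 6])
sing = sing_pi + sing_gamma  # sing(Pi, Gamma) in the corollary
```

In this example Π loses the chromosomes (3) and (6), Γ loses (7), and the chromosome (4 5) stays because gene 4 also occurs in Γ, so sing(Π, Γ) = 3.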

18 A Novel View of DCJ-Indel Distance
WLOG we may henceforth assume that sing(Π, Γ) = 0.
A completion of Π is a genome Π’ such that:
g(Π’) = g(Π) ∪ g(Γ)
a(Π’) = a(Π) ∪ (a perfect matching on V(Π’) − V(Π))
New chromosomes of Π’ are circular: the indels of Π’.
Theorem: d^ind_DCJ(Π, Γ) = min d_DCJ(Π’, Γ’), where the minimum is taken over all completions (Π’, Γ’) of (Π, Γ).

19 A Novel View of DCJ-Indel Distance
An optimal completion achieves the optimum below.
A completion of Π is a genome Π’ such that:
g(Π’) = g(Π) ∪ g(Γ)
a(Π’) = a(Π) ∪ (a perfect matching on V(Π’) − V(Π))
New chromosomes of Π’ are circular: the indels of Π’.
Theorem: d^ind_DCJ(Π, Γ) = min d_DCJ(Π’, Γ’), where the minimum is taken over all completions (Π’, Γ’) of (Π, Γ).

20 Section 3: DCJ-Indel Sorting

21 Open Vertices
π-open vertex: a vertex not found in Π (it must be matched in Π’).
A path endpoint in B(Π, Γ) must be π-open or γ-open, or a telomere (or both).
Define {π, π}-paths, {π, γ}-paths, and π-paths in B(Π, Γ).
Idea: construct B(Π*, Γ*) from B(Π, Γ) by matching vertices.

22 Necessary Conditions for B(Π*, Γ*)
Lemma 1: if (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k − 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).

23 Necessary Conditions for B(Π*, Γ*)
Lemma 1: if (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k − 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).
[Figure: B(Π’, Γ’) vs. B(Π’’, Γ’), in which a {π, π}-path is closed into a cycle; d_DCJ(Π’’, Γ’) < d_DCJ(Π’, Γ’)]

24 Necessary Conditions for B(Π*, Γ*)
Lemma 1: if (Π*, Γ*) is an optimal completion of (Π, Γ), then every {π, π}-path ({γ, γ}-path) of length 2k − 1 in B(Π, Γ) embeds into a cycle of length 2k in B(Π*, Γ*).
Remaining components of B(Π*, Γ*):
bracelet: a cycle linking {π, γ}-paths
chain: a path linking π-paths/γ-paths via intermediate {π, γ}-paths
[Figure: a 2-bracelet, a 3-chain, and a 2-chain]

25 Necessary Conditions for B(Π*, Γ*)
Lemma 2: B(Π*, Γ*) can contain only 2-bracelets, 2-chains, and 3-chains.
[Figure: paths P_1 and P_2 in B(Π’, Γ’) vs. B(Π’’, Γ’), which contains a cycle; d_DCJ(Π’’, Γ’) < d_DCJ(Π’, Γ’)]

26 Necessary Conditions for B(Π*, Γ*)
Lemma 3: B(Π*, Γ*) cannot have one 2-chain joining two odd π-paths and another 2-chain joining two even π-paths. The same holds for γ-paths.
[Figure: B(Π’, Γ’) with paths P_1 (odd), P_2 (odd), P_3 (even), P_4 (even) vs. B(Π’’, Γ’), which contains an even path; d_DCJ(Π’’, Γ’) < d_DCJ(Π’, Γ’)]

27 Sorting Algorithm
1. Remove all circular singletons of Π and Γ.
2. By Lemma 1, close every {π, π}-path ({γ, γ}-path) into a cycle by adding a single new adjacency to Π* (Γ*).
3. Form a maximum set of 2-bracelets (only chains remain).
4. Form a maximum set of even 2-chains by linking pairs of π-paths (γ-paths) having opposite parity (Lemma 3).
5. If p_{π,γ} is odd, then link the remaining {π, γ}-path with any remaining π-path and γ-path.
6. Arbitrarily link pairs of remaining π-paths, all of which have the same parity. Do the same for any remaining γ-paths.
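Steps 2 through 6 can be followed as pure bookkeeping on path counts. The sketch below only counts how many components of each type the steps would build; it does not construct any adjacencies. The function name, argument names, and dictionary keys are hypothetical, and the parity check at the top is an assumption added for this sketch (open vertices are matched in pairs, so p_{π,γ} plus the number of π-paths should be even, and likewise for γ).

```python
def plan_completion(p_pi_pi, p_gamma_gamma, p_pi_gamma,
                    p_pi_odd, p_pi_even, p_gamma_odd, p_gamma_even):
    """Count the components formed by steps 2-6 of the sorting algorithm."""
    # Assumption for this sketch: open vertices can be perfectly matched.
    if (p_pi_gamma + p_pi_odd + p_pi_even) % 2 or \
       (p_pi_gamma + p_gamma_odd + p_gamma_even) % 2:
        raise ValueError("path counts have inconsistent parity")

    plan = {
        "cycles_closed": p_pi_pi + p_gamma_gamma,  # step 2: one cycle per path
        "two_bracelets": p_pi_gamma // 2,          # step 3: pair {pi, gamma}-paths
    }

    # Step 4: pair an odd path with an even path as often as possible.
    even_pi = min(p_pi_odd, p_pi_even)
    even_gamma = min(p_gamma_odd, p_gamma_even)
    plan["even_two_chains"] = even_pi + even_gamma
    left_pi = p_pi_odd + p_pi_even - 2 * even_pi
    left_gamma = p_gamma_odd + p_gamma_even - 2 * even_gamma

    # Step 5: a leftover {pi, gamma}-path absorbs one pi-path and one gamma-path.
    plan["three_chains"] = p_pi_gamma % 2
    left_pi -= plan["three_chains"]
    left_gamma -= plan["three_chains"]

    # Step 6: the remaining paths all share one parity and are paired arbitrarily.
    plan["same_parity_two_chains"] = left_pi // 2 + left_gamma // 2
    return plan


example_plan = plan_completion(1, 0, 3, 2, 1, 0, 1)
```

For these counts the plan closes one cycle, forms one 2-bracelet, one even 2-chain, and one 3-chain, and leaves nothing for step 6.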

28 DCJ-Indel Distance
Theorem: the preceding algorithm solves DCJ-indel sorting in linear time, and it implies a DCJ-indel distance formula:
[Formula for d^ind_DCJ(Π, Γ) not preserved in the transcript]
where δ = 1 only if p_{π,γ} is odd and either:
1. p_π^odd > p_π^even and p_γ^odd > p_γ^even; or
2. p_π^odd < p_π^even and p_γ^odd < p_γ^even.
Otherwise, δ = 0.
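The rule for δ translates directly into code. This sketch covers only the correction term δ as stated on the slide, not the full distance formula; the function and argument names are hypothetical.

```python
def delta(p_pi_gamma, p_pi_odd, p_pi_even, p_gamma_odd, p_gamma_even):
    """Return 1 if p_{pi,gamma} is odd and the odd/even imbalance points the
    same way for pi-paths and gamma-paths; otherwise return 0."""
    if p_pi_gamma % 2 == 0:
        return 0
    both_odd_heavy = p_pi_odd > p_pi_even and p_gamma_odd > p_gamma_even
    both_even_heavy = p_pi_odd < p_pi_even and p_gamma_odd < p_gamma_even
    return 1 if (both_odd_heavy or both_even_heavy) else 0


same_direction = delta(3, 2, 1, 4, 1)      # odd p_{pi,gamma}, both odd-heavy
opposite_direction = delta(3, 2, 1, 1, 4)  # odd p_{pi,gamma}, opposite imbalance
even_count = delta(2, 2, 1, 4, 1)          # even p_{pi,gamma}
```

Because the inequalities are strict, a tie between odd and even paths on either side gives δ = 0.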

29 Section 4: The Solution Space of DCJ-Indel Sorting

30 Encompassing All Possible Cases
The solution space is known for DCJ sorting (Braga and Stoye, 2010).
Thus, we only need to find all optimal completions, and the specific operations will fall out in the wash.

31 Handling Circular Singletons
The circular singletons of Π must be removed in sing(Π) steps. We have two options:
1. Delete all the circular singletons of Π.
2. Perform k “fusion” DCJs followed by sing(Π) − k chromosome deletions.
This poses a straightforward (yet tedious) counting problem.

32 Adding Necessary Conditions on B(Π*, Γ*)
Proposition 1: every π-path embedding into a 3-chain of an optimal completion must have the same parity.
Proposition 2: if p_{π,γ} is even, then B(Π*, Γ*) must contain a maximum collection of even 2-chains.
Proofs are slightly more involved…

33 Finishing the Job
Four cases, depending on path statistics.
1. p_{π,γ} is odd:
a) p_π^odd > p_π^even, p_γ^odd > p_γ^even (or vice-versa); δ = 1
b) p_π^odd > p_π^even, p_γ^odd < p_γ^even (or vice-versa); δ = 0
2. p_{π,γ} is even:
a) p_π^odd > p_π^even, p_γ^odd > p_γ^even (or vice-versa); δ = 0
b) p_π^odd > p_π^even, p_γ^odd < p_γ^even (or vice-versa); δ = 0
These cases are tedious but straightforward and can be handled similarly.

34 Section 5: Conclusion

35 Future Work
Correspondence with Braga et al., 2010?
Varying the indel cost? Charge indel cost ≤ DCJ cost and take the minimum total cost.
Most of the simplifying sorting lemmas hold, but actually computing the minimum cost appears difficult in this model.
The problem is solved! (under the framework of Braga et al., 2010)

36 Questions?

37 Shameless Plug
www.rosalind.info
A novel education website that teaches bioinformatics through programming exercises.
It has a “professor” environment for assigning programming exercises to your bioinformatics classes.

