Interchange and Weighted-Interchange Rearrangement Distances in Strings Joint work of: Amihood Amir, Tzvika Hartman, Oren Kapah and Avivit Levy.


1 Interchange and Weighted-Interchange Rearrangement Distances in Strings Joint work of: Amihood Amir, Tzvika Hartman, Oren Kapah and Avivit Levy

2 Motivation Genome rearrangements carry phylogenetic information. Common assumption: there is one copy of each gene, so genomes are modeled as permutations. In practice this assumption is not defensible, and general strings must be considered. Rearrangement problems on general strings are usually complicated, so simplifying assumptions are needed.

3 Our work Consider general strings but simplify the rearrangement operation. Study the simple interchange rearrangement, which swaps the elements at two positions of a string. [Figure: an interchange applied to the string abacbacabb.] Use two cost models: the unit-cost model and the length-weighted cost model. Results: computing the interchange distance is NP-hard for general strings in the unit-cost model, BUT it is polynomial-time computable in the length-weighted model.

4 Unit-cost model The interchange distance is the minimum number of interchanges needed to transform one string into the other. Thm: Computing the interchange distance between two general strings is NP-hard. Proof: in two steps. (1) Show equivalence to the maximum edge-disjoint cycle decomposition problem on digraphs (the max-DCD problem). (2) Prove that the max-DCD problem is NP-hard.

5 Equivalence to max-DCD Fact: [Amir et al., SODA06] The interchange distance of a permutation π of length m (to the identity permutation) is m − c(π), where c(π) is the number of cycles of π. Example: Consider 3 6 4 1 7 2 5. It has 3 permutation cycles: (1 4 3) (2 6) (5 7). So its distance is 7 − 3 = 4: 3 6 4 1 7 2 5 → 3 6 1 4 7 2 5 → 1 6 3 4 7 2 5 → 1 2 3 4 7 6 5 → 1 2 3 4 5 6 7. Note: the cycle decomposition of the digraph is unique.
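The formula m − c(π) can be checked on the slide's example with a few lines of Python. This is a minimal sketch, not the authors' code: the function names `count_cycles`, `interchange_distance` and `sort_by_interchanges` are hypothetical, and the sorting routine is only one simple way to realize the bound (each swap puts one element in its final place).

```python
def count_cycles(perm):
    """Count the cycles of a permutation in one-line notation (values 1..m)."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if seen[start]:
            continue
        cycles += 1
        pos = start
        while not seen[pos]:
            seen[pos] = True
            pos = perm[pos] - 1  # follow position -> value
    return cycles


def interchange_distance(perm):
    """Unit-cost interchange distance to the identity: m - c(perm)."""
    return len(perm) - count_cycles(perm)


def sort_by_interchanges(perm):
    """Sort perm by swaps that each place one element at its target.

    Returns the list of swapped positions (1-based). Every swap splits
    a cycle in two, so exactly m - c(perm) swaps are used.
    """
    p = list(perm)
    swaps = []
    for i in range(len(p)):
        while p[i] != i + 1:
            j = p[i] - 1              # target position of the element at i
            p[i], p[j] = p[j], p[i]
            swaps.append((i + 1, j + 1))
    return swaps


example = [3, 6, 4, 1, 7, 2, 5]
print(count_cycles(example), interchange_distance(example))
print(sort_by_interchanges(example))
```

The swap sequence produced here may differ from the one on the slide, but its length is the same.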

6 Equivalence to max-DCD… What happens in general strings? Example: s1 = a b a c b c, s2 = b a c b c a. [Figure: two different cycle decompositions of the digraph on the symbols a, b, c defined by s1 and s2.] Note: the cycle decomposition of the digraph is not unique. Which is better? We want the maximum number of cycles: the max-DCD problem.
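For strings this short, the unit-cost distance can be found by exhaustive search, which makes the example concrete. The sketch below is an illustration only: `position_edges` assumes the digraph has one vertex per symbol and one edge per position, directed from the symbol in s1 to the symbol in s2 (the slide's figure is not preserved, so this orientation is an assumption), and `interchange_distance_bfs` is a hypothetical brute-force helper that is exponential in general.

```python
from collections import Counter, deque
from itertools import combinations


def position_edges(s1, s2):
    """One directed edge per position: symbol in s1 -> symbol in s2 (assumed orientation)."""
    return list(zip(s1, s2))


def interchange_distance_bfs(s1, s2):
    """Minimum number of interchanges turning s1 into s2, by breadth-first search."""
    if Counter(s1) != Counter(s2):
        raise ValueError("strings must contain the same multiset of symbols")
    start, goal = tuple(s1), tuple(s2)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            return dist[cur]
        for i, j in combinations(range(len(cur)), 2):
            if cur[i] == cur[j]:
                continue  # swapping equal symbols changes nothing
            nxt = list(cur)
            nxt[i], nxt[j] = nxt[j], nxt[i]
            nxt = tuple(nxt)
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise AssertionError("goal is always reachable for equal multisets")


s1, s2 = "abacbc", "bacbca"
print(position_edges(s1, s2))
print(interchange_distance_bfs(s1, s2))
```

Under the assumed orientation, the six edges split either into two directed triangles or into three cycles of length 2. The search returns 3 = 6 − 3, which matches the decomposition with more cycles.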

7 The max-DCD problem What do we know about the max-DCD problem? For directed graphs, we consider only graphs with no cycles of length 2. The undirected version is NP-hard [Caprara, '99].

8 The max-DCD problem… Lemma: In digraphs with no cycles of length 2, the problem of finding a decomposition into directed triangles is polynomially reducible to max-DCD. Proof: Let G be a digraph with no cycles of length 2. Every cycle then has at least 3 edges, so |max-DCD(G)| ≤ |E|/3. Clearly, if |max-DCD(G)| < |E|/3 there is no decomposition into triangles. If |max-DCD(G)| = |E|/3, an optimal decomposition must be a decomposition into triangles. What do we know about triangle decomposition? The undirected version is NP-hard, a corollary of the NP-hardness of edge partition into cliques of size k [Holyer, '81].

9 The max-DCD problem… [Holyer, '81] shows a reduction from 3-SAT to edge partition into cliques of size k, using a construction for general k. Holyer's construction for k=3 (undirected triangles): [Figure: grid of vertices labeled by integer triples, from (0,0,0) to (6,-3,-3).]

10 The max-DCD problem… We show that Holyer's proof also works for directed triangles. Idea: add directions to the construction while preserving its basic properties. [Figure: Holyer's vertex grid, labeled by integer triples from (0,0,0) to (6,-3,-3), with directions added.] This concludes the proof of hardness in the unit-cost model.

11 Length-weighted cost model The weighted-interchange (WI) distance is the minimum total weight of a sequence of interchanges transforming one string into the other. The weight of an interchange of the elements in positions i,j is |i-j|. Thm: The WI-distance between two general strings is computable in polynomial time. Proof: in two steps. (1) Prove the result for permutations. (2) Show how to apply it to general strings.

12 WI-distance in permutations Definition: The L1-distance is min_π Σ_j |j − π(j)|. Lemma: Let x,y be permutations of length m. Then WI-distance(x,y) ≥ L1-distance(x,y)/2. Proof: An interchange of cost |i−j| moves two elements by |i−j| each, so it reduces the L1-distance by at most 2|i−j|. Best situation example: 4 3 2 1. Lemma: Let x,y be permutations of length m. Then WI-distance(x,y) ≤ L1-distance(x,y)/2. Proof: Consider the following algorithm: while there are unsorted pairs in x, find a good pair i,j and interchange elements i and j.

13 WI-distance in permutations… What is a good pair? Elements i,j such that interchanging them is "useful" for both, i.e., each of the two elements moves toward its target position by the full distance between them. Example: Consider 4 3 1 2. Here 4,1 and 3,1 are good pairs BUT 4,2 is not a good pair. Note: The total cost of interchanging good pairs never exceeds half of the L1-cost. Claim: Every unsorted permutation has a good pair. Thm: Let x,y be permutations of length m. Then WI-distance(x,y) = L1-distance(x,y)/2.
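The good-pair algorithm can be sketched in Python for sorting a permutation to the identity. The precise condition on the slide did not survive extraction, so this sketch assumes a good pair is a pair of positions i < j where the element at i has target position at least j and the element at j has target position at most i. This reading agrees with the slide's example, but it is an assumption, and the function names are hypothetical.

```python
def l1_distance(perm):
    """Sum over positions of |position - target|, with target = value (1-based)."""
    return sum(abs(pos - val) for pos, val in enumerate(perm, start=1))


def is_good_pair(p, i, j):
    """Assumed condition: positions i < j, element at i belongs at or right of j,
    element at j belongs at or left of i. Both then move |i-j| closer."""
    return p[i - 1] >= j and p[j - 1] <= i


def find_good_pair(p):
    """Return some good pair of positions (1-based), or None if there is none."""
    m = len(p)
    for i in range(1, m + 1):
        for j in range(i + 1, m + 1):
            if is_good_pair(p, i, j):
                return i, j
    return None


def weighted_sort(perm):
    """Repeatedly interchange a good pair; return (total cost, list of swaps)."""
    p = list(perm)
    cost, swaps = 0, []
    pair = find_good_pair(p)
    while pair is not None:
        i, j = pair
        p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
        cost += j - i                 # length-weighted cost |i - j|
        swaps.append(pair)
        pair = find_good_pair(p)
    return cost, swaps


perm = [4, 3, 1, 2]
cost, swaps = weighted_sort(perm)
print(l1_distance(perm), cost, swaps)
```

Each swap of a good pair lowers the L1-distance by exactly twice its cost, so the total cost comes out to half the initial L1-distance. The quadratic scan in `find_good_pair` is chosen for clarity, not efficiency.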

14 WI-distance in permutations… Fact: [Amir et al., SODA06] The L1-distance between two general strings can be computed in polynomial time. So computing the L1-distance in polynomial time gives the result for permutations. What about general strings? How do we pair the symbols? Do we need to try all pairings of same letters? Example: Text: ABCBAABBC Pattern: CCAABABBB

15 WI-distance in general strings Fact: [Amir et al., SODA06] For the L1-distance we know an optimal pairing. Example: [Figure: an optimal pairing of Text: ABCBAABBC with Pattern: CCAABABBB.] Thm: Let x,y be general strings of length m. Then WI-distance(x,y) = L1-distance(x,y)/2. Proof: Consider all pairings; each defines a permutation for which the result holds, so the minimum is attained by the L1-optimal pairing. Result: The WI-distance is polynomial-time computable for general strings.
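The slide's example can be worked through in Python. The slides do not spell out the optimal pairing, so this sketch assumes the order-preserving one: the k-th occurrence of each symbol in the text is paired with its k-th occurrence in the pattern, which is the standard optimal matching for points on a line under L1 cost. The function names are hypothetical, and the halving step relies on the theorem stated on the slide.

```python
from collections import Counter, defaultdict


def l1_distance_strings(text, pattern):
    """L1-distance under the (assumed optimal) order-preserving pairing."""
    if Counter(text) != Counter(pattern):
        raise ValueError("strings must contain the same multiset of symbols")
    # Positions of every symbol in the pattern, in increasing order.
    positions = defaultdict(list)
    for idx, ch in enumerate(pattern):
        positions[ch].append(idx)
    used = defaultdict(int)  # how many occurrences of each symbol are paired so far
    total = 0
    for idx, ch in enumerate(text):
        total += abs(idx - positions[ch][used[ch]])
        used[ch] += 1
    return total


def wi_distance(text, pattern):
    """Weighted-interchange distance as L1/2 (per the theorem on the slide)."""
    return l1_distance_strings(text, pattern) // 2


text, pattern = "ABCBAABBC", "CCAABABBB"
print(l1_distance_strings(text, pattern), wi_distance(text, pattern))
```

For this pair of strings the symbol-wise contributions are 3 for A, 8 for B and 9 for C, giving an L1-distance of 20 and a WI-distance of 10. The whole computation is linear in the string length.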

16 Conclusions The general-strings situation is probably difficult in the unit-cost model for all well-known rearrangement operations. A possible direction: the length-weighted model. Note: Length-weighted cost models are considered biologically meaningful by some researchers (e.g. [Bender et al., '04]). So this direction might be applicable as well as computable.

