Sorting Signed Permutations By Reversals (The Hannenhalli – Pevzner Theory) Seminar in Bioinformatics – 236818 ©Shai Lubliner.


2 Sorting Signed Permutations By Reversals (The Hannenhalli – Pevzner Theory) Seminar in Bioinformatics – 236818 ©Shai Lubliner

3 2 Turning Cabbage Into Turnip…

4 3 What is it good for? Genome rearrangements: duplications, translocations, inversions (reversals…). Important evolutionary scenarios. Phylogeny analysis.

5 4 Our Starting Point Given two genomic sequences, we are able to locate homologous genes, on strands ‘+’ or ‘-’. (We do not start from scratch…) A smooth transition to permutations…

6 5 Formalities Begin … Permutation:π = ( π 1, π 2,…, π n ) { π 1,…, π n } ≡ {1,…,n} Reversal: ρ(i,j) (i and j are indexes) (1,2,3,4,5,6)·ρ(3,5) = (1,2,5,4,3,6) d(π) ≡ Minimal distance in reversals between π and (1,…,n).
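A minimal Python sketch of the reversal ρ(i,j) on an unsigned permutation, assuming 1-based, inclusive indexes as in the example on this slide. The function name `reverse` is illustrative and does not come from the presentation.

```python
def reverse(perm, i, j):
    """Apply rho(i, j): reverse the segment at positions i..j (1-based, inclusive)."""
    p = list(perm)
    p[i - 1:j] = p[i - 1:j][::-1]
    return tuple(p)


# The slide's example: (1,2,3,4,5,6) . rho(3,5) = (1,2,5,4,3,6)
print(reverse((1, 2, 3, 4, 5, 6), 3, 5))
```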

7 6 Breakpoints Consecutive values: |i–j| = 1 → i ~ j. π i ~ π i+1 ≡ adjacency; π i !~ π i+1 ≡ breakpoint. b(π) ≡ Number of breakpoints in permutation π. An immediate bound on d(π): d(π) ≥ b(π)/2 (A reversal may eliminate 2 breakpoints at most.)
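A short sketch of counting breakpoints in an unsigned permutation. It assumes the permutation is framed by π 0 = 0 and π n+1 = n+1, as the breakpoint graph slide does for its vertex set; the name `count_breakpoints` is hypothetical.

```python
def count_breakpoints(perm):
    """Count neighbouring pairs whose values are not consecutive (|a - b| != 1)."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]  # assumed framing by 0 and n+1
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


b = count_breakpoints((1, 2, 5, 4, 3, 6))
# Each reversal removes at most two breakpoints, so d(pi) >= b / 2.
print(b, b / 2)
```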

8 7 Breakpoint Graphs G(π): Breakpoint Graph for permutation π = ( π 1, π 2,…, π n ): Vertices: { π 0, π 1,…, π n, π n+1 } ≡ {0,1,…,n,n+1} ( |V| = n+2 ) Edges: Black -{ ( π i, π j ) | π i !~ π j ∧ i~j } Gray -{ ( π i, π j ) | π i ~ π j ∧ i!~j }

9 8 Breakpoint Graphs (2) Alternating Cycles: Every two consecutive edges are not of same color. l(C) ≡ Length of alternating cycle ≡ #(Black edges in C) l(C) = 2 ⇔ C is short. l(C) > 2 ⇔ C is long. A simple permutation: For which all cycles are short. c(π) ≡ Maximum number of cycles in an edge-disjoint alternating cycle decomposition of G(π).

10 9 A better bound on d(π) Bafna and Pevzner (1993): d(π) ≥ b(π) - c(π) Still, this bound is not tight enough.

11 10 Signed Permutations DNA has two strands, denoted + and -. A gene on the + strand has a + orientation. A reversal changes a gene’s orientation. d(π) for a signed π ≡ minimal distance in reversals between π and (+1, …,+n).

12 11 Breakpoint Graphs For The Signed Case Transforming a signed permutation into an unsigned one: π = (…,+x,…) → π' = (…,2x-1,2x,…) π = (…,-x,…) → π' = (…,2x,2x-1,…) Example: (+1 -3 -2 +4) → (1 2 6 5 4 3 7 8) π' is called the image of π, Im(π). The breakpoint graph is defined for π'.
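The transformation into the image can be sketched in a few lines of Python. Signed elements are represented here as non-zero integers (a negative integer for the '-' orientation), which is an assumption of this sketch, as is the function name `image`.

```python
def image(signed_perm):
    """Map +x to (2x-1, 2x) and -x to (2x, 2x-1), as defined on the slide."""
    out = []
    for x in signed_perm:
        a = abs(x)
        out.extend((2 * a - 1, 2 * a) if x > 0 else (2 * a, 2 * a - 1))
    return tuple(out)


# The slide's example: +1 -3 -2 +4
print(image((1, -3, -2, 4)))
```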

13 12 Example Each vertex's degree is ≤ 2 (0 or 2). Cycle decomposition is unique. d(π) ≥ d(π') (every reversal in π can be mimicked in π')
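Because every vertex has degree 0 or 2, the cycles can be found by simply walking black and gray edges alternately. The following is a sketch under stated assumptions: signed elements are non-zero integers, the image is framed by 0 and 2n+1, and the name `breakpoint_graph_stats` is hypothetical. It returns the number of black edges b and the number of cycles c.

```python
def breakpoint_graph_stats(signed_perm):
    """Return (b, c): black-edge count and alternating-cycle count of G(Im(pi))."""
    n = len(signed_perm)
    # Image of pi, framed by 0 and 2n+1.
    ext = [0]
    for x in signed_perm:
        a = abs(x)
        ext += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    ext.append(2 * n + 1)
    pos = {v: i for i, v in enumerate(ext)}

    # Black edges: neighbouring positions (2k, 2k+1) holding non-consecutive values.
    black = {}
    for k in range(0, 2 * n + 2, 2):
        u, v = ext[k], ext[k + 1]
        if abs(u - v) != 1:
            black[u], black[v] = v, u

    # Gray edges: consecutive values (2x, 2x+1) at non-neighbouring positions.
    gray = {}
    for u in range(0, 2 * n + 2, 2):
        v = u + 1
        if abs(pos[u] - pos[v]) != 1:
            gray[u], gray[v] = v, u

    # Walk black, then gray, until the walk returns to its start.
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return len(black) // 2, cycles


b, c = breakpoint_graph_stats((1, -3, -2, 4))
print(b, c, b - c)  # b - c is the Bafna-Pevzner lower bound
```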

14 13 A limit on reversals in Im(π) In π'=Im(π) we allow only reversals ρ(2i+1,2j), each mimicking ρ(i+1,j) in π. With this restriction, d(π) = d(π') …
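The correspondence between ρ(i+1,j) on π and ρ(2i+1,2j) on Im(π) can be checked on a small case. This sketch assumes that a signed reversal reverses the segment and flips every sign in it; the helper names are illustrative, and the helpers are defined here only to keep the example self-contained.

```python
def signed_reverse(perm, i, j):
    """Signed reversal at positions i..j (1-based): reverse order, flip signs."""
    p = list(perm)
    p[i - 1:j] = [-x for x in reversed(p[i - 1:j])]
    return tuple(p)


def unsigned_reverse(perm, i, j):
    """Unsigned reversal at positions i..j (1-based)."""
    p = list(perm)
    p[i - 1:j] = p[i - 1:j][::-1]
    return tuple(p)


def image(signed_perm):
    """+x -> (2x-1, 2x), -x -> (2x, 2x-1)."""
    out = []
    for x in signed_perm:
        a = abs(x)
        out.extend((2 * a - 1, 2 * a) if x > 0 else (2 * a, 2 * a - 1))
    return tuple(out)


pi = (1, -3, -2, 4)
i, j = 1, 3  # rho(i+1, j) = rho(2, 3) acts on the segment (-3, -2)
lhs = image(signed_reverse(pi, i + 1, j))
rhs = unsigned_reverse(image(pi), 2 * i + 1, 2 * j)
print(lhs == rhs, lhs)
```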

15 14 Seeking reversals that matter For a=(π i-1,π i ), b=(π j-1,π j ), ρ(i,j): a and b are black → ρ acts on a and b. a and b in same cycle c → ρ acts on c. Given a reversal ρ: Δb ≡ Δb(π,ρ) ≡ b(πρ) - b(π). (increase in breakpoints) Δc ≡ Δc(π,ρ) ≡ c(πρ) - c(π). (increase in cycle decomposition size)

16 15 Seeking reversals that matter (2) Bafna and Pevzner (1993): Δ(b-c) ≡ Δb(π,ρ) - Δc(π,ρ) ≥ -1 Δ(b-c) = -1 → ρ is proper. An oriented gray edge: A gray edge such that the reversal ρ acting on its two adjacent black edges is proper. A cycle is oriented if it has an oriented gray edge. Don’t panic…

17 16 … Pause for breath ρ(3,14) is proper (it eliminates C 1 ). Therefore: (π' 2,π' 14 )=(2,3) and (π' 3,π' 15 )=(6,7) are oriented. C 1 is oriented. Can you locate the rest…?

18 17 Interleaving cycles [i,j] and [k,l] interleave ⇔ i<k<j<l (or k<i<l<j) Gray edges (π i,π j ) and (π k,π l ) interleave ⇔ [i,j] and [k,l] interleave. Cycles C 1 and C 2 interleave if they have interleaving gray edges. In the previous example: C 2 and C 3 interleave, C 1 and C 2 do not.
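The interval test translates directly into code. In this sketch an interval is a pair of positions given in either order, and the function name `interleave` is hypothetical.

```python
def interleave(a, b):
    """True if intervals a=[i,j] and b=[k,l] interleave: i<k<j<l or k<i<l<j."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j


print(interleave((1, 5), (3, 8)))  # overlapping, neither contains the other
print(interleave((1, 5), (2, 4)))  # nested
print(interleave((1, 3), (4, 6)))  # disjoint
```

Two gray edges interleave exactly when the intervals spanned by their endpoint positions pass this test, so nested and disjoint edges do not count.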

19 18 Interleaving Graphs H π ≡ H(C π,I π ) Vertices : C π ≡ { C | C is a cycle in G(π) } ( C is oriented in G(π) → C is oriented in H π ) Edges : I π ≡ { (C 1,C 2 ) | C 1 and C 2 interleave in G(π) } A connected component CC in H π is oriented ⇔ CC has an oriented vertex

20 19 Interleaving Graph Example [Figure: interleaving graph with vertices C 1, C 2 and C 3.]

21 20 Hurdles [Figure: two examples.] In one, component U separates components U’ and U’’; in the other, component U does not separate components U’ and U’’. In both examples, U’ and U’’ are contained by U.

22 21 Hurdles (2) ∝ : The containment partial order on the set of unoriented components, U π. A Minimal hurdle: Minimal in ∝. Greatest Hurdle: Greatest in ∝. Does not separate any two minimal hurdles. A Hurdle: A minimal hurdle or the greatest hurdle.

23 22 Hurdles (3) h(π) ≡ Number of hurdles in permutation π. Given a reversal ρ: Δh ≡ Δh(π,ρ) ≡ h(πρ) - h(π). Theorem 1: ( improving the bound on d(π) ) For arbitrary (signed) permutation π, d(π) ≥ b(π) - c(π) + h(π). ( for any reversal: Δ(b-c+h) ≥ -1)

24 23 Avoiding Long Cycles Reminder: l(C) > 2 (more than 2 black edges) ⇔ C is long. Motivation: Long cycles are hard to analyze and cope with. A solution: Reducing the problem through splitting long cycles.

25 24 Splitting Cycles (g,b)-split: Transforming π into a generalized permutation π*. Transforming G(π) into G*(π)=G(π*). (Lemma 1)

26 25 Generalized Permutations Real numbers instead of Integers. “Identity” permutation π = (π 1,π 2,…,π n ): π i < π i+1 for all 1 ≤ i ≤ n-1. Breakpoint Graph: Basically the same. Black edges between adjacent non-consecutive elements. Gray edges between non-adjacent consecutive elements.

27 26 (g,b)-padding Let black edge b=(π i+1,π i ) and gray edge g=(π j,π k ) be in cycle C=(…,π i+1,π i,…,π j,π k,…) in G(π). Δ ≡ π k - π j ( for integers: +1 or -1 ) v ≡ π j + ⅓Δ w ≡ π k - ⅓Δ (g,b)-padding of π=(π 1,…,π i,π i+1,…,π n ): π* = φ(π) = (π 1,…,π i,v,w,π i+1,…,π n )
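A small sketch of the padding step, using exact fractions so that the thirds stay exact. The function name `gb_padding` and its argument layout are assumptions made for illustration: `i` is the 1-based position of π i (the padding is inserted between π i and π i+1), and `pj`, `pk` are the endpoint values of the gray edge g. The input tuple below is a two-element fragment, not a full permutation.

```python
from fractions import Fraction


def gb_padding(perm, i, pj, pk):
    """Insert v = pj + delta/3 and w = pk - delta/3 between positions i and i+1."""
    delta = Fraction(pk - pj)
    v = pj + delta / 3
    w = pk - delta / 3
    return tuple(perm[:i]) + (v, w) + tuple(perm[i:])


# Fragment with black edge b = (15, 9) and gray edge g = (17, 16):
# pi_i = 9, pi_{i+1} = 15, pi_j = 17, pi_k = 16, so delta = -1.
print(gb_padding((9, 15), 1, 17, 16))
```

Here v = 50/3 and w = 49/3 lie strictly between 16 and 17, so no other element has a value between them, which is what makes v and w consecutive in the generalized sense.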

28 27 (g,b)-padding (2) v and w are adjacent and consecutive in π*. If π=Im(π’) then there exists π’’ such that π*=Im(π’’). φ is safe ⇔ g and b are non-incident and h(π)=h(π*) A safe (g,b)-padding: b=(15,9) g=(17,16)

29 28 (g,b)-padding (3) φ is safe ⇔ b(π*)-c(π*)+h(π*) = b(π)-c(π)+h(π) Let φ break a cycle C in G(π) into C 1,C 2 in G(π*): C is oriented ⇔ C 1 or C 2 is oriented (Lemma 2) Cycle D interleaves with C ⇔ D interleaves with C 1 or C 2 (Lemma 3) Theorem 2: If C is a long cycle in G(π) then there is a safe (g,b)-padding acting on C.

30 29 The point in padding π A generalized sorting of π: – A sequence π=π(0), π(1), …, π(k)=σ. – σ is the generalized identity permutation. – Transforming π(i) into π(i+1) through a reversal or a (g,b)-padding. Every generalized sorting of π mimics a (genuine) sorting of π with same number of reversals. (Lemma 5)

31 30 Reversals that matter (3) Reminder: For a reversal, Δ(b-c+h) ≥ -1 (Theorem 1 …) Safe reversal: For which Δ(b-c+h) = -1 Let us say that we have a simple permutation with an oriented cycle. We will now see that in such a permutation there is a safe reversal. Can you sense where we are heading…?

32 31 World of Simple Permutations In the next slides π is a simple permutation. Let C be a cycle in G(π): V(C) ≡ { C’ | C’ interleaves with C } E(C) ≡ { (C 1,C 2 ) | C 1,C 2 ∈ V(C) ∧ C 1,C 2 interleave in π } Ê(C) ≡ { (C 1,C 2 ) | C 1,C 2 ∈ V(C) ∧ C 1,C 2 do not interleave in π } Let reversal ρ act on an oriented cycle: ρ removes E(C) and adds Ê(C) to H π. ρ changes orientation of D ⇔ D ∈ V(C). (Lemma 6)

33 32 Example [Figure: a safe reversal, ρ(9,18); cycle labels C 1, C 2, C 3’, C 3’’.]

34 33 World of Simple Permutations (3) Let K be an oriented component in H π, and ρ a reversal. If ρ breaks K into several connected components, we mark them: K 1 (ρ),K 2 (ρ),… Theorem 3: For an oriented component K in H π there is a safe reversal ρ for which K 1 (ρ),K 2 (ρ),… are all oriented. We now know how to “get rid” of oriented cycles.

35 34 Back to Hurdles… After clearing all oriented cycles, we are left with only unoriented ones. Our efforts now are to find a safe reversal in the absence of oriented cycles in G(π). But does one necessarily exist ?

36 35 Cover Graph Reminder: ∝ : The containment partial order on the set of unoriented components, U π. Ω π : –Vertices: U π ∪ { û } (û is an “artificial” maximum in ∝ ) –Edges: { (u,v) | u,v ∈ U π ∪ { û } ∧ u ∝ v }

37 36 Cover Graph (2) If there is no greatest hurdle: Number of leaves in Ω π = h(π) Else: Number of leaves in Ω π = h(π) + 1

38 37 Cutting Hurdles Every reversal ρ on a cycle in hurdle K cuts off the leaf K from Ω π ( Ω πρ = Ω π \ K ). (Lemma 9) Great!! We can drop them like flies… Or can we…?

39 38 Super Hurdles Super Hurdle: Deletion of that hurdle from U π transforms a non-hurdle U ∈ U π into a hurdle. Simple Hurdle: Else… A reversal acting on a cycle of a simple hurdle is safe. ( Lemma 10) We need a way to find safe reversals, even when simple hurdles are absent.

40 39 Cover Graph (3) Let L and M be hurdles in π: PATH(L,M) : All components (vertices) in the path from L to M in Ω π. LCA(L,M) : (defined for minimal hurdles L and M) Least common ancestor of L and M in Ω π. LCA*(L,M) : (defined for minimal hurdles L and M) Least common ancestor of L and M in Ω π, which does not separate them.

41 40 Hurdles Merging Let G=(V,E) be a graph, w ∈ V, W ⊂ V: Contraction of W into w in G, yields a new graph: Vertices: V \ (W\w) Edges: { ( p(x),p(y) ) | (x,y) ∈ E } p(v): v ∈ W → p(v) = w, else → p(v) = v Let L and M be different hurdles in π, and ρ a reversal acting on black edges of L and M. ρ acts on Ω π as the contraction of PATH(L,M) into LCA*(L,M). ( Lemma 11)
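The contraction operation defined on this slide can be sketched on plain Python sets. The function name `contract` is hypothetical. The sketch follows the slide's edge rule literally, so an edge with both endpoints in W becomes a self-loop (w, w); a caller working with a tree such as Ω π would probably want to discard those.

```python
def contract(vertices, edges, W, w):
    """Contract the vertex set W into the single vertex w (w is assumed to be in W)."""
    def p(v):
        return w if v in W else v

    new_vertices = {v for v in vertices if v not in W or v == w}
    new_edges = {(p(x), p(y)) for (x, y) in edges}
    return new_vertices, new_edges


V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4)}
print(contract(V, E, W={2, 3}, w=2))
```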

42 41 Hurdles Merging Example

43 42 Safe Reversals found again Let L and M be hurdles: L max ≡ Rightmost position in L. L < M ⇔ L max < M max Let us order all hurdles in π: U(1) < … < U(l)≡L < … < U(m)≡M < … < U( h(π) ) BETWEEN(L,M) ≡ { U(i) | l < i < m } OUTSIDE(L,M) ≡ { U(i) | i < l or i > m }

44 43 Safe Reversals found again (2) Let ρ be a reversal merging hurdles L and M in π. BETWEEN(L,M) ≠ ∅ and OUTSIDE(L,M) ≠ ∅ → ρ is safe. ( Lemma 12 ) If h(π)>3 then there exists a safe reversal merging two hurdles in π. ( Lemma 13 ) If h(π)=2 then there exists a safe reversal merging the two hurdles in π. If h(π)=1 then there exists a safe reversal cutting the only hurdle in π. ( Lemma 14 ) Let ρ merge hurdles L and M, and let U≠L,M be a hurdle. U is a super hurdle in π ⇔ U is a super hurdle in πρ. ( Lemma 15 )

45 44 Fortresses By now we can pretty much tackle most problems. Still, the case when h(π)=3 needs further attention. 3-fortress : A permutation π for which Ω π is a homeomorph of a 3-star, with 3 super hurdles.

46 45 Fortresses Example [Figure: a 3-fortress and a permutation that is not a fortress; legend: simple hurdle, super hurdle, a safe reversal.]

47 46 Fortresses (3) ρ destroys a 3-fortress → ρ is not safe ( Lemma 16 ) Therefore, for a 3-fortress we have to make at least one unsafe reversal. π is a 3-fortress → d(π) = b(π)–c(π)+h(π)+1 ( Lemma 17 ) You may also sense that if we manage to avoid formation of a 3-fortress, then d(π) = b(π)–c(π)+h(π). Consider a permutation π with an odd number n of super hurdles…

48 47 Fortresses (4) Fortress: A permutation π with an odd number of hurdles, all of which are super hurdles. Theorem 4: For any permutation π: π is a fortress → d(π) = b(π)–c(π)+h(π)+1 else → d(π) = b(π)–c(π)+h(π)
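Theorem 4 reduces the distance to arithmetic on four quantities. The sketch below only evaluates the formula; it assumes b, c and h have already been computed from the breakpoint and interleaving graphs and that the fortress test is supplied by the caller. The function name is hypothetical.

```python
def reversal_distance(b, c, h, is_fortress):
    """d(pi) = b - c + h, plus 1 if pi is a fortress (Theorem 4)."""
    return b - c + h + (1 if is_fortress else 0)


# For pi = (+1 -3 -2 +4): b = 2, c = 1, and there are no unoriented
# components, so h = 0 and pi is not a fortress.
print(reversal_distance(2, 1, 0, False))
```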

49 48 P !!! We are now (finally…) able to introduce an algorithm which solves the problem of sorting signed permutations by reversals. The unsigned problem is NP-complete. Surprisingly (at the time), the signed case proved to have a polynomial solution. Hopefully, you can now sense why…

50 49 Algorithm

51 50 Algorithm Complexity Reminder: In G(π), ∀ v ∈ V: d(v) ≤ 2. Therefore, |E| ≤ 2|V|. Notion of complexity(π): (g,b)-padding is needed for each C ∈ C π with l(C)>2 (long). Hence: complexity(π) = Σ C ∈ Cπ ( l(C)-2 ) Let a (g,b)-padding φ break a cycle C into C 1 and C 2. ( l(C 1 )-2 )+( l(C 2 )-2 ) = ( l(C)-2 ) – 1. For the “original” π, complexity(π) = O(n). So number of paddings is O(n), with 2 new vertices added each time. Therefore, |V| = O(n), in every iteration.
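The counting argument on this slide can be written out step by step. This is a sketch that assumes a (g,b)-padding replaces the black edge b by two black edges, so the two resulting cycles together contain one more black edge than C, and that both resulting cycles have length at least 2.

```latex
\begin{align*}
l(C_1) + l(C_2) &= l(C) + 1 \\
\bigl(l(C_1) - 2\bigr) + \bigl(l(C_2) - 2\bigr)
  &= l(C) + 1 - 4 = \bigl(l(C) - 2\bigr) - 1 \\
\mathrm{complexity}(\pi^{*}) &= \mathrm{complexity}(\pi) - 1
\end{align*}
```

Since complexity is never negative and starts at O(n), at most O(n) paddings can occur, each adding the two vertices v and w.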

52 51 Algorithm Complexity (2) Clearly: b(π), c(π), h(π) = O(n). Reminder : After (g,b)-padding, b(π)-c(π)+h(π) is unchanged. In each iteration, except for one, we perform a (g,b)-padding or a safe reversal. Therefore, there are O(n) iterations. The “heaviest” iteration is when we need to find a safe reversal in an oriented component. It could be done in O(n³). Hence, total time complexity – O(n⁴).

53 52 Algorithm Without Padding f(π): π is a fortress → f(π) = 1 else → f(π) = 0 Valid reversal: Δ(b-c+h+f) = -1 ⇔ reversal is valid Algorithm ( O(n⁵) ):

54 53 Algorithm Without Padding (2) Tested on real biological data. Helped to establish evolutionary scenarios for data which had been previously considered “too hard to analyze”. For most biological data: h(π)=0.

