CSCI2950-C Lecture 6 Genome Rearrangements and Duplications


1 CSCI2950-C Lecture 6 Genome Rearrangements and Duplications

2 Outline
Recap: Sorting By Reversals & Breakpoint Graphs
Multichromosomal Rearrangements
Duplications: Segmental and Whole-Genome
Probabilistic Genome Rearrangements

3 Signed Permutations
But genes (and DNA) have directions, so we should consider signed permutations.
[Figure: a signed permutation π drawn on a 5’ to 3’ strand]

4 Sorting by reversals: 5 steps

5 Sorting by reversals: 4 steps

6 Sorting by reversals: 4 steps
What is the reversal distance for this permutation? Can it be sorted in 3 steps?
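The question above can be checked by brute force for short permutations. The following is a minimal sketch, not the method used in the lecture: it runs a breadth-first search over all signed reversals, so it is only practical for small n. The function names `apply_reversal` and `reversal_distance` are illustrative choices.

```python
from collections import deque

def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of every element in it."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]

def reversal_distance(perm):
    """Exact reversal distance to the identity by breadth-first search.

    The state space has n! * 2^n signed permutations, so this is a
    sanity-check tool for small n only.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i, len(cur)):
                nxt = apply_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    raise ValueError("identity not reachable")  # cannot happen for signed permutations

print(reversal_distance((2, 1)))    # the unsigned swap needs three signed reversals
print(reversal_distance((-2, -1)))  # a single reversal of the whole block suffices
```

Because the search is exhaustive, it answers "can it be sorted in k steps?" directly, at exponential cost.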

7 Breakpoint graph 1-dimensional construction
Transform π = < 2, -4, -3, 5, -8, -7, -6, 1 > into γ = < 1, 2, 3, 4, 5, 6, 7, 8 > by reversals.
Vertices: i → i^a i^b, -i → i^b i^a, and 0^b, 9^a.
Edges: match the ends of consecutive blocks in π, γ.
Superimpose matchings.

8 Breakpoint graph Breakpoints
Each reversal goes between 2 breakpoints, so d ≥ # breakpoints / 2 = 6/2 = 3.
Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 – c(π) + h(π) + f(π), where c(π) = # cycles; h, f are rather complicated, but can be computed from the graph in polynomial time.
Here, d = 8 + 1 – 5 + 0 + 0 = 4.
Breakpoints are not independent. The breakpoint graph shows dependencies between the breakpoints.
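The counts on this slide can be reproduced with a short script. This is a sketch under one common encoding (an assumption, not necessarily the lecture's): element +i becomes the vertex pair (2i-1, 2i), element -i becomes (2i, 2i-1), black edges join the ends of consecutive blocks in π, and gray edges join 2i and 2i+1. Hurdles and fortresses are not computed, so the script only gives n + 1 – c(π), a lower bound on d.

```python
def count_breakpoints(perm):
    """Number of adjacent pairs in 0, perm, n+1 that are not consecutive."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b - a != 1)

def count_cycles(perm):
    """Number of alternating black/gray cycles in the breakpoint graph."""
    n = len(perm)
    seq = [0]
    for x in perm:
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)
    # Black edges: ends of consecutive blocks in the permutation.
    black = {}
    for k in range(0, len(seq), 2):
        u, v = seq[k], seq[k + 1]
        black[u], black[v] = v, u
    # Gray edges: ends of consecutive blocks in the identity, i.e. 2i -- 2i+1.
    def gray(v):
        return v + 1 if v % 2 == 0 else v - 1
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray(w)
    return cycles

pi = (2, -4, -3, 5, -8, -7, -6, 1)
print(count_breakpoints(pi))           # 6 breakpoints
print(count_cycles(pi))                # 5 cycles
print(len(pi) + 1 - count_cycles(pi))  # n + 1 - c = 4
```

For this permutation the cycle bound already equals the distance stated on the slide, which is consistent with h = f = 0.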

9 Oriented and Unoriented Cycles
Oriented Cycles: [Figure: reversal ρ acting on the black edges (x, x+1) and (y, y+1), producing (x, y) and (x+1, y+1)] A proper reversal acts on black edges: c(ρπ) – c(π) = 1.
Unoriented Cycles: No proper reversal acts on an unoriented cycle. These are “impediments” in sorting by reversals.

10 Safe Reversals
Let Δc = c(ρπ) – c(π) and Δh = h(ρπ) – h(π). A reversal ρ is safe if Δc – Δh = 1.
Oriented Cycles: a proper reversal acts on black edges: c(ρπ) – c(π) = 1.
Unoriented Cycles: π = 2 1 3 has c(π) = 2, h(π) = 1; ρπ = -1 -2 3 has c(ρπ) = 2, h(ρπ) = 0, so Δc – Δh = 0 – (–1) = 1.
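To see the difference between oriented and unoriented cycles concretely, one can tabulate Δc for every reversal of a small permutation. The sketch below assumes the usual 2n+2 vertex encoding of the breakpoint graph and only tracks cycles; it does not compute hurdles, so it finds proper reversals (Δc = 1) rather than safe ones. The helper names are hypothetical.

```python
def count_cycles(perm):
    """Alternating black/gray cycles; +i -> (2i-1, 2i), -i -> (2i, 2i-1)."""
    seq = [0]
    for x in perm:
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * len(perm) + 1)
    black = {}
    for k in range(0, len(seq), 2):
        black[seq[k]], black[seq[k + 1]] = seq[k + 1], seq[k]
    seen, cycles = set(), 0
    for start in black:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = w + 1 if w % 2 == 0 else w - 1  # follow the gray edge
    return cycles

def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip signs."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]

def proper_reversals(perm):
    """All (i, j) whose reversal increases the cycle count by exactly one."""
    base = count_cycles(perm)
    n = len(perm)
    return [(i, j) for i in range(n) for j in range(i, n)
            if count_cycles(apply_reversal(perm, i, j)) - base == 1]

print(proper_reversals((2, 1, 3)))    # unoriented cycle: no proper reversal exists
print(proper_reversals((-1, -2, 3)))  # oriented cycle: at least one proper reversal
```

For π = 2 1 3 the list is empty, matching the claim that an unoriented cycle admits no proper reversal; the reversal to -1 -2 3 leaves c unchanged but makes proper reversals available.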

11 Algorithm Outline
Reversal_Sort(π)
  while π is not sorted
    if π has a “long cycle”
      select ρ [a padding of π]
    else if π has an oriented component
      select a safe reversal ρ in the component
    else if π has a hurdle
      select ρ [hurdle merging or cutting]
    else if π is a fortress
      select ρ [superhurdle merging]
    π ← π · ρ
  endwhile

12 Breakpoint graph ⇒ rearrangement scenario

13 Cell Division and Mutation
Somatic mutations that occur during cell division are a major contributor to the development of cancer.
Types of change: single nucleotide, copy number, structural.
We will focus on structural changes and later on copy number, which is not to say that single nucleotide changes are less important.
What is the effect of structural changes?

14 Types of Rearrangements
Reversal, Translocation, Fusion, Fission

15 Multichromosomal rearrangements Translocation
( ) (–6 – –2) ( –2) (–6 –1 4 10) By concatenating chromosomes, this may be mimicked by a single reversal:

16 Multichromosomal rearrangements Translocation
Most concatenates don’t work! The first reversal just flipped a whole chromosome to position it correctly. This is an artifact of our genome representation; it is not a biological event. We want to avoid such artifacts.

17 Multichromosomal rearrangements Translocation
Most concatenates don’t work! These concatenates required 3 reversals instead of 1! The second reversal just flipped a whole chromosome to position it correctly; this is an artifact of our genome representation, not a biological event. We want to avoid such extra steps and artifacts.

18 Multichromosomal rearrangements Fission and fusion
( ) ( ) (1 2) (3 4 5) By concatenating chromosomes, this may be mimicked by a single reversal: Evolution: Human chromosome 2 is the fusion of two chromosomes from other hominoids (chimpanzees, orangutans, gorillas).

19 Multichromosomal rearrangements Fission and fusion
( ) ( ) (1 2) (3 4 5) By concatenating chromosomes, this may be mimicked by a single reversal: Flipping the whole chromosome (3 4 5) gives a different representation (–5 –4 –3) of the same chromosome. Chromosome ends ( ) ( ) must be tracked too.

20 Multichromosomal rearrangements Concatenates
Concatenate together all the chromosomes of a genome into a single sequence. These concatenates represent the same genome: ( ) (8 3) (–6 – –2) (8 3) (2 –7 –11 1 6) ( ) Permuting the order of chromosomes and flipping chromosomes do not count as biological events. Chromosome ends ( ) ( ) ( ) are included and are distinguishable.
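As an illustration of how a translocation can be mimicked by one reversal on a well-chosen concatenate, here is a small sketch. The two chromosomes and cut points are made-up examples (the numbers on the original slides did not survive), and chromosome ends are tracked with a plain cut index rather than with explicit cap elements, which is a simplification.

```python
def flip(chrom):
    """The same chromosome read from the other end."""
    return tuple(-g for g in reversed(chrom))

def canonical(genome):
    """Genome identity up to chromosome order and chromosome flips."""
    return frozenset(min(c, flip(c)) for c in map(tuple, genome))

def translocate(a, b, i, j):
    """Exchange the suffix of a (from i) with the suffix of b (from j)."""
    return a[:i] + b[j:], b[:j] + a[i:]

def reverse_segment(seq, i, k):
    """Signed reversal of seq[i:k] (k exclusive)."""
    return seq[:i] + flip(seq[i:k]) + seq[k:]

# Hypothetical example genome with two chromosomes.
a, b = (1, 2, 3), (4, 5, 6)
i, j = 1, 2

direct = translocate(a, b, i, j)        # ((1, 6), (4, 5, 2, 3))

# Choose the concatenate a + flip(b); other concatenates need extra flips.
concat = a + flip(b)                    # (1, 2, 3, -6, -5, -4)
k = len(a) + (len(b) - j)               # reversal spans the chromosome boundary
mimic = reverse_segment(concat, i, k)   # (1, 6, -3, -2, -5, -4)
cut = i + (len(b) - j)                  # new position of the chromosome end
via_reversal = (mimic[:cut], mimic[cut:])

print(canonical(direct) == canonical(via_reversal))  # True
```

With the concatenate a + b instead, the same translocation would need whole-chromosome flips first, which is the artifact the slides warn about.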

21 Multichromosomal rearrangements Results
Theorem (Tesler 2002): Let d = minimum total number of reversals, translocations, fissions, and fusions among all rearrangement scenarios between two genomes. By carefully choosing concatenates of the genomes, we can usually mimic a most parsimonious scenario by a d-step reversal scenario on the concatenates with no chromosome flips or chromosome permutations. There are pathological cases requiring a (d + 1)-step reversal scenario with one chromosome flip. Total time O((n + N)^2).

22 Multichromosomal rearrangements Results
n = # of blocks, N = # of chromosomes Distance is the minimum number of reversals, fissions, fusions, translocations. Solution method: use suitable concatenates to obtain an equivalent “sorting by reversals” problem. The H-P algorithm has a nonconstructive step that required a lot of work to fix. It pertains to choosing concatenates to avoid flips and chromosome permutations. (Tesler 2002) does this constructively.

23 GRIMM Web Server Real genome architectures are represented by signed permutations Efficient algorithms to sort signed permutations have been developed GRIMM web server computes the reversal distances between signed permutations:

24 GRIMM Web Server http://www-cse.ucsd.edu/groups/bioinformatics/GRIMM
22 dense pages to fix gaps

25 Other Types of Rearrangements
Transposition, Duplication
Duplications are very frequent in cancer genomes.

26 Duplications What problem to solve?
Given G ∈ {1, …, n}^N (“permutation with duplicates”) and the identity i = (1 2 … n).
Find reversals ρ1, ρ2, …, ρt, duplications δ1, …, δs, and a permutation π such that π(ρ1, …, ρt, δ1, …, δs) i = G and s + t is minimal ???
HARD!!! (NP-hard?)

27 Duplications (2) What problem to solve?
Given: G ∈ {1, …, n}^N (“permutation with duplicates”), H = πG for a permutation π.
Find: Reversals ρ1, ρ2, …, ρt such that ρ1 … ρt G = H and t is minimal.
Signed reversal distance with duplicates is NP-hard (Chen, et al. 2005).
If there is a 1-1 mapping of repeated elements (orthologs) in G to H, then the problem reduces to reversal distance.

28 Duplications (3) What problem to solve?
Given: G ∈ {1, …, n}^N (permutation with duplicates)
Find: Permutation π, reversals ρ1, ρ2, …, ρs, and duplications δ1, …, δt such that ρ1 … ρs δ1 … δt π = G and t is minimal.
Solution when there are at most two duplicates per gene and a restricted class of duplications: El-Mabrouk and Sankoff (2002).

29 Whole Genome Duplication
Genome is doubled – extra copy of each element. Subsequently undergoes reversals.
Genome Halving Problem. Given a duplicated genome P, recover the ancestral pre-duplicated genome R minimizing the reversal distance from the perfect duplicated genome R ⊕ R to the duplicated genome P. (El-Mabrouk and Sankoff)

30 Whole Genome Duplication
Genome is doubled – extra copy of each element. Subsequently undergoes reversals. If copies of each element labeled uniquely, then problem reduces to reversal distance problem.

31 Reversal Distance and Duplications
Let d(G, H) = reversal distance between G and H.
The problem of computing d(P, R ⊕ R) is unsolved.
min_R d(P, R ⊕ R) is solvable in polynomial time.

32 Breakpoint Graph
π = 2 -4 -3 5 -8 -7 -6 1 9, with vertices 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t
γ = 1 2 3 4 5 6 7 8 9, with vertices 0h 1t 1h 2t 2h 3t 3h 4t 4h 5t 5h 6t 6h 7t 7h 8t 8h 9t
[Figure: breakpoint graph G(π, γ) on π = 2 -4 -3 5 -8 -7 -6 1 9, vertices 0b 2a 2b 4b 4a 3b 3a 5a 5b 8b 8a 7b 7a 6b 6a 1a 1b 9a]

33 Genome Halving: Exhaustive
Doubled genome with 2n genes (n genes, two copies each).
Compute the reversal distance for all 2^n labelings of the gene copies.
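The size of the exhaustive search is easy to see by enumerating the labelings. The sketch below is illustrative only: it assumes a doubled genome written as a string of signed single-letter genes (ASCII `+`/`-`), each letter appearing exactly twice, and it yields every way to name the two copies of each gene 1 and 2. Computing a reversal distance for each labeling is left out.

```python
import re
from itertools import product

def parse_genome(text):
    """Parse a string such as '-a-b+g' into [('-', 'a'), ('-', 'b'), ('+', 'g')]."""
    return re.findall(r"([+-])([a-z])", text)

def labelings(genome):
    """Yield all 2^n assignments of copy labels 1/2 to the two copies of each gene."""
    positions = {}
    for idx, (_, name) in enumerate(genome):
        positions.setdefault(name, []).append(idx)
    names = sorted(positions)
    for choice in product((0, 1), repeat=len(names)):
        labeled = [None] * len(genome)
        for name, swap in zip(names, choice):
            first, second = positions[name]
            c1, c2 = (2, 1) if swap else (1, 2)
            labeled[first] = (genome[first][0], name, c1)
            labeled[second] = (genome[second][0], name, c2)
        yield tuple(labeled)

# Example doubled genome with n = 7 genes, hence 2^7 labelings.
P = parse_genome("-a-b+g+d+f+g+e-a+c-f-c-b-d-e")
print(len(P))                   # 14 gene copies
print(sum(1 for _ in labelings(P)))  # 128 labelings
```

The count doubles with every additional gene, which is why the exhaustive approach does not scale.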

34 Genome Halving
Weak Genome Halving Problem. For a given duplicated genome P, find a perfect duplicated genome R ⊕ R and a labeling of gene copies that maximizes the number of black-gray cycles c(G) in the breakpoint graph G(P, R ⊕ R) of the labeled genomes P and R ⊕ R. (Alekseyev and Pevzner 2006)
Theorem (Hannenhalli-Pevzner 1995): d(π) = n + 1 – c(π) + h(π) + f, where c = # cycles, h = # hurdles, and f = 1 if π is a fortress.

35 Contracted Breakpoint Graph
Breakpoint graph construction
π = 2 -4 -3 5 -8 -7 -6 1 9, with vertices 0h 2t 2h 4h 4t 3h 3t 5t 5h 8h 8t 7h 7t 6h 6t 1t 1h 9t
γ = 1 2 3 4 5 6 7 8 9, with vertices 0h 1t 1h 2t 2h 3t 3h 4t 4h 5t 5h 6t 6h 7t 7h 8t 8h 9t
[Figure: breakpoint graph G(π, γ) on the vertices of π]
Obverse edges (xt, xh) were implicit. π is a black-obverse alternating path; γ is a gray-obverse alternating path.

36 Contracted Breakpoint Graph
With duplicates, pair of vertices with same label. Contract these identical vertices

37 Contracted Breakpoint Graph
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e
[Figure: contracted breakpoint graph G’(P, R ⊕ R)]
Each gray edge is a pair of parallel edges.

38 Cycle Decompositions
In H-P theory, c(π) = # of cycles in the maximal cycle decomposition was the key parameter.
Strategy: analyze cycle decompositions of the contracted breakpoint graph.

39 Cycle Decompositions
Genomes P and Q.
G(P,Q): breakpoint graph for some labeling, with a black-gray cycle decomposition.
G’(P,Q): contracted breakpoint graph, with the induced black-gray cycle decomposition.
From an induced decomposition back to a labeling: ???
Labeling Problem. Given a black-gray cycle decomposition of the contracted breakpoint graph G′(P,Q) of duplicated genomes P and Q, find a labeling of P and Q that induces this cycle decomposition. This does not always have a solution.

40 Maximal black-gray cycle decomposition
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e
[Figure panels: contracted breakpoint graph G’(P, R ⊕ R); BG graph corresponding to G’; maximal black-gray cycle decomposition of G’]

41 Cycle Decomposition
P = −a−b+g+d+f+g+e−a+c−f−c−b−d−e
R = −a−b−d−g+f−c−e
[Figure panels: P as black-obverse cycle; (c) maximal black-gray cycle decomposition C of G’; (e) superimposing the two graphs gives a breakpoint graph inducing the cycle decomposition in (c)]

42 Genome Halving Algorithm: Outline
Input: Doubled genome P
Construct the BO (black-obverse) graph for P by gluing identical edges.
Introduce gray edges “optimally” to create the BOG (black-obverse-gray) graph G’ with a single gray-obverse cycle (!!!)
R = gray-obverse cycle in G’
Find a maximal black-gray cycle decomposition of G’ and a labeling of Q = R ⊕ R.

43 Alternative Rearrangement Metrics
Thus far, distance has been posed as the minimum number of rearrangements transforming one permutation to the identity: the parsimony assumption in evolution.
Score S(ρ) for a rearrangement ρ.
Parsimony: S(ρ) = 1 for all ρ, so S(ρ1, ρ2, …, ρt) = Σ S(ρi) = t.
Length-weighted reversals: S(ρ) = l(ρ)^α, where l(ρ) = length of the reversed subsequence (Bender, et al. 2008).
Many of the resulting optimization problems are NP-hard.
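The two scoring schemes can be compared with a few lines of code. This sketch assumes a scenario is summarized by the lengths of its reversed segments; the function name `scenario_cost` is an illustrative choice.

```python
def scenario_cost(reversal_lengths, alpha):
    """Total score of a scenario under S(rho) = l(rho) ** alpha.

    alpha = 0 recovers parsimony (every reversal costs 1);
    alpha = 1 charges each reversal its length.
    """
    return sum(length ** alpha for length in reversal_lengths)

lengths = [2, 3, 1]                # hypothetical scenario with three reversals
print(scenario_cost(lengths, 0))   # 3: parsimony counts the steps
print(scenario_cost(lengths, 1))   # 6: linear length weighting
print(scenario_cost(lengths, 2))   # 14: long reversals are penalized more
```

Under length weighting, a scenario with more but shorter reversals can score better than the most parsimonious one.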

44 Probabilistic Genome Rearrangements
Pr[rearrangement ρ] = p. Compute Pr[rearrangement sequence ρ1…ρn].
Inversions occur according to a Poisson process (York, et al. 2002).
L inversions: Pr[L | λ] = e^(−λ) λ^L / L!
n(n+1)/2 possible inversions. Each occurs with equal probability.
Ω = {inversion sequences}
For X = ρ1 … ρ_Lx ∈ Ω, Pr[X | λ] = (e^(−λ) λ^Lx / Lx!) (n(n+1)/2)^(−Lx)
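The probability of a particular inversion sequence depends only on its length Lx, the rate λ, and n. A minimal sketch that evaluates it in log space (to avoid underflow for long sequences) could look as follows; the function name and argument order are assumptions.

```python
import math

def log_prob_inversion_sequence(num_inversions, lam, n):
    """log Pr[X | lambda] for a sequence X of num_inversions inversions.

    Poisson(lambda) probability of the length, times a uniform choice
    among n(n+1)/2 possible inversions at every step.
    """
    log_poisson = -lam + num_inversions * math.log(lam) - math.lgamma(num_inversions + 1)
    log_uniform = -num_inversions * math.log(n * (n + 1) / 2)
    return log_poisson + log_uniform

# Two inversions on n = 4 markers with rate 3: e^-3 * 3^2 / 2! * 10^-2
print(math.exp(log_prob_inversion_sequence(2, 3.0, 4)))
```

`math.lgamma(L + 1)` equals log(L!), which keeps the computation stable when L is large.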

45 Probabilistic Genome Rearrangements
Pr[X, λ | π] = Pr[X, λ, π] / Pr[π]
= Pr[π | X, λ] Pr[X | λ] Pr[λ] / Pr[π]
= (1) ((e^(−λ) λ^Lx / Lx!) (n(n+1)/2)^(−Lx)) (1/λmax) / Pr[π]
Problem: How to evaluate this distribution?
Solution: Iteratively sample from Ω × (0, λmax]: (X0, λ0) → (X1, λ1) → (X2, λ2) → …
After a long time, reach the stationary distribution. Markov chain Monte Carlo.

46 MCMC Genome Rearrangements
How to update? (Xi, λi) → (Xi+1, λi+1)
Alternate updates of λ and X (Metropolis-Hastings algorithm): (Xi, λi) → (Xi, λi+1) → (Xi+1, λi+1)
Pr[λ | X, π] ∝ Pr[X | λ] Pr[λ] ∝ e^(−λ) λ^Lx Pr[λ]
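The λ half of the alternating update can be sketched as a single Metropolis-Hastings step. This is an illustration under stated assumptions, not the sampler of York et al.: the prior on λ is taken to be uniform on (0, λmax], the proposal is a symmetric Gaussian random walk, and the step size is an arbitrary tuning parameter.

```python
import math
import random

def log_target(lam, num_inversions, lam_max):
    """Unnormalized log Pr[lambda | X]: e^-lambda * lambda^Lx on (0, lam_max]."""
    if not 0.0 < lam <= lam_max:
        return float("-inf")
    return -lam + num_inversions * math.log(lam)

def mh_update_lambda(lam, num_inversions, lam_max, step=1.0, rng=random):
    """One Metropolis-Hastings step for lambda, holding the path X fixed."""
    proposal = lam + rng.gauss(0.0, step)
    log_ratio = (log_target(proposal, num_inversions, lam_max)
                 - log_target(lam, num_inversions, lam_max))
    # Symmetric proposal, so the acceptance probability is min(1, ratio).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return lam

rng = random.Random(0)
lam = 1.0
trace = []
for _ in range(5000):
    lam = mh_update_lambda(lam, num_inversions=5, lam_max=50.0, step=2.0, rng=rng)
    trace.append(lam)
print(sum(trace) / len(trace))  # sample mean of lambda given Lx = 5
```

With a uniform prior the conditional is a truncated Gamma(Lx + 1, 1) distribution, so λ could also be drawn directly; the random-walk step is shown because it follows the Metropolis-Hastings pattern named on the slide.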

47 MCMC Genome Rearrangements: Updating X
(Xi, λi+1) → (Xi+1, λi+1)
Choose a section to replace with probability q(l, j), where l = length and π_j = starting permutation.
Generate a new subpath from π_α to π_β.
Use the breakpoint graph G(π_α, π_β) to choose an inversion sequence where Δ(c) = 1 with high probability.

48 MCMC Genome Rearrangements

49 MCMC Genome Rearrangements
Can we use this approach for other genome rearrangement operations? Translocations, duplications, etc.

50 References
G. Tesler: “Efficient algorithms for multichromosomal genome rearrangements.” J. Comput. Syst. Sci. 65(3) (2002)
Xin Chen, Jie Zheng, Zheng Fu, Peng Nan, Yang Zhong, Stefano Lonardi, Tao Jiang: “Assignment of Orthologous Genes via Genome Rearrangement.” IEEE/ACM Trans. Comput. Biology Bioinform. 2(4) (2005)
N. El-Mabrouk: “Reconstructing an ancestral genome using minimum segments duplications and reversals.” J. Comput. Syst. Sci. 65(3) (2002)
N. El-Mabrouk, David Bryant, David Sankoff: “Reconstructing the pre-doubling genome.” RECOMB 1999
M. Alekseyev & P. Pevzner: “Colored de Bruijn Graphs and the Genome Halving Problem.” IEEE/ACM Trans. Comput. Biology Bioinform. 4(1) (2007)
Bender, et al.: “Improved bounds on sorting by length-weighted reversals.” J. of Computer and System Sciences 74 (2008) 744–774
York, et al.: “Bayesian Estimation of the Number of Inversions in the History of Two Chromosomes.” J. of Computational Biol. (2002)

