Genome Rearrangement and Duplication Distance

Crystal L. Kahn 9/18/08

Genome Rearrangement
Over the course of evolution, genomes undergo large structural changes: chromosomal fissions, fusions, inversions, and transpositions.
Genome rearrangement is an area of computational biology that uses parsimony* methods to compute “distances” between pairs of genomes.
The similarity between two genomes is characterized by the number of operations required to transform one into the other.
Point mutations (SNPs) are not considered, which makes this different from edit distance.
* Maximum likelihood methods can also be used.

Genome Rearrangements
Humans and mice have similar genomes, but their genes are ordered differently: roughly 245 rearrangements and roughly 300 large synteny blocks.

History of Chromosome X
Source: Rat Consortium, Nature, 2004.
Rearrangement events: reversals, fusions, fissions, translocations.

Genome Rearrangement Models
Types of rearrangement operations that have been considered:
Reversal (inversion) [HP, STOC95], [Bader et al., WADS01]
Translocation [Hannenhalli, DAM95]
Duplication transposition [El-Mabrouk, JCSS02]
Ultimate goal: a generic genome rearrangement model that allows any type of rearrangement.
Duplications are common in cancer.
[Figure: genomes G1 and G2]

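Duplication Distance: D_X(Z,Y)
Input: strings X, Y, Z, where X is non-ambiguous (no character appears in X more than once).
Definition: a duplication operation, written Z ∘ δ_{s,t,p}(X), copies the substring X_{s,t} of X and pastes it into Z at position p.
Problem: compute D_X(Z,Y), the minimum number of duplication operations needed to transform Z into Y.
Theorem: D_X(Z,Y) can be computed by an O(n^4) algorithm, where n = |Y|.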
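
As a minimal sketch of the duplication operation, the Python function below copies positions s..t of X and pastes the copy into Z so that it starts at position p. The function name `duplicate` and the 1-based, inclusive indexing are assumptions made for illustration; the talk does not spell out either.

```python
def duplicate(z: str, x: str, s: int, t: int, p: int) -> str:
    """Paste a copy of x[s..t] into z so that the copy starts at position p.

    Positions are 1-based and inclusive (an assumed convention).
    """
    if not (1 <= s <= t <= len(x)):
        raise ValueError("need 1 <= s <= t <= len(x)")
    if not (1 <= p <= len(z) + 1):
        raise ValueError("need 1 <= p <= len(z) + 1")
    segment = x[s - 1:t]                      # the copied substring of x
    return z[:p - 1] + segment + z[p - 1:]    # insert between p-1 and p


# Hypothetical usage: build a target from the empty string in two operations.
X = "abcdefg"
z = duplicate("", X, 2, 4, 1)   # copies "bcd" into the empty string
z = duplicate(z, X, 1, 1, 2)    # pastes "a" between "b" and "cd"
```

Each call counts as one operation, so the number of calls needed to reach Y is an upper bound on D_X(Z,Y).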

Definitions
String: a sequence of characters.
Substring: a contiguous sequence of characters within a string.
Subsequence: a sequence of characters taken in order from a string, not necessarily contiguous.
Note: every substring is a subsequence, but not necessarily vice versa.
Example: for T = abcdefg, bcd is a substring of T, and ace is a subsequence of T that is not a substring.
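
The two definitions can be checked mechanically. The helper names below are illustrative only.

```python
def is_substring(a: str, t: str) -> bool:
    """True if a occurs in t as a contiguous block."""
    return a in t


def is_subsequence(a: str, t: str) -> bool:
    """True if the characters of a occur in t in order, gaps allowed."""
    remaining = iter(t)
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in a)


T = "abcdefg"
bcd_is_substring = is_substring("bcd", T)      # contiguous in T
ace_is_subsequence = is_subsequence("ace", T)  # in order, with gaps
ace_is_substring = is_substring("ace", T)      # not contiguous in T
```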

Key Insight
W.L.O.G., let Z = Ø.
[Figure: X = abcdefghijklmnopqrs with two example targets, Y = abcdjkcdeflopq and Y = abcdcdjekflopq; a pair of subsequences is marked “overlapping”.]
Observation: overlapping subsequences interfere with each other.
Lemma: a set of subsequences of Y, each of which is a substring of X, that together cover all the characters of Y can be converted into a sequence of duplicate operations iff the subsequences are mutually non-overlapping. Such a set is called a “feasible set”.
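
The transcript does not preserve a formal definition of “overlapping”. The sketch below assumes that two subsequences of Y overlap when their index positions interleave in the pattern A B A B (or B A B A), so that neither sits entirely inside a single gap of the other. The function name and the 0-based indices are illustrative.

```python
def overlaps(a: list[int], b: list[int]) -> bool:
    """Assumed definition: disjoint index sets a and b overlap if, read left
    to right, their positions alternate at least as A B A B or B A B A."""
    labelled = sorted([(i, "A") for i in a] + [(i, "B") for i in b])
    runs = []                       # labels with consecutive repeats collapsed
    for _, label in labelled:
        if not runs or runs[-1] != label:
            runs.append(label)
    return len(runs) >= 4


# First target from the figure: "cdef" sits inside one gap of "jkl".
y1 = "abcdjkcdeflopq"
nested = overlaps([4, 5, 10], [6, 7, 8, 9])

# Second target: "cdef" and "jkl" interleave.
y2 = "abcdcdjekflopq"
interleaved = overlaps([4, 5, 7, 9], [6, 8, 10])
```

Under this assumption, the first pair is nested and can be realized by pasting cdef into jkl, while the second pair cannot both be produced by single duplicate operations.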

Finding a min-cardinality feasible set for Y_{s,t}
Consider the element of the feasible set that includes index s. There are two cases:
Case 1: the element also includes index t.
Case 2: the element does not include index t.

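Let d(Y_{s,t}) = D_X(Ø, Y_{s,t}). Since one of the two cases must hold, d(Y_{s,t}) is the smaller of the Case 1 cost a_{s,t} and the Case 2 cost b_{s,t}: d(Y_{s,t}) = min{a_{s,t}, b_{s,t}}.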

Case 1: computing a_{s,t}
Assume, by induction, that d has already been computed for every shorter substring of Y_{s,t}.
The element containing s and t is a substring of X. The characters of Y_{s,t} that it leaves out form “internal substrings”, whose costs are already known.
The element can be placed in Y_{s,t} in more than one way. Example: X_{s,t} = abcd, Y_{s,t} = abcbccabcd.
The number of “placements” is possibly exponential, but a_{s,t} can be computed with a second recurrence in O(n^2) time.
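
To get a feel for how many placements there can be, the sketch below counts the ways a pattern can be embedded in a text as a subsequence that uses both the first and the last position of the text. This is a standard subsequence-counting dynamic program, not the second recurrence from the talk; the function name is hypothetical.

```python
def count_placements(pattern: str, text: str) -> int:
    """Count embeddings of pattern in text (as a subsequence) that use
    both text[0] and text[-1]."""
    if not pattern or len(text) < len(pattern):
        return 0
    if pattern[0] != text[0] or pattern[-1] != text[-1]:
        return 0
    if len(pattern) == 1:
        return 1 if len(text) == 1 else 0
    inner_p, inner_t = pattern[1:-1], text[1:-1]
    # ways[j] = number of ways to match inner_p[:j] in the text seen so far
    ways = [1] + [0] * len(inner_p)
    for ch in inner_t:
        for j in range(len(inner_p), 0, -1):   # right to left: use ch once
            if inner_p[j - 1] == ch:
                ways[j] += ways[j - 1]
    return ways[-1]


placements = count_placements("abcd", "abcbccabcd")
```

For the example strings this counts 8 placements: a and d are fixed at the two ends, and b and c can be chosen in 4 + 3 + 1 ways.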

Case 2: computing b_{s,t}
Assume, by induction, that d has already been computed for every shorter substring of Y_{s,t}.
b_{s,t} is then computed in O(n) time.

Running Time
Let n = |Y|. For each substring Y_{s,t}:
Computing a_{s,t} takes O(n^2) time.
Computing b_{s,t} takes O(n) time.
There are O(n^2) substrings of Y, so the total running time is O(n^4).
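
The recurrences themselves did not survive in this transcript. The sketch below is therefore not the talk's algorithm. It is a simplified memoized recursion built on the same idea: follow the element of the feasible set that contains the first index, and solve the gaps it leaves independently. The names `duplication_distance`, `d` and `a` are hypothetical, Z is taken to be empty, and no attempt is made to match the O(n^4) analysis.

```python
import math
from functools import lru_cache


def duplication_distance(x: str, y: str) -> float:
    """Minimum number of substrings of x (x non-ambiguous) needed to build y
    from the empty string, assuming elements must be mutually non-overlapping."""
    if len(set(x)) != len(x):
        raise ValueError("x must be non-ambiguous")
    pos = {ch: i for i, ch in enumerate(x)}
    if any(ch not in pos for ch in y):
        return math.inf               # y uses a character that x lacks

    @lru_cache(maxsize=None)
    def d(s: int, t: int) -> float:
        """Cost of y[s..t] (inclusive, 0-based); an empty range costs 0."""
        if s > t:
            return 0
        # k is the last index of the element that contains index s.
        return min(a(s, k) + d(k + 1, t) for k in range(s, t + 1))

    @lru_cache(maxsize=None)
    def a(s: int, t: int) -> float:
        """Cost of y[s..t] when one element starts at s and ends at t."""
        if s == t:
            return 1                  # the element itself: one operation
        best = math.inf
        for i in range(s + 1, t + 1):
            # y[i] may follow y[s] in the element only if it follows it in x.
            if pos[y[i]] == pos[y[s]] + 1:
                best = min(best, d(s + 1, i - 1) + a(i, t))
        return best

    return d(0, len(y) - 1)


X = "abcdefghijklmnopqrs"
nested_cost = duplication_distance(X, "abcdjkcdeflopq")
interleaved_cost = duplication_distance(X, "abcdcdjekflopq")
```

On the two targets from the Key Insight figure this returns 4 and 6: the nested target is covered by abcd, jkl, cdef and opq, while the interleaved one forces extra single-character elements. The recursion depth grows with |Y|, so this form is only suitable for short strings.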

Duplication Transposition vs. Duplication
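Duplication transposition: “paste” into the same string. In G ∘ δ_{s,t,p}(G), the copy of positions s..t of G is inserted between positions p-1 and p of G; in the case shown, s < t < p.
Duplication: “paste” into another string. The copy of positions s..t of one string is inserted between positions p-1 and p of the other.
[Figure: both operations on a string G of length n]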

Duplication can be more complicated…
[Figure: G ∘ δ_{s,t,p}(G) with s < p < t, where the paste position p falls inside the copied segment]

Duplication Transposition Distance in Semi-Ambiguous Genomes
[El-Mabrouk, JCSS02] incorrectly computes duplication transposition distance.
The paper implies that, given X non-ambiguous and Y semi-ambiguous, DT(X,Y) = number of maximal repeated segments of Y.
Counterexample: X = abcdefg, Y = abdecdbcefg.
Y0 = abcdefg
Y1 = abcdbcefg
Y2 = abdcdbcefg
Y3 = abdecdbcefg
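
The sequence Y0, Y1, Y2, Y3 can be replayed as three duplication transpositions. The (s, t, p) triples below are inferred from the strings and are not given in the talk; the function name and the 1-based, inclusive indexing are likewise assumptions.

```python
def duplication_transposition(g: str, s: int, t: int, p: int) -> str:
    """Copy g[s..t] and paste the copy back into g so it starts at position p.

    Positions are 1-based and inclusive (an assumed convention).
    """
    if not (1 <= s <= t <= len(g)) or not (1 <= p <= len(g) + 1):
        raise ValueError("indices out of range")
    return g[:p - 1] + g[s - 1:t] + g[p - 1:]


y0 = "abcdefg"
y1 = duplication_transposition(y0, 2, 3, 5)   # copy "bc", paste before "e"
y2 = duplication_transposition(y1, 4, 4, 3)   # copy "d", paste before "c"
y3 = duplication_transposition(y2, 8, 8, 4)   # copy "e", paste before "c"
```

This shows that Y is reachable from X in three operations.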

A Lower Bound for Duplication Transposition Distance
Lemma: If Y has at most 2 copies of every character, X is non-ambiguous, and X is a subsequence of Y, then D_X(X,Y) ≤ DT(X,Y).
There is still no known algorithm for duplication transposition distance.

Conclusions
Duplication distance is a simple model for genome rearrangement and can be computed efficiently.
In a special case, it provides a lower bound on duplication transposition distance.
Thank you! Questions?

New Model for Cancer Mutation: Amplisomes
It can be shown that the minimum amplisome distance can be reframed as min [D_G(A,Ø) + D_A(T,A)], where the minimum is taken over all possible choices of A.
Duplication distance is a subproblem.

Tumor Amplisomes (Maurer et al., 1987; Wahl, 1989…)
Other terms: episome, amplicon, double-minute.

D_X(X,Y) ≤ DT(X,Y) when Y is semi-ambiguous
Why is semi-ambiguity necessary?
Semi-ambiguity ensures that all copied substrings are substrings of the original X (not of some intermediate string), so for every DT operation there exists a duplicate operation that produces the same result.
Example: X = A, Y = AAAAAAAA. DT(X,Y) = 3, D_X(X,Y) = 7.
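
The gap in this example comes from doubling: a duplication transposition may copy from the intermediate string, while a duplication may only copy from X. The sketch below counts both, using plain string operations as stand-ins for the two operation types (an illustrative simplification).

```python
# Duplication transposition: copy the whole current string and paste it at
# the end, so the length doubles with every operation.
g = "A"
dt_steps = 0
while len(g) < 8:
    g = g + g
    dt_steps += 1

# Duplication from X = "A": each operation can add only one character,
# because the only substring of X is "A".
x = "A"
y = x                    # start from Z = X, as in D_X(X, Y)
dx_steps = 0
while len(y) < 8:
    y = y + x
    dx_steps += 1
```

Both loops end with AAAAAAAA, after 3 and 7 operations respectively, matching the values on the slide.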
