Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li.

1 Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li

2 Genome Comparison of Bacteria [Kim et al., Nat. Biotechnol., 2004]

3 Gene Order & Function in Bacteria Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996] Some genes cluster together even in unrelated species. Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]

4 Gene Order & Function in Bacteria

5

6 Formalization of Gene Clusters Genomes: permutations π_1, π_2, …, π_k. Genes: numbers 1, …, n. Example: π_1 = (1 2 3 4 5 6 7 8), π_2 = (8 7 6 4 5 2 1 3), π_3 = (3 1 2 5 8 7 6 4), π_4 = (6 7 4 2 1 3 8 5).

7 Intervals For a permutation π of [n] = {1, 2, …, n}, an interval (= gene cluster) is a set {π(i), π(i+1), …, π(j)} for 1 ≤ i < j ≤ n. Any permutation of [n] has n(n-1)/2 intervals. Example: π = (1 3 5 4 2 6 7).
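
The interval definition on this slide can be checked by direct enumeration. The following is a minimal sketch, assuming permutations are stored as Python lists; the function name `intervals` is chosen here for illustration and does not come from the slides.

```python
from itertools import combinations

def intervals(perm):
    """Return all intervals of perm as frozensets.

    An interval is the set of elements at positions i..j with i < j,
    so every interval has at least two elements.
    """
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i, j in combinations(range(n), 2)]

pi = [1, 3, 5, 4, 2, 6, 7]          # example permutation from the slide
ivs = intervals(pi)
n = len(pi)
print(len(ivs), n * (n - 1) // 2)   # both counts agree with n(n-1)/2
```

Each pair of positions i < j yields exactly one interval, which is where the count n(n-1)/2 comes from.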

8 Common Intervals For a family F = (π_0, π_1, …, π_{k-1}) of permutations, a subset S ⊆ [n] is a common interval of F (= conserved gene cluster) iff S is an interval in all π_i. We say S ∈ C_F. Example: π_0 = (1 3 5 4 2 6 7), π_1 = (2 4 5 1 3 7 6).
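
To make the definition concrete, the common intervals of the two example permutations can be computed by brute force. This is only an illustrative sketch: it intersects the interval sets of all permutations, which takes quadratic time per permutation and is not an optimized method. The helper names are assumptions made for this example.

```python
def intervals(perm):
    """All intervals of perm (element sets of positions i..j, i < j)."""
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(family):
    """Sets that are intervals in every permutation of the family."""
    result = intervals(family[0])
    for perm in family[1:]:
        result &= intervals(perm)
    return result

pi0 = [1, 3, 5, 4, 2, 6, 7]
pi1 = [2, 4, 5, 1, 3, 7, 6]
C = common_intervals([pi0, pi1])
for c in sorted(C, key=lambda s: (len(s), sorted(s))):
    print(sorted(c))
```

For this pair the set {1, 3, 5} is a common interval (it is contiguous in both permutations), whereas {3, 5} is an interval of π_0 only.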

11 Lemma Let F = (π_0, π_1, …, π_{k-1}) and c, d ∈ C_F. If c ∩ d ≠ ∅ then c ∪ d ∈ C_F. Example: π_0 = (1 3 5 4 2 6 7), π_1 = (2 4 5 1 3 7 6).

12 Lemma Let F = (π_0, π_1, …, π_{k-1}) and c, d ∈ C_F. If c ∩ d ≠ ∅ then c ∪ d ∈ C_F. We call c ∪ d reducible. Example: π_0 = (1 3 5 4 2 6 7), π_1 = (2 4 5 1 3 7 6). [Figure labels: reducible interval, irreducible.]
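
The lemma and the notion of reducibility can be tried out on the example pair. The sketch below rests on one assumption that the slide does not spell out: a common interval c counts as reducible only if it is the union of two intersecting common intervals that both differ from c (otherwise every interval would trivially be reducible). All function names are illustrative.

```python
def intervals(perm):
    n = len(perm)
    return {frozenset(perm[i:j + 1]) for i in range(n) for j in range(i + 1, n)}

def common_intervals(family):
    result = intervals(family[0])
    for perm in family[1:]:
        result &= intervals(perm)
    return result

def is_reducible(c, C):
    """Assumed reading: c = a | b for intersecting a, b in C, both different from c."""
    return any((a | b) == c and (a & b)
               for a in C for b in C
               if a != c and b != c)

C = common_intervals([[1, 3, 5, 4, 2, 6, 7], [2, 4, 5, 1, 3, 7, 6]])

# Lemma: the union of two intersecting common intervals is a common interval.
lemma_holds = all((c | d) in C for c in C for d in C if c & d)

irreducible = {c for c in C if not is_reducible(c, C)}
print(lemma_holds, sorted(sorted(c) for c in irreducible))
```

Here {2, 4, 5} is reducible because it is the union of {2, 4} and {4, 5}, which share the element 4, while {1, 3, 5} cannot be assembled from smaller common intervals.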

13 Analysis We have K ≤ n(n-1)/2 common intervals, and I < n irreducible intervals. Find all K common intervals of k ≥ 2 permutations of [n]: O(kn + K) time & O(n) space

14 Common Intervals of Trees Let T, T_1, …, T_k be trees with vertex set [n]. Definition: S ⊆ [n] is an interval of T iff T[S] is connected and |S| > 1. S ⊆ [n] is a common interval of T_1, …, T_k iff S is an interval in all trees. Tree intervals generalize intervals of permutations.
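
The claim that tree intervals generalize permutation intervals can be illustrated by turning a permutation into a path. The following sketch assumes trees are given as edge lists on vertices 1..n and enumerates all subsets by brute force, so it is only suitable for small n; the function names are hypothetical.

```python
from itertools import combinations

def is_tree_interval(edges, S):
    """True iff |S| > 1 and the subgraph induced by S is connected."""
    S = set(S)
    if len(S) < 2:
        return False
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:                      # depth-first search inside S
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

def tree_intervals(n, edges):
    """All intervals of the tree, found by testing every subset of [n]."""
    return {frozenset(S)
            for r in range(2, n + 1)
            for S in combinations(range(1, n + 1), r)
            if is_tree_interval(edges, S)}

pi = [1, 3, 5, 4, 2, 6, 7]
path_edges = list(zip(pi, pi[1:]))    # path visiting vertices in the order of pi
perm_intervals = {frozenset(pi[i:j + 1])
                  for i in range(len(pi)) for j in range(i + 1, len(pi))}
path_intervals = tree_intervals(len(pi), path_edges)
print(path_intervals == perm_intervals)
```

Connected vertex sets of a path are exactly its contiguous segments, so the two interval families coincide.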

15 Miscellaneous Example: the common intervals of T_1 and T_2 are { [2], [3], [4], [5] }. (Common) intervals in trees are induced subtrees. [Figure: trees T_1 and T_2 on vertex set [5].]

16 Structure of Tree Intervals Tree intervals have the Helly property, i.e. for any family of tree intervals (T_i)_{i ∈ I}, the assumption T_p ∩ T_q ≠ ∅ for every p, q ∈ I implies ∩_{i ∈ I} T_i ≠ ∅.
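
A small experiment can make the Helly property tangible. The sketch below uses a hypothetical 6-vertex tree chosen for this example and only checks families of three intervals, so it is a sanity check rather than a proof.

```python
from itertools import combinations

def connected(edges, S):
    """True iff the subgraph induced by the vertex set S is connected."""
    S = set(S)
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

n = 6
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (4, 6)]   # hypothetical example tree
tree_ivs = [frozenset(S)
            for r in range(2, n + 1)
            for S in combinations(range(1, n + 1), r)
            if connected(edges, S)]

# Every triple of pairwise intersecting intervals must share a vertex.
helly_on_triples = all(a & b & c
                       for a, b, c in combinations(tree_ivs, 3)
                       if (a & b) and (b & c) and (a & c))
print(len(tree_ivs), helly_on_triples)
```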

17 Extreme Cases n-vertex stars S_{n-1}: the number of non-trivial induced subtrees is 2^{n-1} - 1.
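
The count for stars can be confirmed by brute force for small n. In this sketch the star is assumed to have center 1 and leaves 2..n, and "non-trivial" is read as "at least two vertices"; a path is counted alongside for contrast.

```python
from itertools import combinations

def connected(edges, S):
    """True iff the subgraph induced by the vertex set S is connected."""
    S = set(S)
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

def count_intervals(n, edges):
    """Number of connected vertex sets with at least two vertices."""
    return sum(1
               for r in range(2, n + 1)
               for S in combinations(range(1, n + 1), r)
               if connected(edges, S))

def star(n):
    return [(1, v) for v in range(2, n + 1)]   # center 1, leaves 2..n

def path(n):
    return [(v, v + 1) for v in range(1, n)]

for n in range(2, 9):
    print(n, count_intervals(n, star(n)), 2 ** (n - 1) - 1,
          count_intervals(n, path(n)), n * (n - 1) // 2)
```

A connected set of size at least two in a star must contain the center together with a non-empty set of leaves, which gives 2^{n-1} - 1 choices.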

18 The Common Interval Graph Given T = (T_1, …, T_k) and the corresponding common intervals C_T. The common interval graph G_T = (V, E) is the graph with V = C_T and E = {(c, d) | c, d ∈ C_T, c ∩ d ≠ ∅, c ≠ d}.

19 Example V = [n], T = (P_n, S_{n-1}). We have C_T = { [2], [3], …, [n] }, G_T = K(C_T), the complete graph on C_T. [Figure: the path P_n, the star S_{n-1}, and G_T on the vertices [2], [3], [4], …, [n].]
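
This example can be reproduced for a small n. The sketch assumes the path runs 1-2-…-n and the star has center 1 (the figure is not recoverable from the transcript, so this labeling is an assumption), and it finds common intervals by testing every subset.

```python
from itertools import combinations

def connected(edges, S):
    """True iff the subgraph induced by the vertex set S is connected."""
    S = set(S)
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

def common_tree_intervals(n, trees):
    """Vertex sets of size > 1 that are connected in every tree."""
    return {frozenset(S)
            for r in range(2, n + 1)
            for S in combinations(range(1, n + 1), r)
            if all(connected(t, S) for t in trees)}

def common_interval_graph(C):
    """Edges join distinct common intervals with non-empty intersection."""
    ordered = sorted(C, key=sorted)
    return {(c, d) for c, d in combinations(ordered, 2) if c & d}

n = 5
P = [(v, v + 1) for v in range(1, n)]        # path 1-2-...-n (assumed labeling)
S_star = [(1, v) for v in range(2, n + 1)]   # star with center 1 (assumed labeling)
C = common_tree_intervals(n, [P, S_star])
E = common_interval_graph(C)
print(sorted(sorted(c) for c in C), len(E))
```

All common intervals contain vertex 1, so every pair intersects and the graph is complete.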

20 Common Interval Graphs cont’d A graph is called chordal if it does not contain an induced cycle C_n on n > 3 vertices. Proposition: Common interval graphs of trees are chordal graphs.
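
The proposition can be spot-checked on a small instance. The sketch below builds the common interval graph of two hypothetical 5-vertex trees and tests chordality by repeatedly deleting a simplicial vertex (a vertex whose remaining neighbors are pairwise adjacent); a graph is chordal exactly when this elimination can remove every vertex. This is an illustration of the statement, not the argument used in the talk.

```python
from itertools import combinations

def connected(edges, S):
    S = set(S)
    adj = {v: set() for v in S}
    for u, v in edges:
        if u in S and v in S:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == S

def common_tree_intervals(n, trees):
    return {frozenset(S)
            for r in range(2, n + 1)
            for S in combinations(range(1, n + 1), r)
            if all(connected(t, S) for t in trees)}

def is_chordal(vertices, edges):
    """Greedy simplicial-vertex elimination."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = next(
            (v for v in remaining
             if all(b in adj[a] for a, b in combinations(adj[v] & remaining, 2))),
            None)
        if simplicial is None:
            return False              # no simplicial vertex left: not chordal
        remaining.remove(simplicial)
    return True

T1 = [(1, 2), (2, 3), (3, 4), (4, 5)]   # hypothetical example trees
T2 = [(1, 2), (2, 3), (3, 4), (3, 5)]
C = common_tree_intervals(5, [T1, T2])
E = [(c, d) for c, d in combinations(sorted(C, key=sorted), 2) if c & d]
print(len(C), is_chordal(C, E))

# Contrast: a cycle on 4 vertices is not chordal.
print(is_chordal([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))
```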

21 Irreducible Common Intervals For a common interval c ∈ C_T and a subset V ⊆ C_T we say that V generates c iff (i) for each d ∈ V, d ⊊ c; (ii) c = ∪_{d ∈ V} d; (iii) G_T[V] is connected. If there is no such V then c is irreducible. The irreducible intervals generate all common intervals. [Figure: example tree on vertices 1, …, 7.]

22 Finding Irreducible Intervals We have K < 2^{n-1} common intervals, and I < n irreducible intervals. Find all irreducible common intervals of k trees on n vertices: O(kn^2) time & O(kn) space

23 Finding Irreducible Intervals Irreducible intervals are minimal common intervals containing an adjacent vertex pair. [Figure: example trees on vertices x, y, l, z, m.]

24 Graph Intervals G = (V, E) is an undirected, connected graph with V = [n]. S ⊆ V is an interval (convex) iff the induced subgraph G[S] is connected and includes every shortest path with end-vertices in S. [Figure: two vertex sets in a graph on vertices 1, 2, 3, 4; one is convex, the other is NOT.]
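
The convexity condition can be tested with breadth-first-search distances: a vertex w lies on some shortest u-v path exactly when d(u, w) + d(w, v) = d(u, v). The sketch below uses a 4-cycle as a hypothetical example (the graph in the slide's figure is not recoverable), and the helper names are illustrative.

```python
from collections import deque

def make_adj(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_dist(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_convex(adj, S):
    """True iff S contains every vertex on every shortest path between its vertices.

    In a connected graph this also makes the induced subgraph G[S] connected.
    """
    S = set(S)
    dist = {u: bfs_dist(adj, u) for u in S}
    for u in S:
        for v in S:
            for w in adj:
                on_shortest_path = dist[u][w] + dist[v][w] == dist[u][v]
                if on_shortest_path and w not in S:
                    return False
    return True

cycle = make_adj(4, [(1, 2), (2, 3), (3, 4), (4, 1)])   # hypothetical 4-cycle
print(is_convex(cycle, {1, 2}))      # the edge 1-2 is the only shortest path
print(is_convex(cycle, {1, 2, 3}))   # 1-4-3 is also a shortest path, 4 is missing
```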

25 Common Intervals of Graphs Let G = (G_1, …, G_k) be a family of connected undirected graphs with vertex set [n]. Definition: S ⊆ [n] is a common interval of G iff S is an interval in all graphs. Graph intervals generalize tree intervals. [Figure: example graphs G_0 and G_1 on vertices 1, 2, 3, 4.]

26 Some Differences The union of convex sets is NOT always convex.

27 Some Differences The common convex hull of an adjacent vertex pair is NOT always irreducible. [Figure: graphs G_1 and G_2 on vertices 1, 2, 3.]

28 Finding Irreducible Graph Intervals Sketch: Given G = (G_0, G_1, …, G_{k-1})
  For each edge (i, j) ∈ E_{i*} do
    S(i, j) := {i, j}
    For each (k, l) ∈ S(i, j)
      Add vertices ‘between’ k and l to S(i, j)
  Remove reducible intervals
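
The hull-growing part of this sketch could look as follows. Several points are assumptions rather than details from the slides: vertices 'between' k and l are taken to be those on a shortest k-l path in any of the graphs, the edges are taken from G_0 because the transcript does not say which edge set E_{i*} denotes, and the final removal of reducible intervals is left out.

```python
from collections import deque
from itertools import combinations

def make_adj(n, edges):
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_dist(adj, src):
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def common_hull(graphs, seed):
    """Grow seed until it is closed under shortest paths in every graph."""
    all_dist = [{u: bfs_dist(adj, u) for u in adj} for adj in graphs]
    S = set(seed)
    changed = True
    while changed:
        changed = False
        for k, l in combinations(sorted(S), 2):   # snapshot of the current set
            for dist in all_dist:
                for w in dist:
                    if w not in S and dist[k][w] + dist[w][l] == dist[k][l]:
                        S.add(w)                  # w lies 'between' k and l
                        changed = True
    return frozenset(S)

n = 4
G0 = make_adj(n, [(1, 2), (2, 3), (3, 4)])   # path 1-2-3-4
G1 = make_adj(n, [(1, 2), (1, 3), (1, 4)])   # star with center 1
hulls = {e: common_hull([G0, G1], e) for e in [(1, 2), (2, 3), (3, 4)]}
for e, h in hulls.items():
    print(e, sorted(h))
```

For this path and star pair, the hulls of the three path edges are [2], [3] and [4], matching the common intervals listed for (P_n, S_{n-1}) on slide 19.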

29 Extreme Cases Permutations (identical permutations): C ≤ n(n-1)/2, I < n. Trees (identical star-trees): C < 2^{n-1}, I < n. Graphs (complete graphs): C < 2^n, I ≤ n(n-1)/2.

30 Example: InterDom Database of protein domain interactions. Gene fusions; protein-protein interactions (DIP & BIND); protein complexes (PDB).

31 Comparing Two Networks

32 Comparing Three Networks G: gene fusion, P: PDB, B: BIND, D: DIP

33 Irreducible Intervals size of irreducible interval

34 Biologically Meaningful? RAS family domain protein kinase ankyrin repeat PH domain regulator of chromosome condensation

35 THANK YOU!!!

