Common Intervals in Sequences, Trees, and Graphs Steffen Heber and Jiangtian Li
Genome Comparison of Bacteria [Kim et al., Nat. Biotechnol., 2004]
Gene Order & Function in Bacteria Gene order in bacteria is weakly conserved. [Gene order is not conserved in bacterial evolution. Mushegian, Koonin; Trends Genet. 1996] Some genes cluster together even in unrelated species. Genes inside a cluster are functionally associated. [Conserved clusters of functionally related genes in two bacterial genomes. Tamames et al.; J Mol Evol. 1997]
Formalization of Gene Clusters Genomes: permutations $\pi_1, \pi_2, \ldots, \pi_k$. Genes: numbers $1, \ldots, n$. [Figure: example permutations $\pi_1, \ldots, \pi_4$]
Intervals For a permutation $\pi$ of $[n] = \{1, 2, \ldots, n\}$, an interval (= gene cluster) is a set $\{\pi(i), \pi(i+1), \ldots, \pi(j)\}$ for $1 \le i < j \le n$. Any permutation of $[n]$ has $n(n-1)/2$ intervals.
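As a quick check of the counting claim, the intervals of a small permutation can be enumerated directly. This is a minimal sketch; the function name `intervals` and the sample permutation are illustrative assumptions, not part of the slides.

```python
from itertools import combinations


def intervals(perm):
    """Return all intervals of a permutation as frozensets.

    An interval is the set of elements at positions i..j with i < j.
    """
    n = len(perm)
    return [frozenset(perm[i:j + 1]) for i, j in combinations(range(n), 2)]


perm = [3, 1, 4, 2, 5]          # hypothetical example permutation of [5]
ivs = intervals(perm)
n = len(perm)
print(len(ivs), n * (n - 1) // 2)   # both counts should agree
```

Each pair of positions i < j yields exactly one interval, which is where the count n(n-1)/2 comes from.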
Common Intervals For a family $F = (\pi_0, \pi_1, \ldots, \pi_{k-1})$ of permutations, a common interval of $F$ (= conserved gene cluster) is a subset $S \subseteq [n]$ that is an interval in all $\pi_i$. We write $S \in C_F$. [Figure: permutations $\pi_0$, $\pi_1$]
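The definition can be tested by brute force: take every interval of the first permutation and check that its elements occupy consecutive positions in all the others. The sketch below is a naive O(k n^2) enumeration, not an optimized algorithm; the name `common_intervals` and the example input are assumptions made for illustration.

```python
def common_intervals(perms):
    """Naively enumerate the common intervals of a list of permutations."""
    k, n = len(perms), len(perms[0])
    # pos[p][g] = position of element g in permutation p
    pos = [{g: i for i, g in enumerate(p)} for p in perms]
    base = perms[0]
    result = []
    for i in range(n):
        # smallest and largest position of the growing set in each permutation
        lo = [pos[p][base[i]] for p in range(k)]
        hi = lo[:]
        for j in range(i + 1, n):
            g = base[j]
            for p in range(k):
                lo[p] = min(lo[p], pos[p][g])
                hi[p] = max(hi[p], pos[p][g])
            # j - i + 1 elements are contiguous iff they span j - i positions
            if all(hi[p] - lo[p] == j - i for p in range(k)):
                result.append(frozenset(base[i:j + 1]))
    return result


# hypothetical example with two permutations of [5]
for s in common_intervals([[1, 2, 3, 4, 5], [3, 1, 2, 5, 4]]):
    print(sorted(s))
```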
Lemma Let $F = (\pi_0, \pi_1, \ldots, \pi_{k-1})$ and $c, d \in C_F$. If $c \cap d \ne \emptyset$ then $c \cup d \in C_F$. We call $c \cup d$ reducible. [Figure: permutations $\pi_0$, $\pi_1$ with a reducible and an irreducible interval marked]
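Given a set of common intervals, the reducible ones can be filtered out with a direct application of the lemma. The sketch below assumes that an interval counts as reducible only when it is the union of two intersecting common intervals that are both different from the union itself (otherwise every interval would be trivially reducible); this reading, the function name `irreducible`, and the example input are assumptions.

```python
from itertools import combinations


def irreducible(common):
    """Return the common intervals that are not a union of two
    intersecting common intervals, both distinct from the union."""
    common = set(common)
    reducible = set()
    for c, d in combinations(common, 2):
        u = c | d
        # c and d must intersect, and neither may already equal the union
        if c & d and u != c and u != d and u in common:
            reducible.add(u)
    return common - reducible


# common intervals of the identity permutation of [4] with itself
common = [frozenset(s) for s in
          ([1, 2], [2, 3], [3, 4], [1, 2, 3], [2, 3, 4], [1, 2, 3, 4])]
for s in sorted(irreducible(common), key=sorted):
    print(sorted(s))
```

In this example only the three intervals of size two survive, so the number of irreducible intervals stays below n = 4.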
Analysis We have $K \le n(n-1)/2$ common intervals and $I < n$ irreducible intervals. Find all $K$ common intervals of $k \ge 2$ permutations of $[n]$: $O(kn + K)$ time and $O(n)$ space.
Common Intervals of Trees Let $T, T_1, \ldots, T_k$ be trees with vertex set $[n]$. Definition: $S \subseteq [n]$ is an interval of $T$ iff $T[S]$ is connected and $|S| > 1$. $S \subseteq [n]$ is a common interval of $T_1, \ldots, T_k$ iff $S$ is an interval in all trees. Tree intervals generalize intervals of permutations.
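The tree definition translates into a connectivity test on the induced subgraph. The following brute-force sketch enumerates all vertex subsets, so it is only meant for very small n; the edge-list representation and the names `is_tree_interval` and `common_tree_intervals` are assumptions made for illustration.

```python
from itertools import combinations


def is_tree_interval(edges, subset):
    """True iff subset has more than one vertex and induces a connected subgraph."""
    s = set(subset)
    if len(s) < 2:
        return False
    # adjacency restricted to the induced subgraph T[S]
    adj = {v: set() for v in s}
    for u, v in edges:
        if u in s and v in s:
            adj[u].add(v)
            adj[v].add(u)
    start = next(iter(s))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return seen == s


def common_tree_intervals(trees, n):
    """Enumerate subsets of [n] that are intervals in every tree (exponential time)."""
    out = []
    for size in range(2, n + 1):
        for s in combinations(range(1, n + 1), size):
            if all(is_tree_interval(t, s) for t in trees):
                out.append(frozenset(s))
    return out


path = [(1, 2), (2, 3), (3, 4)]   # path on 4 vertices
star = [(1, 2), (1, 3), (1, 4)]   # star centred at vertex 1
for s in common_tree_intervals([path, star], 4):
    print(sorted(s))
```

A path on vertices 1, 2, ..., n behaves like the identity permutation: its intervals are exactly the runs of consecutive numbers.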
Miscellaneous Example: common intervals of $T_1, T_2$: $\{[2], [3], [4], [5]\}$. (Common) intervals in trees are induced subtrees. [Figure: trees $T_1$ and $T_2$]
Structure of Tree Intervals Tree intervals have the Helly property, i.e. for any family of tree intervals $(T_i)_{i \in I}$, the assumption $T_p \cap T_q \ne \emptyset$ for every $p, q \in I$ implies $\bigcap_{i \in I} T_i \ne \emptyset$.
Extreme Cases $n$-vertex stars $S_{n-1}$: number of non-trivial induced subtrees is $2^{n-1} - 1$.
The Common Interval Graph Given $T = (T_1, \ldots, T_k)$ and the corresponding common intervals $C_T$. The common interval graph $G_T = (V, E)$ is the graph with $V = C_T$ and $E = \{(c, d) \mid c, d \in C_T,\ c \cap d \ne \emptyset,\ c \ne d\}$.
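Building the common interval graph from a list of common intervals is a direct transcription of the definition. A minimal sketch, assuming the intervals are given as frozensets; the function name `common_interval_graph` and the example input are illustrative.

```python
from itertools import combinations


def common_interval_graph(common):
    """Vertices are the common intervals; two distinct intervals are
    joined by an edge iff they share at least one element."""
    vertices = list(dict.fromkeys(common))   # drop duplicates, keep order
    edges = [(c, d) for c, d in combinations(vertices, 2) if c & d]
    return vertices, edges


# hypothetical input: the prefixes [2], [3], [4] of [4]
prefixes = [frozenset(range(1, m + 1)) for m in (2, 3, 4)]
vertices, edges = common_interval_graph(prefixes)
print(len(vertices), len(edges))
```

Because all three prefixes contain vertex 1, every pair intersects and the resulting graph is complete.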
Example $V = [n]$, $T = (P_n, S_{n-1})$. We have $C_T = \{[2], [3], \ldots, [n]\}$ and $G_T = K(C_T)$. [Figure: $G_T$, the complete graph on the vertices $[2], [3], [4], \ldots, [n]$]
Common Interval Graphs cont'd A graph is called chordal if it does not contain an induced cycle $C_n$ on $n > 3$ vertices. Proposition: Common interval graphs of trees are chordal graphs.
Irreducible Common Intervals For a common interval $c \in C_T$ and a subset $V \subseteq C_T$ we say that $V$ generates $c$ iff (i) for each $d \in V$, $d \subsetneq c$; (ii) $c = \bigcup_{d \in V} d$; (iii) $G_T[V]$ is connected. If there is no such $V$ then $c$ is irreducible. The irreducible intervals generate all common intervals.
Finding Irreducible Intervals We have $K < 2^{n-1}$ common intervals and $I < n$ irreducible intervals. Find all irreducible common intervals of $k$ trees on $n$ vertices: $O(kn^2)$ time and $O(kn)$ space.
Finding Irreducible Intervals Irreducible intervals are minimal common intervals containing an adjacent vertex pair. [Figure: example trees on the vertices x, y, l, z, m]
Graph Intervals $G = (V, E)$: undirected, connected graph with $V = [n]$. $S \subseteq V$ is an interval (convex) iff the induced subgraph $G[S]$ is connected and includes every shortest path with end-vertices in $S$. [Figure: a convex vertex set and a set that is not convex]
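Convexity can be tested with breadth-first search distances: a vertex w lies on a shortest path between u and v exactly when d(u, w) + d(w, v) = d(u, v). The sketch below assumes a connected graph given as an adjacency dictionary; in that case containing all shortest paths already implies that G[S] is connected, so no separate connectivity test is made. The names `bfs_dist` and `is_convex` are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_dist(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_convex(adj, subset):
    """True iff every shortest path between vertices of subset stays inside it."""
    s = set(subset)
    dist = {u: bfs_dist(adj, u) for u in s}
    for u, v in combinations(s, 2):
        for w in adj:
            # w outside S on a shortest u-v path violates convexity
            if w not in s and dist[u][w] + dist[v][w] == dist[u][v]:
                return False
    return True


# hypothetical example: the cycle 1-2-3-4-1
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
print(is_convex(cycle, {1, 2}), is_convex(cycle, {1, 2, 3}))
```

In the 4-cycle, {1, 2, 3} induces a connected subgraph but is not convex, because the shortest path 1-4-3 leaves the set.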
Common Intervals of Graphs Let $G = (G_1, \ldots, G_k)$ be a family of connected undirected graphs with vertex set $[n]$. Definition: $S \subseteq [n]$ is a common interval of $G$ iff $S$ is an interval in all graphs. Graph intervals generalize tree intervals. [Figure: example graphs $G_0$ and $G_1$]
Some Differences The union of convex sets is NOT always convex.
Some Differences The common convex hull of an adjacent vertex pair is NOT always irreducible. [Figure: graphs $G_1$ and $G_2$ on the vertices 1, 2, 3]
Finding Irreducible Graph Intervals Sketch: Given $G = (G_0, G_1, \ldots, G_{k-1})$.
For each edge $(i, j) \in E_{i^*}$ do:
  $S(i, j) := \{i, j\}$
  For each pair $k, l \in S(i, j)$: add the vertices 'between' $k$ and $l$ to $S(i, j)$
Remove reducible intervals.
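The hull-growing step of the sketch can be written as a fixed-point iteration. The code below assumes that 'between' means lying on some shortest path in at least one of the graphs, and it covers only the growth of S(i, j), not the final removal of reducible intervals. The adjacency-dictionary input and the names `bfs_dist` and `common_convex_hull` are assumptions for illustration.

```python
from collections import deque


def bfs_dist(adj, src):
    """Shortest-path distances from src in an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def common_convex_hull(graphs, seed):
    """Smallest superset of seed that is convex in every graph.

    graphs: list of adjacency dicts over the same vertex set, each connected.
    """
    dists = [{u: bfs_dist(adj, u) for u in adj} for adj in graphs]
    hull = set(seed)
    changed = True
    while changed:
        changed = False
        members = list(hull)          # snapshot; new vertices are handled next round
        for dist in dists:
            for a in members:
                for b in members:
                    for w in dist:
                        # w lies on a shortest a-b path in this graph
                        if w not in hull and dist[a][w] + dist[w][b] == dist[a][b]:
                            hull.add(w)
                            changed = True
    return frozenset(hull)


# hypothetical example: two paths on the vertices 1..4
g0 = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}      # 1-2-3-4
g1 = {1: [3], 3: [1, 2], 2: [3, 4], 4: [2]}      # 1-3-2-4
for edge in [(1, 2), (2, 3), (3, 4)]:            # edges of g0
    print(edge, sorted(common_convex_hull([g0, g1], edge)))
```

The loop stops once no pair of hull vertices has an outside vertex on a shortest path in any graph, which is exactly the condition for the hull to be a common interval.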
Extreme Cases
Permutations (identical permutations): $C \le n(n-1)/2$, $I < n$
Trees (identical star-trees): $C < 2^{n-1}$, $I < n$
Graphs (complete graphs): $C < 2^n$, $I \le n(n-1)/2$
Example: InterDom Database of protein domain interactions. Sources: gene fusions; protein-protein interactions (DIP & BIND); protein complexes (PDB).
Comparing Two Networks
Comparing Three Networks G: gene fusion; P: PDB; B: BIND; D: DIP
Irreducible Intervals [Figure: plot with axis "size of irreducible interval"]
Biologically Meaningful? RAS family domain, protein kinase, ankyrin repeat, PH domain, regulator of chromosome condensation
THANK YOU!!!