1 Descendent Subtrees Comparison of Phylogenetic Trees with Applications to Co-evolutionary Classifications in Bacterial Genome
Yaw-Ling Lin (1), Tsan-Sheng Hsu (2)
(1) Dept. of Computer Science and Information Management, Providence University, Taichung, Taiwan
(2) Institute of Information Science, Academia Sinica, Taipei, Taiwan

2 Yaw-Ling Lin, Providence, Taiwan. Motivation: where do the problems come from?

3 Two-Component System
Two-component systems (2CS):
–Sensor histidine kinase
–Response regulator
2CSs are the major control machinery that bacteria use to cope with diverse and often hostile environments.

4 2CS in Pseudomonas aeruginosa PAO1
http://www.pseudomonas.com/
Stover CK, Pham XQ, Erwin AL, et al. “Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen.” Nature. 2000 Aug 31;406(6799):947-8.
Genome: 6.3 Mbp
Predicted genes: 5570
123 genes were classified as 2CSs.

5 2CS in PAO1

6 2CS in PAO1

7 2CS in PAO1

8 2CS in PAO1
There are 123 annotated 2CS genes in PAO1.
Goal: a systematic analysis of the evolutionary relationships between the sensor kinase and the response regulator of each 2CS.
Phylogenetic trees were constructed with Clustal-W for 54 sensor kinases and 59 response regulators.

9 2CS in PAO1 -- Sensor Tree

10 2CS: Regulator Tree

11 Subtrees Analysis of 2CS

12 Co-evolution subtree Analysis: Sensor Tree versus Regulator Tree

13 Different Trees
Different phylogenetic tree inference methods:
-Maximum parsimony
-Maximum likelihood
-Distance matrix fitting
-Quartet-based methods
Comparing the same set of species w.r.t. different biological sequences or different genes also yields different trees.
How do we find the largest set of items on which the trees agree?

14 Previous Results
Measuring the similarity / difference between trees:
-Symmetric difference [Robinson 1979]
-Robinson and Foulds (RF) metric [Robinson 1981]
-Nearest-neighbor interchange [Waterman 1978]
-Subtree transfer distance [Allen 2001]
-Quartet metric [Estabrook 1985]
Inferring the consensus tree: the maximum agreement subtree (MAST) problem, a.k.a. the maximum homeomorphic agreement subtree.

15 MAST: Maximum Agreement Subtree
Problem: given a set of rooted trees whose leaves are drawn from the same set of n items, find the largest subset of these items such that the portions of the trees restricted to the subset are isomorphic.
[Amir and Keselman 1997]: NP-hard even for 3 trees of unbounded degree.
[Hein 1995]: MAST for 3 trees of unbounded degree is hard to approximate.
[Amir et al 1997]: polynomial-time algorithms for three or more bounded-degree trees, but the running time is exponential in the degree bound.

16 MAST: Maximum Agreement Subtree
[Farach and Thorup 1997]: $O(n^{1.5} \log n)$ time algorithm for two trees of arbitrary degree.
[Cole et al 2002]: MAST of two binary trees can be found in $O(n \log n)$ time; MAST of two degree-d trees can be found in time [bound missing from the transcript].

17 Problem Definition
A phylogenetic tree with n leaves is a (rooted) tree whose leaf nodes are uniquely labelled from 1 to n.
A descendent subtree of a phylogenetic tree T is the subtree consisting of all edges and nodes of T descending from a given vertex.
Given a set of n-leaf phylogenetic trees, we wish to explore the descendent subtree relationships within these trees.
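To make the definition concrete, here is a minimal Python sketch that lists the leaf set of every descendent subtree. The dictionary representation (internal node name mapped to its list of children, leaves as integers 1..n) and the names `leaf_sets` and `example_tree` are assumptions made for illustration; they do not come from the slides.

```python
def leaf_sets(tree, root):
    """Map every node to the frozenset of leaf labels in its descendent subtree.

    `tree` maps each internal node to a list of children; any node that is
    not a key of `tree` is treated as a leaf (hypothetical representation).
    """
    result = {}

    def visit(node):
        children = tree.get(node, [])
        if not children:
            result[node] = frozenset([node])
        else:
            result[node] = frozenset().union(*(visit(c) for c in children))
        return result[node]

    visit(root)
    return result


# A 4-leaf example: r -> (a, 4), a -> (b, 3), b -> (1, 2)
example_tree = {"r": ["a", 4], "a": ["b", 3], "b": [1, 2]}
example_sets = leaf_sets(example_tree, "r")
```

Each vertex determines exactly one descendent subtree, so a tree whose internal nodes all have at least two children has fewer than 2n descendent subtrees.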

18 Normalized cluster distance between two sets
Symmetric set difference: $A \,\Delta\, B = (A \setminus B) \cup (B \setminus A)$
Normalized cluster distance: [formula missing from the transcript]
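The formulas on this slide did not survive in the transcript. The sketch below assumes that the normalized cluster distance divides the size of the symmetric difference by the size of the union (the Jaccard distance). That normalization is an assumption, and the function names are hypothetical.

```python
def symmetric_difference_size(a, b):
    """Number of elements that belong to exactly one of the two sets."""
    return len(set(a) ^ set(b))


def normalized_cluster_distance(a, b):
    """|A symmetric-difference B| / |A union B|, ASSUMED to be the slide's measure."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # two empty clusters: treat them as identical
        return 0.0
    return len(a ^ b) / len(union)


d_partial = normalized_cluster_distance({1, 2, 3}, {2, 3, 4})  # 2 / 4
d_same = normalized_cluster_distance({1, 2}, {1, 2})
d_disjoint = normalized_cluster_distance({1, 2}, {3, 4})
```

Under this assumption the distance is 0 for identical leaf sets and 1 for disjoint ones.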

19 All Pairs Subtrees Comparison – A naïve $O(n^3)$ algorithm

20 All Pairs Subtrees Comparison – Property

21 All Pairs Subtrees Comparison – an $O(n^2)$ algorithm
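The slide bodies for the all-pairs comparison are not in the transcript, so the following is only one possible way to reach quadratic time, not necessarily the authors' algorithm. It uses the fact that the intersection size for a pair (u, v) is the sum of the intersection sizes of u's children with v. The tree representation (internal node mapped to its children), the function names, and the union-normalized distance are all assumptions.

```python
def postorder(tree, root):
    """Nodes of `tree` listed so that children precede their parent."""
    order, stack = [], [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded or node not in tree:
            order.append(node)
        else:
            stack.append((node, True))
            stack.extend((c, False) for c in tree[node])
    return order


def all_pairs_intersections(t1, r1, t2, r2):
    """inter[u, v] = number of leaves shared by subtree u of T1 and subtree v of T2."""
    order1, order2 = postorder(t1, r1), postorder(t2, r2)
    inter = {}
    for u in order1:
        for v in order2:
            if u in t1:        # u internal: add up its children
                inter[u, v] = sum(inter[c, v] for c in t1[u])
            elif v in t2:      # u is a leaf, v internal
                inter[u, v] = sum(inter[u, c] for c in t2[v])
            else:              # both are leaves
                inter[u, v] = 1 if u == v else 0
    return inter


def all_pairs_distances(t1, r1, t2, r2):
    """Assumed distance |A sym-diff B| / |A union B| for every pair of subtrees."""
    inter = all_pairs_intersections(t1, r1, t2, r2)
    dist = {}
    for (u, v), shared in inter.items():
        # Both trees share one leaf set, so the roots give the subtree sizes.
        union = inter[u, r2] + inter[r1, v] - shared
        dist[u, v] = (union - shared) / union
    return dist


tree1 = {"r": ["a", 4], "a": ["b", 3], "b": [1, 2]}
tree2 = {"s": ["p", "q"], "p": [1, 2], "q": [3, 4]}
dist = all_pairs_distances(tree1, "r", tree2, "s")
```

Every table entry is charged to one (child, node) pair, so the total work is proportional to the product of the two tree sizes, provided every internal node has at least two children.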

22 Lowest Common Ancestor

23 Confluent subtree

24 Confluent subtree – Illustration

25 Constructing confluent subtree

26 Nearest subtree

27 Nearest subtree: reasoning

28 Nearest subtree: Algorithm

29 Leaf-agree / Isomorphic Subtrees

30 leaf-agreement – Two Trees

31 All-agreement: Illustration
[Figure: tree T1 with node z above nodes x and y, whose leaf sets are X and Y; tree T2 with x' = Lca(X), y' = Lca(Y), and z' = Lca(x', y').]

32 All-agreement Method
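The method itself is not preserved in the transcript. The sketch below is one reading of the illustration's labels: map each node x of T1 to x' = Lca(X) in T2, build it bottom-up as the Lca of the children's images, and report a match when both subtrees have the same number of leaves. The function names are hypothetical. The parent-climbing `lca` is a simple stand-in, so this version is not linear time; a linear bound would need constant-time Lca queries.

```python
def parents_and_depths(tree, root):
    """Parent and depth of every node of a tree given as node -> children."""
    parent, depth = {root: None}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            parent[child], depth[child] = node, depth[node] + 1
            stack.append(child)
    return parent, depth


def lca(a, b, parent, depth):
    """Naive lowest common ancestor: repeatedly lift the deeper node."""
    while a != b:
        if depth[a] < depth[b]:
            a, b = b, a
        a = parent[a]
    return a


def leaf_counts(tree, root):
    """Number of leaves below every node."""
    counts = {}

    def visit(node):
        kids = tree.get(node, [])
        counts[node] = 1 if not kids else sum(visit(c) for c in kids)
        return counts[node]

    visit(root)
    return counts


def leaf_agreeing_pairs(t1, r1, t2, r2):
    """Pairs (x, x') of internal nodes whose descendent subtrees have equal leaf sets.

    Assumes both trees have the same leaf labels and no internal node
    with a single child.
    """
    parent2, depth2 = parents_and_depths(t2, r2)
    size1, size2 = leaf_counts(t1, r1), leaf_counts(t2, r2)
    image, pairs = {}, []

    def visit(x):
        kids = t1.get(x, [])
        if not kids:
            image[x] = x  # a leaf maps to the leaf with the same label
            return
        for c in kids:
            visit(c)
        img = image[kids[0]]
        for c in kids[1:]:
            img = lca(img, image[c], parent2, depth2)
        image[x] = img
        # The leaves of x always lie below x', so equal sizes imply equal sets.
        if size1[x] == size2[img]:
            pairs.append((x, img))

    visit(r1)
    return pairs


tree1 = {"r": ["a", 4], "a": ["b", 3], "b": [1, 2]}
tree2 = {"s": ["p", "q"], "p": [1, 2], "q": [3, 4]}
agreeing = leaf_agreeing_pairs(tree1, "r", tree2, "s")
```

In this example node a (leaves 1, 2, 3) maps to the root s of the second tree, which has four leaves, so it is not reported.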

33 leaf-agreement – k Trees

34 Isomorphic Descendent Subtrees

35 Isomorphic Descendent Subtrees (2)

36 Conclusion
All normalized cluster distances between the paired descendent subtrees of two trees can be computed in $O(n^2)$ time, which is optimal.
Finding the nearest subtrees for a collection of pairwise disjoint subsets of leaves can be done in $O(n)$ time.
Finding all descendent subtrees consisting of the same set of leaves in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.
Finding all isomorphic descendent subtrees in a set of (unbounded-degree) trees is solvable in time linear in the size of the input trees.

37 Future Research
Clustering analysis of 2CS for functional prediction of uncharacterized genes
Co-evolutionary analysis of 2CS
(Rooted / unrooted) phylogenetic tree comparison when edges are labeled with (likelihood, log-odds) distances.

38 The End

39 What Date is Today?
Magic Number:
–4/4, 6/6, 8/8, 10/10, 12/12
–7/11, 9/5 [also 11/7, 5/9]
–3/0? [implying 2/28, 2/0 = 1/31]
Extension:
–365 = 52 * 7 + 1
–Leap Year?
2003: 5 ; 2004: 7 ; 2005: 1 ; 2006: 2
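The magic-number rule can be checked with Python's standard `datetime` module. This sketch assumes weekdays are numbered as in ISO 8601 (Monday = 1 through Sunday = 7), which agrees with the slide's values for 2003 to 2005. It reads "3/0" as the last day of February. The function names are hypothetical.

```python
from datetime import date, timedelta


def anchor_dates(year):
    """Dates that the slide lists as falling on one common weekday."""
    last_of_february = date(year, 3, 1) - timedelta(days=1)  # "3/0"
    month_day = [(4, 4), (6, 6), (8, 8), (10, 10), (12, 12),
                 (7, 11), (11, 7), (9, 5), (5, 9)]
    return [last_of_february] + [date(year, m, d) for m, d in month_day]


def magic_number(year):
    """Shared ISO weekday (Monday = 1 ... Sunday = 7) of the anchor dates."""
    weekdays = {d.isoweekday() for d in anchor_dates(year)}
    if len(weekdays) != 1:
        raise ValueError("anchor dates do not share a weekday")
    return weekdays.pop()


magic = {year: magic_number(year) for year in (2003, 2004, 2005, 2006)}
```

Because 365 = 52 * 7 + 1, the number advances by one after a common year and by two across a leap day, which is why 2004 jumps from 5 to 7.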

