A Study on Measuring Distance between Two Trees. Advisor: Prof. 阮夙姿. Presenter: 林陳輝.


1 A Study on Measuring Distance between Two Trees. Advisor: Prof. 阮夙姿. Presenter: 林陳輝.

2 CSIE, National Chi Nan University. Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance; Conclusions and future work.

3 Introduction. Evolutionary tree. Comparing trees: comparing trees is not easy. [Figure: an evolutionary tree; source: "Phylogenetic tree", Wikipedia.]

4 Mixture tree. [Figure: a mixture tree, with taxa on one axis and time on the other.] S.-C. Chen and B. G. Lindsay, "Building Mixture Trees from Binary Sequence Data," Biometrika, 2006.

5 Problem definition. [Figure: a tree with leaves A to H and internal nodes $v_1$ to $v_7$ carrying the time parameters 11, 9, 8, 1, 3, 5 and 7.] The leaves are associated with taxa. There is a time parameter on every internal node.

6 Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance; Conclusions and future work.

7 Related work: path difference metric. $d_p(T_1, T_2) = \lVert d(T_1) - d(T_2) \rVert_2$, where $d(T_i)$ is the vector of distances between all pairs of leaves of $T_i$. M. A. Steel and D. Penny, "Distributions of Tree Comparison Metrics – Some New Results," Syst. Biol. 42(2):126-141, 1993.
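
The path difference metric above can be sketched in a few lines. This is only an illustration under stated assumptions: trees are rooted and binary, written as nested tuples with string leaves (a hypothetical representation, not one from the slides), and the distance between two leaves is taken to be the number of edges on the path between them.

```python
from math import sqrt


def leaf_depths(tree, pair_dist):
    """Return {leaf: edges from this node down to the leaf}.

    As a side effect, record in pair_dist the path length (in edges)
    of every leaf pair whose lowest common ancestor is this node.
    """
    if isinstance(tree, str):  # a leaf
        return {tree: 0}
    left = leaf_depths(tree[0], pair_dist)
    right = leaf_depths(tree[1], pair_dist)
    for x, dx in left.items():
        for y, dy in right.items():
            # one extra edge on each side to reach the current node
            pair_dist[frozenset((x, y))] = dx + dy + 2
    depths = {leaf: d + 1 for leaf, d in left.items()}
    depths.update({leaf: d + 1 for leaf, d in right.items()})
    return depths


def path_difference(t1, t2):
    """Euclidean norm of the difference of the two pairwise-distance vectors."""
    d1, d2 = {}, {}
    leaves1 = leaf_depths(t1, d1)
    leaves2 = leaf_depths(t2, d2)
    if set(leaves1) != set(leaves2):
        raise ValueError("both trees must have the same leaf set")
    return sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


balanced = (("A", "B"), ("C", "D"))
caterpillar = ("A", ("B", ("C", "D")))
print(round(path_difference(balanced, caterpillar), 3))  # 1.732
```

The two example trees differ on the pairs AB, BC and BD by one edge each, so the result is the square root of 3.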

8 Related work: nodal metric. For full binary trees, the time complexity is $O(n^3)$. For complete binary trees, the time complexity is $O(n^2 \log n)$. John Bluis and Dong-Guk Shin, "Nodal Distance Algorithm: Calculating a Phylogenetic Tree Comparison Metric," Proc. of the 3rd IEEE Symposium on BioInformatics and BioEngineering, pp. 87-94, 2003.

9 Related work: matching distance. P. W. Diaconis and S. P. Holmes, "Matchings and Phylogenetic Trees," Proc. Natl. Acad. Sci. USA, Vol. 95, No. 25, pp. 14600-14602, 1998. The algorithm for matching distance: G. Valiente, "A Fast Algorithmic Technique for Comparing Large Phylogenetic Trees," SPIRE, pp. 370-375, 2005.

10 Matching representation. [Figure: a tree with leaves 1 to 6 and internal nodes 7 to 11.] Matching: {1,2} {5,6} {3,7} {4,8} {9,10}.

11 Matching distance. $T_1$: {1,2} {5,6} {3,7} {4,8} {9,10}. $T_2$: {1,3} {4,6} {2,7} {5,8} {9,10}. The distance is 2. [Figure: the trees $T_1$ and $T_2$ that correspond to the two matchings.]
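
The value 2 for the two matchings on this slide can be checked with a short sketch. It assumes that the matching distance is the minimum number of element swaps needed to turn one perfect matching into the other, and it computes that number as (number of pairs) minus (number of alternating cycles in the union of the two matchings). This is a plain cycle count, not the algorithm from the cited papers, and the function name is hypothetical.

```python
def matching_distance(m1, m2):
    """Minimum number of element swaps turning matching m1 into m2.

    Assumption: both arguments are perfect matchings (lists of pairs)
    over the same set of labels.
    """
    partner1, partner2 = {}, {}
    for a, b in m1:
        partner1[a], partner1[b] = b, a
    for a, b in m2:
        partner2[a], partner2[b] = b, a
    if set(partner1) != set(partner2):
        raise ValueError("matchings must cover the same labels")

    seen = set()
    cycles = 0
    for start in partner1:
        if start in seen:
            continue
        cycles += 1
        node = start
        # Walk the cycle, alternating an m1 edge and an m2 edge.
        while node not in seen:
            seen.add(node)
            other = partner1[node]
            seen.add(other)
            node = partner2[other]
    return len(m1) - cycles


t1 = [(1, 2), (5, 6), (3, 7), (4, 8), (9, 10)]
t2 = [(1, 3), (4, 6), (2, 7), (5, 8), (9, 10)]
print(matching_distance(t1, t2))  # 2
```

Here the union of the two matchings has three alternating cycles (1-2-7-3, 5-6-4-8 and the shared pair 9-10), so the sketch returns 5 - 3 = 2.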

12 Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance; Conclusions and future work.

13 Mixture distance and algorithms. Definition: $p_{T_i}(x, y)$ is the time parameter of the lowest common ancestor (LCA) of leaves $x$ and $y$ in $T_i$. [Figure: two example trees on leaves A to D with internal nodes $v_1$ to $v_3$; one has time parameters 9, 1, 3 and the other 9, 2, 3.]

14 Distance conditions. The distance from an object to itself is zero. The distance from A to B is the same as the distance from B to A. The triangle inequality holds. J. Felsenstein, Inferring Phylogenies. Sunderland, MA: Sinauer Associates, 2004.

15 Distance conditions. Triangle inequality: $\mathrm{Distance}(T_1, T_2) + \mathrm{Distance}(T_2, T_3) \ge \mathrm{Distance}(T_1, T_3)$. For $a, b, c \in \mathbb{R}^+ \cup \{0\}$: $|a - b| + |b - c| \ge |a - c|$.
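
The step from the scalar inequality to the tree inequality can be written out as follows. The derivation assumes that the mixture distance $d$ is the sum, over all unordered leaf pairs $\{x, y\}$, of the absolute difference of the LCA time parameters (which is how the worked example in this deck adds up), and that all three trees share the same leaf set.

```latex
\begin{align*}
d(T_1, T_2) + d(T_2, T_3)
  &= \sum_{\{x, y\}} \Bigl( \bigl| p_{T_1}(x, y) - p_{T_2}(x, y) \bigr|
     + \bigl| p_{T_2}(x, y) - p_{T_3}(x, y) \bigr| \Bigr) \\
  &\ge \sum_{\{x, y\}} \bigl| p_{T_1}(x, y) - p_{T_3}(x, y) \bigr|
   = d(T_1, T_3).
\end{align*}
```

Each term of the sum is an instance of $|a - b| + |b - c| \ge |a - c|$ with $a = p_{T_1}(x, y)$, $b = p_{T_2}(x, y)$ and $c = p_{T_3}(x, y)$.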

16 Algorithm. There are $C(n, 2)$ leaf pairs. Algorithmic idea: grouping. Full binary tree. [Figure: two trees on leaves A to D with internal nodes $v_1$ to $v_3$; one has time parameters 9, 1, 3 and the other 8, 4, 1.] AB: |8 - 1| = 7; AC: |8 - 9| = 1; AD: |8 - 9| = 1; BC: |4 - 9| = 5; BD: |4 - 9| = 5; CD: |1 - 3| = 2. Distance = 21.
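
The pairwise computation on this slide can be reproduced with a direct sketch that visits all $C(n, 2)$ leaf pairs. The tree encoding is an assumption made for illustration: an internal node is a tuple `(time, left, right)` and a leaf is a string. The two tree shapes below are inferred from the six LCA values listed on the slide.

```python
def lca_times(tree, table):
    """Fill table with {leaf pair: time parameter of the pair's LCA}.

    Returns the list of leaves below this node.
    """
    if isinstance(tree, str):  # a leaf
        return [tree]
    time, left, right = tree
    left_leaves = lca_times(left, table)
    right_leaves = lca_times(right, table)
    # Pairs split between the two children have this node as their LCA.
    for x in left_leaves:
        for y in right_leaves:
            table[frozenset((x, y))] = time
    return left_leaves + right_leaves


def mixture_distance_naive(t1, t2):
    """Sum of |p_T1(x, y) - p_T2(x, y)| over all leaf pairs."""
    p1, p2 = {}, {}
    lca_times(t1, p1)
    lca_times(t2, p2)
    if set(p1) != set(p2):
        raise ValueError("both trees must have the same leaf set")
    return sum(abs(p1[pair] - p2[pair]) for pair in p1)


tree_a = (9, (1, "A", "B"), (3, "C", "D"))
tree_b = (8, "A", (4, "B", (1, "C", "D")))
print(mixture_distance_naive(tree_a, tree_b))  # 21
```

The result 21 is the sum 7 + 1 + 1 + 5 + 5 + 2 from the slide.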

17 Algorithm. [Figure: example trees $T_1$ and $T_2$ on leaves A to H, each with internal nodes $v_1$ to $v_7$; the time parameters are 9, 7, 8, 2, 3, 4, 5 in $T_1$ and 9, 6, 8, 1, 3, 4, 5 in $T_2$.]

18 [Figure: $T_1$ and $T_2$ with leaves colored red or green, and the red and green counts recorded at the internal nodes of $T_2$.] $|p_{T_1}(v_1) - p_{T_2}(v_6)| \times (1 \times 1 + 0 \times 0) = |9 - 4| \times (1 \times 1 + 0 \times 0) = 5$; $|p_{T_1}(v_1) - p_{T_2}(v_7)| \times (0 \times 0 + 1 \times 1) = |9 - 5| \times (0 \times 0 + 1 \times 1) = 4$; $|p_{T_1}(v_1) - p_{T_2}(v_3)| \times (1 \times 1 + 1 \times 1) = |9 - 8| \times (1 \times 1 + 1 \times 1) = 2$.

19 [Figure: $T_1$ and $T_2$ with leaves colored red or green, and the red and green counts recorded at the internal nodes of $T_2$.] $|p_{T_1}(v_2) - p_{T_2}(v_2)| \times (2 \times 0 + 0 \times 0) = |7 - 6| \times (2 \times 0 + 0 \times 0) = 0$; $|p_{T_1}(v_2) - p_{T_2}(v_3)| \times (0 \times 1 + 0 \times 1) = |7 - 8| \times (0 \times 1 + 0 \times 1) = 0$; $|p_{T_1}(v_2) - p_{T_2}(v_1)| \times (2 \times 2 + 0 \times 0) = |7 - 9| \times (2 \times 2 + 0 \times 0) = 8$.

20 Complexity analysis. For every internal node of $T_1$, coloring all leaves needs $O(n)$. Counting the distance in $T_2$ needs $O(n)$. The time complexity is $O(n^2)$.
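
The coloring idea can be sketched as follows. This is an illustration under assumptions, not the presenter's code: internal nodes are tuples `(time, left, right)`, leaves are strings, and for each internal node of $T_1$ the leaves under its left child are colored red and those under its right child green. One bottom-up pass over $T_2$ then counts, at every node, the red-green pairs split between its two children.

```python
def leaves(tree):
    """All leaf labels below tree, collected with an explicit stack."""
    stack, found = [tree], []
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            found.append(node)
        else:
            stack.extend((node[1], node[2]))
    return found


def internal_nodes(tree):
    """Yield every internal node (time, left, right) of tree."""
    stack = [tree]
    while stack:
        node = stack.pop()
        if not isinstance(node, str):
            yield node
            stack.extend((node[1], node[2]))


def count_colors(node, color, time_v):
    """Return (red count, green count, accumulated cost) for a T2 subtree."""
    if isinstance(node, str):
        c = color.get(node)  # None for leaves outside the current subtree
        return int(c == "red"), int(c == "green"), 0
    time_u, left, right = node
    rl, gl, cost_l = count_colors(left, color, time_v)
    rr, gr, cost_r = count_colors(right, color, time_v)
    pairs = rl * gr + gl * rr  # red-green pairs whose LCA in T2 is this node
    return rl + rr, gl + gr, cost_l + cost_r + abs(time_v - time_u) * pairs


def mixture_distance(t1, t2):
    total = 0
    for time_v, left, right in internal_nodes(t1):
        color = {leaf: "red" for leaf in leaves(left)}
        color.update({leaf: "green" for leaf in leaves(right)})
        total += count_colors(t2, color, time_v)[2]
    return total


tree_a = (9, (1, "A", "B"), (3, "C", "D"))
tree_b = (8, "A", (4, "B", (1, "C", "D")))
print(mixture_distance(tree_a, tree_b))  # 21
```

Each of the $O(n)$ internal nodes of the first tree triggers one $O(n)$ coloring and one $O(n)$ pass over the second tree, which matches the $O(n^2)$ bound stated on the slide.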

21 The modified algorithm. Goal: speed up the basic algorithm, which carries too much empty color information.

22 [Figure: the same colored trees $T_1$ and $T_2$ as on slide 19, with the nodes that carry empty color information (Red: 0, Green: 0) marked.] $|p_{T_1}(v_2) - p_{T_2}(v_2)| \times (2 \times 0 + 0 \times 0) = |7 - 6| \times (2 \times 0 + 0 \times 0) = 0$; $|p_{T_1}(v_2) - p_{T_2}(v_3)| \times (0 \times 1 + 0 \times 1) = |7 - 8| \times (0 \times 1 + 0 \times 1) = 0$; $|p_{T_1}(v_2) - p_{T_2}(v_1)| \times (2 \times 2 + 0 \times 0) = |7 - 9| \times (2 \times 2 + 0 \times 0) = 8$.

23 [Figure: $T_2$ on leaves A to H next to a reduced copy of $T_2$ that keeps only leaves A, B, C, D and internal nodes $v_1$, $v_3$, $v_4$ with time parameters 9, 8, 1.]

24 The modified algorithm. Finding the LCA in constant time with $O(n)$ preprocessing: M. A. Bender and M. Farach-Colton, "The LCA Problem Revisited," Proc. LATIN, 2000. The 2-way merge problem: R. C. T. Lee, S. S. Tseng, R. C. Chang and Y. T. Tsai, Introduction to the Design and Analysis of Algorithms. McGraw-Hill Education, 2005.

25 [Figure: $T_1$ and $T_2$ on leaves A to H with internal nodes $v_1$ to $v_7$; the nodes of one tree are numbered 1 to 15, and its leaves carry the numbers 1, 2, 4, 5, 8, 9, 11, 12.]

26 [Figure: $T_1$ and $T_2$ with the node numbering 1 to 15; the leaf numbers are grouped into the sorted lists (1, 2), (11, 12), (5, 8), (4, 9), and a reconstructed subtree with time parameters 1 and 3 is shown at $v_4$.] $|1 - 2| \times (1 \times 1 + 0 \times 0) = 1$.

27 [Figure: $T_1$ and $T_2$ with the node numbering 1 to 15, and a reconstructed subtree on leaves H, G, A, B.] The sorted lists are merged level by level: (1, 2) and (11, 12) give (1, 2, 11, 12); (5, 8) and (4, 9) give (4, 5, 8, 9); these two give (1, 2, 4, 5, 8, 9, 11, 12). $|9 - 7| \times (2 \times 2 + 0 \times 0) = 8$.
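
The list merging shown on this slide is an ordinary 2-way merge of sorted lists. A minimal sketch follows, using the leaf numbers from the slide. The function name is hypothetical, and the step that rebuilds the subtree from the merged list with LCA queries is not shown.

```python
def two_way_merge(a, b):
    """Merge two sorted lists into one sorted list in O(len(a) + len(b))."""
    merged = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(b[j])
            j += 1
    # At most one of the two tails is non-empty.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


lower = two_way_merge([1, 2], [11, 12])
upper = two_way_merge([5, 8], [4, 9])
print(two_way_merge(lower, upper))  # [1, 2, 4, 5, 8, 9, 11, 12]
```

Because each merge is linear in the lengths of its inputs, merging the lists of two children never costs more than the number of leaves below their parent.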

28 Complexity analysis. Reconstructing the subtree of $T_1$ takes linear time. Counting the distance in the reconstructed subtree needs $O(m)$. The height of a complete binary tree is $O(\log n)$. The total complexity is $O(n \log n)$ for complete binary trees.

29 Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance; Conclusions and future work.

30 Mixture-matching distance. Distance = [formula not preserved in the transcript], where $i$ is the matching distance between $T_1$ and $T_2$, and $P_{T_m}$ denotes the product of all time parameters in $T_m$.

31 [Figure: $T_1$ (time parameters 9, 6, 8, 1, 3, 4, 5) and $T_2$ (time parameters 9, 7, 8, 2, 3, 4, 5) on leaves A to H, with nodes numbered 1 to 15.] Matching of $T_1$: {1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14}. Matching of $T_2$: {1, 2} {3, 6} {4, 5} {7, 8} {9, 12} {10, 11} {13, 14}. Distance = 1 - (25920 / 60480) + 2 ≈ 2.571.
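
The arithmetic on this slide can be checked with a small sketch. The general formula is not preserved in the transcript, so the form used here, 1 - (smaller product / larger product) + i, is an assumption: it reproduces the numbers shown and is symmetric in the two trees, but the slide itself only shows the ratio 25920 / 60480. The function name is hypothetical, and the time parameters are assumed to be positive.

```python
from math import prod


def mixture_matching_distance(times1, times2, i):
    """Assumed form: 1 - min(P1, P2) / max(P1, P2) + i.

    times1, times2: time parameters of the internal nodes of each tree
    i: matching distance between the two trees
    """
    p1, p2 = prod(times1), prod(times2)
    return 1 - min(p1, p2) / max(p1, p2) + i


t1_times = [9, 6, 8, 1, 3, 4, 5]  # product 25920
t2_times = [9, 7, 8, 2, 3, 4, 5]  # product 60480
print(round(mixture_matching_distance(t1_times, t2_times, 2), 3))  # 2.571
```

With this form, the fractional part stays below 1, so the integer part of the result is the matching distance and the fractional part reflects only the time parameters.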

32 [Figure: a number line for the distance, marked at 0, 1 and ∞, with the labels "The same", "No different leaves" and "i transposition".] Distance = 1 - (25920 / 60480) + 2 ≈ 2.571. The time complexity is $O(n)$.

33 Outline: Introduction; Problem definition; Related work; The metric and algorithms; Mixture distance; Basic algorithm; The modified algorithm; Mixture-matching distance; Conclusions and future work.

34 Conclusions. Time complexity of each metric:
Metric | Consideration | Full binary tree | Complete binary tree
Path difference metric | Structure | N/A | N/A
Nodal distance | Structure | $O(n^3)$ | $O(n^2 \log n)$
Mixture distance | Structure and time parameter | $O(n^2)$ | $O(n \log n)$
Matching distance | Structure | $O(n)$ | $O(n)$
Mixture-matching distance | Structure and time parameter | $O(n)$ | $O(n)$

35 Future work: improve the time complexity; extend to k-ary trees; add mutation points.

36 Thank you for listening.

