A Study on Measuring Distance between Two Trees
Advisor: Prof. 阮夙姿
Presenter: 林陳輝

Presentation transcript:

CSIE, National Chi Nan University

Outline
Introduction
Problem definition
Related work
The metric and algorithms
  Mixture distance
    Basic algorithm
    The modified algorithm
  Mixture-matching distance
Conclusions and future work

Introduction
Evolutionary tree
Comparing trees
Comparing trees is not easy
Figure source: Phylogenetic tree, Wikipedia

Mixture tree
[Figure: a mixture tree with taxa at the leaves and a time axis.]
S.-C. Chen and B. G. Lindsay, “Building Mixture Trees from Binary Sequence Data,” Biometrika, 2006.

Problem definition
[Figure: a tree with leaves A to H and internal nodes v1 to v7.]
The leaves are associated with taxa.
There is a time parameter on every internal node.

Related work
Path difference metric: d_p(T1, T2) = ||d(T1) - d(T2)||_2
d(Ti) is a vector that contains the distances between all pairs of leaves of Ti.
M. A. Steel and D. Penny, “Distributions of Tree Comparison Metrics – Some New Results,” Syst. Biol. 42(2), 1993.
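The path difference metric can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the slides: trees are rooted and written as nested pairs, every edge has length 1, and the root counts as a node on the path. The function names and the two sample trees are hypothetical.

```python
import math

def leaf_path_lengths(tree):
    """Map frozenset({x, y}) -> number of edges on the path between leaves x and y.

    A tree is either a leaf label (str) or a pair (left, right).
    Unit edge lengths are an assumption made for this sketch.
    """
    lengths = {}

    def walk(node):
        # Returns {leaf: number of edges from that leaf up to this node}.
        if isinstance(node, str):
            return {node: 0}
        left, right = node
        up_left = {leaf: d + 1 for leaf, d in walk(left).items()}
        up_right = {leaf: d + 1 for leaf, d in walk(right).items()}
        # Leaves on opposite sides meet at this node.
        for x, dx in up_left.items():
            for y, dy in up_right.items():
                lengths[frozenset((x, y))] = dx + dy
        return {**up_left, **up_right}

    walk(tree)
    return lengths

def path_difference(t1, t2):
    """Euclidean norm of the difference of the two pairwise-distance vectors."""
    d1, d2 = leaf_path_lengths(t1), leaf_path_lengths(t2)
    if d1.keys() != d2.keys():
        raise ValueError("both trees must have the same leaf set")
    return math.sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))

# Hypothetical four-leaf trees used only for illustration.
T1 = ("A", ("B", ("C", "D")))
T2 = (("A", "B"), ("C", "D"))
print(path_difference(T1, T2))  # sqrt(3), about 1.732
```

Only the pairs AB, BC and BD differ between the two sample trees, each by one edge, which gives the square root of 3.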

Related work
Nodal metric
In full binary trees, the complexity is O(n^3).
In complete binary trees, the complexity is O(n^2 log n).
John Bluis and Dong-Guk Shin, “Nodal Distance Algorithm: Calculating a Phylogenetic Tree Comparison Metric,” Proc. of the 3rd IEEE Symposium on BioInformatics and BioEngineering, 2003.

Related work
Matching distance: P. W. Diaconis and S. P. Holmes, “Matchings and Phylogenetic Trees,” Proc. Natl Acad Sci U S A, Vol. 95, No. 25, pp ~14602.
The algorithm for matching distance: G. Valiente, “A Fast Algorithmic Technique for Comparing Large Phylogenetic Trees,” SPIRE, pp. 370~375, 2005.

Matching representation
{1,2} {5,6} {3,7} {4,8} {9,10}

Matching distance
[Figure: trees T1 and T2 with their matching representations.]
{1,2} {5,6} {3,7} {4,8} {9,10}
{1,3} {4,6} {2,7} {5,8} {9,10}
The distance is [value not recoverable].

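Mixture distance and algorithms
Definition: p_Ti(x, y) is the time parameter of the LCA (lowest common ancestor) of leaves x and y in Ti. The mixture distance sums |p_T1(x, y) - p_T2(x, y)| over all leaf pairs {x, y}.
[Figure: two four-leaf trees with leaves A to D and internal nodes v1 to v3.]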

Distance conditions
The distance from an object to itself is zero.
The distance from A to B is the same as the distance from B to A.
The triangle inequality holds.
Source: J. Felsenstein, Inferring Phylogenies. Sunderland, MA: Sinauer Associates, 2004.

Distance conditions
Distance(T1, T2) + Distance(T2, T3) ≥ Distance(T1, T3)
For a, b and c ∈ R+ ∪ {0}: |a – b| + |b – c| ≥ |a – c|
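The step from the scalar inequality to the tree distance can be written out as follows. This is a sketch under the assumption, suggested by the worked example in these slides, that the mixture distance d(T_1, T_2) is the sum of |p_{T_1}(x,y) - p_{T_2}(x,y)| over all leaf pairs; the symbol d is introduced here only for illustration.

```latex
\begin{align*}
d(T_1, T_3)
  &= \sum_{\{x,y\}} \left| p_{T_1}(x,y) - p_{T_3}(x,y) \right| \\
  &\le \sum_{\{x,y\}} \Bigl( \left| p_{T_1}(x,y) - p_{T_2}(x,y) \right|
       + \left| p_{T_2}(x,y) - p_{T_3}(x,y) \right| \Bigr) \\
  &= d(T_1, T_2) + d(T_2, T_3).
\end{align*}
```

Each term of the middle line applies |a – c| ≤ |a – b| + |b – c| to one leaf pair, with a, b and c the LCA time parameters of that pair in T_1, T_2 and T_3.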

Algorithm
There are C(n, 2) leaf pairs.
Algorithmic idea: grouping
Full binary tree
[Figure: two four-leaf trees with leaves A to D and internal nodes v1 to v3.]
AB: |8 – 1| = 7
AC: |8 – 9| = 1
AD: |8 – 9| = 1
BC: |4 – 9| = 5
BD: |4 – 9| = 5
CD: |1 – 3| = 2
Distance = 21
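The pair-by-pair computation above can be reproduced with a short Python sketch. The tree encoding (nested tuples of the form (time, left, right)) and the function names are assumptions made for illustration. The two tree shapes are reconstructed from the LCA times in the listed differences, since the original figure is not available.

```python
from itertools import combinations

# A tree is either a leaf label (str) or a tuple (time, left, right).
# Shapes reconstructed from the LCA times in the worked example (assumption).
T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))

def lca_times(tree, table=None):
    """Return (leaves, table); table maps frozenset({x, y}) -> LCA time parameter."""
    if table is None:
        table = {}
    if isinstance(tree, str):
        return [tree], table
    time, left, right = tree
    left_leaves, _ = lca_times(left, table)
    right_leaves, _ = lca_times(right, table)
    # Any leaf on the left and any leaf on the right have this node as their LCA.
    for x in left_leaves:
        for y in right_leaves:
            table[frozenset((x, y))] = time
    return left_leaves + right_leaves, table

def mixture_distance_naive(t1, t2):
    """Sum |p_T1(x, y) - p_T2(x, y)| over all C(n, 2) leaf pairs."""
    leaves, p1 = lca_times(t1)
    _, p2 = lca_times(t2)
    return sum(
        abs(p1[frozenset(pair)] - p2[frozenset(pair)])
        for pair in combinations(leaves, 2)
    )

print(mixture_distance_naive(T1, T2))  # 21, matching the slide
```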

Algorithm
[Figure: trees T1 and T2, each with leaves A to H and internal nodes v1 to v7.]

[Figure: trees T1 and T2 with red and green leaf counts annotated on the internal nodes.]
|p_T1(v1) - p_T2(v6)| × (1 × 1 + 0 × 0) = |9 - 4| × (1 × 1 + 0 × 0) = 5
|p_T1(v1) - p_T2(v7)| × (0 × 0 + 1 × 1) = |9 - 5| × (0 × 0 + 1 × 1) = 4
|p_T1(v1) - p_T2(v3)| × (1 × 1 + 1 × 1) = |9 - 8| × (1 × 1 + 1 × 1) = 2

[Figure: trees T1 and T2 with red and green leaf counts for node v2 of T1.]
|p_T1(v2) - p_T2(v2)| × 0 = |7 - 6| × 0 = 0
|p_T1(v2) - p_T2(v3)| × 0 = |7 - 8| × 0 = 0
|p_T1(v2) - p_T2(v1)| × 4 = |7 - 9| × 4 = 8

Complexity analysis
For every internal node of T1, coloring all leaves needs O(n).
Counting the distance in T2 needs O(n).
T1 has n - 1 internal nodes, so the time complexity is O(n^2).
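The coloring idea can be sketched in Python as follows. This is an illustrative reading of the slides, not the presenter's code: for each internal node of T1, the leaves of one child are colored red and those of the other green, and each internal node of T2 then contributes its time difference multiplied by the number of red-green pairs that meet there. The tuple encoding, the function names and the sample trees are assumptions.

```python
# A tree is either a leaf label (str) or a tuple (time, left, right).
T1 = (8, "A", (4, "B", (1, "C", "D")))
T2 = (9, (1, "A", "B"), (3, "C", "D"))

def leaves(tree):
    """Collect the leaf labels of a tree in time linear in its size."""
    stack, out = [tree], []
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            out.append(node)
        else:
            stack.extend((node[2], node[1]))
    return out

def internal_nodes(tree):
    """Yield every internal node (time, left, right) of a tree."""
    if isinstance(tree, str):
        return
    yield tree
    yield from internal_nodes(tree[1])
    yield from internal_nodes(tree[2])

def weighted_pairs(tree, color, time1):
    """Return (red, green, cost) for the subtree of T2 rooted at `tree`."""
    if isinstance(tree, str):
        c = color.get(tree)  # uncolored leaves count for neither color
        return int(c == "red"), int(c == "green"), 0
    time2, left, right = tree
    rl, gl, cl = weighted_pairs(left, color, time1)
    rr, gr, cr = weighted_pairs(right, color, time1)
    # Red-green pairs whose LCA in T2 is exactly this node.
    pairs = rl * gr + gl * rr
    return rl + rr, gl + gr, cl + cr + abs(time1 - time2) * pairs

def mixture_distance_basic(t1, t2):
    """One O(n) coloring pass over T2 per internal node of T1."""
    total = 0
    for time1, left, right in internal_nodes(t1):
        color = {x: "red" for x in leaves(left)}
        color.update({x: "green" for x in leaves(right)})
        total += weighted_pairs(t2, color, time1)[2]
    return total

print(mixture_distance_basic(T1, T2))  # 21, same as the pair-by-pair sum
```

Every leaf pair is counted exactly once, at the internal node of T1 that is its LCA, so the result agrees with summing over all C(n, 2) pairs.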

The modified algorithm
Speed up the basic algorithm.
The basic algorithm processes too much empty color information.

[Figure: the v2 example again, highlighting the nodes of T2 that carry empty color information (Red: 0, Green: 0).]

[Figure: T2 with leaves A to H, and a reduced version of T2 with leaves A, B, C, D and internal nodes v1, v3, v4.]

The modified algorithm
Finding the LCA in constant time with O(n) preprocessing: M. A. Bender and M. Farach-Colton, “The LCA Problem Revisited,” Proc. LATIN.
way merge problem: R.C.T. Lee, S. S. Tseng, R.C. Chang and Y. T. Tsai, Introduction to the Design and Analysis of Algorithms. McGraw-Hill Education, 2005.

[Figure: trees T2 and T1, each with leaves A to H and internal nodes v1 to v7.]

[Figure: trees T2 and T1 with leaf index labels (1, 2), (11, 12), (5, 8), (4, 9); worked example at v4 beginning |1 – 2| × (...), remainder not recoverable.]

[Figure: trees T2 and T1; the leaf index lists (1, 2), (11, 12), (5, 8), (4, 9) are merged into (1, 2, 11, 12) and (4, 5, 8, 9), then into (1, 2, 4, 5, 8, 9, 11, 12).]
|9 – 7| × (2 × 2 – 0 × 0) = 8

Complexity analysis
Reconstructing the subtree of T1 takes linear time.
Counting the distance in a reconstructed subtree with m leaves needs O(m).
The height of a complete binary tree is O(log n).
The total complexity is O(n log n) for complete binary trees.
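The O(n log n) total can be checked numerically with a small sketch. It assumes, as a simplification not stated in the slides, that handling an internal node with m leaves below it costs exactly m units, and that n is a power of two so the tree is perfectly balanced. The function name is hypothetical.

```python
def total_work(n):
    """Sum of leaf counts over all internal nodes of a complete binary tree.

    n must be a power of two; a node with m leaves below it is charged m units
    (an assumed cost model for this sketch).
    """
    if n == 1:
        return 0  # a single leaf has no internal node
    # The root is charged n, and each of its two subtrees has n // 2 leaves.
    return n + 2 * total_work(n // 2)

for k in range(1, 6):
    n = 2 ** k
    print(n, total_work(n), n * k)  # total_work(n) equals n * log2(n)
```

Each of the log2(n) levels of internal nodes is charged n units in total, which is where the n log n product comes from.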

Mixture-matching distance
Distance = [formula not recoverable]
i is the matching distance between T1 and T2.
P_Tm denotes the product of all time parameters in Tm.

[Figure: trees T2 and T1 with leaves A to H and their matching representations.]
{1, 2} {3, 4} {5, 6} {7, 8} {9, 10} {11, 12} {13, 14}
{1, 2} {3, 6} {4, 5} {7, 8} {9, 12} {10, 11} {13, 14}
Distance = 1 - (25920 / 60480) + 2 ≈ 2.571

[Figure residue: a scale of Distance values with the labels "The same", "No different leaves", "∞" and "i transposition".]
Distance = 1 - (25920 / 60480) + 2 ≈ 2.571
The time complexity is O(n).
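The arithmetic of the worked example can be checked with a short sketch. The general form used here, 1 - (smaller product / larger product) + i, is an assumption inferred from the numbers on the slide, because the formula itself did not survive extraction. The function name and parameter names are hypothetical.

```python
def mixture_matching_distance(p_a, p_b, i):
    """ASSUMED form: 1 - min(P) / max(P) + i.

    p_a, p_b: products of all time parameters of the two trees.
    i: matching distance between the two trees.
    This only mirrors the slide's worked example; it is not a confirmed definition.
    """
    return 1 - min(p_a, p_b) / max(p_a, p_b) + i

# Numbers taken from the slide: products 25920 and 60480, matching distance 2.
print(round(mixture_matching_distance(25920, 60480, 2), 3))  # 2.571
```

Since 25920 / 60480 = 3/7, the example evaluates to 18/7.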

Conclusions
Metric | Considers | Time complexity (full binary tree) | Time complexity (complete binary tree)
Path difference metric | Structure | N/A | N/A
Nodal distance | Structure | O(n^3) | O(n^2 log n)
Mixture distance | Structure and time parameter | O(n^2) | O(n log n)
Matching distance | Structure | O(n) | O(n)
Mixture-matching distance | Structure and time parameter | O(n) | O(n)

Future work
Improve the time complexity.
Extend to k-ary trees.
Add mutation points.

Thank you for listening.