Guided Forest Edit Distance: Better Structure Comparisons by Using Domain-knowledge. Z.S. Peng, H.F. Ting.

The Forest Edit Distance

Edit distance of two ordered, labeled forests
Edit operations between E and F: relabel node i in E with the label of node j in F, for example Relabel (3,5). Cost of the operation: δ(3,5).
[Figure: example forests E and F with labeled nodes, before and after the relabeling]

Edit operations between E and F: delete node i from E, for example Delete (2,-). Cost of the operation: δ(2,-).
[Figure: forest E before and after deleting node 2, next to forest F]

Edit operations between E and F: delete node j from F. Cost of the operation: δ(-,j).
[Figure: example forests E and F]
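The relabel and delete operations above can be sketched on a small in-memory forest. This is an illustration only: the `Node` class and the helper names `relabel`, `delete` and `show` are hypothetical, and the sketch assumes the usual convention that the children of a deleted node take its place among its siblings, in order (the slides do not spell this out).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node type for an ordered, labeled forest.
    label: str
    children: List["Node"] = field(default_factory=list)


def relabel(node: Node, new_label: str) -> None:
    """Relabel: overwrite the node's label, e.g. with the label of a node in F."""
    node.label = new_label


def delete(siblings: List[Node], index: int) -> None:
    """Delete siblings[index]; its children are spliced into its place, in order.

    Assumption: this is the standard tree-edit convention, not stated on the slides.
    """
    siblings[index:index + 1] = siblings[index].children


def show(forest: List[Node]) -> str:
    """Render a forest as text, e.g. 'a(h f m)'."""
    parts = []
    for node in forest:
        inner = "(" + show(node.children) + ")" if node.children else ""
        parts.append(node.label + inner)
    return " ".join(parts)


E = [Node("a", [Node("h"), Node("f", [Node("x")]), Node("m")])]
relabel(E[0].children[2], "y")   # relabel m as y
delete(E[0].children, 1)         # delete f; its child x moves up
print(show(E))                   # a(h x y)
```

A forest is a plain list of roots here, so deleting a root works the same way as deleting an inner node: `delete(forest, 0)` replaces the first tree by its subtrees.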

The edit distance δ(E,F) between E and F is the minimum total cost of edit operations that transform E to E' and F to F' such that E' = F'.
[Figure: forests E and F and the common forest they are transformed into]
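As a point of reference for this definition, the sketch below computes the plain (unguided) forest edit distance with the textbook recursion on the rightmost roots of the two forests. It is not the algorithm of this talk. The assumptions are mine: forests are tuples of `(label, children)` pairs, every deletion costs 1, and a relabeling costs 1 unless the labels already agree. The function name `forest_edit_distance` is hypothetical.

```python
from functools import lru_cache


def size(forest):
    """Number of nodes in a forest given as a tuple of (label, children) pairs."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_edit_distance(E, F):
    """Unit-cost edit distance between two ordered, labeled forests."""
    if not E and not F:
        return 0
    if not E:
        return size(F)          # delete every node of F
    if not F:
        return size(E)          # delete every node of E
    (label_e, kids_e), (label_f, kids_f) = E[-1], F[-1]
    # Case 1: delete the rightmost root of E; its children become roots.
    delete_e = forest_edit_distance(E[:-1] + kids_e, F) + 1
    # Case 2: delete the rightmost root of F; its children become roots.
    delete_f = forest_edit_distance(E, F[:-1] + kids_f) + 1
    # Case 3: map the two rightmost roots onto each other (relabel if needed).
    match = (forest_edit_distance(kids_e, kids_f)
             + forest_edit_distance(E[:-1], F[:-1])
             + (0 if label_e == label_f else 1))
    return min(delete_e, delete_f, match)


E = (("a", (("b", ()), ("c", ()))),)   # a(b c)
F = (("a", (("c", ()),)),)             # a(c)
print(forest_edit_distance(E, F))      # 1: delete b from E
```

Memoizing on whole sub-forests keeps the sketch short but is only practical for small inputs; the algorithms discussed in this talk index sub-problems by node numbers instead.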

The guided edit distance δ(E,F,G) between E and F with respect to a third forest G is the minimum total cost of edit operations that transform E to E' and F to F' such that E' = F' and E' includes G as a subforest.
[Figure: example forests E, F and G]

Application 1: RNA comparisons. Cherry small circular viroid-like RNA GI: between base 287 and base 337. The hammerhead motif of the RNA is printed in bold.

Application 2: Comparing XML documents. XML documents with the same Document Type Definition (DTD) should be aligned with respect to this DTD to get more accurate results.

The algorithms
For δ(E,F): Tai 1979; Zhang and Shasha 1989; Klein 1998.
For δ(E,F,G): this paper.
[The running-time formulas on this slide were not captured in the transcript.]

Special cases: Constrained Longest Common Subsequence and Constrained Sequence Alignment.
[Figure: example sequences]

The algorithms
Constrained Longest Common Subsequence: Tsai 2003.
Constrained Sequence Alignment: Chin et al.
This paper: the general bound simplifies because G has one leaf.
[The running-time formulas on this slide were not captured in the transcript.]
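To make the sequence special case concrete, here is a minimal sketch of the constrained longest common subsequence problem: find the length of a longest common subsequence of x and y that also contains the constraint p as a subsequence. It uses the standard three-dimensional dynamic program commonly associated with Chin et al., not the forest algorithm of this talk. The function name `constrained_lcs_length` and the choice to return `None` when no feasible subsequence exists are my own.

```python
def constrained_lcs_length(x, y, p):
    """Length of a longest common subsequence of x and y that contains p
    as a subsequence, or None if no such common subsequence exists."""
    NEG = float("-inf")
    n, m, r = len(x), len(y), len(p)
    # L[k][i][j]: best length for x[:i], y[:j] with p[:k] already embedded.
    # With an empty prefix of x or y only k == 0 is feasible.
    L = [[[0 if k == 0 else NEG for _ in range(m + 1)]
          for _ in range(n + 1)]
         for k in range(r + 1)]
    for k in range(r + 1):
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                best = max(L[k][i - 1][j], L[k][i][j - 1])
                if x[i - 1] == y[j - 1]:
                    if k > 0 and x[i - 1] == p[k - 1]:
                        # The matched symbol also consumes p[k-1].
                        best = max(best, 1 + L[k - 1][i - 1][j - 1])
                    else:
                        best = max(best, 1 + L[k][i - 1][j - 1])
                L[k][i][j] = best
    result = L[r][n][m]
    return None if result == NEG else result


print(constrained_lcs_length("xaby", "abxy", ""))    # 3, e.g. "aby"
print(constrained_lcs_length("xaby", "abxy", "x"))   # 2, only "xy" contains x
```

The table has (|x|+1)(|y|+1)(|p|+1) entries and each is filled in constant time, which is the cubic-style bound the sequence algorithms on this slide are compared against.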

Our algorithm for computing δ(E,F,G): dynamic programming

The sub-problems: post-order numbering (naming) of the nodes
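Post-order numbering can be sketched as follows: every node is numbered after all of its children, and the trees of a forest are visited from left to right. The tuple representation `(label, children)` and the function name `postorder_labels` are assumptions made for this illustration.

```python
def postorder_labels(forest):
    """Return node labels in post-order; the node at list position p
    (0-based) has post-order number p + 1."""
    order = []

    def visit(node):
        label, children = node
        for child in children:      # children first, left to right
            visit(child)
        order.append(label)         # then the node itself

    for tree in forest:
        visit(tree)
    return order


# A forest with two trees: a(h f m) and z(v).
forest = (
    ("a", (("h", ()), ("f", ()), ("m", ()))),
    ("z", (("v", ()),)),
)
print(postorder_labels(forest))     # ['h', 'f', 'm', 'a', 'v', 'z']
```

With this numbering every subtree occupies a contiguous range of numbers, which is what makes ranges of node numbers convenient names for sub-problems.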

The sub-problems: a "consecutive" sub-forest (the notation for it was not captured in the transcript).
[Figure: the sub-problems illustrated on E, F and G]

The value of a sub-problem is equal to the minimum of the following cases.
[Figures: the cases of the recurrence, illustrated on E, F and G]

The order for solving the sub-problems:
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf then solve the corresponding sub-problem

The time complexity

Sparsify the dynamic program using a clever trick of Zhang and Shasha

Key-root: a node is a key-root if it is the root or has a left sibling.
Number of key-roots ≤ number of leaves.
[Figure: key-roots marked in E, F and G]
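The key-root definition and the bound on the number of key-roots can be checked with a short sketch. The assumptions are mine: forests are tuples of `(label, children)` pairs, nodes are identified by post-order number, and every root of a forest counts as a key-root (the first as a root, the others because they have a left sibling). The function name `keyroots_and_leaves` is hypothetical.

```python
def keyroots_and_leaves(forest):
    """Return (post-order numbers of the key-roots, number of leaves)."""
    keyroots = []
    counter = 0
    leaves = 0

    def visit(node, is_keyroot):
        nonlocal counter, leaves
        _, children = node
        for position, child in enumerate(children):
            # A child is a key-root exactly when it has a left sibling.
            visit(child, position > 0)
        if not children:
            leaves += 1
        counter += 1                # post-order number of this node
        if is_keyroot:
            keyroots.append(counter)

    for tree in forest:
        visit(tree, True)           # roots are always key-roots
    return keyroots, leaves


# a(b(c d) e): post-order numbers c=1, d=2, b=3, e=4, a=5.
tree = (("a", (("b", (("c", ()), ("d", ()))), ("e", ()))),)
print(keyroots_and_leaves(tree))    # ([2, 4, 5], 3)
```

Each key-root can be charged to the leftmost leaf below it, and no two key-roots share that leaf, which is why the count never exceeds the number of leaves.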

To compute δ(E,F,G) = δ(E[1..|E|], F[1..|F|], G[1..|G|]):
for i = 1 to |E|
  for j = 1 to |F|
    for h = 1 to |G|
      for k = 1 to (|G| - h + 1)
        if k is a leaf and i and j are key-roots then solve the corresponding sub-problem

The new running time

Thank you