Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx and Jan Van den Bussche. KDD ’05. Presenter: Ming Jing Tsai.

Presentation transcript:

1 Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx and Jan Van den Bussche. KDD ’05. Presenter: Ming Jing Tsai

2 Introduction
• Mining tree patterns T in a single graph
  - Trees are generated incrementally in the number of nodes
  - Trees are unordered and rooted
• For each tree T, all conjunctive queries based on T are generated
• Queries are formulated in SQL

3 Tree query pattern example
• Selected nodes (constants): 0, 8
• Existential node: ∃
• Distinguished node: x

4 Matching
• A query Q matches in a graph G through a homomorphism h: for every edge (i, j) ∈ Q, (h(i), h(j)) ∈ G
• Matchings are told apart only by their values on the distinguished nodes x
  - Different values on existential nodes do not count as different matchings
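
The matching rule above can be sketched in a few lines of Python. This is a brute-force illustration, not the method of the paper: the function name `frequency`, the edge-list encoding, and the small example graph are all assumptions made for this sketch.

```python
from itertools import product

def frequency(graph_edges, query_edges, distinguished, selected):
    """Count distinct bindings of the distinguished nodes over all matchings.

    graph_edges:   set of directed edges (u, v) of the graph G
    query_edges:   list of directed edges (i, j) of the tree query Q
    distinguished: list of query nodes whose values are counted
    selected:      dict mapping each selected query node to its constant
    Every other query node is treated as existential.
    """
    graph_nodes = sorted({n for e in graph_edges for n in e})
    query_nodes = sorted({n for e in query_edges for n in e})
    answers = set()
    # Try every assignment h of graph nodes to query nodes (exponential; toy sizes only).
    for values in product(graph_nodes, repeat=len(query_nodes)):
        h = dict(zip(query_nodes, values))
        if any(h[q] != const for q, const in selected.items()):
            continue  # a selected node must map to its constant
        if all((h[i], h[j]) in graph_edges for i, j in query_edges):
            # Project onto the distinguished nodes: existential values are dropped.
            answers.add(tuple(h[x] for x in distinguished))
    return len(answers)

# Hypothetical example graph (not the one on the slides).
G = {(1, 2), (1, 3), (2, 4), (3, 4), (3, 5)}

# "x has some child": nodes 1 and 3 each match twice but are counted once.
print(frequency(G, [("x", "e")], ["x"], {}))        # 3
# "x has child 4": only nodes 2 and 3 qualify.
print(frequency(G, [("x", "c")], ["x"], {"c": 4}))  # 2
```

The set `answers` is what makes existential nodes "don't care": two matchings that differ only on an existential node collapse into one answer.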

5 [Figure: query Q, with nodes ∃, 0 and 8, matched in graph G] Frequency = 3 (4, 5, 8)

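6 Generate all trees
• Trees are generated with an increasing number of nodes
• Trees are canonically ordered
  - Level sequence: the i-th number is the depth of the i-th node in preorder
  - Lexicographic order: the canonical ordering is the one with the maximal level sequence
  - Level sequence >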
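
As a minimal sketch of the canonical ordering described above, the following Python function computes the lexicographically maximal level sequence of a rooted unordered tree. The dictionary encoding of the tree and the function name are assumptions for illustration; the root is given depth 0 here, which may differ from the numbering used on the slides.

```python
def canonical_level_sequence(children, root, depth=0):
    """Return the lexicographically maximal level sequence of the subtree at `root`.

    children: dict mapping a node to the list of its children (leaves may be absent)
    The level sequence lists the depth of each node in preorder.
    """
    # Canonical sequences of the child subtrees, one level deeper.
    subsequences = [
        canonical_level_sequence(children, child, depth + 1)
        for child in children.get(root, [])
    ]
    # Visiting the largest subtree sequences first maximizes the whole sequence.
    subsequences.sort(reverse=True)
    sequence = [depth]
    for sub in subsequences:
        sequence.extend(sub)
    return sequence

# Two orderings of the same unordered tree: a root with a leaf child
# and a child that has one leaf of its own.
tree_a = {0: [1, 2], 2: [3]}
tree_b = {0: [1, 2], 1: [3]}
print(canonical_level_sequence(tree_a, 0))  # [0, 1, 2, 1]
print(canonical_level_sequence(tree_b, 0))  # [0, 1, 2, 1]
```

Because both orderings yield the same sequence, comparing canonical level sequences is enough to detect that two generated trees are the same unordered tree.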

7 Queries
• Queries are generated levelwise
• Fix a tree T, and find all queries based on T whose frequency in G is at least k
• A query is described by Q{Π, Σ, λ}
  - Π: existential nodes
  - Σ: selected nodes
  - λ: labels (constants) of the selected nodes

8

9 • Candidates are generated efficiently by using candidacy tables and frequency tables

10 CanTab Π,Σ
• Parents
• Each candidacy table can be computed by taking the natural join of the frequency tables of its parents (Π′, Σ′)
• CanTab ∅,{x} is defined as the table with a single column x, holding all nodes of the graph G being mined
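
The join step above can be sketched as follows. This is only an illustration under stated assumptions: tables are modeled as lists of dictionaries, the names `parents`, `natural_join` and `candidacy_table` are hypothetical, and a parent is assumed to be a pair obtained by removing exactly one node from the existential set or from the selected set.

```python
from functools import reduce

def parents(existential, selected):
    """Assumed parent relation: drop one node from one of the two sets."""
    result = [(existential - {n}, selected) for n in sorted(existential)]
    result += [(existential, selected - {n}) for n in sorted(selected)]
    return result

def natural_join(left, right):
    """Natural join of two tables given as lists of {column: value} rows."""
    joined = []
    for a in left:
        for b in right:
            # Rows combine only if they agree on every shared column.
            if all(a[col] == b[col] for col in a.keys() & b.keys()):
                row = {**a, **b}
                if row not in joined:
                    joined.append(row)
    return joined

def candidacy_table(parent_frequency_tables):
    """Join the frequency tables of all parents into one candidacy table."""
    return reduce(natural_join, parent_frequency_tables)

# Hypothetical frequency tables of the two parents of (∅, {x1, x3}).
freq_x1 = [{"x1": 0}, {"x1": 8}]
freq_x3 = [{"x3": 8}]
print(candidacy_table([freq_x1, freq_x3]))
# [{'x1': 0, 'x3': 8}, {'x1': 8, 'x3': 8}]
```

Only constant combinations that were frequent in every parent survive the join, so infrequent combinations are never evaluated against the graph.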

11 • Π = {x2}, Σ = {x1, x3}: the expression is formulated in SQL
• [Figure: candidacy table and frequency table]
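
A rough idea of how such an expression could look in SQL is sketched below with Python's built-in `sqlite3` module. The slides used C++ with embedded SQL on DB2, so this is a substitute setup. The tree shape (x1 -> x2, x2 -> x3, x2 -> x4, with x4 distinguished), the table and column names, and the example data are all assumptions, since the tree of the slide is not recoverable from the transcript.

```python
import sqlite3

def frequency_table(edges, candidates, k):
    """Return the (x1, x3) constant pairs whose query has at least k answers.

    edges:      list of (src, dst) pairs, the graph G
    candidates: list of (x1, x3) pairs, the candidacy table (assumed)
    k:          frequency threshold
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE G (src INTEGER, dst INTEGER)")
    con.execute("CREATE TABLE CanTab (x1 INTEGER, x3 INTEGER)")
    con.executemany("INSERT INTO G VALUES (?, ?)", edges)
    con.executemany("INSERT INTO CanTab VALUES (?, ?)", candidates)
    rows = con.execute(
        """
        SELECT m.x1, m.x3
        FROM (-- one G alias per tree edge; the existential x2 is projected away
              SELECT DISTINCT e1.src AS x1, e2.dst AS x3, e3.dst AS x4
              FROM G AS e1, G AS e2, G AS e3
              WHERE e2.src = e1.dst AND e3.src = e1.dst) AS m
        JOIN CanTab AS c ON c.x1 = m.x1 AND c.x3 = m.x3
        GROUP BY m.x1, m.x3
        HAVING COUNT(DISTINCT m.x4) >= ?
        ORDER BY m.x1, m.x3
        """,
        (k,),
    ).fetchall()
    con.close()
    return rows

# Hypothetical graph and candidacy table.
edges = [(0, 1), (1, 8), (1, 4), (1, 5), (0, 2), (2, 8), (2, 6)]
candidates = [(0, 8), (0, 4), (0, 6)]
print(frequency_table(edges, candidates, 4))  # [(0, 8)]
```

Restricting the matches to the candidacy table before grouping is what keeps the frequency computation limited to constant combinations that can still be frequent.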

12 Equivalent queries
• Goal: avoid generating a query Q2 that is equivalent to an earlier query Q1
• A containment mapping from Q1 to Q2 is a homomorphism that maps the distinguished variables of Q1 one-to-one to those of Q2, and likewise for the selected nodes
• Case 1: Q1 has fewer nodes than Q2
• Case 2: Q1 and Q2 have the same number of nodes

13 Case 1: redundancy checking
• Q2 contains redundant subtrees, such that removing them yields an equivalent query
• Redundancy: a subtree C in the form of a linear chain of existential nodes, such that the parent of C has another subtree that is at least as deep as C
• [Figure: Q1 and Q2]
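
The redundancy condition above can be checked mechanically. The sketch below is one possible reading of it, not the implementation from the paper: the tree is a dictionary of child lists, depth is measured in nodes, and all function names are hypothetical.

```python
def chain_length(children, existential, node):
    """Node count of the subtree at `node` if it is a linear chain of
    existential nodes, otherwise None."""
    length = 0
    while True:
        if node not in existential:
            return None
        length += 1
        kids = children.get(node, [])
        if not kids:
            return length
        if len(kids) > 1:
            return None  # branching, so not a linear chain
        node = kids[0]

def height(children, node):
    """Number of nodes on the longest downward path starting at `node`."""
    return 1 + max((height(children, c) for c in children.get(node, [])), default=0)

def has_redundancy(children, existential):
    """True if some existential chain has a sibling subtree at least as deep."""
    for kids in children.values():
        for c in kids:
            length = chain_length(children, existential, c)
            if length is None:
                continue
            if any(height(children, d) >= length for d in kids if d != c):
                return True
    return False

# r has an existential leaf a and another child b: a is redundant,
# because "r has some child" already follows from "r has child b".
print(has_redundancy({"r": ["a", "b"]}, {"a"}))                      # True
# A chain of two existential nodes next to a single leaf is not redundant.
print(has_redundancy({"r": ["a", "b"], "a": ["a2"]}, {"a", "a2"}))   # False
```

The intuition is that a chain of existential nodes can be folded onto any sibling path of equal or greater depth, so the chain adds no constraint.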

14 Case 2: canonical forms
• Q1 and Q2 are isomorphic trees
• Canonical forms label the nodes: existential nodes -> ∃, selected nodes -> C, distinguished nodes -> X
• [Figure: label pairs C,∃ · ∃,C · ∃,X · C,X · X,C · X,X]
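
One common way to test isomorphism of labeled rooted trees is a sorted string encoding, sketched below. This is a swapped-in textbook technique, not necessarily the canonical form used in the paper, and it ignores the constants attached to selected nodes; the names and the encoding are assumptions.

```python
def canonical_form(children, kind, node):
    """Encode the subtree at `node` as a string that is identical for
    isomorphic labeled trees.

    children: dict mapping a node to the list of its children
    kind:     dict mapping a node to "∃", "C" or "X"
    """
    # Sorting the child encodings removes the dependence on child order.
    parts = sorted(canonical_form(children, kind, c) for c in children.get(node, []))
    return kind[node] + "(" + "".join(parts) + ")"

# Two hypothetical queries on the same tree shape, with children listed
# in a different order and with different node names.
q1_children, q1_kind = {0: [1, 2]}, {0: "X", 1: "C", 2: "∃"}
q2_children, q2_kind = {"a": ["b", "c"]}, {"a": "X", "b": "∃", "c": "C"}

print(canonical_form(q1_children, q1_kind, 0))    # X(C()∃())
print(canonical_form(q2_children, q2_kind, "a"))  # X(C()∃())
```

Keeping only the first query seen for each encoding would discard isomorphic duplicates of the same size.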

15 Experiments
• Pentium 4, 2.8 GHz
• 1 GB main memory
• Linux 2.6
• C++ with embedded SQL
• Relational database: DB2 UDB v8.2

16 Real datasets
• A food web, a protein interaction graph, and a citation graph
• k: frequency threshold
• Size: maximal size of the trees in the run
• All runs take several hours

17 Food web
• 154 species dependent on Scotch Broom
• Label 20 occurs in many frequent patterns: it is Orthotylus adenocarpi, an omnivorous plant pest
• Frequency 176

18 Protein interaction graph
• 1870 proteins of Saccharomyces cerevisiae, the fermenting yeast (used to leaven bread)
• A small number of highly connected nodes occur

19 Citation graph
• KDD Cup 2003
• 2500 papers on high-energy physics
• 350,000 cross-references
• Frequency 1655

20 Synthetic data, web graphs
• Tree size: 5
• Minsup: 4, 10, 25

21 Uniform random graphs
• Dense, uniform
• Minsup: 10, 25
• Edges: 47, 264, 997