An Efficient Algorithm for Discovering Frequent Subgraphs. Michihiro Kuramochi and George Karypis, ICDM 2001. Presenter: 蔡明瑾.



Introduction
- Structural patterns: biology, chemistry (chemical compounds)
- Graph: a vertex is an item, an edge is a relation between items
- Undirected, connected, labeled graphs
[Figure: example graph with vertex labels b, a, a and edge labels x, x, y]

Graph Isomorphism
- G1(V1, E1) and G2(V2, E2) are isomorphic if they are topologically identical to each other.
- That is, there is a mapping from V1 to V2 such that each edge in E1 is mapped to an edge in E2 and vice versa.
[Figure: two drawings of the same labeled graph on vertices v0, v1, v2, joined by "="]
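The definition above can be checked by brute force on small graphs. The following is a minimal sketch, not the method used in the paper: it tries every vertex mapping and tests whether vertex labels and labeled edges are preserved. The function name `is_isomorphic` and the graph encoding (a list of vertex labels plus a dict from vertex pairs to edge labels) are assumptions made for illustration. The two graphs are the two drawings from the slide.

```python
from itertools import permutations

def is_isomorphic(labels1, edges1, labels2, edges2):
    """Return True if some vertex mapping preserves all labels and edges."""
    n = len(labels1)
    if n != len(labels2) or len(edges1) != len(edges2):
        return False
    # Undirected edges: store each one as a frozenset of its two endpoints.
    e1 = {frozenset(e): lab for e, lab in edges1.items()}
    e2 = {frozenset(e): lab for e, lab in edges2.items()}
    for perm in permutations(range(n)):
        # perm[i] is the vertex of graph 2 assigned to vertex i of graph 1.
        if any(labels1[i] != labels2[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[v] for v in e): lab for e, lab in e1.items()}
        if mapped == e2:
            return True
    return False

# The triangle from the slide, drawn with two different vertex numberings.
g1 = (["b", "a", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})
g2 = (["a", "b", "a"], {(0, 1): "x", (0, 2): "y", (1, 2): "x"})
print(is_isomorphic(*g1, *g2))  # True
```

The loop visits up to |V|! mappings, which is why the later slides look for ways to cut that number down.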

Canonical labeling
- Adjacency matrix
- Vertex order v0 = b, v1 = a, v2 = a:

          v0  v1  v2
          b   a   a
  v0  b   -   x   x
  v1  a   x   -   y
  v2  a   x   y   -

  code = baaxxy
- Vertex order v0 = a, v1 = b, v2 = a:

          v0  v1  v2
          a   b   a
  v0  a   -   x   y
  v1  b   x   -   x
  v2  a   y   x   -

  code = abaxyx
- Both matrices describe the same graph.
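The two codes on this slide can be reproduced with a short sketch. It assumes the code is the vertex labels in the chosen order followed by the upper-triangular adjacency-matrix entries read column by column, with "0" standing for a missing edge. Both the reading order and the "0" filler are assumptions, since a three-vertex complete graph does not distinguish them. The name `code_for_order` is hypothetical.

```python
def code_for_order(labels, edges, order):
    """Code of a graph for one vertex ordering.

    labels: list of vertex labels, indexed by vertex id
    edges:  dict mapping (u, v) pairs to edge labels (undirected)
    order:  list of vertex ids; order[i] becomes row/column i of the matrix
    """
    e = {frozenset(k): lab for k, lab in edges.items()}
    n = len(order)
    head = "".join(labels[v] for v in order)
    # Upper triangle, column by column: (0,1), (0,2), (1,2), (0,3), ...
    tail = "".join(
        e.get(frozenset((order[i], order[j])), "0")  # "0" = no edge (assumed)
        for j in range(1, n)
        for i in range(j)
    )
    return head + tail

labels = ["b", "a", "a"]
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}
print(code_for_order(labels, edges, [0, 1, 2]))  # baaxxy
print(code_for_order(labels, edges, [1, 0, 2]))  # abaxyx
```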

Canonical labeling
- Different permutations of the vertices lead to different codes.
- There are |V|! permutations to consider.
- The canonical label is the largest code.
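A brute-force version of this rule fits in a few lines. This is a sketch under the same assumed code layout as the slide example (vertex labels, then upper-triangular entries, "0" for a missing edge), not the optimized procedure from the paper. `canonical_label` is a hypothetical name.

```python
from itertools import permutations

def canonical_label(labels, edges):
    """Largest code over all |V|! vertex orderings (brute force)."""
    e = {frozenset(k): lab for k, lab in edges.items()}
    n = len(labels)
    best = ""
    for order in permutations(range(n)):
        head = "".join(labels[v] for v in order)
        tail = "".join(
            e.get(frozenset((order[i], order[j])), "0")
            for j in range(1, n)
            for i in range(j)
        )
        best = max(best, head + tail)  # plain string comparison
    return best

# Two numberings of the same triangle give the same canonical label.
g1 = (["b", "a", "a"], {(0, 1): "x", (0, 2): "x", (1, 2): "y"})
g2 = (["a", "b", "a"], {(0, 1): "x", (0, 2): "y", (1, 2): "x"})
print(canonical_label(*g1), canonical_label(*g2))  # baaxxy baaxxy
```

Because isomorphic graphs share a canonical label, comparing labels replaces a pairwise isomorphism test.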

Vertex invariants
- Properties that do not change across isomorphism mappings:
  - Vertex degree
  - Vertex label
  - Siblings
[Figure: the example graph with vertex labels b, a, a and edge labels x, x, y]

Vertex Degrees and Labels
- Adjacency matrix
- Partition the vertices by degree and label so that every partition contains vertices with the same degree and the same label.

Vertex Degrees and Labels
- Degree only: p0 = {v0, v1, v2}: 2
- Degree + label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
- Vertex order v0, v1, v2:

          v0  v1  v2
          b   a   a
  v0  b   -   x   x
  v1  a   x   -   y
  v2  a   x   y   -

  code = baaxxy

Vertex Degrees and Labels
- Partitions: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
- Vertex order v1, v2, v0 (partition p0 first, then p1):

          v1  v2  v0
          a   a   b
  v1  a   -   y   x
  v2  a   y   -   x
  v0  b   x   x   -

  code = aabyxx
- Before: 3! orderings. Now: 2! x 1! orderings.
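The saving from 3! to 2! x 1! can be made concrete with a small sketch. It groups vertices by (degree, label), fixes the order of the groups, and permutes vertices only inside each group. Sorting the groups by their (degree, label) key is an assumption; any fixed rule would do, as long as it is applied to every graph. The names `partitions` and `restricted_orders` are hypothetical.

```python
from collections import defaultdict
from itertools import permutations, product

def partitions(labels, edges):
    """Group vertex ids by the invariant (degree, label)."""
    degree = [0] * len(labels)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    groups = defaultdict(list)
    for v, lab in enumerate(labels):
        groups[(degree[v], lab)].append(v)
    # Fixed group order: sorted by (degree, label). This choice is assumed.
    return [groups[key] for key in sorted(groups)]

def restricted_orders(labels, edges):
    """Vertex orderings that permute vertices only within their partition."""
    parts = partitions(labels, edges)
    for combo in product(*(permutations(p) for p in parts)):
        yield [v for block in combo for v in block]

labels = ["b", "a", "a"]
edges = {(0, 1): "x", (0, 2): "x", (1, 2): "y"}
print(partitions(labels, edges))               # [[1, 2], [0]]
print(list(restricted_orders(labels, edges)))  # [[1, 2, 0], [2, 1, 0]]
```

Only these two orderings need a code computed, and both give aabyxx for this graph. The label differs from the unrestricted maximum baaxxy, which is fine as long as every graph is labeled with the same partition rule.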

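Running example
- minsup = [value missing in transcript]
- Graph transactions g0, g1, g2 [figures not reproduced]
- Frequent 1-subgraphs, tid lists of the single-edge subgraphs: {0,1,2}, {0,2}, {0,1}, {2}
- cl (canonical labels): [not recoverable from transcript]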

Running example (minsup = 2)
- Frequent 1-subgraphs:

  tid list    cl
  {0,1,2}     010
  {0,2}       021
  {0,1}       [not shown]

- Candidate 2-subgraphs c0, c1, c2, c3, ... with possible tid lists {0,1,2}, {0,2}, {0,1}, {0,1,2}, ...

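Frequent 2-subgraphs
- Frequent 1-subgraphs and their children:

  tid list    cl           children
  {0,1,2}     010          c1, c2, c3
  {0,2}       021          c2
  {0,1}       [not shown]  c3, c4

- Frequent 2-subgraphs (c1 to c4):

  tid list     cl
  {0,2}        01201x
  {0,1,2}      10000x
  {0,1}        10203x
  [not shown]  21133x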

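Frequency computing
- Each frequent subgraph keeps an id-list (the ids of the transactions that contain it).
- For a (k+1)-candidate, intersect the id-lists of the two k-subgraphs it was built from:
  - Intersection large enough to be frequent -> find the support
  - Not frequent -> pruned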
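The pruning step can be sketched with plain set intersection. The function name `possible_tids` is hypothetical, and the tid lists in the usage lines are the ones shown for the frequent 1-subgraphs in the running example. The intersection is only an upper bound on where the candidate can occur; the actual support still has to be counted inside those transactions.

```python
def possible_tids(tids_a, tids_b, minsup):
    """Intersect two parent id-lists; return None if the candidate is pruned."""
    possible = set(tids_a) & set(tids_b)
    if len(possible) < minsup:
        return None  # cannot reach minsup, so no support counting is needed
    return possible  # transactions that still have to be checked

print(possible_tids({0, 1, 2}, {0, 2}, minsup=2))  # {0, 2}
print(possible_tids({0, 2}, {0, 1}, minsup=2))     # None
```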

Candidate generation
- Join two frequent k-subgraphs that have the same (k-1)-subgraph (the core) -> (k+1)-candidate subgraphs
- One join can produce more than one candidate because of:
  - Vertex labeling
  - Multiple cores
  - Multiple automorphisms
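The precondition of the join, two k-subgraphs sharing a (k-1) core, can be sketched as follows. This covers only the core detection, not the join itself, and it is a brute-force illustration rather than the procedure from the paper. A core is taken to be a connected subgraph obtained by deleting one edge (and any vertex left isolated). The two example graphs, the function names, and the code layout with "0" for a missing edge are all assumptions. Single-edge graphs yield no core in this simplified version.

```python
from itertools import permutations

def canonical_label(labels, edges):
    """Largest code over all vertex orderings; edges keyed by frozenset."""
    n = len(labels)
    best = ""
    for order in permutations(range(n)):
        head = "".join(labels[v] for v in order)
        tail = "".join(edges.get(frozenset((order[i], order[j])), "0")
                       for j in range(1, n) for i in range(j))
        best = max(best, head + tail)
    return best

def is_connected(n, edges):
    """Depth-first search over vertices 0..n-1."""
    if n == 0:
        return False
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for e in edges:
            if u in e:
                for w in e:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
    return len(seen) == n

def cores(labels, edge_map):
    """Canonical labels of the connected subgraphs with one edge removed."""
    edges = {frozenset(k): lab for k, lab in edge_map.items()}
    found = set()
    for removed in edges:
        rest = {e: lab for e, lab in edges.items() if e != removed}
        keep = sorted({v for e in rest for v in e})  # drop isolated vertices
        index = {v: i for i, v in enumerate(keep)}
        sub_labels = [labels[v] for v in keep]
        sub_edges = {frozenset(index[v] for v in e): lab
                     for e, lab in rest.items()}
        if is_connected(len(keep), sub_edges):
            found.add(canonical_label(sub_labels, sub_edges))
    return found

# Hypothetical 2-subgraphs: b -x- a -y- a  and  a -x- b -x- a
ga = (["b", "a", "a"], {(0, 1): "x", (1, 2): "y"})
gb = (["a", "b", "a"], {(0, 1): "x", (1, 2): "x"})
print(cores(*ga) & cores(*gb))  # {'bax'}
```

A non-empty intersection means the pair is joinable. Each shared core may still give several candidates, for the reasons listed on the slide.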

Vertex labeling

Multiple automorphisms

Multiple cores

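Frequent 3-subgraph candidates
- Frequent 1-subgraphs and their children:

  tid list    cl           children
  {0,1,2}     010          c1, c2, c3
  {0,2}       021          c2
  {0,1}       [not shown]  c3, c4

- Frequent 2-subgraphs (c1 to c4):

  tid list    cl
  {0,2}       01201x
  {0,1,2}     10000x
  {0,1}       10203x
  {0,1}       21133x

- Candidate 3-subgraphs q0, q1, ... with possible tid lists {0, 2}, {0,}, {0, 2}
- Marked on the slide: does not satisfy downward closure.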

Experiment
- AMD 1.53 GHz
- 2 GB main memory
- Linux OS
- Chemical compound datasets:
  - PTE (340 compounds): 66 atom types, four bond types, 27 edges/graph on average
  - DTP (223,644 compounds): 104 atom types, three bond types, 22 edges/graph on average
- Synthetic datasets

PTE and DTP

Synthetic datasets

Synthetic datasets
- |D| = 10000, |S| = 200, |L_E| = 1, minsup = 2%