University at Buffalo, The State University of New York. Lei Shi, Department of Computer Science and Engineering, State University of New York at Buffalo. Frequent Subgraph/Substructure Mining.


Frequent Subgraph/Substructure Mining
Lei Shi, Department of Computer Science and Engineering, State University of New York at Buffalo
Seminar, 2009

Outline
• Introduction
• Apriori-based Subgraph Mining
• Pattern Growth Subgraph Mining
• Summary

Graphs are everywhere

Graph Mining Problems
• Graph Pattern Mining
  - Frequent subgraph pattern mining
  - Pattern summarization
  - Optimal graph patterns
  - Graph patterns with constraints
  - Approximate graph patterns
  - ...
• Graph Classification
  - Graph clustering
  - Important node identification
  - Bridge and hub identification
• Other Important Topics
  - Graph compression
  - Graph model
  - Social network analysis

Subgraph Pattern Mining
• Frequent subgraph
  - A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold.
• Applications of subgraph pattern mining
  - Mining biochemical structures
  - Program control flow analysis
  - Mining XML structures or Web communities
  - Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
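The definition of support can be made concrete with a small brute-force sketch. It assumes undirected graphs stored as a vertex-label dict plus an edge-label dict keyed by `frozenset({u, v})`, no self-loops, and a non-induced notion of containment; the helper names (`make_graph`, `contains`, `support`, `is_frequent`) are illustrative and not taken from the slides. The containment test tries every injective vertex mapping, so it is only usable on tiny graphs.

```python
from itertools import permutations


def make_graph(vertex_labels, labeled_edges):
    """Return (labels, edges); edges maps frozenset({u, v}) -> edge label."""
    return vertex_labels, {frozenset((u, v)): lab for u, v, lab in labeled_edges}


def contains(graph, pattern):
    """True if `pattern` occurs in `graph` with matching vertex and edge labels."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Try every injective mapping of pattern vertices onto graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue
        if all(g_edges.get(frozenset(m[v] for v in e)) == lab
               for e, lab in p_edges.items()):
            return True
    return False


def support(dataset, pattern):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset)


def is_frequent(dataset, pattern, min_sup):
    """Frequent means support is no less than the minimum support threshold."""
    return support(dataset, pattern) >= min_sup


# Toy dataset (made up for illustration): a triangle A-B-C and two single edges.
dataset = [
    make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "x"), (0, 2, "x")]),
    make_graph({0: "A", 1: "B"}, [(0, 1, "x")]),
    make_graph({0: "A", 1: "C"}, [(0, 1, "x")]),
]
edge_ab = make_graph({0: "A", 1: "B"}, [(0, 1, "x")])
edge_bc = make_graph({0: "B", 1: "C"}, [(0, 1, "x")])
```

With a threshold of 2, the A-B edge is frequent in this toy dataset (it occurs in the first two graphs) while the B-C edge is not (it occurs only in the triangle).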

Frequent Subgraph Example
[Figure: three labeled input graphs (1), (2), (3) with vertex labels A, B, C, and a table of subgraphs with support values 3, 3, 1.]

Key Challenges in Subgraph Mining
• Graph isomorphism
  - Detecting whether two graphs are identical in structure
• Graph representation (canonical labeling)
  - A canonical label is a unique code for a given graph.
  - The canonical label should be the same no matter how a graph is represented, as long as the graphs have the same topological structure and the same labeling of edges and vertices.
• Subgraph candidate generation
  - Generating candidate frequent subgraphs from the dataset

Subgraph Mining Approaches
• Apriori-based
  - AGM/AcGM: Inokuchi et al. (PKDD’00)
  - FSG: Kuramochi and Karypis (ICDM’01)
    M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, Nov.
  - PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
  - FFSM: Huan et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
  - FTOSM: Horvath et al. (KDD’06)
• Pattern-growth based
  - Subdue: Holder et al. (KDD’94)
  - MoFa: Borgelt and Berthold (ICDM’02)
  - gSpan: Yan and Han (ICDM’02)
    Yan, X. and Han, J. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721.
  - Gaston: Nijssen and Kok (KDD’04)
  - CMTreeMiner: Chi et al. (TKDE’05)
  - LEAP: Yan et al. (SIGMOD’08)

Outline
• Introduction and Background
• Apriori-based Subgraph Mining
• Pattern Growth Subgraph Mining
• Summary

Apriori-based Approach
• FSG: M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, Nov.
• Flattened representation as canonical labeling
• Apriori-based method to generate subgraph candidates

Graph Representation in FSG
• Flattened representation

Graph Representation in FSG
• Flattened representation
  - Lexicographic order (dictionary order)
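A rough sketch of the idea, loosely modeled on a flattened adjacency matrix rather than on FSG's exact encoding: for every vertex ordering, write out the vertex labels followed by the upper triangle of the adjacency matrix, and take the lexicographically smallest string as the canonical label. Choosing the smallest (rather than the largest) string, the single-character labels, the `NO_EDGE` placeholder, and the function names are all assumptions made for this illustration. The search is factorial in the number of vertices, which is why practical algorithms prune the orderings they consider.

```python
from itertools import permutations

NO_EDGE = "0"  # assumed placeholder; must differ from every real edge label


def flatten(labels, edges, order):
    """Flatten the adjacency matrix induced by one vertex ordering into a string."""
    n = len(order)
    vertex_part = "".join(labels[v] for v in order)
    # Upper triangle, read column by column: (0,1), (0,2), (1,2), (0,3), ...
    edge_part = "".join(
        edges.get(frozenset((order[i], order[j])), NO_EDGE)
        for j in range(n)
        for i in range(j)
    )
    return vertex_part + edge_part


def canonical_label(labels, edges):
    """Lexicographically smallest flattened string over all vertex orderings."""
    return min(flatten(labels, edges, order) for order in permutations(labels))


# Two numberings of the same labeled path a - b - a, plus a different graph.
g1 = ({0: "a", 1: "b", 2: "a"},
      {frozenset((0, 1)): "x", frozenset((1, 2)): "y"})
g2 = ({10: "a", 20: "a", 30: "b"},
      {frozenset((10, 30)): "y", frozenset((20, 30)): "x"})
g3 = ({0: "a", 1: "b", 2: "a"},
      {frozenset((0, 1)): "x", frozenset((1, 2)): "x"})
```

Because the label does not depend on how the vertices are numbered, `canonical_label(*g1)` equals `canonical_label(*g2)`, while `g3` (different edge labels) gets a different label. Comparing canonical labels therefore doubles as an isomorphism test.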

Apriori-based Method
• Apriori property
  - If a graph is frequent, all of its subgraphs are frequent.
• Candidate generation
  - Create a set of candidates of size k+1
  - from two given frequent k-subgraphs
  - that contain the same (k-1)-subgraph
  - The join can result in several candidates of size k+1
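Stated as an inequality, the Apriori property is the anti-monotonicity of support. The notation is assumed for this note rather than taken from the slides: sup(g) is the support of g in the dataset and θ is the minimum support threshold.

```latex
g' \subseteq g \;\Longrightarrow\; \mathrm{sup}(g') \ge \mathrm{sup}(g),
\qquad\text{hence}\qquad
\mathrm{sup}(g') < \theta \;\Longrightarrow\; \mathrm{sup}(g) < \theta .
```

The second implication is what justifies pruning: a size-(k+1) candidate that contains any infrequent size-k subgraph can be discarded before its support is counted.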

Apriori-based Method
• Example of generated graph candidates

Apriori-based Method
• Flowchart

Apriori-based Method
• Experimental result
  - Chemical compound dataset containing 340 compounds and 24 different atoms (vertices)

Outline
• Introduction
• Apriori-based Subgraph Mining
• Pattern Growth Subgraph Mining
• Summary

Motivation of gSpan
• Weaknesses of the Apriori-based approach
  - Generating size-(k+1) subgraph candidates from size-k frequent subgraphs is complicated and costly.
  - Pruning false positives requires subgraph isomorphism tests, and subgraph isomorphism is an NP-complete problem, so this step is expensive.
• gSpan: Graph-Based Substructure Pattern Mining
  - Changes the way a graph is represented (DFS: depth-first search)
  - Uses pattern growth to generate new subgraph candidates

gSpan: Graph-Based Substructure Pattern Mining
• DFS (depth-first search) code
  - First step: traverse the graph depth-first and use the edges on the path to represent the graph.
  - Second step: DFS lexicographic order
• Pattern-growth subgraph generation

DFS Code
• An edge is represented by a 5-tuple.
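In gSpan the 5-tuple for an edge is (i, j, l_i, l_(i,j), l_j): the DFS discovery indices of its two endpoints, followed by the label of the first endpoint, the edge label, and the label of the second endpoint. The sketch below builds one DFS code for a small connected graph; the graph encoding, the choice to visit neighbors in sorted order, and the function name `dfs_code` are assumptions for illustration. It produces a valid DFS code for the chosen traversal, not necessarily the minimum DFS code that gSpan uses as the canonical label.

```python
def dfs_code(labels, edges, start):
    """One DFS code of a connected, undirected, labeled graph.

    labels: vertex -> label; edges: frozenset({u, v}) -> edge label.
    Each entry is (i, j, l_i, l_ij, l_j) with i, j the DFS discovery indices.
    """
    adj = {v: [] for v in labels}
    for e in edges:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)

    index = {start: 0}  # vertex -> discovery index
    used = set()        # edges already written to the code
    code = []

    def emit(u, v):
        e = frozenset((u, v))
        used.add(e)
        code.append((index[u], index[v], labels[u], edges[e], labels[v]))

    def visit(v):
        # Backward edges from v come first, ordered by target index.
        back = [w for w in adj[v] if w in index and frozenset((v, w)) not in used]
        for w in sorted(back, key=index.get):
            emit(v, w)
        # Then forward edges, each followed by the subtree it discovers.
        for w in sorted(adj[v]):
            if w not in index:
                index[w] = len(index)
                emit(v, w)
                visit(w)

    visit(start)
    return code


# Toy graph: triangle X - Y - X with a tail to Z.
labels = {0: "X", 1: "Y", 2: "X", 3: "Z"}
edges = {
    frozenset((0, 1)): "a",
    frozenset((1, 2)): "b",
    frozenset((0, 2)): "a",
    frozenset((2, 3)): "c",
}
code = dfs_code(labels, edges, start=0)
# code == [(0, 1, 'X', 'a', 'Y'), (1, 2, 'Y', 'b', 'X'),
#          (2, 0, 'X', 'a', 'X'), (2, 3, 'X', 'c', 'Z')]
```

Entries with i < j are forward (tree) edges; the entry (2, 0, ...) is a backward edge that closes the triangle.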

DFS Code
• Second step: DFS lexicographic order

Pattern Growth Approach
• Pattern growth (free extension)

Pattern Growth Approach
• Duplicate graphs

Pattern Growth Approach
• Free extension

Pattern Growth Approach
• Rightmost extension
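Rightmost extension restricts where a pattern may grow: backward edges may only leave the rightmost vertex (the most recently discovered one) and go to other vertices on the rightmost path, and forward edges may only start from vertices on the rightmost path. The sketch below computes those sites from a DFS code given as 5-tuples (i, j, l_i, l_(i,j), l_j); the function names and the returned shape are assumptions for illustration, and label choices for the new edges are left out.

```python
def rightmost_path(code):
    """DFS indices from the root (0) down to the rightmost vertex."""
    # Forward edges (i < j) define the DFS tree: j was discovered from i.
    parent = {j: i for (i, j, *_labels) in code if i < j}
    v = max(parent, default=0)  # rightmost vertex = largest discovery index
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path[::-1]


def extension_sites(code):
    """Where rightmost extension may add an edge.

    Returns (backward, forward_from):
      backward     - (rightmost, v) pairs for backward edges not yet in the code
      forward_from - vertices that may start a forward edge to a new vertex
    """
    path = rightmost_path(code)
    rightmost = path[-1]
    existing = {frozenset((i, j)) for (i, j, *_labels) in code}
    backward = [(rightmost, v) for v in path[:-1]
                if frozenset((rightmost, v)) not in existing]
    forward_from = path[::-1]  # rightmost vertex first, then its ancestors
    return backward, forward_from


# Toy DFS code: triangle 0-1-2 plus a branch from vertex 1 to vertex 3.
code = [
    (0, 1, "X", "a", "Y"),
    (1, 2, "Y", "b", "X"),
    (2, 0, "X", "a", "X"),
    (1, 3, "Y", "c", "Z"),
]
```

For this code the rightmost path is 0, 1, 3. Vertex 2 is off the path, so it cannot be extended any further; the only possible backward edge is from vertex 3 to vertex 0, and forward edges may start at 3, 1, or 0.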

Pattern Growth Approach
• Examples (cont.)

gSpan

gSpan

Pattern Growth Approach
• Experimental result using chemical data
  - 340 molecules
  - 66 atom types and 4 bond types as labels
  - On average, only 27 vertices with 28 edges

Summary
• Graph representation
  - Flattened representation vs. DFS code
• Generation of candidate patterns
  - Apriori vs. pattern growth


Pattern-Growth Approach

Frequent Graph Pattern
• Given a graph dataset D, find subgraphs g such that $\mathrm{freq}(g) \ge \theta$, where $\mathrm{freq}(g)$ is the percentage of graphs in D that contain g and $\theta$ is the minimum support threshold.
• Problem 1: exponential pattern set
• Problem 2: threshold setting

Difference between frequent itemset and frequent subgraph discovery

Frequent itemset discovery


Framework of Subgraph Mining Algorithms
• Search order
  - Breadth vs. depth
  - Complete vs. incomplete
• Generation of candidate patterns
  - Apriori vs. pattern growth
• Discovery order of patterns
  - DFS order
  - Path → tree → graph
• Elimination of duplicate subgraphs
  - Passive vs. active
• Support calculation
  - Store embeddings or not

Frequent Subgraph Examples

Example (cont.)


Outline
• Introduction and Background
• Apriori-based Subgraph Mining
• Pattern Growth Subgraph Mining
• Summary
DFS code: Yan, X. and Han, J. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721.

Pattern Growth Approach