1 University at Buffalo, The State University of New York
Lei Shi, Department of Computer Science and Engineering, State University of New York at Buffalo
Frequent Subgraph/Substructure Mining
Seminar 2009

2 Outline
- Introduction
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary

3 Graphs are everywhere

4 Graph Mining Problems
- Graph Pattern Mining
  - Frequent subgraph pattern mining
  - Pattern summarization
  - Optimal graph patterns
  - Graph patterns with constraints
  - Approximate graph patterns
  - …
- Graph Classification
  - Graph clustering
  - Important node identification
  - Bridge and hub identification
- Other Important Topics
  - Graph compression
  - Graph model
  - Social network analysis

5 Subgraph Pattern Mining
- Frequent subgraph
  - A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold.
- Applications of subgraph pattern mining
  - Mining biochemical structures
  - Program control flow analysis
  - Mining XML structures or Web communities
  - Building blocks for graph classification, clustering, compression, comparison, and correlation analysis
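The support definition on this slide can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the method of any algorithm cited in the deck: the graph encoding (a dict of vertex labels plus a set of undirected edges), the helper names `make_graph`, `contains`, `support` and `is_frequent`, and the toy dataset are all assumptions made for the example. Support is counted as the number of dataset graphs that contain the pattern, and self-loops are assumed absent.

```python
from itertools import permutations


def make_graph(labels, edges):
    """labels: {vertex: label}; edges: iterable of (u, v) pairs (undirected, no self-loops)."""
    return labels, {frozenset(e) for e in edges}


def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a (not necessarily induced) subgraph."""
    g_labels, g_edges = graph
    p_labels, p_edges = pattern
    p_nodes = list(p_labels)
    # Brute force: try every injective mapping of pattern vertices to graph vertices.
    for image in permutations(g_labels, len(p_nodes)):
        m = dict(zip(p_nodes, image))
        if any(p_labels[v] != g_labels[m[v]] for v in p_nodes):
            continue  # vertex labels must match
        if all(frozenset((m[u], m[v])) in g_edges for u, v in p_edges):
            return True  # every pattern edge is present in the graph
    return False


def support(dataset, pattern):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern) for g in dataset)


def is_frequent(dataset, pattern, min_support):
    """Frequent means: support is no less than the minimum support threshold."""
    return support(dataset, pattern) >= min_support


# Hypothetical toy dataset: a triangle A-B-C, a path A-B-C, and a single edge A-A.
dataset = [
    make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3), (1, 3)]),
    make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    make_graph({1: "A", 2: "A"}, [(1, 2)]),
]
edge_ab = make_graph({1: "A", 2: "B"}, [(1, 2)])
edge_ac = make_graph({1: "A", 2: "C"}, [(1, 2)])
```

With a minimum support of 2, the A-B edge is frequent in this toy dataset while the A-C edge is not. The exhaustive mapping search is exponential in the pattern size, which is one reason the key challenges listed later in the deck matter.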

6 Frequent Subgraph Example
[Figure: a dataset of three graphs, (1), (2) and (3), with vertices labeled A, B and C, and a table listing subgraphs with their support counts.]

7 Key Challenges in Subgraph Mining
- Graph isomorphism: detecting whether two graphs are identical in structure
- Graph representation (canonical labeling)
  - A canonical label is a unique code for a given graph.
  - The canonical label should be the same no matter how a graph is represented, as long as the graphs have the same topological structure and the same labeling of edges and vertices.
- Subgraph candidate generation: generating candidate frequent subgraphs from the dataset
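To make the canonical-label requirement concrete, here is a small brute-force sketch. It is an illustration only, not the optimized labeling used by any cited algorithm: it tries every vertex ordering, flattens the vertex labels and the upper triangle of the labeled adjacency matrix into a tuple, and keeps the lexicographically smallest one. The function name `canonical_label`, the input encoding, and the use of the string "0" for "no edge" are assumptions, and edge labels are assumed to be strings.

```python
from itertools import permutations


def canonical_label(labels, edges):
    """labels: list where vertex i has label labels[i];
    edges: {(i, j): edge_label} for an undirected graph with string edge labels."""
    n = len(labels)
    elabel = {}
    for (i, j), lab in edges.items():
        elabel[(i, j)] = elabel[(j, i)] = lab  # store both directions
    best = None
    for order in permutations(range(n)):
        vertex_part = tuple(labels[v] for v in order)
        # Upper triangle of the permuted adjacency matrix, row by row; "0" marks a missing edge.
        edge_part = tuple(
            elabel.get((order[a], order[b]), "0")
            for a in range(n)
            for b in range(a + 1, n)
        )
        code = (vertex_part, edge_part)
        if best is None or code < best:
            best = code
    return best


# The same path A -x- B -y- C under two different vertex numberings.
g_a = (["A", "B", "C"], {(0, 1): "x", (1, 2): "y"})
g_b = (["C", "A", "B"], {(1, 2): "x", (2, 0): "y"})
# A different graph: the two edge labels are swapped.
g_c = (["A", "B", "C"], {(0, 1): "y", (1, 2): "x"})
```

Because the minimum is taken over all vertex orderings, `g_a` and `g_b` receive the same label while `g_c` receives a different one. The cost is factorial in the number of vertices, so this is only usable on very small graphs.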

8 Subgraph Mining Approaches
- Apriori-based
  - AGM/AcGM: Inokuchi et al. (PKDD’00)
  - FSG: Kuramochi and Karypis (ICDM’01)
    M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, pages 313-320, Nov. 2001.
  - PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
  - FFSM: Huan et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
  - FTOSM: Horvath et al. (KDD’06)
- Pattern growth based
  - Subdue: Holder et al. (KDD’94)
  - MoFa: Borgelt and Berthold (ICDM’02)
  - gSpan: Yan and Han (ICDM’02)
    Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721.
  - Gaston: Nijssen and Kok (KDD’04)
  - CMTreeMiner: Chi et al. (TKDE’05)
  - LEAP: Yan et al. (SIGMOD’08)

9 Outline
- Introduction and Background
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary

10 Apriori-based Approach
- FSG: M. Kuramochi and G. Karypis. Frequent subgraph discovery. In ICDM’01, Nov. 2001.
- Flattened representation as canonical labeling
- Apriori-based method to generate subgraph candidates

11 Graph Representation in FSG
- Flattened representation

12 Graph Representation in FSG
- Flattened representation
  - Lexicographic order (dictionary order)

13 Apriori-based method
- Apriori property
  - If a graph is frequent, all of its subgraphs are frequent.
- Candidate generation
  - Create a set of candidates of size k+1 from two given frequent k-subgraphs that contain the same (k-1)-subgraph.
  - The join can result in several candidates of size k+1.
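The join-and-prune idea can be sketched under a strong simplifying assumption that is not part of FSG: every vertex label is unique, so a subgraph is fully identified by its set of edges and no isomorphism test is needed. Size is measured in edges here, the shared core is not required to be connected, and the names `edge`, `is_connected` and `generate_candidates` are hypothetical. With unique labels each join yields at most one candidate, whereas the general case on the slide can yield several.

```python
from itertools import combinations


def edge(u, v):
    """An undirected edge between two uniquely labeled vertices."""
    return frozenset((u, v))


def is_connected(edge_set):
    """True if the (non-empty) set of edges forms a single connected component."""
    edges = list(edge_set)
    seen = set(edges[0])
    changed = True
    while changed:
        changed = False
        for e in edges:
            if seen & e and not e <= seen:
                seen |= e
                changed = True
    return all(e <= seen for e in edges)


def generate_candidates(frequent_k):
    """Join pairs of frequent k-edge subgraphs that share k-1 edges into (k+1)-edge candidates."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a, b in combinations(frequent_k, 2):
        union = a | b
        if len(union) != len(a) + 1 or not is_connected(union):
            continue  # the pair does not share k-1 edges, or the result is disconnected
        # Apriori pruning: every connected k-edge subgraph of the candidate must be frequent.
        subgraphs = (union - {e} for e in union)
        if all(s in frequent_k for s in subgraphs if is_connected(s)):
            candidates.add(union)
    return candidates


AB, BC, CD, AC = edge("A", "B"), edge("B", "C"), edge("C", "D"), edge("A", "C")
F1 = {frozenset({AB}), frozenset({BC}), frozenset({CD})}  # assumed frequent single edges
C2 = generate_candidates(F1)
```

In a full miner each candidate level would still be checked against the dataset to compute its support before it is used for the next join; this sketch covers only the candidate generation step.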

14 Apriori-based method
- Candidate generation example

15 Apriori-based method
- Flowchart

16 Apriori-based method
- Experimental result
  - Chemical compound dataset, which contains 340 compounds and 24 different atoms (vertices)

17 Outline
- Introduction
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary

18 Motivation of gSpan
- Weaknesses of the Apriori-based approach
  - Generating size-(k+1) subgraph candidates from size-k frequent subgraphs is complicated.
  - Pruning false positives is costly: subgraph isomorphism is an NP-complete problem.
- gSpan: Graph-Based Substructure Pattern Mining
  - Changes the way a graph is represented (DFS: depth-first search).
  - Uses pattern growth to generate new subgraph candidates.

19 gSpan: Graph-Based Substructure Pattern Mining
- DFS (depth-first search) code
  - First step: traverse the graph depth-first and use the edges on the path to represent the graph.
  - Second step: DFS lexicographic order
- Pattern-growth subgraph generation

20 DFS code
An edge is represented by a 5-tuple.
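As an illustration of the 5-tuple representation, the sketch below produces one DFS code for a small labeled graph. In the gSpan paper an edge is written as (i, j, l_i, l_(i,j), l_j), where i and j are DFS discovery indices and the l values are the vertex and edge labels; the function names, the input encoding, and the tie-breaking rule (visit neighbours in increasing vertex id) are assumptions made for this example. Backward edges of a newly discovered vertex are emitted before its forward edges.

```python
def build_adjacency(edges):
    """edges: iterable of (u, v, edge_label) for an undirected graph."""
    adj = {}
    for u, v, label in edges:
        adj.setdefault(u, {})[v] = label
        adj.setdefault(v, {})[u] = label
    return adj


def dfs_code(vlabels, edges, start):
    """Return one DFS code: a list of 5-tuples (i, j, l_i, l_ij, l_j)."""
    adj = build_adjacency(edges)
    index = {start: 0}  # vertex -> DFS discovery index
    used = set()        # edges already written to the code
    code = []

    def visit(v):
        neighbours = adj.get(v, {})
        # Backward edges first: to already discovered vertices, by increasing index.
        back = sorted(
            (index[u], u) for u in neighbours
            if u in index and frozenset((u, v)) not in used
        )
        for iu, u in back:
            used.add(frozenset((u, v)))
            code.append((index[v], iu, vlabels[v], neighbours[u], vlabels[u]))
        # Forward edges: discover new vertices depth-first.
        for u in sorted(neighbours):
            if u not in index:
                index[u] = len(index)
                used.add(frozenset((u, v)))
                code.append((index[v], index[u], vlabels[v], neighbours[u], vlabels[u]))
                visit(u)

    visit(start)
    return code


# Hypothetical labeled triangle: vertices A, B, C with edge labels a, b, c.
vlabels = {0: "A", 1: "B", 2: "C"}
edges = [(0, 1, "a"), (1, 2, "b"), (2, 0, "c")]
code_from_a = dfs_code(vlabels, edges, start=0)
```

Starting from vertex A, the code consists of two forward edges, (0, 1) and (1, 2), followed by the backward edge (2, 0) that closes the triangle. A different start vertex or neighbour order gives a different code for the same graph, which is why an ordering over codes is needed as a second step.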

21 DFS code
- Second step: DFS lexicographic order
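The role of an ordering over DFS codes can be sketched by brute force: enumerate every DFS traversal of a small connected graph, generate its code, and keep the smallest. One loud caveat: the comparison below is plain Python tuple order, which is a stand-in and not gSpan's DFS lexicographic order, so the selected code may differ from gSpan's minimum DFS code. It is still a canonical form, because the set of all DFS codes is the same for isomorphic graphs. All names are hypothetical.

```python
from itertools import permutations


def dfs_code(vlabels, adj, order):
    """DFS code for the traversal that starts at order[0] and prefers vertices earlier in `order`."""
    rank = {v: r for r, v in enumerate(order)}
    index = {order[0]: 0}
    used, code = set(), []

    def visit(v):
        # Backward edges to already discovered vertices, by increasing discovery index.
        for u in sorted((u for u in adj[v] if u in index), key=index.get):
            if frozenset((u, v)) not in used:
                used.add(frozenset((u, v)))
                code.append((index[v], index[u], vlabels[v], adj[v][u], vlabels[u]))
        # Forward edges, choosing the next vertex according to `order`.
        for u in sorted(adj[v], key=rank.get):
            if u not in index:
                index[u] = len(index)
                used.add(frozenset((u, v)))
                code.append((index[v], index[u], vlabels[v], adj[v][u], vlabels[u]))
                visit(u)

    visit(order[0])
    return code


def min_dfs_code(vlabels, edges):
    """Smallest DFS code under plain tuple order (assumes a connected graph)."""
    adj = {v: {} for v in vlabels}
    for u, v, label in edges:
        adj[u][v] = label
        adj[v][u] = label
    # Every DFS traversal is reproduced by using its own discovery order as the preference order.
    return min(dfs_code(vlabels, adj, order) for order in permutations(vlabels))


# The same path A -x- B -y- C with two different vertex numberings, and a variant with swapped edge labels.
p = ({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")])
q = ({10: "C", 20: "A", 30: "B"}, [(20, 30, "x"), (30, 10, "y")])
r = ({0: "A", 1: "B", 2: "C"}, [(0, 1, "y"), (1, 2, "x")])
```

The two numberings `p` and `q` map to the same minimum code, while `r` maps to a different one. Trying all vertex permutations is factorial in the graph size; gSpan avoids this by growing codes edge by edge and pruning non-minimal ones.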

22 Pattern Growth Approach
- Pattern growth (free extension)

23 Pattern Growth Approach
- Duplicate graphs

24 Pattern Growth Approach
- Free extension

25 Pattern Growth Approach
- Rightmost extension
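For readers without the slide figure, the following sketch shows where rightmost extension is allowed to add an edge, as described in the gSpan paper: backward edges may start only at the rightmost vertex and end on the rightmost path, and forward edges may start at any vertex on the rightmost path and introduce one new vertex. The function names and the example code are hypothetical, and labels for the new edges are left out.

```python
def rightmost_path(code):
    """Vertex indices from the root (index 0) to the rightmost vertex of a DFS code."""
    # Forward edges (i < j) define the DFS tree: j was discovered from i.
    parent = {j: i for (i, j, *_labels) in code if i < j}
    v = max(parent, default=0)  # the rightmost vertex is the last one discovered
    path = [v]
    while v != 0:
        v = parent[v]
        path.append(v)
    return path[::-1]


def extension_sites(code):
    """Return (backward, forward) lists of (from_index, to_index) pairs allowed by rightmost extension."""
    path = rightmost_path(code)
    rightmost = path[-1]
    existing = {frozenset((i, j)) for (i, j, *_labels) in code}
    # Backward: from the rightmost vertex to other vertices on the rightmost path, if not already linked.
    backward = [
        (rightmost, v) for v in path[:-1]
        if frozenset((rightmost, v)) not in existing
    ]
    # Forward: from any vertex on the rightmost path to a brand-new vertex.
    new_vertex = rightmost + 1
    forward = [(v, new_vertex) for v in reversed(path)]
    return backward, forward


# Hypothetical DFS code: a triangle 0-1-2 with an extra vertex 3 attached to vertex 1.
code = [
    (0, 1, "A", "a", "B"),
    (1, 2, "B", "b", "C"),
    (2, 0, "C", "c", "A"),
    (1, 3, "B", "d", "D"),
]
```

For this code the rightmost path is 0, 1, 3. Vertex 2 is off the path and cannot be extended, which is how rightmost extension cuts down the duplicates produced by free extension.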

26 Pattern Growth Approach
- Examples (cont.)

27 gSpan

28 gSpan

29 Pattern Growth Approach
- Experimental result using chemical data
  - 340 molecules
  - 66 atom types and 4 bond types as labels
  - On average only 27 vertices with 28 edges

30 Summary
- Graph representation
  - Flattened representation vs. DFS code
- Generation of candidate patterns
  - Apriori vs. pattern growth

32 Pattern-Growth Approach

33 Frequent Graph Pattern
Given a graph dataset D, find subgraph g such that freq(g) ≥ θ, where freq(g) is the percentage of graphs in D that contain g and θ is the minimum support threshold.
- Problem 1: Exponential pattern set
- Problem 2: Threshold setting

34 Difference between frequent itemset and frequent subgraph discovery

35 Frequent itemset discovery

36 Subgraph Mining Algorithms
- Apriori-based approach
  - AGM/AcGM: Inokuchi et al. (PKDD’00)
  - FSG: Kuramochi and Karypis (ICDM’01)
  - PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)
  - FFSM: Huan et al. (ICDM’03) and SPIN: Huan et al. (KDD’04)
  - FTOSM: Horvath et al. (KDD’06)
- Pattern growth approach
  - Subdue: Holder et al. (KDD’94)
  - MoFa: Borgelt and Berthold (ICDM’02)
  - gSpan: Yan and Han (ICDM’02)
  - Gaston: Nijssen and Kok (KDD’04)
  - CMTreeMiner: Chi et al. (TKDE’05)
  - LEAP: Yan et al. (SIGMOD’08)

37 Framework of Subgraph Mining Algorithms
- Search order
  - Breadth vs. depth
  - Complete vs. incomplete
- Generation of candidate patterns
  - Apriori vs. pattern growth
- Discovery order of patterns
  - DFS order
  - Path, tree, graph
- Elimination of duplicate subgraphs
  - Passive vs. active
- Support calculation
  - Embedding store or not

38 Frequent Subgraph Examples

39 Example (cont.)

41 Outline
- Introduction and Background
- Apriori-based Subgraph Mining
- Pattern Growth Subgraph Mining
- Summary
DFS code
Yan, X. and Han, J. 2002. gSpan: Graph-Based Substructure Pattern Mining. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02) (December 09-12, 2002). IEEE Computer Society, Washington, DC, 721.

42 Pattern Growth Approach

