An Efficient Algorithm for Discovering Frequent Subgraphs. Michihiro Kuramochi and George Karypis, ICDM, 2001. Presenter: 蔡明瑾.


1 An Efficient Algorithm for Discovering Frequent Subgraphs
Michihiro Kuramochi and George Karypis, ICDM, 2001
Presenter: 蔡明瑾

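2 Introduction
- Structural patterns appear in biology and chemistry, e.g. chemical compounds.
- Graph model: a vertex is an item; an edge is a relation between items.
- The graphs considered are undirected, connected, and labeled.
[Figure: example labeled graph with vertex labels b, a, a and edge labels x, x, y]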

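3 Graph Isomorphism
- G1(V1,E1) and G2(V2,E2) are isomorphic if they are topologically identical to each other.
- That is, there is a one-to-one mapping from V1 to V2 such that each edge in E1 is mapped to an edge in E2 and vice versa. For labeled graphs the mapping must also preserve vertex and edge labels.
[Figure: two drawings of the same triangle (vertex labels b, a, a; edge labels x, x, y) with different vertex numberings v0, v1, v2, marked as equal]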
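A minimal brute-force sketch of this definition, assuming small graphs stored as a list of vertex labels plus a dictionary of labeled edges. The function name and the data layout are illustrative assumptions, and the two triangles are read off the slide's figure as best the transcript allows.

```python
from itertools import permutations

def is_isomorphic(labels1, edges1, labels2, edges2):
    """Brute-force isomorphism test for small undirected labeled graphs.

    labels*: list of vertex labels, indexed by vertex id (hypothetical layout).
    edges*:  dict mapping frozenset({u, v}) -> edge label.
    """
    n = len(labels1)
    if n != len(labels2) or len(edges1) != len(edges2):
        return False
    for perm in permutations(range(n)):  # perm[u] is the image of vertex u
        if any(labels1[u] != labels2[perm[u]] for u in range(n)):
            continue  # vertex labels must be preserved
        mapped = {frozenset(perm[u] for u in e): lab for e, lab in edges1.items()}
        if mapped == edges2:  # same edges with the same labels, in both directions
            return True
    return False

# Triangle with vertex labels b, a, a under two different numberings.
g1_labels = ["b", "a", "a"]
g1_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}
g2_labels = ["a", "b", "a"]
g2_edges = {frozenset({0, 1}): "x", frozenset({0, 2}): "y", frozenset({1, 2}): "x"}

print(is_isomorphic(g1_labels, g1_edges, g2_labels, g2_edges))  # True
```

Trying every permutation costs |V|! mappings in the worst case, which is why the following slides look for a canonical code instead of pairwise tests.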

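4 Canonical labeling
- A code is read off the adjacency matrix: the vertex labels in row order, followed by the entries of the upper triangle.
- Ordering v0 = b, v1 = a, v2 = a: upper-triangle entries x, x, y, so code = baaxxy.
- Ordering v0 = a, v1 = b, v2 = a: upper-triangle entries x, y, x, so code = abaxyx.
- The two graphs are isomorphic, yet the two orderings give different codes.
[Figure: the two triangles and their 3 x 3 adjacency matrices]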
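The two codes on this slide can be reproduced with a short sketch. The function name, the dictionary layout, and the "-" marker for a missing edge are assumptions made for illustration; the upper triangle is read column by column, which for three vertices gives the same order as reading it row by row.

```python
def code_for_ordering(order, vertex_label, edge_label):
    """Vertex labels in the given order, then the upper-triangle matrix entries.

    order:        vertex ids in the chosen row/column order.
    vertex_label: dict vertex id -> label.
    edge_label:   dict frozenset({u, v}) -> label; a missing edge is written
                  as "-" (a hypothetical placeholder, not from the slides).
    """
    parts = [vertex_label[v] for v in order]
    # Upper triangle, column by column: (0,1), (0,2), (1,2), ...
    for j in range(1, len(order)):
        for i in range(j):
            parts.append(edge_label.get(frozenset({order[i], order[j]}), "-"))
    return "".join(parts)

vertex_label = {0: "b", 1: "a", 2: "a"}
edge_label = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}

print(code_for_ordering([0, 1, 2], vertex_label, edge_label))  # baaxxy
print(code_for_ordering([1, 0, 2], vertex_label, edge_label))  # abaxyx
```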

5 Canonical labeling
- Different permutations of the vertices lead to different codes.
- There are |V|! permutations to consider.
- The canonical label is the largest code among them.
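As a sketch of the idea, the canonical label can be computed by trying all |V|! orderings and keeping the lexicographically largest code. The helper names and data layout are assumptions; this is the naive search the slide describes, not an optimized implementation.

```python
from itertools import permutations

def code_for_ordering(order, vertex_label, edge_label):
    """Vertex labels in order, then upper-triangle entries ("-" = no edge, an assumption)."""
    parts = [vertex_label[v] for v in order]
    for j in range(1, len(order)):
        for i in range(j):
            parts.append(edge_label.get(frozenset({order[i], order[j]}), "-"))
    return "".join(parts)

def canonical_label(vertex_label, edge_label):
    """Largest code over all |V|! vertex orderings."""
    return max(code_for_ordering(p, vertex_label, edge_label)
               for p in permutations(vertex_label))

# Two numberings of the same triangle.
labels_1 = {0: "b", 1: "a", 2: "a"}
edges_1 = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}
labels_2 = {0: "a", 1: "b", 2: "a"}
edges_2 = {frozenset({0, 1}): "x", frozenset({0, 2}): "y", frozenset({1, 2}): "x"}

print(canonical_label(labels_1, edges_1))  # baaxxy
print(canonical_label(labels_2, edges_2))  # baaxxy
```

Because isomorphic graphs produce the same set of codes, they share the same largest code, so two graphs can be compared by comparing their canonical labels.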

6 Vertex invariants
- Properties that don't change across isomorphism mappings:
  - Vertex degree
  - Vertex label
  - Siblings
[Figure: the example triangle with vertex labels b, a, a and edge labels x, x, y]

7 Vertex Degrees and Labels
- Adjacency matrix
- Partition the vertices by degree and label so that every partition contains vertices with the same degree and the same label.

8 Vertex Degrees and Labels
- By degree only: p0 = {v0, v1, v2}: 2
- By degree and label: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
- Matrix in the original order v0 = b, v1 = a, v2 = a: code = baaxxy
[Figure: the example triangle and its adjacency matrix]

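9 Vertex Degrees and Labels
- Partitions: p0 = {v1, v2}: (2, a), p1 = {v0}: (2, b)
- Matrix reordered by partition (v1, v2, v0), with vertex labels a, a, b and upper-triangle entries y, x, x: code = aabyxx
- Number of permutations to try. Originally: 3!. Now: 2! x 1!.
[Figure: the example triangle and its reordered adjacency matrix]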
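The reduction from 3! to 2! x 1! orderings can be sketched as follows. The function names are illustrative, and sorting the partitions by (degree, label) in ascending order is an assumption chosen so that the result matches the slide's p0 and p1.

```python
from collections import defaultdict
from itertools import permutations, product
from math import factorial, prod

def partition_by_degree_and_label(vertex_label, edge_label):
    """Group vertices whose (degree, label) invariants are equal."""
    degree = defaultdict(int)
    for edge in edge_label:
        for v in edge:
            degree[v] += 1
    blocks = defaultdict(list)
    for v, label in vertex_label.items():
        blocks[(degree[v], label)].append(v)
    return [blocks[key] for key in sorted(blocks)]  # assumed partition order

def orderings_within_partitions(partitions):
    """Permute vertices only inside each partition; partition order stays fixed."""
    for choice in product(*(permutations(block) for block in partitions)):
        yield [v for block in choice for v in block]

def code_for_ordering(order, vertex_label, edge_label):
    """Vertex labels in order, then upper-triangle entries ("-" = no edge, an assumption)."""
    parts = [vertex_label[v] for v in order]
    for j in range(1, len(order)):
        for i in range(j):
            parts.append(edge_label.get(frozenset({order[i], order[j]}), "-"))
    return "".join(parts)

vertex_label = {0: "b", 1: "a", 2: "a"}
edge_label = {frozenset({0, 1}): "x", frozenset({0, 2}): "x", frozenset({1, 2}): "y"}

partitions = partition_by_degree_and_label(vertex_label, edge_label)
n_orderings = prod(factorial(len(block)) for block in partitions)
best = max(code_for_ordering(o, vertex_label, edge_label)
           for o in orderings_within_partitions(partitions))

print(partitions)   # [[1, 2], [0]]
print(n_orderings)  # 2, instead of 3! = 6
print(best)         # aabyxx
```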

10 Running example (minsup = 2)
[Figure: three transaction graphs g0, g1, g2 with numeric vertex and edge labels]
Frequent 1-subgraphs:
  TID list   cl
  {0,1,2}    010
  {0,2}      021
  {0,1}      123
A fourth single-edge subgraph has TID list {2}, which is below minsup.
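A sketch of how such a table of single-edge subgraphs and TID lists could be built. The transactions below are hypothetical: the graphs g0, g1, g2 cannot be recovered from the transcript, so the edge lists are invented only to be consistent with the TID lists shown. Reading a code such as 010 as "vertex label, vertex label, edge label" is also an assumption.

```python
from collections import defaultdict

def frequent_single_edges(transactions, minsup):
    """Map each single-edge code to its TID list, keeping codes with support >= minsup.

    transactions: list of edge lists; each edge is (label_u, label_v, edge_label).
    """
    tid_lists = defaultdict(set)
    for tid, edges in enumerate(transactions):
        for label_u, label_v, label_e in edges:
            a, b = sorted((label_u, label_v))  # the graph is undirected
            tid_lists[f"{a}{b}{label_e}"].add(tid)
    return {cl: tids for cl, tids in tid_lists.items() if len(tids) >= minsup}

# Hypothetical stand-ins for g0, g1, g2 (not the slide's actual graphs).
transactions = [
    [(0, 1, 0), (0, 2, 1), (1, 2, 3)],  # g0
    [(0, 1, 0), (1, 2, 3)],             # g1
    [(0, 1, 0), (0, 2, 1), (2, 3, 4)],  # g2; the last edge occurs only here
]

frequent = frequent_single_edges(transactions, minsup=2)
for cl, tids in sorted(frequent.items()):
    print(cl, sorted(tids))
```

The edge that occurs only in transaction 2 has support 1 and is dropped, mirroring the {2} entry on the slide.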

11 Running example (minsup = 2)
  TID list   cl    child
  {0,1,2}    010   c0, c1, c2, c3
  {0,2}      021   c2
  {0,1}      123   c3
  ……
Possible TID lists of the candidates: c0 {0,1,2}, c1 {0,1,2}, c2 {0,2}, c3 {0,1}
[Figure: candidate 2-subgraphs c0, c1, c2, c3 generated from the frequent 1-subgraphs]

12 Frequent 2-subgraphs
  TID list   cl
  {0,2}      01201x
  {0,1,2}    10000x
  {0,1}      10203x
  {0,1}      21133x
1-subgraphs and their children:
  TID list   cl    child
  {0,1,2}    010   c1, c2, c3
  {0,2}      021   c2
  {0,1}      123   c3, c4
[Figure: the 2-subgraphs c1, c2, c3, c4]

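13 Frequency computing
- Each frequent subgraph keeps an id-list (TID list) of the transactions that contain it.
- For a (k+1)-candidate, intersect the id-lists of its frequent k-subgraphs.
- If the intersection is at least minsup in size, find the actual support by checking only those transactions.
- If it is not, the candidate is pruned.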
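The steps above can be sketched as follows. The `contains` callback stands in for a real subgraph-isomorphism test and is a hypothetical placeholder, as are the function name and the toy transactions, which simply record which candidates each transaction contains.

```python
def support_with_tid_lists(candidate, parent_tid_lists, transactions, minsup, contains):
    """Return the candidate's TID list, or None if it is pruned or infrequent.

    parent_tid_lists: TID lists of the frequent k-subgraphs of the candidate.
    contains:         hypothetical callback (transaction, candidate) -> bool.
    """
    possible = set.intersection(*(set(t) for t in parent_tid_lists))
    if len(possible) < minsup:
        return None  # pruned without running any containment test
    actual = {tid for tid in possible if contains(transactions[tid], candidate)}
    return actual if len(actual) >= minsup else None

# Toy stand-in: each transaction is the set of candidate names it contains.
transactions = {0: {"c2", "c3"}, 1: {"c3"}, 2: {"c2"}}

def contains(transaction, candidate):
    return candidate in transaction

# Parents with TID lists {0,1,2} and {0,2}: only transactions 0 and 2 are checked.
kept = support_with_tid_lists("c2", [{0, 1, 2}, {0, 2}], transactions, 2, contains)
# Parents with TID lists {0,2} and {0,1}: the intersection {0} is below minsup.
pruned = support_with_tid_lists("cx", [{0, 2}, {0, 1}], transactions, 2, contains)

print(kept)    # {0, 2}
print(pruned)  # None
```

The intersection is only an upper bound on the support, which is why the surviving transactions still have to be checked.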

14 Candidate generation
- Join two frequent k-subgraphs that have the same (k-1)-subgraph as a core to obtain a (k+1)-candidate subgraph.
- A single join can produce more than one candidate because of:
  - Vertex labeling
  - Multiple cores
  - Multiple automorphisms
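A minimal sketch of the join for the smallest case only, k = 1, where the shared core is a single vertex with a matching label. The function name and the candidate key format are assumptions, and the general case (cores with edges, multiple cores, automorphisms) is not handled here. The TID lists are taken from the running example; each candidate carries the intersection of its parents' TID lists as its possible TID list.

```python
from itertools import combinations_with_replacement

def join_single_edges(frequent, minsup):
    """Join frequent single-edge subgraphs into two-edge path candidates.

    frequent: dict (label_a, label_b, edge_label) -> set of TIDs.
    Returns:  dict (center_label, ((end_label, edge_label), ...)) -> possible TIDs.
    """
    candidates = {}
    for (e1, t1), (e2, t2) in combinations_with_replacement(frequent.items(), 2):
        possible = t1 & t2
        if len(possible) < minsup:
            continue  # cannot be frequent, so no candidate is generated
        # Try every endpoint of e1 against every endpoint of e2 as the shared vertex.
        for center1, other1 in ((e1[0], e1[1]), (e1[1], e1[0])):
            for center2, other2 in ((e2[0], e2[1]), (e2[1], e2[0])):
                if center1 != center2:
                    continue  # the core labels must match
                arms = tuple(sorted([(other1, e1[2]), (other2, e2[2])]))
                candidates[(center1, arms)] = possible
    return candidates

frequent = {
    (0, 1, 0): {0, 1, 2},  # cl 010
    (0, 2, 1): {0, 2},     # cl 021
    (1, 2, 3): {0, 1},     # cl 123
}

for key, tids in sorted(join_single_edges(frequent, minsup=2).items()):
    print(key, sorted(tids))
```

Joining 010 with itself yields two different candidates, one per choice of shared vertex label, which illustrates how one join can produce several candidates. This sketch enumerates every join, so it may list more candidates than the slides show.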

15 Vertex labeling

16 Multiple automorphism

17 Multiple cores

18 Running example: 3-subgraph candidates
1-subgraphs and their children:
  TID list   cl    child
  {0,1,2}    010   c1, c2, c3
  {0,2}      021   c2
  {0,1}      123   c3, c4
2-subgraphs:
  TID list   cl
  {0,2}      01201x
  {0,1,2}    10000x
  {0,1}      10203x
  {0,1}      21133x
[Figure: candidates q0, q1, q2 generated by joining the frequent 2-subgraphs c1 to c4, with possible TID lists such as {0,2}; one candidate is marked as not satisfying downward closure]

19 Experiment
- AMD 1.53GHz
- 2GB main memory
- Linux OS
- Chemical compound datasets:
  - PTE (340 compounds): 66 atom types, four bond types, 27 edges/graph on average
  - DTP (223,644 compounds): 104 atom types, three bond types, 22 edges/graph on average
- Synthetic datasets

20 PTE and DTP

21 Synthetic datasets

22 Synthetic datasets
|D| = 10000, |S| = 200, |L_E| = 1, minsup = 2%

