Mining Tree Queries in a Graph. Bart Goethals, Eveline Hoekx and Jan Van den Bussche, KDD '05. Presenter: Ming Jing Tsai.


1 Mining Tree Queries in a Graph
Bart Goethals, Eveline Hoekx and Jan Van den Bussche
KDD '05
Presenter: Ming Jing Tsai

2 Introduction
- Mining tree patterns T in a single graph
  - Incremental in the number of nodes
  - Unordered, rooted
- For each tree T, all conjunctive queries are generated
- SQL

3 Tree query pattern example
- Selected nodes (constants): 0, 8
- Existential node: ∃
- Distinguished node: x

4 Matching
- A query Q matches in a graph G
- Homomorphism h: for every edge (i, j) ∈ Q, (h(i), h(j)) ∈ G
- Matchings are distinguished by the value they assign to x
- Different values on existential nodes do not matter

5 [Figure: query Q (node labels ∃, 0, 8) matched in graph G]
Frequency = 3 (4, 5, 8)
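The matching and frequency notions from slides 4 and 5 can be illustrated with a brute-force sketch. It assumes a directed graph given as a set of edge pairs and counts frequency as the number of distinct value tuples on the distinguished nodes, which is how the slides appear to use the term. The function names, the node names `r`, `x`, `e` and the toy graph are hypothetical, and the talk itself evaluates queries through SQL rather than by enumeration.

```python
from itertools import product

def matchings(query_edges, query_nodes, graph_edges, selected):
    """Yield every homomorphism h (as a dict) from the query into the graph.

    `selected` maps each selected query node to its constant graph node.
    """
    graph_nodes = sorted({v for edge in graph_edges for v in edge})
    free_nodes = [n for n in query_nodes if n not in selected]
    # Brute force: try every assignment of graph nodes to the free query nodes.
    for values in product(graph_nodes, repeat=len(free_nodes)):
        h = dict(selected)
        h.update(zip(free_nodes, values))
        # h is a homomorphism if every query edge lands on a graph edge.
        if all((h[i], h[j]) in graph_edges for i, j in query_edges):
            yield h

def frequency(query_edges, query_nodes, graph_edges, selected, distinguished):
    """Count distinct value tuples on the distinguished nodes (assumed definition)."""
    answers = {
        tuple(h[x] for x in distinguished)
        for h in matchings(query_edges, query_nodes, graph_edges, selected)
    }
    return len(answers)

# Hypothetical toy data: root r is selected with constant 0,
# x is distinguished, e is existential.
graph = {(0, 1), (0, 2), (0, 3), (1, 4), (1, 5), (2, 4)}
query_nodes = ["r", "x", "e"]
query_edges = [("r", "x"), ("x", "e")]
selected = {"r": 0}

print(frequency(query_edges, query_nodes, graph, selected, ["x"]))  # prints 2
```

There are three matchings here (x=1 with e=4, x=1 with e=5, x=2 with e=4), but only two distinct values of x, so the two matchings that differ only on the existential node count once.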

6 Generate all trees
- Increasing number of nodes
- Canonically ordered
  - Level sequence: the i-th number is the depth of the i-th node in preorder
  - Canonical order: the lexicographically maximal level sequence, e.g. 012212 > 012122
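A minimal sketch of the canonical ordering described above, assuming the tree is given as a dictionary from each node to its list of children (a hypothetical representation, not taken from the talk). Sorting the children's level sequences in descending lexicographic order at every node yields the maximal level sequence of an unordered rooted tree.

```python
def canonical_level_sequence(children, node, depth=0):
    """Return the lexicographically maximal level sequence of the subtree at `node`."""
    # Compute each child's canonical sequence, then put the largest first.
    subsequences = sorted(
        (canonical_level_sequence(children, c, depth + 1)
         for c in children.get(node, [])),
        reverse=True,
    )
    sequence = [depth]
    for sub in subsequences:
        sequence.extend(sub)
    return sequence

# Hypothetical tree: the root has a child "a" with two leaves
# and a child "b" with one leaf, listed in non-canonical order.
tree = {"r": ["b", "a"], "a": ["a1", "a2"], "b": ["b1"]}
print(canonical_level_sequence(tree, "r"))  # prints [0, 1, 2, 2, 1, 2]
```

The result corresponds to 012212 from the slide; visiting "b" before "a" would give 012122, which is the smaller sequence.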

7 Queries
- Levelwise
- Fix a tree T, and find all queries based on T whose frequency in G is at least k
- Q{∏, ∑, λ}
  - ∏: existential nodes
  - ∑: selected nodes
  - λ: labels of selected nodes

9
- To generate candidates efficiently, candidacy tables and frequency tables are used

10 CanTab ∏,∑
- Parents
- Each candidacy table can be computed by taking the natural join of the frequency tables of its parents (∏', ∑')
- CanTab ∅,{x} is the table with a single column x, holding all nodes of the graph G being mined
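The natural-join step can be sketched in a few lines, assuming each frequency table is a list of rows keyed by column name. The three parent tables and their values below are made up for illustration; the talk computes these joins in SQL on DB2, not in application code.

```python
from functools import reduce

def natural_join(left, right):
    """Natural join of two tables given as lists of {column: value} rows."""
    joined = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            # Rows combine only if they agree on every shared column.
            if all(l_row[c] == r_row[c] for c in shared):
                joined.append({**l_row, **r_row})
    return joined

# Hypothetical frequency tables of three parents of a query with ∑ = {x1, x3}.
freq_x1 = [{"x1": 0}, {"x1": 3}]
freq_x3 = [{"x3": 8}, {"x3": 9}]
freq_x1_x3 = [{"x1": 0, "x3": 8}, {"x1": 3, "x3": 9}, {"x1": 5, "x3": 8}]

candidates = reduce(natural_join, [freq_x1, freq_x3, freq_x1_x3])
print(candidates)  # prints [{'x1': 0, 'x3': 8}, {'x1': 3, 'x3': 9}]
```

The row (5, 8) is dropped because x1 = 5 is not frequent in the first parent, so no query with that label can be frequent; this is the usual levelwise pruning argument.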

11
- ∏ = {x2}: formulate expression -> SQL
- ∑ = {x1, x3}
- Candidacy table
- Frequency table

12 Equivalent queries
- Avoid generating a query Q2 that is equivalent to an earlier query Q1
- Containment mapping from Q1 to Q2
  - It is a homomorphism
  - The distinguished variables of Q1 are mapped one-to-one to those of Q2
  - The same holds for the selected nodes
- Case 1: Q1 has fewer nodes than Q2
- Case 2: Q1 and Q2 have the same number of nodes

13 Case 1: redundancy checking
- Q2 contains redundant subtrees such that removing them yields an equivalent query
- Redundancy: a subtree C in the form of a linear chain of existential nodes such that the parent of C has another subtree that is at least as deep as C
[Figure: Q1 and Q2]
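The redundancy condition above can be checked directly on a tree, as in the following sketch. It assumes a children dictionary and a `kind` map that marks existential nodes with "E" (both hypothetical representations), and it measures depth as the number of nodes on the longest downward path. It reports each chain that could be removed on its own; it does not decide which of several interchangeable chains to keep.

```python
def height(children, node):
    """Number of nodes on the longest downward path starting at `node`."""
    return 1 + max((height(children, c) for c in children.get(node, [])), default=0)

def is_existential_chain(children, kind, node):
    """True if the subtree at `node` is a linear chain of existential nodes."""
    while True:
        if kind[node] != "E":
            return False
        kids = children.get(node, [])
        if len(kids) > 1:
            return False
        if not kids:
            return True
        node = kids[0]

def redundant_chains(children, kind, root):
    """Return roots of existential chains that have a sibling subtree at least as deep."""
    found = []
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        for c in kids:
            if is_existential_chain(children, kind, c) and any(
                s != c and height(children, s) >= height(children, c) for s in kids
            ):
                found.append(c)
        stack.extend(kids)
    return found

# Hypothetical query: selected root r, distinguished x with an existential
# child y, and a single existential node e1 hanging off the root.
children = {"r": ["x", "e1"], "x": ["y"]}
kind = {"r": "C", "x": "X", "y": "E", "e1": "E"}
print(redundant_chains(children, kind, "r"))  # prints ['e1']
```

Here e1 is a chain of depth 1 and its sibling subtree at x has depth 2, so e1 can be mapped onto x and removing it leaves an equivalent query.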

14 Case 2: canonical forms
- Q1 and Q2 are isomorphic trees
- Canonical forms
  - Existential nodes -> ∃
  - Selected nodes -> C
  - Distinguished nodes -> X
[Figure: node label pairs C,∃  ∃,C  ∃,X  C,X  X,C  X,X]

15 Experiment
- Pentium 4, 2.8 GHz
- 1 GB main memory
- Linux 2.6
- C++ with embedded SQL
- Relational database: DB2 UDB v8.2

16 Real datasets
- A food web, a protein interaction graph, and a citation graph
- k: frequency threshold
- Size: maximal size of trees in the run
- The runs take several hours

17 Food web
- 154 species dependent on Scotch Broom
- Label 20 occurs in many frequent patterns -> Orthotylus adenocarpi (a plant pest that eats everything)
- Frequency 176

18 Protein interaction graph
- 1870 kinds from Saccharomyces cerevisiae, a fermenting yeast (it helps bread rise)
- A small number of highly connected nodes occur

19 Citation graph
- KDD Cup 2003
- 2500 papers on high-energy physics
- 350,000 cross-references
- Frequency 1655

20 Synthetic data, web graphs
- Tree size 5
- Minsup 4, 10, 25

21 Uniform random graphs
- Dense, uniform
- Minsup: 10, 25
- Edges: 47, 264, 997

