Graph Indexing: A Frequent Structure-based Approach. Advisor: Prof. 曾新穆. Team members: 李彥寬, 洪世敏, 丁鏘巽, 黃冠霖, 詹博丞. Date: 2013/11/14

2 Outline: Ch1 Introduction; Ch2 Preliminaries; Ch3 Frequent Fragment; Ch4 Discriminative Fragment; Ch5 gIndex; Ch6 Experimental Result; Improvement; Maintenance

3 Ch1 Introduction

4 Ch1 Introduction

5 Ch1 Introduction: Building a graph index. A path-based index is inefficient because a graph contains too many paths.
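To illustrate the "too many paths" problem, the following minimal Python sketch counts the simple paths of a small graph up to a length bound. It is our own illustration rather than part of gIndex or GraphGrep; the helpers count_simple_paths and complete_graph are hypothetical names, and graphs are assumed to be adjacency dicts.

```python
def complete_graph(n):
    """Adjacency dict of the complete graph on vertices 0..n-1."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def count_simple_paths(adj, max_len):
    """Count undirected simple paths with 1..max_len edges.

    adj: dict vertex -> list of neighbours (undirected graph).
    """
    count = 0

    def extend(path, visited):
        nonlocal count
        if len(path) > 1:
            count += 1                  # path has at least one edge
        if len(path) - 1 == max_len:
            return                      # length bound reached
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                extend(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in adj:
        extend([start], {start})
    # each undirected path was found once from each of its two endpoints
    return count // 2
```

On the complete graph with 5 vertices, which has only 10 edges, count_simple_paths(complete_graph(5), 4) returns 160, so a path index grows much faster than the graphs it describes.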

6 Ch1 Introduction: Building a graph index. A graph-based index is more suitable: in the slide's example, indexing a graph structure narrows the query down to only one result.

7 Ch2 Preliminaries

8 Ch2 Preliminaries

9 Ch2 Preliminaries

10 Ch2 Preliminaries

11 Ch3 Frequent Fragment

12 Ch3 Frequent Fragment: With minSup = 2, the fragments contained in at least two graphs are indexed.
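A minimal sketch of support counting with minSup, assuming for brevity that the candidate fragments are single labeled edges (gIndex itself mines general subgraphs); frequent_edge_fragments is a hypothetical name. Each frequent fragment keeps the id list of the graphs that contain it.

```python
from collections import defaultdict


def frequent_edge_fragments(graphs, min_sup):
    """Return {fragment: sorted id list} for 1-edge fragments with support >= min_sup.

    graphs: dict graph_id -> list of (label_u, label_v) edges.
    Support counts graphs, not embeddings: a graph is counted once
    no matter how many times the fragment occurs in it.
    """
    id_lists = defaultdict(set)
    for gid, edges in graphs.items():
        for a, b in edges:
            fragment = tuple(sorted((a, b)))   # undirected edge, order-free key
            id_lists[fragment].add(gid)
    return {f: sorted(ids) for f, ids in id_lists.items() if len(ids) >= min_sup}
```

For example, with graphs 1: {C-C, C-O}, 2: {C-C, C-N}, 3: {O-C} and min_sup = 2, only C-C (graphs 1, 2) and C-O (graphs 1, 3) are kept.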

13 Ch3 Frequent Fragment: If query Q is frequent, Q itself is indexed, so we can find its answers directly.

14 Ch3 Frequent Fragment: What if query Q is not frequent?

15 Ch3 Frequent Fragment: Find the frequent subgraphs of Q!
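The filtering step can be sketched as intersecting the id lists of the indexed fragments found in Q. This assumes the fragments of Q have already been enumerated and that the index is a dict from fragment key to a set of graph ids; candidate_set is a hypothetical name.

```python
def candidate_set(query_fragments, index, all_ids):
    """Intersect the id lists of every indexed fragment contained in the query.

    query_fragments: fragment keys found in query Q (assumed precomputed).
    index: dict fragment key -> set of graph ids containing that fragment.
    all_ids: ids of every graph in the database (the result if nothing matches).
    """
    candidates = set(all_ids)
    for f in query_fragments:
        if f in index:                 # non-indexed fragments give no filtering
            candidates &= index[f]
    return candidates
```

Every graph that contains Q also contains each fragment of Q, so the result is a superset of the true answers and still has to be verified by subgraph isomorphism. When Q itself is indexed, its own id list is already the exact answer set.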

16 Ch3 Frequent Fragment

17 Ch3 Frequent Fragment

18 Ch3 Frequent Fragment

19 Ch3 Frequent Fragment

20 Ch3 Frequent Fragment

21 Ch3 Frequent Fragment

22 Ch3 Frequent Fragment

23 Ch3 Frequent Fragment

24 Ch4 Discriminative Fragment

25 Ch4 Discriminative Fragment

26 Ch4 Discriminative Fragment

27 Ch4 Discriminative Fragment

28 Ch4 Discriminative Fragment

29 Ch5 gIndex

30 Ch5 gIndex: 5.1 Discriminative fragment selection; 5.2 Index construction; 5.3 Search

31 5.1 Discriminative fragment selection
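Discriminative fragment selection can be sketched as a greedy pass in order of increasing size. The sketch assumes the discriminative ratio as we understand it from gIndex, gamma = |∩ D_f| / |D_x| over the already selected fragments f contained in x, with x kept when gamma >= gamma_min. The subfragment relation sub_of is assumed to be precomputed (no subgraph isomorphism is done here), and select_discriminative is a hypothetical name.

```python
def select_discriminative(fragments, id_lists, sub_of, gamma_min, all_ids):
    """Greedy, size-increasing selection of discriminative fragments.

    fragments: frequent fragment keys sorted by size (smallest first).
    id_lists: dict fragment -> set of graph ids containing it (D_x).
    sub_of: dict fragment -> set of its proper subfragments (assumed given).
    all_ids: ids of every graph in the database.
    """
    selected = []
    for x in fragments:
        # graphs that the already selected subfragments of x cannot rule out
        covering = set(all_ids)
        for f in selected:
            if f in sub_of[x]:
                covering &= id_lists[f]
        gamma = len(covering) / len(id_lists[x])
        if gamma >= gamma_min:
            selected.append(x)
    return selected
```

Because every graph containing x also contains its subfragments, gamma is always at least 1; a gamma close to 1 means x adds almost no filtering power beyond what is already indexed.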

32 5.2 Index construction: 5.2.1 Graph Sequentialization; 5.2.2 gIndex Tree; 5.2.3 Remark on gIndex Tree Size; 5.2.4 gIndex Tree Implementation

33 5.2.1 Graph Sequentialization: adjacency matrices; DFS code
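gIndex sequentializes fragments with a canonical DFS code. As a simpler stand-in (a swapped-in technique, not the DFS code itself), this sketch computes a canonical adjacency-matrix code by brute force over all vertex orderings, which is only practical for very small fragments; canonical_code is a hypothetical name.

```python
from itertools import permutations


def canonical_code(labels, edges):
    """Canonical string of a small labeled graph via brute force.

    labels: list of vertex labels, vertex i has labels[i].
    edges: iterable of (i, j) undirected edges.
    Tries every vertex ordering, writes the vertex labels followed by the
    upper triangle of the adjacency matrix, and keeps the smallest string,
    so isomorphic graphs get the same code. O(n!): tiny fragments only.
    """
    n = len(labels)
    adj = {frozenset(e) for e in edges}
    best = None
    for order in permutations(range(n)):
        label_part = ",".join(labels[v] for v in order)
        bits = "".join(
            "1" if frozenset((order[i], order[j])) in adj else "0"
            for i in range(n) for j in range(i + 1, n)
        )
        code = label_part + "|" + bits
        if best is None or code < best:
            best = code
    return best
```

For example, the path C-C-O gets the code "C,C,O|101" whichever way its vertices are numbered, while C-O-C gets a different code.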

34 5.2.2 gIndex Tree [Figure: prefix tree of sequentialized fragments; visible nodes are carbon chains C-C, C-C-C, C-C-C-C, C-C-C-C-C, …]
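The gIndex tree can be sketched as a prefix tree over sequentialized fragments, so that a fragment such as C-C-C-C sits below its prefix C-C-C. The classes GIndexNode and GIndexTree are hypothetical, and letting only discriminative nodes carry id lists is an assumption based on our reading of the slides, not a published interface.

```python
class GIndexNode:
    """One node of a prefix tree over sequentialized fragments."""

    def __init__(self):
        self.children = {}           # next code element -> child node
        self.discriminative = False
        self.id_list = None          # set of graph ids, only for discriminative nodes


class GIndexTree:
    def __init__(self):
        self.root = GIndexNode()

    def insert(self, sequence, id_list=None):
        """Insert a fragment given as a sequence of code elements.

        Passing an id_list marks the node as a discriminative fragment.
        """
        node = self.root
        for element in sequence:
            node = node.children.setdefault(element, GIndexNode())
        if id_list is not None:
            node.discriminative = True
            node.id_list = set(id_list)
        return node

    def find(self, sequence):
        """Return the node for sequence, or None if it is not in the tree."""
        node = self.root
        for element in sequence:
            node = node.children.get(element)
            if node is None:
                return None
        return node
```

Sharing prefixes keeps the tree compact: inserting C-C-C reuses the path already created for C-C.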

35 5.2.3 Remark on gIndex Tree Size [Figure: gIndex tree levels 0, 1, 2, …, K-1]

36 5.2.4 gIndex Tree Implementation

37 5.3 Search: 5.3.1 Apriori Pruning; 5.3.2 Maximum Discriminative Fragments

38 5.3.1 Apriori Pruning: If a fragment is not in the gIndex tree, none of its supergraphs can be in the tree either, so they need not be checked. A hash table H is used to facilitate the Apriori pruning.
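Apriori pruning with a hash table H can be sketched as follows. For brevity the sketch assumes a path-shaped query, like the carbon chains in the gIndex tree figure, so that its fragments are subpaths. Python's built-in set plays the role of H, and apriori_lookup is a hypothetical name.

```python
def apriori_lookup(query_labels, H):
    """Enumerate subpaths of a path-shaped query, shortest first, with Apriori pruning.

    query_labels: vertex labels along the query path, e.g. ["C", "C", "O"].
    H: hash set of indexed fragment keys (label tuples).
    A subpath is looked up only if both of its one-edge-shorter subpaths
    were found in H, so a missing fragment rules out all its supergraphs.
    Returns (found fragment keys, number of lookups in H).
    """
    def key(i, j):
        seq = tuple(query_labels[i:j + 1])
        return min(seq, seq[::-1])      # a path reads the same in both directions

    n = len(query_labels)
    present = set()                     # (i, j) spans whose key is in H
    found, lookups = set(), 0
    for length in range(1, n):          # number of edges in the subpath
        for i in range(0, n - length):
            j = i + length
            if length > 1 and not ((i, j - 1) in present and (i + 1, j) in present):
                continue                # Apriori pruning: a subfragment is missing
            lookups += 1
            if key(i, j) in H:
                present.add((i, j))
                found.add(key(i, j))
    return found, lookups
```

For the query C-C-O-N with H = {C-C, C-O, C-C-O}, the missing edge O-N prunes both C-O-N and C-C-O-N, so only 4 of the 6 subpaths are looked up.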

39 5.3.2 Maximum Discriminative Fragments
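Only the maximal discriminative fragments of the query are needed. If f' is a subgraph of f, every graph containing f also contains f', so D_f is a subset of D_f' and intersecting with D_f' changes nothing. The following is a minimal sketch, assuming each fragment is represented by the set of query edges covered by one of its embeddings; maximal_fragments is a hypothetical name.

```python
def maximal_fragments(fragments):
    """Keep only fragments not strictly contained in another fragment.

    fragments: embeddings of indexed discriminative fragments in the query,
    each a frozenset of query edge ids.
    """
    return [f for f in fragments if not any(f < g for g in fragments)]
```

Intersecting the id lists of the maximal fragments alone gives the same candidate set with fewer intersections.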

40 Ch6 Experimental Result

41 Experimental Result: The performance of gIndex is compared with that of GraphGrep, a path-based approach. Two kinds of datasets are used in the experiments:
- one real dataset
- a series of synthetic datasets

42 Dataset: The real dataset is the AIDS antiviral screen dataset, which contains chemical compounds: 43,905 classified chemical molecules. The synthetic data generator was provided by Kuramochi et al. It allows the user to specify the number of graphs (D), their average size (T), the number of seed graphs (S), the average size of the seed graphs (I), and the number of distinct labels (L).

43 Experiment Background: Experiments are performed on a 1.5 GHz Intel PC with 1 GB of memory running Red Hat 8.0. Both GraphGrep and gIndex are compiled with gcc/g++.

44 AIDS Antiviral Screen Dataset

45 Experimental Result: The index size of gIndex is at least 10 times smaller than that of GraphGrep. Two salient properties of gIndex: its index size is small, and it is stable.

46 Experimental Result: The size of the candidate answer set C_q is |C_q|. AVG(|D_q|), where D_q is the actual answer set of query q, is the lower bound of AVG(|C_q|). An algorithm achieving this lower bound matches the queries in the graph dataset precisely.
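The measure on this slide can be computed as below. This is a minimal sketch assuming each query result is given as a pair (C_q, D_q) of graph-id sets; filtering_quality is a hypothetical name.

```python
def filtering_quality(results):
    """Average candidate-set and answer-set sizes over a query set.

    results: list of (C_q, D_q) pairs of graph-id sets, one per query.
    Filtering never drops a true answer, so D_q is a subset of C_q and
    AVG(|D_q|) is a lower bound on AVG(|C_q|).
    """
    if not results:
        raise ValueError("need at least one query")
    for c_q, d_q in results:
        if not d_q <= c_q:
            raise ValueError("a true answer is missing from the candidate set")
    avg_c = sum(len(c_q) for c_q, _ in results) / len(results)
    avg_d = sum(len(d_q) for _, d_q in results) / len(results)
    return avg_c, avg_d
```

The closer the first value is to the second, the fewer false positives the index leaves for the verification step.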

47 Experimental Result: Q4. Queries in Q4 are more likely to be path-structured (smaller query answer sets).

48 Experimental Result (larger query answer sets)

49 Experimental Result

50 Experimental Result: The scalability of gIndex

51 Synthetic Dataset

52 Experimental Result: The synthetic dataset has 10,000 graphs and uses 1,000 seed fragments with 50 distinct labels. On average, each graph has 20 edges and each seed fragment has 10 edges.

53 Experimental Result

54 Improvement
Size-increasing support constraint
Relationship between minSup and the number of candidates:
- Large minSup -> fewer frequent fragments and a weaker pruning effect
- Small minSup -> fewer candidates, but the index size increases dramatically
So we should adopt a different minSup for each fragment size.
Inner support
The previous idea does not take multiple embeddings of a feature in one graph into consideration. Inner support is the number of embeddings of a subgraph in a graph. It helps remove many impossible candidates within an id list, but the size of the id lists doubles.
Advantage of statistics
For a large graph database, index generation is time-consuming. Instead, we can construct the index from a sample of the data.
Maintenance
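Two of these ideas can be sketched in a few lines. The threshold function below is a hypothetical linear shape chosen only to show a size-increasing minSup; the exact function used by gIndex is not given on the slide. The inner-support test assumes embedding counts are precomputed per fragment. Both function names are hypothetical.

```python
def size_increasing_min_sup(size, max_size, db_size, theta):
    """Hypothetical support threshold psi(size) that grows with fragment size.

    Fragments with at most one edge are always indexed (threshold 1);
    larger ones need support rising linearly up to theta * db_size at max_size.
    """
    if size <= 1:
        return 1
    return max(1, theta * db_size * size / max_size)


def passes_inner_support(query_counts, graph_counts):
    """Inner-support filter for one candidate graph G.

    query_counts / graph_counts: fragment -> number of embeddings in Q / in G.
    Returns False (G can be pruned) if some fragment has more embeddings in Q than in G.
    """
    return all(graph_counts.get(f, 0) >= k for f, k in query_counts.items())
```

The inner-support test is sound because, if Q is contained in G, composing each embedding of a fragment in Q with an embedding of Q in G gives distinct embeddings of that fragment in G.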

55 Maintenance: A small number of insertions or deletions affects only the id lists. As the number of insertions grows, the size of the candidate sets indicates the quality of the current gIndex. As the number of deletions grows, could some id lists become empty? How can we keep the quality of gIndex after many changes? Can we adjust gIndex according to the trend of recent queries?
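Id-list maintenance for a small number of updates can be sketched as below. The class IdListIndex and its contained hook, which stands in for the subgraph isomorphism tests that decide which indexed fragments a graph contains, are hypothetical.

```python
class IdListIndex:
    """Id lists of indexed fragments, updated in place on insert/delete.

    contained(graph) must return the indexed fragments the graph contains;
    it is a hypothetical hook standing in for subgraph isomorphism tests.
    """

    def __init__(self, fragments, contained):
        self.id_lists = {f: set() for f in fragments}
        self.contained = contained

    def insert(self, gid, graph):
        for f in self.contained(graph):
            if f in self.id_lists:
                self.id_lists[f].add(gid)

    def delete(self, gid, graph):
        for f in self.contained(graph):
            if f in self.id_lists:
                self.id_lists[f].discard(gid)

    def empty_fragments(self):
        """Fragments whose id list became empty (candidates for removal)."""
        return [f for f, ids in self.id_lists.items() if not ids]
```

The fragment set itself stays fixed here, which matches the slide's point that small updates touch only the id lists; rebuilding the fragment set after many changes is left open.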

56 Thank you for your attention! Questions?
