© 2008 IBM Corporation Mining Significant Graph Patterns by Leap Search Xifeng Yan (IBM T. J. Watson) Hong Cheng, Jiawei Han (UIUC) Philip S. Yu (UIC)


1 © 2008 IBM Corporation Mining Significant Graph Patterns by Leap Search Xifeng Yan (IBM T. J. Watson) Hong Cheng, Jiawei Han (UIUC) Philip S. Yu (UIC)

2 IBM T. J. Watson Research Center Graph Pattern Mining | © 2008 IBM Corporation
Graph Patterns
Interestingness measures / objective functions:
- Frequency: frequent graph pattern
- Discriminative: information gain, Fisher score
- Significance: G-test
- …
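As an illustration of the significance measure named above, here is a minimal sketch of a G-test score for one pattern. It assumes the standard form of the G statistic, G = 2 Σ O ln(O/E), with the pattern's rate in the negative set used as the expected rate for the positive set. The function name `g_test`, its parameters, and the `eps` clamp are assumptions of this sketch, not something stated on the slide.

```python
import math


def g_test(p: float, q: float, m: int, eps: float = 1e-9) -> float:
    """G statistic for a pattern seen in a fraction p of m positive graphs,
    compared with an expected rate q taken from the negative graphs."""
    # Clamp both rates away from 0 and 1 so the logarithms stay finite
    # (an assumption of this sketch).
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return 2 * m * (p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q)))
```

The score is zero when the pattern is equally frequent in both sets and grows as the two frequencies move apart.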

3 Frequent Graph Pattern

4 Optimal Graph Pattern (this work)

5 Objective Functions
Challenge: not anti-monotonic

6 Challenge: Non Anti-Monotonic
Panels: Anti-Monotonic, Non-Monotonic
Non-monotonic: enumerate all subgraphs, then check their scores?
Enumerate subgraphs: small size to large size
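The difference between the two cases can be seen on a toy example. The sketch below uses itemsets as a stand-in for graphs (set inclusion plays the role of subgraph containment) and the frequency difference p - q as a stand-in score. The data, the names, and the choice of score are all assumptions made for illustration.

```python
from typing import FrozenSet, List


def frequency(pattern: FrozenSet[str], graphs: List[FrozenSet[str]]) -> float:
    """Fraction of graphs (here: item sets) that contain the pattern."""
    return sum(pattern <= g for g in graphs) / len(graphs)


# Hypothetical toy data: four positive and four negative "graphs".
POS = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("c")]
NEG = [frozenset("a"), frozenset("b"), frozenset("c"), frozenset("c")]

# A chain of growing patterns: a, then ab, then abc.
chain = [frozenset("a"), frozenset("ab"), frozenset("abc")]
pos_freq = [frequency(g, POS) for g in chain]
neg_freq = [frequency(g, NEG) for g in chain]
scores = [p - q for p, q in zip(pos_freq, neg_freq)]

print(pos_freq)  # frequency never increases along the chain
print(scores)    # the score rises, then falls
```

Frequency only shrinks as the pattern grows, so a frequency threshold can cut a branch safely. The score goes up and then down along the same chain, so a low score on a small pattern says nothing about its larger extensions.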

7 Frequent Pattern Based Mining Framework
Pipeline: Graph Database -> Frequent Patterns -> Optimal Patterns
Applications: exploratory task, graph clustering, graph classification, graph index
(SIGMOD’04, ’05) (ISMB’05, ’07)
1. Bottleneck: millions, even billions of patterns
2. No guarantee of quality

8 Direct Pattern Mining Framework
Pipeline: Graph Database -> Optimal Patterns (direct)
Applications: exploratory task, graph clustering, graph classification, graph index
How?

9 Upper-Bound

10 Upper-Bound: Anti-Monotonic (cont.)
Rule of thumb: if the difference between a graph pattern's frequency in the positive dataset and its frequency in the negative dataset increases, the pattern becomes more interesting.
We can recycle existing graph mining algorithms to accommodate non-monotonic functions.
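The slide's formula did not survive in this transcript, so the following is only a sketch of one well-known way to bound a score over all larger patterns. It assumes the score is information gain, which is convex in the pair (p, q) of positive and negative frequencies and zero when p = q. A larger pattern can only have lower frequencies, so under those assumptions its score cannot exceed the larger of F(p, 0) and F(0, q). All function and parameter names are hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """Entropy in bits of a two-outcome distribution (x, 1 - x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))


def info_gain(p: float, q: float, n_pos: int, n_neg: int) -> float:
    """Information gain of a pattern present in a fraction p of the n_pos
    positive graphs and a fraction q of the n_neg negative graphs."""
    n = n_pos + n_neg
    gain = binary_entropy(n_pos / n)
    with_pos, with_neg = p * n_pos, q * n_neg
    # Subtract the class entropy inside each split: pattern present, pattern absent.
    for pos, neg in ((with_pos, with_neg), (n_pos - with_pos, n_neg - with_neg)):
        total = pos + neg
        if total > 0:
            gain -= (total / n) * binary_entropy(pos / total)
    return gain


def upper_bound(p: float, q: float, n_pos: int, n_neg: int) -> float:
    """Bound on the gain of any pattern with frequencies p' <= p and q' <= q,
    assuming the convexity argument described in the lead-in."""
    return max(info_gain(p, 0.0, n_pos, n_neg), info_gain(0.0, q, n_pos, n_neg))
```

Because the bound depends only on the current frequencies, it can be evaluated inside an existing frequent-pattern miner without changing how patterns are enumerated.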

11 Vertical Pruning
Large <- small
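Vertical pruning can be sketched as branch-and-bound over a pattern search tree: a branch is cut when the bound for everything below it cannot beat the best score found so far. The sketch below is an illustration on itemsets rather than graphs, with the frequency difference p - q as the score. For that score the bound is simply p, since a larger pattern has p' <= p and q' >= 0. The data layout and the name `branch_and_bound` are assumptions.

```python
from typing import FrozenSet, List, Tuple

Graph = FrozenSet[str]  # stand-in: a "graph" is a set of items


def freq(pattern: Graph, graphs: List[Graph]) -> float:
    """Fraction of graphs containing the pattern."""
    return sum(pattern <= g for g in graphs) / len(graphs)


def branch_and_bound(pos: List[Graph], neg: List[Graph]) -> Tuple[float, Graph, int]:
    """Return (best score, best pattern, number of patterns visited)."""
    items = sorted(set().union(*pos, *neg))
    best_score, best_pattern, visited = float("-inf"), frozenset(), 0

    def grow(pattern: Graph, start: int) -> None:
        nonlocal best_score, best_pattern, visited
        for i in range(start, len(items)):
            child = pattern | {items[i]}
            p, q = freq(child, pos), freq(child, neg)
            visited += 1
            if p - q > best_score:
                best_score, best_pattern = p - q, child
            # Every extension of child scores at most p, so descend
            # only if that bound can still beat the current best.
            if p > best_score:
                grow(child, i + 1)

    grow(frozenset(), 0)
    return best_score, best_pattern, visited
```

On a toy dataset with items a, b, c there are seven non-empty patterns, and the pruning rule avoids visiting some of them while still returning the best one.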

12 Horizontal Pruning: Structural Proximity

13 Structural Proximity: Another Perspective
# of frequent patterns >> # of possible frequency pairs
Many patterns share the same score
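The counting argument can be made concrete on a toy dataset. The sketch below, again using itemsets as a stand-in for graphs, groups every pattern by its pair of (positive support, negative support). Any score that depends only on that pair is shared by all patterns in a group. The data and names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: three positive and two negative "graphs".
POS = [frozenset("abc"), frozenset("abc"), frozenset("ab")]
NEG = [frozenset("c"), frozenset("abc")]


def support(pattern, graphs):
    """Number of graphs containing the pattern."""
    return sum(pattern <= g for g in graphs)


items = sorted(set().union(*POS, *NEG))
by_pair = defaultdict(list)
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        pattern = frozenset(combo)
        by_pair[(support(pattern, POS), support(pattern, NEG))].append(pattern)

n_patterns = sum(len(group) for group in by_pair.values())
print(n_patterns, "patterns,", len(by_pair), "distinct frequency pairs")
```

The number of possible pairs is capped by (|POS| + 1) * (|NEG| + 1), while the number of patterns grows combinatorially with pattern size, which is why many patterns end up with identical scores.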

14 Frequency Envelope

15 Structural Leap Search
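The transcript keeps only the title of this slide, so the following is a loose sketch of the general idea rather than the authors' exact criterion: when two sibling branches are supported by almost the same graphs, their descendants are likely to score alike, and one branch can be skipped ("leaped over"). The normalized-difference measure, the threshold name `sigma`, and both function names are assumptions of this sketch.

```python
from typing import Set


def normalized_difference(a: Set[int], b: Set[int]) -> float:
    """Size of the symmetric difference of two supporting-graph ID sets,
    relative to their average size. Zero means identical support."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2 * len(a ^ b) / total


def can_leap(pos_a: Set[int], pos_b: Set[int],
             neg_a: Set[int], neg_b: Set[int], sigma: float) -> bool:
    """True if sibling branch b is close enough to an explored branch a,
    in both the positive and the negative dataset, to be skipped."""
    return (normalized_difference(pos_a, pos_b) <= sigma
            and normalized_difference(neg_a, neg_b) <= sigma)
```

A larger `sigma` skips more branches and runs faster, at the cost of possibly missing the optimal pattern. With `sigma = 0` only branches with identical support are skipped.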

16 Frequency Association
Significant patterns often fall into the high quantile of frequency
Start with the most frequent patterns

17 Descending Leap Mine
1. Structural Leap Search with frequency threshold
2. Support-Descending Mining (F(g*) converges)
3. Structural Leap Search
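The support-descending step can be sketched as a loop that lowers the frequency threshold until the best score stops improving. The slide does not say how the threshold is lowered or how convergence is tested. Halving the threshold and stopping at the first round without improvement are assumptions, as are the names `descending_search` and `mine_at` (a caller-supplied function returning the best score among patterns at or above a threshold).

```python
from typing import Callable, List, Tuple


def descending_search(mine_at: Callable[[float], float],
                      start: float = 1.0,
                      shrink: float = 0.5,
                      floor: float = 0.01) -> Tuple[float, List[Tuple[float, float]]]:
    """Lower the frequency threshold step by step until the best score converges.

    Returns the best score and the (threshold, score) history."""
    theta = start
    best = mine_at(theta)
    history = [(theta, best)]
    while theta * shrink >= floor:
        theta *= shrink
        score = mine_at(theta)
        history.append((theta, score))
        if score <= best:
            break  # no improvement at the lower threshold: treat as converged
        best = score
    return best, history
```

The converged score can then serve as the starting lower bound for a final search without a frequency threshold, so that pruning is already tight from the first pattern visited.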

18 Results: NCI Anti-Cancer Screen Datasets
Name       # of Compounds   Tumor Description
MCF-7      27,770           Breast
MOLT-4     39,765           Leukemia
NCI-H23    40,353           Non-Small Cell Lung
OVCAR-8    40,516           Ovarian
P388       41,472           Leukemia
PC-3       27,509           Prostate
SF-295     40,271           Central Nervous System
SN12C      40,004           Renal
SW-620     40,532           Colon
UACC257    39,988           Melanoma
YEAST      79,601           Yeast anti-cancer
Link: http://pubchem.ncbi.nlm.nih.gov
Chemical compounds: anti-cancer or not
# of vertices: 10 ~ 200

19 Efficiency
Panels: Vertical Pruning, Horizontal Pruning

20 Effectiveness (runtime)
frequency descending + leap mine

21 Effectiveness (accuracy)
slightly different

22 Graph Classification
Name            OA Kernel   LEAP   OA Kernel (6x)   LEAP (6x)
Average (AUC)   0.70        0.72   0.75             0.77
* OA Kernel: Optimal Assignment Kernel
  LEAP: LEAP search (6x)

23 Scalability Means Something!
Runtime chart. Series: LEAP, OA, LEAP (6x), OA (6x). Values shown: ~20 sec, ~100 sec, ~200 sec, ~8000 sec. Trend labels: Linear, Quadratic.

24 Direct Pattern Mining Framework
Pipeline: Graph Database -> Optimal Graph Patterns (direct)
Applications: exploratory task, graph clustering, graph classification, graph index

25 Beyond Graph Patterns
Pipeline: itemset/sequence/tree Database -> Optimal Patterns (direct)
Applications: exploratory task, clustering, classification, index
1. Direct mining can be applied to itemsets, sequences, and trees.
2. Existing algorithms can be recycled to mine patterns with sophisticated measures.
3. Pattern-based methods, including indexing and classification, are competitive.

26 Thank you
Direct Mining of Discriminative and Essential Graphical and Itemset Features via Model-based Search Tree, SIGKDD’08 @ Las Vegas

27 Graph Classification: Kernel Approach
- Kernel-based Graph Classification
- Optimal Assignment Kernel (Fröhlich et al. ICML’05)

