1 Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure. How to find good features from semi-structured raw data for classification.

2 Feature Construction. Most data mining and machine learning models assume the following structured data: (x1, x2, ..., xk) -> y, where the xi are independent variables and y is the dependent variable. If y is drawn from a discrete set, the task is classification; if y is drawn from a continuous range, it is regression. When the feature vectors are good, the differences in accuracy among learners are small. Question: where do good features come from?

3 Frequent Pattern-Based Feature Extraction. The data is not in pre-defined feature vectors: transactions, biological sequences, graph databases. Frequent patterns are good candidates for discriminative features. So, how do we mine them?

4 FP: Sub-graph. A discovered sub-graph pattern shared by compounds NSC 4960, NSC 191370, NSC 40773, NSC 164863, and NSC 699181 (example borrowed from George Karypis' presentation).

5 Frequent Pattern Feature Vector Representation. Each mined pattern becomes a binary feature:

          P1  P2  P3
Data 1     1   1   0
Data 2     1   0   1
Data 3     1   1   0
Data 4     0   0   1
...

Any classifier you can name (NN, DT, SVM, LR) can then be trained on these vectors. (Figure: an example decision tree on the iris data, splitting on Petal.Length < 2.45 and Petal.Width < 1.75 into setosa, versicolor, and virginica.) But mining these predictive features is an NP-hard problem: 100 examples can give up to 10^10 patterns, and most of them are useless.
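A minimal sketch of this representation (assuming scikit-learn is available; the transactions and patterns below are made up for illustration, not taken from the paper): each mined pattern becomes one binary column, and any off-the-shelf classifier can then be trained on the resulting matrix.

```python
# Minimal sketch (not the authors' code): turn mined patterns into binary
# features and train any off-the-shelf classifier on the resulting vectors.
from sklearn.tree import DecisionTreeClassifier

# Toy transactions and class labels (hypothetical data for illustration).
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "b"},
    {"c", "d"},
]
labels = [1, 1, 1, 0]

# Suppose these itemset patterns P1, P2, P3 were returned by a pattern miner.
patterns = [{"a"}, {"a", "b"}, {"c", "d"}]

def to_feature_vector(transaction, patterns):
    """1 if the transaction contains the pattern, 0 otherwise."""
    return [1 if p <= transaction else 0 for p in patterns]

X = [to_feature_vector(t, patterns) for t in transactions]

# Any classifier works on this representation; a decision tree is used here.
clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict([to_feature_vector({"a", "b", "d"}, patterns)]))
```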

6 Example. With 192 examples and 12% minimum support (at least 12% of the examples contain a pattern), itemset mining returns 8,600 patterns: 192 examples vs. 8,600 patterns? At 4% support it returns 92,000 patterns: 192 vs. 92,000?? Most of these patterns have no predictive power and cannot be used to construct features. Our algorithm finds only 20 highly predictive patterns, from which a decision tree with about 90% accuracy can be constructed.

7 Data in a Bad Feature Space. Discriminative patterns are non-linear combinations of single features; they increase the expressive and discriminative power of the feature space. An example: data with attributes X, Y and class C that is non-linearly separable in (x, y). (Table and plot residue: the (X, Y, C) example table and its scatter plot in the (x, y) plane.)

8 New Feature Space. Solve the problem by mapping the data to a different space: mine and transform. Add a new feature F defined by the itemset F: x=0, y=0 (equivalently the association rule x=0 => y=0). The data, non-linearly separable in (x, y), becomes linearly separable in (x, y, F). (Table and plot residue: the original (X, Y, C) table, the transformed (X, Y, F, C) table, and a plot of the new feature space.)
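An illustrative sketch of this mapping (hypothetical XOR-style data, assuming scikit-learn; the exact values in the slide's table may differ): a linear classifier cannot separate the data in (x, y), but it can once the itemset feature F (x=0 and y=0) is appended.

```python
# Illustrative sketch (hypothetical data, not the slide's exact table): a
# linear classifier fails on XOR-style data in (x, y) but succeeds once the
# itemset feature F = [x == 0 and y == 0] is appended.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [1, 0, 0, 1]  # non-linearly separable in (x, y)

def add_pattern_feature(row):
    # F fires when the itemset {x=0, y=0} is contained in the example.
    f = 1 if row[0] == 0 and row[1] == 0 else 0
    return row + [f]

X_new = [add_pattern_feature(list(r)) for r in X]

clf_xy = SVC(kernel="linear", C=1e6).fit(X, y)
clf_xyf = SVC(kernel="linear", C=1e6).fit(X_new, y)

print("train accuracy in (x, y):   ", clf_xy.score(X, y))      # < 1.0: XOR has no linear separator
print("train accuracy in (x, y, F):", clf_xyf.score(X_new, y))  # 1.0: separable once F is added
```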

9 Computational Issues. A pattern is measured by its frequency or support, e.g., frequent sub-graphs with support 10%, meaning 10% of the examples contain the pattern. Ordered enumeration: you cannot enumerate patterns at support = 10% without first enumerating all patterns with support > 10%. The problem is NP-hard, easily reaching 10^10 patterns for a realistic problem. Most patterns are non-discriminative, while low-support patterns can have high discriminative power. Bad! Random sampling of patterns does not work, since it is not exhaustive and most patterns are useless: randomly sampling patterns (or blindly enumerating without considering frequency) is useless. Sampling a small number of examples does not work either: if the sample covers only a subset of the vocabulary, the search is incomplete; if it covers the complete vocabulary, it won't help much but introduces a sample selection bias problem, in particular missing low-support but high-information-gain patterns.
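To make the explosion concrete, a standard counting argument (not a figure from the slides): over $k$ distinct items there are

$$2^k - 1$$

non-empty candidate itemsets, so even $k = 40$ already allows on the order of $10^{12}$ candidates; and because support is antimonotone, every sub-pattern of a pattern with 10% support has support at least 10% and is enumerated before it.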

10 Conventional Procedure: Feature Construction and Selection, a Two-Step Batch Method. 1. Mine frequent patterns (support > min_sup) from the dataset. 2. Select the most discriminative patterns. 3. Represent the data in the feature space defined by the selected patterns. 4. Build classification models: any classifier you can name (NN, DT, SVM, LR). (Figure residue: the mine-then-select flow from the full set of frequent patterns 1-7 down to the mined discriminative patterns 1, 2, 4; the binary feature table F1/F2/F4 over Data1-Data4; and the same iris decision tree as on slide 5.)
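A compact sketch of this conventional two-step procedure (toy data, a brute-force miner standing in for a real frequent-itemset algorithm, scikit-learn assumed): mine all patterns above a support threshold, rank them by information gain on the complete dataset, keep the top k, and train a classifier on the resulting binary features.

```python
# Minimal sketch of the two-step batch method (toy data, hypothetical helper
# names): (1) mine all frequent itemsets above a support threshold, (2) rank
# them by information gain on the *complete* dataset and keep the top k, then
# represent the data and train a classifier.
from itertools import combinations
from math import log2
from sklearn.tree import DecisionTreeClassifier

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "d"}, {"c", "d"}, {"a", "b", "d"}]
labels = [1, 1, 1, 0, 0, 1]

def mine_frequent_itemsets(transactions, min_sup):
    """Brute-force frequent itemset mining (fine for a toy vocabulary)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            sup = sum(1 for t in transactions if set(cand) <= t) / len(transactions)
            if sup >= min_sup:
                frequent.append(frozenset(cand))
    return frequent

def entropy(ys):
    probs = [ys.count(c) / len(ys) for c in set(ys)]
    return -sum(p * log2(p) for p in probs)

def info_gain(pattern, transactions, labels):
    """Information gain of the binary feature 'contains pattern' on all data."""
    has = [y for t, y in zip(transactions, labels) if pattern <= t]
    not_has = [y for t, y in zip(transactions, labels) if not pattern <= t]
    cond = sum(len(part) / len(labels) * entropy(part) for part in (has, not_has) if part)
    return entropy(labels) - cond

# Step 1: mine; Step 2: select top-k by information gain (batch, on all data).
patterns = mine_frequent_itemsets(transactions, min_sup=0.3)
top_k = sorted(patterns, key=lambda p: info_gain(p, transactions, labels), reverse=True)[:3]

# Steps 3-4: represent the data with the selected patterns and train a classifier.
X = [[1 if p <= t else 0 for p in top_k] for t in transactions]
clf = DecisionTreeClassifier().fit(X, labels)
print(top_k, clf.score(X, labels))
```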

11 Two Problems: the mine step suffers combinatorial explosion. 1. Exponential explosion of the frequent pattern space. 2. Patterns are not considered at all if min_support isn't set small enough. (Figure residue: the frequent patterns 1-7 mined from the dataset.)

12 Two Problems: the select step raises the issue of discriminative power. 3. Information gain is evaluated against the complete dataset, NOT on subsets of examples. 4. Correlation among patterns is not directly evaluated through their joint predictability. (Figure residue: selecting the discriminative patterns 1, 2, 4 from the frequent patterns 1-7.)
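A hypothetical numeric illustration of point 3 (numbers invented for this example, not from the paper): suppose $C_1 = C_0 = 50$ and a pattern $\alpha$ has $P_1 = P_0 = 5$. On the complete dataset $\alpha$ has zero information gain, because it is equally frequent in both classes. But on a subset of 10 positive and 10 negative examples in which the 5 occurrences of $\alpha$ are all positive,

$$IG_{\text{subset}}(\alpha) = H(\tfrac{1}{2}) - \left[\tfrac{5}{20}\,H(1) + \tfrac{15}{20}\,H(\tfrac{1}{3})\right] \approx 1 - 0.69 \approx 0.31 \text{ bits},$$

so batch selection on the complete dataset would discard $\alpha$ even though it is highly predictive within that subset.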

13 Direct Mining & Selection via Model-based Search Tree. Basic flow: divide-and-conquer based frequent pattern mining. At node 1, mine and select on the whole dataset with local support P = 20%, and keep the most discriminative feature F based on information gain; split the data on whether it contains F (Y / N branches). Recursively repeat at the child nodes (2, 3, 5, 6, 7, ...) on their own subsets of examples, stopping when a node has few data (e.g. node 4) or is pure (+ / -). The result is a compact set of highly discriminative patterns, and the tree is simultaneously a feature miner and a classifier. Because each node mines only on its own examples, the effective global support can be extremely low, e.g. 20% of 10 examples out of 10,000 gives 10 * 20% / 10,000 = 0.02%.
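A compact sketch of the flow described on this slide (my own paraphrase, not the authors' code; a brute-force itemset miner stands in for the pluggable pattern miner, and constants like MIN_NODE_SIZE are illustrative):

```python
# Sketch of the divide-and-conquer flow: at each node, mine frequent patterns
# on *that node's* examples, keep the single most discriminative one by
# information gain, split on it, and recurse until nodes are small or pure.
from itertools import combinations
from math import log2

MIN_NODE_SIZE = 2      # "few data" stopping condition (illustrative value)
LOCAL_SUPPORT = 0.2    # local support P, e.g. 20% as on the slide

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def mine_patterns(transactions, min_sup):
    items = sorted(set().union(*transactions))
    out = []
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            if sum(set(cand) <= t for t in transactions) >= min_sup * len(transactions):
                out.append(frozenset(cand))
    return out

def info_gain(pattern, transactions, labels):
    yes = [y for t, y in zip(transactions, labels) if pattern <= t]
    no = [y for t, y in zip(transactions, labels) if not pattern <= t]
    cond = sum(len(s) / len(labels) * entropy(s) for s in (yes, no) if s)
    return entropy(labels) - cond

def build_mbt(transactions, labels):
    """Returns a nested dict tree; leaves hold the majority class."""
    if len(labels) <= MIN_NODE_SIZE or len(set(labels)) == 1:
        return {"leaf": max(set(labels), key=labels.count)}
    patterns = mine_patterns(transactions, LOCAL_SUPPORT)   # mine on this node only
    if not patterns:
        return {"leaf": max(set(labels), key=labels.count)}
    best = max(patterns, key=lambda p: info_gain(p, transactions, labels))
    yes_idx = [i for i, t in enumerate(transactions) if best <= t]
    no_idx = [i for i, t in enumerate(transactions) if not best <= t]
    if not yes_idx or not no_idx:
        return {"leaf": max(set(labels), key=labels.count)}
    return {
        "pattern": best,                                     # selected feature F
        "yes": build_mbt([transactions[i] for i in yes_idx], [labels[i] for i in yes_idx]),
        "no":  build_mbt([transactions[i] for i in no_idx],  [labels[i] for i in no_idx]),
    }

tree = build_mbt([{"a", "b"}, {"a", "c"}, {"b", "c"}, {"c", "d"}, {"d"}], [1, 1, 1, 0, 0])
print(tree)
```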

14 Analyses (I). 1. Scalability (Theorem 1): an upper bound on the number of patterns examined, and the scale-down ratio needed to obtain extremely low-support patterns. 2. A bound on the number of returned features (Theorem 2). (The bound formulas appear as images in the original slide and are not reproduced here.)
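The scale-down effect follows from the global-support arithmetic already shown on slide 13: a local support of $p$ at a node holding $n$ of the $N$ training examples corresponds to a global support of

$$\mathrm{sup}_{\text{global}} = p \cdot \frac{n}{N}, \qquad \text{e.g. } 20\% \times \frac{10}{10{,}000} = 0.02\%,$$

so deep nodes can reach patterns whose global support would be far too low to enumerate with a batch miner.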

15 Analyses (II). 3. The subspace is important for discriminative patterns. Notation: C1 and C0 are the numbers of examples belonging to class 1 and class 0; P1 is the number of examples in C1 that contain a pattern α; P0 is the number of examples in C0 that contain the same pattern α. On the original (complete) set, α can have no information gain, yet subsets can still have information gain for it. 4. Non-overfitting. 5. Optimality under exhaustive search.
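Using the notation above (my reconstruction; the slide's own formulas appear as images), the pattern $\alpha$ defines a binary split of the data, and its information gain is

$$IG(C \mid \alpha) = H(C) - H(C \mid \alpha), \qquad H(C) = -\sum_{c \in \{0,1\}} \Pr(c)\,\log_2 \Pr(c).$$

On the complete dataset this is zero exactly when $\alpha$ is equally prevalent in both classes, i.e. when $P_1 / C_1 = P_0 / C_0$; restricted to a subset of examples, the same pattern can be unevenly distributed across the classes and therefore carry positive gain.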

16 Experimental Studies: Itemset Mining (I). Scalability comparison.

Dataset   #Pat using MbT sup   Ratio (MbT #Pat / #Pat using MbT sup)
Adult     252809               0.41%
Chess     +                    ~0%
Hypo      423439               0.0035%
Sick      4818391              0.00032%
Sonar     95507                0.00775%

(Figure residue: the MbT flow diagram from slide 13, repeated twice on this slide.)

17 Experimental Studies: Itemset Mining (II). Accuracy of mined itemsets: 4 wins, 1 loss, with a much smaller number of patterns.

18 Experimental Studies: Itemset Mining (III) Convergence

19 Experimental Studies: Graph Mining (I). 9 NCI anti-cancer screen datasets (The PubChem Project, pubchem.ncbi.nlm.nih.gov); the active (positive) class is around 1% - 8.3% of each dataset. 2 AIDS anti-viral screen datasets (http://dtp.nci.nih.gov); H1: CM+CA, 3.5%; H2: CA, 1%.

20 Experimental Studies: Graph Mining (II). Scalability. (Figure residue: the MbT flow diagram from slide 13, repeated twice on this slide.)

21 Experimental Studies: Graph Mining (III). AUC and accuracy: 11 wins on AUC; 10 wins and 1 loss on accuracy.

22 Experimental Studies: Graph Mining (IV). AUC of MbT and DT; MbT vs. benchmarks: 7 wins, 4 losses.

23 Summary. Model-based search tree: integrated feature mining and construction; dynamic support, so it can mine patterns with extremely small support; it is both a feature constructor and a classifier; and it is not limited to one type of frequent pattern, the pattern miner is plug-and-play. Experimental results cover itemset mining and graph mining. Software and datasets available from: www.cs.columbia.edu/~wfan


