Published by Gavin Knight. Modified over 11 years ago.
Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree
Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure

Poster sections: Motivation, Problems, Proposed Algorithm, Experiments, Case Study

Motivation:
Data are often not given as pre-defined feature vectors that can be used to construct predictive models.
Applications:
- Transactional database
- Sequence database
- Graph database
Frequent patterns are good candidates for discriminative features, especially for data with complicated structure.

Why Frequent Patterns?
- A frequent pattern is a non-linear conjunctive combination of single features.
- Frequent patterns increase the expressive and discriminative power of the feature space.

Example: Exclusive OR problem & solution

  X  Y  C
  0  0  0
  0  1  1
  1  0  1
  1  1  0

[Figure: the four points plotted on axes x and y, with candidate lines L1 and L2]
Data is not linearly separable in (x, y).

Mine & transform (map the data to a higher-dimensional space):

  X  Y  XY  C
  0  0  0   0
  0  1  0   1
  1  0  0   1
  1  1  1   0

Data is linearly separable in (x, y, xy).

Conventional Frequent Pattern-Based Classification: Two-Step Batch Method
1. Mine frequent patterns;
2. Select the most discriminative patterns;
3. Represent data in the feature space using such patterns;
4. Build classification models.

Basic Flows:
DataSet -> (mine) -> Frequent Patterns 1, 2, 3, 4, 5, 6, 7 -> (select) -> Mined Discriminative Patterns 1, 2, 4 -> (represent) -> feature table -> classifier

         F1  F2  F4
  Data1  1   1   0
  Data2  1   0   1
  Data3  1   1   0
  Data4  0   0   1
  ...

[Figure: decision tree splitting on Petal.Length < 2.45 and Petal.Width < 1.75, with leaves setosa, versicolor, virginica]
Any classifier you can name: ANN, DT, SVM, LR.

Problems of Separated Mine & Select in Batch Method
1. Mine step: issues of scalability and combinatorial explosion
   - Dilemma of setting the minimum support (minsupport)
   - Promising discriminative candidate patterns?
   - Tremendous number of candidate patterns?
2. Select step: issue of discriminative power

Proposed Algorithm: Direct Mining & Selection via Model-based Search Tree (MbT)
Divide-and-Conquer Based Frequent Pattern Mining
[Figure: search tree over the dataset with nodes 1 to 7; each node applies "mine & select", and branches end at leaves labeled "Few Data"; the selected patterns 1 to 7 form the Mined Discriminative Patterns]
1. Mine and select the most discriminative patterns;
2. Represent data in the feature space using such patterns;
3. Build classification models.

         F1  F2  F4
  Data1  1   1   0
  Data2  1   0   1
  Data3  1   1   0
  Data4  0   0   1
  ...

[Figure: decision tree splitting on Petal.Length < 2.45 and Petal.Width < 1.75, with leaves setosa, versicolor, virginica]
Any classifier you can name: ANN, DT, SVM, LR.

The procedure can serve as a feature miner, or be used itself as a classifier.

Analyses:
1. Scalability of pattern enumeration
   - Upper bound
   - Scale-down ratio
2. Bound on the number of returned features
3. Subspace pattern selection
4. Non-overfitting
5. Optimality under exhaustive search

Experiments

Itemset Mining
5 datasets: UCI Machine Learning Repository
Scalability Study:

  Dataset  #Pat using MbT sup  Ratio (MbT #Pat / #Pat using MbT sup)
  Adult    252809              0.41%
  Chess    +                   ~0%
  Hypo     423439              0.0035%
  Sick     4818391             0.00032%
  Sonar    95507               0.00775%

[Figure: Accuracy of Mined Itemsets]

Graph Mining
11 datasets:
- 9 NCI anti-cancer screen datasets (PubChem Project). Positive class: 1% - 8.3%
- 2 AIDS anti-viral screen datasets (URL: http://dtp.nci.nih.gov). H1: 3.5%, H2: 1%
[Figure: Scalability Study]
[Figure: Predictive Quality of Mined Frequent Subgraphs; AUC of MbT and DT; MbT vs. benchmarks]

Take Home Message:
1. Highly compact and discriminative frequent patterns can be mined directly through the Model-based Search Tree without worrying about combinatorial explosion.
2. Software and datasets are available by contacting the authors.
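The exclusive-OR example from the motivation can be checked with a short Python sketch. It is an illustration only and is not taken from the authors' software: the helper names (`add_pattern`, `separates`, `find_separator`) and the weight grid are assumptions made here. The sketch searches a small grid of weights for a linear rule `w . f + b > 0` that reproduces class C, first on (x, y) and then on (x, y, xy). A failed grid search does not by itself prove non-separability, but XOR is known not to be linearly separable in (x, y), so the first search is expected to find nothing.

```python
from itertools import product

# XOR truth table from the slide: ((x, y), class C)
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def add_pattern(point):
    """Append the conjunctive pattern feature xy to a point (x, y)."""
    x, y = point
    return (x, y, x * y)


def separates(weights, bias, rows):
    """Return True if sign(w . f + b) matches the class label on every row."""
    for features, label in rows:
        score = sum(w * f for w, f in zip(weights, features)) + bias
        if (score > 0) != (label == 1):
            return False
    return True


def find_separator(rows, grid):
    """Try every (weights, bias) combination on the grid (hypothetical helper)."""
    dim = len(rows[0][0])
    for weights in product(grid, repeat=dim):
        for bias in grid:
            if separates(weights, bias, rows):
                return weights, bias
    return None


# Assumed search grid: -3.0 to 3.0 in steps of 0.5
GRID = [i / 2 for i in range(-6, 7)]

mapped_rows = [(add_pattern(p), c) for p, c in XOR]

raw = find_separator(XOR, GRID)             # search in (x, y)
mapped = find_separator(mapped_rows, GRID)  # search in (x, y, xy)

print("separator in (x, y):    ", raw)
print("separator in (x, y, xy):", mapped)
```

One separating rule in the mapped space is x + y - 2xy - 0.5 > 0, which can be confirmed with `separates((1, 1, -2), -0.5, mapped_rows)`.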