Data not in the pre-defined feature vectors that can be used to construct predictive models. Applications: Transactional database Sequence database Graph.

Presentation transcript:

Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree
Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure

Motivation

Some data are not in the pre-defined feature vectors that can be used to construct predictive models.
Applications: transactional databases, sequence databases, graph databases.
Frequent patterns are good candidates for discriminative features, especially for data with complicated structures.

Why Frequent Patterns?
A frequent pattern is a non-linear conjunctive combination of single features.
Frequent patterns increase the expressive and discriminative power of the feature space.
Example: the exclusive OR problem and its solution. The data are not linearly separable in (x, y). After mining the pattern xy and transforming the data (mapping the data to a higher-dimensional space), the data are linearly separable in (x, y, xy).

Conventional Frequent Pattern-Based Classification: Two-Step Batch Method
Basic flow:
1. Mine frequent patterns;
2. Select most discriminative patterns;
3. Represent data in the feature space using such patterns;
4. Build classification models.
[Figure: DataSet -> mine -> Frequent Patterns -> select -> Mined Discriminative Patterns (F1, F2, F4) -> represent -> any classifier you can name (ANN, DT, SVM, LR). The example classifier is a decision tree with splits Petal.Length < 2.45 and Petal.Width < 1.75 and leaves setosa, versicolor, virginica.]

Problems

Problems of Separated Mine & Select in Batch Method
1. Mine step: issues of scalability and combinatorial explosion.
   Dilemma of setting minsupport: promising discriminative candidate patterns, or a tremendous number of candidate patterns?
2. Select step: issue of discriminative power.

Proposed Algorithm

Direct Mining & Selection via Model-based Search Tree
Divide-and-conquer based frequent pattern mining:
1. Mine and select most discriminative patterns;
2. Represent data in the feature space using such patterns;
3. Build classification models.
[Figure: dataset -> mine & select (repeated on smaller partitions until few data remain) -> Mined Discriminative Patterns (F1, F2, F4) -> represent -> any classifier you can name (ANN, DT, SVM, LR), with the same example decision tree as above.]
The procedure can be used as a feature miner, or be itself a classifier.

Analyses:
1. Scalability of pattern enumeration (upper bound, scale-down ratio)
2. Bound on number of returned features
3. Subspace pattern selection
4. Non-overfitting
5. Optimality under exhaustive search

Experiments

Itemset Mining
5 datasets: UCI Machine Learning Repository.
Scalability Study
[Table: Datasets | #Pat using MbT sup | Ratio (MbT #Pat / #Pat using MbT sup). Rows: Adult, Chess, Hypo, Sick, Sonar. The numeric values are missing from the transcript; the Chess row reads "+ ~0%".]
Accuracy of Mined Itemsets

Graph Mining
11 datasets:
- 9 NCI anti-cancer screen datasets (PubChem Project). Positive class: 1% - 8.3%.
- 2 AIDS anti-viral screen datasets. URL: H1: 3.5%, H2: 1%.
Scalability Study
Predictive Quality of Mined Frequent Subgraphs (AUC of MbT, DT)
MbT vs. Benchmarks
Case Study

Take Home Message:
1. Highly compact and discriminative frequent patterns can be directly mined through Model-based Search Tree without worrying about combinatorial explosion.
2. Software and datasets are available by contacting the authors.
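
The exclusive OR example from the motivation can be checked numerically. The following is a minimal sketch, not the authors' code: the helper names (`add_pattern`, `separates`, `find_separator`) and the weight grid are assumptions made for illustration. It searches a small grid of linear decision rules and finds none that separates the XOR data in (x, y), but does find one after the conjunctive pattern xy is added as a third feature.

```python
from itertools import product

# XOR truth table: the class is 1 exactly when x and y differ.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def add_pattern(point):
    """Map (x, y) to (x, y, xy); xy plays the role of the mined pattern."""
    x, y = point
    return (x, y, x * y)


def separates(weights, bias, rows):
    """True if the rule `w . features + bias > 0` matches every label."""
    return all(
        (sum(w * v for w, v in zip(weights, feats)) + bias > 0) == bool(label)
        for feats, label in rows
    )


def find_separator(rows, grid):
    """Brute-force search over a grid of weights and biases (illustrative only)."""
    dim = len(rows[0][0])
    for *weights, bias in product(grid, repeat=dim + 1):
        if separates(weights, bias, rows):
            return tuple(weights), bias
    return None


# Hypothetical search grid: -3.0 to 3.0 in steps of 0.5.
GRID = [v / 2 for v in range(-6, 7)]

mapped = [(add_pattern(point), label) for point, label in DATA]

raw_result = find_separator(DATA, GRID)       # no linear rule in (x, y)
mapped_result = find_separator(mapped, GRID)  # a linear rule in (x, y, xy)

print("separator in (x, y):", raw_result)
print("separator in (x, y, xy):", mapped_result)
```

One separating rule in the mapped space is x + y - 2xy - 0.5 > 0, which is positive only for the two points where x and y differ. The grid search is a demonstration rather than a proof for the (x, y) case, although XOR is in fact not linearly separable there for any choice of weights.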

