Stefan Mutter, Mark Hall, Eibe Frank. University of Freiburg, Germany; University of Waikato, New Zealand. The 17th Australian Joint Conference on Artificial Intelligence.

1  Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
Stefan Mutter, Mark Hall, Eibe Frank
University of Freiburg, Germany; University of Waikato, New Zealand
The 17th Australian Joint Conference on Artificial Intelligence, Cairns

2  Idea: Motivation
Previous work:
–Association rule mining: run time used to compare mining algorithms; lack of accuracy-based comparisons
–Associative classification: focus on accurate classifiers
Think backwards:
–Use the resulting classifiers as the basis for comparing confidence-based rule miners
Side effect: comparison of a standard associative classifier to standard techniques

3  Overview
Motivation
Basics:
–Definitions
–Associative classification
(Class) Association Rule Mining:
–Apriori vs. predictive Apriori (by Scheffer)
–Pruning
Classification
Quality measures and experiments
Results
Conclusions
Key idea: evaluate the sort order of rules using properties of associative classifiers

4  Basics: Definitions
A table over n attributes; an item is an attribute-value pair
Class association rule: an implication X → Y where the head Y contains the class attribute; X is the body of the rule, Y is the head of the rule
Confidence of a (class) association rule: conf(X → Y) = s(X ∪ Y) / s(X), where the support s(X) is the number of database records that satisfy X
Confidence is the relative frequency of a correct prediction in the (training) table of instances
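The definitions above can be sketched in code. The record representation (item sets as frozensets of attribute=value strings) and the helper names `support` and `confidence` are illustrative choices, not from the paper:

```python
def support(itemset, records):
    """Absolute support s(X): number of records that contain all items of X."""
    return sum(1 for r in records if itemset <= r)

def confidence(body, head, records):
    """Confidence of the rule body -> head: s(body ∪ head) / s(body)."""
    s_body = support(body, records)
    if s_body == 0:
        return 0.0
    return support(body | head, records) / s_body

# Toy training table: each record is a set of attribute=value items.
records = [
    frozenset({"outlook=sunny", "windy=false", "play=yes"}),
    frozenset({"outlook=sunny", "windy=true",  "play=no"}),
    frozenset({"outlook=rainy", "windy=false", "play=yes"}),
]
# Class association rule {outlook=sunny} -> {play=yes}:
# body covers 2 records, body ∪ head covers 1, so confidence is 0.5.
conf = confidence(frozenset({"outlook=sunny"}), frozenset({"play=yes"}), records)
```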

5  Basics: Mining – Apriori
Generates all (class) association rules with support and confidence larger than predefined values:
1. Mine all item sets above minimum support (frequent item sets)
2. Divide each frequent item set into rule body and head; check whether the confidence of the rule is above minimum confidence
Rules are sorted according to confidence
Adaptations to mine class association rules, as described by Liu et al. (CBA):
–Divide the training set into subsets, one for each class
–Mine frequent item sets separately in each subset
–Take the frequent item set as body and the class label as head
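The CBA-style adaptation above can be sketched as follows. For brevity this illustrative sketch enumerates candidate item sets only up to size 2 rather than running full levelwise Apriori, and all names and the toy data are assumptions:

```python
from itertools import combinations

def frequent_itemsets(records, min_support):
    """Enumerate item sets (up to size 2 here, for brevity) meeting min_support."""
    items = sorted({i for r in records for i in r})
    frequent = []
    for size in (1, 2):
        for cand in combinations(items, size):
            cand = frozenset(cand)
            if sum(1 for r in records if cand <= r) >= min_support:
                frequent.append(cand)
    return frequent

def mine_class_rules(records, labels, min_support):
    """CBA-style adaptation: split the data by class, mine frequent item
    sets separately in each subset, and emit (body, class-label) rules."""
    rules = []
    for cls in sorted(set(labels)):
        subset = [r for r, l in zip(records, labels) if l == cls]
        for body in frequent_itemsets(subset, min_support):
            rules.append((body, cls))
    return rules

records = [frozenset({"a", "b"}), frozenset({"a"}), frozenset({"b"})]
labels = ["yes", "yes", "no"]
# Only {a} is frequent (support 2) within the "yes" subset.
rules = mine_class_rules(records, labels, min_support=2)
```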

6  Basics: Mining – predictive Apriori
Predictive accuracy of a rule r: a support-based correction of the confidence value
Rules are sorted according to expected predictive accuracy
Inherent pruning strategy – output the best n rules according to:
1. Expected predictive accuracy among the n best
2. Rule not subsumed by a rule with at least the same expected predictive accuracy (prefers more general rules)
Adaptations to mine class association rules:
–Generate frequent item sets from all the data (class attribute deleted) as rule bodies
–Generate a rule for each class label
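Scheffer's expected predictive accuracy is a Bayesian correction of the confidence. As a simple illustrative stand-in (not Scheffer's actual formula), a Laplace correction shows the same qualitative effect: the raw confidence of a low-support rule is shrunk toward the class prior, while a high-support rule keeps almost its full confidence:

```python
def laplace_accuracy(correct, covered, num_classes=2):
    """Laplace-corrected accuracy estimate for a rule that covers `covered`
    training instances and predicts `correct` of them correctly. This is a
    simple stand-in for a support-based correction, not Scheffer's exact
    Bayesian expected predictive accuracy."""
    return (correct + 1) / (covered + num_classes)

# A rule covering 2 instances, both correct: raw confidence 1.0, but the
# corrected estimate is only 0.75.
low_support = laplace_accuracy(2, 2)
# A rule covering 100 instances, all correct: corrected estimate stays
# close to 1.0.
high_support = laplace_accuracy(100, 100)
```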

7  Basics: Pruning
The number of rules is too big for direct use in a classifier
Simple strategy:
–Bound the number of rules
–The sort order of the mining algorithm remains
CBA, optional pruning step – pessimistic error-rate-based pruning:
–A rule is pruned if removing a single item from the rule results in a reduction of the pessimistic error rate
CBA, obligatory pruning – database coverage method:
–A rule that classifies at least one instance correctly (and is the highest ranked to do so) belongs to the intermediate classifier
–Delete all covered instances
–Take the intermediate classifier with the lowest number of errors
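The database coverage step can be sketched as below. This is a simplified sketch: it omits CBA's default class and the final selection of the intermediate classifier with the lowest total error, and the rule/instance representation is an assumption:

```python
def database_coverage(rules, instances, labels):
    """Simplified database coverage pruning: walk the ranked rule list
    best-first; keep a rule if it covers at least one remaining instance
    and classifies one of them correctly, then delete all instances it
    covers."""
    remaining = list(zip(instances, labels))
    kept = []
    for body, cls in rules:  # rules assumed sorted best-first
        covered = [(x, y) for x, y in remaining if body <= x]
        if any(y == cls for x, y in covered):
            kept.append((body, cls))
            remaining = [(x, y) for x, y in remaining if not body <= x]
        if not remaining:
            break
    return kept

rules = [
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "yes"),   # covers {b} but misclassifies it: dropped
    (frozenset({"b"}), "no"),
]
instances = [frozenset({"a"}), frozenset({"b"})]
labels = ["yes", "no"]
pruned = database_coverage(rules, instances, labels)
```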

8  Overview
Motivation · Basics · (Class) Association Rule Mining · Classification · Quality measures · Results · Conclusions
Think backwards: use the properties of different classifiers to obtain accuracy-based measures for a set of (class) association rules

9  Classification
Input: a pruned, sorted list of class association rules
Two different approaches:
–Weighted vote algorithm: majority vote, or inversely weighted (by rank)
–Decision list classifier, e.g. CBA: use the first rule that covers the test instance for classification
Think backwards: a mining algorithm is preferable if the resulting classifier is more accurate, more compact, and built in an efficient way
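The two approaches above can be sketched as follows. The rule representation (body as a frozenset, list pre-sorted best-first) and the function names are illustrative assumptions:

```python
def decision_list_predict(rules, instance, default):
    """Decision list (CBA-style): use the first, i.e. highest-ranked, rule
    whose body covers the test instance; fall back to a default class."""
    for body, cls in rules:
        if body <= instance:
            return cls
    return default

def weighted_vote_predict(rules, instance, default):
    """Inversely weighted vote: the rule at rank i (1-based) contributes
    weight 1/i to its class, emphasising top-ranked rules."""
    votes = {}
    for i, (body, cls) in enumerate(rules, start=1):
        if body <= instance:
            votes[cls] = votes.get(cls, 0.0) + 1.0 / i
    if not votes:
        return default
    return max(votes, key=votes.get)

rules = [
    (frozenset({"a"}), "yes"),
    (frozenset({"b"}), "no"),
    (frozenset({"b"}), "yes"),
]
# For instance {b}: the decision list fires the rank-2 rule ("no");
# the weighted vote gives "no" 1/2 vs. "yes" 1/3, so also "no".
dl = decision_list_predict(rules, frozenset({"b"}), "yes")
wv = weighted_vote_predict(rules, frozenset({"b"}), "yes")
```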

10  Quality Measures and Experiments
Measures to evaluate confidence-based mining algorithms:
1. Accuracy on a test set (2 slides)
2. Average rank of the first rule that covers and correctly predicts a test instance
3. Number of mined rules and number of rules after pruning
4. Time required for mining and for pruning
Comparative study for Apriori and predictive Apriori:
–12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris, labor, led7, lenses, pima, tic-tac-toe, wine
–One 10-fold cross-validation
–Discretisation

11  1a. Accuracy and Ranking
Inversely weighted voting emphasises top-ranked rules
Shows the importance of a good rule ranking
A mining algorithm is preferable if the resulting classifier is more accurate

12  1b. How many rules are necessary to be accurate?
Majority vote classifier
Similar results for CBA
A mining algorithm is preferable if the resulting classifier is more compact

13  Comparison: CBA to Standard Techniques
Three result tables on the ecoli, labor and pima datasets (cell values not preserved in this transcript):
–Rules after mining: Apriori vs. predictive Apriori
–Rules after pruning: CBA + Apriori, C4.5, JRip, PART, CBA + predictive Apriori
–Accuracy: CBA + Apriori, C4.5, JRip, PART, CBA + predictive Apriori

14  Conclusions
Use classification to evaluate the quality of confidence-based association rule miners
Test evaluation:
–Predictive Apriori mines a higher-quality set of rules
–Predictive Apriori needs fewer rules
–But: predictive Apriori is slower than Apriori
Comparison of a standard associative classifier (CBA) to standard ML techniques:
–CBA achieves accuracy comparable to standard techniques
–CBA mines more rules and is slower
All algorithms are implemented in WEKA or in an add-on to WEKA available from

15  The End
Thank you for your attention. Questions...
Contact: