Stefan Mutter, Mark Hall, Eibe Frank University of Freiburg, Germany University of Waikato, New Zealand The 17th Australian Joint Conference on Artificial.

Using Classification to Evaluate the Output of Confidence-Based Association Rule Mining
Stefan Mutter, Mark Hall, Eibe Frank
University of Freiburg, Germany; University of Waikato, New Zealand
The 17th Australian Joint Conference on Artificial Intelligence, Cairns, 6-10 December 2004

Slide 2. Idea: Motivation
Previous work:
– Association rule mining: run time is used to compare mining algorithms; accuracy-based comparisons are lacking
– Associative classification: focus on building accurate classifiers
Think backwards:
– Use the resulting classifiers as the basis for comparing confidence-based rule miners
Side effect: a comparison of a standard associative classifier with standard techniques

Slide 3. Overview
Motivation
Basics:
– Definitions
– Associative classification
(Class) association rule mining:
– Apriori vs. predictive Apriori (by Scheffer)
Pruning
Classification
Quality measures and experiments
Results
Conclusions
Aim: evaluate the sort order of rules using properties of associative classifiers.

Slide 4. Basics: Definitions
– A table over n attributes; an item is an attribute-value pair.
– Class association rule: an implication $X \Rightarrow Y$ where $Y$ is a value of the class attribute; $X$ is the body of the rule and $Y$ its head.
– Support $s(X)$: the number of database records that satisfy $X$.
– Confidence of a (class) association rule: $c(X \Rightarrow Y) = s(X \cup Y) / s(X)$, i.e. the relative frequency of a correct prediction in the (training) table of instances.
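As a minimal sketch of these definitions, support and confidence can be computed by counting the records that contain an item set. The representation used here (each record as a Python frozenset of (attribute, value) pairs, including the class item) is an assumption for illustration, not taken from the talk.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: an item is an (attribute, value) pair,
# and a record (row of the table) is the frozenset of its items.
Item = Tuple[str, str]
Record = FrozenSet[Item]


def support(itemset: FrozenSet[Item], table: List[Record]) -> int:
    """s(X): number of records that satisfy (contain every item of) X."""
    return sum(1 for record in table if itemset <= record)


def confidence(body: FrozenSet[Item], head: Item, table: List[Record]) -> float:
    """c(X => Y) = s(X u {Y}) / s(X); returns 0.0 if X never occurs."""
    s_body = support(body, table)
    if s_body == 0:
        return 0.0
    return support(body | {head}, table) / s_body


if __name__ == "__main__":
    table = [
        frozenset({("outlook", "sunny"), ("windy", "no"), ("class", "play")}),
        frozenset({("outlook", "sunny"), ("windy", "yes"), ("class", "stay")}),
        frozenset({("outlook", "sunny"), ("windy", "no"), ("class", "play")}),
        frozenset({("outlook", "rainy"), ("windy", "no"), ("class", "stay")}),
    ]
    body = frozenset({("outlook", "sunny")})
    print(support(body, table))                        # 3
    print(confidence(body, ("class", "play"), table))  # 2 of the 3 sunny records are "play"
```

Here the rule outlook=sunny ⇒ class=play has support 2 and confidence 2/3 on the toy table.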

Slide 5. Basics: Mining with Apriori
Apriori generates all (class) association rules whose support and confidence are above predefined minimum values:
1. Mine all item sets above the minimum support (frequent item sets).
2. Divide each frequent item set into rule body and head, and check whether the confidence of the rule is above the minimum confidence.
Rules are sorted by confidence.
Adaptations to mine class association rules, as described by Liu et al. (CBA):
– Divide the training set into subsets, one for each class.
– Mine frequent item sets separately in each subset.
– Take each frequent item set as the body and the class label as the head.
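The CBA-style adaptation described above might be sketched as follows. This is a simplified illustration, not the WEKA implementation: the function names are hypothetical, `min_count` is an absolute minimum support count, and breaking confidence ties by support is an assumption.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]          # hypothetical (attribute, value) representation
ItemSet = FrozenSet[Item]


def frequent_itemsets(records: List[ItemSet], min_count: int) -> Dict[ItemSet, int]:
    """Level-wise Apriori: all item sets contained in at least min_count records."""
    counts: Dict[ItemSet, int] = defaultdict(int)
    for record in records:
        for item in record:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)
    k = 2
    while current:
        # Join: unions of two frequent (k-1)-sets that contain exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates if all(c - {i} in current for i in c)}
        current = {}
        for cand in candidates:
            n = sum(1 for record in records if cand <= record)
            if n >= min_count:
                current[cand] = n
        result.update(current)
        k += 1
    return result


def class_association_rules(records: List[ItemSet], class_attr: str, min_count: int,
                            min_conf: float) -> List[Tuple[ItemSet, Item, int, float]]:
    """CBA-style adaptation: mine frequent item sets separately in each class subset."""
    subsets: Dict[Item, List[ItemSet]] = defaultdict(list)
    bodies: List[ItemSet] = []  # all records with the class item removed
    for record in records:
        label = next(item for item in record if item[0] == class_attr)
        body = record - {label}
        subsets[label].append(body)
        bodies.append(body)
    rules = []
    for label, subset in subsets.items():
        for body, rule_count in frequent_itemsets(subset, min_count).items():
            # rule_count is s(X u {Y}); s(X) is counted over the whole training set.
            body_count = sum(1 for b in bodies if body <= b)
            conf = rule_count / body_count
            if conf >= min_conf:
                rules.append((body, label, rule_count, conf))
    # Sort by confidence (descending); ties broken by support (assumption).
    rules.sort(key=lambda r: (-r[3], -r[2]))
    return rules
```

Counting each class subset separately gives s(X ∪ Y) directly, while s(X) still has to be counted over the whole training set to obtain the confidence.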

Slide 6. Basics: Mining with predictive Apriori
– Predictive accuracy of a rule r: a support-based correction of the confidence value.
– Inherent pruning strategy: output the best n rules according to:
1. their expected predictive accuracy (among the n best);
2. the rule is not subsumed by a rule with at least the same expected predictive accuracy (this prefers more general rules).
– Rules are sorted by expected predictive accuracy.
Adaptations to mine class association rules:
– Generate frequent item sets from all data (with the class attribute deleted) as rule bodies.
– Generate a rule for each class label.
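Scheffer's expected predictive accuracy is not reproduced here. The sketch below assumes it has already been computed for each candidate rule (the `exp_acc` field is hypothetical) and shows only the selection step: drop every rule that is subsumed by a more general rule with the same head and at least the same expected accuracy, then keep the n best. In predictive Apriori this selection is interleaved with the search, so this post-hoc filter is a simplification.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

Item = Tuple[str, str]  # hypothetical (attribute, value) representation


class Rule(NamedTuple):
    body: FrozenSet[Item]
    head: Item
    exp_acc: float  # expected predictive accuracy, assumed to be precomputed


def is_subsumed(rule: Rule, other: Rule) -> bool:
    """True if `other` has the same head, a strictly more general body and
    at least the same expected predictive accuracy as `rule`."""
    return (other.head == rule.head
            and other.body < rule.body
            and other.exp_acc >= rule.exp_acc)


def best_n_rules(candidates: List[Rule], n: int) -> List[Rule]:
    """Drop subsumed rules, then return the n best by expected predictive accuracy."""
    kept = [r for r in candidates
            if not any(is_subsumed(r, other) for other in candidates)]
    kept.sort(key=lambda r: r.exp_acc, reverse=True)
    return kept[:n]
```

A specialised rule survives only if it is strictly more accurate than every more general rule with the same head, which is why the strategy prefers general rules.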

Slide 7. Basics: Pruning
The number of mined rules is too large for direct use in a classifier.
– Simple strategy: bound the number of rules; the sort order of the mining algorithm is kept.
– CBA, optional pruning step (pessimistic error-rate-based pruning): a rule is pruned if removing a single item from it reduces the pessimistic error rate.
– CBA, obligatory pruning step (database coverage method):
1. Going through the rules in rank order, a rule that correctly classifies at least one remaining instance becomes part of the intermediate classifier.
2. Delete all instances it covers.
3. Take the intermediate classifier with the lowest number of errors.
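A simplified sketch of the database coverage method, assuming rules arrive in the miner's sort order and that each intermediate classifier's default class is the majority class of the still-uncovered instances (an assumption, not stated on the slide). The types and function name are hypothetical, and the optional pessimistic error-rate pruning is not shown.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

Item = Tuple[str, str]
Rule = Tuple[FrozenSet[Item], str]       # (body, predicted class label)
Instance = Tuple[FrozenSet[Item], str]   # (items, true class label)


def database_coverage(rules: List[Rule], data: List[Instance]) -> Tuple[List[Rule], str]:
    """Return the rules and default class of the intermediate classifier with fewest errors."""
    remaining = list(data)
    selected: List[Rule] = []
    rule_errors = 0  # errors of the selected rules on the instances they cover
    best: Optional[Tuple[int, int, str]] = None  # (total errors, number of rules, default)
    for body, label in rules:  # rules in the miner's sort order
        covered = [inst for inst in remaining if body <= inst[0]]
        if not any(cls == label for _, cls in covered):
            continue  # rule classifies no remaining instance correctly
        selected.append((body, label))
        rule_errors += sum(1 for _, cls in covered if cls != label)
        remaining = [inst for inst in remaining if not (body <= inst[0])]
        # Default class: majority class of the uncovered instances (or of all data).
        pool = remaining if remaining else data
        default = Counter(cls for _, cls in pool).most_common(1)[0][0]
        total = rule_errors + sum(1 for _, cls in remaining if cls != default)
        if best is None or total < best[0]:
            best = (total, len(selected), default)
        if not remaining:
            break
    if best is None:  # no useful rule: empty classifier with a global default class
        return [], Counter(cls for _, cls in data).most_common(1)[0][0]
    return selected[:best[1]], best[2]
```

Keeping only the prefix of selected rules with the lowest total error is what makes this step shrink the rule list.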

Slide 8. Overview
Motivation
Basics:
– Definitions
– Associative classification
(Class) association rule mining:
– Apriori vs. predictive Apriori (by Scheffer)
Pruning
Classification
Quality measures
Results
Conclusions
Think backwards: use the properties of different classifiers to obtain accuracy-based measures for a set of (class) association rules.

Slide 9. Classification
Input: a pruned, sorted list of class association rules.
Two different approaches:
– Weighted vote algorithm: majority vote or inversely weighted vote
– Decision list classifier, e.g. CBA: use the first rule that covers the test instance for classification
Think backwards: a mining algorithm is preferable if the resulting classifier is more accurate, more compact, and built more efficiently.
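The prediction strategies can be sketched over the same sorted rule list. The function names are hypothetical, and the 1/rank weight for the inversely weighted vote is an assumption about how the weighting works; the slide only names the method.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, List, Tuple

Item = Tuple[str, str]              # hypothetical (attribute, value) representation
Rule = Tuple[FrozenSet[Item], str]  # (body, class label); list is sorted best-first


def decision_list(rules: List[Rule], instance: FrozenSet[Item], default: str) -> str:
    """CBA-style: predict with the first (highest-ranked) rule that covers the instance."""
    for body, label in rules:
        if body <= instance:
            return label
    return default


def weighted_vote(rules: List[Rule], instance: FrozenSet[Item], default: str,
                  inverse: bool = False) -> str:
    """Each covering rule votes for its class with weight 1 (majority vote)
    or 1/rank (inversely weighted vote, an assumed weighting)."""
    scores: DefaultDict[str, float] = defaultdict(float)
    for rank, (body, label) in enumerate(rules, start=1):
        if body <= instance:
            scores[label] += 1.0 / rank if inverse else 1.0
    if not scores:
        return default
    return max(scores, key=scores.get)
```

With rules ranked yes, no, no and all three covering an instance, the majority vote predicts "no" (2 votes to 1), while the 1/rank weighting predicts "yes" (1.0 against 0.5 + 0.33). This is why the inversely weighted classifier is sensitive to how well the miner ranks its rules.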

Slide 10. Quality measures and experiments
Measures to evaluate confidence-based mining algorithms:
1. Accuracy on a test set (slides 11 and 12)
2. Average rank of the first rule that covers and correctly predicts a test instance
3. Number of mined rules and number of rules after pruning
4. Time required for mining and for pruning
Comparative study of Apriori and predictive Apriori:
– 12 UCI datasets: balance, breast-w, ecoli, glass, heart-h, iris, labor, led7, lenses, pima, tic-tac-toe, wine
– One run of 10-fold cross-validation
– Discretisation of numeric attributes
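Measure 2 could be computed roughly as below. The function name is hypothetical, and skipping test instances that no rule both covers and predicts correctly is an assumption, since the slide does not say how such instances are handled.

```python
from typing import FrozenSet, List, Tuple

Item = Tuple[str, str]
Rule = Tuple[FrozenSet[Item], str]       # (body, class label), sorted best-first
Instance = Tuple[FrozenSet[Item], str]   # (items, true class label)


def average_first_correct_rank(rules: List[Rule], test_set: List[Instance]) -> float:
    """Mean 1-based rank of the first rule that covers and correctly predicts
    each test instance; instances without such a rule are skipped (assumption)."""
    ranks: List[int] = []
    for items, true_label in test_set:
        for rank, (body, label) in enumerate(rules, start=1):
            if body <= items and label == true_label:
                ranks.append(rank)
                break
    return sum(ranks) / len(ranks) if ranks else float("nan")
```

A lower average rank means that correct rules appear earlier in the miner's sort order.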

Slide 11. 1a. Accuracy and ranking
Inversely weighted vote classifier:
– Emphasises top-ranked rules
– Shows the importance of a good rule ranking
A mining algorithm is preferable if the resulting classifier is more accurate.

Slide 12. 1b. How many rules are necessary to be accurate?
Majority vote classifier; similar results for CBA.
A mining algorithm is preferable if the resulting classifier is more compact.

Slide 13. Comparison of CBA with standard techniques

Rules after mining
Dataset   Apriori    Pred. Apriori
ecoli       888.2            304.4
labor     96084.3            228.9
pima       3311.4            179.5

Rules after pruning
Dataset   CBA + Apriori   C4.5   JRip   PART   CBA + pred. Apriori
ecoli              19.8   18.3    9.0   13.6                  20.3
labor              26.7    3.6           3.4                   8.7
pima               39.7   19.2    3.3    7.5                  39.0

Accuracy (%)
Dataset   CBA + Apriori   C4.5    JRip    PART    CBA + pred. Apriori
ecoli             81.26   84.23   82.16   83.60                 80.96
labor             81.33   73.67   77.00   78.67                 79.33
pima              74.36   73.83   75.14   75.27                 72.79

Slide 14. Conclusions
Use classification to evaluate the quality of confidence-based association rule miners.
Evaluation results:
– Predictive Apriori mines a higher-quality set of rules.
– Predictive Apriori needs fewer rules.
– But: predictive Apriori is slower than Apriori.
Comparison of a standard associative classifier (CBA) with standard ML techniques:
– CBA achieves accuracy comparable to the standard techniques.
– CBA mines more rules and is slower.
All algorithms are implemented in WEKA or in an add-on to WEKA, available from http://www.cs.waikato.ac.nz/~ml

Slide 15. The End
Thank you for your attention. Questions?
Contact: stefan_mutter@directbox.com, mhall@cs.waikato.ac.nz, eibe@cs.waikato.ac.nz
