Rule Induction for Classification Using Multi-Objective Genetic Programming

Alan P. Reynolds and Beatriz de la Iglesia
School of Computing Sciences, University of East Anglia
ar@cmp.uea.ac.uk, bli@cmp.uea.ac.uk

Rule Induction for Prediction

Partial classification is the search for simple rules that represent 'strong' or 'interesting' descriptions of a specified class, or of subsets of that class. However, the simple rule representations used are insufficiently expressive to produce an individual rule with both high confidence and high coverage, and so partial classification remains a form of descriptive, rather than predictive, data mining. Here our previous work is extended to create understandable yet highly predictive models that can classify previously unseen records.

Attribute Tests

Attribute tests (ATs) may select or eliminate a value for a categorical field, or apply a bound to a numeric field. For each field, the values occurring in the dataset are stored in arrays that are referenced by the ATs, as shown.

Parameter Tuning

Experiments were performed over a range of population sizes and crossover rates: 30 runs for each combination of parameters, with 200,000 rule evaluations per run. Results were compared by summing the error rates of the best rules at each level of complexity, up to 20 ATs. The graph shows the mean results for the Adult dataset, suggesting a population size of 50 to 100 solutions and a crossover rate of 20% to 40%. Similar results were obtained using a subset of the Cover type dataset. (All datasets used are from the UCI machine learning repository.)

Training, Validation and Testing

Evaluating the performance of the approach requires three stages:

Training: The genetic algorithm, NSGA-II, is used to produce a selection of rule trees, minimizing misclassification costs and complexity on a set of training data.

Validation: The best rules are presented to the client.
In order to give the client an idea of how well the rules might generalize, the rules are re-evaluated on a second set of data. The client then selects the rule offering the desired trade-off between accuracy and simplicity.

Testing: To evaluate the approach fully, the selected rule must be re-evaluated once more on a third set of data, the test set.

Rule Representation

Rather than attempting to combine simple rules obtained by partial classification or association rule mining, rule trees are generated from scratch. Rule trees are more expressive than the simple rules produced by partial classification or association rule mining, and more compact than sets of such rules. Notice that the rule tree shown contains only 6 ATs, compared with the 12 ATs required to represent it as a set of simple conjunctive rules. Rule trees are optimized to describe a class of interest. Unlike in partial classification, if an unseen record does not match the rule, it is predicted not to belong to the class of interest. As a result, the rule trees generated act as binary predictive classifiers.

Misclassification Costs

The real-world cost of a misclassification often depends on whether the error is a false positive or a false negative, e.g. in cancer diagnosis. Alternatively, when the interest is in a minority class, a higher false negative cost may be needed to produce useful descriptions of that class, rather than a rule that always predicts that a record does not belong to it. Our algorithm may be used to minimize the simple error rate, the balanced error rate or any other measure of overall misclassification cost.

Objectives: Rule Complexity

The primary reason for also minimizing rule complexity is to encourage the production of rules that are easily understood by the client. The client can be presented with a set of rules, allowing them to balance their understanding of a rule against its accuracy. A second reason is to reduce the chance of overfitting the data.
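As a concrete illustration of rule trees acting as binary classifiers under a cost-weighted objective, here is a minimal sketch in Python. All names (`AT`, `Node`, `count_ats`, `misclassification_cost`) are hypothetical, and the tree is simplified to AND/OR nodes over the AT types described above; the authors' actual data structures are not reproduced on the poster.

```python
from dataclasses import dataclass

# Hypothetical sketch: a rule tree whose leaves are attribute tests (ATs)
# and whose internal nodes are AND/OR operators, used as a binary
# predictive classifier for the class of interest.

@dataclass
class AT:
    field: str
    op: str              # 'select', 'eliminate', 'le' or 'ge'
    value: object

    def matches(self, record):
        v = record[self.field]
        if self.op == 'select':
            return v == self.value       # categorical: require this value
        if self.op == 'eliminate':
            return v != self.value       # categorical: forbid this value
        if self.op == 'le':
            return v <= self.value       # numeric upper bound
        if self.op == 'ge':
            return v >= self.value       # numeric lower bound
        raise ValueError(f"unknown AT operator: {self.op}")

@dataclass
class Node:
    op: str              # 'and' or 'or'
    children: list       # each child is an AT or another Node

    def matches(self, record):
        tests = (c.matches(record) for c in self.children)
        return all(tests) if self.op == 'and' else any(tests)

def count_ats(rule):
    """Complexity measure: the number of attribute tests in the tree."""
    if isinstance(rule, AT):
        return 1
    return sum(count_ats(c) for c in rule.children)

def misclassification_cost(rule, records, labels, fp_cost=1.0, fn_cost=1.0):
    """Mean cost-weighted error: a record matching the rule is predicted
    to belong to the class of interest."""
    total = 0.0
    for record, is_positive in zip(records, labels):
        predicted = rule.matches(record)
        if predicted and not is_positive:
            total += fp_cost             # false positive
        elif is_positive and not predicted:
            total += fn_cost             # false negative
    return total / len(records)
```

With `fp_cost=1.0, fn_cost=1.0` this gives the simple error rate; setting `fn_cost` to ten times `fp_cost` corresponds to the "1 - 10" cost measure mentioned in the experiments.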
The number of ATs in the rule can be used as the measure of complexity, or more sophisticated measures can be used; for example, the number of ATs in the equivalent rule set has been used in some of the experiments.

Conclusions

- The testing error rates obtained when the client chooses the rule that minimizes the validation error rate are similar to the error rates obtained by other classification algorithms.
- The algorithm allows the client to balance the need to comprehend the rule against the rule's accuracy.
- The approach is flexible: other measures of misclassification cost can easily be used, rules can be presented to the client in a number of ways, and the measure of rule complexity can be adapted to suit the method of rule presentation and the client's concept of rule comprehensibility.
- Other AT types and operators may be used to further enhance the expressiveness of the rules produced, though this is a matter for further research.
- There is scope for a number of efficiency improvements, e.g. partial evaluation of rules early in the search.
- Research is under way to determine how best to adapt the approach to perform full classification on datasets with more than two classes.

Results

| Dataset | Cost | Select Best: Mean | StdDev | ATs | Select 5AT: Mean | StdDev | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Adult | Simple | 14.42% | 0.10% | 19.6 | 15.64% | 0.12% | 1032 |
| Adult | Balanced | 17.82% | 0.14% | 18.0 | 19.42% | 0.07% | 957 |
| Adult | 1 - 10 | 12.45% | 0.13% | 15.8 | 12.90% | 0% | 898 |
| Cover type (spruce/fir) | Simple | 21.00% | 0.19% | 19.5 | 23.35% | 0.08% | 8217 |
| Cover type (spruce/fir) | Balanced | 21.71% | 0.31% | | 23.57% | 0.16% | 9168 |
| Cover type (spruce/fir) | 1 - 10 | 10.58% | 0.17% | 19.2 | 11.20% | 0.09% | 7988 |
| Contraceptive | Simple | 29.45% | 3.48% | 8.8 | 29.46% | 3.53% | 55 |
| Contraceptive | Balanced | 31.97% | 3.78% | 7.9 | 32.22% | 4.18% | |
| Breast cancer | Simple | 4.14% | 2.25% | 4.2 | 4.10% | 2.28% | 37 |
| Breast cancer | Balanced | 4.97% | 2.53% | 4.8 | 5.09% | 2.61% | 30 |
| Pima Indians | Simple | 24.38% | 4.47% | 4.9 | 24.48% | 4.65% | 47 |
| Pima Indians | Balanced | 26.35% | 6.25% | 8.0 | 26.84% | 6.33% | 41 |

Rule Simplification: To reduce computational effort and rule complexity, some effort is made before each rule is evaluated to simplify it, if possible.
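The poster does not say which simplifications are performed, so the following is only a plausible example with invented names: under a conjunction, repeated bounds on the same numeric field are redundant, and only the tightest bound need be kept.

```python
# Hypothetical sketch of one simplification the search might apply before
# evaluating a rule: under an AND node, several upper (or lower) bounds on
# the same numeric field match exactly the same records as the tightest
# bound alone, so the looser bounds can be dropped, reducing the AT count.

def simplify_and_bounds(tests):
    """tests: ATs joined by AND, each a (field, op, value) triple with op
    in {'select', 'eliminate', 'le', 'ge'}. Returns an equivalent,
    possibly shorter, list of ATs."""
    bounds = {}      # (field, op) -> tightest bound seen so far
    others = []      # categorical tests are passed through untouched
    for field, op, value in tests:
        if op not in ('le', 'ge'):
            others.append((field, op, value))
            continue
        key = (field, op)
        if key in bounds:
            tighter = min if op == 'le' else max   # smaller 'le', larger 'ge'
            bounds[key] = tighter(bounds[key], value)
        else:
            bounds[key] = value
    return others + [(field, op, value) for (field, op), value in bounds.items()]
```

For instance, `age <= 50 AND age <= 40` collapses to `age <= 40`, saving one AT with no change to which records the rule matches.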
Rule Reduction: If, after rule simplification, the rule is still larger than a preset maximum size (20 ATs in the experiments reported here), ATs and their parent nodes are removed at random until the rule meets the size constraint. Both steps are applied before each rule is evaluated.

The table shows results on the test set for five datasets, when the client selects either the best rule according to misclassification cost or the rule with 5 ATs. With the exception of the Cover type dataset, rule trees are created for the minority class. Misclassification costs are either the simple error rate, the balanced error rate, or the error rate when a false negative has ten times the cost of a false positive.
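The rule-reduction step described above is specified precisely enough to sketch. The binary-tree encoding and function names below are assumptions; only the behaviour, deleting a randomly chosen AT together with its parent node until at most `max_ats` tests remain, comes from the poster.

```python
import random

# Hypothetical sketch of rule reduction. A tree is either a leaf attribute
# test (any non-tuple value) or a triple (op, left, right) with op in
# {'and', 'or'}. Removing an AT also removes its parent operator node:
# the surviving sibling subtree is spliced into the parent's place.

def count_ats(tree):
    if not isinstance(tree, tuple):
        return 1                               # a leaf is a single AT
    _, left, right = tree
    return count_ats(left) + count_ats(right)

def remove_random_at(tree, rng):
    """Remove one uniformly chosen AT together with its parent node."""
    if not isinstance(tree, tuple):
        return tree                            # a lone AT: nothing to remove
    op, left, right = tree
    n_left = count_ats(left)
    if rng.randrange(n_left + count_ats(right)) < n_left:
        # the chosen AT lies in the left subtree
        if not isinstance(left, tuple):
            return right                       # left is the AT; its sibling survives
        return (op, remove_random_at(left, rng), right)
    if not isinstance(right, tuple):
        return left                            # right is the AT; its sibling survives
    return (op, left, remove_random_at(right, rng))

def reduce_rule(tree, max_ats, rng=None):
    """Remove random ATs until the tree meets the size constraint
    (a maximum of 20 ATs in the experiments reported here)."""
    rng = rng or random.Random()
    while count_ats(tree) > max_ats:
        tree = remove_random_at(tree, rng)
    return tree
```

Each call to `remove_random_at` shrinks the tree by exactly one AT and one operator node, so the loop terminates after at most `count_ats(tree) - max_ats` iterations.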