
Rule Induction for Classification Using Multi-Objective Genetic Programming
Alan P. Reynolds and Beatriz de la Iglesia, School of Computing Sciences, University of East Anglia

Rule Induction for Prediction
Partial classification is the search for simple rules that represent 'strong' or 'interesting' descriptions of a specified class, or of subsets of this class. However, the simple rule representations used are insufficiently expressive to produce an individual rule with both high confidence and high coverage, so partial classification remains a form of descriptive data mining. Here our previous work is extended to create understandable and highly predictive models that can classify previously unseen records.

Attribute Tests
Attribute tests (ATs) may select or eliminate a value of a categorical field, or apply a bound to a numeric field. For each field, the values occurring in the dataset are stored in arrays that are referenced by the ATs (one possible representation is sketched at the end of this section).

Parameter Tuning
Experiments were performed over a range of population sizes and crossover rates: 30 runs for each combination of parameters, with 200,000 rule evaluations per run. Results were compared by summing the error rates of the best rules at each level of complexity, up to 20 ATs. The mean results for the Adult dataset suggest a population size of 50 to 100 solutions and a crossover rate of 20% to 40%. Similar results were obtained using a subset of the Cover type dataset. (All datasets used are from the UCI machine learning repository.)

Training, Validation and Testing
Evaluating the performance of the approach requires three stages (see the protocol sketch at the end of this section):
Training: The genetic algorithm, NSGA-II, is used to produce a selection of rule trees, minimizing misclassification cost and complexity on a set of training data.
Validation: The best rules are presented to the client. To give the client an idea of how well the rules might generalize, the rules are re-evaluated on a second set of data. The client then selects the rule with the desired trade-off between accuracy and simplicity.
Testing: To evaluate the approach fully, the selected rule is re-evaluated on a third set of data, the test set.

Rule Representation
Rather than attempting to combine simple rules obtained by partial classification or association rule mining, rule trees are generated from scratch. Rule trees are more expressive than the simple rules generated by partial classification or association rule mining, and more compact than sets of such rules: the example rule tree contains only 6 ATs, compared with the 12 ATs required to represent the same concept as a set of simple conjunctive rules. Rule trees are optimized to describe a class of interest; unlike in partial classification, if an unseen record does not match the rule, it is predicted not to belong to that class. The rule trees generated therefore act as binary predictive classifiers.

Misclassification Costs
The real-world cost of a misclassification often depends on whether the error is a false positive or a false negative, e.g. in cancer diagnosis. Alternatively, when interested in a minority class, a higher false negative cost may be required to produce useful descriptions of the class, rather than a rule that always predicts that a record does not belong to the class. Our algorithm may be used to minimize the simple error rate, the balanced error rate, or any other measure of overall misclassification cost.
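To make the representation concrete, here is a minimal Python sketch of attribute tests and rule trees as described above. The class names, the per-field value arrays and the binary AND/OR internal nodes are our assumptions for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class CategoricalAT:
    """Test on a categorical field: select or eliminate one value."""
    field: str
    values: List[str]         # values observed in the dataset for this field
    value_index: int          # index into the per-field value array
    select: bool = True       # True: field == value; False: field != value

    def matches(self, record: dict) -> bool:
        hit = record[self.field] == self.values[self.value_index]
        return hit if self.select else not hit

@dataclass
class NumericAT:
    """Test on a numeric field: apply an upper or lower bound."""
    field: str
    bound: float
    upper: bool = True        # True: field <= bound; False: field >= bound

    def matches(self, record: dict) -> bool:
        v = record[self.field]
        return v <= self.bound if self.upper else v >= self.bound

@dataclass
class Node:
    """Internal node combining two subtrees, letting one tree express
    what would otherwise need a set of conjunctive rules."""
    op: str                   # "AND" or "OR"
    left: "Rule"
    right: "Rule"

    def matches(self, record: dict) -> bool:
        if self.op == "AND":
            return self.left.matches(record) and self.right.matches(record)
        return self.left.matches(record) or self.right.matches(record)

Rule = Union[CategoricalAT, NumericAT, Node]

# (age <= 30 AND sex == 'F') OR income >= 50000 -- three ATs in one tree.
rule = Node("OR",
            Node("AND",
                 NumericAT("age", 30.0, upper=True),
                 CategoricalAT("sex", ["F", "M"], 0)),
            NumericAT("income", 50000.0, upper=False))
print(rule.matches({"age": 25, "sex": "F", "income": 20000}))  # True
```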
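The misclassification cost measures mentioned above all reduce to weighted counts from the confusion matrix. A short sketch, with function names of our own choosing:

```python
def simple_error(tp, fp, tn, fn):
    """Fraction of all records misclassified."""
    return (fp + fn) / (tp + fp + tn + fn)

def balanced_error(tp, fp, tn, fn):
    """Mean of the per-class error rates; insensitive to class imbalance."""
    return 0.5 * (fn / (tp + fn) + fp / (fp + tn))

def weighted_cost(tp, fp, tn, fn, fp_cost=1.0, fn_cost=10.0):
    """General cost measure; fn_cost=10 corresponds to the '1-10'
    setting used in the results table below."""
    return (fp_cost * fp + fn_cost * fn) / (tp + fp + tn + fn)

# e.g. a rule that misses half of a small positive class:
print(balanced_error(tp=50, fp=10, tn=900, fn=50))  # ~0.2555
```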
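And a compact, runnable sketch of the three-stage protocol, with the NSGA-II search replaced by a hard-coded list of toy rules, since only the selection logic is being illustrated here:

```python
def error_rate(rule, data):
    """Fraction of (record, label) pairs the rule misclassifies."""
    return sum(rule(rec) != label for rec, label in data) / len(data)

def select_and_test(front, valid, test):
    # Validation: the client inspects the front on held-out data; here
    # we simply pick the rule with the lowest validation error.
    chosen = min(front, key=lambda rule: error_rate(rule, valid))
    # Testing: the selected rule is re-evaluated once more on the test set.
    return chosen, error_rate(chosen, test)

# Toy usage with two stand-in "rule trees" (the training stage would
# normally produce these):
front = [lambda r: r["age"] <= 30, lambda r: r["income"] >= 50000]
valid = [({"age": 25, "income": 10000}, True),
         ({"age": 40, "income": 60000}, False)]
test  = [({"age": 22, "income": 5000}, True)]
rule, test_err = select_and_test(front, valid, test)
print(test_err)  # 0.0
```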
Objectives: Rule Complexity
The primary reason for also minimizing rule complexity is to encourage the production of rules that are easily understood by the client. The client can be presented with a set of rules, allowing them to balance their understanding of each rule against its accuracy. A second reason is to reduce the chance of overfitting the data. The number of ATs in the rule can be used as the measure of complexity, or more sophisticated measures may be used; for example, the number of ATs in the equivalent rule set has been used in some of the experiments.

Before Evaluation
Rule Simplification: To reduce computational effort and rule complexity, some effort is made to simplify each rule, where possible, before it is evaluated.
Rule Reduction: If, after simplification, the rule is still larger than a preset maximum size (20 ATs in the experiments reported here), ATs and their parent nodes are removed at random until the rule meets the size constraint. (A runnable sketch of this reduction step follows the Conclusions below.)

Results
The table shows results on the test set for five datasets, when the client selects either the best rule according to misclassification cost ('Select Best') or the rule with 5 ATs ('Select 5AT'). With the exception of the Cover type dataset, rule trees are created for the minority class. Misclassification costs are the simple error rate, the balanced error rate, or the error rate when a false negative costs ten times as much as a false positive ('1-10').

                                   Select Best              Select 5AT
Dataset                  Cost      Mean    StdDev  ATs      Mean    StdDev    Time (s)
Adult                    Simple    14.42%  0.10%   19.6     15.64%  0.12%     1032
Adult                    Balanced  17.82%  0.14%   18.0     19.42%  0.07%     957
Adult                    1-10      12.45%  0.13%   15.8     12.90%  0%        898
Cover type (spruce/fir)  Simple    21.00%  0.19%   19.5     23.35%  0.08%     8217
Cover type (spruce/fir)  Balanced  21.71%  0.31%            23.57%  0.16%     9168
Cover type (spruce/fir)  1-10      10.58%  0.17%   19.2     11.20%  0.09%     7988
Contraceptive            Simple    29.45%  3.48%    8.8     29.46%  3.53%     55
Contraceptive            Balanced  31.97%  3.78%    7.9     32.22%  4.18%
Breast cancer            Simple     4.14%  2.25%    4.2      4.10%  2.28%     37
Breast cancer            Balanced   4.97%  2.53%    4.8      5.09%  2.61%     30
Pima Indians             Simple    24.38%  4.47%    4.9     24.48%  4.65%     47
Pima Indians             Balanced  26.35%  6.25%    8.0     26.84%  6.33%     41

Conclusions
- The test error rates obtained when the client chooses the rule that minimizes the validation error rate are similar to the error rates obtained by other classification algorithms.
- The algorithm allows the client to balance the need to comprehend the rule against the rule's accuracy.
- The approach is flexible: other measures of misclassification cost can easily be used, rules can be presented to the client in a number of ways, and the measure of rule complexity can be adapted to suit the method of rule presentation and the client's concept of rule comprehensibility. Other AT types and operators may be used to further enhance the expressiveness of the rules produced, though this is a matter for further research.
- There is scope for a number of efficiency improvements, e.g. partial evaluation of rules early in the search.
- Research is under way to determine how best to adapt the approach to perform full classification on datasets with more than two classes.
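As referenced under 'Before Evaluation', here is a minimal sketch of the rule-reduction step, assuming a nested-tuple tree representation of our own devising (internal nodes are ('AND'|'OR', left, right); anything else is an AT leaf):

```python
import random

def count_ats(tree):
    """Count attribute-test leaves in the rule tree."""
    if isinstance(tree, tuple):
        return count_ats(tree[1]) + count_ats(tree[2])
    return 1

def remove_random_at(tree):
    """Delete one AT chosen uniformly at random; its parent node is
    replaced by the AT's sibling subtree."""
    if not isinstance(tree, tuple):
        return tree                       # a lone AT cannot shrink further
    op, left, right = tree
    n_left = count_ats(left)
    if random.random() < n_left / (n_left + count_ats(right)):
        return right if not isinstance(left, tuple) else (op, remove_random_at(left), right)
    return left if not isinstance(right, tuple) else (op, left, remove_random_at(right))

def reduce_rule(tree, max_ats=20):
    """Apply random AT removal until the tree meets the size constraint."""
    while count_ats(tree) > max_ats:
        tree = remove_random_at(tree)
    return tree

# Demonstration with a tiny limit: one of the three ATs is removed.
print(reduce_rule(("AND", "age<=30", ("OR", "sex=F", "income>=50k")), max_ats=2))
```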

