Combining Classification and Model Trees for Handling Ordinal Problems
D. Anyfantis, M. Karagiannopoulos, S. B. Kotsiantis, P. E. Pintelas
Educational Software Development Laboratory and Computers and Applications Laboratory, Department of Mathematics, University of Patras, Greece
Aim
To handle the problem of learning to predict ordinal (i.e., ordered discrete) classes, and to propose a technique that offers a more robust solution to this problem.
Contents
- Introduction
- Techniques for Dealing with Ordinal Problems
- Proposed Technique
- Experiments
- Conclusions
Ordinal Classification Problems
Ordinal classification problems lie between classification and regression: the classes are discrete but have a linear ordering. Given ordered classes, one is interested not only in maximizing classification accuracy but also in minimizing the distances between the actual and the predicted classes. For example, with the classes low < medium < high, predicting medium for a high instance is a smaller error than predicting low.
Simple Techniques for Dealing with Ordinal Problems
- Applying classification algorithms and discarding the ordering information in the class attribute.
- Applying regression algorithms, with each class mapped to a numeric value.
- Reducing the multi-class ordinal classification problem to a set of binary classification problems using the one-against-all approach.
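The regression and one-against-all reductions above can be sketched in Python as follows. This is a minimal illustration, assuming the ordered classes are encoded as integers 0..k−1 and that predictions are turned back into classes by rounding and clipping; the function names are hypothetical, and the actual regressor or binary classifier would be plugged in separately.

```python
import math
from typing import List


def classes_to_numeric(labels: List[int]) -> List[float]:
    """Regression approach: map each ordered class index 0..k-1 to a number (here, the index itself)."""
    return [float(y) for y in labels]


def numeric_to_class(prediction: float, k: int) -> int:
    """Turn a regression output back into a class: round to the nearest index, clip to 0..k-1."""
    nearest = math.floor(prediction + 0.5)
    return min(max(nearest, 0), k - 1)


def one_against_all_targets(labels: List[int], k: int) -> List[List[int]]:
    """One-against-all: binary problem c labels class c as 1 and all other classes as 0.

    The ordering of the classes plays no role in these targets.
    """
    return [[1 if y == c else 0 for y in labels] for c in range(k)]


# Usage on a three-class problem (0 < 1 < 2).
print(numeric_to_class(1.7, 3))               # 2
print(one_against_all_targets([0, 2, 1], 3))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

The first approach needs no code of its own: it simply trains an ordinary classifier on the class labels and ignores their order.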
Another, More Sophisticated Technique (ORD)
This technique converts the original ordinal class problem into a series of binary problems that also encode the ordering of the original classes: each binary problem asks whether the class value is greater than one of the first k − 1 class values. To predict the class value of an unseen instance, the method must estimate the probabilities of the k original ordinal classes using these k − 1 models. For a three-class ordinal problem, the probability of the first ordinal class value depends on a single classifier, 1 − P(Target > first value), and the probability of the last ordinal class value likewise depends on a single classifier, P(Target > second value). For a class value in the middle of the range, however, the probability depends on a pair of classifiers and is given by P(Target > first value) * (1 − P(Target > second value)).
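A minimal Python sketch of the ORD scheme described above. It assumes classes are integer indices 0..k−1, that some binary learner has already produced the k − 1 estimates P(Target > V_i) for a test instance, and it extends the three-class formulas to k classes by the same pattern. Because the product form for middle classes does not in general yield scores that sum to 1, the sketch normalizes them; this is an assumption and does not change which class wins. The function names are hypothetical.

```python
from typing import List


def ord_binary_targets(labels: List[int], k: int) -> List[List[int]]:
    """Binary problem i (i = 0..k-2) asks: is the class greater than class i?"""
    return [[1 if y > i else 0 for y in labels] for i in range(k - 1)]


def ord_class_probabilities(p_greater: List[float]) -> List[float]:
    """Combine the k-1 estimates P(Target > V_i) into k class scores.

    First class:  1 - P(Target > V_1)
    Middle class: P(Target > V_{i-1}) * (1 - P(Target > V_i))
    Last class:   P(Target > V_{k-1})
    The scores are then normalized to sum to 1 (an assumption of this sketch).
    """
    k = len(p_greater) + 1
    scores = [1.0 - p_greater[0]]
    for i in range(1, k - 1):
        scores.append(p_greater[i - 1] * (1.0 - p_greater[i]))
    scores.append(p_greater[-1])
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else [1.0 / k] * k


def ord_predict(p_greater: List[float]) -> int:
    """Predict the class with the highest combined probability."""
    probs = ord_class_probabilities(p_greater)
    return max(range(len(probs)), key=probs.__getitem__)


# Three classes: P(Target > first) = 0.8, P(Target > second) = 0.3.
print(ord_binary_targets([0, 1, 2], 3))  # [[0, 1, 1], [0, 0, 1]]
print(ord_predict([0.8, 0.3]))           # 1 (the middle class)
```

Here the raw scores are 0.2, 0.56 and 0.3, so the middle class wins even before normalization.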
Proposed Technique (1)
The proposed technique combines the predictions of a classification tree algorithm and a model tree algorithm. When learners are combined by voting, we expect good results, based on the belief that classifiers are more likely to be correct when the majority of them agree in their decision.
Proposed Technique (2)
Proposed Technique (3)
The proposed ensemble uses the sum rule: each voter gives the probability of its prediction for each candidate class, the confidence values are then added for each candidate, and the candidate with the highest sum wins the election.
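The sum rule for the two voters can be sketched in Python. How the model tree's numeric outputs become class probabilities is not stated on the slide; this sketch assumes a common classification-via-regression setup in which one regression model per class predicts membership, and its outputs are clipped to [0, 1] and normalized. The distributions in the usage lines are made-up numbers, and the function names are hypothetical.

```python
from typing import List, Sequence


def regression_outputs_to_distribution(outputs: Sequence[float]) -> List[float]:
    """Assumed classification-via-regression step: clip each per-class
    regression output to [0, 1], then normalize so the values sum to 1."""
    clipped = [min(max(o, 0.0), 1.0) for o in outputs]
    total = sum(clipped)
    n = len(clipped)
    return [c / total for c in clipped] if total > 0 else [1.0 / n] * n


def sum_rule_vote(distributions: Sequence[Sequence[float]]) -> int:
    """Sum rule: add every voter's probability for each candidate class
    and return the index of the class with the highest total."""
    totals = [sum(column) for column in zip(*distributions)]
    return max(range(len(totals)), key=totals.__getitem__)


# Hypothetical outputs for one test instance with three ordered classes.
tree_dist = [0.6, 0.3, 0.1]                                             # classification tree
model_tree_dist = regression_outputs_to_distribution([0.2, 0.9, 0.1])  # model trees, one per class
print(sum_rule_vote([tree_dist, model_tree_dist]))                     # 1
```

In this example the classification tree alone would choose class 0, but the summed confidences (about 0.77, 1.05 and 0.18) favour class 1.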
Experiments (1)
To test the hypothesis that the above method improves generalization performance on ordinal prediction problems, we performed experiments on real-world ordinal datasets donated by Dr. Arie Ben David (http://www.cs.waikato.ac.nz/ml/weka/). Because few benchmark datasets with ordinal class values are available, we also used datasets from the UCI repository. These datasets represent numeric prediction problems, so we converted their numeric target values into ordinal quantities using equal-size binning (three equal-size intervals).
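The slide does not say whether "equal size intervals" means equal width or equal frequency. The following Python sketch assumes equal-width intervals over the observed range of the target, which is one common reading; the function name is hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 3) -> List[int]:
    """Map numeric targets to ordinal classes 0..n_bins-1 using intervals of
    equal width between the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum would land just past the last interval, so clamp it into the top bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([0, 1, 2, 3, 4, 5, 6, 9]))  # [0, 0, 0, 1, 1, 1, 2, 2]
```

An equal-frequency variant would instead sort the targets and cut them into three groups of (roughly) equal size.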
Experiments (2)
All accuracy estimates were obtained by averaging the results of 10 separate runs of stratified 10-fold cross-validation on each of the 26 datasets.
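A minimal Python sketch of this evaluation protocol. It assumes folds are built by shuffling each class separately and dealing its instances round-robin across the folds, with a different random seed per run; the learner and the metric are left out, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], n_folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Split instance indices into n_folds folds whose class proportions
    roughly match those of the whole dataset."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's shuffled indices round-robin across the folds,
        # continuing from where the previous class stopped.
        for index in indices:
            folds[position % n_folds].append(index)
            position += 1
    return folds


def repeated_cv_splits(labels: Sequence[int], runs: int = 10,
                       n_folds: int = 10) -> Iterator[Tuple[int, int, List[int], List[int]]]:
    """Yield (run, fold, train_indices, test_indices) for repeated stratified CV."""
    for run in range(runs):
        folds = stratified_folds(labels, n_folds, seed=run)
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield run, f, train, test


labels = [0] * 30 + [1] * 20 + [2] * 10
print(len(list(repeated_cv_splits(labels))))  # 100
```

A metric computed on each test fold would then be averaged over the 10 × 10 = 100 folds to give one estimate per dataset and algorithm.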
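Experiments (3)
For each dataset, the algorithms are compared according to:
- classification accuracy (the rate of correct predictions);
- mean absolute error,
$$\mathrm{MAE} = \frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{n},$$
where p are the predicted values, a the actual values, and n the number of test instances.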
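Both measures can be sketched in a few lines of Python. This sketch assumes that p and a are the predicted and actual class indices (0..k−1), so the mean absolute error counts how many ordinal steps a prediction is off on average; the function names and the example predictions are hypothetical.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of instances whose predicted class equals the actual class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """(|p_1 - a_1| + ... + |p_n - a_n|) / n."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


actual = [0, 1, 2, 2]
pred_a = [1, 1, 2, 2]  # one error, one ordinal step away
pred_b = [2, 1, 2, 2]  # one error, two ordinal steps away
print(accuracy(pred_a, actual), accuracy(pred_b, actual))                        # 0.75 0.75
print(mean_absolute_error(pred_a, actual), mean_absolute_error(pred_b, actual))  # 0.25 0.5
```

The two prediction vectors have the same accuracy, but the mean absolute error penalizes the prediction that lands further from the actual class, which is exactly the distinction ordinal problems care about.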
Results (1)
The table summarizes the results of the proposed technique (Vote-C4.5-M5΄) in comparison with C4.5 without any modification, C4.5 in conjunction with the ordinal classification method (C4.5-ORD), and classification via regression using M5΄.
[Table: average accuracy and mean error over the datasets for Vote-C4.5-M5΄, M5΄, C4.5 and C4.5-ORD]
Statistical Results (Root Mean Square Error)
The presented ensemble has significantly lower root mean square error than M5΄ in 4 of the 26 datasets, and significantly higher root mean square error in none. It also has significantly lower root mean square error than both C4.5 and C4.5-ORD in 8 of the 26 datasets, and significantly higher in none.
Statistical Results (Classification Accuracy)
The presented ensemble is significantly more accurate than M5΄ in 4 of the 26 datasets, while it has a significantly higher error rate in 2 datasets. It also has a significantly lower error rate than C4.5-ORD in 3 of the 26 datasets, while it is significantly less accurate in 1 dataset. Finally, it is significantly more accurate than C4.5 in 1 of the 26 datasets and has a significantly higher error rate in none.
Discussion
If the ranking problem is posed as a classification problem, the inherent structure of ranked data is not exploited, and the generalization ability of such classifiers is therefore severely limited. On the other hand, posing the sorting task as a regression problem leads to a highly constrained problem, since each class must be mapped to a specific numeric value.
Conclusion
According to our experiments on synthetic and real ordinal datasets, the proposed method reduces the distances between the actual and the predicted classes without harming the classification accuracy; in fact, it slightly improves it.
Future work
More extensive experiments with real ordinal datasets from diverse areas will be needed to establish the precise capabilities and relative advantages of this methodology.