
1 Characterizing Model Errors and Differences Stephen D. Bay and Michael J. Pazzani Information and Computer Science University of California, Irvine {sbay,pazzani}@ics.uci.edu

2 Evaluation Tools
–loss/accuracy
–confusion matrices
–ROC curves
–Kappa statistic (Cohen, 1960)
Problem: cannot answer questions like
–"On which types of examples is my classifier most and least accurate?"
–"What are the differences between these two classifiers given that they have the same accuracy?"
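The kappa statistic listed above corrects raw agreement for agreement expected by chance. As a minimal sketch (my illustration, not part of the slides), it can be computed directly from a confusion matrix; the counts below are made up for the Adult >$50K task:

import numpy as np

def cohen_kappa(confusion):
    """Cohen's kappa (Cohen, 1960) from a square confusion matrix
    with rows = true class and columns = predicted class."""
    confusion = np.asarray(confusion, dtype=float)
    total = confusion.sum()
    observed = np.trace(confusion) / total                                          # raw agreement
    expected = (confusion.sum(axis=0) * confusion.sum(axis=1)).sum() / total ** 2   # chance agreement
    return (observed - expected) / (1.0 - expected)

# Made-up counts, for illustration only.
cm = [[10500, 1200],
      [1000, 2900]]
print(cohen_kappa(cm))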

3 Adult data set
Census database
–48000 examples
–12 demographic variables
–classification task: predict salary >$50K or ≤$50K
–C5 accuracy ~85%
available from the UCI Machine Learning Repository (Blake & Merz) http://www.ics.uci.edu/~mlearn/MLRepository.html
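A minimal sketch of loading the data with pandas, assuming the adult.data file has already been downloaded from the UCI repository into the working directory; the column names follow the repository's adult.names description:

import pandas as pd

# Column names as documented in the UCI adult.names file.
columns = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week",
           "native-country", "salary"]

adult = pd.read_csv("adult.data", names=columns, skipinitialspace=True, na_values="?")

print(adult.shape)
print(adult["salary"].value_counts())   # >50K vs. <=50K class balance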

4 Our Goal
Characterize model errors or model differences in the feature space of the problem
Examples:
–Classifier MC4 is 21% less accurate than average on people who are between 45 and 55 years of age, are high school graduates, and are married. This represents 115 misclassified instances.
–MC4 and naive Bayes are 9% less likely to agree than average on people who have Masters degrees and are married. This represents 50 instances with different predictions.
(MC4 is a C4.5 clone; Kohavi, Sommerfield & Dougherty, 1997)
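Statements like these compare a classifier's accuracy on a described subgroup against its accuracy overall. A minimal sketch of that comparison, with hypothetical column values and helper names of my own choosing:

import numpy as np

def subgroup_accuracy_gap(y_true, y_pred, mask):
    """Accuracy on the masked subgroup minus overall accuracy,
    plus the number of misclassified instances in the subgroup."""
    correct = np.asarray(y_true) == np.asarray(y_pred)
    gap = correct[mask].mean() - correct.mean()
    n_errors = int((~correct[mask]).sum())
    return gap, n_errors

# Hypothetical usage with the Adult DataFrame `adult` and predictions `pred`:
# mask = (adult["age"].between(45, 55)
#         & (adult["education"] == "HS-grad")
#         & adult["marital-status"].str.startswith("Married")).values
# gap, errors = subgroup_accuracy_gap(adult["salary"].values, pred, mask)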

5 Framework
Simple meta-learning framework
–MErr: does the model agree with the true class labels?
–MDiff: do two models agree with each other?
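A minimal sketch of the meta-labeling step as described here: each instance keeps its original features and receives a new binary label, either agreement with the true class (MErr) or agreement between two models (MDiff). The function names are mine, not the paper's.

import numpy as np

def merr_labels(y_true, y_pred):
    """MErr: 1 where the model agrees with the true class label, 0 where it errs."""
    return (np.asarray(y_true) == np.asarray(y_pred)).astype(int)

def mdiff_labels(y_pred_a, y_pred_b):
    """MDiff: 1 where the two models make the same prediction, 0 where they differ."""
    return (np.asarray(y_pred_a) == np.asarray(y_pred_b)).astype(int)

# The meta-learning task is then to describe, in terms of the original
# features, where these agree/disagree labels are unusually low or high.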

6 Exploratory Research (Dietterich, 1996)
–new task: generating descriptive rule sets for model errors and differences
–existing solutions do not work well (i.e. although C5 is a very good classifier, it is not appropriate for this task)
–qualitative and quantitative results
–define criteria for measuring quality of results

7 [C5 decision tree learned for MErr on Adult: splits on salary (≤$50K / >$50K), marital status (divorced / never married / married), capital gains (≤$3500 / >$3500), and education; leaves shown predict agree=1]

8 STUCCO
Two stages:
–search
–summarization
Bay, S.D. & Pazzani, M.J. (1999). Detecting change in categorical data: Mining contrast sets. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Let a contrast set be a conjunction of attribute-value pairs such as occupation=sales or sex=female ^ age=[45,55]
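The search stage looks for conjunctions whose support differs significantly across groups (for MErr, the error vs. non-error instances). A minimal sketch of testing a single candidate with a chi-square statistic; this is my illustration of the idea, not the paper's exact procedure, which also controls the false positive rate over many candidates:

import numpy as np
from scipy.stats import chi2_contingency

def contrast_set_test(member, group, alpha=0.05):
    """Does membership in the candidate conjunction (boolean `member`)
    have different support in the two groups (boolean `group`)?"""
    member = np.asarray(member, dtype=bool)
    group = np.asarray(group, dtype=bool)
    table = np.array([[( member &  group).sum(), ( member & ~group).sum()],
                      [(~member &  group).sum(), (~member & ~group).sum()]])
    chi2, p, _, _ = chi2_contingency(table)
    supports = table[0] / table.sum(axis=0)   # support of the conjunction in each group
    return p < alpha, supports

# Hypothetical usage: member = ((adult["sex"] == "Female") & adult["age"].between(45, 55)).values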

9 Discriminative vs. Characteristic Learning
Classifiers can be broadly classified as discriminative or characteristic (Rubinstein & Hastie, 1997)
Normally, given x, select the class y so that P(y|x) is maximized
–discriminative: model P(y|x) directly
–characteristic: model P(x|y) and P(y), then apply Bayes rule: P(y|x) = P(x|y)P(y) / P(x)
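For illustration only (synthetic data, not the talk's setup): logistic regression is a discriminative learner that models P(y|x) directly, while Gaussian naive Bayes is a characteristic one that models P(x|y) and P(y) and classifies through Bayes rule.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data; the talk uses the Adult census data instead.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)   # models P(y|x) directly
char = GaussianNB().fit(X_tr, y_tr)                        # models P(x|y) and P(y)

print("discriminative accuracy:", disc.score(X_te, y_te))
print("characteristic accuracy:", char.score(X_te, y_te))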

10 C5 vs. STUCCO
–discriminative vs. characteristic
–incomplete vs. complete
–unordered vs. hierarchical rule sets
Leads to very different rule sets

11 Rule Set Examples, C5 MC4 Errors on Adult

12 Rule Set Examples, STUCCO MC4 Errors on Adult

13 Practical Differences
C5 has a fragmentation problem:
–MC4 is 6% more accurate than average on people who have a Bachelors degree, are married, work in a professional specialty, reported a capital gain of $0, and have a salary > $50K. This represents 13 correctly classified instances.
C5 is incomplete and misses the following rules:
–MC4 is 26% less accurate than average on people who have a salary > $50K. This represents 1013 misclassified instances.
–MC4 is 13% less accurate than average on people who are married. This represents 925 misclassified instances.

14 Evaluation
Queries:
–MC4 Errors
–1NN vs. 5NN
–Naïve Bayes vs. SuperParent (Keogh & Pazzani, 1999)
Criteria:
–substantial effect
–comprehensible
–stable

15 Results Stability: expected agreement between rule sets generated from the same distribution. Effect Size: if we could make the agreement the same as the average, how many examples would be affected?
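A minimal sketch of the effect size computation as I read that definition: if the subgroup's agreement rate were set equal to the overall average, how many of its examples would change? The helper name is mine.

import numpy as np

def effect_size(agree, mask):
    """Examples affected if the subgroup's agreement rate (fraction of
    agree == 1) were made equal to the overall average rate."""
    agree = np.asarray(agree, dtype=float)   # 1 = agrees (correct, or same prediction)
    mask = np.asarray(mask, dtype=bool)
    return int(round(abs(agree[mask].mean() - agree.mean()) * mask.sum()))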

16 Stability, MC4 Errors

17 Stability, 1NN vs. 5NN

18 Stability, NB vs SP

19 Accuracy Difference vs. Effect Size

20 Summary
–Can treat the problem of characterizing model performance as a meta-learning problem
–May require a different bias from discriminative learners
–Other factors are important beyond the validity of rules

21 Future Work
–generalize to loss
–investigate how to summarize rules for humans
–classifier comparisons: single vs. multiple models; comparing ensemble methods

22 Set-Enumeration Search (Rymon, 1992)
[Set-enumeration tree over {1,2,3,4}, level by level:
{} ; {1} {2} {3} {4} ; {1,2} {1,3} {1,4} {2,3} {2,4} {3,4} ; {1,2,3} {1,2,4} {1,3,4} {2,3,4} ; {1,2,3,4}]
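As a sketch of the idea behind the diagram (my own illustration, not code from the paper): each node is expanded only with elements larger than its last one, so every subset is generated exactly once, level by level.

def set_enumeration_tree(items):
    """Enumerate all subsets of `items` breadth-first (Rymon-style
    set-enumeration), expanding each node only with elements that come
    after its last element so no subset is visited twice."""
    items = sorted(items)
    level = [()]                      # root: the empty set
    while level:
        yield from level
        children = []
        for node in level:
            start = items.index(node[-1]) + 1 if node else 0
            for item in items[start:]:
                children.append(node + (item,))
        level = children

for subset in set_enumeration_tree([1, 2, 3, 4]):
    print(set(subset) if subset else "{}")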

23 Rule Summarization
Rules are summarized hierarchically to present only surprising findings: given a more general rule that has already been shown, when do we also show a more specific rule?
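One minimal way to sketch that filter (my illustration of the idea, not the paper's exact criterion): show a more specific rule only if its effect deviates enough from the most specific rule already shown above it.

def summarize(rules, threshold=0.05):
    """Keep a rule only if its effect is surprising relative to its most
    specific already-kept ancestor. `rules` maps a frozenset of
    attribute-value pairs to its effect (e.g. accuracy difference vs. average)."""
    shown = {}
    for conditions, effect in sorted(rules.items(), key=lambda kv: len(kv[0])):
        ancestors = [p for p in shown if p < conditions]      # proper subsets already shown
        expected = shown[max(ancestors, key=len)] if ancestors else 0.0
        if abs(effect - expected) >= threshold:
            shown[conditions] = effect
    return shown

# Hypothetical usage:
# rules = {frozenset({("marital-status", "Married")}): -0.13,
#          frozenset({("marital-status", "Married"), ("education", "Masters")}): -0.14}
# summarize(rules)   # the child adds little beyond its parent and is suppressed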

24 Iterative Process of Building Machine Learning Systems (reprinted with permission from Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39, 27-34. Copyright 1996 ACM)

