Evaluation of Results (classifiers, and beyond)
Biplav Srivastava

Sources:
[Witten&Frank00] Witten, I.H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.
[Lim et al 99] Lim, T.-S., Loh, W.-Y. and Shih, Y.-S. A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms. Machine Learning, forthcoming. (Appendix contains complete tables of error rates, ranks, and training times; data sets downloadable in C4.5 format.)

Evaluation of Methods
- Ideally: find the best method; practically: find comparable classes of methods
- Classification accuracy
  - Effect of noise on accuracy
- Comprehensibility of the result
  - Compactness
  - Complexity
- Training time
- Scalability with increase in sample size

Types of classifiers
- Decision trees
- Neural networks
- Statistical
  - Regression: P_i = w0 + w1*x1 + w2*x2 + ... + wm*xm
    Fit the weights by least squares on the data set: minimize Sum_i (C_i - (w0 + w1*x_i1 + ... + wm*x_im))^2
  - Classification via regression: perform one regression per class, with target output 1 for instances of class C_i and 0 otherwise, giving one linear expression per class
    Given test data, evaluate each linear expression and choose the class corresponding to the largest value
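A minimal sketch of the regression-as-classification idea above: fit one linear model per class by least squares and predict the class whose linear expression is largest. The function names and toy data are hypothetical, not from the source.

```python
import numpy as np

def fit_regression_classifier(X, y, classes):
    """Fit one linear model per class: target is 1 for that class, 0 otherwise."""
    # Prepend a column of ones so w0 acts as the intercept.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    weights = {}
    for c in classes:
        target = (y == c).astype(float)                   # 1 for class c, 0 otherwise
        w, *_ = np.linalg.lstsq(Xb, target, rcond=None)   # minimizes the sum of squared errors
        weights[c] = w
    return weights

def predict(weights, x):
    """Evaluate each class's linear expression and return the class with the largest value."""
    xb = np.concatenate([[1.0], x])
    return max(weights, key=lambda c: xb @ weights[c])

# Hypothetical toy data: two numeric attributes, two classes.
X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 9.0], [9.0, 8.0]])
y = np.array(["a", "a", "b", "b"])
w = fit_regression_classifier(X, y, classes=["a", "b"])
print(predict(w, np.array([8.5, 8.5])))   # expected: "b"
```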

Assumptions in classification
- The data set is a representative sample
- The prior probability of each class is proportional to its frequency in the training sample
- Changing categorical values to numerical values: if attribute X takes k values C_1, C_2, ..., C_k, use a (k-1)-dimensional vector d_1, d_2, ..., d_{k-1} with d_i = 1 if X = C_i and d_i = 0 otherwise; for X = C_k, the vector contains all zeros
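A small sketch of the (k-1)-dimensional indicator coding just described; the attribute and its values are hypothetical.

```python
def dummy_encode(value, categories):
    """Map a categorical value onto a (k-1)-dimensional 0/1 vector.

    d_i = 1 if value == categories[i], else 0, for the first k-1 categories;
    the last category maps to the all-zero vector.
    """
    k = len(categories)
    return [1 if value == categories[i] else 0 for i in range(k - 1)]

colors = ["red", "green", "blue"]          # k = 3 values of a hypothetical attribute
print(dummy_encode("red", colors))         # [1, 0]
print(dummy_encode("green", colors))       # [0, 1]
print(dummy_encode("blue", colors))        # [0, 0]  (last category: all zeros)
```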

Error Rate of Classifier
- If the data set is large:
  - Use the result on held-out test data
  - Test data is usually 1/3 of the total data
- If the data set is small:
  - Use K-fold cross-validation (usually K = 10)
  - Divide the data set into K roughly equal sets
  - Use K-1 sets for training and the remaining set for testing
  - Repeat K times and average the error rate
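A minimal sketch of the K-fold procedure above; train_fn and error_fn are hypothetical stand-ins for any learner and any error measure.

```python
import random

def k_fold_error(data, k, train_fn, error_fn, seed=0):
    """Estimate error by K-fold cross-validation: each fold is held out exactly once."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]          # k roughly equal parts
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train_fn(train)                     # train on the other k-1 folds
        errors.append(error_fn(model, test))        # error rate on the held-out fold
    return sum(errors) / k                          # average over the k repetitions
```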

Error Rate of Classifier (cont.)
- Fold choice can affect the result
  - Stratification: the class ratio in each fold is kept the same as in the whole data set
  - Run K-fold cross-validation K times (repeated cross-validation) to overcome random variation in fold choice
- Leave-one-out: N-fold cross-validation
- Bootstrap:
  - Training set = select N data items at random with replacement
  - Probability that an item is never selected = (1 - 1/N)^N ≈ 1/e ≈ 0.368
  - Test set = data set - training set
  - e = 0.632 * e(test) + 0.368 * e(training)
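A minimal sketch of the bootstrap split and the 0.632 error estimate above; train_fn and error_fn are again hypothetical stand-ins.

```python
import random

def bootstrap_error(data, train_fn, error_fn, seed=0):
    """0.632 bootstrap: sample N items with replacement for training,
    test on the items never drawn (about 36.8% of the data)."""
    rng = random.Random(seed)
    n = len(data)
    train_idx = [rng.randrange(n) for _ in range(n)]   # N draws with replacement
    test_idx = set(range(n)) - set(train_idx)          # items never drawn
    train = [data[i] for i in train_idx]
    test = [data[i] for i in test_idx]
    model = train_fn(train)
    e_test = error_fn(model, test)
    e_train = error_fn(model, train)
    return 0.632 * e_test + 0.368 * e_train            # weighted combination of the two rates
```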

Evaluation in [Lim et al 99]
- 22 decision-tree, 9 statistical, and 2 neural-network classifiers
- Time measurement
  - Use SPEC marks to rate platforms and scale results based on these marks
- Error rates
  - Calculation: for test sets larger than 1000 instances, use the test-set error rate; otherwise use 10-fold cross-validation
  - Does not use multiple 10-fold runs

Evaluation in [Lim et al 99]
- Acceptable performance
  - If p is the minimum error rate over all classifiers, those within one standard error of p are accepted
  - Standard error of p = sqrt(p * (1 - p) / N)
- Statistical significance of error rates
  - Null hypothesis: all algorithms have the same mean error rate; test differences of mean error rates (hypothesis REJECTED)
  - Tukey method: a difference between mean error rates is significant at the 10% level if it exceeds a fixed threshold
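A small sketch of the acceptance rule above: keep every classifier whose error rate lies within one standard error of the best. The classifier names and numbers are hypothetical.

```python
import math

def acceptable(error_rates, n):
    """Keep classifiers within one standard error of the minimum error rate p,
    where the standard error of p is sqrt(p * (1 - p) / n)."""
    p = min(error_rates.values())
    se = math.sqrt(p * (1 - p) / n)
    return {name for name, e in error_rates.items() if e <= p + se}

# Hypothetical error rates measured on a test set of n = 1000 instances.
rates = {"POL": 0.10, "C4.5": 0.108, "NN": 0.14}
print(acceptable(rates, 1000))   # POL and C4.5 fall within one standard error of the best
```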

Evaluation in [Lim et al 99]
- Rank analysis (no normality assumption)
  - For each data set: sort classifiers by ascending error rate and assign ranks (ties are given the average rank)
  - Compute each classifier's mean rank across all data sets
- Statistical significance of ranks
  - The null hypothesis of no difference in mean ranks (Friedman test) is REJECTED
  - A difference in mean ranks greater than 8.7 is significant at the 10% level
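A minimal sketch of the rank computation above: rank classifiers per data set (ties share the average rank), then average the ranks across data sets. Classifier names and error rates are hypothetical.

```python
def ranks(error_rates):
    """Rank classifiers on one data set by ascending error rate; ties share the average rank."""
    by_error = {}
    for name, e in error_rates.items():
        by_error.setdefault(e, []).append(name)
    result, next_rank = {}, 1
    for e in sorted(by_error):
        group = by_error[e]
        avg = next_rank + (len(group) - 1) / 2      # average rank for the tied group
        for name in group:
            result[name] = avg
        next_rank += len(group)
    return result

def mean_ranks(per_dataset):
    """Average each classifier's rank across all data sets."""
    names = per_dataset[0].keys()
    all_ranks = [ranks(d) for d in per_dataset]
    return {n: sum(r[n] for r in all_ranks) / len(all_ranks) for n in names}

# Hypothetical error rates on two data sets.
datasets = [{"POL": 0.10, "C4.5": 0.12, "NN": 0.12},
            {"POL": 0.20, "C4.5": 0.18, "NN": 0.25}]
print(mean_ranks(datasets))   # {'POL': 1.5, 'C4.5': 1.75, 'NN': 2.75}
```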

Evaluation in [Lim et al 99]
- Classifiers equivalent to the best (POL), with training time <= 10 minutes
  - 15 found using mean error rate
  - 18 found using mean rank
- Training time reported as orders of magnitude slower than the fastest classifier (10^(x-1) to 10^x times slower)
  - Decision-tree learners (C4.5; FACT, which uses statistical tests for splitting) and regression methods train fast
  - Spline-based statistical methods and neural networks are slower

Evaluation in [Lim et al 99]
- Size of trees (for decision trees)
  - Use 10-fold cross-validation
  - Noise typically reduces tree size
- Scalability with data set size
  - If the data set is small, use bootstrap re-sampling:
    - N items are drawn with replacement
    - The class attribute is randomly changed, with probability 0.1, to a value selected uniformly from the valid set
  - Otherwise, use the given data size

Evaluation in [Lim et al 99]
- log(training time) increases linearly with log(N)
- Decision trees usually scale
  - C4.5 with rules does not scale, but C4.5 trees do
  - QUEST and FACT (multi-attribute) scale
- Regression methods usually scale
- Neural-network methods were not tested

[Lim et al 99] Summary
- Use the method that fits the requirement: error rate, training time, ...
- Decision trees with univariate splits (single attribute per split) are good for data interpretation
  - C4.5 trees are big; C4.5 rules don't scale
- Simple regression methods are good: fast, easy to implement, scalable

What more? Precision/recall, cost-based analysis

Confusion matrix:

                PREDICTED +     PREDICTED -
  ACTUAL +      True (+)ve      False (-)ve
  ACTUAL -      False (+)ve     True (-)ve
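A small sketch of precision and recall computed from the confusion-matrix counts above; the counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Hypothetical counts: 40 true positives, 10 false positives, 20 false negatives.
p, r = precision_recall(tp=40, fp=10, fn=20)
print(f"precision = {p:.2f}, recall = {r:.2f}")   # precision = 0.80, recall = 0.67
```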

Demonstration from WEKA