 © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines.

Presentation on theme: "© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines."— Presentation transcript:

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines 3.Ensemble Methods 4.ROC Curves

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2 General Idea

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3 Why does it work? l Suppose there are 25 base classifiers –Each classifier has error rate,  = 0.35 –Assume classifiers are independent –Probability that the ensemble classifier makes a wrong prediction:

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4 Examples of Ensemble Methods l How to generate an ensemble of classifiers? –Bagging –Boosting

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5 Bagging l Sampling with replacement l Build classifier on each bootstrap sample l Each sample has probability (1 – 1/n) n of being selected

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6 Boosting l An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records –Initially, all N records are assigned equal weights –Unlike bagging, weights may change at the end of boosting round

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7 Boosting l Records that are wrongly classified will have their weights increased l Records that are classified correctly will have their weights decreased Example 4 is hard to classify Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8 Example: AdaBoost l Base classifiers: C 1, C 2, …, C T l Error rate: l Importance of a classifier:

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 9 Example: AdaBoost l Weight update: l If any intermediate rounds produce error rate higher than 50%, the weights are reverted back to 1/n and the re-sampling procedure is repeated l Classification: Increase weight if misclassification; Increase is proportional to classifiers Importance.

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 10 Basic AdaBoost Loop (Alg. 5.7 textbook) D 1 = initial dataset with equal weights FOR i=1 to k DO Learn new classifier C i; Computer  i ; Update example weights; Create new training set D i+1 (using weighted sampling) END FOR Return Ensemble classifier that uses C i  weighted by  i (i=1,k)

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 11 Illustrating AdaBoost Data points for training Initial weights for each data point

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 12 Illustrating AdaBoost

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 13 Methods for Performance Evaluation l How to obtain a reliable estimate of performance? l Performance of a model may depend on other factors besides the learning algorithm: –Class distribution –Cost of misclassification –Size of training and test sets

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 14 Learning Curve l Learning curve shows how accuracy changes with varying sample size l Requires a sampling schedule for creating learning curve: l Arithmetic sampling (Langley, et al) l Geometric sampling (Provost et al) Effect of small sample size: - Bias in the estimate - Variance of estimate

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 15 ROC (Receiver Operating Characteristic) l Developed in 1950s for signal detection theory to analyze noisy signals –Characterize the trade-off between positive hits and false alarms l ROC curve plots TP (on the y-axis) against FP (on the x-axis) l Performance of each classifier represented as a point on the ROC curve –changing the threshold of algorithm, sample distribution or cost matrix changes the location of the point

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 16 ROC Curve At threshold t: TP=0.5, FN=0.5, FP=0.12, FN=0.88 - 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 17 ROC Curve (TP,FP): l (0,0): declare everything to be negative class l (1,1): declare everything to be positive class l (1,0): ideal l Diagonal line: –Random guessing –Below diagonal line:  prediction is opposite of the true class

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 18 Using ROC for Model Comparison l No model consistently outperform the other l M 1 is better for small FPR l M 2 is better for large FPR l Area Under the ROC curve l Ideal:  Area = 1 l Random guess:  Area = 0.5

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 19 How to Construct an ROC curve InstanceP(+|A)True Class 10.95+ 20.93+ 30.87- 40.85- 5 - 6 + 70.76- 80.53+ 90.43- 100.25+ Use classifier that produces posterior probability for each test instance P(+|A) Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) Count the number of TP, FP, TN, FN at each threshold TP rate, TPR = TP/(TP+FN) FP rate, FPR = FP/(FP + TN)

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 20 How to construct an ROC curve Threshold >= ROC Curve: + + - - - + - + - + - + -

Download ppt "© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines."

Similar presentations