Download presentation

Presentation is loading. Please wait.

Published byPrecious Devon Modified over 4 years ago

1
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Other Classification Techniques 1.Nearest Neighbor Classifiers 2.Support Vector Machines 3.Ensemble Methods 4.ROC Curves

2
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2 General Idea

3
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3 Why does it work? l Suppose there are 25 base classifiers –Each classifier has error rate, = 0.35 –Assume classifiers are independent –Probability that the ensemble classifier makes a wrong prediction:

4
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4 Examples of Ensemble Methods l How to generate an ensemble of classifiers? –Bagging –Boosting

5
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5 Bagging l Sampling with replacement l Build classifier on each bootstrap sample l Each sample has probability (1 – 1/n) n of being selected

6
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6 Boosting l An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records –Initially, all N records are assigned equal weights –Unlike bagging, weights may change at the end of boosting round

7
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7 Boosting l Records that are wrongly classified will have their weights increased l Records that are classified correctly will have their weights decreased Example 4 is hard to classify Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

8
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8 Example: AdaBoost l Base classifiers: C 1, C 2, …, C T l Error rate: l Importance of a classifier:

9
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 9 Example: AdaBoost l Weight update: l If any intermediate rounds produce error rate higher than 50%, the weights are reverted back to 1/n and the re-sampling procedure is repeated l Classification: Increase weight if misclassification; Increase is proportional to classifiers Importance.

10
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 10 Basic AdaBoost Loop (Alg. 5.7 textbook) D 1 = initial dataset with equal weights FOR i=1 to k DO Learn new classifier C i; Computer i ; Update example weights; Create new training set D i+1 (using weighted sampling) END FOR Return Ensemble classifier that uses C i weighted by i (i=1,k)

11
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 11 Illustrating AdaBoost Data points for training Initial weights for each data point

12
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 12 Illustrating AdaBoost

13
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 13 Methods for Performance Evaluation l How to obtain a reliable estimate of performance? l Performance of a model may depend on other factors besides the learning algorithm: –Class distribution –Cost of misclassification –Size of training and test sets

14
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 14 Learning Curve l Learning curve shows how accuracy changes with varying sample size l Requires a sampling schedule for creating learning curve: l Arithmetic sampling (Langley, et al) l Geometric sampling (Provost et al) Effect of small sample size: - Bias in the estimate - Variance of estimate

15
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 15 ROC (Receiver Operating Characteristic) l Developed in 1950s for signal detection theory to analyze noisy signals –Characterize the trade-off between positive hits and false alarms l ROC curve plots TP (on the y-axis) against FP (on the x-axis) l Performance of each classifier represented as a point on the ROC curve –changing the threshold of algorithm, sample distribution or cost matrix changes the location of the point

16
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 16 ROC Curve At threshold t: TP=0.5, FN=0.5, FP=0.12, FN=0.88 - 1-dimensional data set containing 2 classes (positive and negative) - any points located at x > t is classified as positive

17
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 17 ROC Curve (TP,FP): l (0,0): declare everything to be negative class l (1,1): declare everything to be positive class l (1,0): ideal l Diagonal line: –Random guessing –Below diagonal line: prediction is opposite of the true class

18
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 18 Using ROC for Model Comparison l No model consistently outperform the other l M 1 is better for small FPR l M 2 is better for large FPR l Area Under the ROC curve l Ideal: Area = 1 l Random guess: Area = 0.5

19
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 19 How to Construct an ROC curve InstanceP(+|A)True Class 10.95+ 20.93+ 30.87- 40.85- 5 - 6 + 70.76- 80.53+ 90.43- 100.25+ Use classifier that produces posterior probability for each test instance P(+|A) Sort the instances according to P(+|A) in decreasing order Apply threshold at each unique value of P(+|A) Count the number of TP, FP, TN, FN at each threshold TP rate, TPR = TP/(TP+FN) FP rate, FPR = FP/(FP + TN)

20
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 20 How to construct an ROC curve Threshold >= ROC Curve: + + - - - + - + - + - + -

Similar presentations

© 2019 SlidePlayer.com Inc.

All rights reserved.

To make this website work, we log user data and share it with processors. To use this website, you must agree to our Privacy Policy, including cookie policy.

Ads by Google