Published by Precious Devon. Modified about 1 year ago.

1
© Tan, Steinbach, Kumar. Introduction to Data Mining. 4/18/
Other Classification Techniques
1. Nearest Neighbor Classifiers
2. Support Vector Machines
3. Ensemble Methods
4. ROC Curves

2
General Idea

3
Why does it work?
• Suppose there are 25 base classifiers
  – Each classifier has error rate $\varepsilon = 0.35$
  – Assume the classifiers are independent
  – Probability that the ensemble classifier makes a wrong prediction (a majority vote is wrong only if 13 or more base classifiers are wrong):
    $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1 - \varepsilon)^{25 - i} \approx 0.06$
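The majority-vote argument can be checked numerically. The sketch below assumes an odd number of independent base classifiers that all share the same error rate; the function name `ensemble_error` is illustrative and does not come from the slides.

```python
from math import comb


def ensemble_error(n_classifiers: int, base_error: float) -> float:
    """Probability that a majority of independent base classifiers are wrong.

    Assumes an odd n_classifiers, so the vote can never tie.
    """
    majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * base_error ** k
        * (1 - base_error) ** (n_classifiers - k)
        for k in range(majority, n_classifiers + 1)
    )


# 25 base classifiers, each wrong 35% of the time.
print(ensemble_error(25, 0.35))
# The benefit disappears when base classifiers are no better than chance.
print(ensemble_error(25, 0.5))
```

The ensemble only helps while the base error stays below 0.5 and the independence assumption holds reasonably well.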

4
Examples of Ensemble Methods
• How to generate an ensemble of classifiers?
  – Bagging
  – Boosting

5
Bagging
• Sampling with replacement
• Build a classifier on each bootstrap sample
• Each record has probability $1 - (1 - 1/n)^n$ of being selected for a given bootstrap sample of size n (about 0.632 for large n)
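A small sketch of bootstrap sampling, assuming a sample of the same size n as the original data. With replacement, a given record is missed by all n draws with probability (1 − 1/n)^n, so it appears at least once with probability 1 − (1 − 1/n)^n. The helper names and the seed are illustrative choices.

```python
import random


def inclusion_probability(n: int) -> float:
    """Chance that a given record appears at least once in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n


def bootstrap_sample(records, rng):
    """Draw len(records) items from records, with replacement."""
    return [rng.choice(records) for _ in records]


rng = random.Random(0)  # fixed seed so the run is repeatable
records = list(range(1000))
sample = bootstrap_sample(records, rng)

print(inclusion_probability(len(records)))  # theoretical value, close to 1 - 1/e
print(len(set(sample)) / len(records))      # fraction of distinct records actually drawn
```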

6
Boosting
• An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records
  – Initially, all N records are assigned equal weights
  – Unlike bagging, the weights may change at the end of each boosting round

7
Boosting
• Records that are wrongly classified will have their weights increased
• Records that are classified correctly will have their weights decreased
• Example 4 is hard to classify: its weight is increased, so it is more likely to be chosen again in subsequent rounds

8
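Example: AdaBoost
• Base classifiers: $C_1, C_2, \ldots, C_T$
• Error rate of $C_i$ (weighted, with the record weights $w_j$ summing to 1): $\varepsilon_i = \sum_{j=1}^{N} w_j \, \delta(C_i(x_j) \neq y_j)$
• Importance of a classifier: $\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$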
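To see how importance responds to the error rate, here is a minimal sketch that assumes the standard AdaBoost definition, importance = ½ ln((1 − error) / error). The function name `classifier_importance` is illustrative.

```python
import math


def classifier_importance(error_rate: float) -> float:
    """Standard AdaBoost importance: 0.5 * ln((1 - error) / error)."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1 - error_rate) / error_rate)


for error in (0.1, 0.3, 0.5, 0.7):
    print(error, classifier_importance(error))
```

Importance is large and positive when the error is near 0, exactly zero at an error of 0.5, and negative once the classifier does worse than chance.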

9
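Example: AdaBoost
• Weight update ($\alpha_i$ is the importance of classifier $C_i$, and $Z_i$ is the normalization factor that makes the weights sum to 1):
  $w_j^{(i+1)} = \frac{w_j^{(i)}}{Z_i} \times \begin{cases} e^{-\alpha_i} & \text{if } C_i(x_j) = y_j \\ e^{\alpha_i} & \text{if } C_i(x_j) \neq y_j \end{cases}$
  – The weight increases on a misclassification; the increase grows with the classifier's importance
• If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/n and the re-sampling procedure is repeated
• Classification: $C^*(x) = \arg\max_y \sum_{i=1}^{T} \alpha_i \, \delta(C_i(x) = y)$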
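The weight update can be sketched as follows, assuming the standard AdaBoost rule: multiply a record's weight by e^α if it was misclassified and by e^(−α) otherwise, then renormalize so the weights sum to 1. The function name and the four-record example are hypothetical.

```python
import math


def update_weights(weights, misclassified, alpha):
    """Scale each weight up (wrong) or down (right) by exp(alpha), then renormalize."""
    scaled = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(scaled)  # plays the role of the normalization factor
    return [w / total for w in scaled]


# Four records with equal weights; only the last one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
misclassified = [False, False, False, True]
error = 0.25                                  # weighted error of this round
alpha = 0.5 * math.log((1 - error) / error)   # importance of the classifier

print(update_weights(weights, misclassified, alpha))
```

With this choice of α, the misclassified records end up holding half of the total weight, which forces the next base classifier to attend to them.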

10
Basic AdaBoost Loop (Alg. 5.7 textbook)
D_1 = initial dataset with equal weights
FOR i = 1 to k DO
    Learn new classifier C_i;
    Compute α_i;
    Update example weights;
    Create new training set D_{i+1} (using weighted sampling)
END FOR
Return ensemble classifier that uses C_i weighted by α_i (i = 1, …, k)
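The loop can be sketched in Python under several assumptions that go beyond the slide: labels are +1/−1, the base classifier is a one-dimensional decision stump, and the weights are passed directly to the base learner instead of drawing a weighted sample (a common deterministic variant, not the re-sampling version in the pseudocode). All names and the toy data are illustrative.

```python
import math


def stump_predict(x, threshold, polarity):
    """Predict `polarity` when x is above the threshold, the opposite label otherwise."""
    return polarity if x > threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D decision stump."""
    values = sorted(set(xs))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates = [values[0] - 1.0] + midpoints + [values[-1] + 1.0]
    best = None
    for threshold in candidates:
        for polarity in (1, -1):
            error = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train up to `rounds` stumps; returns a list of (alpha, threshold, polarity)."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = fit_stump(xs, ys, weights)
        if error >= 0.5:
            weights = [1.0 / n] * n  # revert to equal weights, as the slides describe
            continue
        error = max(error, 1e-10)    # guard against log of infinity for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # exp(-alpha) for correct records, exp(+alpha) for misclassified ones
        scaled = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(scaled)
        weights = [w / total for w in scaled]
    return ensemble


def predict(ensemble, x):
    """Importance-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate: +++ ---- +++
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy set the best single stump still misclassifies three records, while the weighted vote of three stumps reproduces every training label.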

11
Illustrating AdaBoost
[Figure: data points for training, with the initial weight of each data point]

12
Illustrating AdaBoost

13
Methods for Performance Evaluation
• How to obtain a reliable estimate of performance?
• Performance of a model may depend on other factors besides the learning algorithm:
  – Class distribution
  – Cost of misclassification
  – Size of training and test sets

14
Learning Curve
• A learning curve shows how accuracy changes with varying sample size
• Creating a learning curve requires a sampling schedule:
  – Arithmetic sampling (Langley et al.)
  – Geometric sampling (Provost et al.)
• Effect of small sample size:
  – Bias in the estimate
  – Variance of the estimate
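The two sampling schedules can be sketched as below, assuming the usual meaning of the terms: arithmetic sampling grows the sample size by a constant increment, geometric sampling by a constant factor. The function names and the sizes are illustrative.

```python
def arithmetic_schedule(start: int, step: int, limit: int) -> list:
    """Sample sizes start, start + step, start + 2*step, ... up to limit."""
    return list(range(start, limit + 1, step))


def geometric_schedule(start: int, ratio: int, limit: int) -> list:
    """Sample sizes start, start*ratio, start*ratio**2, ... up to limit."""
    sizes = []
    size = start
    while size <= limit:
        sizes.append(size)
        size *= ratio
    return sizes


print(arithmetic_schedule(100, 100, 500))  # [100, 200, 300, 400, 500]
print(geometric_schedule(100, 2, 1000))    # [100, 200, 400, 800]
```

A geometric schedule reaches large sample sizes in far fewer training runs, at the cost of coarser resolution along the curve.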

15
ROC (Receiver Operating Characteristic)
• Developed in the 1950s for signal detection theory to analyze noisy signals
  – Characterizes the trade-off between positive hits and false alarms
• The ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis)
• The performance of each classifier is represented as a point on the ROC curve
  – Changing the algorithm's threshold, the sample distribution, or the cost matrix changes the location of the point

16
ROC Curve
• 1-dimensional data set containing 2 classes (positive and negative)
• Any point located at x > t is classified as positive
• At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88
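A minimal sketch of how the four rates arise from a single threshold on one-dimensional data. The data values and the threshold below are made up for illustration and are not the data behind the slide's figure.

```python
def confusion_rates(xs, is_positive, threshold):
    """Classify x > threshold as positive; return the four rates as fractions of each class."""
    positives = [x for x, pos in zip(xs, is_positive) if pos]
    negatives = [x for x, pos in zip(xs, is_positive) if not pos]
    tpr = sum(x > threshold for x in positives) / len(positives)
    fpr = sum(x > threshold for x in negatives) / len(negatives)
    return {"TP": tpr, "FN": 1 - tpr, "FP": fpr, "TN": 1 - fpr}


# Hypothetical data: four negatives and four positives on a line.
xs = [1, 2, 3, 7, 4, 5, 8, 9]
is_positive = [False, False, False, False, True, True, True, True]

print(confusion_rates(xs, is_positive, threshold=6))
# {'TP': 0.5, 'FN': 0.5, 'FP': 0.25, 'TN': 0.75}
```

Expressed as rates, TP and FN always sum to 1, and so do FP and TN, so one (TP, FP) pair fully describes the operating point.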

17
ROC Curve
(TP, FP):
• (0,0): declare everything to be negative class
• (1,1): declare everything to be positive class
• (1,0): ideal
• Diagonal line:
  – Random guessing
  – Below diagonal line: prediction is opposite of the true class

18
Using ROC for Model Comparison
• Neither model consistently outperforms the other
  – M_1 is better for small FPR
  – M_2 is better for large FPR
• Area under the ROC curve
  – Ideal: Area = 1
  – Random guess: Area = 0.5
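The area under an ROC curve can be approximated with the trapezoidal rule. This sketch assumes the curve is supplied as (FPR, TPR) points that include the end points (0, 0) and (1, 1); the function name `auc` is illustrative.

```python
def auc(points):
    """Trapezoidal area under a curve given as (FPR, TPR) points."""
    pts = sorted(points)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


print(auc([(0, 0), (0, 1), (1, 1)]))  # ideal classifier: 1.0
print(auc([(0, 0), (1, 1)]))          # random guessing (the diagonal): 0.5
```

A single area value can hide the crossover described above: two curves with similar areas may each be better in a different FPR range.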

19
How to Construct an ROC Curve
[Table columns: Instance, P(+|A), True Class]
• Use a classifier that produces a posterior probability P(+|A) for each test instance A
• Sort the instances according to P(+|A) in decreasing order
• Apply a threshold at each unique value of P(+|A)
• Count the number of TP, FP, TN, FN at each threshold
  – TP rate, TPR = TP/(TP + FN)
  – FP rate, FPR = FP/(FP + TN)
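The construction steps can be sketched as follows. The scores and labels are hypothetical, since the slide's table of instances is not reproduced here, and the sketch assumes an instance is predicted positive when its score is greater than or equal to the threshold.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, one per unique score used as a threshold.

    labels use 1 for the positive class and 0 for the negative class.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # TP + FN = n_pos and FP + TN = n_neg, so these are TPR and FPR
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical posterior probabilities P(+|A) and true classes.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]

for fpr, tpr in roc_points(scores, labels):
    print(fpr, tpr)
```

Recounting at every threshold keeps the sketch short; for large test sets, a single pass over the sorted scores with running TP and FP counts gives the same points more cheaply.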

20
How to Construct an ROC Curve
[Figure: threshold table (Threshold >=) and the resulting ROC curve]
