# ROC & AUC, LIFT (Dr. Avi Rosenfeld)


Introduction to ROC curves
ROC = Receiver Operating Characteristic. It originated in electronic signal detection theory (1940s) and has become very popular in biomedical applications, particularly radiology and imaging. It is also used in data mining.

False Positives / Negatives

Confusion matrix 1:

| Actual \ Predicted | P | N |
|---|---|---|
| P | 20 | 10 |
| N | 30 | 90 |

Confusion matrix 2:

| Actual \ Predicted | P | N |
|---|---|---|
| P | 10 | 20 |
| N | 15 | 105 |

In matrix 1 the off-diagonal cells are FN = 10 and FP = 30.
Precision (P) = 20 / 50 = 0.4
Recall (P) = 20 / 30 = 0.666
F-measure = 2 * 0.4 * 0.666 / 1.0666 = 0.5
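The numbers above can be checked directly; a minimal sketch using the counts of confusion matrix 1:

```python
# Precision, recall and F-measure for the "P" class of confusion matrix 1
# (TP = 20, FN = 10, FP = 30, TN = 90).
tp, fn, fp, tn = 20, 10, 30, 90

precision = tp / (tp + fp)                                 # 20 / 50 = 0.4
recall = tp / (tp + fn)                                    # 20 / 30 = 0.666...
f_measure = 2 * precision * recall / (precision + recall)  # 0.5

print(precision, recall, f_measure)
```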

Different Cost Measures
The confusion matrix (easily generalizes to multi-class). Machine learning methods usually minimize FP + FN.
TPR (True Positive Rate): TP / (TP + FN) = Recall
FPR (False Positive Rate): FP / (TN + FP). Note that this is not precision; precision is TP / (TP + FP).

| Actual \ Predicted | Yes | No |
|---|---|---|
| Yes | TP: true positive | FN: false negative |
| No | FP: false positive | TN: true negative |
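As a quick check of the two rate formulas, applied to confusion matrix 1 from the previous slide:

```python
# TPR and FPR for confusion matrix 1 (TP = 20, FN = 10, FP = 30, TN = 90).
tp, fn, fp, tn = 20, 10, 30, 90

tpr = tp / (tp + fn)  # recall: 20 / 30
fpr = fp / (tn + fp)  # 30 / 120 = 0.25; not the same as precision (20 / 50)

print(tpr, fpr)
```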

Specific Example
[Figure: two overlapping distributions of test results, one for people without the disease and one for people with the disease.]

Threshold
[Figure: a cutoff on the test result axis; patients below the threshold are called "negative", patients above it are called "positive".]

Some definitions...
- True positives: patients with the disease whose test result is above the threshold (called "positive").
- False positives: patients without the disease whose test result is above the threshold.
- True negatives: patients without the disease whose test result is below the threshold (called "negative").
- False negatives: patients with the disease whose test result is below the threshold.

Moving the Threshold: left
[Figure: the threshold is shifted left along the test result axis, enlarging the "+" region. Which threshold has the higher recall of "-"? Which has the higher precision of "-"?]

ROC curve
[Figure: True Positive Rate (recall), 0% to 100%, plotted against False Positive Rate (1 - specificity), 0% to 100%.]

Figure 5.2: A sample ROC curve.
The jagged line shows, instance by instance (in order of the sorted list shown in Table 5.6), the increases in true positives (correctly predicted as yes) and false positives (incorrectly predicted as yes). It had better be above the diagonal line, or else the learning method is hurting us! Smooth curves may be drawn, or generated using cross-validation (see next slide).
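The jagged-line construction can be sketched directly. Assume the instances are already sorted by predicted score, best first; the labels below are made-up illustrative data, not Table 5.6:

```python
# Build the jagged ROC line instance by instance: walking down the sorted
# list, each actual "yes" moves the curve up (one more true positive) and
# each actual "no" moves it right (one more false positive).
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]  # 1 = yes, 0 = no, sorted by score

pos = sum(labels)
neg = len(labels) - pos
tp = fp = 0
points = [(0.0, 0.0)]  # (FPR, TPR), starting at the origin
for y in labels:
    if y == 1:
        tp += 1
    else:
        fp += 1
    points.append((fp / neg, tp / pos))

print(points[-1])  # the curve always ends at (1.0, 1.0)
```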

Different types of ROC graphs

Area under ROC curve (AUC)
A general measure: the area under the ROC curve. 0.50 is a random classifier; 1.0 is perfect.

AUC for ROC curves
[Figure: four ROC curves, True Positive Rate (0% to 100%) vs. False Positive Rate (0% to 100%), with AUC = 100%, 90%, 65%, and 50%.]
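Given the (FPR, TPR) points of a ROC curve, the AUC is just the area under the polyline; a minimal trapezoidal-rule sketch:

```python
# AUC as the area under a piecewise-linear ROC curve (trapezoidal rule).
def auc(points):
    """points: list of (FPR, TPR) pairs, sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

print(auc([(0, 0), (0, 1), (1, 1)]))  # perfect classifier: 1.0
print(auc([(0, 0), (1, 1)]))          # the diagonal (random): 0.5
```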

Lift Charts
X axis is sample size: (TP + FP) / N. Y axis is TP.
Formal definition: lift = the model's accuracy / a random model's accuracy.
From the chart ("Model" line vs. "Random" line): 80% of responses for 40% of cost gives a lift factor of 2; 40% of responses for 10% of cost gives a lift factor of 4.
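The slide's lift factors follow from the definition: the fraction of all responses captured by the sample, divided by the fraction a random model would capture (the sample's share of the population). The function below is an illustrative sketch, not from the lecture:

```python
# Lift factor: model's response rate in the sample divided by the random
# response rate (sample size as a fraction of the population).
def lift_factor(responses_in_sample, total_responses, sample_size, population):
    model_rate = responses_in_sample / total_responses
    random_rate = sample_size / population
    return model_rate / random_rate

print(lift_factor(80, 100, 40, 100))  # 80% of responses at 40% of cost: 2.0
print(lift_factor(40, 100, 10, 100))  # 40% of responses at 10% of cost: 4.0
```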

Lift factor
[Figure: lift chart plotting lift value against sample size.]

The relationship between the measures

The overfitting problem

10-fold cross-validation (one example of k-fold cross-validation)
1. Randomly divide your data into 10 pieces, 1 through 10.
2. Treat the 1st tenth of the data as the test dataset. Fit the model to the other nine-tenths of the data (which are now the training data).
3. Apply the model to the test data (e.g., for logistic regression, calculate predicted probabilities of the test observations).
4. Repeat this procedure for all 10 tenths of the data.
5. Calculate statistics of model accuracy and fit (e.g., ROC curves) from the test data only.
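The steps above can be sketched in a few lines; this is a minimal illustration of the splitting, with `k_fold_splits` a hypothetical helper name:

```python
import random

# Shuffle the row indices, deal them into k folds, and use each fold in
# turn as the test set while the remaining folds form the training set.
def k_fold_splits(n, k=10, seed=0):
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for test in folds:
        train = [j for fold in folds if fold is not test for j in fold]
        yield train, test

for train, test in k_fold_splits(100):
    assert len(train) == 90 and len(test) == 10
    assert sorted(train + test) == list(range(100))
```

Each of the 100 indices appears in exactly one test fold, so every observation is scored exactly once on held-out data.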


Analysis of the results

The Kappa Statistic
Kappa measures relative improvement over random prediction.
- Dreal / Dperfect = A (accuracy of the real model)
- Drandom / Dperfect = C (accuracy of a random model)
- Kappa statistic = (A - C) / (1 - C) = (Dreal / Dperfect - Drandom / Dperfect) / (1 - Drandom / Dperfect)
- Cancelling Dperfect everywhere: Kappa = (Dreal - Drandom) / (Dperfect - Drandom)
- Kappa = 1 when A = 1; Kappa ≈ 0 if the prediction is no better than random guessing.
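The final form of the formula in code, checked against the worked example that follows (Dreal = 140, Drandom = 82, Dperfect = 200):

```python
# Kappa from diagonal counts: d_real = correct predictions of the model,
# d_random = expected correct for a random model, d_perfect = all n correct.
def kappa(d_real, d_random, d_perfect):
    return (d_real - d_random) / (d_perfect - d_random)

print(round(kappa(140, 82, 200), 3))  # 0.492
print(kappa(200, 82, 200))            # perfect model (A = 1): 1.0
print(kappa(82, 82, 200))             # no better than random: 0.0
```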

Aside: the Kappa statistic
Two confusion matrices for a 3-class problem: the real model (first) vs. a random model (second).

Real model:

| Actual \ Predicted | a | b | c | total |
|---|---|---|---|---|
| a | 88 | 10 | 2 | 100 |
| b | 14 | 40 | 6 | 60 |
| c | 18 | 10 | 12 | 40 |
| total | 120 | 60 | 20 | 200 |

Random model:

| Actual \ Predicted | a | b | c | total |
|---|---|---|---|---|
| a | 60 | 30 | 10 | 100 |
| b | 36 | 18 | 6 | 60 |
| c | 24 | 12 | 4 | 40 |
| total | 120 | 60 | 20 | 200 |

Number of successes: the sum of the values on the diagonal (D).
Kappa = (Dreal - Drandom) / (Dperfect - Drandom) = (140 - 82) / (200 - 82) = 0.492
Accuracy = 140 / 200 = 0.70

The kappa statistic – how to calculate Drandom?
We compare the actual confusion matrix C with the expected confusion matrix E of a random model. The idea is to compare the actual results with what would have happened if a random predictor had predicted answers in the same proportions that the actual predictor did.

The actual results give 140 correct out of 200 (70%). The actual class proportions are 100, 60, and 40 for a, b, and c; the prediction proportions are 120, 60, and 20 for a, b, and c. The expected matrix E keeps the same row and column totals as C, but its cells are what a random predictor would produce.

The 120 predictions of a are split among all of the actual classes, because the predictions are random:
- Since 50% of actual answers are a (100 of 200), of the 120 predictions of a we expect half to be correct: 60.
- Since 30% of actual answers are b (60 of 200), we expect 30% of the 120 predictions of a to fall on instances that are actually b: 36.
- Since 20% of actual answers are c (40 of 200), we expect 20% of the 120 predictions of a to fall on instances that are actually c: 24.

Likewise, the 60 predictions of b split as 30 on actual a's, 18 correct, and 12 on actual c's; and the 20 predictions of c split as 10 on actual a's, 6 on actual b's, and 4 correct.

Rationale for each cell: e.g., 100 actual a's, and 120/200 of the predictions are in class a, so the random expectation is 100 * 120 / 200 = 60.

The expected number correct with stratified random prediction is 60 + 18 + 4 = 82. So the classifier being tested squeezed out an extra 140 - 82 = 58 correct predictions. The best possible classifier (100% correct) could have obtained an extra 200 - 82 = 118 correct predictions. Our classifier got 58 out of a possible 118 extra correct predictions (49.2%): that is the Kappa statistic.
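The cell-by-cell argument above reduces to one formula per cell, (actual row total) * (predicted column total) / n; a sketch using the slide's a/b/c example:

```python
# D_random from the marginals: each expected cell of the stratified random
# model is (actual row total) * (predicted column total) / n, and D_random
# is the diagonal of that expected matrix.
actual_totals = [100, 60, 40]     # actual counts of a, b, c
predicted_totals = [120, 60, 20]  # predicted counts of a, b, c
n = 200

expected = [[r * c / n for c in predicted_totals] for r in actual_totals]
d_random = sum(expected[i][i] for i in range(len(expected)))

print(d_random)                                     # 60 + 18 + 4 = 82.0
print(round((140 - d_random) / (n - d_random), 3))  # kappa = 0.492
```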

Toward the exercise...