TOPICS IN BUSINESS INTELLIGENCE K-NN & Naive Bayes – GROUP 1 Isabel van der Lijke Nathan Bok Gökhan Korkmaz.

1 TOPICS IN BUSINESS INTELLIGENCE K-NN & Naive Bayes – GROUP 1 Isabel van der Lijke Nathan Bok Gökhan Korkmaz

2 INTRODUCTION K-NN
- k-NN classifier (categorical outcome)
- Determining neighbors
- Classification rule
- Example: riding mowers
- Choosing k
- Setting the cutoff value
- Advantages and shortcomings of k-NN algorithms

3 INTRODUCTION NAIVE BAYES
- Basic classification procedure
- Cutoff probability method
- Conditional probability
- Naive Bayes
- Advantages and shortcomings of the naive Bayes classifier

4 SIMPLE CASE APPLICATION - Depression

5 SIMPLE CASE APPLICATION - Fruits
Example: P(Banana) = 500 / 1000 = 0.5, so P(not banana) = 1 - 0.5 = 0.5.
For a new fruit, compute the probability of each class.

              Sweet   Not sweet   Total
Banana          350         150     500
Orange          150         150     300
Other fruit     150          50     200
Total           650         350    1000
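As an illustration (not part of the original slides), the counts in the table above can be turned into class probabilities for a new sweet fruit via Bayes' rule; this is a minimal sketch assuming exactly the counts shown.

```python
# Minimal sketch of the fruit example: estimate P(fruit | sweet) from the
# (sweet, not sweet) counts in the slide's table using Bayes' rule.
counts = {
    "banana": (350, 150),
    "orange": (150, 150),
    "other": (150, 50),
}
total = sum(s + n for s, n in counts.values())        # 1000 fruits in total
p_sweet = sum(s for s, _ in counts.values()) / total  # P(sweet) = 650/1000

posteriors = {}
for fruit, (sweet, not_sweet) in counts.items():
    prior = (sweet + not_sweet) / total               # e.g. P(banana) = 0.5
    likelihood = sweet / (sweet + not_sweet)          # P(sweet | fruit)
    posteriors[fruit] = prior * likelihood / p_sweet  # Bayes' rule
    print(f"P({fruit} | sweet) = {posteriors[fruit]:.3f}")
```

For a sweet fruit this gives the highest posterior to banana (350 of the 650 sweet fruits are bananas), matching the "compute all the chances" step on the slide.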

6 REAL-LIFE APPLICATION NAIVE BAYES
- Medical data classification with a naive Bayes approach
- Introduction
- Requirements for systems dealing with medical data
- An empirical comparison
- Tables
- Conclusion

7 TABLE 2: COMPARATIVE ANALYSIS BASED ON PREDICTIVE ACCURACY

8 TABLE 3: COMPARATIVE ANALYSIS BASED ON AREA UNDER ROC CURVE (AUC)

9 REAL-LIFE APPLICATION K-NN
- Used to help health care professionals diagnose heart disease.
- Useful for pattern recognition and classification.
- Euclidean distance: d(x, y) = sqrt(sum_i (x_i - y_i)^2)
- Data are often normalized because the variables come in different formats and scales.
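As an illustration (not from the original slides), the Euclidean distance and the k-NN majority vote can be sketched as follows; the records and labels are made up for the example, not taken from the heart-disease study.

```python
import math
from collections import Counter

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(train, labels, query, k):
    # Majority vote among the k training points closest to the query.
    nearest = sorted(range(len(train)),
                     key=lambda i: euclidean(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy, made-up normalized records (not the paper's heart-disease data):
train = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
labels = ["healthy", "healthy", "disease", "disease"]
print(knn_predict(train, labels, (0.85, 0.85), k=3))  # -> disease
```

Because the distance sums squared differences per variable, an unnormalized variable with a large range would dominate the vote, which is why the slide notes that the data are usually normalized first.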

10 CASE STUDY
- "Our customer is a Dutch charity organization that wants to be able to classify its supporters as donators or non-donators. The non-donators are sent a single marketing mail a year, whereas the donators receive multiple ones (up to 4)."
- Who are the donators?
- Who are the non-donators?
- Application of k-NN & Naive Bayes to a training and a test dataset.
- 4000 customers.
- Tools: SPSS, Excel, XLMiner

11 CLEAN-UP
- No missing values.
- 1-dimensional outliers removed through sorting (on annual & average donation).
- 2-dimensional outliers removed through a scatterplot.

12 [figure-only slide]

13 Variables kept:
- Average donation
- Frequency of response
- Median time of response
- Time as client

Variables removed:
- Annual donation
- Last donation
- Time since last response

14
- Normalization of scores into z-scores.
- Nominal categorization of the data.
- Classification through percentiles of the z-scores and by manually processing values within the variables.
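The z-score step above can be sketched in a few lines (the donation values here are made up for illustration, not the case-study data):

```python
# Convert raw scores to z-scores: z = (x - mean) / std, so each variable
# contributes on a comparable scale to the k-NN distance.
def z_scores(values):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5  # population std
    return [(v - mean) / std for v in values]

avg_donation = [10.0, 20.0, 30.0, 40.0]  # made-up values
print(z_scores(avg_donation))
```

After this transformation every variable has mean 0 and standard deviation 1, so the percentile-based categorization on the slide can use the same cutoffs for each variable.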

15 ANALYSIS OF CASE STUDY - K-NN
- XLMiner
- Partition data
- Models created:
  M1 = Zavgdon & Zfrqres
  M2 = ZtimeCl, Zfrqres & Zavgdon
  M3 = Zmedtor, Zfrqres & Zavgdon
  M4 = ZtimeCl, Zfrqres, Zmedtor & Zavgdon

16 Validation Data Scoring - Summary Report (for k = 13)

Error Report
Class     # Cases   # Errors   % Error
0            1083        180     16.62
1             536        260     48.51
Overall      1619        440     27.18

Classification Confusion Matrix
                 Predicted 0   Predicted 1
Actual 0                 903           180
Actual 1                 260           276

17 CHOOSING A MODEL FOR K-NN
- Accuracy: proportion of correctly classified instances.
- Error rate: 1 - accuracy.
- Sensitivity: proportion of actual positives correctly identified as positive by the classifier.
- Specificity: proportion of actual negatives correctly identified as negative.
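As a worked check (not part of the original slides), these four measures can be computed from the k = 13 validation confusion matrix on slide 16, treating class 1 (donator) as the positive class:

```python
# Counts from the slide-16 confusion matrix, class 1 = positive (donator).
tn, fp = 903, 180   # actual class 0 row
fn, tp = 260, 276   # actual class 1 row

accuracy = (tp + tn) / (tp + tn + fp + fn)
error_rate = 1 - accuracy
sensitivity = tp / (tp + fn)   # true-positive rate
specificity = tn / (tn + fp)   # true-negative rate
print(accuracy, error_rate, sensitivity, specificity)
```

The error rate comes out at about 27.2%, matching the overall figure in the slide-16 error report, with sensitivity about 0.515 and specificity about 0.834.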

18 [figure-only slide]

19
                                                        M1        M2
Selecting everyone in validation data                €711.20   €662.80
Selecting while correcting for sensitivity
and specificity                                      €583.60   €530.80

20 APPLICATION OF MODEL ON TEST DATA

Classification Confusion Matrix
                 Predicted 0   Predicted 1
Actual 0                2300           344
Actual 1                 654           750

Error Report
Class     # Cases   # Errors   % Error
0            2644        344     13.01
1            1404        654     46.58
Overall      4048        998     24.65

21 [figure-only slide]

22 ANALYSIS OF THE CASE STUDY - NAIVE BAYES
- M1 = Cfrqres & Cavgdon
- M2 = Cfrqres, Cavgdon & Cmedtor

Classification Confusion Matrix
                 Predicted 0   Predicted 1
Actual 0                 856           229
Actual 1                 225           309

Error Report
Class     # Cases   # Errors   % Error
0            1085        229     21.11
1             534        225     42.13
Overall      1619        454     28.04

Conditional probabilities P(value | class)
Input variable   Value    Class 0     Class 1
CFRQRES              1    0.719771    0.297483
                     2    0.171465    0.255149
                     3    0.063980    0.192220
                     4    0.044786    0.255149
CAVGDON              1    0.632758    0.297483
                     2    0.272553    0.471396
                     3    0.076136    0.164760
                     4    0.018554    0.066362
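As an illustration (not part of the original slides), the conditional-probability table on slide 22 can be used to score a customer: multiply the class prior by P(value | class) for each input variable, then normalize. Class priors here are taken from the case counts in the slide-22 error report (1085 vs 534).

```python
# P(value | class) from the slide-22 table: {variable: {class: [p(v=1..4)]}}
cp = {
    "CFRQRES": {0: [0.719771, 0.171465, 0.063980, 0.044786],
                1: [0.297483, 0.255149, 0.192220, 0.255149]},
    "CAVGDON": {0: [0.632758, 0.272553, 0.076136, 0.018554],
                1: [0.297483, 0.471396, 0.164760, 0.066362]},
}
priors = {0: 1085 / 1619, 1: 534 / 1619}  # class frequencies in validation data

def posterior(cfrqres, cavgdon):
    # Naive Bayes: prior * product of per-variable likelihoods, normalized.
    score = {c: priors[c]
                * cp["CFRQRES"][c][cfrqres - 1]
                * cp["CAVGDON"][c][cavgdon - 1]
             for c in (0, 1)}
    total = sum(score.values())
    return {c: s / total for c, s in score.items()}

print(posterior(1, 1))  # a low-frequency, low-donation customer
```

A customer in the lowest category of both variables gets a posterior of roughly 0.91 for class 0 (non-donator), which is the "cutoff probability method" step from the introduction applied to this case.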

23
                         Model 1    Model 2
Selecting everyone        €1072      €1006
Selecting by class      €2460.82   €2378.01

24 APPLICATION OF MODEL ON TEST DATA

Conditional probabilities P(value | class)
Input variable   Value    Class 0     Class 1
CFRQRES              1    0.714502    0.306108
                     2    0.173338    0.254261
                     3    0.066465    0.186080
                     4    0.045695    0.253551
CAVGDON              1    0.630287    0.312500
                     2    0.280589    0.461648
                     3    0.068353    0.168324
                     4    0.020770    0.057528

25
Classification Confusion Matrix
                 Predicted 0   Predicted 1
Actual 0                2096           548
Actual 1                 570           834

Error Report
Class     # Cases   # Errors   % Error
0            2644        548     20.73
1            1404        570     40.60
Overall      4048       1118     27.62

26 LOOKING AT BOTH MODELS

27 [figure-only slide]

28 QUESTIONS?
