Published by Sharlene Davidson. Modified over 9 years ago.
1
Copyright © 2013, SAS Institute Inc. All rights reserved. LOGISTIC REGRESSION
2
OVERVIEW
3
REMEMBER? APPLICATIONS: PREDICTION VS. EXPLANATORY ANALYSIS
Prediction: The terms in the model, the values of their coefficients, and their statistical significance are of secondary importance. The focus is on producing a model that is the best at predicting future values of Y as a function of the Xs. The predicted value of Y is given by this formula: [formula not preserved]
Explanatory analysis: The focus is on understanding the relationship between the dependent variable and the independent variables. Consequently, the statistical significance of the coefficients is important as well as the magnitudes and signs of the coefficients.
4
LOGISTIC REGRESSION: EXAMPLE PROBLEMS
Target Marketing
Attrition Prediction
Credit Scoring
Fraud Detection
5
LOGISTIC REGRESSION: REGRESSION AND OTHER MODELS
Type of Response | Categorical Predictors | Continuous Predictors | Continuous and Categorical Predictors
Continuous | Analysis of Variance (ANOVA) | Ordinary Least Squares (OLS) Regression | Analysis of Covariance (ANCOVA)
Categorical | Contingency Table Analysis or Logistic Regression | Logistic Regression | Logistic Regression
6
LOGISTIC REGRESSION: TYPES OF LOGISTIC REGRESSION
7
LOGISTIC REGRESSION: SUPERVISED (BINARY) CLASSIFICATION
[Figure: data matrix with cases 1, 2, ..., n as rows, a (binary) target column y, and input variable columns x1, x2, ..., xk]
8
LOGISTIC REGRESSION: THE PROBLEM AND THE DATA
Target: Did customer purchase variable annuity product? 1 = yes, 0 = no
Inputs: other product usage in a three month period; demographics
~32,000 obs, 47 vars
9
LOGISTIC REGRESSION: THE PROBLEM AND THE DATA
10
ANALYTICAL CHALLENGES
11
ANALYTICAL CHALLENGES: OPPORTUNISTIC DATA
Operational / Observational
Massive
Errors and Outliers
Missing Values
Analytical data preparation step: BENCHMARK: 80/20; [MY] LIFE: 99/1
12
ANALYTICAL CHALLENGES: MIXED MEASUREMENT SCALES
sales, executive, homemaker, ...
88.60, 3.92, 34890.50, 45.01, ...
0, 1, 2, 3, 4, 5, 6, ...
F, D, C, B, A
27513, 21737, 92614, 10043, ...
M, F
13
ANALYTICAL CHALLENGES: HIGH DIMENSIONALITY
14
ANALYTICAL CHALLENGES: RARE TARGET EVENT
Event | No Event
respond | not respond
churn | stay
default | pay off
fraud | legitimate
15
ANALYTICAL CHALLENGES: NONLINEARITIES AND INTERACTIONS
[Figure: two response surfaces for E(y) over x1 and x2, one labeled Linear Additive and one labeled Nonlinear Nonadditive]
16
ANALYTICAL CHALLENGES: MODEL SELECTION
[Figure: three fits to the same data, labeled Underfitting, Overfitting, and Just Right]
17
THE MODEL & ITS INTERPRETATION
18
LOGISTIC REGRESSION: WHY NOT LINEAR?
OLS Reg: $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$
If the response variable is categorical, then how do you code the response numerically?
If the response is coded (1=Yes and 0=No) and your regression equation predicts 0.5 or 1.1 or -0.4, what does that mean practically?
If there are only two (or a few) possible response levels, is it reasonable to assume constant variance and normality?
Linear Prob. Model: $p_i = \beta_0 + \beta_1 X_{1i}$
Probabilities are bounded, but linear functions can take on any value. (Once again, how do you interpret a predicted value of -0.4 or 1.1?)
Given the bounded nature of probabilities, can you assume a linear relationship between X and p throughout the possible range of X?
Can you assume a random error with constant variance? What is the observed probability for an observation?
19
LOGISTIC REGRESSION: FUNCTIONAL FORM
[Equation not preserved; its parts are labeled posterior probability, parameter, and input]
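For reference, the textbook logistic regression functional form that such labels usually annotate can be written as below. This is a sketch of the standard form, not a transcription of the slide; the symbols $\beta_j$, $x_{ji}$, and $k$ are assumed notation, with $p_i$ as used elsewhere in the deck.

```latex
\operatorname{logit}(p_i) = \ln\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki}
\quad\Longleftrightarrow\quad
p_i = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki})}}
```

In this notation $p_i$ is the posterior probability, each $\beta_j$ is a parameter, and each $x_{ji}$ is an input.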
20
LOGISTIC REGRESSION: THE LOGIT LINK FUNCTION
[Figure: the logit link function, with an axis running from smaller to larger values and the limits p_i = 0 and p_i = 1 marked]
21
LOGISTIC REGRESSION: THE FITTED SURFACE
22
LOGISTIC REGRESSION: LOGISTIC PROCEDURE
   proc logistic data=develop
                 plots(only)=(effect(clband x=(ddabal depamt checks res))
                              oddsratio (type=horizontalstat));
      class res (param=ref ref='S');
      model ins(event='1') = dda ddabal dep depamt cashbk checks res
            / stb clodds=pl;
      units ddabal=1000 depamt=1000 / default=1;
      oddsratio 'Comparisons of Residential Classification' res
            / diff=all cl=pl;
   run;
23
LOGISTIC REGRESSION: PROPERTIES OF THE ODDS RATIO
Odds ratio between 0 and 1: the group in the denominator has higher odds of the event.
Odds ratio equal to 1: no association.
Odds ratio greater than 1: the group in the numerator has higher odds of the event.
Estimated logistic regression model: logit(p) = -.7567 + .4373*(gender), where females are coded 1 and males are coded 0
Estimated odds ratio (Females to Males): odds ratio = (e^(-.7567+.4373))/(e^(-.7567)) = 1.55
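The odds-ratio arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch: the coefficient values come from the slide's odds-ratio expression, while the function and variable names are illustrative only.

```python
import math


def odds(logit_value: float) -> float:
    """Convert a logit (log-odds) into odds."""
    return math.exp(logit_value)


# Coefficients as they appear in the odds-ratio expression on the slide.
intercept = -0.7567
beta_gender = 0.4373

odds_female = odds(intercept + beta_gender * 1)  # females are coded 1
odds_male = odds(intercept + beta_gender * 0)    # males are coded 0
odds_ratio = odds_female / odds_male

print(round(odds_ratio, 2))
```

The intercept cancels in the ratio, so the odds ratio for a 0/1 input is simply the exponentiated coefficient.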
24
LOGISTIC REGRESSION: RESULTS FROM ODDSRATIO
   oddsratio 'Comparisons of Residential Classification' res / diff=all cl=pl;
25
LOGISTIC REGRESSION: RESULTS FROM PLOTS = (EFFECT(…
   plots(only)=(effect(clband x=(ddabal depamt checks res))
26
LOGISTIC REGRESSION: LOGISTIC DISCRIMINATION
27
LOGISTIC REGRESSION: CONCORDANT VERSUS DISCORDANT
Example with predicted outcome probabilities Males (0.32) and Females (0.42):
Customer Did Not Buy Variable Annuity Product | Customer Bought: Males (0.32) | Customer Bought: Females (0.42)
Males (0.32) | Tie | Concordant Pair
Females (0.42) | Discordant Pair | Tie
In general, by predicted outcome probability:
Customer Did Not Buy Variable Annuity Product | Customer Bought: Low | Customer Bought: High
Low | Tie | Concordant Pair
High | Discordant Pair | Tie
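The pair classification on this slide can be sketched in Python as follows. Every (buyer, non-buyer) pair is compared on predicted probability; the tiny data set and the helper names are hypothetical, and the c statistic shown uses the common convention of counting a tie as half a concordant pair.

```python
from itertools import product


def pair_counts(event_probs, nonevent_probs):
    """Count concordant, discordant, and tied (event, non-event) pairs."""
    concordant = discordant = tied = 0
    for p_event, p_nonevent in product(event_probs, nonevent_probs):
        if p_event > p_nonevent:
            concordant += 1   # the buyer got the higher predicted probability
        elif p_event < p_nonevent:
            discordant += 1   # the non-buyer got the higher predicted probability
        else:
            tied += 1
    return concordant, discordant, tied


def c_statistic(event_probs, nonevent_probs):
    """(concordant + 0.5 * tied) / number of pairs."""
    concordant, discordant, tied = pair_counts(event_probs, nonevent_probs)
    return (concordant + 0.5 * tied) / (concordant + discordant + tied)


# Hypothetical data: one female (0.42) and one male (0.32) in each group.
buyers = [0.42, 0.32]
non_buyers = [0.32, 0.42]

counts = pair_counts(buyers, non_buyers)
c_value = c_statistic(buyers, non_buyers)
print(counts, c_value)
```

With only two distinct predicted probabilities, as in the gender example, many pairs are ties, which is why a model with a single binary input discriminates weakly.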
28
Demo
29
OVERSAMPLING
30
OVERSAMPLING: SAMPLING DESIGNS
[Figure: two sampling designs. Joint: (x,y) pairs are drawn together from the population, giving a sample such as {(x,y),(x,y),(x,y),(x,y)}. Separate: x values are drawn separately from the y = 0 and y = 1 groups, giving a sample such as {(x,0),(x,0),(x,1),(x,1)}]
31
OVERSAMPLING: THE EFFECT OF OVERSAMPLING
32
OVERSAMPLING: OFFSET
[symbol not preserved] - in reality (the population)
[symbol not preserved] - in the sample
Two ways to adjust:
1. Include an offset parameter in the model: model … / offset=X
2. Adjust the probabilities at the model output
Adjusted Probability: [formula not preserved]
33
OVERSAMPLING: ADJUSTING THE PROBABILITIES
   /* Specify the prior probability */
   /* to correct for oversampling   */
   %let pi1=.02;

   /* Correct predicted probabilities */
   proc logistic data=develop;
      model ins(event='1')=dda ddabal dep depamt cashbk checks;
      score data = pmlr.new out=scored priorevent=&pi1;
   run;
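The two corrections for oversampling can be illustrated with the textbook prior-correction formulas. This is a sketch under stated assumptions: `pi1` is the population event proportion (0.02, as in the `%let pi1=.02;` line), `rho1` is a hypothetical sample event proportion, and the function names are illustrative. It is not a description of how PROC LOGISTIC computes its output internally.

```python
import math


def offset(pi1: float, rho1: float) -> float:
    """Log-odds shift caused by oversampling: ln(rho1 * pi0 / (rho0 * pi1))."""
    pi0, rho0 = 1 - pi1, 1 - rho1
    return math.log((rho1 * pi0) / (rho0 * pi1))


def adjust_probability(p_sample: float, pi1: float, rho1: float) -> float:
    """Rescale a probability estimated on the oversampled data to the population."""
    pi0, rho0 = 1 - pi1, 1 - rho1
    event = p_sample * pi1 / rho1
    nonevent = (1 - p_sample) * pi0 / rho0
    return event / (event + nonevent)


def logit(p: float) -> float:
    return math.log(p / (1 - p))


pi1 = 0.02    # population event proportion, from the slide's macro variable
rho1 = 0.35   # hypothetical event proportion in the oversampled data
p = 0.60      # hypothetical predicted probability from the oversampled model

adjusted = adjust_probability(p, pi1, rho1)
print(round(adjusted, 4))
```

Both routes agree: subtracting the offset from the logit of the sample probability gives the logit of the adjusted probability, so the choice between them is mostly a matter of where in the workflow the correction is applied.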
34
PREPARING THE INPUT VARIABLES
35
MISSING VALUES: DOES PR(MISSING) DEPEND ON THE DATA?
No
  o MCAR (missing completely at random)
Yes
  o that unobserved value
  o other unobserved values
  o other observed values (including the target)
36
MISSING VALUES: COMPLETE CASE ANALYSIS
[Figure: data matrix of cases by input variables]
37
MISSING VALUES: COMPLETE CASE ANALYSIS
[Figure: data matrix of cases by input variables]
38
MISSING VALUES: NEW MISSING VALUES
Fitted Model: [formula not preserved]
New Case: [values not preserved]
Predicted Value: [value not preserved]
39
MISSING VALUES: MISSING VALUE IMPUTATION
[Figure: sample data table containing missing values]
40
MISSING VALUES: IMPUTATION + INDICATORS
Median = 30
Incomplete Data | Completed Data | Missing Indicator
34 | 34 | 0
63 | 63 | 0
. | 30 | 1
22 | 22 | 0
26 | 26 | 0
54 | 54 | 0
18 | 18 | 0
. | 30 | 1
47 | 47 | 0
20 | 20 | 0
41
MISSING VALUES: IMPUTATION + INDICATORS
   /* Create missing indicators */
   data develop1;
      set develop;
      /* name the missing indicator variables */
      array mi{*} MIAcctAg MIPhone … MICRScor;
      /* select variables with missing values */
      array x{*} acctage phone … crscore;
      do i=1 to dim(mi);
         mi{i}=(x{i}=.);
      end;
   run;

   /* Impute missing values with the median */
   proc stdize data=develop1 reponly method=median out=imputed;
      var &inputs;
   run;
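The same two steps, a missing indicator followed by median imputation, can be sketched in plain Python for a single input. The values are the ten-row example from the IMPUTATION + INDICATORS slide, with `None` standing in for the SAS missing value `.`; the function name is illustrative.

```python
from statistics import median


def impute_with_indicator(values):
    """Return (completed values, 0/1 missing indicators, median used to fill)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)                       # median of the non-missing values
    indicator = [int(v is None) for v in values]  # 1 where the value was missing
    completed = [fill if v is None else v for v in values]
    return completed, indicator, fill


incomplete = [34, 63, None, 22, 26, 54, 18, None, 47, 20]
completed, indicator, fill = impute_with_indicator(incomplete)

print(fill)
print(completed)
print(indicator)
```

Creating the indicator before imputing matters: once the missing entries are filled in, the information that they were missing is otherwise lost.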
42
MISSING VALUES: CLUSTER IMPUTATION [IN LATER LECTURES]
43
CATEGORICAL INPUTS
44
CATEGORICAL INPUTS: DUMMY VARIABLES
X | DA | DB | DC | DD
D | 0 | 0 | 0 | 1
B | 0 | 1 | 0 | 0
C | 0 | 0 | 1 | 0
C | 0 | 0 | 1 | 0
A | 1 | 0 | 0 | 0
A | 1 | 0 | 0 | 0
D | 0 | 0 | 0 | 1
C | 0 | 0 | 1 | 0
A | 1 | 0 | 0 | 0
... | ... | ... | ... | ...
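Dummy coding of a categorical input can be sketched in a few lines of Python. The input values are the nine X values from this slide; the function name and the `D<level>` naming scheme are illustrative.

```python
def dummy_code(values, levels=None):
    """Map each level to a 0/1 column that flags the rows taking that level."""
    if levels is None:
        levels = sorted(set(values))
    return {f"D{level}": [int(v == level) for v in values] for level in levels}


x = list("DBCCAADCA")
dummies = dummy_code(x)

for name, column in dummies.items():
    print(name, column)
```

Each row has exactly one 1 across the four columns, so in a model with an intercept one column is redundant and is normally dropped as the reference level.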
45
CATEGORICAL INPUTS: SMARTER VARIABLES
[Figure: a ZIP input (99801, 99622, 99523, 99737, 99937, 99533, ...) replaced by derived inputs HomeVal, Local, and Urbanicity]
46
CATEGORICAL INPUTS: QUASI-COMPLETE SEPARATION
Level | Target = 0 | Target = 1 | DA | DB | DC | DD
A | 28 | 7 | 1 | 0 | 0 | 0
B | 16 | 0 | 0 | 1 | 0 | 0
C | 94 | 11 | 0 | 0 | 1 | 0
D | 23 | 21 | 0 | 0 | 0 | 1
47
CATEGORICAL INPUTS: CLUSTERING LEVELS
Level | Target = 0 | Target = 1
A | 28 | 7
B | 16 | 0
C | 94 | 11
D | 23 | 21
Merged: none; χ² = 31.7 (100%)
...
48
CATEGORICAL INPUTS: CLUSTERING LEVELS
Levels A, B, C, D; Target = 0: 28, 16, 94, 23; Target = 1: 7, 0, 11, 21; χ² = 31.7 (100%)
Merged B & C: levels A, BC, D; Target = 0: 28, 110, 23; Target = 1: 7, 11, 21; χ² = 30.7 (97%)
...
49
CATEGORICAL INPUTS: CLUSTERING LEVELS
Levels A, B, C, D; Target = 0: 28, 16, 94, 23; Target = 1: 7, 0, 11, 21; χ² = 31.7 (100%)
Merged B & C: levels A, BC, D; Target = 0: 28, 110, 23; Target = 1: 7, 11, 21; χ² = 30.7 (97%)
Merged A & BC: levels ABC, D; Target = 0: 138, 23; Target = 1: 18, 21; χ² = 28.6 (90%)
...
50
CATEGORICAL INPUTS: CLUSTERING LEVELS
Levels A, B, C, D; Target = 0: 28, 16, 94, 23; Target = 1: 7, 0, 11, 21; χ² = 31.7 (100%)
Merged B & C: levels A, BC, D; Target = 0: 28, 110, 23; Target = 1: 7, 11, 21; χ² = 30.7 (97%)
Merged A & BC: levels ABC, D; Target = 0: 138, 23; Target = 1: 18, 21; χ² = 28.6 (90%)
Merged ABC & D: one level; Target = 0: 161; Target = 1: 39; χ² = 0 (0%)
Greenacre (1988, 1993)
PROC MEANS – PROC CLUSTER – PROC TREE - … HOMEWORK
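The χ² values on this slide can be reproduced with a short Python sketch. It computes the Pearson chi-square of the level-by-target table and greedily merges, at each step, the pair of levels whose merge keeps the chi-square highest. This is a simplified illustration of the idea attributed to Greenacre, not the PROC MEANS / PROC CLUSTER / PROC TREE workflow named on the slide, and all function names are illustrative.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square for a dict: level -> (count target=0, count target=1)."""
    total0 = sum(n0 for n0, _ in table.values())
    total1 = sum(n1 for _, n1 in table.values())
    n = total0 + total1
    stat = 0.0
    for n0, n1 in table.values():
        row_total = n0 + n1
        for observed, col_total in ((n0, total0), (n1, total1)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def merge(table, a, b):
    """Return a new table with levels a and b collapsed into one level."""
    merged = {k: v for k, v in table.items() if k not in (a, b)}
    merged[a + b] = (table[a][0] + table[b][0], table[a][1] + table[b][1])
    return merged


def greedy_merges(table):
    """List (merged pair, chi-square, percent of original chi-square) per step."""
    base = chi_square(table)
    steps = []
    while len(table) > 1:
        a, b = max(combinations(table, 2),
                   key=lambda pair: chi_square(merge(table, *pair)))
        table = merge(table, a, b)
        stat = chi_square(table)
        steps.append((f"{a} & {b}", round(stat, 1), round(100 * stat / base)))
    return steps


levels = {"A": (28, 7), "B": (16, 0), "C": (94, 11), "D": (23, 21)}
base_chi_square = round(chi_square(levels), 1)
steps = greedy_merges(levels)

print(base_chi_square)
for step in steps:
    print(step)
```

The percentage column shows how much of the original association with the target survives each merge, which is the quantity used to decide where to stop collapsing levels.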
51
VARIABLE CLUSTERING
52
VARIABLE CLUSTERING: REDUNDANCY
53
VARIABLE CLUSTERING
Credit Card Balance
Mortgage Balance
Number of Checks
Teller Visits
Checking Deposits
Age
PROC VARCLUS [LATER LECTURE]
54
VARIABLE SCREENING: UNIVARIATE SCREENING
55
VARIABLE SCREENING: UNIVARIATE SMOOTHING
56
EMPIRICAL LOGITS
[formula not preserved]
where $m_i$ = number of events, $M_i$ = number of cases
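One commonly used smoothed definition of the empirical logit for bin $i$ is shown below. The slide's own formula did not survive extraction, so this form is an assumption offered for orientation; it uses the slide's $m_i$ and $M_i$.

```latex
\operatorname{elogit}_i = \ln\!\left(\frac{m_i + \sqrt{M_i}/2}{M_i - m_i + \sqrt{M_i}/2}\right)
```

The $\sqrt{M_i}/2$ term keeps the logit finite when a bin contains no events ($m_i = 0$) or only events ($m_i = M_i$).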
57
EMPIRICAL LOGIT PLOTS
1. Hand-Crafted New Input Variables
2. Polynomial Models
3. Flexible Multivariate Function Estimators
4. Do Nothing
58
SUBSET SELECTION
59
SUBSET SELECTION: SCALABILITY IN PROC LOGISTIC
[Figure: Time versus Number of Variables (25, 50, 75, 100, 150, 200) for All Subsets, Stepwise, and Fast Backward selection]
60
MEASURING CLASSIFIER PERFORMANCE
61
HONEST ASSESSMENT: THE OPTIMISM PRINCIPLE
62
HONEST ASSESSMENT: DATA SPLITTING
Training | Validation | Test
63
HONEST ASSESSMENT: OTHER APPROACHES
Step | Train | Validate
1) | BCDE | A
2) | ACDE | B
3) | ABDE | C
4) | ABCE | D
5) | ABCD | E
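The rotation shown on this slide, where each of five parts of the data takes one turn as the validation set, can be sketched as follows. The function name is illustrative, and the fold labels A to E stand for disjoint subsets of the cases.

```python
def leave_one_fold_out(folds):
    """Return (training folds, validation fold) pairs, one pair per fold."""
    splits = []
    for i, held_out in enumerate(folds):
        train = folds[:i] + folds[i + 1:]   # every fold except the held-out one
        splits.append((train, held_out))
    return splits


folds = ["A", "B", "C", "D", "E"]
splits = leave_one_fold_out(folds)

for step, (train, validate) in enumerate(splits, start=1):
    print(f"{step}) train on {''.join(train)}, validate on {validate}")
```

Every case is used for validation exactly once and for training four times, which makes this approach attractive when the data set is too small to set aside a separate validation partition.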
64
MISCLASSIFICATION: CONFUSION MATRIX
Actual Class | Predicted Class 0 | Predicted Class 1 | Total
0 | True Negative | False Positive | Actual Negative
1 | False Negative | True Positive | Actual Positive
Total | Predicted Negative | Predicted Positive |
65
SENSITIVITY AND POSITIVE PREDICTED VALUE
Sensitivity = True Positive / Actual Positive
Positive Predicted Value = True Positive / Predicted Positive
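These ratios are straightforward to compute from the four cells of a confusion matrix. The sketch below uses the sample counts from the OVERSAMPLED TEST SET slide as read from the transcript (29, 21, 17, 33); the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives as a share of all actual positives."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives as a share of all actual negatives."""
    return tn / (tn + fp)


def positive_predicted_value(tp: int, fp: int) -> float:
    """True positives as a share of all predicted positives."""
    return tp / (tp + fp)


tn, fp, fn, tp = 29, 21, 17, 33

se = sensitivity(tp, fn)
sp = specificity(tn, fp)
ppv = positive_predicted_value(tp, fp)
print(se, sp, round(ppv, 3))
```

Sensitivity and specificity condition on the actual class, so they do not depend on the event proportion in the data, whereas positive predicted value conditions on the predicted class and does.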
66
ROC CURVE
67
OVERSAMPLED TEST SET
Sample:
Actual | Predicted 0 | Predicted 1 | Total
0 | 29 | 21 | 50
1 | 17 | 33 | 50
Total | 46 | 54 |
Population:
Actual | Predicted 0 | Predicted 1 | Total
0 | 56 | 41 | 97
1 | 1 | 2 | 3
Total | 57 | 43 |
68
ADJUSTMENTS FOR OVERSAMPLING
Actual Class | Predicted Class 0 | Predicted Class 1 | Total
0 | $\pi_0 \cdot Sp$ | $\pi_0 (1 - Sp)$ | $\pi_0$
1 | $\pi_1 (1 - Se)$ | $\pi_1 \cdot Se$ | $\pi_1$
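The adjustment can be sketched in Python: estimate Se and Sp from the oversampled confusion matrix, then weight them by the population class proportions. The counts are the sample matrix from the OVERSAMPLED TEST SET slide and the population event proportion of 0.03 is taken from that slide's population totals (97 and 3); the function name is illustrative.

```python
def population_confusion(tn, fp, fn, tp, pi1):
    """Confusion-matrix cell proportions in the population.

    Se and Sp come from the oversampled counts; pi1 is the population
    proportion of events (class 1) and pi0 = 1 - pi1.
    """
    sp = tn / (tn + fp)
    se = tp / (tp + fn)
    pi0 = 1 - pi1
    return {
        "TN": pi0 * sp,
        "FP": pi0 * (1 - sp),
        "FN": pi1 * (1 - se),
        "TP": pi1 * se,
    }


cells = population_confusion(tn=29, fp=21, fn=17, tp=33, pi1=0.03)
percentages = {name: round(100 * value) for name, value in cells.items()}
print(percentages)
```

Rounded to whole percentages, the result matches the population matrix shown on the OVERSAMPLED TEST SET slide.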
69
ALLOCATION RULES: CUTOFFS
70
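ALLOCATION RULES: PROFIT MATRIX
Profit matrix:
Actual | Predicted 0 | Predicted 1
0 | $0 | -$1
1 | $0 | $99
Three confusion matrices (Predicted 0, Predicted 1) and their total profit:
Actual 0: 57, 18; Actual 1: 1, 24; Total Profit: 24*99 - 18 = $2358
Actual 0: 66, 9; Actual 1: 4, 21; Total Profit: 21*99 - 9 = $2070
Actual 0: 70, 5; Actual 1: 9, 16; Total Profit: 16*99 - 5 = $1579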
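The total-profit arithmetic on this slide amounts to multiplying each confusion-matrix cell by the matching profit-matrix entry and summing. A minimal Python sketch, using the slide's numbers and illustrative names:

```python
# Profit per case, keyed by (actual class, decision), in dollars.
PROFIT = {(0, 0): 0, (0, 1): -1, (1, 0): 0, (1, 1): 99}


def total_profit(confusion):
    """Sum of cell count times cell profit over the confusion matrix."""
    return sum(count * PROFIT[cell] for cell, count in confusion.items())


# The three confusion matrices from the slide, keyed the same way.
matrices = [
    {(0, 0): 57, (0, 1): 18, (1, 0): 1, (1, 1): 24},
    {(0, 0): 66, (0, 1): 9, (1, 0): 4, (1, 1): 21},
    {(0, 0): 70, (0, 1): 5, (1, 0): 9, (1, 1): 16},
]

profits = [total_profit(m) for m in matrices]
print(profits)
```

The first matrix has the most false positives but also the highest total profit, because a missed buyer forgoes $99 while a wasted solicitation costs only $1.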
71
ALLOCATION RULES: PROFIT MATRIX
[Figure: profit matrix with rows Actual Class 0 and 1 and columns Decision 0 and 1]
Bayes Rule: Decision 1 if [condition not preserved]
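A generic version of such a rule can be derived by comparing expected profits. The notation is an assumption, since the slide's formula is not preserved: $\delta_{ad}$ denotes the profit when the actual class is $a$ and the decision is $d$, and $p$ is the posterior probability of class 1. The derivation also assumes $\delta_{11} > \delta_{10}$ and $\delta_{00} > \delta_{01}$.

```latex
\text{Decision 1 if}\quad
p\,\delta_{11} + (1 - p)\,\delta_{01} \;>\; p\,\delta_{10} + (1 - p)\,\delta_{00}
\quad\Longleftrightarrow\quad
p \;>\; \frac{\delta_{00} - \delta_{01}}{(\delta_{00} - \delta_{01}) + (\delta_{11} - \delta_{10})}
```

With the profit values \$0, -\$1, \$0, and \$99 used earlier in the deck, the cutoff is $p > 1/(1 + 99) = 0.01$.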
72
ALLOCATION RULES: CLASSIFIER PERFORMANCE
73
ALLOCATION RULES: USING PROFIT TO ASSESS FIT
74
OVERALL PREDICTIVE POWER: CLASS SEPARATION
75
OVERALL PREDICTIVE POWER: K-S STATISTIC
76
OVERALL PREDICTIVE POWER: AREA UNDER THE ROC CURVE
77
ROC AND ROCCONTRAST STATEMENTS
   ROC ;
   ROCCONTRAST ;