Statistical Consulting Unit, University of Haifa: Prof. Benjamin Reiser, Prof. David Faraggi, Ms. Efrat Yaskil.


1 Statistical Consulting Unit, University of Haifa. Prof. Benjamin Reiser, Prof. David Faraggi, Ms. Efrat Yaskil

2 Usual Regression Model
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \dots, n$, where the $\varepsilon_i \sim N(0, \sigma^2)$ are independent.
The model can be extended to many x's: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \varepsilon_i$.
Some of the x's may be categorical (defined by dummy variables).

3 Often the response y is binary, for example: 1) yes/no, 2) alive/dead, 3) success/failure.

4 $E(Y_i) = p_i$, where $p_i$ is a function of the x's that approaches 1 from below and 0 from above.
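The formula on this slide did not survive extraction. Since the later slides fit a model with a logit link, it was presumably the logistic function. A sketch for a single explanatory variable, under that assumption:

```latex
p_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}, \qquad 0 < p_i < 1 .
```

For $\beta_1 > 0$ this tends to 1 as $x_i \to \infty$ and to 0 as $x_i \to -\infty$, without ever reaching either bound.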

5 Other functions of this form are possible (such as the probit), but they are not discussed here.

6 Passengers on the Titanic
Description: The data give the survival status of 679 passengers on the Titanic, together with their names, age, sex and passenger class.
Variable: Description
Name: Recorded name of passenger
Passenger class: 1st, 2nd or 3rd
Age: Age in years
Gender: 0 = male, 1 = female
Survived: 1 = yes, 0 = no

7 One binary explanatory variable
An example: the effect of GENDER on the survival of passengers on the Titanic.
GENDER is defined by x: x = 1 for female and x = 0 for male.
Survival is defined by: Yes = survived and No = didn't survive.

8 Odds
OR = 1 indicates no effect of x on y.

9 Odds and the Logistic Model
$\beta_1 = \log(\mathrm{OR})$, the log odds ratio, measures the effect of x.
$\beta_1 = 0$ (or equivalently OR = 1) implies no effect of x on y.
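The defining formulas on slides 8 and 9 were lost in extraction. The block below restates the standard definitions for a binary x, as an assumption about what the slides showed, with $p(x)$ denoting the probability that y = 1 at a given x:

```latex
\mathrm{odds}(x) = \frac{p(x)}{1 - p(x)}, \qquad
\mathrm{OR} = \frac{\mathrm{odds}(1)}{\mathrm{odds}(0)}, \\
\log \mathrm{odds}(x) = \beta_0 + \beta_1 x
\;\Longrightarrow\;
\log(\mathrm{OR}) = (\beta_0 + \beta_1) - \beta_0 = \beta_1 .
```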

10 Method of Estimation: Maximum Likelihood
Motivation for the MLE (maximum likelihood estimator): it is the value of the parameters that maximizes the probability of observing the data we in fact observed.
For individual i we have a Bernoulli distribution, in which the $p_i$ are functions of x.
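The Bernoulli formula itself is missing from the transcript. Assuming the usual setup with independent observations $y_i \in \{0, 1\}$, the likelihood that the MLE maximizes can be written as:

```latex
P(Y_i = y_i) = p_i^{y_i} (1 - p_i)^{1 - y_i}, \\
L(\beta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}, \qquad
\log L(\beta) = \sum_{i=1}^{n} \bigl[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \bigr] .
```

The quantity reported as "-2 LOG L" in the SAS output is $-2 \log L$ evaluated at the fitted parameters.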

11 Numerical optimization gives the parameter estimates, together with estimates of their variances and covariances.
The x's can be continuous or categorical.
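As an illustration of the numerical optimization, here is a minimal Newton-Raphson sketch for the one-variable model logit(p) = b0 + b1*x. It is not the SAS algorithm itself. The grouped counts are an assumption: they are the counts 207, 56, 89 and 327 reported on slides 29-30, read here as the female/male split, and the function names are hypothetical.

```python
import math

# (x, survivors, group size); x = 1 female, x = 0 male.
# Counts inferred from slides 29-30 (an assumption, see the text above).
GROUPS = [(0, 89, 416), (1, 207, 263)]


def score_and_information(groups, b0, b1):
    """Gradient of the log-likelihood and the information matrix."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, events, n in groups:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        resid = events - n * p       # observed minus expected events
        w = n * p * (1.0 - p)        # binomial variance weight
        g0 += resid
        g1 += resid * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def fit_logistic(groups, iterations=25):
    """Newton-Raphson: beta <- beta + inverse(information) * gradient."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        (g0, g1), (h00, h01, h11) = score_and_information(groups, b0, b1)
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # The inverse information at the final estimate approximates the
    # covariance matrix of (b0, b1).
    _, (h00, h01, h11) = score_and_information(groups, b0, b1)
    det = h00 * h11 - h01 * h01
    cov = ((h11 / det, -h01 / det), (-h01 / det, h00 / det))
    return (b0, b1), cov


(b0, b1), cov = fit_logistic(GROUPS)
print(round(b0, 4), round(b1, 4))
print(round(math.sqrt(cov[0][0]), 4), round(math.sqrt(cov[1][1]), 4))
```

Under these assumed counts the estimates and standard errors should agree, up to rounding, with the INTERCPT and GENDER rows of the SAS output on slide 13.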

12 Logistic Regression using SAS Software
The LOGISTIC Procedure
Response Variable: SURVIVED
Response Levels: 2
Number of Observations: 679
Link Function: Logit
Response Profile
Ordered Value   SURVIVED   Count
1               1          296
2               0          383

13 Logistic Regression Model: Survived = GENDER
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         932.116          708.287                    .
SC          936.637          717.328                    .
-2 LOG L    930.116          704.287                    225.829 with 1 DF (p=0.0001) (*)
Score       .                .                          215.246 with 1 DF (p=0.0001) (*)
Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    -1.3013              0.1196           118.4724          0.0001            .                       .
GENDER     1    2.6087               0.1923           184.0159 (*)      0.0001            0.701146                13.581
Here $\hat\beta_0 = -1.3013$ and $\hat\beta_1 = 2.6087$. (*) tests the GENDER effect.

14 Odds(Female) = 0.7871/0.2129 = 3.6970, and log(3.6970) = 1.3
GENDER=1: $\hat\beta_0 + \hat\beta_1 \cdot 1 = -1.3 + 2.6 = 1.3 = \log\{\mathrm{Odds(Female)}\}$
Odds(Male) = 0.2139/0.7861 = 0.2721, and log(0.2721) = -1.3
GENDER=0: $\hat\beta_0 + \hat\beta_1 \cdot 0 = -1.3 = \log\{\mathrm{Odds(Male)}\}$
Odds Ratio = Odds(Female)/Odds(Male) = 3.6970/0.2721 = 13.58
log(OR) = log(13.58) = 2.609 = $\hat\beta_1$
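The arithmetic on this slide can be retraced with a few lines of Python. This is only a sketch: the two probabilities are the estimated survival probabilities quoted on the slide, and the helper name `odds` is hypothetical.

```python
import math


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


p_female = 0.7871  # estimated survival probability, females (slide 14)
p_male = 0.2139    # estimated survival probability, males (slide 14)

odds_female = odds(p_female)
odds_male = odds(p_male)
odds_ratio = odds_female / odds_male
log_or = math.log(odds_ratio)  # natural log, comparable to the GENDER estimate

print(odds_female, odds_male, odds_ratio, log_or)
```

Because the inputs are rounded to four decimals, the results match the slide's 13.58 and 2.609 only approximately.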

15 Logistic Regression Model: Survived = GENDER (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   -1.3013              -1.5412   -1.0719
GENDER     2.6087               2.2386    2.9931
Wald Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   -1.3013              -1.5357   -1.0670
GENDER     2.6087               2.2318    2.9856
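The Wald limits in this table follow from estimate ± z × standard error. Below is a small sketch, assuming the standard errors from slide 13 and z = 1.96 for a 95% interval; `wald_ci` is a hypothetical helper name. The profile likelihood limits need an iterative search and are not reproduced here.

```python
import math


def wald_ci(estimate, std_err, z=1.96):
    """Wald confidence limits: estimate +/- z * standard error."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width


# GENDER coefficient and its standard error, taken from slide 13.
lower, upper = wald_ci(2.6087, 0.1923)

# Exponentiating the limits gives an interval for the odds ratio.
or_lower, or_upper = math.exp(lower), math.exp(upper)

print(round(lower, 4), round(upper, 4))
print(or_lower, or_upper)
```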

16 Logistic Regression Model: Survived = AGE
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         932.116          930.491                    .
SC          936.637          939.532                    .
-2 LOG L    930.116          926.491                    3.625 with 1 DF (p=0.0569) (*)
Score       .                .                          3.608 with 1 DF (p=0.0575) (*)
Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    0.0542               0.1812           0.0894            0.7649            .                       .
AGE        1    -0.0102              0.00537          3.5902 (*)        0.0581            -0.081774               0.990
(*) tests the AGE effect.

18 Odds Ratio for a continuous explanatory variable x
If $p_k$ is the probability of survival for x = k and $\mathrm{odds}(k) = p_k/(1-p_k)$, then $e^{\beta_1} = \mathrm{odds}(k+1)/\mathrm{odds}(k)$ is the odds ratio for an increment of one unit in x.
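The intermediate step is missing from the transcript. Assuming the one-variable logistic model $\log \mathrm{odds}(x) = \beta_0 + \beta_1 x$, it can be filled in as follows:

```latex
\log \mathrm{odds}(k) = \beta_0 + \beta_1 k, \qquad
\log \mathrm{odds}(k+1) = \beta_0 + \beta_1 (k+1), \\
\log \frac{\mathrm{odds}(k+1)}{\mathrm{odds}(k)} = \beta_1
\;\Longrightarrow\;
\frac{\mathrm{odds}(k+1)}{\mathrm{odds}(k)} = e^{\beta_1} .
```

For the AGE estimate of -0.0102 on slide 16 this gives $e^{-0.0102} \approx 0.990$, the value in the Odds Ratio column there.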

19 Logistic Regression Model: Survived = AGE (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   0.0542               -0.3010   0.4103
AGE        -0.0102              -0.0208   0.000298
Wald Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   0.0542               -0.3010   0.4094
AGE        -0.0102              -0.0207   0.00035

20 Logistic Regression Model: Survived = GENDER and AGE
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         932.116          708.438                    .
SC          936.637          721.999                    .
-2 LOG L    930.116          702.438                    227.678 with 2 DF (p=0.0001) (*)
Score       .                .                          216.505 with 2 DF (p=0.0001) (*)
Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    -1.0298              0.2305           19.9655           0.0001            .                       .
GENDER     1    2.6041               0.1926           182.8029          0.0001            0.699919                13.519
AGE        1    -0.00879             0.00648          1.8382            0.1752            -0.070652               0.991
(*) tests the GENDER and AGE effects simultaneously.

21 Logistic Regression Model: Survived = GENDER and AGE (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   -1.0298              -1.4885   -0.5838
GENDER     2.6041               2.2334    2.9892
AGE        -0.00879             -0.0216   0.00387

22 Logistic Regression Model: Survived = GENDER and AGE (continued)
Likelihood ratio test for the AGE effect in addition to the GENDER effect:
$-2\log L(\mathrm{GENDER}) - \{-2\log L(\mathrm{GENDER}+\mathrm{AGE})\} = 704.287 - 702.438 = 1.849 < \chi^2_{0.95}(\mathrm{df}=1) = 3.84$,
meaning there is no significant additional effect of AGE over GENDER.
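The same test can be expressed as a p-value. The sketch below uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper tail is `erfc(sqrt(x/2))`. The helper name `chi2_sf_1df` is hypothetical and only valid for df = 1.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


# -2 log L values from slides 13 (GENDER only) and 20 (GENDER and AGE).
lr_stat = 704.287 - 702.438
p_value = chi2_sf_1df(lr_stat)

print(round(lr_stat, 3), p_value)
```

A p-value above 0.05 corresponds to the statistic falling below the 3.84 critical value quoted on the slide.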

23 Logistic Regression Model: Survived = GENDER and AGE (continued)
$\hat p_1$ = female probability of survival (GENDER=1)
$\hat p_0$ = male probability of survival (GENDER=0)
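The formulas for the two probabilities are not preserved in the transcript. One way to sketch them, assuming the fitted coefficients from slide 20 and the usual inverse-logit transformation, is shown below; the function name `survival_probability` is hypothetical.

```python
import math

# Estimates from the GENDER + AGE model on slide 20.
B0, B_GENDER, B_AGE = -1.0298, 2.6041, -0.00879


def survival_probability(gender, age):
    """Fitted P(survived) for gender (1 = female, 0 = male) and age in years."""
    eta = B0 + B_GENDER * gender + B_AGE * age  # linear predictor (log odds)
    return 1.0 / (1.0 + math.exp(-eta))


for age in (10, 30, 50):
    print(age, survival_probability(1, age), survival_probability(0, age))
```

In this additive model the female and male curves share the same (slightly negative) age slope on the log-odds scale and differ only by the GENDER shift.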

24 Logistic Regression Model: Survived = GENDER and AGE (continued) [Figure with curves labeled Female and Male]

25 Logistic Regression Model: Survived = GENDER, AGE and Interaction (AGE*GEN)
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         932.116          687.326                    .
SC          936.637          705.408                    .
-2 LOG L    930.116          679.326                    250.790 with 3 DF (p=0.0001)
Score       .                .                          231.471 with 3 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    -0.2329              0.2826           0.6793            0.4098            .                       .
GENDER     1    0.6885               0.4341           2.5160            0.1127            0.185061                1.991
AGE        1    -0.0364              0.00929          15.3618           0.0001            -0.292707               0.964
AGE*GEN    1    0.0669               0.0145           21.2549           0.0001            0.633984                1.069

26 Survived = GENDER, AGE and Interaction (AGE*GEN) (continued)
Parameter Estimates and 95% Confidence Intervals
Profile Likelihood Confidence Limits
Variable   Parameter Estimate   Lower     Upper
INTERCPT   -0.2329              -0.7887   0.3222
GENDER     0.6885               -0.1604   1.5443
AGE        -0.0364              -0.0552   -0.0187
AGE*GEN    0.0669               0.0390    0.0959

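27 Survived = GENDER, AGE and Interaction (continued)
Odds Ratio for GENDER
$\log\{\hat p/(1-\hat p)\} = -0.2329 + 0.6885\,\mathrm{GENDER} - 0.0364\,\mathrm{AGE} + 0.0669\,\mathrm{AGE} \cdot \mathrm{GENDER}$
$\hat p_1$ = female probability of survival (GENDER=1): $\log\{\hat p_1/(1-\hat p_1)\} = -0.2329 + 0.6885 - 0.0364\,\mathrm{AGE} + 0.0669\,\mathrm{AGE}$
$\hat p_0$ = male probability of survival (GENDER=0): $\log\{\hat p_0/(1-\hat p_0)\} = -0.2329 - 0.0364\,\mathrm{AGE}$
The OR for GENDER is a function of AGE: $\log(\widehat{\mathrm{OR}}) = 0.6885 + 0.0669\,\mathrm{AGE}$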
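To see how strongly the GENDER odds ratio depends on AGE in this model, the expression 0.6885 + 0.0669·AGE can be exponentiated at a few ages. A minimal sketch follows; the function name is hypothetical and the ages are arbitrary illustration points.

```python
import math


def gender_odds_ratio(age):
    """Female-vs-male odds ratio at a given age, interaction model (slide 25)."""
    return math.exp(0.6885 + 0.0669 * age)


for age in (0, 20, 40, 60):
    print(age, gender_odds_ratio(age))
```

At AGE = 0 this reduces to exp(0.6885), the 1.991 shown in the Odds Ratio column for GENDER, which is why that single number should not be read as an overall GENDER effect.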

28 Survived = GENDER, AGE and Interaction (continued) [Two figure panels, each with curves labeled Male and Female]

29 Survived = GENDER, AGE and Interaction (continued)
Prediction                 Survived   Didn't survive   Total
Predicted to survive       207        56               263
Predicted not to survive   89         327              416
Total                      296        383              679
Correct Prediction = (207+327)/679 = 0.786

30 Survived = GENDER, AGE and Interaction (continued)
Sensitivity = proportion of survivors who were correctly predicted to have survived = 207/296 = 0.699
Specificity = proportion of non-survivors who were correctly predicted not to have survived = 327/383 = 0.854
False Pos. = proportion of those predicted to survive who in fact did not survive = 56/263 = 0.213
False Neg. = proportion of those predicted not to survive who in fact survived = 89/416 = 0.214
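These four measures, plus the overall proportion correct from slide 29, can be computed from the four cell counts. A short sketch with a hypothetical helper name is given below. Note that "False Pos." and "False Neg." follow the definitions on this slide (proportions among the predicted groups), not the false positive and false negative rates used elsewhere.

```python
def classification_summary(tp, tn, fp, fn):
    """Summary measures from a 2x2 classification table.

    tp: survivors predicted to survive, tn: non-survivors predicted not to,
    fp: non-survivors predicted to survive, fn: survivors predicted not to.
    """
    total = tp + tn + fp + fn
    return {
        "correct": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_pos": fp / (tp + fp),  # among those predicted to survive
        "false_neg": fn / (tn + fn),  # among those predicted not to survive
    }


summary = classification_summary(tp=207, tn=327, fp=56, fn=89)
for name, value in summary.items():
    print(name, round(value, 3))
```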

31 Logistic Regression Model: Survived = GENDER, AGE and Interaction (AGE*GEN) (continued)
Classification Table
Prob    Correct             Incorrect           Percentages
Level   Event   Non-Event   Event   Non-Event   Correct   Sensitivity   Specificity   False POS   False NEG
0.1     292     29          354     4           47.3      98.6          7.6           54.8        12.1
0.2     261     148         235     35          60.2      88.2          38.6          47.4        19.1
0.3     230     307         76      66          79.1      77.7          80.2          24.8        17.7
0.4     221     324         59      75          80.3      74.7          84.6          21.1        18.8
0.5     207     327         56      89          78.6      69.9          85.4          21.3        21.4
0.6     207     327         56      89          78.6      69.9          85.4          21.3        21.4
0.7     190     337         46      106         77.6      64.2          88.0          19.5        23.9
0.8     99      366         17      197         68.5      33.4          95.6          14.7        35.0
0.9     12      381         2       284         57.9      4.1           99.5          14.3        42.7

32 C-Statistic = 0.814

33 Dummy Variables
Passenger Class   CLASS1   CLASS2
1st Class         1        0
2nd Class         0        1
3rd Class         0        0

34 Logistic Regression Model: Survived = CLASS1 and CLASS2, Using LOGISTIC Procedure
Model Fitting Information and Testing Global Null Hypothesis BETA=0
Criterion   Intercept Only   Intercept and Covariates   Chi-Square for Covariates
AIC         932.116          871.916                    .
SC          936.637          885.478                    .
-2 LOG L    930.116          865.916                    64.200 with 2 DF (p=0.0001)
Score       .                .                          62.489 with 2 DF (p=0.0001)
Analysis of Maximum Likelihood Estimates
Variable   DF   Parameter Estimate   Standard Error   Wald Chi-Square   Pr > Chi-Square   Standardized Estimate   Odds Ratio
INTERCPT   1    -1.0821              0.1482           53.3466           0.0001            .                       .
CLASS1     1    1.5506               0.2016           59.1698           0.0001            0.403160                4.715
CLASS2     1    0.8928               0.2025           19.4495           0.0001            0.228277                2.442
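To connect the dummy coding with these estimates, the sketch below encodes passenger class as (CLASS1, CLASS2), with 3rd class as the reference level, and converts the linear predictor into a fitted survival probability per class. The function names are hypothetical; the coefficients are those in the table above.

```python
import math

B0, B_CLASS1, B_CLASS2 = -1.0821, 1.5506, 0.8928


def class_dummies(pclass):
    """Return (CLASS1, CLASS2) for passenger class 1, 2 or 3."""
    if pclass not in (1, 2, 3):
        raise ValueError("pclass must be 1, 2 or 3")
    return int(pclass == 1), int(pclass == 2)


def class_survival_probability(pclass):
    """Fitted P(survived) for a passenger class under the CLASS1/CLASS2 model."""
    class1, class2 = class_dummies(pclass)
    eta = B0 + B_CLASS1 * class1 + B_CLASS2 * class2
    return 1.0 / (1.0 + math.exp(-eta))


for pclass in (1, 2, 3):
    print(pclass, class_dummies(pclass), class_survival_probability(pclass))
```

With this coding each class coefficient is a log odds ratio relative to 3rd class, which is how the Odds Ratio column (4.715 and 2.442) should be read.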

35 Logistic Regression Model: Survived = PCLASS, Using GENMOD Procedure
Analysis Of Parameter Estimates
Parameter    DF   Estimate   Std Err   ChiSquare   Pr>Chi
INTERCEPT    1    -1.0821    0.1482    53.3466     0.0001
PCLASS 1st   1    1.5506     0.2016    59.1698     0.0001
PCLASS 2nd   1    0.8928     0.2025    19.4495     0.0001
PCLASS 3rd   0    0.0000     0.0000    .           .
LR Statistics For Type 3 Analysis
Source   DF   ChiSquare   Pr>Chi
PCLASS   2    64.2000     0.0001

36 Survived = GENDER, AGE, PCLASS and interactions, Using GENMOD Procedure
LR Statistics For Type 3 Analysis
Source              DF   ChiSquare   Pr>Chi
GENDER              1    1.8340      0.1757
AGE                 1    49.0905     0.0001
GENDER*AGE          1    14.3753     0.0001
PCLASS              2    9.3897      0.0091
GENDER*PCLASS       2    0.3855      0.8247
AGE*PCLASS          2    8.9803      0.0112
GENDER*AGE*PCLASS   2    1.6101      0.4471

37 Survived = GENDER, AGE, PCLASS and interactions
Excluding the third-order interaction GENDER*AGE*PCLASS
LR Statistics For Type 3 Analysis
Source          DF   ChiSquare   Pr>Chi
GENDER          1    2.9593      0.0854
AGE             1    49.8093     0.0001
GENDER*AGE      1    12.9407     0.0003
PCLASS          2    9.4110      0.0090
GENDER*PCLASS   2    17.4457     0.0002
AGE*PCLASS      2    8.5366      0.0140

38 C-Statistic = 0.879

39 Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)
Analysis Of Parameter Estimates
Parameter           DF   Estimate   Std Err   ChiSquare   Pr>Chi
INTERCEPT           1    -0.2602    0.4560    0.3255      0.5683
GENDER              1    -0.0586    0.5364    0.0119      0.9130
AGE                 1    -0.0618    0.0185    11.1485     0.0008
GENDER*AGE          1    0.0703     0.0198    12.6683     0.0004
PCLASS 1st          1    1.7940     0.6879    6.8008      0.0091
PCLASS 2nd          1    1.5078     0.6383    5.5793      0.0182
PCLASS 3rd          0    0.0000     0.0000    .           .
GENDER*PCLASS 1st   1    0.9237     0.6546    1.9909      0.1582
GENDER*PCLASS 2nd   1    2.3532     0.6055    15.1058     0.0001
GENDER*PCLASS 3rd   0    0.0000     0.0000    .           .
AGE*PCLASS 1st      1    0.0068     0.0214    0.1014      0.7502
AGE*PCLASS 2nd      1    -0.0590    0.0251    5.5334      0.0187
AGE*PCLASS 3rd      0    0.0000     0.0000    .           .

40 Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)
$\log\{\hat p/(1-\hat p)\}$ = - 0.2602 - 0.0586 GENDER - 0.0618 AGE + 0.0703 GENDER*AGE + 1.7940 CLASS1 + 1.5078 CLASS2 + 0.9237 GENDER*CLASS1 + 2.3532 GENDER*CLASS2 + 0.0068 AGE*CLASS1 - 0.0590 AGE*CLASS2
Hence there are 6 models, one for each combination of GENDER and PCLASS.
For example, for GENDER=1 (female) and PCLASS=1:
$\log\{\hat p/(1-\hat p)\}$ = - 0.2602 - 0.0586 + 1.7940 + 0.9237 + (- 0.0618 + 0.0703 + 0.0068) AGE = 2.3989 + 0.0153 AGE
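The six group-specific models can be generated mechanically by collecting the constant terms and the AGE terms for each combination. Below is a minimal sketch using the estimates from slide 39; the function name `group_model` is hypothetical.

```python
def group_model(gender, pclass):
    """Return (intercept, age_slope) of the log odds for one group.

    gender: 1 = female, 0 = male; pclass: 1, 2 or 3 (3rd class is the reference).
    """
    class1 = int(pclass == 1)
    class2 = int(pclass == 2)
    intercept = (-0.2602 - 0.0586 * gender
                 + 1.7940 * class1 + 1.5078 * class2
                 + 0.9237 * gender * class1 + 2.3532 * gender * class2)
    age_slope = (-0.0618 + 0.0703 * gender
                 + 0.0068 * class1 - 0.0590 * class2)
    return intercept, age_slope


for gender in (0, 1):
    for pclass in (1, 2, 3):
        intercept, age_slope = group_model(gender, pclass)
        print(gender, pclass, round(intercept, 4), round(age_slope, 4))
```

The call `group_model(1, 1)` corresponds to the worked example on the slide, 2.3989 + 0.0153 AGE.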

41 Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)

42 Survived = GENDER, AGE, PCLASS and 2nd-order interactions (continued)
