BINARY LOGISTIC REGRESSION


1 BINARY LOGISTIC REGRESSION
Kazimieras Pukėnas

2 Introduction
Binomial (or binary) logistic regression is a form of regression used when the dependent variable is a dichotomy and the independent variables are of any type. Logistic regression can be used to predict a dependent variable on the basis of continuous and/or categorical independents and to determine the percent of variance in the dependent variable explained by the independents. Logistic regression is popular in part because it frees the researcher from many restrictive assumptions:
- Logistic regression does not assume a linear relationship between the dependent and the independents;
- The dependent variable need not be normally distributed (but its distribution is assumed to lie within the exponential family of distributions, such as normal, Poisson, binomial, gamma);

3 Introduction
- The dependent variable need not be homoscedastic for each level of the independents; that is, there is no homogeneity-of-variance assumption: variances need not be the same within categories.
The main restriction is that the model should have little or no multicollinearity; that is, the independent variables should be independent of each other.

4 Introduction
The probability that the random variable Y will take the value 1 (from the possible values 0 and 1) is calculated by the equation

P(Y = 1) = 1 / (1 + e^(-z)), where z = b0 + b1*x1 + b2*x2 + ... + bk*xk,

x = (x1, ..., xk) is the vector of independent variables, and b0, b1, ..., bk are the parameter estimates. If the estimated P(Y = 1) is greater than 0.5, the predicted value of Y is 1; if it is less than 0.5, the predicted value of Y is 0. As P(Y = 1) > 0.5 holds if and only if z > 0, it is more convenient to apply this rule for predicting Y: if z > 0, Y = 1; else if z <= 0, Y = 0.
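
To make the formulas concrete, here is a minimal numeric sketch in Python; the coefficient and predictor values are made up purely for illustration.

import numpy as np

b = np.array([-1.5, 0.8, 2.0])   # hypothetical estimates b0 (constant), b1, b2
x = np.array([1.0, 2.0, 0.5])    # leading 1 multiplies the constant b0

z = b @ x                        # z = b0 + b1*x1 + b2*x2
p = 1.0 / (1.0 + np.exp(-z))     # P(Y = 1)

# The two prediction rules are equivalent: P(Y = 1) > 0.5  <=>  z > 0
y_from_p = 1 if p > 0.5 else 0
y_from_z = 1 if z > 0 else 0
assert y_from_p == y_from_z
print(f"z = {z:.3f}, P(Y = 1) = {p:.3f}, predicted Y = {y_from_z}")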

5 Introduction
There are several statistics which can be used for comparing alternative models or evaluating the performance of a single model:
- Model Chi-Square. Use the Model Chi-Square statistic to determine whether the overall model is statistically significant. The model chi-square tests the null hypothesis that all population logistic regression coefficients except the constant are zero. When probability(model chi-square) <= .05, we reject the null hypothesis that knowing the independents makes no difference in predicting the dependent, and conclude that at least one coefficient differs from zero;
- Hosmer-Lemeshow goodness-of-fit test. The Hosmer-Lemeshow test is an alternative method for testing the same hypothesis. If the Hosmer-Lemeshow test is not significant (p > .05), the model has adequate fit. By the same token, if the test is significant, the model does not adequately fit the data.
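
For readers who want to reproduce the model chi-square outside SPSS, here is a sketch using Python's statsmodels on simulated data; y and X are stand-ins for real data.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(40, 2)))    # constant + 2 covariates
true_b = np.array([0.2, 1.0, -1.0])              # made-up population coefficients
y = (rng.random(40) < 1 / (1 + np.exp(-X @ true_b))).astype(int)

result = sm.Logit(y, X).fit(disp=0)
chi2 = 2 * (result.llf - result.llnull)          # model chi-square (= result.llr)
df = result.df_model                             # number of covariates
p = stats.chi2.sf(chi2, df)                      # = result.llr_pvalue
print(f"Model chi-square = {chi2:.3f}, df = {int(df)}, p = {p:.4f}")
# p <= .05: reject H0 that all coefficients except the constant are zero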

6 Introduction
- Wald statistic (test). The Wald statistic is commonly used to test the significance of individual logistic regression coefficients for each independent variable (that is, to test the null hypothesis that a particular logit (effect) coefficient is zero). The null hypothesis is rejected, i.e., the coefficient is considered significant, if p < .05;
- The percentage of correct predictions. The percentage of correct predictions is an important measure of a model's usefulness. The Percent Correct Predictions statistic assumes that the event is predicted to occur if the estimated probability is greater than or equal to 50%, and not to occur otherwise.
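
Continuing the sketch above (reusing result, X, and y), both statistics can be computed as follows.

import numpy as np
from scipy import stats

wald = (result.params / result.bse) ** 2      # Wald statistic per coefficient
p_wald = stats.chi2.sf(wald, df=1)            # H0: the coefficient is zero
print(np.column_stack([result.params, result.bse, wald, p_wald]).round(4))

predicted = (result.predict(X) >= 0.5).astype(int)   # the 50% cut-off rule
print(f"Percent correct = {100 * np.mean(predicted == y):.1f}%")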

7 How to perform a Binomial Logistic Regression in SPSS
To run a binomial logistic regression in SPSS: Open the file with the data to be analyzed. Enter the values of the independent variables (predictors) for any cases for which you want to obtain the predicted binary response (0 or 1) of the dependent variable. From the menus choose: Analyze → Regression → Binary Logistic... The dialog box Logistic Regression appears (Fig. 1). Put the dependent binomial variable in the box Dependent and the independent variables (predictors) in the box Covariates. To enter independent variables in groups (blocks), select the covariates for a block and click Next to specify a new block. Repeat until all blocks have been specified.
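
The same model can be fitted outside SPSS; here is a sketch using Python's statsmodels formula API. The file name data.csv is hypothetical, and the column names anticipate the example analyzed later (outcome, category, condit_1, condit_2, interval).

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("data.csv")          # hypothetical file with the example data

# C(...) declares categorical covariates; the continuous covariate
# `interval` enters as-is. This mirrors the Dependent/Covariates boxes.
result = smf.logit(
    "outcome ~ C(category) + C(condit_1) + C(condit_2) + interval",
    data=df,
).fit()
print(result.summary())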

8 How to perform a Binomial Logistic Regression in SPSS
Fig. 1. Dialog box Logistic Regression

9 How to perform a Binomial Logistic Regression in SPSS
Method selection allows you to specify how independent variables are entered into the analysis. Using different methods, you can construct a variety of regression models from the same set of variables. Enter is a procedure for variable selection in which all variables in a block are entered in a single step. Alternatives to this method are the forward stepwise selection methods (Forward Conditional, Forward LR, and Forward Wald), with entry testing based on the significance of the score statistic, and the backward stepwise elimination methods (Backward Conditional, Backward LR, and Backward Wald), with removal testing based on the probability of the likelihood-ratio statistic. The Enter method is preferable when there is a single covariate.
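
As a rough illustration of forward stepwise selection (reusing the data frame df from the previous sketch), the loop below enters, one term at a time, the candidate with the smallest likelihood-ratio p-value. Note the simplification: SPSS's forward methods test entry with the score statistic, so this is a sketch of the idea, not a reimplementation.

from scipy import stats
import statsmodels.formula.api as smf

def fit(terms):
    """Fit a logit model with the given right-hand-side terms."""
    rhs = " + ".join(terms) if terms else "1"
    return smf.logit(f"outcome ~ {rhs}", data=df).fit(disp=0)

candidates = ["C(category)", "C(condit_1)", "C(condit_2)", "interval"]
selected = []
while candidates:
    base = fit(selected)
    best_term, best_p = None, 1.0
    for term in candidates:
        trial = fit(selected + [term])
        lr = 2 * (trial.llf - base.llf)                   # LR statistic
        p = stats.chi2.sf(lr, trial.df_model - base.df_model)
        if p < best_p:
            best_term, best_p = term, p
    if best_p >= 0.05:            # no remaining term meets the entry criterion
        break
    selected.append(best_term)
    candidates.remove(best_term)
print("Entered terms:", selected)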

10 How to perform a Binomial Logistic Regression in SPSS
Click Categorical... if some of the independent variables are categorical (ordinal or nominal). Select the categorical covariates from the Covariates list in the Logistic Regression: Define Categorical Variables dialog box (Fig. 2) and move them into the Categorical Covariates list. Each variable includes a notation in parentheses indicating the contrast coding to be used. Usually you can leave the default Contrast option, Indicator. A categorical variable with k levels will be transformed into k-1 variables (dummy variables), each with two levels. For the Reference Category you can choose either the default Last or First. The reference category is represented in the contrast matrix as a row of zeros.
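
A small pandas sketch of the same k-1 indicator coding; the values mirror the later example, and dropping the last dummy makes 'third' the reference category (a row of zeros).

import pandas as pd

category = pd.Categorical(["first", "second", "third", "first", "third"],
                          categories=["first", "second", "third"])
dummies = pd.get_dummies(category, prefix="category")
dummies = dummies.drop(columns="category_third")   # 'Last' as reference category
print(dummies.astype(int))
# A 'third' case has zeros in both indicator columns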

11 How to perform a Binomial Logistic Regression in SPSS
Fig. 2. Dialog box Logistic Regression: Define Categorical Variables

12 How to perform a Binomial Logistic Regression in SPSS
You can save the results of the logistic regression as new variables in the active dataset. Click Save... and, in the dialog box Logistic Regression: Save (Fig. 3), select Probabilities in the Predicted Values area to save the predicted probability of the occurrence of the event (the predicted probability of category 1 is saved), and select Group membership to save the group with the largest posterior probability. By clicking Options..., you can specify options for your logistic regression analysis in the dialog box Logistic Regression: Options (Fig. 3), e.g., Classification plots, Hosmer-Lemeshow goodness-of-fit, etc. This goodness-of-fit statistic is preferable for models with continuous covariates and for studies with small sample sizes.
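
Outside SPSS, the same two saved variables can be sketched with the formula-based result and data frame df from the earlier fitting sketch; the column names PRE_1 and PGR_1 merely imitate SPSS's default naming.

# Append predicted probability and predicted group to the data frame
df["PRE_1"] = result.predict(df)                 # probability of category 1
df["PGR_1"] = (df["PRE_1"] >= 0.5).astype(int)   # predicted group membership
print(df[["outcome", "PRE_1", "PGR_1"]].head())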

13 How to perform a Binomial Logistic Regression in SPSS
Fig. 3. Dialog boxes Logistic Regression: Save and Logistic Regression: Options

14 Example
The dichotomous dependent variable outcome, with values 1 = 'yes' and 0 = 'no', depends on the categorical independent variables category (with values 1 = 'first', 2 = 'second', 3 = 'third'), condit_1 (with values 1 = 'is satisfied', 2 = 'isn't satisfied'), and condit_2 (with values 1 = 'is satisfied', 2 = 'isn't satisfied'), and on the interval independent variable interval. The binary logistic regression output table Categorical Variables Codings (Fig. 4) shows the coding of these variables. The first column identifies each categorical variable. For a variable with, say, k levels, the table has k rows, one for each level, as indicated in the second column. The third column shows how many cases had each level of the variable. The critical information is in the final k-1 columns, which explain the coding for each of the k-1 indicator variables created by SPSS for the variable. In our example, we made the coding match the coding we want by using the Categorical button and leaving the default Last as the Reference Category.

15 Example
If the Forward Conditional method is chosen (as in the presented example), the relevant tables can be found in the section Block 1: Method = Forward Stepwise (Conditional) of the SPSS output of our logistic regression analysis. The first table, Omnibus Tests of Model Coefficients, includes the chi-square goodness-of-fit test. It tests the null hypothesis that all coefficients except the constant are zero (b1 = b2 = ... = bk = 0). Because Sig. < .05 in the final step (Step 2 in the present example), we reject the null hypothesis that knowing the independents makes no difference in predicting the dependent, and conclude that at least one coefficient differs from zero. Briefly: if the chi-square goodness-of-fit test is significant, then the model has adequate fit.

16 Example
Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step        15.420    1   .000
         Block       15.420    1   .000
         Model       15.420    1   .000
Step 2   Step         4.987    1   .026
         Block       20.406    2   .000
         Model       20.406    2   .000

Categorical Variables Codings

                              Frequency   Parameter coding
                                               (1)      (2)
category   first                     16      1.000     .000
           second                    16       .000    1.000
           third                      8       .000     .000
condit_2   is satisfied              24      1.000
           isn't satisfied           16       .000
condit_1   is satisfied              20      1.000
           isn't satisfied           20       .000

Fig. 4. Categorical Variables Codings and Omnibus Tests of Model Coefficients output tables

17 Example
The table Model Summary (Fig. 5) includes the Cox & Snell R Square and Nagelkerke R Square measures, both of which should be reported as approximations to the R-square of ordinary least-squares (OLS) regression; in essence they measure the strength of association between the predictors and the prediction. The Nagelkerke R Square, which does range from 0 to 1, is the more reliable measure of the relationship. If the significance of the Hosmer-Lemeshow goodness-of-fit test is greater than .05, the model has adequate fit; by the same token, if the test is significant, the model does not adequately fit the data. The table Hosmer and Lemeshow Test (Fig. 5) suggests the model is a good fit to the data, as p = 0.761 (> 0.05).
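
A sketch of these statistics computed from the fitted result, X, and y of the model chi-square sketch above, using the standard textbook formulas rather than SPSS internals.

import numpy as np
import pandas as pd
from scipy import stats

n = result.nobs
cox_snell = 1 - np.exp(2 * (result.llnull - result.llf) / n)
nagelkerke = cox_snell / (1 - np.exp(2 * result.llnull / n))
print(f"Cox & Snell R^2 = {cox_snell:.3f}, Nagelkerke R^2 = {nagelkerke:.3f}")

# Hosmer-Lemeshow: compare observed and expected event counts in (up to)
# 10 groups of cases binned by predicted probability; df = groups - 2.
prob = pd.Series(result.predict(X))
bins = pd.qcut(prob, 10, duplicates="drop")
obs = pd.Series(y).groupby(bins).sum()    # observed events per group
exp = prob.groupby(bins).sum()            # expected events per group
cnt = prob.groupby(bins).size()           # cases per group
hl = (((obs - exp) ** 2) / (exp * (1 - exp / cnt))).sum()
print(f"Hosmer-Lemeshow chi-square = {hl:.3f}, "
      f"p = {stats.chi2.sf(hl, len(cnt) - 2):.3f}")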

18 Example
Fig. 5. Model Summary and Hosmer and Lemeshow Test output tables

19 Example
The Classification Table (Fig. 6) provides a quick and easy way of seeing how well the model performs on the observed data. As we can see from the Classification Table, the model correctly predicts 16/20 = 80% of the occurrences of outcome=1. This is known as the sensitivity of prediction, P(correct | event occurred), that is, the percentage of occurrences correctly predicted. We also see that this rule allows us to correctly classify 16/20 = 80% of the subjects for whom the event was not observed. This is known as the specificity of prediction, P(correct | event did not occur), that is, the percentage of nonoccurrences correctly predicted. Overall, our predictions were correct 32 times out of 40, for an overall success rate of 80%. In a perfect model, all cases would lie on the diagonal and the overall percent correct would be 100%.
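
The three percentages can be checked directly from the counts quoted above.

import numpy as np

#                     predicted 0  predicted 1
table = np.array([[16, 4],     # observed outcome = 0
                  [4, 16]])    # observed outcome = 1
specificity = table[0, 0] / table[0].sum()   # P(correct | event did not occur)
sensitivity = table[1, 1] / table[1].sum()   # P(correct | event did occur)
overall = np.trace(table) / table.sum()      # diagonal over all 40 cases
print(sensitivity, specificity, overall)     # 0.8 0.8 0.8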

20 Example
Fig. 6. Classification Table

21 Example
The most important output of all is the Variables in the Equation table (Fig. 7). This table provides the regression coefficient (B), the Wald statistic (to test statistical significance), and the all-important odds ratio (Exp(B)) for each variable category. The Wald statistic is commonly used to test the significance of individual logistic regression coefficients for each independent variable (that is, to test the null hypothesis that a particular logit (effect) coefficient is zero). The null hypothesis is rejected, i.e., the coefficient is considered significant, if p < α, where α is the significance level. In the present example (see the final Step 2) the B coefficients are significant and positive for the predictors condit_1(1) and condit_2(1), indicating that outcome=1 is associated with the value 1 = 'is satisfied' of those predictors. The Exp(B) column (the odds ratio) tells us by what factor cases from condit_1(1) are more likely than those from condit_1(0) (our reference category) to achieve outcome=1, and likewise for condit_2(1) compared with condit_2(0).
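
From a fitted statsmodels result (either of the earlier sketches), the Exp(B) column and its confidence interval are recovered by exponentiating the coefficients.

import numpy as np

odds_ratios = np.exp(result.params)   # the Exp(B) column
conf_int = np.exp(result.conf_int())  # 95% confidence interval for Exp(B)
print(odds_ratios)
print(conf_int)
# Exp(B) > 1 means the odds of outcome=1 are multiplied by Exp(B) for a
# one-unit increase in the covariate (or for the indicator level versus
# its reference category).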

22 Example
Fig. 7. Variables in the Equation and Variables not in the Equation output tables

23 Example
The Variables not in the Equation table (Fig. 7) reports the independent variables which have no statistically significant impact on the dependent variable. In this example, such predictors are category and interval. The B values are the logistic coefficients that can be used to create a predictive equation; in this example, z = B0 + B1*condit_1(1) + B2*condit_2(1), with the B values taken from the Variables in the Equation table. It is not necessary to calculate the probability of the prediction by hand, because it appears in the Data Editor window if you define the values of the predictors and select Probabilities and Group membership in the dialog box Logistic Regression: Save (see above).
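
A final sketch of using this equation for a new case; the B values below are placeholders, not the coefficients from Fig. 7.

import numpy as np

B0, B1, B2 = -2.1, 1.7, 1.4    # hypothetical constant, condit_1(1), condit_2(1)
condit_1, condit_2 = 1, 1      # new case: both conditions satisfied

z = B0 + B1 * condit_1 + B2 * condit_2
probability = 1 / (1 + np.exp(-z))   # what SPSS saves via Probabilities
group = int(probability >= 0.5)      # what SPSS saves via Group membership
print(f"P(outcome=1) = {probability:.3f}, predicted group = {group}")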

