
Regression with a Binary Dependent Variable. Introduction What determines whether a teenager takes up smoking? What determines if a job applicant is successful.


1 Regression with a Binary Dependent Variable

2 Introduction What determines whether a teenager takes up smoking? What determines if a job applicant is successful? What determines if a loan application is denied? Why do some people go to college and others don’t? Why do some people buy houses while others rent?

3 In all these examples, the outcome is a qualitative response or a binary choice, e.g. the teenager does or does not take up smoking. Y, the dependent variable, is assigned a value of 1 if the event occurs and 0 otherwise. Interpret the regression as modeling the probability that the dependent variable is equal to 1.

4 The Linear Probability Model Simplest binary choice model The probability of the event occurring, p, is assumed to be a linear function of the explanatory variable(s)
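Written out for a single regressor, the assumption on this slide can be stated as follows. This is a sketch in standard textbook notation; the symbols β0, β1 and u are assumed here and do not appear on the slide itself.

```latex
\[
Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad
p_i = \Pr(Y_i = 1 \mid X_i) = E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i
\]
```

Under this reading, β1 is the change in the probability that Y = 1 associated with a one-unit change in X.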

5 Factors influencing graduation from high school. Y = GRAD = 1 for those who graduate, 0 otherwise. X = AVSCR, the composite test score on the provincial exam. Minimum = 20; scaled with a mean of 50 and a standard deviation of 10.

6 Linear Probability Model Using OLS:

7 Interpretation The probability of graduating from high school increases by 0.007, i.e. 0.7 percentage points, for every 1-point increase in score. A one-standard-deviation increase in score will increase the probability of graduating by ????%. If the score were zero, the probability of graduating would be ?????. Is the intercept meaningful?

8 Discrimination? Y = deny (binary variable equal to 1 if a loan application is denied and 0 otherwise). PI = ratio of applicant’s anticipated total monthly loan payments to his or her monthly income. black = binary regressor equal to 1 if applicant is black and 0 otherwise.

9 Interpretation A white applicant with a 30% PI If Does the estimate suggest that there might be racial bias in mortgage decisions? Explain fully.

10 Problems with the model. The error term does not have a continuous distribution, nor is it normal: for a given X it can take only two values. If OLS is used, the usual standard errors and test statistics are invalid. The errors are heteroscedastic because the variance of the error term changes with the independent variable. The predicted value of Y may be greater than 1 or less than 0 for extreme values of X.
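Two of these problems can be seen with a few lines of arithmetic. The sketch below uses hypothetical coefficients (b0 = -0.2, b1 = 0.015) chosen only for illustration; they are not estimates from the slides. Because Y is 0 or 1, the error given X equals 1 - p with probability p and -p with probability 1 - p, so its variance is p(1 - p).

```python
import numpy as np

# Hypothetical linear probability model coefficients (illustration only)
b0, b1 = -0.2, 0.015
x = np.array([0.0, 20.0, 50.0, 70.0, 100.0])

def lpm_fitted(x_values):
    """Fitted 'probability' from the linear model b0 + b1 * x."""
    return b0 + b1 * x_values

def error_variance(p):
    """Conditional variance of the error for a binary Y: p * (1 - p)."""
    return p * (1.0 - p)

p = lpm_fitted(x)
for xi, pi in zip(x, p):
    if 0.0 <= pi <= 1.0:
        note = f"Var(u | X) = {error_variance(pi):.4f}"
    else:
        note = "outside [0, 1], not a valid probability"
    print(f"X = {xi:5.1f}  fitted p = {pi:6.3f}  {note}")
```

The variance differs across the valid fitted values, which is the heteroscedasticity the slide refers to, and the smallest and largest X give fitted values below 0 and above 1.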

11 Solutions Use MLE techniques to estimate coefficients: a. Probit estimation, which uses the cumulative standard normal distribution function; b. Logit estimation, which uses the logistic function. These use non-linear techniques to estimate the regression function. Estimates are consistent, efficient and normally distributed in large samples, so tests can be constructed in the usual way.

12 Probit Regression With a single regressor X, the probit model is Pr(Y = 1 | X) = Ф(β0 + β1X), where Ф is the cumulative standard normal (z) distribution function.

13 An Example The coefficients are difficult to interpret (but their signs and statistical significance are not) because they affect the probability of denial through the z-value. If β1 is positive, an increase in X increases the probability that Y = 1; if it is negative, an increase in X decreases the probability that Y = 1. Interpret tests of significance as usual.
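The two functions named on this slide can be compared directly. A minimal sketch using only the Python standard library; the function names probit_prob and logit_prob are illustrative and not taken from the slides.

```python
import math

def probit_prob(z):
    """Cumulative standard normal distribution function evaluated at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_prob(z):
    """Logistic function evaluated at z."""
    return 1.0 / (1.0 + math.exp(-z))

# Both map any index value z into the interval (0, 1)
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"z = {z:4.1f}  probit = {probit_prob(z):.4f}  logit = {logit_prob(z):.4f}")
```

Unlike the linear probability model, both functions keep predicted probabilities strictly between 0 and 1 for any value of the index.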

14 Calculating probabilities What is the probability of denial for an applicant with PI = 30%? Pr(deny = 1 | PI = 0.3) = Ф(-2.19 + 2.97*0.3) = Ф(-1.30) = P(z < -1.30) = 0.097. The probability that the application will be denied if the applicant’s PI is 30% equals 9.7%. What is the change in predicted probability when the PI ratio increases from 0.4 to 0.5?
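The calculation on this slide can be reproduced numerically. This is a minimal sketch, assuming the estimated probit index is -2.19 + 2.97*PI (the intercept that is consistent with the Ф(-1.30) shown above) and computing Ф from math.erf; the names std_normal_cdf and denial_prob are illustrative.

```python
import math

# Assumed probit coefficients: intercept and coefficient on the PI ratio
B0, B1 = -2.19, 2.97

def std_normal_cdf(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def denial_prob(pi_ratio):
    """Predicted probability of denial for a given PI ratio."""
    return std_normal_cdf(B0 + B1 * pi_ratio)

p30 = denial_prob(0.3)
change = denial_prob(0.5) - denial_prob(0.4)

print(f"P(deny | PI = 0.3) = {p30:.3f}")
print(f"Change in P(deny) from PI = 0.4 to 0.5 = {change:.3f}")
```

Because the model is non-linear, the same 0.1 increase in PI changes the predicted probability by a different amount depending on the starting value of PI.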

15 Check 1. Comment on the coefficient of black. 2. What is the difference in predicted denial probability between a white applicant with a 30% PI ratio and a black applicant with the same ratio?

16 Missing Observations Collect data on Sleep and Age. All data on Sleep are available, but 20% of the Age data is missing. How do you use all the data to show the effect of Age on Sleep?

17 Create a new variable, say HAVEAGE = 1 if age data is available, 0 otherwise. Interact this variable with the AGE variable: HAVEAGE*AGE = 0 if there is no data on age and = AGE if data on age is available. Run the regression Sleep = B0 + B1*HAVEAGE + B2*HAVEAGE*AGE + e.
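The procedure on this slide can be sketched on simulated data. Everything about the data below is an assumption made for illustration (the sample size, the age range, the true slope of -0.03 and the noise level); only the HAVEAGE construction and the regression specification come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated data (hypothetical): sleep falls slightly with age
age = rng.uniform(20, 70, size=n)
sleep = 9.0 - 0.03 * age + rng.normal(0, 0.5, size=n)

# Make roughly 20% of the age values missing
missing = rng.random(n) < 0.2
age_obs = np.where(missing, np.nan, age)

# HAVEAGE = 1 if age is observed, 0 otherwise
haveage = (~np.isnan(age_obs)).astype(float)
# Interaction: AGE where observed, 0 where missing
haveage_x_age = np.where(haveage == 1, age_obs, 0.0)

# OLS of Sleep on a constant, HAVEAGE and HAVEAGE*AGE, using all n rows
X = np.column_stack([np.ones(n), haveage, haveage_x_age])
coef, *_ = np.linalg.lstsq(X, sleep, rcond=None)
b0, b1, b2 = coef

print(f"B0 (mean sleep when age is missing) = {b0:.3f}")
print(f"B1 (intercept shift when age is observed) = {b1:.3f}")
print(f"B2 (effect of age on sleep) = {b2:.3f}")
```

In this specification B2 is the age slope estimated from the rows where age is observed, while B0 equals the average sleep of the rows with missing age, so every observation on Sleep enters the regression.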

18 Non-Experimental Data Non-experimental data can sometimes make it very difficult to draw policy implications from regression analysis

19 GUN CONTROL Suppose your sample consists of households that have been victimized by robbery. The dependent variable takes a value of 1 if a household member is shot during the robbery and 0 otherwise. One of your explanatory variables is a dummy variable equal to 1 if there is a handgun present in the house, 0 otherwise. When a handgun is present in a household, an occupant of that house is much more likely to be shot in the process of a robbery than when no handgun is present. Therefore, to minimize injury and loss of life from robbery incidents, private ownership of handguns should be banned. Evaluate this policy proposal and the "evidence" upon which it is premised

20 Briefly describe the nature of the true "experiment" that would allow an unambiguous determination of the effect of handgun presence on robbery shootings via a regression like this.

21 LEGALIZATION OF MARIJUANA: Suppose you have a random sample of at-risk 18-year-olds. The dependent variable is the number of times each teenager has used heroin. Among the explanatory variables is a dummy variable that takes a value of 1 if the subject experimented with marijuana prior to age 13, and 0 otherwise. You find that the coefficient on this dummy variable is positive and strongly statistically significant. Therefore, we should not legalize marijuana use (which would make it much more accessible to pre-teens) since this will lead to widespread use of heroin. Evaluate this policy proposal and the "evidence" upon which it is premised

22 Briefly describe the nature of the true "experiment" that would allow an unambiguous determination of the effect of pre-teen marijuana use on subsequent heroin use via a regression like this

