Advanced Quantitative Techniques


1 Advanced Quantitative Techniques
Logistic regressions

2 Difference between linear and logistic regression
Linear (OLS) regression: for an interval-ratio dependent variable; predicts the value of the dependent variable given values of the independent variables.
Logistic regression: for a categorical (usually binary)* dependent variable; predicts the probability that the dependent variable belongs to a given category, given values of the independent variables.
*For this class, we use only interval-ratio or binary variables. Count variables (variables counting the number of events) require a more advanced regression (Poisson regression).

3 Logistic / logit
Logit = ln(odds) = ln(p / (1 - p))
In Stata, there are two commands for logistic regression: logit and logistic. The logit command reports the regression coefficients used to estimate the logit score. The logistic command reports the odds ratios we need to interpret the effect sizes of the predictors. The two commands fit the same model: the logit is just a different way of presenting the same relationship between the independent and dependent variables (see Acock, section 11.2).
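As a language-agnostic sketch of that relationship (the function names here are ours, not Stata's): the logit maps a probability to a log odds, and the inverse logit maps a logit score back to a probability.

```python
import math

def logit(p):
    """Log odds: ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inv_logit(score):
    """Convert a logit score back to a probability."""
    return 1 / (1 + math.exp(-score))

p = 0.75
score = logit(p)                    # ln(0.75 / 0.25) = ln(3)
print(round(score, 4))              # 1.0986
print(round(inv_logit(score), 4))   # 0.75, recovering the original probability
```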

4 Logistic / logit Open nlsy97_chapter11.dta
We want to test the impact of some variables on the likelihood that a young person will drink alcohol:
summarize drank30 age97 pdrink97 dinner97 male if !missing(drank30, age97, pdrink97, dinner97, male)

5 Logistic Interpretation:
The odds of drinking are multiplied by the reported odds ratio for each additional year of age.
The odds of drinking are multiplied by the reported odds ratio for each additional peer who drinks.
The odds of drinking are multiplied by the reported odds ratio for every day the person has dinner with their family.
LR chi2(4) = 78.01, p < 0.0001: the model as a whole is statistically significant.

6 Logit
Coefficients give the increase in the predicted log odds of drank30 = 1 for a 1-unit increase in the predictor, holding all other predictors constant.
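The link between logit coefficients and odds ratios can be sketched in a few lines (the coefficient and baseline odds below are made up for illustration): adding a coefficient b on the log-odds scale is the same as multiplying the odds by exp(b), which is the odds ratio that the logistic command reports.

```python
import math

b = 0.40                  # hypothetical logit coefficient for some predictor
odds_ratio = math.exp(b)  # factor by which the odds multiply per 1-unit increase

odds_before = 0.5         # hypothetical baseline odds
odds_after = math.exp(math.log(odds_before) + b)  # add b on the log-odds scale

# multiplying the odds by exp(b) is identical to adding b to the log odds
print(round(odds_after / odds_before, 6) == round(odds_ratio, 6))  # True
```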

7 Comparing effects of variables
It is hard to compare the effects of two independent variables using odds ratios when they are measured on different scales. For example, the variable male is binary (0 or 1), so it is simple to read its effect as an odds ratio. But it is hard to compare the effect of male with the effect of the variable dinner97 (number of days the person has dinner with his or her family), which runs from 0 to 7. The odds ratio for male tells us how much more likely a male is to drink compared to a female; the odds ratio for dinner97 tells us the change in the odds for each additional day. Beta coefficients standardize the effects, allowing a comparison based on standard deviations.
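A minimal sketch of one such standardization, using hypothetical coefficients and data: the x-standardized coefficient that listcoef reports multiplies the logit coefficient by the predictor's standard deviation, so both effects are expressed per one-standard-deviation change.

```python
import statistics

# hypothetical logit coefficients for two predictors on different scales
b_male, b_dinner = 0.55, -0.10
male_vals   = [0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical sample values
dinner_vals = [7, 3, 0, 5, 2, 7, 4, 1]

# x-standardized coefficient: effect of a one-SD change in the predictor
bstd_male   = b_male   * statistics.pstdev(male_vals)
bstd_dinner = b_dinner * statistics.pstdev(dinner_vals)

# both effects are now on a common "per standard deviation" scale
print(round(bstd_male, 3), round(bstd_dinner, 3))
```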

8 Comparing effect of variables
listcoef, help
If listcoef does not work, use findit listcoef to install the command.

9 Comparing effect of variables
listcoef, help percent

10 Hypothesis testing
1. Wald chi-squared test: the z statistic reported by Stata in the logistic regression output.
2. Likelihood-ratio chi-squared test: compare the LR chi2 of the model with and without the variable you want to test. To test the variable age97:
logistic drank30 male dinner97 pdrink97
estimates store a
logistic drank30 age97 male dinner97 pdrink97
lrtest a
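The arithmetic behind lrtest can be sketched as follows, with made-up log likelihoods for the two models; for one degree of freedom, the chi-squared tail probability can be written with the complementary error function.

```python
import math

# hypothetical log likelihoods of the reduced and full models
ll_reduced = -640.25   # model without age97
ll_full    = -636.10   # model with age97

lr = 2 * (ll_full - ll_reduced)   # LR chi2 statistic; models differ by 1 df

# p-value for a chi-squared statistic with 1 degree of freedom:
# P(chi2_1 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr / 2))

print(round(lr, 2), round(p_value, 4))
```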

11 Hypothesis testing
The two models are statistically significantly different.

12 Hypothesis testing
Same process, but for each of the variables:
lrdrop1
(install the command using ssc install lrdrop1)

13 Marginal effects
We will use the variable race97 and drop the variable male. We want to test the effect of a person being Black compared to being White, so we drop observations where the person has another racial background:
generate black = race97 - 1
replace black = . if race97 > 2

14 Marginal effects
label define black 0 "White" 1 "Black"
label define drank30 0 "No" 1 "Yes"
label values drank30 drank30
label values black black
logit drank30 age97 i.black pdrink97 dinner97

15 Marginal effects

16 Marginal effects
The margins command tells us the difference in the probability of having drunk in the last 30 days if an individual is Black compared with White. Initially, we set the covariates at their means, so the command tells us the difference between Black and White respondents who are average on the other covariates.

17 Marginal effects
margins, dydx(black) atmeans
dy/dx: the derivative (for a factor variable, the discrete change) at the selected point, with all other variables at their means. Interpretation: a Black individual of average age, etc., is estimated to be 8.6% less likely to drink than a White individual of average age, etc.
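A sketch of the computation behind a discrete-change marginal effect at the means, using hypothetical coefficients and covariate means (not the actual estimates from the slide):

```python
import math

def inv_logit(score):
    return 1 / (1 + math.exp(-score))

# hypothetical logit coefficients: constant, age97, black, pdrink97, dinner97
b0, b_age, b_black, b_pdrink, b_dinner = -4.0, 0.15, -0.45, 0.55, -0.10
age_mean, pdrink_mean, dinner_mean = 14.3, 2.1, 4.9   # hypothetical means

base = b0 + b_age * age_mean + b_pdrink * pdrink_mean + b_dinner * dinner_mean
p_white = inv_logit(base)             # black = 0, all else at the means
p_black = inv_logit(base + b_black)   # black = 1, all else at the means

# discrete change: what margins, dydx(black) atmeans reports for a factor variable
dydx = p_black - p_white
print(round(dydx, 3))
```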

18 Marginal effects We can also test marginal effects at points other than the mean using the at( ) option. margins, at(pdrink97=( )) atmeans

19 Marginal effects
For an individual with pdrink97 coded 2, we estimate a 36% probability that he or she drank in the last 30 days.

20 Marginal effects
Estimated probability that an adolescent drank in the last month, adjusted for age, race, and frequency of family meals (holding all of those at their means).

21 Marginal effects
For an individual who has dinner with his or her family 3 times a week, we estimate a 39% probability that he or she drank in the last 30 days.

22 Example 1 Use severity.dta

23 Example 1 Use severity.dta
We are trying to see what predicts whether an individual thinks that prison sentences are too severe

24 Example 1

25 Example 1

26 Diagnostics For diagnostics we will use the drink example. Use nlsy97_chapter11.dta

27 Diagnostics-Multicollinearity
Back to the drink example. Run an OLS regression, since the estat vif command is not available after logistic:
regress drank30 age97 pdrink97 dinner97 male
estat vif
Very low multicollinearity; no problem detected.

28 Diagnostics-Outliers
Run the logistic regression, then predict probabilities and standardized residuals:
predict prob
predict residual, res
predict rstandard, rstandard
Now identify outliers using the leastlikely command.

29 Diagnostics-Outliers
Examining outliers Remember: in a logistic regression, the [Pearson] residual is not the same as the residual in an OLS regression. The Pearson residual is “the difference between the observed and estimated probabilities divided by the binomial standard deviation of the estimated probability” (Menard, chapter 4.4, p.82)
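Menard's definition can be sketched directly (the observed outcomes and estimated probabilities below are hypothetical):

```python
import math

def pearson_residual(y, p):
    """(observed - estimated probability) / binomial SD of the estimate."""
    return (y - p) / math.sqrt(p * (1 - p))

# hypothetical cases: observed binary outcome, model-estimated probability
print(round(pearson_residual(1, 0.2), 3))   # drank, but the model said unlikely: large positive residual
print(round(pearson_residual(0, 0.2), 3))   # did not drink, model agreed: small residual
```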

30 Diagnostics-Outliers
scatter pubid rstandard, msymbol(none) mlabel(pubid) mlabposition(0)

31 Diagnostics-Influential cases
list pubid if prob_dfbeta>1 & prob_dfbeta!=. No influential cases found.

32 Interactions
Question: do friends who drink have more impact on kids who do not have dinner at home? We are going to use two dummy variables, dinner_away and peersdrink:
generate dinner_away = .
replace dinner_away = 0 if dinner97 == 7
replace dinner_away = 1 if dinner97 < 7
generate peersdrink = .
replace peersdrink = 0 if pdrink97 == 1
replace peersdrink = 1 if pdrink97 > 1 & !missing(pdrink97)
(The !missing() guard keeps missing values from being recoded to 1, since Stata treats missing as greater than any number.)
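The recode logic above, mirrored in Python for illustration (None stands in for Stata's missing value, and away_peers is assumed to be the product of the two dummies):

```python
def make_dummies(dinner97, pdrink97):
    """Mirror the Stata recodes: dinner_away, peersdrink, away_peers (None = missing)."""
    dinner_away = None if dinner97 is None else (0 if dinner97 == 7 else 1)
    peersdrink = None if pdrink97 is None else (0 if pdrink97 == 1 else 1)
    away_peers = (None if None in (dinner_away, peersdrink)
                  else dinner_away * peersdrink)   # interaction term
    return dinner_away, peersdrink, away_peers

print(make_dummies(7, 1))   # dinner at home daily, no drinking peers -> (0, 0, 0)
print(make_dummies(3, 4))   # (1, 1, 1): both dummies on, interaction on
```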

33 Interactions
Test the interaction between having dinner away from home at least once a week and having friends who drink. Generate a new variable, away_peers (the product of dinner_away and peersdrink), and include it in the logistic regression.

34 Interactions

35 Interactions marginsplot

