Advanced Quantitative Techniques: Logistic Regression

Difference between linear and logistic regression
Linear (OLS) regression: for an interval-ratio dependent variable; predicts the value of the dependent variable given the values of the independent variables.
Logistic regression: for a categorical (usually binary)* dependent variable; predicts the probability that the dependent variable falls in a given category, given the values of the independent variables.
*For this class, we are only using interval-ratio or binary variables. Count outcomes, and categorical outcomes with more than two categories, require more advanced regressions (for example, Poisson regression for counts).
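A minimal sketch of how the two approaches look in Stata; the variable names y, x1, and x2 here are placeholders, not variables from the dataset used below:

* OLS regression: interval-ratio outcome, predicts the value of y
regress y x1 x2
* Logistic regression: binary outcome, predicts Pr(y = 1)
logit y x1 x2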

Logistic / logit
Logit = ln(odds) = ln(p / (1 - p)), where p is the probability that the outcome equals 1.
In Stata, there are two commands for logistic regression: logit and logistic. The logit command reports the regression coefficients used to compute the logit score. The logistic command reports the odds ratios we need to interpret the effect size of the predictors. The two commands fit the same model; the logit is just a different way of presenting the same relationship between the independent and dependent variables (see Acock, section 11.2).

Logistic / logit
Open nlsy97_chapter11.dta. We want to test the impact of several variables on the likelihood that a young person drank alcohol in the last 30 days. First, summarize the variables on the estimation sample:
summarize drank30 age97 pdrink97 dinner97 male if !missing(drank30, age97, pdrink97, dinner97, male)
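A sketch of the estimation commands behind the next slide (the output itself appears in the original slides and is omitted here):

use nlsy97_chapter11.dta, clear
* coefficients on the log-odds (logit) scale
logit drank30 age97 pdrink97 dinner97 male
* the same model, reported as odds ratios
logistic drank30 age97 pdrink97 dinner97 male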

Logistic
Interpretation:
The odds of drinking are multiplied by 1.169 for each additional year of age.
The odds of drinking are multiplied by 1.329 for each one-unit increase in the peer-drinking measure (pdrink97).
The odds of drinking are multiplied by 0.942 for each additional day per week the person has dinner with their family.
LR chi2(4) = 78.01, p < 0.0001: the model as a whole is statistically significant.
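As a quick check of what these odds ratios mean in percentage terms (the same conversion that the percent option of listcoef, used below, automates):

display (1.169 - 1)*100    // about a 16.9% increase in the odds per additional year of age
display (0.942 - 1)*100    // about a 5.8% decrease in the odds per additional family dinner per week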

Logit
Logit coefficients tell us the amount of increase in the predicted log odds of the outcome (here, drank30 = 1) associated with a one-unit increase in the predictor, holding all other predictors constant.
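A small sketch of the link between the two scales, assuming the model from the previous slides has just been fit with logit: exponentiating a log-odds coefficient gives the corresponding odds ratio.

logit drank30 age97 pdrink97 dinner97 male
* exp() of the log-odds coefficient for age97 reproduces the odds ratio reported by logistic
display exp(_b[age97])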

Comparing effects of variables
It is hard to compare the effects of two independent variables using odds ratios when the variables are measured on different scales. For example, the variable male is binary (0 or 1), so it is simple to read its effect in odds-ratio terms. But it is hard to compare the effect of male with the effect of dinner97 (the number of days the person has dinner with his or her family), which runs from 0 to 7. The odds ratio for male tells us how the odds of drinking differ for males compared with females, while the odds ratio for dinner97 tells us the change in the odds for each additional day. Beta coefficients standardize the effects, allowing a comparison based on standard deviations.

Comparing effects of variables
listcoef, help
If listcoef does not work, it is a user-written command: use findit listcoef to locate and install it.

Comparing effects of variables
listcoef, help percent
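A sketch of the full sequence, assuming the model from the earlier slides is the most recent estimation; the percent option asks listcoef to express each effect as a percent change in the odds rather than as an odds ratio:

logistic drank30 age97 pdrink97 dinner97 male
* standardized coefficients, with a legend explaining each column
listcoef, help
* the same effects expressed as percent changes in the odds
listcoef, help percent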

Hypothesis testing
1. Wald chi-squared test: the z statistic reported by Stata for each coefficient in the logistic regression output.
2. Likelihood-ratio chi-squared test: compare the LR chi2 of the model with and without the variable you want to test.
To test the variable age97:
logistic drank30 male dinner97 pdrink97
estimates store a
logistic drank30 age97 male dinner97 pdrink97
lrtest a
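One caveat worth adding as a sketch: a likelihood-ratio test is only valid if both models are fit on the same observations, so if age97 has missing values the restricted model should be limited to the same sample. The if condition below is an addition, not from the slides:

* restricted model, limited to cases where age97 is observed
logistic drank30 male dinner97 pdrink97 if !missing(age97)
estimates store a
* full model, including age97
logistic drank30 age97 male dinner97 pdrink97
* LR test comparing the stored restricted model with the current full model
lrtest a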

Hypothesis testing
The likelihood-ratio test is significant, so the two models are statistically different: age97 adds explanatory power to the model.

Hypothesis testing
The same process can be repeated for each of the variables with lrdrop1 (install the command using ssc install lrdrop1).
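A sketch of how lrdrop1 is typically used, assuming it is run immediately after fitting the full model; it repeats the likelihood-ratio test above, dropping one predictor at a time:

logistic drank30 age97 male dinner97 pdrink97
lrdrop1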

Marginal effects
We will use the variable race97, dropping the variable male. We want to test the effect of a person being black compared to being white, so we drop observations where the person has another racial background.
generate black = race97 - 1
replace black = . if race97 > 2

Marginal effects
label define black 0 "White" 1 "Black"
label define drank30 0 "No" 1 "Yes"
label values drank30 drank30
label values black black
logit drank30 age97 i.black pdrink97 dinner97

Marginal effects

Marginal effects
The margins command tells us the difference in the probability of having drunk in the last 30 days if an individual is black compared with an individual who is white. Initially, we set the covariates at their means, so the command tells us the difference between blacks and whites who are average on the other covariates.

Marginal effects
margins, dydx(black) atmeans
dy/dx: the derivative (for the factor variable black, the discrete change in probability) at the selected point, where all other variables are at their means.
Interpretation: a black individual who is 13.67 years old, etc., is predicted to be 8.6 percentage points less likely to drink than a white individual who is 13.67 years old, etc.

Marginal effects
We can also estimate predicted probabilities at values of a covariate other than the mean using the at() option (the other covariates are still held at their means):
margins, at(pdrink97=(1 2 3 4 5)) atmeans
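These adjusted predictions can also be graphed: marginsplot (used again for the interaction example at the end of the slides) plots whatever the most recent margins command computed.

margins, at(pdrink97=(1 2 3 4 5)) atmeans
marginsplot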

Marginal effects
For an individual with pdrink97 coded 2, we estimate a 36% probability that he or she drank in the last 30 days.

Marginal effects
Estimated probability that an adolescent drank in the last month, adjusted for age, race, and frequency of family meals (all of those held at their means).

Marginal effects
For an individual who has dinner with his or her family 3 times a week, we estimate a 39% probability that he or she drank in the last 30 days.

Example 1 Use severity.dta

Example 1 Use severity.dta We are trying to see what predicts whether an individual thinks that prison sentences are too severe

Example 1

Example 1

Diagnostics For diagnostics we will use the drink example. Use nlsy97_chapter11.dta

Diagnostics-Multicollinearity
Back to the drinking example. The estat vif command is not available after logistic, so run an OLS regression with the same variables and check the variance inflation factors there:
regress drank30 age97 pdrink97 dinner97 male
estat vif
Very low multicollinearity; no problem detected.

Diagnostics-Outliers
Run the logistic regression, then predict probabilities and standardized residuals:
predict prob
predict residual, res
predict rstandard, rstandard
Now, to identify outliers, use the leastlikely command.

Diagnostics-Outliers Examining outliers Remember: in a logistic regression, the [Pearson] residual is not the same as the residual in an OLS regression. The Pearson residual is “the difference between the observed and estimated probabilities divided by the binomial standard deviation of the estimated probability” (Menard, chapter 4.4, p.82)
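In symbols, for a binary outcome y_i with estimated probability p_i, the definition Menard quotes corresponds to:

r_i = (y_i - p_i) / sqrt(p_i * (1 - p_i))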

Diagnostics-Outliers scatter pubid rstandard, msymbol(none) mlabel(pubid) mlabposition(0)

Diagnostics-Influential cases list pubid if prob_dfbeta>1 & prob_dfbeta!=. No influential cases found.
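The slide does not show the step that creates prob_dfbeta; a sketch of the likely sequence, assuming it is Pregibon's delta-beta influence statistic obtained from predict after the logistic regression (the variable name matches the slide):

predict prob_dfbeta, dbeta
* flag observations with an unusually large influence on the estimated coefficients
list pubid if prob_dfbeta > 1 & prob_dfbeta != .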

Interactions
Question: does having friends that drink have more impact on kids who do not have dinner at home every day? We are going to use two dummy variables, dinner_away and peersdrink:
generate dinner_away = .
replace dinner_away = 0 if dinner97 == 7
replace dinner_away = 1 if dinner97 < 7
gen peersdrink = .
replace peersdrink = 0 if pdrink97 == 1
* the !missing() condition keeps missing pdrink97 values missing (in Stata, missing counts as larger than any number)
replace peersdrink = 1 if pdrink97 > 1 & !missing(pdrink97)

Interactions
Test the interaction between having dinner away from home at least once a week and having friends that drink: generate a new variable, away_peers, and include it in the logistic regression (see the sketch below).
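A sketch of the step described above; the variable name away_peers comes from the slide, while the exact model specification is an assumption consistent with the earlier examples:

* interaction term: 1 only for kids who both miss a family dinner and have friends who drink
generate away_peers = dinner_away * peersdrink
logistic drank30 age97 dinner_away peersdrink away_peers

An equivalent, more idiomatic specification uses factor-variable notation (i.dinner_away##i.peersdrink), which lets margins and marginsplot handle the interaction automatically.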

Interactions

Interactions marginsplot
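A sketch of the sequence that would produce this plot, assuming the factor-variable specification suggested above so that margins can compute a predicted probability for each combination of the two dummies:

logistic drank30 age97 i.dinner_away##i.peersdrink
* predicted probabilities for each combination, other covariates at their means
margins dinner_away#peersdrink, atmeans
marginsplot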