Warsaw Summer School 2015, OSU Study Abroad Program Advanced Topics: Interaction Logistic Regression.

Interaction term An interaction means that the effect of one variable differs across types of individuals, e.g. males and females. If we think that males react differently to pain, and that this reaction has a different effect on Y than it does for females, we would estimate the following model: Y = b0 + b1*pain + b2*male + b3*(male*pain), which can be rewritten as: Y = b0 + (b1 + b3*male)*pain + b2*male. Now the effect of pain for females (male = 0) is b1 + b3*0 = b1. The effect of pain for males (male = 1) is b1 + b3*1 = b1 + b3.
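
The algebra above can be checked numerically. Below is a minimal sketch in Python (assuming NumPy is available; the data and coefficient values are simulated purely for illustration, not taken from the lecture) that fits Y = b0 + b1*pain + b2*male + b3*(male*pain) by least squares and recovers the pain slope for females (b1) and for males (b1 + b3):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
pain = rng.uniform(0, 10, n)      # continuous predictor
male = rng.integers(0, 2, n)      # dummy: 1 = male, 0 = female

# Simulated outcome: the pain slope is 2.0 for females and 2.0 - 1.5 = 0.5 for males
y = 5 + 2.0 * pain + 0.5 * male - 1.5 * male * pain + rng.normal(0, 1, n)

# Design matrix: intercept, pain, male, male*pain
X = np.column_stack([np.ones(n), pain, male, male * pain])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print("effect of pain for females:", b1)        # close to 2.0
print("effect of pain for males:  ", b1 + b3)   # close to 0.5
```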

Interaction Note that male*pain is nonzero only when you actually have a male with pain. Thus, the coefficient associated with this term represents a unique effect of pain on males that is not present for females. This is an interaction.

Example with dummies Dependent variable Y – exam scores. C = Coffee (yes = 1, no = 0), R = Chocolate (yes = 1, no = 0). Some take C, some take R, and some take both. In a regression, the "both Coffee and Chocolate" variable, C*R, is referred to as "the interaction of Coffee and Chocolate". Regression: Y = 10C + 20R – 3C*R

Interpretation If either C or R is zero, C*R equals zero; if both Coffee and Chocolate are 1, then C*R equals one. That is exactly what we want for comparing the effect of the interaction. In a regression result, the simplest way to interpret the coefficient of a dummy variable is: "what happens when you change the value from 0 to 1 and leave all the other variables the same." Note, however, that C*R = 1 implies that C = 1 and R = 1.

Combination of dummies There are four possible combinations for C, R, and C*R:
1. C = 0, R = 0, C*R = 0
2. C = 1, R = 0, C*R = 0
3. C = 0, R = 1, C*R = 0
4. C = 1, R = 1, C*R = 1
Interpretation of these situations.

Diminishing returns Y = 10C + 20R – 3C*R. Even though the coefficient of the interaction is negative, Coffee and Chocolate together can still be a positive thing: taking both Coffee and Chocolate, you score 27 points higher! What the –3 tells you is that there are diminishing returns to taking both. You might think that, since Coffee improves your score by 10 and Chocolate improves it by 20, taking both would improve it by 30. That is not right.
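
The four combinations listed above can be worked through directly with these coefficients (a minimal sketch; the intercept is omitted purely for simplicity):

```python
# Predicted score gain for each Coffee/Chocolate combination,
# using Y = 10*C + 20*R - 3*C*R (intercept omitted).
for C in (0, 1):
    for R in (0, 1):
        gain = 10 * C + 20 * R - 3 * C * R
        print(f"C={C}, R={R}, C*R={C * R}: gain = {gain}")

# Gains: 0, 20, 10, 27 -- taking both adds 27 points, not 10 + 20 = 30.
```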

Interpreting Parameters with Interaction Terms An interaction term is a variable formed as the product of two other variables. For example: income explained by gender and education; interaction term: Female*Education. Why are interaction terms used? They allow different slopes for men and women!

E.g. Income = a + b1*F + b2*educ + b3*(F*educ). The parameter on the interaction term, b3, tells us the difference between the female slope and the male slope for income.

Parameters Suppose we estimate parameters using regression for the following two models: Income = a1 + g*educ for men, and Income = a2 + d*educ for women. Then we estimate the parameters of a third model on the pooled data: Income = a + b1*F + b2*educ + b3*(F*educ). It turns out that: a = a1, b1 = a2 – a1, b2 = g, b3 = d – g.
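
This equivalence can be verified numerically. The following sketch (simulated data; variable names are illustrative, not from the lecture) fits the two separate regressions and the pooled regression with a female dummy and an interaction, and checks that a = a1, b1 = a2 – a1, b2 = g, and b3 = d – g:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
educ = rng.uniform(8, 20, n)
female = rng.integers(0, 2, n)
income = 12 + 1.5 * educ + 4 * female + 0.8 * female * educ + rng.normal(0, 2, n)

def ols(X, y):
    """Least-squares coefficients for design matrix X and outcome y."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Separate regressions: men (female == 0) and women (female == 1)
men = female == 0
a1, g = ols(np.column_stack([np.ones(men.sum()), educ[men]]), income[men])
a2, d = ols(np.column_stack([np.ones((~men).sum()), educ[~men]]), income[~men])

# Pooled regression with the female dummy and the interaction term
X = np.column_stack([np.ones(n), female, educ, female * educ])
a, b1, b2, b3 = ols(X, income)

print(np.allclose([a, b1, b2, b3], [a1, a2 - a1, g, d - g]))  # True
```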

Logistic regression Regression and dummy DV: I What we want to predict from a knowledge of relevant independent variables is not a precise numerical value of the dependent variable, but rather the probability (p) that it is 1 (event occurring) rather than 0 (event not occurring). This means that, while in linear regression the relationship between the dependent and the independent variables is linear, this assumption is not made in logistic regression; instead, the logistic function is used. Why not use ordinary regression? The predicted values could become greater than one or less than zero, and such values are theoretically inadmissible.
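
A quick way to see this out-of-range problem is to fit an ordinary least-squares line to a 0/1 outcome and inspect the fitted values. The sketch below (simulated data, plain NumPy) shows the linear probability model producing "probabilities" below 0 and above 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(-3, 3, n)
p = 1 / (1 + np.exp(-2 * x))      # true probabilities follow a logistic curve
y = rng.binomial(1, p)            # observed binary outcome

# Linear probability model: regress the 0/1 outcome on x by OLS
X = np.column_stack([np.ones(n), x])
fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]

print("smallest fitted value:", fitted.min())   # typically below 0
print("largest fitted value: ", fitted.max())   # typically above 1
```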

Regression and dummy DV: II One of the assumptions of regression is that the variance of Y is constant across values of X. This cannot be the case with a binary variable, because the variance is pq. When 50 percent of the people are 1s, the variance is .25, its maximum value. As we move to more extreme values, the variance decreases: when p = .10, the variance is .1*.9 = .09, so as p approaches 1 or 0, the variance approaches zero.
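
The variance formula can be tabulated quickly (a minimal sketch in Python):

```python
import numpy as np

p = np.array([0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95])
variance = p * (1 - p)
for pi, vi in zip(p, variance):
    print(f"p = {pi:.2f}  variance = {vi:.4f}")
# The variance p*q peaks at 0.25 when p = 0.5 and shrinks toward 0 at the extremes.
```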

Regression and dummy DV: III The significance testing of the b weights rests on the assumption that the errors of prediction (Y – Y') are normally distributed. Because Y only takes the values 0 and 1, this assumption is hard to justify, even approximately. Therefore, the tests of the regression weights are suspect if you use linear regression with a binary DV.

Odds and log odds Suppose we only know a person's education and we want to predict whether that person voted (1) or did not vote (0) in the last election. We can talk about the probability of voting, or we can talk about the odds of voting. Let's say that the probability of voting at a given level of education is .90. Then the odds would be Odds = p / (1 – p), or Odds = p / q, where q = 1 – p. (Odds can also be found by counting the number of people in each group and dividing one number by the other. Clearly, the probability is not the same as the odds.)

Odds and log odds In our example, the odds of voting would be .90/.10, or 9 to 1. The odds of not voting would be .10/.90, or 1/9, or .11. This asymmetry is unappealing, because the odds of voting should be the opposite of the odds of not voting. We can take care of this asymmetry through the natural logarithm, ln. The natural log of 9 is 2.197 (ln(.9/.1) = 2.197), and the natural log of 1/9 is –2.197 (ln(.1/.9) = –2.197), so the log odds of voting are exactly opposite to the log odds of not voting.
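
The arithmetic in this example is easy to reproduce (a minimal sketch using NumPy for the logarithm):

```python
import numpy as np

p_vote = 0.9
odds_vote = p_vote / (1 - p_vote)     # 9.0
odds_not = (1 - p_vote) / p_vote      # 0.111...

print(np.log(odds_vote))   # approx  2.197
print(np.log(odds_not))    # approx -2.197, exactly the opposite sign
```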

Natural logarithm The natural logarithm is the logarithm to the base e, where e is a constant approximately equal to 2.718. The natural logarithm is generally written as ln(x), or sometimes, if the base e is implicit, as log(x). The natural logarithm of a number x, written ln(x), is the power to which e would have to be raised to equal x. For example, ln(7.389) is 2, because e^2 ≈ 7.389. The natural log of e itself, ln(e), is 1 because e^1 = e, while the natural logarithm of 1, ln(1), is 0, since e^0 = 1.

Ln Note that the natural log is zero when X is 1. When X is larger than one, the log curves up slowly. When X is less than one, the natural log is less than zero and decreases rapidly as X approaches zero. When p = .50, the odds are .50/.50 = 1, and ln(1) = 0. If p is greater than .50, ln(p/(1 – p)) is positive; if p is less than .50, ln(odds) is negative. [A number raised to a negative power is one divided by that number raised to the positive power, e.g. e^(-10) = 1/e^10. A logarithm is an exponent for a given base, for example ln(e^10) = 10.]

Logistic regression ln(p / (1 – p)) = a + B1*X1 + B2*X2
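
A minimal sketch of estimating this model, assuming the statsmodels and pandas packages are available and using simulated voting data (variable names such as `voted` and `educ` are illustrative, not from the lecture):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 500
educ = rng.uniform(8, 20, n)
p = 1 / (1 + np.exp(-(-4 + 0.35 * educ)))   # true logistic relationship
df = pd.DataFrame({"voted": rng.binomial(1, p), "educ": educ})

# Fit ln(p / (1 - p)) = a + B1*educ by maximum likelihood
fit = smf.logit("voted ~ educ", data=df).fit(disp=False)
print(fit.params)           # a and B1 on the log-odds scale
print(np.exp(fit.params))   # odds ratios: multiplicative change in odds per unit of educ
```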