Linear Regression and Binary Variables The independent variable does not necessarily need to be continuous. If the independent variable is binary (e.g., 1 = experimental condition, 0 = control condition), the regression weights encode the mean difference between the groups on the continuous outcome variable. Coding variables in this manner is sometimes referred to as dummy coding because the numbers do not correspond to real-scaled values.
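As a sketch of this idea, the following NumPy snippet fits a simple regression with a dummy-coded predictor. The data are hypothetical (the slide's raw table did not survive transcription), chosen so that the group means match the 9.7 and 14.2 reported on the slides; the point is that the intercept equals the control-group mean and the slope equals the mean difference.

```python
import numpy as np

# Hypothetical data: the original table was lost in transcription, so
# these Y values are made up, chosen so the group means are 9.7 and 14.2.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
Y = np.array([9.0, 10.0, 9.5, 10.3, 14.0, 14.5, 13.8, 14.5])

# Least-squares estimates for simple regression: b = cov(X, Y) / var(X)
b = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
a = Y.mean() - b * X.mean()

# With a 0/1 predictor, the intercept is the control-group mean and the
# slope is the mean difference between the groups.
print(round(a, 1))  # 9.7  (= mean of Y where X == 0)
print(round(b, 1))  # 4.5  (= 14.2 - 9.7)
```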

Example [Data table: cases a–h with binary X (0/1) and continuous Y scores; the individual values were lost in transcription.]

Example (continued) [Same data table.] Group means: M(X = 0) = 9.7, M(X = 1) = 14.2. Least-squares estimates: a = 9.7, b = 14.2 − 9.7 = 4.5.

The Meaning of the Parameters In short,
–a represents the mean for people with a score of zero (e.g., the control condition)
–(a + b) represents the mean for people with a score of 1.00 (e.g., people in the experimental condition)
–b represents the mean difference between conditions

a = value of Y when X = 0; b = increase in Y for a one-unit increase in X

R-Squared R² represents the proportion of variance in Y that is explained by the experimental manipulation. In this example, approximately 84% of the variation in Y can be accounted for by the manipulation.
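A quick sketch of the R² computation itself, using hypothetical stand-in data (the slide's actual values were lost in transcription, so this toy example's R² differs from the .84 reported on the slide):

```python
import numpy as np

# Hypothetical stand-in data; group means 9.7 and 14.2 as on the slide,
# but the within-group spread (and hence R²) is made up.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
Y = np.array([9.0, 10.0, 9.5, 10.3, 14.0, 14.5, 13.8, 14.5])

a, b = 9.7, 4.5                       # least-squares estimates
Y_hat = a + b * X                     # predictions: each case's group mean
ss_res = np.sum((Y - Y_hat) ** 2)     # residual (unexplained) sum of squares
ss_tot = np.sum((Y - Y.mean()) ** 2)  # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 3))  # 0.968 for this toy data
```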

R-Squared Because the total variance in Y is 5.717, the ratio of residual variance to total variance equals 1 − R² = .165. In other words, .165 of the variance is residual variance, unexplained by the model.

Point Biserial Correlation Recall that previously we had noted that, in situations involving a single predictor, the squared correlation coefficient is equal to R². In this example, r = .914, so r² = .835 = R². When we have a correlation between a binary variable and a continuous variable, we often call it a point biserial correlation.
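To illustrate with hypothetical data (the slide's table was lost in transcription): the point biserial correlation is computed exactly like an ordinary Pearson correlation, and with a single predictor its square equals R².

```python
import numpy as np

# Hypothetical data with a binary X; the "point biserial" correlation is
# just the Pearson correlation computed with a 0/1 variable.
X = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
Y = np.array([9.0, 10.0, 9.5, 10.3, 14.0, 14.5, 13.8, 14.5])

r = np.corrcoef(X, Y)[0, 1]
print(round(r, 3))       # point biserial correlation for this toy data
print(round(r ** 2, 3))  # equals the R² of the one-predictor regression
```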

Linear Regression and Binary Variables In summary, the basic linear model can be used to model data in two common research scenarios in psychology:
–modeling the relationship between two continuous variables
–modeling the relationship between a binary variable (e.g., experimental condition) and a continuous outcome
The parameters have the same meaning in both cases, and the resulting R² can be used to quantify the model's ability to represent the data effectively.

Basic Symbols for Diagramming (More Complex) Causal Structures When representing the causal relations among variables, psychologists often use visual diagrams. As we discussed before, these symbols are used by convention, often under the rubric of "causal analysis," "path analysis," or "structural equation modeling":
–circles denote latent variables
–double-headed arrows represent covariances (a variance is the covariance of a variable with itself)
–single-headed arrows represent unidirectional causal influences (the smaller arrow represents the contribution of residual variance that is uncorrelated with the IV)

Basic Symbols for Diagramming (More Complex) Causal Structures Rectangles are often used to denote manifest or measured variables, whereas circles are used to specify latent variables or constructs.

Why is modeling the processes in this way useful? Different causal structures imply different covariance/correlation matrices. The first causal model predicts r(x3, x1) = .5, whereas the second model predicts r(x3, x1) = .7. How do we study the different predictions implied by these models?

Tracing Rules The correlation between X and Y can be expressed as the sum of the compound paths connecting these two variables, where a compound path is a path along arrows that follows three rules:
1. no loops
2. no going forward then backward
3. no more than one curved arrow per path
[Diagram: X, Y, and Z, with a direct path a from X to Y and a compound path through Z via paths b and c.] r(X,Y) = a + bc
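The tracing-rule prediction can be checked by simulation. In this Python sketch the path values a = .3, b = .5, c = .4 are arbitrary choices (not from the slide); the variables are generated in standardized form so that the observed correlation should approximate a + bc = .5.

```python
import numpy as np

# Simulation check of the tracing rule r(X, Y) = a + b*c for a diagram
# where Z links X and Y through paths b and c, and X has a direct path a
# to Y. All variables are kept at unit variance so path values are
# correlations. The path values themselves are arbitrary demo choices.
rng = np.random.default_rng(0)
n = 200_000
a, b, c = 0.3, 0.5, 0.4

z = rng.standard_normal(n)
x = b * z + np.sqrt(1 - b**2) * rng.standard_normal(n)   # var(x) = 1
resid_var = 1 - (a**2 + c**2 + 2 * a * b * c)            # keeps var(y) = 1
y = a * x + c * z + np.sqrt(resid_var) * rng.standard_normal(n)

r_xy = np.corrcoef(x, y)[0, 1]
print(round(r_xy, 2))   # ≈ a + b*c = 0.5
```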

Confounds If (a) X and Z are correlated and (b) Z has an effect on Y, then Z is a potential confound. When these two conditions are met, we expect a correlation between X and Y regardless of whether X has a causal influence on Y.
[Diagrams: with no direct X→Y path, r(X,Y) = bc; with a direct path a, r(X,Y) = a + bc.]

Random Assignment How can we eliminate confounds? One approach involves random assignment to conditions. If one has the luxury of conducting a proper experiment, one can randomly assign people to different levels of X. When people are randomly assigned to different levels of X, there is less of a reason to expect a correlation between X and Z.
[Diagram: with the X–Z path set to 0 and no direct X→Y path, r(X,Y) = 0.]
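A small simulation sketch of this logic (the path values are arbitrary demo choices): X is assigned by coin flip, so it is uncorrelated with the confound Z, and because X has no causal path to Y the X–Y correlation is near zero.

```python
import numpy as np

# Random assignment breaks the X-Z link: X is a coin flip, independent
# of the confound Z, and X has no causal path to Y. The 0.6 path from
# Z to Y is an arbitrary value for the demo.
rng = np.random.default_rng(1)
n = 100_000

x = rng.integers(0, 2, n).astype(float)           # random assignment: 0/1
z = rng.standard_normal(n)                        # confound, independent of x
y = 0.0 * x + 0.6 * z + rng.standard_normal(n)    # X has no effect on Y

print(abs(np.corrcoef(x, z)[0, 1]) < 0.02)   # True: r(X, Z) ≈ 0
print(abs(np.corrcoef(x, y)[0, 1]) < 0.02)   # True: r(X, Y) ≈ 0
```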

Random Assignment This is not a perfect solution because if X causes Z, there will be covariation between X and Z and, hence, a correlation between X and Y even if X doesn't cause Y directly. In this situation we would say that X has an indirect effect on Y: X still causes Y, but only indirectly. This has different implications for theory.
[Diagram: X → Z with path b, Z → Y with path c; r(X,Z) = b and r(X,Y) = bc.]

Multiple Regression Another way to handle possible confounds is to measure them and use the tools of multiple regression to estimate the paths simultaneously. In multiple regression, we estimate the weight for each variable while simultaneously statistically controlling for the influence of the other measured variables.
[Diagram: X and Z both predict Y with weights b1 and b2, with the X–Z association shown as a double-headed arrow.]
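A simulation sketch of the statistical-control idea (all path values are arbitrary demo choices): Z causes both X and Y while X has no direct effect on Y, so the simple X–Y correlation is nonzero, but the multiple-regression weight for X, estimated with Z in the model, is near zero.

```python
import numpy as np

# Z is a confound that drives both X and Y; X itself has no effect on Y.
# Controlling for Z in a multiple regression recovers the true (zero)
# weight for X. Path values 0.5 and 0.7 are arbitrary for the demo.
rng = np.random.default_rng(2)
n = 100_000

z = rng.standard_normal(n)
x = 0.5 * z + rng.standard_normal(n)              # confound drives X...
y = 0.0 * x + 0.7 * z + rng.standard_normal(n)    # ...and Y; X does nothing

# Ordinary least squares with an intercept and two predictors
A = np.column_stack([np.ones(n), x, z])
b_hat = np.linalg.lstsq(A, y, rcond=None)[0]

print(abs(np.corrcoef(x, y)[0, 1]) > 0.1)   # True: confounded correlation
print(abs(b_hat[1]) < 0.02)                 # True: partial weight for X ≈ 0
print(round(b_hat[2], 1))                   # ≈ 0.7, the weight for Z
```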

Let's experiment with this S-Plus code:

    n <- 1000
    a <- .5
    b1 <- .7
    b2 <- .3
    f1 <- rnorm(n)
    x1 <- a*f1 + sqrt(1-a^2)*rnorm(n)
    x2 <- a*f1 + sqrt(1-a^2)*rnorm(n)
    x3 <- b1*x1 + b2*x2 + rnorm(n)
    summary(lm(x3 ~ x1 + x2))

Adjust the two weights and see if the estimation techniques are able to accurately recover the "true" parameters.
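For readers without S-Plus, here is a Python/NumPy analogue of the same experiment, with lm() replaced by an ordinary least-squares fit via np.linalg.lstsq:

```python
import numpy as np

# Python/NumPy translation of the S-Plus parameter-recovery experiment:
# x1 and x2 share a common factor f1 (loading a), x3 is a weighted sum
# of x1 and x2 plus noise, and OLS should recover the true weights.
rng = np.random.default_rng(42)
n = 1000
a, b1, b2 = 0.5, 0.7, 0.3

f1 = rng.standard_normal(n)
x1 = a * f1 + np.sqrt(1 - a**2) * rng.standard_normal(n)
x2 = a * f1 + np.sqrt(1 - a**2) * rng.standard_normal(n)
x3 = b1 * x1 + b2 * x2 + rng.standard_normal(n)

# Least-squares fit of x3 on x1 and x2 (with intercept)
A = np.column_stack([np.ones(n), x1, x2])
coefs = np.linalg.lstsq(A, x3, rcond=None)[0]
print(coefs[1:])   # estimated weights, close to the true (0.7, 0.3)
```

Adjusting b1 and b2 (or the loading a, which controls how correlated x1 and x2 are) and rerunning shows how well the estimates track the true parameters.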