Linear Regression and Binary Variables


1 Linear Regression and Binary Variables The independent variable does not necessarily need to be continuous. If the independent variable is binary (e.g., 1 = experimental condition, 0 = control condition), the regression weights encode the mean difference between groups on the continuous outcome variable. Coding variables in this manner is sometimes referred to as dummy coding because the numbers do not correspond to real-scaled values.

2 Example

Case  X  Y
a     1  14.3
b     1  15.8
c     1  13.5
d     1  14.6
e     0  10.7
f     0   8.5
g     1  12.9
h     0  10.0

3 Example (same eight cases) Mean of Y when X = 0: 9.7. Mean of Y when X = 1: 14.2. Least-squares estimates: a = 9.7, b = 14.2 - 9.7 = 4.5.
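The group means and least-squares estimates for the eight example cases can be reproduced with a short R sketch. The variable names (x, y, m0, m1, fit) are chosen here for illustration and are not from the slides.

```r
# Data from the example: X is the dummy-coded condition, Y the continuous outcome.
x <- c(1, 1, 1, 1, 0, 0, 1, 0)
y <- c(14.3, 15.8, 13.5, 14.6, 10.7, 8.5, 12.9, 10.0)

# Group means for the control (X = 0) and experimental (X = 1) cases
m0 <- mean(y[x == 0])
m1 <- mean(y[x == 1])

# Least-squares fit of Y on the binary predictor
fit <- lm(y ~ x)
a <- unname(coef(fit)[1])  # intercept
b <- unname(coef(fit)[2])  # slope

# The intercept should match m0 and the slope should match m1 - m0
print(round(c(m0 = m0, m1 = m1, a = a, b = b), 2))
```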

4 The Meaning of the Parameters In short, a represents the mean for people with a score of zero (e.g., the control condition); (a + b) represents the mean for people with a score of 1.00 (e.g., people in the experimental condition); and b represents the mean difference between conditions.

5 a = value of Y when X = 0
b = increase in Y for a one-unit increase in X
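One way to see why the parameters take these meanings is the short derivation below. It uses the standard least-squares formulas for a single predictor; the symbols p (the proportion of cases with X = 1), M_0, and M_1 (the two group means) are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
b &= \frac{\operatorname{cov}(X,Y)}{\operatorname{var}(X)}
   = \frac{p(1-p)\,(M_1 - M_0)}{p(1-p)}
   = M_1 - M_0, \\
a &= \bar{Y} - b\,\bar{X}
   = \bigl[\,p M_1 + (1-p) M_0\,\bigr] - p\,(M_1 - M_0)
   = M_0 .
\end{aligned}
```

With the example's group means this gives a of about 9.7 and b of about 14.2 - 9.7 = 4.5.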

6 R-Squared R² represents the proportion of variance in Y that is explained by the experimental manipulation. In this example, approximately 84% of the variation in Y can be accounted for by the manipulation.

7 R-Squared Because the total variance in Y is 5.717, we say that (1 - .835) = .165 of that variance is residual variance, that is, variance unexplained by the model.
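As a hedged illustration, the explained and residual proportions can be computed from sums of squares for the eight example cases. The names ss_total, ss_resid, and r_squared are introduced here for the sketch.

```r
# Example data (dummy-coded condition and continuous outcome)
x <- c(1, 1, 1, 1, 0, 0, 1, 0)
y <- c(14.3, 15.8, 13.5, 14.6, 10.7, 8.5, 12.9, 10.0)

fit <- lm(y ~ x)

# Total variation in Y and the part left over after fitting the model
ss_total <- sum((y - mean(y))^2)
ss_resid <- sum(resid(fit)^2)

# Proportion of variation explained, and the residual proportion
r_squared <- 1 - ss_resid / ss_total
print(round(c(explained = r_squared, residual = 1 - r_squared), 3))
```

The explained proportion computed this way should agree with the value that summary(fit)$r.squared reports.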

8 Point Biserial Correlation Recall that previously we had noted that, in situations involving a single predictor, the squared correlation coefficient is equal to R². In this example, r = .914. When we have a correlation between a binary variable and a continuous variable, we often call it a point biserial correlation.
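A minimal R sketch of this relationship: the point biserial correlation is simply the ordinary Pearson correlation computed with the dummy-coded variable, and its square can be compared with the model's R². The name r_pb is illustrative only.

```r
# Example data (dummy-coded condition and continuous outcome)
x <- c(1, 1, 1, 1, 0, 0, 1, 0)
y <- c(14.3, 15.8, 13.5, 14.6, 10.7, 8.5, 12.9, 10.0)

# Pearson correlation between a binary and a continuous variable
r_pb <- cor(x, y)

# R-squared from the regression of Y on X, for comparison with r_pb^2
model_r2 <- summary(lm(y ~ x))$r.squared

print(round(c(r = r_pb, r_squared = r_pb^2, model_R2 = model_r2), 3))
```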

9 Linear Regression and Binary Variables In summary, the basic linear model can be used to model data in two common research scenarios in psychology:
– modeling the relationship between two continuous variables
– modeling the relationship between a binary variable (e.g., experimental condition) and a continuous outcome
The parameters have the same meaning in both cases, and the resulting R² can be used to quantify the model’s ability to represent the data effectively.

10 Basic Symbols for Diagramming (More Complex) Causal Structures When representing the causal relations among variables, psychologists often use visual diagrams. As we discussed before, these symbols are used by convention, often under the rubric of “causal analysis,” “path analysis,” or “structural equation modeling.”
– circles denote latent variables
– double-headed arrows represent covariances (a variance is the covariance of a variable with itself)
– single-headed arrows represent unidirectional causal influences (the smaller arrow represents the contribution of residual variance that is uncorrelated with the IV)

11 Basic Symbols for Diagramming (More Complex) Causal Structures Rectangles are often used to denote manifest or measured variables, whereas circles are used to specify latent variables or constructs

12 Why is modeling the processes in this way useful? Different causal structures imply different covariance/correlation matrices. The first causal model predicts r(x3, x1) = .5, whereas the second model predicts r(x3, x1) = .7. How do we study the different predictions implied by these models?

13 Tracing Rules The correlation between X and Y can be expressed as the sum of the compound paths connecting these two variables, where a compound path is a path along arrows that follows three rules:
1. no loops
2. no going forward then backward
3. no more than one curved arrow per path
[Diagram: X → Y with path a; Z → Y with path c; curved arrow between X and Z labeled b.] r(X, Y) = a + bc
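The tracing-rule result r(X, Y) = a + bc can be checked by simulation. The sketch below assumes standardized variables and arbitrary illustrative path values (a = .3, b = .5, c = .4); none of these numbers come from the slides. The X–Z correlation is generated by constructing Z from X, which is only one convenient way to produce a correlation of b.

```r
set.seed(1)
n <- 100000

# Illustrative path values (assumed for this sketch)
a <- 0.3        # direct path X -> Y
b <- 0.5        # correlation between X and Z
c_path <- 0.4   # path Z -> Y (named c_path to avoid masking R's c())

# Standardized X and Z with correlation b
x <- rnorm(n)
z <- b * x + sqrt(1 - b^2) * rnorm(n)

# Residual variance chosen so that Y also has unit variance
resid_var <- 1 - (a^2 + c_path^2 + 2 * a * b * c_path)
y <- a * x + c_path * z + sqrt(resid_var) * rnorm(n)

# Compare the tracing-rule prediction with the simulated correlation
predicted <- a + b * c_path
observed <- cor(x, y)
print(round(c(predicted = predicted, observed = observed), 3))
```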

14 Confounds If (a) X and Z are correlated and (b) Z has an effect on Y, then Z is a potential confound. When these two conditions are met, we expect a correlation between X and Y regardless of whether X has a causal influence on Y.
[Diagrams: X and Z correlated (b), Z → Y (c). With no direct path from X to Y (a = 0), r(X, Y) = bc; with a direct path a, r(X, Y) = a + bc.]

15 Random Assignment How can we eliminate confounds? One approach involves random assignment to conditions. If one has the luxury of conducting a proper experiment, one can randomly assign people to different levels of X. When people are randomly assigned to different levels of X, there is less reason to expect a correlation between X and Z.
[Diagram: no direct path from X to Y (a = 0), X and Z uncorrelated (b = 0), Z → Y (c).] r(X, Y) = 0
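A small simulation can illustrate the point, under assumptions made only for this sketch: assignment is a fair coin flip, Z is a standardized background variable, Z affects Y with path .5, and X has no effect on Y.

```r
set.seed(2)
n <- 100000

c_path <- 0.5                 # assumed effect of Z on Y
z <- rnorm(n)                 # background variable (potential confound)
x <- rbinom(n, 1, 0.5)        # random assignment: 0 = control, 1 = experimental

# Y depends on Z only; X has no causal influence (a = 0)
y <- c_path * z + sqrt(1 - c_path^2) * rnorm(n)

# With random assignment, X should be nearly uncorrelated with both Z and Y
r_xz <- cor(x, z)
r_xy <- cor(x, y)
print(round(c(r_xz = r_xz, r_xy = r_xy), 3))
```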

16 Random Assignment This is not a perfect solution because if X causes Z, there will be covariation between X and Z, and, hence, a correlation between X and Y even if X doesn’t cause Y directly. In this situation we would say that X has an indirect effect on Y. X still causes Y, but only indirectly. This will have different implications for theory.
[Diagram: X → Z with path b, Z → Y with path c, no direct path from X to Y (a = 0).] r(X, Z) = b; r(X, Y) = bc

17 Multiple Regression Another way to handle possible confounds is to measure them and use the tools of multiple regression to estimate the paths simultaneously. In multiple regression, we estimate the weight for each variable while statistically controlling for the influence of the other measured variables.
[Diagram: X and Z as correlated predictors of Y, with regression weights b1 and b2.]
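To make the idea concrete, the sketch below simulates a confounded situation and compares a simple regression with a multiple regression. All values are assumptions for illustration: X and Z correlate .5, Z affects Y with weight .6, and X has no direct effect on Y.

```r
set.seed(3)
n <- 100000

b <- 0.5        # assumed correlation between X and Z
c_path <- 0.6   # assumed effect of Z on Y

x <- rnorm(n)
z <- b * x + sqrt(1 - b^2) * rnorm(n)
y <- c_path * z + sqrt(1 - c_path^2) * rnorm(n)   # X has no direct effect on Y

# Ignoring Z: the weight for X picks up the confounded path (about b * c_path)
simple_x <- unname(coef(lm(y ~ x))["x"])

# Controlling for Z: the weight for X should be near 0, the weight for Z near c_path
multiple <- coef(lm(y ~ x + z))
multiple_x <- unname(multiple["x"])
multiple_z <- unname(multiple["z"])

print(round(c(simple_x = simple_x, multiple_x = multiple_x, multiple_z = multiple_z), 3))
```

The design choice here is to set the direct effect of X to zero so that any weight the simple regression assigns to X is attributable to the confound alone.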

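18 Let’s experiment with this S-Plus code

    n <- 1000     # sample size
    a <- .5       # loading of x1 and x2 on the shared factor f1
    b1 <- .7      # true weight for x1
    b2 <- .3      # true weight for x2
    f1 <- rnorm(n)                        # common factor that makes x1 and x2 correlate
    x1 <- a*f1 + sqrt(1-a^2)*rnorm(n)
    x2 <- a*f1 + sqrt(1-a^2)*rnorm(n)
    x3 <- b1*x1 + b2*x2 + rnorm(n)        # outcome built from the true weights
    summary(lm(x3 ~ x1 + x2))             # estimated weights should be near b1 and b2

Adjust the two weights and see if the estimation techniques are able to accurately recover the “true” parameters.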

