Some Terms: $Y = \beta_0 + \beta_1 X$. Regression of Y on X. Regress Y on X. X called independent variable or predictor variable or covariate or factor. Which factors.


1 Some Terms
$Y = \beta_0 + \beta_1 X$
Regression of Y on X
Regress Y on X
X is called the independent variable, predictor variable, covariate, or factor
Which factors are related to Y?

2 Multiple linear regression model
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$
Y = outcome, dependent variable
$X_i$ = predictors, independent variables
$\varepsilon$ = error (or residual): normal, mean = 0, constant variance = $\sigma^2$; reflects how individuals deviate from others with the same values of the X's
$\beta_i$ = parameters describing the intercept and the slope for each predictor

3 Evaluating Assumptions
$Y = \beta_0 + \beta_1 X_1$
Y is normally distributed for each value of X
    Can draw a histogram of Y overall; likely cannot do so for each X
Mean of Y changes linearly with X
    Scatterplot of X and Y (see if points follow a line)
    Plots of residuals versus X (or predicted values)
Variance $\sigma^2$ is constant for each X
    Scatterplot of X and Y (see if deviations from the line are the same across X levels)
Remember there is no assumption on the distribution of X

4 Plot of SBP Versus AGE
    PLOT sbp*age;

5 Plot of Model Residuals Versus AGE
Look for patterns. Patterns indicate that the relationship is not linear.
Note that the residuals sum to 0.
    PLOT residual.*age;

6 Plot of Model Residuals Versus Predicted Values
Look for patterns. For simple regression this shows the same pattern as the previous graph (residuals versus X), because the predicted values are a linear function of X.
    PLOT residual.*predicted.;
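The PLOT statements on slides 4 to 6 are fragments. A minimal sketch of how they might sit together inside one PROC REG step is shown below. The data set name bp is an assumption (the slides do not name it); sbp and age are the variables shown on the slides.

```sas
* Hypothetical data set bp containing variables sbp and age;
PROC REG DATA = bp;
  MODEL sbp = age;
  PLOT sbp*age;                * outcome versus predictor;
  PLOT residual.*age;          * residuals versus predictor;
  PLOT residual.*predicted.;   * residuals versus predicted values;
RUN;
QUIT;
```

The keywords residual. and predicted. (with the trailing period) refer to quantities computed by the fitted model rather than to variables in the data set.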

7 Evaluating Assumptions: Multiple Regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Y is normally distributed for each combination of Xs
    Can draw a histogram of Y overall; likely cannot do so for each X
Mean of Y changes linearly with each X, and for every value of every other X
Variance $\sigma^2$ is constant for each combination of Xs
    Scatterplot of Y with each X (doesn’t really test the assumption)
    Scatterplot of residuals versus predicted values
    Test for interactions

8 Interpreting Coefficients: Simple Regression
$Y = \beta_0 + \beta_1 X_1$
$\beta_0$ = mean of Y when $X_1 = 0$
$\beta_1$ = change in mean of Y per 1-unit increase in $X_1$
Suppose $X_1 = 5$. Then $Y = \beta_0 + 5\beta_1$
Suppose $X_1 = 6$. Then $Y = \beta_0 + 6\beta_1$
(mean Y at $X_1 = 6$) - (mean Y at $X_1 = 5$) = $(\beta_0 + 6\beta_1) - (\beta_0 + 5\beta_1) = \beta_1$
Same difference for any x and x + 1 chosen

9 Interpreting Coefficients: Multiple Regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
$\beta_0$ = mean of Y when $X_1 = 0$ and $X_2 = 0$
$\beta_1$ = change in mean of Y per 1-unit increase in $X_1$ for fixed $X_2$
Suppose $X_1 = 5$. Then $Y = \beta_0 + 5\beta_1 + \beta_2 X_2$
Suppose $X_1 = 6$. Then $Y = \beta_0 + 6\beta_1 + \beta_2 X_2$
(mean Y at $X_1 = 6$) - (mean Y at $X_1 = 5$) = $(\beta_0 + 6\beta_1 + \beta_2 X_2) - (\beta_0 + 5\beta_1 + \beta_2 X_2) = \beta_1$
Same value for every value of $X_2$

10 Interpreting Relationships: Multiple Regression
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
$\beta_1$ measures the effect of $X_1$ “adjusting for $X_2$” or “above and beyond” $X_2$
$\beta_2$ measures the effect of $X_2$ “adjusting for $X_1$” or “above and beyond” $X_1$
If $X_1$ is significantly related to Y in simple regression but not after including $X_2$ in the model, then:
1) The relationship of Y to $X_1$ was confounded by $X_2$
2) $X_1$ is not an independent predictor of Y

11 Multiple Regression: $R^2$
The coefficient of determination ($R^2$) is the proportion of the variance in Y explained by all variables in the model
Adding variables to the model can never decrease $R^2$
Adding a variable that is highly correlated with predictors already in the model will likely add little to $R^2$
Always interpret $R^2$ in the context of the problem
    Laboratory conditions yield high $R^2$
    Real-world conditions yield lower $R^2$, but the X variables may still be important
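For reference, the usual least-squares definition behind the statements on slide 11 can be written as follows. The symbols SSE, SST, $\hat{y}_i$ and $\bar{y}$ are notation assumed here; they do not appear on the slides.

```latex
\begin{align*}
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \\
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2 .
\end{align*}
```

SST does not depend on the model. Adding a predictor cannot raise SSE, because the smaller model is the special case in which the new coefficient equals 0, so $R^2$ cannot go down.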

12 Categorical Predictors; 0/1 coding
Compare two groups, A and B. Let X = 0 for A, X = 1 for B
$Y = \beta_0 + \beta_1 X$
For Group A, X = 0, mean outcome is $Y = \beta_0 + \beta_1(0) = \beta_0$
For Group B, X = 1, mean outcome is $Y = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$
(mean Y, Group B) - (mean Y, Group A) = $(\beta_0 + \beta_1) - \beta_0 = \beta_1$
$\beta_0$ is the mean response for Group A
$\beta_1$ is the difference in mean response between Group B and Group A

13 What if I use 1 and 2?
Compare two groups, A and B. Let X = 1 for A, X = 2 for B
$Y = \beta_0 + \beta_1 X$
For Group A, X = 1, mean outcome is $Y = \beta_0 + \beta_1(1) = \beta_0 + \beta_1$
For Group B, X = 2, mean outcome is $Y = \beta_0 + \beta_1(2) = \beta_0 + 2\beta_1$
(mean Y, Group B) - (mean Y, Group A) = $(\beta_0 + 2\beta_1) - (\beta_0 + \beta_1) = \beta_1$
$\beta_0 + \beta_1$ is the mean response for Group A
$\beta_1$ is the difference in mean response between Group B and Group A
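The 0/1 and 1/2 codings are both instances of a general two-value coding. As a sketch, suppose Group A is coded $X = a$ and Group B is coded $X = b$ with $a \neq b$ (the symbols $a$ and $b$ are introduced here for illustration only):

```latex
\begin{align*}
\text{mean } Y \text{ (Group B)} - \text{mean } Y \text{ (Group A)}
  &= (\beta_0 + b\,\beta_1) - (\beta_0 + a\,\beta_1) \\
  &= (b - a)\,\beta_1 .
\end{align*}
```

With codes 0/1 or 1/2 the gap $b - a$ equals 1, so the slope is the group difference in both cases. With codes such as $-1$ and $+1$ the group difference would be $2\beta_1$, which is why the choice of codes matters for interpreting the slope.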

14 Categorical Predictors
More than two groups require more dummy (indicator) variables
Choose one group as the reference group
Form an indicator variable for each of the other groups
K groups require K - 1 indicator variables

15 Example - three groups
Diet 1, 2, and 3; choose “3” as the reference group (could choose any of the three)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
Diet 1: $X_1 = 1$, $X_2 = 0$
Diet 2: $X_1 = 0$, $X_2 = 1$
Diet 3: $X_1 = 0$, $X_2 = 0$
$\beta_0$ is the mean response for Diet 3
$\beta_1$ is the difference in mean response between Diet 1 and Diet 3
$\beta_2$ is the difference in mean response between Diet 2 and Diet 3

16 DUMMY CODING IN SAS
    * Assume variable diet with value 1-3;
    * These assignments belong inside the DATA step that creates lipid;
    x1 = 0;
    x2 = 0;
    IF diet = 1 THEN x1 = 1;
    IF diet = 2 THEN x2 = 1;

    PROC REG DATA = lipid;
      MODEL chol = x1 x2;
    RUN;
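The same indicators can be built more compactly, because a comparison in a SAS assignment evaluates to 1 when true and 0 when false. The sketch below assumes the data set lipid already exists with the variable diet; the output data set name lipid_coded is hypothetical.

```sas
* Sketch: build the two indicators with logical expressions;
DATA lipid_coded;          * hypothetical output data set name;
  SET lipid;               * assumes lipid already contains diet;
  x1 = (diet = 1);         * 1 for Diet 1, otherwise 0;
  x2 = (diet = 2);         * 1 for Diet 2, otherwise 0;
RUN;
```

Diet 3 gets x1 = 0 and x2 = 0, so it remains the reference group, matching the IF/THEN version on the slide.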

17
    DATA lipid;
      INFILE DATALINES;
      INPUT diet chol wt;
      * Assume variable diet with value 1-3;
      x1 = 0;
      x2 = 0;
      IF diet = 1 THEN x1 = 1;
      IF diet = 2 THEN x2 = 1;
    DATALINES;
    1 175 140
    1 180 135
    1 185 145
    1 190 140
    1 195 155
    2 190 140
    2 195 135
    2 200 150
    2 205 155
    2 210 150
    3 180 140
    3 185 150
    3 190 155
    3 195 145
    3 200 150
    ;

18
    PROC MEANS N MEAN STD;
      CLASS diet;
    PROC REG;
      MODEL chol = x1 x2;
    RUN;

PROC MEANS OUTPUT
Analysis Variable : chol

    diet    N Obs    N           Mean        Std Dev
    1           5    5    185.0000000      7.9056942
    2           5    5    200.0000000      7.9056942
    3           5    5    190.0000000      7.9056942

PROC REG OUTPUT
Parameter Estimates

                       Parameter     Standard
    Variable     DF     Estimate        Error    t Value    Pr > |t|
    Intercept     1    190.00000      3.53553      53.74      <.0001
    x1            1     -5.00000      5.00000      -1.00      0.3370
    x2            1     10.00000      5.00000       2.00      0.0687
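The two outputs on slide 18 can be checked against each other. The sketch below uses only the printed numbers; the hats (estimates), $\bar{Y}_k$ (mean for Diet $k$) and $s$ (pooled standard deviation, here 7.9056942 because all three groups have the same size and the same standard deviation) are notation assumed for this check.

```latex
\begin{align*}
\hat\beta_0 &= 190 = \bar{Y}_3, \\
\hat\beta_0 + \hat\beta_1 &= 190 - 5 = 185 = \bar{Y}_1, \\
\hat\beta_0 + \hat\beta_2 &= 190 + 10 = 200 = \bar{Y}_2, \\
\mathrm{SE}(\hat\beta_0) &= \frac{s}{\sqrt{5}} = \frac{7.9056942}{\sqrt{5}} \approx 3.5355, \\
\mathrm{SE}(\hat\beta_1) = \mathrm{SE}(\hat\beta_2) &= s\sqrt{\tfrac{1}{5} + \tfrac{1}{5}} = 7.9056942 \times \sqrt{0.4} \approx 5.0000 .
\end{align*}
```

The intercept reproduces the mean of the reference group (Diet 3), and each slope reproduces the difference between a diet's mean and the Diet 3 mean.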

19
    PROC REG;
      MODEL chol = x1 x2;
      MODEL chol = x1 x2 wt;
    RUN;

PROC REG OUTPUT

First model (chol = x1 x2)

                       Parameter     Standard
    Variable     DF     Estimate        Error    t Value    Pr > |t|
    Intercept     1    190.00000      3.53553      53.74      <.0001
    x1            1     -5.00000      5.00000      -1.00      0.3370
    x2            1     10.00000      5.00000       2.00      0.0687

Second model (chol = x1 x2 wt)

                       Parameter     Standard
    Variable     DF     Estimate        Error    t Value    Pr > |t|
    Intercept     1     84.28571     36.91070       2.28      0.0433
    x1            1     -1.42857      4.13890      -0.35      0.7365
    x2            1     11.42857      3.97892       2.87      0.0152
    wt            1      0.71429      0.24868       2.87      0.0152
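The shift in the diet coefficients once wt enters the model can be traced to the diets' different mean weights. From the data on slide 17, the mean weights are 143 (Diet 1), 146 (Diet 2) and 148 (Diet 3). A sketch of the arithmetic follows, where $\overline{wt}_k$ is notation assumed here for the mean weight in Diet $k$: each adjusted difference equals the unadjusted difference minus the wt slope times the difference in mean weight.

```latex
\begin{align*}
\text{Diet 1 vs 3:}\quad & -5 - 0.71429\,(\overline{wt}_1 - \overline{wt}_3)
   = -5 - 0.71429\,(143 - 148) \approx -1.43, \\
\text{Diet 2 vs 3:}\quad & 10 - 0.71429\,(\overline{wt}_2 - \overline{wt}_3)
   = 10 - 0.71429\,(146 - 148) \approx 11.43 .
\end{align*}
```

These match the x1 and x2 estimates in the second table. Diets 1 and 2 have lower mean weight than Diet 3, so comparing the diets at the same weight moves both differences upward.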

