1
Some Terms
Y = β₀ + β₁X
"Regression of Y on X" or "regress Y on X"
X is called the independent variable, predictor variable, covariate, or factor
Which factors are related to Y?
2
Multiple linear regression model
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₚXₚ + ε
Y = outcome, dependent variable
Xᵢ = predictors, independent variables
ε = error (or residual); normal, mean = 0, constant variance = σ²; reflects how individuals deviate from others with the same values of the X's
βᵢ = parameters describing the intercept and the slope for each predictor
3
Evaluating Assumptions
Y = β₀ + β₁X₁
Y is normally distributed for each value of X
  Can draw a histogram of Y overall; usually can't draw one for each value of X
Mean of Y changes linearly with X
  Scatterplot of X and Y (see if points follow a line)
  Plots of residuals versus X (or predicted values)
Variance σ² is constant for each X
  Scatterplot of X and Y (see if deviations from the line are the same across X levels)
Remember there is no assumption on the distribution of X
4
Plot of SBP Versus AGE
PLOT sbp*age;
5
Plot of Model Residuals Versus AGE
Look for patterns. A pattern indicates the relationship is not linear.
Note that the sum of the residuals = 0
PLOT residual.*age;
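The zero-sum property of least-squares residuals can be checked numerically. The deck does not include the SBP and AGE data, so this Python sketch substitutes the chol and wt values from the lipid DATA step that appears later in the deck. The helper name fit_simple is hypothetical and is not a SAS feature.

```python
# Values copied from the lipid DATA step in this deck (chol and wt columns).
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]


def fit_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


b0, b1 = fit_simple(wt, chol)
predicted = [b0 + b1 * x for x in wt]
residuals = [y - p for y, p in zip(chol, predicted)]

# With an intercept in the model, the residuals sum to zero
# (up to floating-point rounding) whatever the data look like.
print("sum of residuals:", sum(residuals))
```

Because the sum is zero by construction, it says nothing about whether the linear model is adequate; that is what the residual plot is for.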
6
Plot of Model Residuals Versus Predicted Values
Look for patterns.
For simple regression this shows the same pattern as the previous graph (residual versus X), because the predicted values are a linear function of X
PLOT residual.*predicted.;
7
Evaluating Assumptions: Multiple Regression
Y = β₀ + β₁X₁ + β₂X₂
Y is normally distributed for each combination of Xs
  Can draw a histogram of Y overall; usually can't draw one for each combination of Xs
Mean of Y changes linearly with each X, for every value of every other X
Variance σ² is constant for each combination of Xs
Scatterplot of Y with each X (doesn’t really test the assumption)
Scatterplot of residuals versus predicted values
Test for interactions
8
Interpreting Coefficients: Simple Regression
Y = β₀ + β₁X₁
β₀ = mean of Y when X₁ = 0
β₁ = change in mean of Y per 1-unit increase in X₁
Suppose X₁ = 5. Then Y = β₀ + 5β₁
Suppose X₁ = 6. Then Y = β₀ + 6β₁
(mean Y at X₁ = 6) - (mean Y at X₁ = 5) = (β₀ + 6β₁) - (β₀ + 5β₁) = β₁
Same difference for any x and x + 1 chosen
9
Interpreting Coefficients: Multiple Regression
Y = β₀ + β₁X₁ + β₂X₂
β₀ = mean of Y when X₁ = 0 and X₂ = 0
β₁ = change in mean of Y per 1-unit increase in X₁ for fixed X₂
Suppose X₁ = 5. Then Y = β₀ + 5β₁ + β₂X₂
Suppose X₁ = 6. Then Y = β₀ + 6β₁ + β₂X₂
(mean Y at X₁ = 6) - (mean Y at X₁ = 5) = (β₀ + 6β₁ + β₂X₂) - (β₀ + 5β₁ + β₂X₂) = β₁
Same value for every value of X₂
10
Interpreting Relationships: Multiple Regression
Y = β₀ + β₁X₁ + β₂X₂
β₁ measures the effect of X₁ “adjusting for X₂” or “above and beyond” X₂
β₂ measures the effect of X₂ “adjusting for X₁” or “above and beyond” X₁
If X₁ is significantly related to Y in simple regression but not after including X₂ in the model, then:
1) The relationship of Y to X₁ was confounded by X₂
2) X₁ is not an independent predictor of Y
11
Multiple Regression: R²
The coefficient of determination (R²) is the proportion of the variance of Y explained by all variables in the model
Adding variables to the model can only increase R²; it never decreases
Adding a variable that is highly correlated with predictors already in the model will likely add little to R²
Always interpret R² in the context of the problem
  – Laboratory conditions yield high R²
  – Real-world data yield lower R², but the X variables may still be important
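As a rough check of the claim that adding a variable cannot lower R², the Python sketch below fits two nested models to the lipid data from the DATA step in this deck: chol on the diet indicators x1 and x2, then the same model with wt added. The helpers solve, ols, and r_squared are illustrative names, and the normal-equations solver is a simple stand-in for whatever PROC REG does internally.

```python
# Data copied from the lipid DATA step in this deck.
diet = [1] * 5 + [2] * 5 + [3] * 5
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]
x1 = [1 if d == 1 else 0 for d in diet]  # indicator for diet 1
x2 = [1 if d == 2 else 0 for d in diet]  # indicator for diet 2


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)


def r_squared(columns, y):
    """R-squared = 1 - SSE / SST for the model with an intercept and the given columns."""
    beta = ols(columns, y)
    fitted = [beta[0] + sum(b * c[i] for b, c in zip(beta[1:], columns))
              for i in range(len(y))]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst


r2_diet = r_squared([x1, x2], chol)
r2_diet_wt = r_squared([x1, x2, wt], chol)
print("R2 with x1 x2:   ", round(r2_diet, 4))
print("R2 with x1 x2 wt:", round(r2_diet_wt, 4))
```

The second value can never be smaller than the first, because the larger model can always reproduce the smaller one by setting the new coefficient to zero.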
12
Categorical Predictors; 0/1 coding
Compare two groups, A and B. Let X = 0 for A, X = 1 for B
Y = β₀ + β₁X
For Group A, X = 0, mean outcome is: Y = β₀ + β₁(0) = β₀
For Group B, X = 1, mean outcome is: Y = β₀ + β₁(1) = β₀ + β₁
mean Y group B - mean Y group A = (β₀ + β₁) - β₀ = β₁
β₀ is the mean response for Group A
β₁ is the difference in mean response between Group B and Group A
13
What if I use 1 and 2?
Compare two groups, A and B. Let X = 1 for A, X = 2 for B
Y = β₀ + β₁X
For Group A, X = 1, mean outcome is: Y = β₀ + β₁(1) = β₀ + β₁
For Group B, X = 2, mean outcome is: Y = β₀ + β₁(2) = β₀ + 2β₁
mean Y group B - mean Y group A = (β₀ + 2β₁) - (β₀ + β₁) = β₁
β₀ + β₁ is the mean response for Group A
β₁ is the difference in mean response between Group B and Group A
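The effect of the two codings can be seen on a small example. This Python sketch takes the chol values for diets 1 and 2 from the lipid DATA step in this deck and relabels them as groups A and B purely for illustration; fit_simple is a hypothetical helper.

```python
# chol values for diets 1 and 2 from the lipid DATA step, used here as
# stand-ins for "Group A" and "Group B" (an assumption for illustration).
group_a = [175, 180, 185, 190, 195]
group_b = [190, 195, 200, 205, 210]
y = group_a + group_b


def fit_simple(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


x01 = [0] * len(group_a) + [1] * len(group_b)  # A = 0, B = 1
x12 = [1] * len(group_a) + [2] * len(group_b)  # A = 1, B = 2

b0_01, b1_01 = fit_simple(x01, y)
b0_12, b1_12 = fit_simple(x12, y)
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

print("group means:", mean_a, mean_b)
print("0/1 coding: intercept, slope =", b0_01, b1_01)
print("1/2 coding: intercept, slope =", b0_12, b1_12)
```

The slope is the difference in group means under both codings. Only the intercept changes: under 0/1 coding it is the Group A mean, while under 1/2 coding the Group A mean is intercept + slope.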
14
Categorical Predictors
More than two groups require more dummy (indicator) variables
Choose one group as the reference group
Form an indicator variable for each of the other groups
K groups require K-1 indicator variables
15
Example - three groups
Diet 1, 2, and 3; choose “3” as the reference group (could choose any of the three)
Y = β₀ + β₁X₁ + β₂X₂
Diet 1: X₁ = 1, X₂ = 0
Diet 2: X₁ = 0, X₂ = 1
Diet 3: X₁ = 0, X₂ = 0
β₀ is the mean response for Diet 3
β₁ is the difference in mean response between Diet 1 and Diet 3
β₂ is the difference in mean response between Diet 2 and Diet 3
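The K-1 rule can be sketched as a small general-purpose helper. This is an illustrative Python function, not something from the deck or from SAS; the name indicator_columns and its reference argument are assumptions.

```python
def indicator_columns(values, reference):
    """Build one 0/1 indicator list per level, skipping the reference level.

    K distinct levels produce K - 1 indicator columns; observations in the
    reference group are coded 0 on every column.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError("reference level does not occur in the data")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Three diets with diet 3 as the reference group, as in the example slide.
diet = [1, 1, 2, 2, 3, 3]
cols = indicator_columns(diet, reference=3)
for level, column in cols.items():
    print("indicator for diet", level, ":", column)
```

Here cols[1] plays the role of X₁ and cols[2] the role of X₂. Choosing a different reference changes which comparisons the coefficients express, not the fitted group means.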
16
DUMMY CODING IN SAS
* Assume variable diet with values 1-3 (these statements belong inside a DATA step);
x1 = 0;
x2 = 0;
if diet = 1 then x1 = 1;
if diet = 2 then x2 = 1;

PROC REG DATA = lipid;
  MODEL chol = x1 x2;
RUN;
17
DATA lipid;
  INFILE DATALINES;
  INPUT diet chol wt;
  * Assume variable diet with values 1-3;
  x1 = 0;
  x2 = 0;
  if diet = 1 then x1 = 1;
  if diet = 2 then x2 = 1;
DATALINES;
1 175 140
1 180 135
1 185 145
1 190 140
1 195 155
2 190 140
2 195 135
2 200 150
2 205 155
2 210 150
3 180 140
3 185 150
3 190 155
3 195 145
3 200 150
;
18
PROC MEANS N MEAN STD;
  CLASS diet;
PROC REG;
  MODEL chol = x1 x2;
RUN;

PROC MEANS OUTPUT
Analysis Variable : chol

diet   N Obs   N          Mean      Std Dev
   1       5   5   185.0000000    7.9056942
   2       5   5   200.0000000    7.9056942
   3       5   5   190.0000000    7.9056942

PROC REG OUTPUT
Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            190.00000          3.53553     53.74     <.0001
x1           1             -5.00000          5.00000     -1.00     0.3370
x2           1             10.00000          5.00000      2.00     0.0687
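The link between the two outputs can be traced by hand: with indicator coding and diet 3 as reference, the regression estimates are just the group means and their differences. The Python sketch below recomputes the estimates, standard errors, and t values from the lipid data, assuming the usual pooled residual variance with n - 3 degrees of freedom. Variable names are illustrative.

```python
from math import sqrt

# Data copied from the lipid DATA step in this deck.
diet = [1] * 5 + [2] * 5 + [3] * 5
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]

# Collect chol values by diet and compute the group means.
groups = {}
for d, y in zip(diet, chol):
    groups.setdefault(d, []).append(y)
means = {d: sum(v) / len(v) for d, v in groups.items()}

# Estimates under indicator coding with diet 3 as the reference group.
intercept = means[3]        # mean of the reference group
b1 = means[1] - means[3]    # diet 1 minus diet 3
b2 = means[2] - means[3]    # diet 2 minus diet 3

# Pooled residual variance: within-group sum of squares / (n - number of groups).
sse = sum((y - means[d]) ** 2 for d, y in zip(diet, chol))
mse = sse / (len(chol) - len(groups))

se_intercept = sqrt(mse / len(groups[3]))
se_b1 = sqrt(mse * (1 / len(groups[1]) + 1 / len(groups[3])))
se_b2 = sqrt(mse * (1 / len(groups[2]) + 1 / len(groups[3])))

for name, est, se in [("Intercept", intercept, se_intercept),
                      ("x1", b1, se_b1),
                      ("x2", b2, se_b2)]:
    print(f"{name:<10}{est:>12.5f}{se:>10.5f}{est / se:>8.2f}")
```

The printed estimate, standard error, and t value columns should agree with the PROC REG table above.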
19
PROC REG;
  MODEL chol = x1 x2;
  MODEL chol = x1 x2 wt;
RUN;

PROC REG OUTPUT

MODEL chol = x1 x2
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            190.00000          3.53553     53.74     <.0001
x1           1             -5.00000          5.00000     -1.00     0.3370
x2           1             10.00000          5.00000      2.00     0.0687

MODEL chol = x1 x2 wt
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1             84.28571         36.91070      2.28     0.0433
x1           1             -1.42857          4.13890     -0.35     0.7365
x2           1             11.42857          3.97892      2.87     0.0152
wt           1              0.71429          0.24868      2.87     0.0152
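For readers without SAS, the parameter estimates for both models can be reproduced with a small least-squares routine. The Python sketch below solves the normal equations directly on the lipid data; solve and ols are illustrative helper names, and this is not a description of how PROC REG computes its results.

```python
# Data copied from the lipid DATA step in this deck.
diet = [1] * 5 + [2] * 5 + [3] * 5
chol = [175, 180, 185, 190, 195, 190, 195, 200, 205, 210, 180, 185, 190, 195, 200]
wt = [140, 135, 145, 140, 155, 140, 135, 150, 155, 150, 140, 150, 155, 145, 150]
x1 = [1 if d == 1 else 0 for d in diet]  # indicator for diet 1
x2 = [1 if d == 2 else 0 for d in diet]  # indicator for diet 2


def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + [float(c[i]) for c in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    return solve(xtx, xty)


beta_unadj = ols([x1, x2], chol)      # MODEL chol = x1 x2
beta_adj = ols([x1, x2, wt], chol)    # MODEL chol = x1 x2 wt

for label, beta in [("chol = x1 x2", beta_unadj), ("chol = x1 x2 wt", beta_adj)]:
    print(label, [round(b, 5) for b in beta])
```

Comparing the two coefficient lists shows how the diet contrasts shift once weight is held fixed: the diet 1 versus diet 3 difference shrinks toward zero, while the diet 2 versus diet 3 difference grows slightly.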