Multiple regression. Problem: to draw a straight line through the points that best explains the variance Regression.

1 Multiple regression

2 Regression. Problem: to draw a straight line through the points that best explains the variance.

5 Regression. Test with F, just like ANOVA: F = (variance explained by x-variable / df) / (variance still unexplained / df). Variance explained: change in (line lengths)^2. Variance unexplained: (residual line lengths)^2.

6 Regression. Test with F, just like ANOVA: F = (variance explained by x-variable / df) / (variance still unexplained / df). In regression, each x-variable will normally have 1 df.

7 Regression. Test with F, just like ANOVA: F = (variance explained by x-variable / df) / (variance still unexplained / df). Essentially a cost-benefit analysis: is the benefit in variance explained worth the cost in using up degrees of freedom?

8 Regression example. Total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance. 1. What is the R^2? 2. What is the F ratio?

9 Regression example. Total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance. 1. What is the R^2? 2. What is the F ratio? R^2 = 150/300 = 0.5. F(1,30) = (150/1) / (150/30) = 30. Why is df error = 30?
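
The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and it assumes the usual accounting in which the error df is the number of points minus one df for the x-variable and one for the intercept (32 - 1 - 1 = 30).

```python
# Numbers taken from the slide: 32 points, total variance 300, 150 explained.
n_points = 32
ss_total = 300.0
ss_explained = 150.0

df_model = 1                          # one x-variable, 1 df
df_error = n_points - df_model - 1    # assumption: the intercept uses one more df

ss_residual = ss_total - ss_explained           # variance still unexplained
r_squared = ss_explained / ss_total             # share of variance explained
f_ratio = (ss_explained / df_model) / (ss_residual / df_error)

print(f"R^2 = {r_squared:.2f}")
print(f"F({df_model},{df_error}) = {f_ratio:.1f}")
```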

10 Multiple regression. [Figure: herbivore damage plotted against tree age, with higher nutrient trees and lower nutrient trees marked.] Damage = m1*age + b

11 [Figures: herbivore damage against tree age; residuals of herbivore damage against tree nutrient concentration.]

12 [Figures: herbivore damage against tree age; residuals of herbivore damage against tree nutrient concentration.] Damage = m1*age + m2*nutrient + b

13 Damage = m1*age + m2*nutrient + m3*age*nutrient + b. [Figure: two panels with y on the vertical axis, "No interaction (additive)" and "Interaction (non-additive)".]
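
As a rough illustration of fitting the interaction model, the sketch below uses NumPy least squares on synthetic data. Everything numeric here is an assumption made up for the example (sample size, value ranges, the "true" coefficients, the noise level); none of it comes from the tree data behind the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (hypothetical ranges, for illustration only).
age = rng.uniform(1, 50, n)
nutrient = rng.uniform(0, 5, n)

# Hypothetical "true" coefficients used to generate the fake damage values.
true_m1, true_m2, true_m3, true_b = 0.5, 2.0, 0.1, 3.0
damage = (true_m1 * age + true_m2 * nutrient
          + true_m3 * age * nutrient + true_b
          + rng.normal(0, 1.0, n))

# The interaction is just one more column: the product age * nutrient.
X = np.column_stack([age, nutrient, age * nutrient, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, damage, rcond=None)
m1, m2, m3, b = coef

print("m1 (age):", m1)
print("m2 (nutrient):", m2)
print("m3 (age*nutrient):", m3)
print("b (intercept):", b)
```

If m3 is indistinguishable from zero, the additive model without the product column is the simpler description.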

14 Non-linear regression? Just a special case of multiple regression!
Y = m1*x + m2*x^2 + b
X (X1)   X^2 (X2)   Y
1        1          1.1
2        4          2.0
3        9          3.6
4        16         3.1
5        25         5.2
6        36         6.7
7        49         11.3
Y = m1*x1 + m2*x2 + b
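
To make the "special case" point concrete, here is a minimal sketch that fits the seven X, Y pairs from this slide by treating X and X^2 as two ordinary x-variables. The use of NumPy and the variable names are choices made for this example, not something the slides specify.

```python
import numpy as np

# X and Y values from the slide's table.
x = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)
y = np.array([1.1, 2.0, 3.6, 3.1, 5.2, 6.7, 11.3])

# Treat x1 = X and x2 = X^2 as two separate predictor columns,
# plus a column of ones for the intercept b.
x1 = x
x2 = x ** 2
design = np.column_stack([x1, x2, np.ones_like(x)])

coef, *_ = np.linalg.lstsq(design, y, rcond=None)
m1, m2, b = coef

print("m1:", m1, "m2:", m2, "b:", b)
```

The same coefficients come out of a dedicated quadratic fit such as `np.polyfit(x, y, 2)`, which is the sense in which the curve is just a multiple regression on X and X^2.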

15 STEPWISE REGRESSION

16 Jump height (how high ball can be raised off the ground). [Figure: vertical axis "Feet off ground".] Total SS = 11.11

17
X variable         Parameter   SS     F(1,13)   p
Height of player   +0.943      9.96   112       <0.0001

18
X variable         Parameter   SS     F(1,13)   p
Weight of player   +0.040      7.92   32        <0.0001
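
The F values for the two single-variable fits can be roughly reproduced from the sums of squares, taking Total SS = 11.11, SS = 9.96 for height, SS = 7.92 for weight, and 13 error df. The helper name `f_ratio` is made up for this sketch, and small differences from the slide values (112 and 32) are expected because the SS figures are rounded.

```python
def f_ratio(ss_explained, ss_total, df_error, df_model=1):
    """F = (explained SS / model df) / (unexplained SS / error df)."""
    ss_residual = ss_total - ss_explained
    return (ss_explained / df_model) / (ss_residual / df_error)

ss_total = 11.11   # total SS for jump height, from the slides
df_error = 13      # error df reported as F(1,13)

f_height = f_ratio(9.96, ss_total, df_error)   # height entered alone
f_weight = f_ratio(7.92, ss_total, df_error)   # weight entered alone

print(f"F for height alone: {f_height:.1f}")
print(f"F for weight alone: {f_weight:.1f}")
```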

19 Why do you think weight is + correlated with jump height?

20 An idea: perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce ability to jump high…

21 How could we test this idea?

22 [Figure labels: lighter, heavier.]
X variable   Parameter   SS      F     p
Height       +2.133      9.956   803   <0.0001
Weight       -0.059      1.008   81    <0.0001

23 Questions: Why did the parameter estimates change? Why did the F tests change?

24 Heavy people often tall (tall people often heavy). Tall people can jump higher. People light for their height can jump a bit more. [Diagram: Weight and Height linked (+); Height to Jump (+); Weight to Jump (-).]
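
The sign flip for weight can be reproduced with a small simulation. This is a sketch on invented data: the units, coefficients, and noise levels are assumptions chosen so that weight tracks height and has a small negative direct effect on jump; they are not the players' data from the slides.

```python
import numpy as np

def ols(columns, y):
    """Least-squares coefficients for the given columns plus an intercept."""
    X = np.column_stack(list(columns) + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 500

# Hypothetical population: heavy people are often tall.
height = rng.normal(180, 10, n)                      # assumed units: cm
weight = 0.9 * height - 90 + rng.normal(0, 5, n)     # assumed units: kg

# Assumed mechanism: height helps jumping, weight (at fixed height) hurts.
jump = 0.02 * height - 0.01 * weight + rng.normal(0, 0.1, n)

# Weight alone picks up the benefit of height, so its slope is positive.
slope_weight_alone = ols([weight], jump)[0]

# With height in the model, weight's own (negative) effect shows through.
slope_height, slope_weight_adjusted, _ = ols([height, weight], jump)

print("weight alone:", slope_weight_alone)
print("weight with height in model:", slope_weight_adjusted)
print("height with weight in model:", slope_height)
```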

25 The problem: The parameter estimate and significance of an x-variable are affected by the x-variables already in the model! How do we know which variables are significant, and in which order to enter them in the model?

26 Solutions: 1) Use a logical order. For example, in ANCOVA it makes sense to test the interaction first. 2) Stepwise regression: "tries out" various orders of entering and removing variables.

27 Stepwise regression. Enters or removes variables in order of significance, and checks after each step whether the significance of other variables has changed. Enters one by one: forward stepwise. Enters all, removes one by one: backwards stepwise.

28 Forward stepwise regression. Enter the variable with the highest correlation with the y-variable first (if p < p enter). Next enter the variable that explains the most residual variation (if p < p enter). Remove variables that become insignificant (p > p leave) due to other variables being added. And so on…
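
The procedure above can be sketched in Python. Two caveats: this is an illustrative sketch, not the routine any particular statistics package uses, and it swaps the p enter / p leave thresholds for F-to-enter / F-to-leave cutoffs so that no F-distribution routine is needed (with 1 numerator df and a moderate sample, F near 4 corresponds to roughly p = 0.05). All function names, cutoffs, and the synthetic data are assumptions for the example.

```python
import numpy as np

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    X = np.column_stack(columns + [np.ones_like(y)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def partial_f(data, y, model, name):
    """F (1 numerator df) for `name`, given the other variables in `model`."""
    others = [v for v in model if v != name]
    rss_without = rss([data[v] for v in others], y)
    rss_with = rss([data[v] for v in others + [name]], y)
    df_error = len(y) - len(others) - 2   # minus the variables and the intercept
    return (rss_without - rss_with) / (rss_with / df_error)

def forward_stepwise(data, y, f_enter=4.0, f_leave=3.9):
    selected = []
    for _ in range(2 * len(data)):        # bounded, to rule out endless cycling
        candidates = [v for v in data if v not in selected]
        if not candidates:
            break
        # Enter the candidate that explains the most residual variation.
        scores = {v: partial_f(data, y, selected + [v], v) for v in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < f_enter:
            break
        selected.append(best)
        # Re-check earlier variables; drop any that are no longer significant.
        for v in list(selected[:-1]):
            if partial_f(data, y, selected, v) < f_leave:
                selected.remove(v)
    return selected

# Synthetic demo data (invented): jump depends on height and weight only.
rng = np.random.default_rng(2)
n = 500
height = rng.normal(180, 10, n)
weight = 0.9 * height - 90 + rng.normal(0, 5, n)
unrelated = rng.normal(0, 1, n)           # hypothetical irrelevant variable
jump = 0.02 * height - 0.01 * weight + rng.normal(0, 0.1, n)

data = {"height": height, "weight": weight, "unrelated": unrelated}
selected = forward_stepwise(data, jump)
print("entered, in order:", selected)
```

Keeping the leave cutoff slightly below the enter cutoff is a common guard against a variable being entered and removed repeatedly.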

