Published by Kaylyn Hucke. Modified about 1 year ago

1
Regression: (2) Multiple Linear Regression and Path Analysis
Hal Whitehead
BIOL4062/5062

2
Multiple Linear Regression and Path Analysis
Multiple linear regression
–assumptions
–parameter estimation
–hypothesis tests
–selecting independent variables
–collinearity
–polynomial regression
Path analysis

3
Regression
One dependent variable: Y
Independent variables: X(1), X(2), X(3), ...

4
Purposes of Regression
1. Relationship between Y and X's
2. Quantitative prediction of Y
3. Relationship between Y and X controlling for C
4. Which of X's are most important?
5. Best mathematical model
6. Compare regression relationships: Y(1) on X, Y(2) on X
7. Assess interactive effects of X's

5
Simple regression: one X
Multiple regression: two or more X's
Y = β0 + β1·X(1) + β2·X(2) + β3·X(3) + ... + βk·X(k) + E

6
Multiple linear regression: assumptions (1)
For any specific combination of X's, Y is a (univariate) random variable with a certain probability distribution having finite mean and variance (Existence)
Y values are statistically independent of one another (Independence)
Mean value of Y given the X's is a straight-line (linear) function of the X's (Linearity)

7
Multiple linear regression: assumptions (2)
The variance of Y is the same for any fixed combination of X's (Homoscedasticity)
For any fixed combination of X's, Y has a normal distribution (Normality)
There are no measurement errors in the X's (X's measured without error)

8
Multiple linear regression: parameter estimation
Y = β0 + β1·X(1) + β2·X(2) + β3·X(3) + ... + βk·X(k) + E
Estimate the β's in multiple regression using least squares
Sizes of the coefficients are not good indicators of the importance of the X variables
Number of data points in multiple regression
–at least one more than number of X's
–preferably 5 times number of X's
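The least-squares step can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `fit_ols` and the synthetic, noise-free data are hypothetical and not part of the lecture.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates for Y = b0 + b1*X(1) + ... + bk*X(k) + E.

    X: (n, k) array of independent variables; y: (n,) dependent variable.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    if n < k + 1:
        raise ValueError("need at least one more data point than X's")
    design = np.column_stack([np.ones(n), X])  # column of 1s for the constant
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Hypothetical noise-free data, so the fit should recover the coefficients.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = 2.0 + 1.5 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + 0.0 * X_demo[:, 2]
beta_demo = fit_ols(X_demo, y_demo)
```

The size check mirrors the minimum stated on the slide; with real data, many more points than X's are preferable.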

9
Why do Large Animals have Large Brains? (Schoenemann Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:
X's            β       SE(β)
Log(Mass)     -0.49    (0.70)
Log(Fat)      -0.07    (0.10)
Log(Muscle)    1.03    (0.54)
Log(Heart)     0.42    (0.22)
Log(Bone)     -0.07    (0.30)
N=39

10
Multiple linear regression: hypothesis tests
Usually test:
H0: Y = β0 + β1·X(1) + β2·X(2) + ... + βj·X(j) + E
H1: Y = β0 + β1·X(1) + β2·X(2) + ... + βj·X(j) + ... + βk·X(k) + E
F-test with k-j, n-k-1 degrees of freedom ("partial F-test")
H0: variables X(j+1),…,X(k) do not help explain variability in Y
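A partial F statistic can be computed from the residual sums of squares of the two nested models. The sketch below assumes NumPy; the helper names `rss` and `partial_f` and the simulated data are hypothetical. The denominator uses n-k-1, the residual degrees of freedom of the full model with a constant, which is the standard form of the test.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_f(X_full, y, j):
    """Partial F for H0: X(j+1), ..., X(k) do not help explain Y.

    The reduced model uses the first j columns of X_full, the full model all k.
    Returns (F, numerator df, denominator df).
    """
    n, k = X_full.shape
    rss_reduced = rss(X_full[:, :j], y)
    rss_full = rss(X_full, y)
    df_num = k - j
    df_den = n - k - 1
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Hypothetical data: only X(1) truly affects Y.
rng = np.random.default_rng(4)
X_demo = rng.normal(size=(30, 3))
y_demo = 1.0 + 2.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=30)
f_extra, df1, df2 = partial_f(X_demo, y_demo, j=1)              # do X(2), X(3) help?
f_overall, df1_all, df2_all = partial_f(X_demo, y_demo, j=0)    # overall regression
```

Setting j=0 gives the test of the overall regression against the constant-only model. A P value would come from the F distribution with the returned degrees of freedom.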

11
Multiple linear regression: hypothesis tests
e.g. Test significance of overall multiple regression
H0: Y = β0 + E
H1: Y = β0 + β1·X(1) + β2·X(2) + ... + βk·X(k) + E
Test significance of
–adding independent variable
–deleting independent variable

12
Why do Large Animals have Large Brains? (Schoenemann Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:
X's            β       SE(β)    P
Log(Mass)     -0.49    (0.70)   0.49
Log(Fat)      -0.07    (0.10)   0.52
Log(Muscle)    1.03    (0.54)   0.07
Log(Heart)     0.42    (0.22)   0.06
Log(Bone)     -0.07    (0.30)   0.83
Tests whether removal of variable reduces fit
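For a single variable, the partial F for removing it equals the square of the ratio β/SE(β). The short sketch below applies that arithmetic to the coefficients and standard errors listed on this slide. It assumes the model includes a constant, so the residual degrees of freedom would be 39 - 5 - 1 = 33; that assumption is not stated on the slide.

```python
# Coefficients and standard errors as listed on the slide: name -> (beta, SE).
table = {
    "Log(Mass)": (-0.49, 0.70),
    "Log(Fat)": (-0.07, 0.10),
    "Log(Muscle)": (1.03, 0.54),
    "Log(Heart)": (0.42, 0.22),
    "Log(Bone)": (-0.07, 0.30),
}
n_obs = 39
resid_df = n_obs - len(table) - 1  # assumes a constant term was fitted

# t ratio for each coefficient, and the equivalent single-variable partial F.
t_ratios = {name: beta / se for name, (beta, se) in table.items()}
partial_f_values = {name: t ** 2 for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    print(f"{name:12s} t = {t:6.2f}   F(1, {resid_df}) = {partial_f_values[name]:5.2f}")
```

Muscle and Heart have the largest ratios, which matches their having the smallest P values in the table.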

13
Multiple linear regression: selecting independent variables
Reasons for selecting a subset of independent variables (X's):
–cost (financial and other)
–simplicity
–improved prediction
–improved explanation

14
Multiple linear regression: selecting independent variables
Partial F-test
–predetermined forward selection
–forward selection based upon improvement in fit
–backward selection based upon improvement in fit
–stepwise (backward/forward)
Mallows' C(p)
AIC
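Forward selection based upon improvement in fit can be sketched as a loop that repeatedly adds the variable with the largest partial F. This is an illustrative sketch assuming NumPy; it uses a fixed F-to-enter threshold (a hypothetical value of 4.0) in place of an α-to-enter, and the function names and data are not from the lecture.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Add the variable with the largest partial F until none exceeds f_to_enter."""
    n, k = X.shape
    chosen = []
    current_rss = rss(X[:, chosen], y)  # constant-only model to start
    while len(chosen) < k:
        best = None
        for col in range(k):
            if col in chosen:
                continue
            trial = chosen + [col]
            df_den = n - len(trial) - 1
            if df_den <= 0:
                continue
            trial_rss = rss(X[:, trial], y)
            f_stat = (current_rss - trial_rss) / (trial_rss / df_den)  # 1 numerator df
            if best is None or f_stat > best[0]:
                best = (f_stat, col, trial_rss)
        if best is None or best[0] <= f_to_enter:
            break
        chosen.append(best[1])
        current_rss = best[2]
    return chosen

# Hypothetical data: column 2 has the strongest effect, column 0 a weaker one.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(50, 4))
y_demo = 3.0 * X_demo[:, 2] + 1.0 * X_demo[:, 0] + rng.normal(scale=0.1, size=50)
selected = forward_select(X_demo, y_demo)
```

Backward selection would run the same comparison in reverse, starting from all variables and dropping the one with the smallest partial F.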

15
Multiple linear regression: selecting independent variables
Partial F-test
–predetermined forward selection: Mass, Bone, Heart, Muscle, Fat
–forward selection based upon improvement in fit
–backward selection based upon improvement in fit
–stepwise (backward/forward)

16
Multiple linear regression: selecting independent variables
Partial F-test
–predetermined forward selection
–forward selection based upon improvement in fit
–backward selection based upon improvement in fit
–stepwise (backward/forward)

17
Why do Large Animals have Large Brains? (Schoenemann Brain Behav. Evol. 2004)
Complete model (r²=0.97):
Forward stepwise (α-to-enter=0.15; α-to-remove=0.15):
–1. Constant (r²=0.00)
–2. Constant + Muscle (r²=0.97)
–3. Constant + Muscle + Heart (r²=0.97)
–4. Constant + Muscle + Heart + Mass (r²=0.97)
… × Mass + 1.24 × Muscle … × Heart

18
Why do Large Animals have Large Brains? (Schoenemann Brain Behav. Evol. 2004)
Complete model (r²=0.97):
Backward stepwise (α-to-enter=0.15; α-to-remove=0.15):
–1. All (r²=0.97)
–2. Remove Bone (r²=0.97)
–3. Remove Fat (r²=0.97)
… × Mass + 1.24 × Muscle … × Heart

19
Comparing models
Mallows' C(p)
–C(p) = (k-p)·F(p) + (2p-k+1)
 k parameters in full model; p parameters in restricted model
 F(p) is the F value comparing the fit of the restricted model with that of the full model
–Lowest C(p) is best model
Akaike Information Criterion (AIC)
–AIC = n·Log(σ²) + 2p
–Lowest AIC indicates best model
–Can compare models that are not nested within one another
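Both criteria can be computed from residual sums of squares. The sketch below assumes NumPy and makes two labeled assumptions: in the C(p) formula, k and p are taken to count X variables (constant excluded), under which the slide's formula agrees with the usual definition RSS(p)/s² - n + 2(p+1); and in the AIC formula, σ² is taken as RSS/n, Log as the natural log, and p as the number of fitted coefficients including the constant. Function names and data are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit that includes a constant."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def mallows_cp(X_full, cols, y):
    """C(p) = (k-p)*F(p) + (2p-k+1), with k and p counting X variables."""
    n, k = X_full.shape
    p = len(cols)
    if p == k:
        return float(k + 1)  # (k-p) = 0, so the formula reduces to k + 1
    rss_full = rss(X_full, y)
    rss_p = rss(X_full[:, cols], y)
    f_p = ((rss_p - rss_full) / (k - p)) / (rss_full / (n - k - 1))
    return (k - p) * f_p + (2 * p - k + 1)

def aic(X, y):
    """AIC = n*log(RSS/n) + 2p, p = number of coefficients including the constant."""
    n = len(y)
    p = X.shape[1] + 1
    return n * np.log(rss(X, y) / n) + 2 * p

# Hypothetical data: only the first two of four X's affect Y.
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(40, 4))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=40)

cp_subset = mallows_cp(X_demo, [0, 1], y_demo)
s2_full = rss(X_demo, y_demo) / (40 - 4 - 1)
cp_direct = rss(X_demo[:, [0, 1]], y_demo) / s2_full - 40 + 2 * (2 + 1)  # usual definition

aic_subset = aic(X_demo[:, [0, 1]], y_demo)
aic_constant = aic(X_demo[:, :0], y_demo)  # constant-only model
```

Unlike the partial F-test, the AIC comparison does not require one model to be contained in the other, only that both are fitted to the same Y values.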

20
Comparing models

21
Collinearity
If two (or more) X's are linearly related:
–they are collinear
–the regression problem is indeterminate
X(3) = 5·X(2) + 16, or X(2) = 4·X(1) + 16·X(4)
If they are nearly linearly related (near collinearity), coefficients and tests are very inaccurate
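Exact and near collinearity can be seen directly in the design matrix. The sketch below assumes NumPy and uses simulated data with the slide's example relation X(3) = 5·X(2) + 16; the variable names and the noise level are hypothetical. Exact collinearity shows up as a rank-deficient design matrix, near collinearity as a very large condition number.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3_exact = 5.0 * x2 + 16.0                            # exactly collinear with X(2)
x3_near = x3_exact + rng.normal(scale=1e-4, size=n)   # nearly collinear

def design(*cols):
    """Design matrix with a leading column of 1s for the constant."""
    return np.column_stack([np.ones(n), *cols])

# Exact collinearity: 4 columns but fewer independent directions.
rank_exact = np.linalg.matrix_rank(design(x1, x2, x3_exact))

# Near collinearity: full rank, but the condition number becomes very large.
cond_ok = np.linalg.cond(design(x1, x2))
cond_near = np.linalg.cond(design(x1, x2, x3_near))
```

A large condition number means that small changes in the data can produce large changes in the estimated coefficients.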

22
What to do about collinearity?
Centering (mean = 0)
Scaling (SD = 1)
Regression on first few Principal Components
Ridge Regression
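Centering, scaling and ridge regression can be combined in a short sketch. It assumes NumPy; the penalty `lam` and its value of 1.0 are hypothetical choices for illustration, and the closed form below is the textbook ridge estimator rather than anything specified in the lecture.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to SD 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def ridge(X, y, lam):
    """Ridge coefficients on standardized X and centered y: (Z'Z + lam*I)^-1 Z'y."""
    Z = standardize(X)
    yc = y - y.mean()
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)

# Hypothetical data with two nearly collinear X's.
rng = np.random.default_rng(2)
x1 = rng.normal(size=60)
x2 = x1 + rng.normal(scale=0.01, size=60)
X_demo = np.column_stack([x1, x2])
y_demo = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=60)

b_ols = ridge(X_demo, y_demo, lam=0.0)    # lam = 0 is ordinary least squares
b_ridge = ridge(X_demo, y_demo, lam=1.0)  # penalty shrinks the coefficients
```

The penalty trades a small bias for much more stable coefficients when the X's are nearly collinear.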

23
Curvilinear (Polynomial) Regression
Y = β0 + β1·X + β2·X² + β3·X³ + ... + βk·X^k + E
Used to fit fairly complex curves to data
β's estimated using least squares
Use sequential partial F-tests, or AIC, to find how many terms to use
–k>3 is rare in biology
Better to transform data and use simple linear regression, when possible
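Sequential partial F-tests for a polynomial can be sketched by fitting each degree in turn and testing the newly added term. The example assumes NumPy; the function names and the simulated quadratic data are hypothetical.

```python
import numpy as np

def poly_rss(x, y, degree):
    """RSS of the least-squares polynomial of the given degree (0 = constant only)."""
    design = np.vander(x, degree + 1, increasing=True)  # columns 1, X, X^2, ..., X^degree
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def sequential_f(x, y, max_degree):
    """Partial F (1 and n-d-1 df) for adding the X^d term, for d = 1..max_degree."""
    n = len(x)
    f_values = []
    for d in range(1, max_degree + 1):
        rss_prev = poly_rss(x, y, d - 1)
        rss_d = poly_rss(x, y, d)
        f_values.append((rss_prev - rss_d) / (rss_d / (n - d - 1)))
    return f_values

# Hypothetical data generated from a quadratic curve plus noise.
rng = np.random.default_rng(6)
x_demo = np.linspace(-2.0, 2.0, 40)
y_demo = 1.0 + 0.5 * x_demo + 2.0 * x_demo ** 2 + rng.normal(scale=0.3, size=40)
f_values = sequential_f(x_demo, y_demo, max_degree=4)
```

In this simulated case the F for the quadratic term is very large, while the cubic and quartic terms would be expected to add little.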

24
Curvilinear (Polynomial) Regression
[Figure: linear, quadratic, and cubic polynomial fits of Y on X. From Sokal and Rohlf]

25
Path Analysis

26
Models with causal structure
Represented by path diagram
All variables quantitative
All path relationships assumed linear
–(transformations may help)
[Path diagram: variables A, B, C, D, E]

27
Path Analysis
All paths one way
–A => C or C => A, not both
No loops
Some variables may not be directly observed:
–residual variables (U)
Some variables not observed but known to exist
–latent variables (D)
[Path diagram: variables A, B, C, D, E, U]

28
Path Analysis
Path coefficients and other statistics calculated using multiple regressions
Variables are:
–centered (mean = 0) so no constants in regressions
–often standardized (SD = 1)
So: path coefficients usually between -1 and +1
Paths with coefficients not significantly different from zero may be eliminated
[Path diagram: variables A, B, C, D, E, U]
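Path coefficients can be sketched as the coefficients from regressing each standardized variable on its standardized direct causes. The example assumes NumPy and a hypothetical causal structure (A => C, B => C, C => E) with simulated data; it is not the path diagram from the slides.

```python
import numpy as np

def standardize(v):
    """Center to mean 0 and scale to SD 1."""
    return (v - v.mean()) / v.std(ddof=1)

def path_coefficients(response, *causes):
    """Standardized regression coefficients of `response` on its direct causes.

    All variables are standardized, so no constant term is fitted.
    """
    Z = np.column_stack([standardize(c) for c in causes])
    beta, *_ = np.linalg.lstsq(Z, standardize(response), rcond=None)
    return beta

# Hypothetical structure: A => C, B => C, C => E.
rng = np.random.default_rng(7)
A = rng.normal(size=200)
B = rng.normal(size=200)
C = 0.8 * A - 0.4 * B + rng.normal(scale=0.5, size=200)
E = 0.6 * C + rng.normal(scale=0.8, size=200)

paths_into_C = path_coefficients(C, A, B)  # [A => C, B => C]
paths_into_E = path_coefficients(E, C)     # [C => E]
```

With a single direct cause, the path coefficient equals the correlation between the two variables, which is why it falls between -1 and +1.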

29
Path Analysis: an example Isaak and Hubert “Production of stream habitat gradients by montane watersheds: hypothesis tests based on spatially explicit path analyses” Can. J. Fish. Aquat. Sci.

30
[Path diagram legend] Dashed line: predicted negative interaction. Solid line: predicted positive interaction.
