Published by Kaylyn Hucke. Modified over 2 years ago.

1
Regression: (2) Multiple Linear Regression and Path Analysis
Hal Whitehead
BIOL4062/5062

2
Multiple Linear Regression and Path Analysis
Multiple linear regression
  – assumptions
  – parameter estimation
  – hypothesis tests
  – selecting independent variables
  – collinearity
  – polynomial regression
Path analysis

3
Regression
One dependent variable: Y
Independent variables: $X_1, X_2, X_3, \dots$

4
Purposes of Regression
1. Relationship between Y and X's
2. Quantitative prediction of Y
3. Relationship between Y and X controlling for C
4. Which of X's are most important?
5. Best mathematical model
6. Compare regression relationships: $Y_1$ on X, $Y_2$ on X
7. Assess interactive effects of X's

5
Simple regression: one X
Multiple regression: two or more X's
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + E$

6
Multiple linear regression: assumptions (1)
For any specific combination of X's, Y is a (univariate) random variable with a certain probability distribution having finite mean and variance (Existence)
Y values are statistically independent of one another (Independence)
Mean value of Y given the X's is a straight-line (linear) function of the X's (Linearity)

7
Multiple linear regression: assumptions (2)
The variance of Y is the same for any fixed combination of X's (Homoscedasticity)
For any fixed combination of X's, Y has a normal distribution (Normality)
There are no measurement errors in the X's (X's measured without error)

8
Multiple linear regression: parameter estimation
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + E$
Estimate the β's in multiple regression using least squares
Sizes of the coefficients are not good indicators of the importance of X variables
Number of data points in multiple regression:
  – at least one more than the number of X's
  – preferably 5 times the number of X's
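The least-squares step can be sketched in a few lines of NumPy. This is only an illustration, not the course's own software: the function names `fit_ols` and `standardized_slopes` and the small data set are hypothetical. The second function rescales each slope to standard-deviation units, one common way to make coefficient sizes more comparable across X variables measured on different scales.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*X1 + ... + bk*Xk + E."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def standardized_slopes(X, y):
    """Slopes rescaled to SD units: b_j * sd(X_j) / sd(Y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = fit_ols(X, y)[1:]
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X_demo = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]], dtype=float)
y_demo = 1 + 2 * X_demo[:, 0] - 3 * X_demo[:, 1]

beta_demo = fit_ols(X_demo, y_demo)            # approximately [1, 2, -3]
std_demo = standardized_slopes(X_demo, y_demo)  # unit-free slopes
```

With six data points and two X's, the demo sits above the "one more than the number of X's" minimum but below the "5 times" guideline, which is harmless here only because the data are noise-free.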

9
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:

X's            β       SE(β)
Log(Mass)     -0.49    (0.70)
Log(Fat)      -0.07    (0.10)
Log(Muscle)    1.03    (0.54)
Log(Heart)     0.42    (0.22)
Log(Bone)     -0.07    (0.30)

N = 39

10
Multiple linear regression: hypothesis tests
Usually test:
H0: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_j X_j + E$
H1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_j X_j + \dots + \beta_k X_k + E$
F-test with $k-j$ and $n-k-1$ degrees of freedom ("partial F-test")
H0: variables $X_{j+1}, \dots, X_k$ do not help explain variability in Y
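A minimal sketch of the partial F statistic, assuming the usual form in which the drop in residual sum of squares (per added variable) is divided by the residual mean square of the larger model, giving k - j numerator and n - k - 1 denominator degrees of freedom. The helper names and the simulated data are hypothetical, and no P value is computed because that would need an F distribution routine from another library.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares for a model with an intercept plus the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def partial_f(X_reduced, X_full, y):
    """F statistic and degrees of freedom for H0: the extra columns of X_full add nothing."""
    n = len(y)
    j = X_reduced.shape[1]  # X variables under H0
    k = X_full.shape[1]     # X variables under H1
    sse_reduced, sse_full = sse(X_reduced, y), sse(X_full, y)
    df1, df2 = k - j, n - k - 1
    f_stat = ((sse_reduced - sse_full) / df1) / (sse_full / df2)
    return f_stat, df1, df2

# Hypothetical data: Y depends on the first two X's; the third is pure noise.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Does adding the second X to a model that already holds the first improve the fit?
F_x2, df1, df2 = partial_f(X[:, :1], X[:, :2], y)
```

Passing an array with zero columns as `X_reduced` gives the intercept-only null model, so the same function covers the test of the overall regression.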

11
Multiple linear regression: hypothesis tests
e.g. Test significance of overall multiple regression
H0: $Y = \beta_0 + E$
H1: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + E$
Test significance of:
  – adding an independent variable
  – deleting an independent variable

12
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Multiple regression of Y [Log(CNS)] on:

X's            β       SE(β)     P
Log(Mass)     -0.49    (0.70)    0.49
Log(Fat)      -0.07    (0.10)    0.52
Log(Muscle)    1.03    (0.54)    0.07
Log(Heart)     0.42    (0.22)    0.06
Log(Bone)     -0.07    (0.30)    0.83

Tests whether removal of variable reduces fit
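For a single coefficient, the partial F statistic for removing that variable equals the square of the ratio t = β / SE(β). The slide does not say how its P values were computed, so the sketch below only reproduces that standard relation from the tabulated numbers; the dictionary names are hypothetical.

```python
# Coefficients and standard errors as listed in the table above.
coefs = {
    "Log(Mass)":   (-0.49, 0.70),
    "Log(Fat)":    (-0.07, 0.10),
    "Log(Muscle)": (1.03, 0.54),
    "Log(Heart)":  (0.42, 0.22),
    "Log(Bone)":   (-0.07, 0.30),
}

# t ratio for each coefficient, and the equivalent one-variable partial F (t squared).
t_ratio = {name: b / se for name, (b, se) in coefs.items()}
f_to_remove = {name: t ** 2 for name, t in t_ratio.items()}

# Variables ordered from largest to smallest F-to-remove.
ranking = sorted(f_to_remove, key=f_to_remove.get, reverse=True)
```

Heart and Muscle have the largest ratios (about 1.9 each), in line with their having the smallest P values in the table, while Bone has the smallest ratio and the largest P value.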

13
Multiple linear regression: selecting independent variables
Reasons for selecting a subset of independent variables (X's):
  – cost (financial and other)
  – simplicity
  – improved prediction
  – improved explanation

14
Multiple linear regression: selecting independent variables
Partial F-test
  – predetermined forward selection
  – forward selection based upon improvement in fit
  – backward selection based upon improvement in fit
  – stepwise (backward/forward)
Mallows' C(p)
AIC

15
Multiple linear regression: selecting independent variables
Partial F-test
  – predetermined forward selection: Mass, Bone, Heart, Muscle, Fat
  – forward selection based upon improvement in fit
  – backward selection based upon improvement in fit
  – stepwise (backward/forward)

16
Multiple linear regression: selecting independent variables
Partial F-test
  – predetermined forward selection
  – forward selection based upon improvement in fit
  – backward selection based upon improvement in fit
  – stepwise (backward/forward)
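Forward selection based upon improvement in fit could look roughly like the sketch below. It is a simplified illustration under stated assumptions: the stopping rule is a fixed F-to-enter threshold (4.0 is an arbitrary, hypothetical value) rather than an α-to-enter level, there is no removal step, and the function names and simulated data are invented for the example.

```python
import numpy as np

def sse(X, y, cols):
    """Residual sum of squares for an intercept plus the listed columns of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)

def forward_select(X, y, f_to_enter=4.0):
    """Add, one at a time, the variable with the largest partial F; stop below the threshold."""
    n, k = X.shape
    chosen = []
    current = sse(X, y, chosen)  # intercept-only model
    while len(chosen) < k:
        candidates = []
        for c in range(k):
            if c in chosen:
                continue
            trial = sse(X, y, chosen + [c])
            df2 = n - (len(chosen) + 1) - 1  # residual df of the larger model
            f_stat = (current - trial) / (trial / df2)
            candidates.append((f_stat, c, trial))
        f_stat, c, trial = max(candidates)
        if f_stat < f_to_enter:
            break
        chosen.append(c)
        current = trial
    return chosen

# Hypothetical data: only column 2 actually influences Y.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = 3.0 * X[:, 2] + rng.normal(scale=0.5, size=40)
selected = forward_select(X, y)
```

Backward selection would run the same comparison in reverse, starting from all variables and dropping the one with the smallest partial F.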

17
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Complete model (r² = 0.97)
Forward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
  – 1. Constant (r² = 0.00)
  – 2. Constant + Muscle (r² = 0.97)
  – 3. Constant + Muscle + Heart (r² = 0.97)
  – 4. Constant + Muscle + Heart + Mass (r² = 0.97)
-0.18 - 0.82×Mass + 1.24×Muscle + 0.39×Heart

18
Why do Large Animals have Large Brains? (Schoenemann, Brain Behav. Evol. 2004)
Complete model (r² = 0.97)
Backward stepwise (α-to-enter = 0.15; α-to-remove = 0.15):
  – 1. All (r² = 0.97)
  – 2. Remove Bone (r² = 0.97)
  – 3. Remove Fat (r² = 0.97)
-0.18 - 0.82×Mass + 1.24×Muscle + 0.39×Heart

19
Comparing models
Mallows' C(p)
  – $C(p) = (k-p) \cdot F(p) + (2p - k + 1)$
    k parameters in full model; p parameters in restricted model
    F(p) is the F value comparing the fit of the restricted model with that of the full model
  – Lowest C(p) is best model
Akaike Information Criterion (AIC)
  – $\mathrm{AIC} = n \log(\sigma^2) + 2p$
  – Lowest AIC indicates best model
  – Can compare models not included in one another (non-nested models)
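Both criteria can be computed from residual sums of squares. The sketch below makes several assumptions that the slide leaves open: k and p count X variables (so the full model has n - k - 1 residual degrees of freedom), σ² is estimated as SSE/n, the logarithm is natural, and the numeric inputs are made up. The function names are hypothetical.

```python
import math

def mallows_cp(sse_p, sse_full, n, k, p):
    """C(p) = (k - p) * F(p) + (2p - k + 1), with F(p) comparing restricted and full models."""
    if p == k:
        return float(k + 1)  # the F term vanishes for the full model
    f_p = ((sse_p - sse_full) / (k - p)) / (sse_full / (n - k - 1))
    return (k - p) * f_p + (2 * p - k + 1)

def aic(sse_p, n, p):
    """AIC = n * log(sigma^2) + 2p, with sigma^2 estimated as SSE / n (an assumption)."""
    return n * math.log(sse_p / n) + 2 * p

# Made-up residual sums of squares: full model (k = 3) and a restricted model (p = 2), n = 20.
cp_restricted = mallows_cp(sse_p=12.0, sse_full=10.0, n=20, k=3, p=2)  # 5.2
cp_full = mallows_cp(sse_p=10.0, sse_full=10.0, n=20, k=3, p=3)        # 4.0
aic_restricted = aic(12.0, n=20, p=2)
aic_full = aic(10.0, n=20, p=3)
```

Because AIC needs only each model's own SSE and parameter count, it can rank models that are not nested, whereas C(p) is defined relative to a chosen full model.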

20
Comparing models

21
Collinearity
If two (or more) X's are linearly related:
  – they are collinear
  – the regression problem is indeterminate
$X_3 = 5 X_2 + 16$, or $X_2 = 4 X_1 + 16 X_4$
If they are nearly linearly related (near collinearity), coefficients and tests are very inaccurate
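The two situations can be illustrated numerically. In this sketch the data are simulated and the variable names are hypothetical; rank and condition number of the design matrix are used here as simple diagnostics, which is a choice made for the illustration rather than something the slides prescribe.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3_exact = 5 * x2 + 16                                # exactly collinear with x2
x3_near = x3_exact + rng.normal(scale=1e-3, size=n)   # nearly collinear

def design(*cols):
    """Design matrix with an intercept column followed by the given X columns."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

# Exact collinearity: four columns but only three independent directions,
# so the least-squares coefficients are not uniquely determined.
rank_exact = np.linalg.matrix_rank(design(x1, x2, x3_exact))

# Near collinearity: full rank, but a very large condition number means
# small changes in the data can move the coefficients a great deal.
cond_near = np.linalg.cond(design(x1, x2, x3_near))
cond_ok = np.linalg.cond(design(x1, x2))
```

A rank below the number of columns signals the indeterminate case, and a condition number far larger than that of the well-behaved design signals near collinearity.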

22
What to do about collinearity?
Centering (mean = 0)
Scaling (SD = 1)
Regression on first few Principal Components
Ridge Regression
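As one illustration of these remedies, the sketch below centers and scales the X's and then applies ridge regression in its closed form, (Z'Z + λI)⁻¹ Z'y. The penalty value λ = 5 is arbitrary, the data are simulated, and the function names are hypothetical; in practice λ would be chosen by some criterion such as cross-validation.

```python
import numpy as np

def standardize(X):
    """Center each column to mean 0 and scale it to SD 1."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def ridge_coefs(X, y, lam):
    """Ridge slopes on standardized X's and centered Y; lam = 0 gives ordinary least squares."""
    Z = standardize(np.asarray(X, dtype=float))
    yc = np.asarray(y, dtype=float) - np.mean(y)
    k = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ yc)

# Hypothetical nearly collinear pair of X's.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
X = np.column_stack([x1, x2])
y = 1.0 + x1 + x2 + rng.normal(scale=0.5, size=n)

b_ols = ridge_coefs(X, y, lam=0.0)    # unstable when X's are nearly collinear
b_ridge = ridge_coefs(X, y, lam=5.0)  # shrunk toward zero, more stable
```

The ridge penalty trades a little bias for a large reduction in the variance of the coefficients, which is why the ridge vector is always shorter than the least-squares one.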

23
Curvilinear (Polynomial) Regression
$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \beta_k X^k + E$
Used to fit fairly complex curves to data
β's estimated using least squares
Use sequential partial F-tests, or AIC, to find how many terms to use
  – k > 3 is rare in biology
Better to transform data and use simple linear regression, when possible
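The sequential partial F-tests can be sketched with `numpy.polyfit`: each degree is compared with the one below it. The data are simulated from a quadratic, the function names are hypothetical, and only F statistics are returned (turning them into P values would need an F distribution routine).

```python
import numpy as np

def poly_sse(x, y, degree):
    """Residual sum of squares of the least-squares polynomial of the given degree."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return float(resid @ resid)

def sequential_f(x, y, max_degree=3):
    """Partial F (1 and n - d - 1 df) for adding the X^d term, for d = 1..max_degree."""
    n = len(x)
    steps = []
    for d in range(1, max_degree + 1):
        sse_lower, sse_higher = poly_sse(x, y, d - 1), poly_sse(x, y, d)
        f_stat = (sse_lower - sse_higher) / (sse_higher / (n - d - 1))
        steps.append((d, f_stat))
    return steps

# Hypothetical data generated from a quadratic plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 25)
y = 1.0 + 0.5 * x + 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)
steps = sequential_f(x, y)
```

For data like these, the F statistics for the linear and quadratic terms should be large, and one would stop adding terms at the first degree whose F is not significant.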

24
Curvilinear (Polynomial) Regression
$Y = 0.066 + 0.00727 X$
$Y = 0.117 + 0.00085 X + 0.00009 X^2$
$Y = 0.201 - 0.01371 X + 0.00061 X^2 - 0.000005 X^3$
From Sokal and Rohlf

25
Path Analysis

26
Models with causal structure
Represented by path diagram
All variables quantitative
All path relationships assumed linear
  – (transformations may help)
[Path diagram: variables A, B, C, D, E]

27
Path Analysis
All paths one way
  – A => C or C => A, not both
No loops
Some variables may not be directly observed:
  – residual variables (U)
Some variables not observed but known to exist:
  – latent variables (D)
[Path diagram: variables A, B, C, D, E, U]

28
Path Analysis
Path coefficients and other statistics calculated using multiple regressions
Variables are:
  – centered (mean = 0), so no constants in regressions
  – often standardized (SD = 1)
So: path coefficients usually between -1 and +1
Paths with coefficients not significantly different from zero may be eliminated
[Path diagram: variables A, B, C, D, E, U]
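A minimal sketch of estimating path coefficients by regression on standardized variables. The three-variable causal structure used here (A => B, A => C, B => C) is hypothetical and much simpler than the diagram on the slide, and the data and function names are invented for the example.

```python
import numpy as np

def standardize(v):
    """Center to mean 0 and scale to SD 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=1)

def path_coefs(effect, *causes):
    """Standardized regression of one variable on its direct causes (no constant needed)."""
    Z = np.column_stack([standardize(c) for c in causes])
    beta, *_ = np.linalg.lstsq(Z, standardize(effect), rcond=None)
    return beta

# Hypothetical data following A => B, A => C, B => C.
rng = np.random.default_rng(0)
n = 200
A = rng.normal(size=n)
B = 0.8 * A + rng.normal(scale=0.6, size=n)
C = 0.5 * A + 0.4 * B + rng.normal(scale=0.5, size=n)

p_AB = path_coefs(B, A)[0]       # one cause: equals the correlation of A and B
p_AC, p_BC = path_coefs(C, A, B)  # two causes: standardized partial slopes
```

One regression is run per variable that has incoming arrows, with its direct causes as the X's. With a single cause the path coefficient is just the correlation, which is why it must lie between -1 and +1.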

29
Path Analysis: an example
Isaak and Hubert. 2001. "Production of stream habitat gradients by montane watersheds: hypothesis tests based on spatially explicit path analyses". Can. J. Fish. Aquat. Sci.

30
[Path diagram legend: dashed line = predicted negative interaction; solid line = predicted positive interaction]
