1 ENGR 610 Applied Statistics Fall 2007 - Week 12 Marshall University CITE Jack Smith

2 Overview for Today
- Review: Multiple Linear Regression, Ch 13 (1-5)
- Go over Problem 13.62
- Multiple Linear Regression, Ch 13 (6-11)
  - Quadratic model
  - Dummy-variable model
  - Using transformations
  - Collinearity (VIF)
- Model building
  - Stepwise regression
  - Best-subset regression with the C_p statistic
- Homework assignment

3 Multiple Regression
Linear model with multiple independent variables:
  Y_i = β0 + β1 X_1i + ... + βj X_ji + ... + βk X_ki + ε_i
where
- X_ji = value of independent variable j for observation i
- Y_i = observed value of the dependent variable
- β0 = Y-intercept (Y when all X = 0)
- βj = slope with respect to X_j (ΔY/ΔX_j)
- ε_i = random error for observation i
The predicted value is
  Y_i' = b0 + b1 X_1i + ... + bk X_ki
where the b_j (j = 1, ..., k) are called the regression coefficients.
The residual is e_i = Y_i - Y_i'; the b_j are chosen to minimize Σ e_i² over the sample.
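The least-squares fit described on this slide can be sketched in Python with NumPy. The data below are hypothetical, made up purely for illustration:

```python
import numpy as np

# Hypothetical data: n = 6 observations, k = 2 independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([4.1, 5.9, 9.2, 10.8, 14.1, 15.9])

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Least squares minimizes the sum of squared residuals e_i = Y_i - Y_i'.
b, *_ = np.linalg.lstsq(X, Y, rcond=None)

Y_pred = X @ b      # predicted values Y_i'
e = Y - Y_pred      # residuals e_i
```

A useful check on any OLS fit: the residuals are orthogonal to every column of the design matrix.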

4 Partitioning of Variation
- Total variation: SST = Σ(Y_i - Ȳ)²
- Regression variation: SSR = Σ(Y_i' - Ȳ)²  (Ȳ = mean response)
- Random variation: SSE = Σ(Y_i - Y_i')²
SST = SSR + SSE
Coefficient of multiple determination: R²_{Y.12...k} = SSR/SST
Standard error of the estimate: S_YX = √(SSE/(n - k - 1))
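Fitting a small hypothetical model and partitioning its variation verifies SST = SSR + SSE numerically (same made-up data pattern as the previous slide's example):

```python
import numpy as np

# Hypothetical fit: two independent variables plus an intercept.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([4.1, 5.9, 9.2, 10.8, 14.1, 15.9])
X  = np.column_stack([np.ones_like(X1), X1, X2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
Y_pred = X @ b

SST = np.sum((Y - Y.mean()) ** 2)       # total variation
SSR = np.sum((Y_pred - Y.mean()) ** 2)  # regression variation
SSE = np.sum((Y - Y_pred) ** 2)         # random variation

R2 = SSR / SST                          # coefficient of multiple determination
n, k = len(Y), 2
S_YX = np.sqrt(SSE / (n - k - 1))       # standard error of the estimate
```

The identity SST = SSR + SSE holds exactly (up to floating-point error) because the model includes an intercept.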

5 Adjusted R²
To account for sample size (n) and number of independent variables (k) for comparison purposes:
  R²_adj = 1 - (1 - R²)(n - 1)/(n - k - 1)
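The adjustment is a one-line computation (the example values are made up):

```python
def adjusted_r2(r2, n, k):
    # R2_adj = 1 - (1 - R2)(n - 1)/(n - k - 1):
    # penalizes R2 for the number of independent variables k.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: R2 = 0.9 with n = 20 observations and k = 3 variables.
r2_adj = adjusted_r2(0.9, 20, 3)  # 1 - 0.1 * 19/16 = 0.88125
```

Adjusted R² is always at most R², which is what makes it useful for comparing models with different numbers of variables.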

6 Residual Analysis
Plot residuals vs:
- Y_i' (predicted values)
- X_1, X_2, ..., X_k
- Time (for autocorrelation)
Check for:
- Patterns
- Outliers
- Non-uniform distribution about the mean
See Figs 12.18-19, pp 597-8.

7 F Test for Multiple Regression
F = MSR / MSE
Reject H0 if F > F_U(α, k, n-k-1) [or p < α]
k = number of independent variables

One-Way ANOVA Summary
Source      df       SS    MS (Variance)       F        p-value
Regression  k        SSR   MSR = SSR/k         MSR/MSE
Error       n-k-1    SSE   MSE = SSE/(n-k-1)
Total       n-1      SST
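With hypothetical sums of squares, the F statistic and its p-value can be computed as follows (assuming SciPy is available for the F distribution):

```python
from scipy import stats

# Hypothetical ANOVA quantities: n = 20 observations, k = 3 variables.
n, k = 20, 3
SSR, SSE = 90.0, 30.0

MSR = SSR / k            # regression mean square
MSE = SSE / (n - k - 1)  # error mean square
F = MSR / MSE            # 30.0 / 1.875 = 16.0
p = stats.f.sf(F, k, n - k - 1)  # upper-tail p-value
```

Here F = 16 on (3, 16) degrees of freedom, so the regression is highly significant and H0 (all slopes zero) is rejected.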

8 Alternate F Test
In terms of the coefficient of multiple determination:
  F = (R²/k) / ((1 - R²)/(n - k - 1))
Compared to F_U(α, k, n-k-1).

9 t Test for Slope
H0: βj = 0
Test statistic: t = b_j / S_bj
Critical t value based on chosen level of significance, α, and n-k-1 degrees of freedom.
See output from PHStat.
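A sketch of the slope t test on hypothetical data, computing the coefficient standard errors from MSE and (X'X)^-1 (SciPy assumed available for the t distribution):

```python
import numpy as np
from scipy import stats

# Hypothetical simple regression (k = 1) to illustrate the slope t test.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
Y  = np.array([2.1, 4.2, 5.8, 8.1, 9.9, 12.2, 13.8, 16.1])
X  = np.column_stack([np.ones_like(X1), X1])
n, k = len(Y), 1

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b
MSE = (e @ e) / (n - k - 1)

# Standard errors: square roots of the diagonal of MSE * (X'X)^-1.
se = np.sqrt(np.diag(MSE * np.linalg.inv(X.T @ X)))

t_stat = b[1] / se[1]                    # test statistic for H0: beta_1 = 0
t_crit = stats.t.ppf(0.975, n - k - 1)   # two-sided critical value, alpha = 0.05
```

With this data (slope near 2, small residuals), t_stat far exceeds t_crit and H0 is rejected.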

10 Confidence and Prediction Intervals
Confidence interval estimate for the slope:
  b_j ± t_{n-k-1} S_bj
Confidence interval estimate for the mean response and prediction interval estimate for an individual response: beyond the scope of this text.

11 Partial F Tests
Significance test for the contribution from an individual independent variable, with all others already taken into account. A measure of incremental improvement.
  F_j = SSR(X_j | {X_i≠j}) / MSE
  SSR(X_j | {X_i≠j}) = SSR - SSR({X_i≠j})
Reject H0 if F_j > F_U(α, 1, n-k-1) [or p < α]
Note: t²(α, n-k-1) = F_U(α, 1, n-k-1)
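The incremental SSR can be computed by fitting the model with and without the variable of interest. This is a sketch on made-up data, testing the contribution of X2:

```python
import numpy as np

def ssr(X, Y):
    # Regression sum of squares for an OLS fit (X includes the intercept column).
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((X @ b - Y.mean()) ** 2)

# Hypothetical data with two candidate independent variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
Y  = np.array([4.1, 5.2, 9.3, 10.1, 14.2, 15.0, 19.1, 20.2])
n, k = len(Y), 2
ones = np.ones_like(X1)

SSR_full = ssr(np.column_stack([ones, X1, X2]), Y)  # SSR with all variables
SSR_red  = ssr(np.column_stack([ones, X1]), Y)      # SSR without X2
SST = np.sum((Y - Y.mean()) ** 2)
MSE_full = (SST - SSR_full) / (n - k - 1)

F2 = (SSR_full - SSR_red) / MSE_full  # partial F for X2, df (1, n-k-1)
```

Because the reduced model is nested in the full one, SSR_full is never smaller than SSR_red, so F2 is never negative.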

12 Coefficients of Partial Determination
Proportion of the remaining variation in Y explained by X_j once all other variables are in the model, e.g. for two variables:
  r²_{Y1.2} = SSR(X_1|X_2) / (SST - SSR(X_1, X_2) + SSR(X_1|X_2))
See PHStat output in Fig 13.10, p 637.

13 Quadratic Curvilinear Regression Model
  Y_i = β0 + β1 X_1i + β2 X_1i² + ε_i
Treat the X² term just like any other independent variable: same R², F tests, t tests, etc. Generally the linear term is needed as well.
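Since the squared term is treated like any other variable, the quadratic model fits with the same machinery as before. A sketch on hypothetical data generated roughly as Y = 1 + X²:

```python
import numpy as np

# Hypothetical data, roughly Y = 1 + X^2 plus small noise.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y  = np.array([2.0, 5.1, 10.2, 16.9, 26.1, 37.0])

# The squared term is simply another column in the design matrix.
X = np.column_stack([np.ones_like(X1), X1, X1 ** 2])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # b = [b0, b1, b2]
```

The estimated quadratic coefficient b[2] comes out close to the true value of 1.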

14 Dummy-Variable Models
- Treatment of categorical variables
- Each possible value represented by a dummy variable with value 0 or 1
- Treat added terms like any other terms
- Often confounded with other variables, so the model may need interaction terms
- Add the interaction term and perform a partial F test and t test for the added term
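A minimal sketch of a dummy-variable model on made-up two-group data; the dummy's coefficient is the intercept shift for the second group:

```python
import numpy as np

# Hypothetical data: same X1 values in two groups; dummy = 0 for group A, 1 for group B.
X1    = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
dummy = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
Y     = np.array([3.0, 5.0, 7.0, 9.0, 6.1, 8.0, 10.1, 11.9])

# The dummy enters the model like any other term; its coefficient
# is the shift in intercept for group B relative to group A.
X = np.column_stack([np.ones_like(X1), X1, dummy])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)  # b = [b0, b1, b_dummy]
```

Here group A follows Y = 1 + 2X and group B sits roughly 3 units higher, so b[2] is close to 3.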

15 Using Transformations
- Square-root
- Multiplicative: log Y - log X model
- Exponential: log Y model
- Others: higher polynomials, trigonometric functions, inverse

16 Collinearity (VIF)
Test for linearly dependent variables.
VIF - Variance Inflationary Factor:
  VIF_j = 1 / (1 - R_j²)
where R_j² = coefficient of multiple determination of variable X_j regressed on all other X variables.
VIF > 5 suggests linear dependence (R_j² > 0.8).
Full treatment involves analysis of the correlation (covariance) matrix, such as:
- Principal Component Analysis (PCA): to determine dimensionality and orthogonal factors
- Factor Analysis (FA): to determine rotated factors
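The VIF definition translates directly into code: regress each column on the others and invert 1 - R_j². A sketch, demonstrated on a made-up nearly-collinear pair:

```python
import numpy as np

def vif(X):
    # VIF_j = 1/(1 - R_j^2), from regressing column j on all
    # other columns of X (plus an intercept).
    n, k = X.shape
    out = []
    for j in range(k):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        sse = np.sum((X[:, j] - A @ b) ** 2)
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2 = 1.0 - sse / sst
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Nearly collinear pair: X2 is almost exactly 2 * X1.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = 2.0 * X1 + np.array([0.01, -0.01, 0.02, -0.02, 0.01, -0.01])
v = vif(np.column_stack([X1, X2]))
```

Both VIFs come out far above the rule-of-thumb threshold of 5, flagging the linear dependence.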

17 Model Building
Stepwise regression:
- Add or delete one variable at a time
- Use partial F and/or t tests (p > 0.05)
Best-subset regression:
- Start with a model including all variables (fewer than n/10)
- Eliminate variables with VIF > 5, highest first
- Generate all models with the remaining variables (T)
- Select the best models using R² and the C_p statistic:
    C_p = (1 - R_k²)(n - T)/(1 - R_T²) - (n - 2(k + 1))
  preferring models with C_p ≤ k+1
- Evaluate each term using a t test
- Add interaction terms, transformed variables, and higher-order terms based on residual analysis
See the flow chart in the text, Fig 13.25 (p 663).
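The C_p formula above can be sketched as a small helper (the test values are made up; r2_k is the R² of a candidate model with k variables, r2_T the R² of the full model with T parameters):

```python
def cp_statistic(r2_k, r2_T, n, k, T):
    # C_p = (1 - R_k^2)(n - T)/(1 - R_T^2) - (n - 2(k + 1)),
    # as given on the slide.
    return (1.0 - r2_k) * (n - T) / (1.0 - r2_T) - (n - 2 * (k + 1))

# For the full model itself (r2_k = r2_T and T = k + 1),
# the expression reduces algebraically to k + 1.
cp_full = cp_statistic(0.9, 0.9, 30, 4, 5)  # = 5 = k + 1
```

That reduction is a handy sanity check: the full model always satisfies C_p = k + 1 exactly, so candidate subsets are judged by how close their C_p gets to k + 1.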

18 Homework
- Work and hand in Problem 13.63
- Fall break (Thanksgiving): 11/22
- Review session: 11/29 ("dead" week), "Linear Regression", Ch 12-13
- Exam #3: Linear regression (Ch 12-13), take-home, due by 12/6
- Final grades due by 12/13

