Summary of the Statistics used in Multiple Regression

The Least Squares Estimates: $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_p$, the values that minimize the residual sum of squares $\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \cdots - \hat\beta_p x_{pi}\right)^2$.
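A minimal numerical sketch of this minimization (not from the slides; the simulated data, variable names and use of numpy are assumptions of mine):

```python
# Minimal sketch: least squares estimates as the minimizers of the
# residual sum of squares, using simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(0)
n, p = 25, 2
X = rng.normal(size=(n, p))            # hypothetical predictors X1, X2
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=n)

Xd = np.column_stack([np.ones(n), X])  # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print(beta_hat)                        # [b0_hat, b1_hat, b2_hat]
```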

The Analysis of Variance Table Entries
a) Adjusted Total Sum of Squares: $SS_{Total} = \sum_{i=1}^{n}(y_i - \bar{y})^2$
b) Residual Sum of Squares: $SS_{Error} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
c) Regression Sum of Squares: $SS_{Reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$
Note: $SS_{Total} = SS_{Reg} + SS_{Error}$

The Analysis of Variance Table

Source       Sum of Squares   d.f.    Mean Square                         F
Regression   SS_Reg           p       SS_Reg/p = MS_Reg                   MS_Reg/s²
Error        SS_Error         n-p-1   SS_Error/(n-p-1) = MS_Error = s²
Total        SS_Total         n-1

Uses:
1. To estimate $\sigma^2$ (the error variance): use $s^2 = MS_{Error}$ to estimate $\sigma^2$.
2. To test the hypothesis $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$. Use the test statistic $F = MS_{Reg}/s^2 = [(1/p)SS_{Reg}]/[(1/(n-p-1))SS_{Error}]$. Reject $H_0$ if $F > F_\alpha(p, n-p-1)$.
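A sketch of this overall F test (the function name and the use of numpy/scipy are my assumptions, not the slides' method):

```python
# Sketch: overall F test of H0: beta_1 = ... = beta_p = 0
# from the decomposition SS_Total = SS_Reg + SS_Error.
import numpy as np
from scipy import stats

def overall_f_test(Xd, y):
    """Xd: n x (p+1) design matrix whose first column is all ones."""
    n, k = Xd.shape
    p = k - 1
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    ss_total = np.sum((y - y.mean()) ** 2)        # SS_Total
    ss_error = np.sum((y - Xd @ beta_hat) ** 2)   # SS_Error
    ss_reg = ss_total - ss_error                  # SS_Reg
    F = (ss_reg / p) / (ss_error / (n - p - 1))   # MS_Reg / s^2
    return F, stats.f.sf(F, p, n - p - 1)         # F and its p-value
```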

3. To compute other statistics that are useful in describing the relationship between Y (the dependent variable) and $X_1, X_2, \dots, X_p$ (the independent variables).
a) $R^2$ = the coefficient of determination = $SS_{Reg}/SS_{Total}$ = the proportion of variance in Y explained by $X_1, X_2, \dots, X_p$.
   $1 - R^2$ = the proportion of variance in Y that is left unexplained by $X_1, X_2, \dots, X_p$ = $SS_{Error}/SS_{Total}$.

b) $R_a^2$ = "$R^2$ adjusted" for degrees of freedom
   = 1 - [the proportion of variance in Y left unexplained by $X_1, X_2, \dots, X_p$, adjusted for d.f.]
   = $1 - [(1/(n-p-1))SS_{Error}]/[(1/(n-1))SS_{Total}]$
   = $1 - [(n-1)SS_{Error}]/[(n-p-1)SS_{Total}]$
   = $1 - [(n-1)/(n-p-1)][1 - R^2]$.

c) $R = \sqrt{R^2}$ = the multiple correlation coefficient of Y with $X_1, X_2, \dots, X_p$ = the maximum correlation between Y and a linear combination of $X_1, X_2, \dots, X_p$.
Comment: The statistics $F$, $R^2$, $R_a^2$ and $R$ are equivalent statistics.
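The same sums of squares yield all three statistics; a small sketch under the same assumptions as above:

```python
# Sketch: R^2, adjusted R^2 and the multiple correlation R.
import numpy as np

def r_statistics(Xd, y):
    n, k = Xd.shape
    p = k - 1
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    ss_total = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - Xd @ beta_hat) ** 2)
    r2 = 1.0 - ss_error / ss_total                      # SS_Reg / SS_Total
    r2_adj = 1.0 - (n - 1) / (n - p - 1) * (1.0 - r2)   # adjusted for d.f.
    return r2, r2_adj, np.sqrt(r2)                      # R^2, R_a^2, R
```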

Properties of the Least Squares Estimators:
1. Normally distributed (if the error terms are Normally distributed).
2. Unbiased estimators of the linear parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.
3. Minimum variance (minimum standard error) of all unbiased estimators of the linear parameters $\beta_0, \beta_1, \beta_2, \dots, \beta_p$.

Comments: The standard error of $\hat\beta_i$, $S.E.(\hat\beta_i) = s_{\hat\beta_i}$, depends on:
1. The error variance $\sigma^2$ (estimated by $s^2$ and $s$).
2. $s_{X_i}$, the standard deviation of $X_i$ (the i-th independent variable).
3. The sample size n.
4. The correlations between all pairs of variables.

$S.E.(\hat\beta_i)$:
- decreases as $s$ decreases;
- decreases as $s_{X_i}$ increases;
- decreases as $n$ increases;
- increases as the correlation between pairs of independent variables increases.
In fact, the standard error of the least squares estimates can be extremely high if there is a high correlation between one of the independent variables and a linear combination of the remaining independent variables (the problem of Multicollinearity).
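A simulation sketch of the multicollinearity point (entirely hypothetical data; the rho values are chosen for illustration):

```python
# Sketch: S.E.(b1) inflates as the correlation between X1 and X2 grows,
# with the error s.d., the predictor s.d.'s and n held fixed.
import numpy as np

rng = np.random.default_rng(1)
n = 200
for rho in (0.0, 0.9, 0.99):
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal([0.0, 0.0], cov, size=n)  # correlated X1, X2
    y = 1.0 + X[:, 0] + X[:, 1] + rng.normal(size=n)
    Xd = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    s2 = np.sum((y - Xd @ beta_hat) ** 2) / (n - 3)       # s^2 = MS_Error
    cov_beta = s2 * np.linalg.inv(Xd.T @ Xd)              # s^2 (X'X)^{-1}
    print(rho, np.sqrt(cov_beta[1, 1]))                   # S.E.(b1) grows with rho
```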

The Covariance Matrix, Correlation Matrix and $(X^TX)^{-1}$ matrix

The Covariance Matrix of the least squares estimates, with entries $c_{ij} = \widehat{\mathrm{Cov}}(\hat\beta_i, \hat\beta_j)$, where the diagonal entries are the squared standard errors, $c_{ii} = s_{\hat\beta_i}^2$, and the off-diagonal entries are the covariances between pairs of estimates.

The Correlation Matrix: entries $r_{ij} = c_{ij}/\sqrt{c_{ii}\,c_{jj}}$, the correlations between pairs of estimates.

The $(X^TX)^{-1}$ matrix

If we multiply each entry in the $(X^TX)^{-1}$ matrix by $s^2 = MS_{Error}$, this matrix turns into the covariance matrix for $\hat\beta$: $\widehat{\mathrm{Cov}}(\hat\beta) = s^2(X^TX)^{-1}$.

These matrices can be used to compute standard errors for linear combinations of the regression coefficients. Namely, for $L = a_0\hat\beta_0 + a_1\hat\beta_1 + \cdots + a_p\hat\beta_p = a^T\hat\beta$,
$S.E.(L) = \sqrt{a^T\,[s^2(X^TX)^{-1}]\,a}$.
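A sketch of this computation (the function name is mine):

```python
# Sketch: estimate and standard error of a linear combination a' * beta_hat,
# S.E. = sqrt(a' [s^2 (X'X)^{-1}] a).
import numpy as np

def linear_combination_se(Xd, y, a):
    n, k = Xd.shape                                    # k = p + 1
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    s2 = np.sum((y - Xd @ beta_hat) ** 2) / (n - k)    # s^2 = MS_Error
    cov_beta = s2 * np.linalg.inv(Xd.T @ Xd)           # covariance matrix
    return a @ beta_hat, np.sqrt(a @ cov_beta @ a)     # estimate, S.E.

# e.g. a = np.array([0., 1., -1.]) gives the S.E. of b1_hat - b2_hat
```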

An Example: Suppose one is interested in how the cost per month (Y) of heating a plant is determined by the average atmospheric temperature in the month ($X_1$) and the number of operating days in the month ($X_2$). The data on these variables was collected for n = 25 months selected at random and is given on the following page.
Y = cost per month of heating the plant
$X_1$ = average atmospheric temperature in the month
$X_2$ = the number of operating days for the plant in the month

[Tables for this example: the least squares estimates and their standard errors for the constant, $X_1$ and $X_2$; the covariance matrix, the correlation matrix, and the $(X^TX)^{-1}$ matrix of the estimates.]

[The Analysis of Variance Table for the example: Regression df = 2, Error df = 22, Total df = 24; from $R^2 = .8491$ the F statistic is $(.8491/2)/(.1509/22) \approx 61.9$.]

Summary Statistics ($R^2$, $R_{adjusted}^2 = R_a^2$ and $R$)
$R^2 = SS_{Reg}/SS_{Total} = .8491$ (84.91% of the variance in Y explained)
$R_a^2 = 1 - [1 - R^2][(n-1)/(n-p-1)] = 1 - [.1509][24/22] = .8354$ (83.54%)
$R = \sqrt{.8491} = .9215$ = Multiple correlation coefficient

Three-dimensional Scatter-plot of Cost, Temp and Days.

Example: Motor Vehicle data
Variables:
1. (Y) mpg – Mileage
2. ($X_1$) engine – Engine size
3. ($X_2$) horse – Horsepower
4. ($X_3$) weight – Weight

Select Analyze->Regression->Linear

To print the correlation matrix or the covariance matrix of the estimates, select Statistics.

Check the box for the covariance matrix of the estimates.

Here is the table giving the estimates and their standard errors.

Here is the table giving the correlation matrix and the covariance matrix of the regression estimates. What is missing in SPSS is the covariances and correlations with the intercept estimate (the constant).

This can be found by using the following trick:
1. Introduce a new variable (called constnt).
2. The new "variable" takes on the value 1 for all cases.

Select Transform->Compute

The following dialogue box appears. Type in the name of the target variable, constnt, and type in '1' for the Numeric Expression.

This variable is now added to the data file

Add this new variable (constnt) to the list of independent variables

Under Options, make sure the box "Include constant in equation" is unchecked. The coefficient of the new variable will then be the constant.

Here are the estimates of the parameters with their standard errors. Note the agreement with the parameter estimates and standard errors calculated previously.

Here is the correlation matrix and the covariance matrix of the estimates.
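For comparison only (not part of the original slides): in packages such as Python's statsmodels, the reported covariance matrix of the estimates already includes the constant, so no such trick is needed there. A minimal sketch, assuming a pandas DataFrame df holding the motor-vehicle columns:

```python
# Sketch, assuming a DataFrame `df` with columns mpg, engine, horse, weight.
import statsmodels.formula.api as smf

fit = smf.ols("mpg ~ engine + horse + weight", data=df).fit()
print(fit.cov_params())   # covariance matrix, intercept row/column included
```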

Testing Hypotheses related to Multiple Regression

The General Linear Hypothesis
$H_0$:
$h_{11}\beta_1 + h_{12}\beta_2 + h_{13}\beta_3 + \cdots + h_{1p}\beta_p = h_1$
$h_{21}\beta_1 + h_{22}\beta_2 + h_{23}\beta_3 + \cdots + h_{2p}\beta_p = h_2$
$\vdots$
$h_{q1}\beta_1 + h_{q2}\beta_2 + h_{q3}\beta_3 + \cdots + h_{qp}\beta_p = h_q$
where $h_{11}, h_{12}, h_{13}, \dots, h_{qp}$ and $h_1, h_2, h_3, \dots, h_q$ are known coefficients.

Examples
1. $H_0$: $\beta_1 = 0$
2. $H_0$: $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 0$
3. $H_0$: $\beta_1 = \beta_2$
4. $H_0$: $\beta_1 = \beta_2$, $\beta_3 = \beta_4$
5. $H_0$: $\beta_1 = \tfrac{1}{2}(\beta_2 + \beta_3)$
6. $H_0$: $\beta_1 = \tfrac{1}{2}(\beta_2 + \beta_3)$, $\beta_3 = \tfrac{1}{3}(\beta_4 + \beta_5 + \beta_6)$

The Complete Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_p X_p + \varepsilon$
The Reduced Model: the model implied by $H_0$.
You are interested in knowing whether the complete model can be simplified to the reduced model.

Testing the General Linear Hypothesis: the F-test for $H_0$ is performed by carrying out two runs of a multiple regression package.

Run 1: Fit the complete model, resulting in the following ANOVA table:

Source             df      Sum of Squares
Regression         p       SS_Reg
Residual (Error)   n-p-1   SS_Error
Total              n-1     SS_Total

Run 2: Fit the reduced model (q parameters eliminated), resulting in the following ANOVA table:

Source             df        Sum of Squares
Regression         p-q       SS¹_Reg
Residual (Error)   n-p+q-1   SS¹_Error
Total              n-1       SS_Total

The Test is carried out using the test statistic
$F = \dfrac{SS_{H_0}/q}{s^2} = \dfrac{MS_{H_0}}{s^2}$
where $SS_{H_0} = SS^{1}_{Error} - SS_{Error} = SS_{Reg} - SS^{1}_{Reg}$ and $s^2 = SS_{Error}/(n-p-1)$.
The test statistic F has an F-distribution with $\nu_1 = q$ d.f. in the numerator and $\nu_2 = n - p - 1$ d.f. in the denominator if $H_0$ is true.
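A sketch of the two-run procedure (function names and the numpy/scipy usage are my assumptions):

```python
# Sketch: the general linear hypothesis F test via two regression runs.
import numpy as np
from scipy import stats

def rss(Xd, y):
    beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return np.sum((y - Xd @ beta_hat) ** 2)

def general_linear_f_test(Xd_full, Xd_reduced, y):
    n, k = Xd_full.shape                    # k = p + 1 columns
    q = k - Xd_reduced.shape[1]             # number of parameters eliminated
    ss_error = rss(Xd_full, y)              # SS_Error (complete model)
    ss_h0 = rss(Xd_reduced, y) - ss_error   # SS_H0 = SS1_Error - SS_Error
    s2 = ss_error / (n - k)                 # s^2 = SS_Error / (n - p - 1)
    F = (ss_h0 / q) / s2
    return F, stats.f.sf(F, q, n - k)       # p-value from F(q, n - p - 1)
```

Here Xd_reduced is the reduced model's design matrix; for hypotheses that fix coefficients at nonzero values (as in example 2 below), transform Y first as described there.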

Distribution of F when $H_0$ is true

The Critical Region: Reject $H_0$ if $F > F_\alpha(q, n - p - 1)$.

The ANOVA Table for the Test:

Source                       df      Sum of Squares   Mean Square               F
Regression (reduced model)   p-q     SS¹_Reg          SS¹_Reg/(p-q) = MS¹_Reg   MS¹_Reg/s²
Departure from H₀            q       SS_H0            SS_H0/q = MS_H0           MS_H0/s²
Residual (Error)             n-p-1   SS_Error         s²
Total                        n-1     SS_Total

Some Examples: Four independent variables $X_1, X_2, X_3, X_4$
The Complete Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

1) a) $H_0$: $\beta_3 = 0$, $\beta_4 = 0$ (q = 2)
   b) The Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
      Dependent variable: Y
      Independent variables: $X_1$, $X_2$

2) a) $H_0$: $\beta_3 = 4.5$, $\beta_4 = 8.0$ (q = 2)
   b) The Reduced Model: $Y - 4.5X_3 - 8.0X_4 = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
      Dependent variable: $Y - 4.5X_3 - 8.0X_4$
      Independent variables: $X_1$, $X_2$
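A short sketch of example 2's transformation (Y, X1, X2, X3, X4 are hypothetical arrays, not data from the slides):

```python
# Sketch: reduced-model fit for H0: beta_3 = 4.5, beta_4 = 8.0.
# Y, X1, X2, X3, X4 are assumed to be 1-D numpy arrays of equal length.
import numpy as np

y_adj = Y - 4.5 * X3 - 8.0 * X4                 # new dependent variable
Xd_reduced = np.column_stack([np.ones(len(y_adj)), X1, X2])
beta_hat, *_ = np.linalg.lstsq(Xd_reduced, y_adj, rcond=None)
```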

Example: Motor Vehicle data
Variables:
1. (Y) mpg – Mileage
2. ($X_1$) engine – Engine size
3. ($X_2$) horse – Horsepower
4. ($X_3$) weight – Weight

Suppose we want to test $H_0$: $\beta_1 = 0$ against $H_A$: $\beta_1 \neq 0$, i.e. engine size (engine) has no effect on mileage (mpg).
The full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
(mpg = Y, engine = $X_1$, horse = $X_2$, weight = $X_3$)
The reduced model: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
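A sketch of this comparison with statsmodels (df is again an assumed DataFrame with hypothetical motor-vehicle data):

```python
# Sketch: F test of H0: beta_1 = 0 by comparing reduced and full fits.
import statsmodels.api as sm
import statsmodels.formula.api as smf

full = smf.ols("mpg ~ engine + horse + weight", data=df).fit()
reduced = smf.ols("mpg ~ horse + weight", data=df).fit()
print(sm.stats.anova_lm(reduced, full))   # reduction in RSS, F and p-value
```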

The ANOVA Table for the Full model:

The reduction in the residual sum of squares, $SS^{1}_{Error} - SS_{Error}$, is read off from the two tables. The ANOVA Table for the Reduced model:

The ANOVA Table for testing $H_0$: $\beta_1 = 0$ against $H_A$: $\beta_1 \neq 0$:

Now suppose we want to test $H_0$: $\beta_1 = 0$, $\beta_2 = 0$ against $H_A$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$, i.e. engine size (engine) and horsepower (horse) have no effect on mileage (mpg).
The full model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
(mpg = Y, engine = $X_1$, horse = $X_2$, weight = $X_3$)
The reduced model: $Y = \beta_0 + \beta_3 X_3 + \varepsilon$

The ANOVA Table for the Full model

The reduction in the residual sum of squares, $SS^{1}_{Error} - SS_{Error}$, is read off from the two tables. The ANOVA Table for the Reduced model:

The ANOVA Table for testing $H_0$: $\beta_1 = 0$, $\beta_2 = 0$ against $H_A$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$: