
1 Chapter 12 Multiple Linear Regression Doing it with more variables! More is better. Chapter 12A

2 What are we doing?

3 12-1 Multiple Linear Regression Models Many applications of regression analysis involve situations in which there is more than one regressor variable. A regression model that contains more than one regressor variable is called a multiple regression model.

4 12-1.1 Introduction For example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A possible multiple regression model could be Y = β0 + β1x1 + β2x2 + ε, where Y is the tool life, x1 is the cutting speed, and x2 is the tool angle.

5 The Model Y = β0 + β1X1 + β2X2 + … + βkXk + ε. More than one regressor or predictor variable. Linear in the unknown parameters – the β's. β0 is the intercept, the βi are partial regression coefficients, and ε is the error term. Can handle nonlinear functions as predictors, e.g. X3 = Z². Interactions can be present, e.g. β12X1X2.

6 The Data The data collection step in a regression analysis: n observations, each recording the response y and the k regressor values.

y     x1    x2    …    xk
y1    x11   x12   …    x1k
y2    x21   x22   …    x2k
⋮
yn    xn1   xn2   …    xnk

7 A Data Example Example – Oakland games won: 13 = β0 + β1(2285) + β2(45.3) + β3(1903) + ε. A similar equation holds for every data point, so there are more equations than β's.

8 Least Squares Estimation of the Parameters The least squares function is given by L = Σ ei² = Σ (yi − β0 − β1xi1 − … − βkxik)², with the sums running over i = 1, …, n. The least squares estimates must satisfy ∂L/∂βj = 0 for j = 0, 1, …, k.

9 The Least Squares Normal Equations Setting each of the p = k + 1 partial derivatives to zero yields the least squares normal equations. The solutions to the normal equations are the least squares estimators of the regression coefficients.

10 The Matrix Approach The model can be written as y = Xβ + ε, where y is the vector of observed responses, X is the matrix of our observations of the predictor variables, ε is the unknown vector of error terms (possibly normally distributed), and β is the vector of coefficients we must estimate.

11 Solving those normal equations In matrix notation the least squares function is L = (y − Xβ)'(y − Xβ), and setting its derivative with respect to β to zero gives the normal equations X'Xβ̂ = X'y.

12 Least-Squares in Matrix Form Provided X'X is nonsingular, the normal equations solve directly as β̂ = (X'X)⁻¹X'y.

13 More Matrix Approach The fitted values are ŷ = Xβ̂ and the residuals are e = y − ŷ.
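A minimal numerical sketch of that matrix solution, assuming NumPy; the data values here are made up purely for illustration:

```python
import numpy as np

# Made-up data: n = 6 observations on k = 2 regressors.
x1 = np.array([2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
x2 = np.array([50.0, 40.0, 55.0, 45.0, 60.0, 35.0])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

# Model matrix X: a column of ones for the intercept, then the regressors.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve the normal equations X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat   # fitted values
e = y - y_hat          # residuals
print("beta_hat:", beta_hat)
```

In practice np.linalg.lstsq(X, y, rcond=None) is numerically preferable to forming X'X explicitly, but the normal-equation form mirrors the slides.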

14 Example 12-2 Wire bonding is a method of making interconnections between a microchip and other electronics as part of semiconductor device fabrication.

15 Example 12-2


20 Some Basic Terms and Concepts Residuals are estimators of the error terms in the regression model: ei = yi − ŷi. We use an unbiased estimator of the variance of the error term: σ̂² = SS_E/(n − p), where SS_E = Σ ei² is called the residual sum of squares and n − p is the residual degrees of freedom. 'Residual' – what remains after the regression explains all of the variability in the data it can.


22 Estimating σ² An unbiased estimator of σ² is σ̂² = SS_E/(n − p) = e'e/(n − p).
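Continuing the sketch from slide 13 (same made-up data, NumPy assumed), the variance estimate is one line once the residuals are in hand:

```python
import numpy as np

# Same made-up data as the earlier sketch.
X = np.column_stack([np.ones(6),
                     [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                     [50.0, 40.0, 55.0, 45.0, 60.0, 35.0]])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat          # residuals
n, p = X.shape                # p = k + 1 parameters, intercept included
SS_E = e @ e                  # residual sum of squares, e'e
sigma2_hat = SS_E / (n - p)   # unbiased estimator of sigma^2
print("sigma^2 estimate:", sigma2_hat)
```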

23 Properties of the Least Squares Estimators Note that in this treatment, the elements of X are not random variables. They are the observed values of the xij. We treat them as though they are constants, often coefficients of random variables like the εi. The first result says that the estimators are unbiased. The second result shows the covariance structure of the estimators – both its diagonal and off-diagonal elements. It is important to remember that in a typical multiple regression model the estimates of the coefficients are not independent of one another.

24 Properties of the Least Squares Estimators Unbiased estimators: E(β̂) = β. Covariance matrix: Cov(β̂) = σ²(X'X)⁻¹.

25 Covariance Matrix of the Regression Coefficients In general, we do not know σ², so we estimate it by the mean square error of the residuals; the resulting standard errors are estimated standard errors. The quality of our estimates of the regression coefficients is very much tied to (X'X)⁻¹, and because its off-diagonal elements are generally nonzero, the estimates of the coefficients are not independent.
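A sketch of the covariance computation under the same assumptions (NumPy, the same toy data):

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                     [50.0, 40.0, 55.0, 45.0, 60.0, 35.0]])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
e = y - X @ beta_hat
n, p = X.shape
sigma2_hat = (e @ e) / (n - p)

cov_beta = sigma2_hat * XtX_inv        # estimated Cov(beta_hat)
se_beta = np.sqrt(np.diag(cov_beta))   # estimated standard errors
# The off-diagonal entries of cov_beta are generally nonzero:
# the coefficient estimates are correlated, not independent.
print(se_beta)
```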

26 Test for Significance of Regression The appropriate hypotheses are H0: β1 = β2 = … = βk = 0 versus H1: βj ≠ 0 for at least one j. The test statistic is F0 = MS_R/MS_E = (SS_R/k)/(SS_E/(n − p)), which is compared to the F distribution with k and n − p degrees of freedom.
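A sketch of the F test on the same toy data, with SciPy assumed for the F tail probability:

```python
import numpy as np
from scipy import stats

X = np.column_stack([np.ones(6),
                     [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                     [50.0, 40.0, 55.0, 45.0, 60.0, 35.0]])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
n, p = X.shape
k = p - 1                           # number of regressors

SS_E = e @ e
SS_T = np.sum((y - y.mean()) ** 2)  # total sum of squares
SS_R = SS_T - SS_E                  # regression sum of squares

F0 = (SS_R / k) / (SS_E / (n - p))
p_value = stats.f.sf(F0, k, n - p)  # P(F > F0)
print(F0, p_value)
```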

27 ANOVA The basic idea is that the data (the yi values) have some variability – if they didn't, there would be nothing to explain. A successful model explains most of the variability, leaving little to be carried by the error term.

28 R² The coefficient of multiple determination: R² = SS_R/SS_T = 1 − SS_E/SS_T.

29 The Adjusted R² The adjusted R² statistic, R²_adj = 1 − (SS_E/(n − p))/(SS_T/(n − 1)), penalizes the analyst for adding terms to the model. It can help guard against overfitting (including regressors that are not really useful).
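Both statistics follow from the sums of squares already computed; a short sketch under the same assumptions:

```python
import numpy as np

X = np.column_stack([np.ones(6),
                     [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                     [50.0, 40.0, 55.0, 45.0, 60.0, 35.0]])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
n, p = X.shape

SS_E = e @ e
SS_T = np.sum((y - y.mean()) ** 2)

R2 = 1 - SS_E / SS_T
R2_adj = 1 - (SS_E / (n - p)) / (SS_T / (n - 1))
print(R2, R2_adj)   # R2_adj never exceeds R2
```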

30 Tests on Individual Regression Coefficients and Subsets of Coefficients The hypotheses are H0: βj = βj0 versus H1: βj ≠ βj0. The test statistic is t0 = (β̂j − βj0)/se(β̂j). Reject H0 if |t0| > tα/2,n−p. This is called a partial or marginal test.
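A sketch of the marginal t tests for H0: βj = 0 (SciPy assumed, same toy data):

```python
import numpy as np
from scipy import stats

X = np.column_stack([np.ones(6),
                     [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
                     [50.0, 40.0, 55.0, 45.0, 60.0, 35.0]])
y = np.array([9.9, 12.1, 14.2, 16.0, 18.1, 19.8])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
e = y - X @ beta_hat
n, p = X.shape
sigma2_hat = (e @ e) / (n - p)
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))

t0 = beta_hat / se                          # tests H0: beta_j = 0
p_vals = 2 * stats.t.sf(np.abs(t0), n - p)  # two-sided p-values
print(t0, p_vals)
```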

31 Linear Independence of the Predictors – some random thoughts Instabilities in the regression coefficients will occur when the values of one predictor are 'nearly' a linear combination of the other predictors. An exact linear dependence would be incredibly unlikely; coming close is bad enough. What is the dimension of the space you are working in? It is n, where n is the number of data points in your sample. The response you are trying to match is an n-dimensional vector, and you are trying to match it with a set of k (k << n) predictors. The predictors had better be related to the response if this is going to be successful!

32 Interactions and Higher Order Terms – still thinking randomly Including interaction terms (products of two predictors), higher order terms, or functions of predictors does not make the model nonlinear. Suppose you believe that the following relation may apply: Y = β0 + β1X1 + β22X2² + β23X2X3 + β4 exp(X4) + ε. This is still a linear regression model – linear in the β's. After recording the values of X1 through X4, you simply calculate the values of the predictors into the columns of the worksheet for the regression software. The model would become nonlinear if you were trying to estimate a parameter inside the exponential function, e.g. β4 exp(γ4X4), where γ4 must also be estimated.
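A sketch of "calculating the predictors into the columns" for that model; the raw X1–X4 values here are randomly generated stand-ins, not real data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X1, X2, X3, X4 = rng.normal(size=(4, n))   # made-up raw measurements

# Precompute the derived predictor columns; the model stays linear
# in the betas no matter how nonlinear the columns themselves are.
X = np.column_stack([np.ones(n),    # intercept
                     X1,            # beta_1 column
                     X2 ** 2,       # beta_22 column
                     X2 * X3,       # beta_23 column
                     np.exp(X4)])   # beta_4 column

# With a response vector y in hand, the fit is the usual linear one:
y = rng.normal(size=n)              # placeholder response
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```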

33 The NFL Again – problem 12-15
Predictor variables:
Att – pass attempts
Comp – completed passes
Pct Comp – percent of passes completed
Yds – yards gained passing
Yds per Att – yards gained per pass attempt
Pct TD – percent of attempts that are TDs
Long – longest pass completion
Int – number of interceptions
Pct Int – percentage of attempts that are interceptions
Response variable: quarterback rating

34 The NFL Again – problem 12-15

35 Fit a multiple regression model using Pct Comp, Pct TD, and Pct Int. Estimate σ². Determine the standard errors of the regression coefficients. Predict the rating when Pct Comp = 60%, Pct TD = 4%, and Pct Int = 3%. A sketch of these four steps appears below.
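A sketch of those four steps, assuming the problem 12-15 data has been loaded into arrays; the values below are stand-ins, not the textbook data:

```python
import numpy as np

# Stand-in values only -- replace with the actual problem 12-15 data.
pct_comp = np.array([64.4, 61.2, 58.9, 60.1, 57.3, 62.8])
pct_td   = np.array([5.1, 4.3, 3.6, 4.8, 3.1, 4.0])
pct_int  = np.array([2.0, 2.9, 3.3, 2.5, 3.8, 2.2])
rating   = np.array([95.1, 87.2, 79.8, 88.6, 74.5, 90.3])

X = np.column_stack([np.ones_like(rating), pct_comp, pct_td, pct_int])
beta_hat, *_ = np.linalg.lstsq(X, rating, rcond=None)        # the fit

e = rating - X @ beta_hat
n, p = X.shape
sigma2_hat = (e @ e) / (n - p)                               # sigma^2 estimate

se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))   # standard errors

x0 = np.array([1.0, 60.0, 4.0, 3.0])  # Pct Comp = 60, Pct TD = 4, Pct Int = 3
print("predicted rating:", x0 @ beta_hat)
```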

36 Now the solutions

37 More NFL – problem 12-31 Test the regression model for significance using α = 0.05. Find the p-value. Conduct a t-test on each regression coefficient. These are very good problems to answer.

38 Again with the answers

39 Even more answers

40 Next Time
Confidence Intervals, again
Modeling and Model Adequacy
Also, Doing it with Computers. Computers are good.

