Lecture 6 Notes Note: I will e-mail homework 2 tonight. It will be due next Thursday. The Multiple Linear Regression model (Chapter 4.1) Inferences from.
Lecture 6 Notes Note: I will e-mail homework 2 tonight. It will be due next Thursday. Topics: the Multiple Linear Regression model (Chapter 4.1) and inferences from multiple regression analysis (Chapter 4.2). In multiple regression analysis, we consider more than one independent variable, $x_1, \dots, x_K$. We are interested in the conditional mean of $y$ given $x_1, \dots, x_K$.
Automobile Example A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things: (1) Which characteristics of the design are likely to affect mileage? (2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage. The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.
Best Single Predictor To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate. If we use simple linear regression with each of the four independent variables, which provides the best predictions?
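Best Single Predictor Answer: The simple linear regression with the highest $R^2$ gives the best predictions because, recall, $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$. The total sum of squares SST is the same for every predictor, so the highest $R^2$ means the smallest sum of squared prediction errors SSE. Weight gives the best predictions of GPM1000Hwy based on simple linear regression. But we can obtain better predictions by using more than one of the independent variables.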
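The comparison of single predictors can be sketched in a few lines of Python. This is only an illustration: the numbers below are made up and are NOT the car89.JMP data, and the names `r_squared`, `predictors`, and `gpm` are hypothetical. It computes $R^2 = 1 - \mathrm{SSE}/\mathrm{SST}$ for each simple linear regression and picks the largest.

```python
def r_squared(x, y):
    """R^2 = 1 - SSE/SST for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # least squares slope
    b0 = y_bar - b1 * x_bar   # least squares intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up illustrative values (assumption), not the 1989 car data.
predictors = {
    "weight": [2.0, 2.5, 3.0, 3.5, 4.0, 4.5],   # thousands of lbs
    "seating": [4, 5, 4, 5, 6, 5],              # adults
}
gpm = [30.0, 35.0, 41.0, 44.0, 51.0, 54.0]      # gallons per 1000 miles

# One simple regression per predictor; the best single predictor has the highest R^2.
scores = {name: r_squared(x, gpm) for name, x in predictors.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because SST depends only on the response, ranking predictors by $R^2$ is the same as ranking them by smallest SSE.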
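Multiple Linear Regression Model: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_K x_{iK} + e_i$
Assumptions about the disturbances $e_i$:
– The expected value of the disturbances is zero for each $i$: $E(e_i) = 0$.
– The variance of each $e_i$ is equal to $\sigma_e^2$, i.e., $\mathrm{Var}(e_i) = \sigma_e^2$.
– The $e_i$ are normally distributed.
– The $e_i$ are independent.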
Point Estimates for Multiple Linear Regression Model We use the same least squares procedure as for simple linear regression. Our estimates $b_0, b_1, \dots, b_K$ of $\beta_0, \beta_1, \dots, \beta_K$ are the coefficients that minimize the sum of squared prediction errors: $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_{i1} - \dots - b_K x_{iK})^2$. Least Squares in JMP: Click Analyze, Fit Model, put the dependent variable into Y, and add the independent variables to the Construct Model Effects box.
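To see what the least squares procedure does outside of JMP, here is a minimal sketch in plain Python. It assumes the standard normal-equations route, $(X^\top X)\,b = X^\top y$, which is one common way to find the minimizing coefficients and is not necessarily how JMP computes them. The helper names `solve` and `least_squares` and the data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def least_squares(rows, y):
    """Return [b0, b1, ..., bK] minimizing the sum of squared prediction errors."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (assumption, no noise).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = least_squares(rows, y)
print(b)
```

With noise-free data the fitted coefficients recover the generating intercept and slopes up to rounding; with real data they are the values that make the sum of squared prediction errors as small as possible.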
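Root Mean Square Error Estimate of $\sigma_e$, the standard deviation of the disturbances: $s_e = \sqrt{\mathrm{SSE}/(n-(K+1))}$ = Root Mean Square Error in JMP. For the simple linear regression of GP1000MHWY on Weight, $s_e$ is read from the same line of the JMP output. For the multiple linear regression of GP1000MHWY on weight, horsepower, cargo, and seating, $s_e = 3.54$.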
Residuals and Root Mean Square Errors
– Residual for observation $i$ = prediction error for observation $i$ = $y_i - \hat{y}_i$.
– Root mean square error = typical size of the absolute value of the prediction error.
– As with the simple linear regression model, if the multiple linear regression model holds, about 95% of the observations will be within two RMSEs of their predicted value.
For the car data, about 95% of the time the actual GP1000M will be within 2 × 3.54 = 7.08 GP1000M of the predicted GP1000M based on the car's weight, horsepower, cargo, and seating.
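The RMSE arithmetic above can be checked with a short sketch. The function names `rmse` and `within_two_rmse` are hypothetical, and the formula $\sqrt{\mathrm{SSE}/(n-(K+1))}$ is the usual definition of root mean square error for a regression with $K$ predictors; only the value 3.54 comes from the notes.

```python
import math


def rmse(y, y_hat, k):
    """Root mean square error: sqrt(SSE / (n - (K + 1))) for K predictors."""
    n = len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    return math.sqrt(sse / (n - (k + 1)))


def within_two_rmse(actual, predicted, s_e):
    """True if the prediction error is no larger than two RMSEs in absolute value."""
    return abs(actual - predicted) <= 2 * s_e


s_e = 3.54             # RMSE of the four-predictor car regression, from the notes
half_width = 2 * s_e   # the "two RMSEs" band used in the notes
print(round(half_width, 2))

# Hypothetical actual/predicted values, only to show how the band is used.
print(within_two_rmse(50.0, 45.0, s_e), within_two_rmse(50.0, 40.0, s_e))
```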
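Inferences about Regression Coefficients
Confidence intervals: a $100(1-\alpha)\%$ confidence interval for $\beta_j$ is $b_j \pm t_{\alpha/2,\,n-(K+1)}\, s_{b_j}$. Degrees of freedom for $t$ equal $n-(K+1)$. The standard error of $b_j$, $s_{b_j}$, is found on the JMP output.
Hypothesis test: $H_0: \beta_j = \beta_j^*$ vs. $H_a: \beta_j \neq \beta_j^*$, with test statistic $t = (b_j - \beta_j^*)/s_{b_j}$.
Decision rule for a level-$\alpha$ test: reject $H_0$ if $t > t_{\alpha/2,\,n-(K+1)}$ or $t < -t_{\alpha/2,\,n-(K+1)}$. Equivalently, reject $H_0$ if the p-value is less than $\alpha$; the p-value for testing $H_0: \beta_j = 0$ is printed in the JMP output under Prob>|t|.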
Inference Examples Find a 95% confidence interval for ? Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.
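The mechanics of both questions can be sketched as follows. The estimates, standard errors, and critical value below are hypothetical placeholders and are NOT taken from the car89.JMP output; in practice $b_j$ and its standard error come from the JMP Parameter Estimates table and the critical value from a t table with $n-(K+1)$ degrees of freedom. The function names are also hypothetical.

```python
def confidence_interval(b_j, se_b_j, t_crit):
    """Interval b_j +/- t_crit * se(b_j)."""
    half = t_crit * se_b_j
    return (b_j - half, b_j + half)


def t_test(b_j, se_b_j, t_crit, beta_null=0.0):
    """Two-sided test of H0: beta_j = beta_null. Returns (t statistic, reject H0?)."""
    t_stat = (b_j - beta_null) / se_b_j
    return t_stat, abs(t_stat) > t_crit


# Hypothetical inputs (assumptions for illustration only).
t_crit = 2.0  # roughly t_{0.025} for about 60 degrees of freedom; look up the exact value

ci = confidence_interval(b_j=0.5, se_b_j=0.2, t_crit=t_crit)
t_big, reject_big = t_test(b_j=0.5, se_b_j=0.2, t_crit=t_crit)
t_small, reject_small = t_test(b_j=0.1, se_b_j=0.2, t_crit=t_crit)
print(ci, t_big, reject_big, t_small, reject_small)
```

A coefficient whose interval excludes zero is rejected by the two-sided test at the matching level, which is why the confidence interval and the Prob>|t| column always agree.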
Partial Slopes vs. Marginal Slopes Multiple Linear Regression Model: $y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_K x_{iK} + e_i$. The coefficient $\beta_j$ is a partial slope. It indicates the change in the mean of $y$ that is associated with a one-unit increase in $x_j$ while holding all other variables fixed. A marginal slope is obtained when we perform a simple regression with only one $x$, ignoring all other variables. Consequently, the other variables are not held fixed.
Partial Slopes vs. Marginal Slopes Example In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years. Data is in rainfall.JMP.
Higher rainfall is associated with lower temperature.
Rainfall is estimated to be beneficial once temperature is held fixed. Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.
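The difference between the two slopes can be reproduced with made-up numbers. The data below are NOT from rainfall.JMP: they are constructed (as an assumption) so that rainfall lowers temperature and yield rises with both rainfall and temperature. The partial slope is computed by residualization, which gives the same rainfall coefficient as a two-predictor least squares fit: first remove from rainfall the part explained by temperature, then regress yield on what is left. The name `slope_intercept` is hypothetical.

```python
def slope_intercept(x, y):
    """Least squares slope and intercept for the simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return b1, y_bar - b1 * x_bar


# Made-up data (assumption): higher rainfall goes with lower temperature.
rain = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
wiggle = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]             # keeps temp from being an exact function of rain
temp = [30 - 2 * r + w for r, w in zip(rain, wiggle)]
crop = [2 * r + 3 * t for r, t in zip(rain, temp)]     # true partial slope of rainfall is +2

# Marginal slope: simple regression of yield on rainfall, temperature ignored.
marginal, _ = slope_intercept(rain, crop)

# Partial slope: strip out the part of rainfall explained by temperature,
# then regress yield on the remaining rainfall variation.
g1, g0 = slope_intercept(temp, rain)
rain_resid = [r - (g0 + g1 * t) for r, t in zip(rain, temp)]
partial, _ = slope_intercept(rain_resid, crop)

print(round(marginal, 3), round(partial, 3))
```

In this sketch the marginal slope is negative while the partial slope is positive, mirroring the irrigation example: ignoring temperature makes rainfall look harmful, and holding temperature fixed shows its benefit.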