Presentation is loading. Please wait.

Presentation is loading. Please wait.

Multiple Regression Chapter 14.

Similar presentations


Presentation on theme: "Multiple Regression Chapter 14."— Presentation transcript:

1 Multiple Regression Chapter 14

2 14 Chapter Goals When you have completed this chapter, you will be able to: 1. Understand the importance of an appropriate model specification and multiple regression analysis 2. Comprehend the nature and technique of multiple regression models and the concept of partial regression coefficients. 3. Use the estimation techniques for multiple regression models. 4. Conduct an analysis of variance of an estimated model and...

3 14 Chapter Goals 5. Explain the goodness of fit of an estimated model. 6. Draw inferences about the assumed (true) model though a joint test of hypothesis (F test) on the coefficients of all variables 7. Draw inferences about the importance of the independent variables through tests of hypothesis (t-tests) 8. Identify the problems raised, and the remedies thereof, by the presence of multicollinearity in the data sets 9. Identify the problems raised, and the remedies thereof, by the presence of outliers/influential observations in the data sets and...

4 14 Chapter Goals 10. Identify the violation of model assumptions, including linearity, homoscedasticity, autocorrelation, and normality through simple diagnosic procedures. 11. Use some simple remedial measures in the presence of violations of the model assumptions. 12. Write a research report on an investigation using multiple regression analysis 13. Comprehend the concept of partial correlations and its importance in multiple regression analysis

5 14 Chapter Goals 14. Draw inferences about the importance of a subset of the importance in multiple regression analysis 15. Use qualitative variables, as well as their interactions with other independent variables through a joint test of hypothesis 16. Apply some advanced diagnostic checks and remedies in multiple regression analysis

6 y a b x = + 1 2 Multiple Regression Analysis
For two independent variables, the general form of the multiple regression equation is: y a b x = + 1 2 x1 and x2 are the independent variables. a is the y-intercept. b1 is the net change in y for each unit change in x1 holding x2 constant. It is called …a partial regression coefficient, …a net regression coefficient, or …just a regression coefficient.

7 The least squares criterion is used to develop this equation.
Multiple Regression Analysis The general multiple regression with k independent variables is given by: y a b x = + 1 2 k . . . The least squares criterion is used to develop this equation. Because determining b1, b2, etc. is very tedious, a software package such as Excel or MINITAB is recommended.

8 Multiple Standard Error of Estimate
… is a measure of the effectiveness of the regression equation … is measured in the same units as the dependent variable … it is difficult to determine what is a large value and what is a small value of the standard error!

9 Multiple Regression and Correlation
Assumptions … the independent variables and the dependent variables have a linear relationship … the dependent variable must be continuous and at least interval-scale … the variation in (y - y) or residual must be the same for all values of y When this is the case, we say the difference exhibits homoscedasticity … the residuals should follow the normal distribution with mean of 0 … successive values of the dependent variable must be uncorrelated

10 … reports the variation in the dependent variable
The AVOVA Table … reports the variation in the dependent variable … the variation is divided into two components: a. … the Explained Variation is that accounted for by the set of independent variable b. … the Unexplained or Random Variation is not accounted for by the independent variables

11 …the matrix is useful for locating correlated independent variables.
Correlation Matrix A correlation matrix is used to show all possible simple correlation coefficients among the variables …the matrix is useful for locating correlated independent variables. … it shows how strongly each independent variable is correlated with the dependent variable.

12 Global Test ... : b H = equal s all Not : b H
The global test is used to investigate whether any of the independent variables have significant coefficients. The hypotheses are: ... : 2 1 b H k = equal s all Not : 1 b H … continued

13 n-(k+1) degrees of freedom, where n is the sample size
Global Test … continued The test statistic is the … F distribution with k (number of independent variables) and n-(k+1) degrees of freedom, where n is the sample size

14 Test for Individual Variables
This test is… used to determine which independent variables have nonzero regression coefficients … the variables that have zero regression coefficients are usually dropped from the analysis … the test statistic is the t distribution with n-(k+1) degrees of freedom.

15 A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: … total family income (Income) in $00, … size of family (Size), and … whether the family has children in college (College) and...

16 Note: … the following regarding the regression equation
… continued Note: … the following regarding the regression equation … the variable college is called a dummy or indicator variable (It can take only one of two possible outcomes, i.e. a child is a college student or not) Other examples of dummy variables include… … gender … the part is acceptable or unacceptable … the voter will or will not vote for the incumbent We usually code one value of the dummy variable as “1” and the other “0”

17 … continued Family Food Income Size Student 1 3900 376 4 2 5300 515 5
2 5300 515 5 3 4300 516 4900 468 6400 538 6 7300 626 7 543 8 437 9 6100 608 10 513 11 7400 493 12 5800 563

18 … continued Use a computer software package, such as MINITAB or Excel, to develop a correlation matrix. From the analysis provided by MINITAB, write out the regression equation: What food expenditure would you estimate for a family of 4, with no college students, and an income of $50, (which is input as 500)?

19 … continued y = 954 +1.09x1 + 748x2 + 565x3 The regression equation is
Food = Income Size Student Predictor Coef SE Coef T P Constant Income Size Student S = R-Sq = 80.4% R-Sq(adj) = 73.1% Analysis of Variance Source DF SS MS F P Regression Residual Error Total y = x x x3

20 From the regression output we note:
… continued The regression equation is Food = Income Size Student Predictor Coef SE Coef T P Constant Income Size Student S=572.7 R-Sq = 80.4% R-Sq(adj) = 73.1% Analysis of Variance Source DF SS MS F P Regression Residual Error Total From the regression output we note: The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student

21 … continued The regression equation is Food = Income Size Student Predictor Coef SE Coef T P Constant Income Size Student S=572.7 R-Sq = 80.4% R-Sq(adj) = 73.1% Analysis of Variance Source DF SS MS F P Regression Residual Error Total An additional family member will increase the amount spent per year on food by $748 A family with a college student will spend $565 more per year on food than those without a college student

22 The correlation matrix is as follows:
… continued The correlation matrix is as follows: Food Income Size Income Size Student The strongest correlation between the dependent variable (Food) and an independent variable is between family size and amount spent on food. None of the correlations among the independent variables should cause problems All are between –.70 and .70

23 The regression equation is…
… continued Find the estimated food expenditure for a family of 4 with a $500 (that is $50,000) income and no college student. The regression equation is… Food = Income Size Student y = (500) + 748(4) + 565(0) = $4,491

24 = b : H ¹ b one least at : H … continued
The regression equation is Food = Income Size Student Predictor Coef SE Coef T P Constant Income Size Student S=572.7 R-Sq = 80.4% R-Sq(adj) = 73.1% Analysis of Variance Source DF SS MS F P Regression Residual Error Total Conduct a global test of hypothesis to determine if any of the regression coefficients are not zero = b : 3 2 1 H b one least at : 1 H H0 is rejected if F>4.07 …from the MINITAB output, the computed value of F is 10.94 Decision: H0 is rejected. Not all the regression coefficients are zero

25 … Using the 5% level of significance, reject H0 if the P-value<.05
… continued The regression equation is Food = Income Size Student Predictor Coef SECoef T P Constant Income Size Student Conduct an individual test to determine which coefficients are not zero (This is the hypothesis for the independent variable family size) H 2 : b = H 1 2 : b … Using the 5% level of significance, reject H0 if the P-value<.05 From the MINITAB output, the only significant variable is FAMILY (family size) using the P-values (The other variables can be omitted from the model)

26 … continued Regression Analysis:
Rerun the analysis using only the significant independent family size The regression equation is Food = Income Size Student Predictor Coef SECoef Income Size Student S=572.7 R-Sq = 80.4% R-Sq(adj) = 73.1% Regression Analysis: Food versus Size The regression equation is Food = Size Predictor Coef SECoef Constant Size S = R-Sq = 76.8% R-Sq(adj) = 74.4% …the new regression equation is: y = X 2 ...the coefficient of determination is 76.8 percent (the two independent variables are dropped) …and the R-square term was reduced by only 3.6 percent

27 Residuals should be approximately normally distributed
Analysis Residuals of A residual is the difference between the actual value of y and the predicted value y Residuals should be approximately normally distributed … histograms and stem-and-leaf charts are useful in checking this requirement … a plot of the residuals and their corresponding y values is used for showing that there are no trends or patterns in the residuals

28 Analysis Residuals of 4500 7500 6000 -500 500 1000 y Residuals Plot

29 Analysis Residuals Histograms of -600 -200 200 600 1000 8 7 6 5 4 3 2
8 7 6 5 4 3 2 1 Frequency Residuals Histograms

30 Test your learning… Click on… www.mcgrawhill.ca/college/lind
Online Learning Centre for quizzes extra content data sets searchable glossary access to Statistics Canada’s E-Stat data …and much more!

31 This completes Chapter 14


Download ppt "Multiple Regression Chapter 14."

Similar presentations


Ads by Google