Published by Aspen Bateman. Modified over 2 years ago.

1
Lecture 6 Multiple Regression Analysis

2
Lecture 6 Objectives:
1. Explain and conduct multiple regression analysis in SPSS;
2. Interpret a multiple regression model; and
3. Check the assumptions and conditions of a multiple regression model.

3
The Multiple Regression Model
For simple regression, the predicted value depends on only one predictor variable: $\hat{y} = b_0 + b_1 x$
For multiple regression, we write the regression model with more predictor variables: $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$

4
Simple Regression Example
The variation in Bedrooms accounts for only 21% of the variation in Price. Perhaps the inclusion of another factor can account for a portion of the remaining variation.

5
Multiple Regression Example
Multiple Regression: Include Living Area as a predictor in the regression model. Now the model accounts for 58% of the variation in Price.

6
Multiple Regression Coefficients
NOTE: The meaning of the coefficients in multiple regression can be subtly different from their meaning in simple regression.
Price = 28986.10 – 7483.10*Bedrooms + 93.84*Living Area
Price drops with increasing bedrooms? How can this be correct?

7
Multiple Regression Coefficients
In a multiple regression, each coefficient takes into account all the other predictor(s) in the model.
For houses with similar sized Living Areas: more bedrooms means smaller bedrooms and/or smaller common living space. Cramped rooms may decrease the value of a house.

8
Multiple Regression Coefficients
So, what's the correct answer to the question: Do more bedrooms tend to increase or decrease the price of a home?
Correct answer:
Increase, if Bedrooms is the only predictor (more bedrooms may mean a bigger house, after all!).
Decrease, if Bedrooms increases for fixed Living Area (more bedrooms may mean smaller, more cramped rooms).
Multiple regression coefficients must be interpreted in terms of the other predictors in the model!
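The "fixed Living Area" reading of the Bedrooms coefficient can be checked with a few lines of Python. This is only a sketch: the coefficients are copied from the fitted model quoted in the slides, while the function name `predict_price` and the living area of 2000 are illustrative assumptions, not part of the lecture.

```python
# Coefficients copied from the fitted home-price model quoted in the slides.
INTERCEPT = 28986.10
B_BEDROOMS = -7483.10
B_LIVING_AREA = 93.84


def predict_price(bedrooms: float, living_area: float) -> float:
    """Predicted Price from the two-predictor model (hypothetical helper)."""
    return INTERCEPT + B_BEDROOMS * bedrooms + B_LIVING_AREA * living_area


# Hypothetical house: this living area value is illustrative, not from the data.
living_area = 2000.0
three_bed = predict_price(3, living_area)
four_bed = predict_price(4, living_area)

# Holding Living Area fixed, one extra bedroom changes the prediction by
# exactly the Bedrooms coefficient.
print(round(four_bed - three_bed, 2))
```

Whatever living area is chosen, the difference between the two predictions equals the Bedrooms coefficient, which is what "for fixed Living Area" means here.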

9
Example: Ticket Prices
On a typical night about 15,000 people attend a concert at Newcastle Entertainment Centre, paying an average price of more than $75 per ticket. Data for most weeks of 2009-2011 are used, with the variables Paid Attendance (thousands), # Shows and Average Ticket Price ($) predicting Receipts ($million). Consider the regression model for these variables.
Dependent variable is: Receipts ($M)
R squared = 99.9%   R squared (adjusted) = 99.9%
s = 0.0931 with 74 degrees of freedom
Source       Sum of Squares   df   Mean Square   F-ratio   P-value
Regression   484.789           3   161.596       18634     < 0.0001
Residual     0.641736         74   0.008672
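As a cross-check, the summary statistics in this output can be recomputed from the sums of squares and degrees of freedom alone. The sketch below uses only the numbers printed in the table; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA table for the Ticket Prices model.
ss_regression = 484.789
ss_residual = 0.641736
df_regression = 3    # k, the number of predictors
df_residual = 74     # n - k - 1

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_ratio = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
s = math.sqrt(ms_residual)  # standard deviation of the residuals

print(f"F = {f_ratio:.0f}, R squared = {r_squared:.1%}, s = {s:.4f}")
```

The recomputed values agree with the reported F-ratio, R squared and s to the precision shown in the output.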

10
Example: Ticket Prices
Write the regression model for these variables.
Interpret the coefficient of Paid Attendance.
Estimate receipts when paid attendance was 200,000 customers attending 30 shows at an average ticket price of $70.
Is this likely to be a good prediction? Why or why not?
Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept              –18.320   0.3127      –58.6     0.0001
Paid Attend            0.076     0.0006      126.7     0.0001
# Shows                0.0070    0.0044      1.6       0.116
Average Ticket Price   0.24      0.0039      61.5      0.0001

11
Example: Ticket Prices
Write the regression model for these variables.
Interpret the coefficient of Paid Attendance.
If the number of shows and ticket price are fixed, an increase of 1,000 customers generates an average increase of $76,000 in receipts.
Estimate receipts when paid attendance was 200,000 customers attending 30 shows at an average ticket price of $70.
$13.89 million
Is this likely to be a good prediction?
Yes, R² (adjusted) is 99.9%, so this model explains most of the variability in Receipts.
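The $13.89 million estimate can be reproduced by plugging the values into the fitted equation. This is a minimal sketch using the coefficients from the Ticket Prices table; the function name `predict_receipts` is hypothetical. Note that Paid Attendance is entered in thousands, so 200,000 customers is 200.

```python
# Coefficients copied from the Ticket Prices coefficient table.
# Receipts are in $million and Paid Attendance is in thousands.
INTERCEPT = -18.320
B_PAID_ATTENDANCE = 0.076
B_SHOWS = 0.0070
B_AVG_TICKET_PRICE = 0.24


def predict_receipts(paid_attendance_thousands, shows, avg_ticket_price):
    """Predicted Receipts in $million (hypothetical helper)."""
    return (INTERCEPT
            + B_PAID_ATTENDANCE * paid_attendance_thousands
            + B_SHOWS * shows
            + B_AVG_TICKET_PRICE * avg_ticket_price)


# 200,000 customers, 30 shows, average ticket price of $70.
estimate = predict_receipts(200, 30, 70)
print(f"${estimate:.2f} million")
```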

12
Assumptions and Conditions
Linearity Assumption
Linearity Condition: Check each of the predictors.
Home Prices Example: Linearity Condition is well satisfied for both Bedrooms and Living Area.

13
Assumptions and Conditions
Linearity Assumption
Linearity Condition: Also check the residual plot.
Home Prices Example: Linearity Condition is well satisfied.

14
Assumptions and Conditions
Independence Assumption
As usual, there is no way to be sure the assumption is satisfied. But, think about how the data were collected to decide if the assumption is reasonable.
Randomization Condition: Does the data collection method introduce any bias?

15
Assumptions and Conditions
Equal Variance Assumption
Equal Spread Condition: The variability of the errors should be about the same for each predictor. Use scatterplots to assess the Equal Spread Condition.
[Figure: Residuals vs. Predicted Values, Home Prices]

16
Assumptions and Conditions
Normality Assumption
Nearly Normal Condition: Check to see if the distribution of residuals is unimodal and symmetric.
Home Price Example: The tails of the distribution appear to be non-normal.

17
Assumptions and Conditions
Summary of Multiple Regression Model and Condition Checks:
1. Check Linearity Condition with a scatterplot for each predictor. If necessary, consider data re-expression.
2. If the Linearity Condition is satisfied, fit a multiple regression model to the data.
3. Find the residuals and predicted values.
4. Inspect a scatterplot of the residuals against the predicted values. Check for nonlinearity and non-uniform variation.

18
Assumptions and Conditions
Summary of Multiple Regression Model and Condition Checks:
5. Think about how the data were collected. Do you expect the data to be independent? Was suitable randomization utilized? Are the data representative of a clearly identifiable population? Is autocorrelation an issue?

19
Assumptions and Conditions
Summary of Multiple Regression Model and Condition Checks:
6. If the conditions check, feel free to interpret the regression model and use it for prediction.
7. Check the Nearly Normal Condition by inspecting a residual distribution histogram and a Normal plot. If the sample size is large, Normality is less important for inference. Watch for skewness and outliers.
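Steps 2 to 4 of the summary (fit the model, find the residuals and predicted values, then compare them) can be sketched outside SPSS as follows. This is an illustration only: the data are made up, the function `fit_least_squares` is a hypothetical helper that solves the normal equations by Gauss-Jordan elimination, and in practice a statistics package would do this work.

```python
def fit_least_squares(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by solving the normal equations.

    rows: list of predictor tuples, one per observation.
    Returns the coefficient list [b0, b1, ..., bk].
    """
    X = [[1.0, *row] for row in rows]  # prepend the intercept column
    p = len(X[0])
    # Build the augmented normal-equation matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical data (not from the lecture): two predictors and a response.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [8.1, 6.9, 15.2, 13.8, 22.1, 20.9]

coeffs = fit_least_squares(rows, y)
predicted = [coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))
             for row in rows]
residuals = [yi - fi for yi, fi in zip(y, predicted)]

# Pairs to plot: residuals (vertical axis) against predicted values
# (horizontal axis), looking for bends or changing spread.
for fitted, resid in zip(predicted, residuals):
    print(f"{fitted:8.3f} {resid:8.3f}")
```

The printed pairs are what would go on the residuals-versus-predicted scatterplot described in step 4.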

20
Testing the Model
There are several hypothesis tests in multiple regression. Each is concerned with whether the underlying parameters (slopes and intercept) are actually zero.
The hypothesis for slope coefficients: $H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ versus $H_A$: at least one $\beta_j \neq 0$
Test the hypothesis with an F-test (a generalization of the t-test to more than one predictor).

21
Testing the Model
The F-distribution has two degrees of freedom:
k, where k is the number of predictors
n – k – 1, where n is the number of observations
The F-test is one-sided: bigger F-values mean smaller P-values.
If the null hypothesis is true, then F will be near 1.
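For reference, the F-statistic can be written as a ratio of two mean squares, each a sum of squares divided by one of these degrees of freedom. This is the standard ANOVA form rather than something shown on the slide; SSR and SSE are used here as labels for the regression and residual sums of squares.

```latex
F = \frac{MS_{\text{Regression}}}{MS_{\text{Residual}}}
  = \frac{SSR / k}{SSE / (n - k - 1)}
```

In the Ticket Prices output, k = 3 and n – k – 1 = 74, which implies n = 78 observations.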

22
Testing the Model
If a multiple regression F-test leads to a rejection of the null hypothesis, then check the t-test statistic for each coefficient: $t_{n-k-1} = \frac{b_j - 0}{SE(b_j)}$
Note that the degrees of freedom for the t-test is n – k – 1.
Confidence interval: $b_j \pm t^{*}_{n-k-1} \times SE(b_j)$
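The t-ratio and confidence interval for each slope can be worked through with the Ticket Prices coefficient table. The sketch below assumes a critical value of about 1.993 for a 95% interval with 74 degrees of freedom, read from a t-table rather than computed; the dictionary layout and names are illustrative.

```python
# Coefficient and standard error pairs copied from the Ticket Prices table.
coefficients = {
    "Paid Attend": (0.076, 0.0006),
    "# Shows": (0.0070, 0.0044),
    "Average Ticket Price": (0.24, 0.0039),
}

# Assumption: approximate two-sided 95% critical value of t with 74 degrees
# of freedom, taken from a t-table rather than computed here.
T_CRITICAL = 1.993

results = {}
for name, (coeff, se) in coefficients.items():
    t_ratio = coeff / se                 # tests H0: the slope is zero
    half_width = T_CRITICAL * se         # margin of error for the interval
    results[name] = (t_ratio, coeff - half_width, coeff + half_width)
    print(f"{name}: t = {t_ratio:.1f}, "
          f"95% CI = ({coeff - half_width:.4f}, {coeff + half_width:.4f})")
```

Under that assumed critical value, the interval for # Shows includes zero, which is consistent with its small t-ratio.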

23
Testing the Model
Tricky Parts of the t-tests:
SEs are harder to compute (let technology do it!)
The meaning of a coefficient depends on the other predictors in the model (as we saw in the Home Price example).
If we fail to reject the null hypothesis for a coefficient based on its t-test, it does not mean that $x_j$ has no linear relationship to y. Rather, it means that $x_j$ contributes nothing to modeling y after allowing for the other predictors.

24
Testing the Model
In Multiple Regression, it looks like each coefficient $b_j$ tells us the effect of its associated predictor, $x_j$.
BUT
The coefficient $b_j$ can be different from zero even when there is no correlation between y and $x_j$.
It is even possible that the multiple regression slope changes sign when a new variable enters the regression.

25
Example: More Ticket Prices
On a typical night about 15,000 people attend a concert at Newcastle Entertainment Centre, paying an average price of more than $75 per ticket. Data for most weeks of 2009-2011 are used, with the variables Paid Attendance (thousands), # Shows and Average Ticket Price ($) predicting Receipts ($million).
State the hypotheses, the test statistic and p-value, and draw a conclusion for an F-test for the overall model.
Dependent variable is: Receipts ($M)
R squared = 99.9%   R squared (adjusted) = 99.9%
s = 0.0931 with 74 degrees of freedom
Source       Sum of Squares   df   Mean Square   F-ratio   P-value
Regression   484.789           3   161.596       18634     < 0.0001
Residual     0.641736         74   0.008672

26
Example: More Ticket Prices
State the hypotheses for an F-test for the overall model.
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ versus $H_A$: at least one $\beta_j \neq 0$
State the test statistic and p-value.
The F-statistic is the F-ratio = 18634. The p-value is < 0.0001.
Draw a conclusion.
The p-value is small, so reject the null hypothesis. At least one of the predictors accounts for enough variation in y to be useful.

27
Example: More Ticket Prices
Since the F-ratio suggests that at least one variable is a useful predictor, determine which of the following variables contribute in the presence of the others.
Recall the variables Paid Attendance (thousands), # Shows and Average Ticket Price ($) used to predict Receipts ($million).
Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept              –18.320   0.3127      –58.6     0.0001
Paid Attend            0.076     0.0006      126.7     0.0001
# Shows                0.0070    0.0044      1.6       0.116
Average Ticket Price   0.24      0.0039      61.5      0.0001

28
Example: More Ticket Prices
Since the F-ratio suggests that at least one variable is a useful predictor, determine which of the following variables contribute in the presence of the others.
Paid Attendance (p = 0.0001) and Average Ticket Price (p = 0.0001) both contribute, even when all other variables are in the model.
# Shows, however, is not significant (p = 0.116) and should be removed from the model.
Variable               Coeff     SE(Coeff)   t-ratio   P-value
Intercept              –18.320   0.3127      –58.6     0.0001
Paid Attend            0.076     0.0006      126.7     0.0001
# Shows                0.0070    0.0044      1.6       0.116
Average Ticket Price   0.24      0.0039      61.5      0.0001

29
R² in Multiple Regression:
R² = fraction of the total variation in y accounted for by the model (all the predictor variables included)
Adding new predictor variables to a model never decreases R² and may increase it. But each added variable increases the model complexity, which may not be desirable.
Adjusted R² imposes a penalty on the correlation strength of larger models, depreciating their R² values to account for an undesired increase in complexity.
Adjusted R² permits a more equitable comparison between models of different sizes.
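The penalty can be made concrete with the usual adjusted R² formula, 1 – (1 – R²)(n – 1)/(n – k – 1). The slide does not print this formula, so treat the sketch below as an assumption about which adjustment is meant; the R² values and sample size in the comparison are invented for illustration.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical comparison (values are illustrative, not from the lecture data):
# a third predictor nudges R^2 up only slightly on n = 30 observations.
two_predictors = adjusted_r_squared(0.580, n=30, k=2)
three_predictors = adjusted_r_squared(0.582, n=30, k=3)

print(f"{two_predictors:.4f} vs {three_predictors:.4f}")
```

In this made-up case R² rises when the third predictor is added, but adjusted R² falls, which is the sense in which it allows a fairer comparison between models of different sizes.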

30
Multiple Regression in SPSS: Words
Choose Analyze, then Regression, then Linear
Select the Dependent Variable and use the > button to move it into the Dependent: box
Select the Independent Variables and use the > button to move them into the Independent(s): box
Click Statistics
Select Descriptives

31
Multiple Regression in SPSS: Visuals
[Screenshot with callouts for steps 1 to 3]

32
Multiple Regression in SPSS: Visuals
4. Select Variables
5. Use the > button to move variable into the Dependent: box
6. Use the > button to move variables into the Independent(s): box
7. Click Statistics
8. Select Descriptives

33
Multiple Regression in SPSS: Output
R Square: This tells us that 99.9% of the variation in Receipts can be explained by our linear regression model.
Note: R Square is the coefficient of multiple determination. It shows the strength of the association between the Dependent Variable (Y) and two or more Independent Variables (Xs) (from 0 to 1, usually reported as a percentage).
Adjusted R Square: R Square adjusted for the number of Independent Variables and the sample size.
Is the relationship significant? That is, is it strong enough to indicate there is also a relationship in the population?
P value = 0.000 < 0.05. Therefore, the relationship is significant.

34
Multiple Regression Output: Partial Regression Coefficients
These can be used to construct the regression equation for Receipts.
$\text{Receipts} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_k X_k$
Receipts = -18.320 + 0.076*PaidAttendance + 0.007*Shows + 0.238*AvgTicketPrice
If we know the values for the three predictors we can use the regression equation to predict the Receipts value.

35
Multiple Regression Output: Testing the significance of the Regression Coefficients
The t and Sig t values given in the Coefficients table tell us which partial regression coefficients (slopes) differ significantly from zero.
In this example the variables that contribute significantly (p-values < 0.05) are:
PaidAttendance: t = 120.751, p = 0.000
AvgTicketPrice: t = 61.014, p = 0.000

36
Model: Which predictors are significant?
In this example the variables that contribute significantly (p-values < 0.05) are:
PaidAttendance: t = 120.751, p = 0.000
AvgTicketPrice: t = 61.014, p = 0.000
The regression equation can therefore be rewritten as:
Receipts = -18.320 + 0.076*PaidAttendance + 0.238*AvgTicketPrice

37
Multiple Regression Output: Partial Regression Coefficients, Interpretation
The partial regression coefficient for AvgTicketPrice might be interpreted: If PaidAttendance is statistically controlled, an increase of $1 in AvgTicketPrice will increase the predicted Receipts value by 0.238 ($million).

38
Multiple Regression Output: Standardised Regression Coefficients
Useful in assessing the relative importance of the predictors and comparing predictors across samples.
These are coefficients that have been adjusted so that the y-intercept (constant) is zero and the standard deviation of each variable is 1.
The most important predictor in this model is PaidAttendance (Beta = 0.955).
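For reference, a standardised coefficient is related to the corresponding unstandardised slope by the ratio of standard deviations. The slide does not show this relationship; it is the standard definition, with $s_{x_j}$ and $s_y$ used here as labels for the sample standard deviations of predictor $x_j$ and of the response.

```latex
\beta_j = b_j \cdot \frac{s_{x_j}}{s_y}
```

Because each Beta is expressed in standard deviation units, the Betas within one model can be compared directly, which is why PaidAttendance is described as the most important predictor.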

39
Multiple Regression Interpretation
Multiple Regression analysis was undertaken to determine the factors that contribute to Receipts (in millions) of the Newcastle Entertainment Centre. Results indicated that the number of paid attendees (t = 120.751, p = 0.000) and the average ticket price (t = 61.014, p = 0.000) are significant predictors of this value. The most important predictor in the model was PaidAttendance (Beta = 0.955).
The regression model with the significant predictors is:
Receipts = -18.320 + 0.076*PaidAttendance + 0.238*AvgTicketPrice

40
Multiple Regression Interpretation
Receipts = -18.320 + 0.076*PaidAttendance + 0.238*AvgTicketPrice
If the average ticket price is statistically controlled (or fixed), an increase of 1,000 paying customers will increase the Receipts value by $76,000.
If the number of paid attendees is statistically controlled, an increase of $1 in the average ticket price will generate an average increase in Receipts of $238,000 (0.238 in units of $million).
This regression model explains 99.9% of the variation in the receipts generated for Newcastle Entertainment Centre. Therefore this is a good model, as nearly all of the variability in Receipts is explained by this model.
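Because Receipts are measured in $million, each coefficient has to be multiplied by 1,000,000 to be read in dollars. The short sketch below does that conversion for the two slopes in the reduced model; the variable names are illustrative.

```python
# Coefficients from the reduced model; Receipts are measured in $million.
B_PAID_ATTENDANCE = 0.076   # per 1,000 paying customers
B_AVG_TICKET_PRICE = 0.238  # per $1 of average ticket price

DOLLARS_PER_MILLION = 1_000_000

# Convert each slope from $million to dollars.
per_thousand_customers = B_PAID_ATTENDANCE * DOLLARS_PER_MILLION
per_dollar_of_price = B_AVG_TICKET_PRICE * DOLLARS_PER_MILLION

print(f"${per_thousand_customers:,.0f} per extra 1,000 customers")
print(f"${per_dollar_of_price:,.0f} per extra $1 of ticket price")
```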

41
Don't claim to hold everything else constant for a single individual. (For the predictors Age and Years of Education, it is impossible for an individual to get a year of education at constant age.)
Don't interpret regression causally. Statistics assesses correlation, not causality.
Be cautious about interpreting a regression as predictive. That is, be alert for combinations of predictor values that take you outside the ranges of these predictors.

42
Be careful when interpreting the signs of coefficients in a multiple regression. The sign of a variable can change depending on which other predictors are in or out of the model. The truth is more subtle and requires that we understand the multiple regression model.
If a coefficient's t-statistic is not significant, don't interpret it at all.
Don't fit a linear regression to data that aren't straight. Usually, we are satisfied when plots of y against the x's are straight enough.

43
Watch out for changing variance in the residuals. The most common check is a plot of the residuals against the predicted values. Make sure the errors are nearly normal. Watch out for high-influence points and outliers.

44
Review Fundamentals of Quantitative Analysis

45
Lecture Plan
Week 1: The Role, Collection and Presentation of Quantitative Data in the Business Decision Making Process
Week 2: Examining Data Characteristics: Descriptive Statistics and Data Screening
Week 3: Estimation and Hypothesis Testing
Week 4: Testing for Differences: One sample, Independent and Paired Sample t-tests and ANOVA
Week 5: Testing for Associations: Chi Square, Correlation and Simple Regression Analysis
Week 6: Multiple Regression Analysis

46
Lecture 1 Objectives:
1. Explain the use of quantitative techniques in business;
2. Discuss the role of quantitative data analysis in the Business Decision Making Process;
3. Explain different sources of quantitative data and how it is collected;
4. Define and describe different types of data;
5. Recognise the potential for using different methods of data presentation in business;
6. Outline the major alternative methods of data presentation;
7. Select between the major alternative methods; and
8. Describe the limitations of data presentation methods.

47
Lecture 2 Objectives:
1. Describe and display categorical data;
2. Generate and interpret frequency tables, bar charts and pie charts;
3. Generate and interpret histograms to display the distribution of a quantitative variable;
4. Describe the shape, centre and spread of a distribution;
5. Compute descriptive statistics and select between mean/median and standard deviation/interquartile range; and
6. Explain data screening and its purpose, and be able to assess a distribution for normality.

48
Lecture 3 Objectives:
1. Formulate a null and alternate hypothesis for a question of interest;
2. Explain what a test statistic is;
3. Explain p-values;
4. Describe the reasoning of hypothesis testing;
5. Determine and check assumptions for the sampling distribution model;
6. Compare p-values to a pre-determined significance level to decide whether to reject the null hypothesis;
7. Recognise the value of estimating and reporting the effect size; and
8. Explain Type I and Type II errors when testing hypotheses.

49
Lecture 4 Objectives:
1. Recognise when to use a one sample t-test, independent samples t-test, paired samples t-test and ANOVA;
2. Explain and check the assumptions and conditions for each test;
3. Run and interpret a 'One sample t-test' to show a sample mean is different from some hypothesised value;
4. Run and interpret an independent samples t-test to show the difference between two groups on one attribute;
5. Run and interpret a 'one-way ANOVA' to show the difference between more than two groups on one attribute; and
6. Run and interpret a paired samples t-test to show the difference between two attributes as assessed by one sample.

50
Lecture 5 Objectives:
1. Recognise when a chi-square test of independence is appropriate;
2. Check the assumptions and corresponding conditions for a chi-square test of independence;
3. Run and interpret a chi-square test of independence;
4. Produce and explain a scatter plot to display the relationship between two quantitative variables;
5. Interpret the association between two quantitative variables using a Pearson's correlation coefficient;
6. Model a linear relationship with a least squares regression model;
7. Explain and check the assumptions and conditions for inference about regression models; and
8. Examine the residuals from a linear model to assess the quality of the model.

51
Lecture 6 Objectives:
1. Explain and conduct multiple regression analysis in SPSS;
2. Interpret a multiple regression model; and
3. Check the assumptions and conditions of a multiple regression model.

52
Quantitative vs Qualitative Data
http://www.youtube.com/watch?v=ddx9PshVWXI&feature=related

53
End of Quantitative Model


© 2017 SlidePlayer.com Inc.

All rights reserved.
