Chapter 12 | Multiple Regression: Predicting One Factor from Several Others
Irwin/McGraw-Hill © Andrew F. Siegel, 1997
Multiple Regression
- Predicting a single Y variable from two or more X variables
- Describe and understand the relationship
  - Understand the effect of one X variable while holding the others fixed
- Forecast (predict) a new observation
  - Lets you use all available information (X variables) to find out about what you don't know (the Y variable for this new situation)
- Adjust and control a process, because the regression equation (you hope) tells you what would happen if you made a change
Input Data
- n cases (elementary units), k explanatory X variables
- The data form a table with one row per case (Case 1, Case 2, …, Case n) and one column for Y (the dependent variable to be explained) and for each of X1 (the first independent or explanatory variable) through Xk (the last independent or explanatory variable)
Results
- Intercept: a
  - Predicted value for Y when every X is 0
- Regression coefficients: b1, b2, …, bk
  - The effect of each X on Y, holding all other X variables constant
- Prediction equation or regression equation:
  (Predicted Y) = a + b1X1 + b2X2 + … + bkXk
  - The predicted Y, given the values for all X variables
- Prediction errors or residuals: (Actual Y) − (Predicted Y)
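As a sketch of how the prediction equation and residuals are computed in practice (illustrative only, not from the original slides; all numbers are hypothetical placeholders):

```python
import numpy as np

# Hypothetical fitted values: intercept a and coefficients b1..bk.
a = 10.0
b = np.array([2.0, -0.5, 0.3])

# Hypothetical data: two cases, k = 3 explanatory variables, actual Y.
X = np.array([[1.0, 4.0, 2.0],
              [3.0, 1.0, 5.0]])
y_actual = np.array([14.0, 17.0])

# Prediction equation: (Predicted Y) = a + b1*X1 + b2*X2 + ... + bk*Xk
y_pred = a + X @ b

# Residuals: (Actual Y) - (Predicted Y)
residuals = y_actual - y_pred
print(y_pred, residuals)   # [10.6 17. ] [3.4 0. ]
```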
Results (continued)
- Standard error of estimate: Se or S
  - Approximate size of errors made predicting Y
- Coefficient of determination: R²
  - Percentage of variability in Y explained by the X variables as a group
- F test: significant or not significant
  - Tests whether the X variables, as a group, can predict Y better than just randomly
Results (continued)
- t tests for individual regression coefficients
  - Significant or not significant, for each X variable
  - Tests whether a particular X variable has an effect on Y, holding the other X variables constant
  - Should be performed only if the F test is significant
- Standard errors of the regression coefficients (with n − k − 1 degrees of freedom)
  - Indicate the estimated sampling standard deviation of each regression coefficient
  - Used in the usual way to find confidence intervals and hypothesis tests for individual regression coefficients
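All of these results come from standard regression software. As one possible sketch (assumed, not part of the slides), statsmodels in Python reports each quantity named above; the data here are simulated, not the magazine data:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (hypothetical): n = 55 cases, k = 3 X variables.
rng = np.random.default_rng(0)
n, k = 55, 3
X = rng.normal(size=(n, k))
y = 4.0 + X @ np.array([3.0, -1.0, 0.5]) + rng.normal(scale=2.0, size=n)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)           # intercept a and coefficients b1..bk
print(model.bse)              # standard errors of the coefficients
print(np.sqrt(model.scale))   # standard error of estimate Se
print(model.rsquared)         # coefficient of determination R^2
print(model.fvalue, model.f_pvalue)  # F test for the regression
print(model.tvalues, model.pvalues)  # t tests for each coefficient
```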
Example: Magazine Ads (Input Data)
To predict the cost of ads from magazine characteristics (three of the n = 55 magazines shown; … marks values not shown):

                               Audubon    Better Homes   …    YM
Y:  Page Costs (color ad)      $25,315    …              …    …
X1: Audience (thousands)       1,645      34,797         …    3,109
X2: Percent Male               51.1       …              …    …
X3: Median Income              $38,787    $41,…          …    $…,696
Example: Prediction, Intercept a
Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
- Intercept a = $4,043
  - Essentially a base rate, representing the cost of advertising in a magazine that has no audience, no male readers, and zero income level
  - But there are no such magazines; the intercept a is merely there to help achieve the best predictions
Example: Coefficient b1
Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
- Regression coefficient b1 = 3.79
- All else equal:
  - The effect of Audience on Page Costs, while holding Percent Male and Median Income constant
  - The effect of Audience on Page Costs, adjusted for Percent Male and Median Income
- On average, Page Costs are estimated to be $3.79 higher for a magazine with one more (thousand) Audience, as compared to another magazine with the same Percent Male and Median Income
Example: Coefficient b2
Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
- Regression coefficient b2 = −124
- All else equal:
  - The effect of Percent Male on Page Costs, while holding Audience and Median Income constant
  - The effect of Percent Male on Page Costs, adjusted for Audience and Median Income
- On average, Page Costs are estimated to be $124 lower for a magazine with one more percentage point of male readers, as compared to another magazine with the same Audience and Median Income
- But don't believe it! We will see that it is not significant
Example: Coefficient b3
Predicted Page Costs = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
- Regression coefficient b3 = 0.903
- All else equal:
  - The effect of Median Income on Page Costs, while holding Audience and Percent Male constant
  - The effect of Median Income on Page Costs, adjusted for Audience and Percent Male
- On average, Page Costs are estimated to be $0.903 higher for a magazine with one more dollar of Median Income, as compared to another magazine with the same Audience and Percent Male
Example: Prediction and Residual
Predicted Page Costs for Audubon = a + b1X1 + b2X2 + b3X3
  = $4,043 + 3.79(Audience) − 124(Percent Male) + 0.903(Median Income)
  = $4,043 + 3.79(1,645) − 124(51.1) + 0.903(38,787)
  = $38,966
- Actual Page Costs are $25,315
- Residual = Actual − Predicted = $25,315 − $38,966 = −$13,651
- Audubon has Page Costs $13,651 lower than you would expect for a magazine with its characteristics (Audience, Percent Male, and Median Income)
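A short Python sketch (assumed, not from the slides) reproduces this arithmetic:

```python
# Reproduce the Audubon prediction and residual from the fitted equation.
a = 4043.0
b1, b2, b3 = 3.79, -124.0, 0.903

audience, pct_male, median_income = 1645.0, 51.1, 38787.0
predicted = a + b1 * audience + b2 * pct_male + b3 * median_income

actual = 25315.0
residual = actual - predicted
print(round(predicted), round(residual))   # 38966, -13651
```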
Example: Standard Error
- Standard error of estimate Se
  - Indicates the approximate size of the prediction errors
  - About how far are the Y values from their predictions?
- For the magazine data, Se = S = $21,578
  - Actual Page Costs are about $21,578 from their predictions for this group of magazines (using regression)
  - Compare to SY = $45,446: actual Page Costs are about $45,446 from their average (not using regression)
  - Using the regression equation to predict Page Costs (instead of simply using the average), the typical error is reduced from $45,446 to $21,578
Example: Coefficient of Determination
- Coefficient of determination R²
  - Indicates the percentage of the variation in Y that is explained by (or attributed to) all of the X variables
  - How well do the X variables explain Y?
- For the magazine data, R² = 0.787 = 78.7%
  - The X variables (Audience, Percent Male, and Median Income) taken together explain 78.7% of the variance of Page Costs
  - This leaves 100% − 78.7% = 21.3% of the variation in Page Costs unexplained
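Se and R² are tied together through sums of squares: SSE = Se²(n − k − 1), SST = SY²(n − 1), and R² = 1 − SSE/SST. The sketch below (assumed; it uses only the summary numbers quoted on these slides) checks that the magazine figures are consistent:

```python
# Check that Se = 21,578, SY = 45,446, n = 55, k = 3 imply R^2 ~ 78.7%.
n, k = 55, 3
Se, SY = 21578.0, 45446.0

SSE = Se**2 * (n - k - 1)   # sum of squared residuals
SST = SY**2 * (n - 1)       # total sum of squares about the mean
R2 = 1 - SSE / SST
print(round(100 * R2, 1))   # about 78.7
```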
Multiple Regression Linear Model
- Linear model for the population:
  Y = (α + β1X1 + β2X2 + … + βkXk) + ε
  = (population relationship) + randomness
- where ε has a normal distribution with mean 0 and constant standard deviation σ, and this randomness is independent from one case to another
- An assumption needed for statistical inference
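To make the population model concrete, here is a small simulation sketch (assumed, not from the slides) that generates data exactly as the model describes, with independent normal errors of constant standard deviation:

```python
import numpy as np

# Simulate Y = (alpha + beta1*X1 + ... + betak*Xk) + epsilon,
# with epsilon ~ Normal(0, sigma), independent across cases.
# All parameter values are hypothetical.
rng = np.random.default_rng(42)
n = 200
alpha, beta, sigma = 5.0, np.array([2.0, -1.5]), 3.0

X = rng.normal(size=(n, 2))
epsilon = rng.normal(loc=0.0, scale=sigma, size=n)  # constant std. dev.
Y = alpha + X @ beta + epsilon
```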
Population and Sample Quantities
- Intercept or constant: population α (parameter: fixed and unknown); sample a (estimator: random and known)
- Regression coefficients: population β1, β2, …, βk; sample b1, b2, …, bk
- Uncertainty in Y: population σ; sample S or Se
The F Test
- Is the regression significant? Do the X variables, taken together, explain a significant amount of the variation in Y?
- The null hypothesis claims that, in the population, the X variables do not help explain Y; all coefficients are 0:
  H0: β1 = β2 = … = βk = 0
- The research hypothesis claims that, in the population, at least one of the X variables does help explain Y:
  H1: at least one of β1, β2, …, βk ≠ 0
Performing the F Test
Three equivalent methods for performing the F test; they always give the same result:
- Use the p-value
  - If p < 0.05, then the test is significant
  - Same interpretation as p-values in Chapter 10
- Use the R² value
  - If R² is larger than the value in the R² table, then the result is significant
  - Do the X variables explain more than just randomness?
- Use the F statistic
  - If the F statistic is larger than the value in the F table, then the result is significant
Example: F Test
For the magazine data, the X variables (Audience, Percent Male, and Median Income) explain a very highly significant percentage of the variation in Page Costs:
- The p-value, listed as 0.000, is less than 0.001, and is therefore very highly significant
- The R² value, 78.7%, is greater than 27.1% (from the R² table at level 0.1% with n = 55 and k = 3), and is therefore very highly significant
- The F statistic, 62.84, is greater than the value (between … and 6.171) from the F table at level 0.1%, and is therefore very highly significant
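The F statistic can be computed directly from R², n, and k as F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below (assumed; scipy supplies the p-value) reproduces the magazine example's F statistic from the rounded R²:

```python
from scipy import stats

# F statistic from R^2, with k and n - k - 1 degrees of freedom.
n, k = 55, 3
R2 = 0.787

F = (R2 / k) / ((1 - R2) / (n - k - 1))   # about 62.8 (slides: 62.84,
                                          # from the unrounded R^2)
p_value = stats.f.sf(F, k, n - k - 1)     # far below 0.001
print(round(F, 2), p_value)
```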
t Tests
- A t test for each regression coefficient
  - To be used only if the F test is significant
  - If F is not significant, you should not look at the t tests
- Does the jth X variable have a significant effect on Y, holding the other X variables constant?
- Hypotheses are H0: βj = 0 and H1: βj ≠ 0
- Test using the confidence interval for βj (use the t table with n − k − 1 degrees of freedom); significant if 0 is not in the interval
- Or use the t statistic: compare it to the t table value with n − k − 1 degrees of freedom
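A sketch of both approaches in Python (assumed, not from the slides): the coefficient is b3 = 0.903 from the magazine example, but its standard error is not given on these slides, so the value below is an assumption chosen to match the reported t = 2.44:

```python
from scipy import stats

n, k = 55, 3
df = n - k - 1                       # degrees of freedom

b_j = 0.903                          # coefficient from the example
se_j = 0.37                          # ASSUMED standard error (not in slides)

t_stat = b_j / se_j                  # t statistic, about 2.44
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value

t_crit = stats.t.ppf(0.975, df)      # t table value, 95% confidence
ci = (b_j - t_crit * se_j, b_j + t_crit * se_j)
print(round(t_stat, 2), round(p_value, 3), ci)
# Significant at the 5% level if 0 is not in ci (equivalently, p < 0.05).
```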
Example: t Tests
- Testing b1, the coefficient for Audience
  - b1 = 3.79, t = 13.5, p < 0.001
  - Audience has a very highly significant effect on Page Costs, after adjusting for Percent Male and Median Income
- Testing b2, the coefficient for Percent Male
  - b2 = −124, t = −0.90, p > 0.05
  - Percent Male does not have a significant effect on Page Costs, after adjusting for Audience and Median Income
- Testing b3, the coefficient for Median Income
  - b3 = 0.903, t = 2.44, p < 0.05
  - Median Income has a significant effect on Page Costs, after adjusting for Audience and Percent Male
Comparing the X Variables
- Standardized regression coefficients
  - Indicate the relative importance of the information each X variable brings in addition to the others
  - Ordinary regression coefficients are in different units and cannot be compared without standardization
  - Defined as bj(Sj/SY) for the jth X variable, where Sj and SY are the standard deviations of Xj and Y
  - Compare the absolute values
- Correlation coefficients
  - Indicate the relative importance of the information each X variable brings without adjusting for the other X variables
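As a sketch of the standardization formula (assumed; the coefficients are from the magazine example, but the X standard deviations are hypothetical, since the slides do not list them):

```python
import numpy as np

# Standardized coefficient for each X: b_j * (S_j / S_Y).
b = np.array([3.79, -124.0, 0.903])     # b1..b3 from the example
S_x = np.array([12000.0, 9.5, 6500.0])  # ASSUMED std. devs. of X1..X3
S_y = 45446.0                           # std. dev. of Page Costs (slides)

std_b = b * S_x / S_y
print(np.abs(std_b))   # compare absolute values across the X variables
```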
Problems with Multiple Regression
- Multicollinearity
  - When some X variables are too similar to one another
  - The regression might do a good job of explaining and predicting Y, but the t tests might not be significant because no single X variable brings new information (see the sketch below)
- Variable selection
  - How to choose from a long list of X variables?
  - Too many: you waste the information in the data; too few: you risk ignoring useful predictive information
- Model misspecification
  - Perhaps the multiple regression linear model is wrong: unequal variability? nonlinearity? interaction?
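One common way to spot multicollinearity is to look at the correlations among the X variables; values near ±1 off the diagonal mean two X variables carry nearly the same information. A minimal sketch with hypothetical data:

```python
import numpy as np

# Hypothetical X variables where x2 is nearly a copy of x1.
rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.05, size=100)   # almost identical to x1
x3 = rng.normal(size=100)

X = np.column_stack([x1, x2, x3])
# Correlation matrix of the X variables (columns as variables):
print(np.corrcoef(X, rowvar=False).round(2))  # x1 and x2 correlate ~ 1
```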