Chapter Thirteen McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved. Linear Regression and Correlation.


1 Chapter Thirteen McGraw-Hill/Irwin © 2006 The McGraw-Hill Companies, Inc., All Rights Reserved. Linear Regression and Correlation

2 The Business of Prediction
It is important to know whether relationships exist among variables. Examples: amount of gas and mileage; advertising budget and actual sales; population and precipitation. Recognizing and modeling the relationship between two variables can be useful for prediction, e.g., predicting how much sales revenue will result if a certain dollar amount is spent on advertising.
Dependent variable: the variable being predicted or estimated.
Independent variable: the variable that provides the basis for estimation. It is the predictor variable.

3 Correlation Analysis
Correlation analysis measures the association between two variables.
Scatter diagram: a chart that portrays the relationship between two variables. If you suspect that two variables are related, start by drawing a scatter diagram.

4 Using Excel to Create a Scatter Plot (Chart Wizard): example on pages 379-381

5 Correlation Coefficient
The correlation coefficient (Pearson r) measures the strength and direction of the linear relationship between two variables. It requires interval- or ratio-scaled data and ranges from -1.00 to +1.00. Positive values indicate a direct relationship; negative values indicate an inverse relationship. Values of -1.00 or +1.00 indicate perfect correlation. Values close to 0.0 indicate weak (or no) linear correlation.
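The coefficient can be computed directly from its definition, r = Sxy / sqrt(Sxx * Syy), where the S terms are sums of deviations from the means. Below is a minimal Python sketch using small hypothetical data (not the book's); the name `pearson_r` is chosen for illustration. On Python 3.10+, `statistics.correlation` should give the same value and can serve as a cross-check.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))    # 0.8: fairly strong, direct
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # -1.0: perfect, inverse
```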

6

7 Coefficient of Determination
The coefficient of determination (r²) is the square of the coefficient of correlation (r). It ranges from 0 to 1. It is the proportion of the total variation in the dependent variable (Y) that is explained, or accounted for, by the variation in the independent variable (X). For example, an r² of 0.80 means that 80% of the variation in miles driven is accounted for by the number of gallons in the tank; the remaining 20% is influenced by road conditions, number of passengers, etc. (more discussion later!).
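Because r² is the square of r, the arithmetic in the miles-driven example can also be run in reverse. A short worked calculation, assuming the 80% figure is the r² value:

```latex
r^2 = 0.80 \;\Rightarrow\; |r| = \sqrt{0.80} \approx 0.894,
\qquad 1 - r^2 = 0.20 \ \text{(share of variation not explained by } X\text{)}
```

Squaring discards the sign, so whether r is +0.894 or -0.894 has to come from the direction of the relationship (the scatter diagram or the sign of the slope).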

8 Correlation and Cause
A high correlation indicates a strong relationship between the variables, but it does not necessarily mean that one variable causes changes in the other.
Example: "Higher SAT scores lead to better college grades."
Another example: a study measured the number of TV sets per person (X) and the life expectancy (Y) for every country and found a high correlation. On this basis, it was concluded that countries with more TV sets have higher life expectancies. The correlation describes that association, but it does not show that adding TV sets would raise life expectancy; a third factor, such as national wealth, may drive both.

9 Regression Analysis
If there is a strong relationship (a large |r|) between two variables, one can estimate a linear model of the form
$$Y' = a + bX$$
where a is the estimate of α and b is the estimate of β. Y' is the predicted value and Y is the actual value for a given X.
a is the Y-intercept: the value of Y' when X = 0.
b is the slope of the line: the average change in Y' for each one-unit change in X.
The least squares principle is used to fit the line, i.e., $\sum (Y - Y')^2$ is minimized. a and b are calculated as
$$b = r\,\frac{s_y}{s_x}, \qquad a = \bar{Y} - b\bar{X}$$
where $s_y$ and $s_x$ are the sample standard deviations of Y and X. The regression line always passes through $(\bar{X}, \bar{Y})$. If r = ±1, every point lies exactly on the line, so b equals Δy/Δx between any two data points.
The error in prediction is denoted ε.

10 Error in Prediction
For each observation, the difference between the actual value (Y) and the predicted value (Y'), Y - Y', is the error in prediction.
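The errors can be listed observation by observation. A minimal sketch with hypothetical data whose least-squares line is Y' = 0.6 + 0.8X; `residuals` is an illustrative name. For a least-squares line with an intercept, the errors always sum to zero (up to rounding), which is why the method squares them before adding.

```python
def residuals(xs, ys, a, b):
    """Error in prediction, Y - Y', for each observation given Y' = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data; Y' = 0.6 + 0.8X is its least-squares line
errs = residuals([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], a=0.6, b=0.8)
print([round(e, 2) for e in errs])   # [0.6, -1.2, 1.0, -0.8, 0.4]
print(abs(sum(errs)) < 1e-9)         # True: positive and negative errors cancel
```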

11 Example (page 400)
The production supervisor of XYZ Inc. recorded the number of units produced by each of 5 employees during a week, along with how long each had been working for the company.
[Table: Years (X) and #ofUnits (Y) for the five employees]
The supervisor wants to know:
(i) whether there is a correlation between X and Y (i.e., r);
(ii) the equation of the regression line (i.e., Y' = a + bX);
(iii) how much of the variation in Y is explained by X (i.e., r²).

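12 Using Excel for Regression
1. What is the independent variable?
2. What is the dependent variable?
3. What is the regression equation?
4. Is the regression model a significant predictor of #Units?
5. Is Years a significant predictor of #Units?
6. If an employee had Years = 20, predict #Units.
7. Construct a 95% confidence interval around this prediction, using the standard error (SE).
Watch the screencam tutorial on the book CD to learn how to use Excel for regression; see also the lab handout.
Caution: a p-value such as 0.04 does not mean the equation will be correct in 96% of the cases. It means that a relationship at least this strong would arise only about 4% of the time if Years and #Units were actually unrelated.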
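Questions 6 and 7 can be sketched outside Excel with the usual textbook formulas for simple regression: SE = sqrt(SSE / (n - 2)), and a half-width of t * SE * sqrt(1/n + (X0 - Xbar)² / Σ(X - Xbar)²) for the mean of Y at X0, with an extra 1 under the root for a single new observation. A minimal Python sketch, assuming hypothetical data and n = 5 as in the example (so 3 degrees of freedom, two-sided 95% t = 3.182); `interval_at` and `T_95_DF3` are illustrative names. For other sample sizes, look up t in a table or use `scipy.stats.t.ppf(0.975, n - 2)`.

```python
import math

# Two-sided 95% t value for n - 2 = 3 degrees of freedom (n = 5)
T_95_DF3 = 3.182

def interval_at(xs, ys, x0, t_crit, individual=False):
    """Return (Y', low, high) at x0 for the least-squares line.

    individual=False: confidence interval for the mean of Y at x0.
    individual=True:  prediction interval for one new Y at x0 (wider).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))  # standard error of estimate (SE)
    y_hat = a + b * x0
    extra = 1.0 if individual else 0.0
    half_width = t_crit * se * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat, y_hat - half_width, y_hat + half_width

xs = [1, 2, 3, 4, 5]  # hypothetical data, not the book's table
ys = [2, 1, 4, 3, 5]
print(interval_at(xs, ys, 4, T_95_DF3))                   # mean of Y at X = 4
print(interval_at(xs, ys, 4, T_95_DF3, individual=True))  # one new Y at X = 4
```

Both intervals widen as X0 moves away from Xbar, so predictions far from the observed X values are the least precise.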

13 Calculating Total Variation (page 402)
Now we want to find out how much of the variation in #ofUnits is contributed by Years on Job.
[Table: Years (X) and #ofUnits (Y)]
The sample mean of #ofUnits is 6. Total variation is given by $\sum (Y - \bar{Y})^2$ [see Chapter 3, pages 78 & 80].
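Total variation is the sum of squared deviations from the mean, which is also (n - 1) times the sample variance from Chapter 3. A minimal sketch with hypothetical Y values (not the book's); `total_variation` is an illustrative name.

```python
import statistics

def total_variation(ys):
    """Total variation (SST): sum of squared deviations of Y from its mean."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)

ys = [2, 1, 4, 3, 5]  # hypothetical Y values, mean 3
print(total_variation(ys))                    # 10.0
print((len(ys) - 1) * statistics.variance(ys))  # 10.0, same quantity via sample variance
```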

14 Catching the 'Error'
When we came up with the regression equation Y' = 2 + 0.4X, we assumed that years on the job and production are related. Let us see how well this equation fits our data.
[Table: actual (Y) and predicted (Y') values]
The fit between Y' and Y is not perfect. Let us calculate the error variation, as shown on the next slide.
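Predicted values come from plugging X into the equation. A minimal sketch of the slide's equation Y' = 2 + 0.4X; `predict_units` is an illustrative name. Since a least-squares line passes through (Xbar, Ybar), and assuming the sample mean of 6 on slide 13 is the mean of #ofUnits, the equation implies Xbar = (6 - 2) / 0.4 = 10 years, and plugging that in returns 6.

```python
def predict_units(years, a=2.0, b=0.4):
    """Predicted weekly units, Y' = a + bX, with the slide's a = 2 and b = 0.4."""
    return a + b * years

print(predict_units(10))  # 6.0: the line passes through (Xbar, Ybar) = (10, 6)
print(predict_units(20))  # 10.0
```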

15

16 Calculating Unexplained Variation (Error in prediction) Page 401

17 Calculating the Coefficient of Determination
$$r^2 = \frac{\text{Explained variation}}{\text{Total variation}} \quad \text{(Equation 1)}$$
$$\text{Explained variation} = \text{Total variation} - \text{Unexplained variation}, \quad \text{i.e., } SSR = SST - SSE \quad \text{(Equation 2)}$$
Substituting Equation 2 into Equation 1:
$$r^2 = \frac{\text{Total variation} - \text{Unexplained variation}}{\text{Total variation}} \quad \text{(Equation 3)}$$
so that
$$1 - r^2 = \frac{\text{Unexplained variation}}{\text{Total variation}}$$
Substituting the unexplained (4) and total (20) variations from our example problem, we get $r^2 = 1 - 4/20 = 16/20 = 0.8$.* Thus, we say that 80% of the variation in weekly production is explained by years of experience on the job.
* Compare this with the computer output on the next slide.
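The decomposition can be checked numerically. A minimal Python sketch that computes SST, SSR, and SSE for a least-squares line on hypothetical data, confirms SST = SSR + SSE (this identity holds for a least-squares line with an intercept), and reproduces the slide's 1 - 4/20. `variation_breakdown` and `r_squared` are illustrative names.

```python
def variation_breakdown(xs, ys):
    """Return (SST, SSR, SSE) for the least-squares line fitted to xs, ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    preds = [a + b * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
    ssr = sum((p - y_bar) ** 2 for p in preds)          # explained variation
    return sst, ssr, sse

def r_squared(unexplained, total):
    """Equation 3 rearranged: r^2 = 1 - unexplained / total."""
    return 1 - unexplained / total

print(round(r_squared(4, 20), 4))  # 0.8, the slide's example

sst, ssr, sse = variation_breakdown([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # hypothetical
print(round(sst, 4), round(ssr, 4), round(sse, 4))          # 10.0 6.4 3.6
print(round(ssr / sst, 4), round(r_squared(sse, sst), 4))   # 0.64 0.64
```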

18 Interpreting Excel Regression Output
Make sure you know how to interpret the Excel output. (No kidding!)
[Figure: Excel regression output, with callouts labeled "p-value" and "Use this for CI"]
F, r² (R Square), and SE (standard error) tell whether the regression model is really useful for prediction.
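All three summary figures follow from the total and unexplained variations. A minimal Python sketch using the example's own totals (SST = 20, SSE = 4, n = 5, one independent variable); `regression_summary` is an illustrative name, and the mapping to Excel's "R Square", "Standard Error", and ANOVA "F" entries is assumed rather than taken from the output shown.

```python
import math

def regression_summary(sst, sse, n, k=1):
    """r^2, standard error of estimate, and F from SST, SSE, n, and k predictors."""
    ssr = sst - sse                       # explained variation
    r2 = ssr / sst
    df_error = n - k - 1                  # n - 2 for simple regression
    se = math.sqrt(sse / df_error)        # standard error of estimate
    f = (ssr / k) / (sse / df_error)      # F = MSR / MSE
    return r2, se, f

r2, se, f = regression_summary(sst=20, sse=4, n=5)
print(round(r2, 3), round(se, 3), round(f, 3))  # 0.8 1.155 12.0
```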

19 Multiple Regression
You can extend the idea of linear regression and make the dependent variable depend on more than one independent variable. For example, the price of a house can depend on square footage, number of bedrooms, baths, pool, garage, etc. [see page 503]. The general multiple regression equation is:
$$Y' = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
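Fitting the multiple regression equation is the same least-squares idea with one coefficient per independent variable. A minimal sketch using NumPy's `numpy.linalg.lstsq` on hypothetical house data (not the book's data on page 503). The prices are generated exactly from 50 + 0.1*SqFt + 10*Bedrooms (in $000) so that the recovered coefficients can be checked; real data would leave residuals.

```python
import numpy as np

# Hypothetical houses: square feet and bedrooms
sqft = np.array([1500, 1800, 2400, 3000, 2100, 1200], dtype=float)
bedrooms = np.array([3, 3, 4, 5, 3, 2], dtype=float)
# Price in $000, generated from a known equation so the fit can be verified
price = 50 + 0.1 * sqft + 10 * bedrooms

# Design matrix: a column of ones (for a), then one column per X variable
X = np.column_stack([np.ones_like(sqft), sqft, bedrooms])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
a, b1, b2 = coef
print(f"Y' = {a:.2f} + {b1:.4f}*SqFt + {b2:.2f}*Bedrooms")

# Predict the price of a hypothetical 2,000 sq ft, 3-bedroom house
print(a + b1 * 2000 + b2 * 3)  # about 280 ($000)
```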

20 Practice!
1. What are the independent variables?
2. What is the dependent variable?
3. What is the regression equation?
4. Is the model a significant predictor of Price?
5. Is Bedrooms a significant predictor of Price?
6. Is Baths a significant predictor of Price?
7. If a house had 8 Bedrooms, predict Price.
8. Construct a 95% confidence interval around this prediction.

