Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter Thirteen McGraw-Hill/Irwin

Similar presentations


Presentation on theme: "Chapter Thirteen McGraw-Hill/Irwin"— Presentation transcript:

1 Chapter Thirteen McGraw-Hill/Irwin
© 2005 The McGraw-Hill Companies, Inc., All Rights Reserved.

2 Linear Regression and Correlation
Chapter Thirteen Linear Regression and Correlation GOALS When you have completed this chapter, you will be able to: ONE Draw a scatter diagram. TWO Understand and interpret the terms dependent variable and independent variable. THREE Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate. FOUR Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero. Goals

3 Linear Regression and Correlation
Chapter Thirteen continued Linear Regression and Correlation GOALS When you have completed this chapter, you will be able to: FIVE Calculate the least squares regression line and interpret the slope and intercept values. SIX Construct and interpret a confidence interval and prediction interval for the dependent variable. SEVEN Set up and interpret an ANOVA table. Goals

4 The Dependent Variable is the variable being predicted or estimated.
Correlation Analysis is a group of statistical techniques to measure the association between two variables. A Scatter Diagram is a chart that portrays the relationship between two variables. The Independent Variable provides the basis for estimation. It is the predictor variable. The Dependent Variable is the variable being predicted or estimated. Correlation Analysis

5 The Coefficient of Correlation, r
The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables. Also called Pearson’s r and Pearson’s product moment correlation coefficient. It requires interval or ratio-scaled data. It can range from -1.00 to 1.00. Values of or 1.00 indicate perfect and strong correlation. Negative values indicate an inverse relationship and positive values indicate a direct relationship. Values close to 0.0 indicate weak correlation. The Coefficient of Correlation, r

6 Perfect Negative Correlation
10 9 8 7 6 5 4 3 2 1 Y X Perfect Negative Correlation

7 Perfect Positive Correlation
10 9 8 7 6 5 4 3 2 1 Y X Perfect Positive Correlation

8 10 9 8 7 6 5 4 3 2 1 Y X Zero Correlation

9 Strong Positive Correlation
10 9 8 7 6 5 4 3 2 1 Y X Strong Positive Correlation

10 S(X – X)(Y – Y) r= (n-1)sxsy
We calculate the coefficient of correlation from the following formula. S(X – X)(Y – Y) (n-1)sxsy r= Formula for r

11 Coefficient of Determination
The coefficient of determination (r2) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X). It is the square of the coefficient of correlation. It ranges from 0 to 1. It does not give any information on the direction of the relationship between the variables. Coefficient of Determination

12 Dan Ireland, the student body president at Toledo State University, is concerned about the cost to students of textbooks. He believes there is a relationship between the number of pages in the text and the selling price of the book. To provide insight into the problem he selects a sample of eight textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient. Example 1

13 Introduction to History 500 84 Basic Algebra 700 75
Book Page Price($) Introduction to History Basic Algebra Introduction to Psychology Introduction to Sociology Business Management Introduction to Biology Fundamentals of Jazz Principles of Nursing Example 1 continued

14 Example 1 continued

15 E X C L Example 1 continued

16 S(X – X)(Y – Y) Example 1 continued

17 = .657 r2 = .6572 = .432 S(X – X)(Y – Y) r = (n-1)sxsy 7800
7(138.87)(12.21) = = .657 r2 = = .432 The correlation between the number of pages and the selling price of the book is This indicates a moderate association between the variable. Example 1 continued

18 T-test of significance of r
Did a computed r come from a population of paired observations with zero correlation? Ho: r = 0 (The correlation in the population is zero.) H1: r ≠ 0 (The correlation in the population is different from zero. t test for the coefficient of correlation r  n- 2  1- r2 t = With n-2 d.f. T-test of significance of r

19 H0: the correlation in the population is zero.
Computed r = Test the hypothesis that there is no correlation in the population. Use a .02 significance level. Step 1 H0: the correlation in the population is zero. H1:The correlation in the population is not zero. Step 2 Significance level is .02. Step 4 H0 is rejected if t>3.143 or if t<-3.143 or if p < There are 6 degrees of freedom, found by n – 2 = 8 – 2 = 6. Step 3 The statistic to use follows the t distribution.

20 Find the value of the test statistic.
Step 5 Find the value of the test statistic. r  n- 2  1- r2 .657  8 – 2 t = = = 2.135 p(t > 2.135) = .077 H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance. Example 1 continued

21 Both variables must be at least interval scale.
In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y). Both variables must be at least interval scale. The relationship between the variables is linear. The least squares criterion is used to determine the equation. That is the term S(Y – Y’)2 is minimized. Regression Analysis

22 The regression equation is Y’= a + bX
where Y’ is the average predicted value of Y for any X. a is the Y-intercept. It is the estimated Y value when X=0 b is the slope of the line, or the average change in Y’ for each change of one unit in X The least squares principle is used to obtain a and b. Regression Analysis

23 The least squares principle is used to obtain a and b
The least squares principle is used to obtain a and b. The equations to determine a and b are: sy sx b = r a = Y – bX Regression Analysis

24 Develop a regression equation for the information given in example 1 that can be used to estimate the selling price based on the number of pages. sy sx b = r 12.21 138.87 = (.657) = .0578 a = Y – bX = *625 = 43.39 Example 1 revisited

25 The regression equation is: Y’ = 43.39 + .0578X
The sign of the b value and the sign of r will always be the same. The slope of the line is Each addition page costs about a nickel. The equation crosses the Y-axis at $ A book with no pages would cost $43.39. Example 1 revisited

26 The estimated selling price of an 800 page book is $89.61, found by
We can use the regression equation to estimate values of Y. The estimated selling price of an 800 page book is $89.61, found by Price = $ (Number of Pages) = $ (800) = $89.61 Example 1 revisited

27 The Standard Error of Estimate
The Standard Error of Estimate measures the scatter, or dispersion, of the observed values around the line of regression The formula that is used to compute the standard error: The Standard Error of Estimate

28 Find the standard error of estimate for the problem involving the number of pages in a book and the selling price. = 9.944 Example 1 revisited

29 Assumptions Underlying Linear Regression
For each value of X, there is a group of Y values, and these Y values are normally distributed. The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values. The means of these normal distributions of Y values all lie on the straight line of regression. The standard deviations of these normal distributions are the same.

30 Y’is the predicted value for any selected X value
The confidence interval for the mean value of Y for a given value of X is given by: Y’is the predicted value for any selected X value X is an selected value of X X is the mean of the Xs n is the number of observations Sy.x is the standard error of the estimate t is the value of t at n-2 degrees of freedom Confidence Interval

31 For our earlier price estimate of $89
For our earlier price estimate of $89.61, the confidence interval, assuming a desired 95% confidence, is calculated as follows. Example 1

32 Y’ the predicted value, is $89.61 X is 800 pages
X is 625, the mean of the pages n is 8, the number of observations Sy.x is 9.944, the standard error of the estimate t is at 8-2 degrees of freedom and 95% confidence Example 1 revisited

33 The prediction interval for an individual value of Y for a given value of X

34 Summarizing The Results
The estimated selling price for a book with 800 pages is $89.61. The standard error of estimate is $9.94. The 95 percent confidence interval for all books with 800 pages is $ $ This means the limits are between $75.18 and $ The 95 percent prediction interval for a particular book with 800 pages is $ $ The means the limits are between $61.32 and $ These results appear in the following Minitab and Excel outputs. Example 1 revisited

35 M I N T A B Example 1 revisited Regression Analysis
The regression equation is Price = No of Pages Predictor Coef StDev T P Constant No of Pages S = R-Sq = 43.2% R-Sq(adj) = 33.7% Analysis of Variance Source DF SS MS F P Regression Error Total Fit StDev Fit % CI % PI ( , ) ( , ) M I N T A B Example 1 revisited

36 E X C L Example 1 revisited


Download ppt "Chapter Thirteen McGraw-Hill/Irwin"

Similar presentations


Ads by Google