13 - 1 Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.


1 13 - 1 Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.

2 When you have completed this chapter, you will be able to:
1. Identify a relationship between variables on a scatter diagram.
2. Measure and interpret a degree of relationship by a coefficient of correlation.
3. Conduct a test of hypothesis about the coefficient of correlation in a population.
4. Identify the roles of dependent and independent variables, the concept of regression, and its distinction from the concept of correlation.

3 (continued)
5. Measure and interpret the strength of relationship between two variables through a regression line and the technique of least squares.
6. Conduct a test of hypothesis for a regression model and each coefficient of regression.
7. Conduct analysis of variance and calculate the coefficient of determination.
8. Estimate confidence and prediction intervals.

4 Terminology
Scatter Diagram: a chart that portrays the relationship between two variables.
Correlation Analysis: a group of statistical techniques used to measure the strength of the association between two variables.
Dependent Variable: the variable being predicted or estimated.
Independent Variable: the variable that provides the basis for estimation. It is the predictor variable.

5 The Coefficient of Correlation, r
… is a measure of the strength of the linear relationship between two variables
… requires interval or ratio-scaled data
… can range from -1.00 to 1.00
… values of -1.00 or 1.00 indicate perfect correlation, the strongest possible
… values close to 0.0 indicate weak correlation
… negative values indicate an inverse relationship and positive values indicate a direct relationship

6 Perfect Negative Correlation [scatter plot; X and Y axes each scaled 0 to 10]

7 Perfect Positive Correlation [scatter plot; X and Y axes each scaled 0 to 10]

8 Zero Correlation [scatter plot; X and Y axes each scaled 0 to 10]

9 Example: Strong Positive Correlation [scatter plot; X and Y axes each scaled 0 to 10]

10 Chart 13-6

11 Chart 13.4

12 How Income and Well-Being of Canadians are Related (1971-97)
Estimate r: r = 0.7415

13 Formula for Correlation Coefficient
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y} = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$
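The two forms of the formula on this slide should give the same value. A minimal Python sketch that checks this follows; the function names and the five data points are made up for illustration and are not from the slides.

```python
import math

def r_definitional(x, y):
    """r = sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((v - x_bar) ** 2 for v in x) / (n - 1))
    s_y = math.sqrt(sum((v - y_bar) ** 2 for v in y) / (n - 1))
    cross = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return cross / ((n - 1) * s_x * s_y)

def r_computational(x, y):
    """r from the raw sums: n, sum x, sum y, sum xy, sum x^2, sum y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, chosen only to exercise both functions.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(r_definitional(xs, ys), 4), round(r_computational(xs, ys), 4))  # 0.7746 0.7746
```

The raw-sums form is the one used in the hand calculations later in the chapter, because it needs only five column totals.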

14 Coefficient of Determination
… represented by $r^2$
… is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X)
… it is the square of the coefficient of correlation
… it ranges from 0 to 1
… it does not give any information on the direction of the relationship between the variables
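As a small worked illustration, take the value r = 0.7415 reported on slide 12 for income and well-being (the slide does not say which variable is treated as dependent, so the reading below is hedged accordingly):

$$ r^2 = (0.7415)^2 \approx 0.550 $$

Roughly 55% of the variation in one variable is accounted for by variation in the other. The squared value alone no longer shows that the relationship is positive.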

15 Correlation Coefficient
Dan Ireland, the student body president, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight (8) textbooks currently on sale in the bookstore. Draw a scatter diagram. Compute the correlation coefficient.

16 Correlation Coefficient: Data
Book                   Pages   Price ($)
Intro. to History        500       84
Basic Algebra            700       75
Intro. to Psych.         800       99
Intro. to Sociology      600       72
Bus. Mgmt.               400       69
Intro. to Biology        500       81
Fund. of Jazz            600       63
Intro. to Nursing        800       93

17 Scatter Diagram of Number of Pages and Selling Price of Text [plot: Pages on the horizontal axis, 400 to 800; Price ($) on the vertical axis, 60 to 100]

18 Scatter Diagram: Excel Printout

19 Correlation Coefficient
Book                   Pages (x)   Price (y)        xy         x^2      y^2
Intro. to History          500         84        42 000     250 000    7 056
Basic Algebra              700         75        52 500     490 000    5 625
Intro. to Psych.           800         99        79 200     640 000    9 801
Intro. to Sociology        600         72        43 200     360 000    5 184
Bus. Mgmt.                 400         69        27 600     160 000    4 761
Intro. to Biology          500         81        40 500     250 000    6 561
Fund. of Jazz              600         63        37 800     360 000    3 969
Intro. to Nursing          800         93        74 400     640 000    8 649
Total                    4 900        636       397 200   3 150 000   51 606
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} $$

20 Correlation Coefficient
Totals: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \frac{8(397\,200) - (4\,900)(636)}{\sqrt{\left[8(3\,150\,000) - (4\,900)^2\right]\left[8(51\,606) - (636)^2\right]}} = 0.614 $$
The correlation coefficient is 0.614. This indicates a moderate association between the variables.
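The hand calculation can be checked with a short Python sketch. It uses the eight (pages, price) pairs from the data slide; the variable names are illustrative only.

```python
import math

# Sample from the data slide: pages (x) and selling price in dollars (y).
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
sum_x = sum(pages)
sum_y = sum(price)
sum_xy = sum(x * y for x, y in zip(pages, price))
sum_x2 = sum(x * x for x in pages)
sum_y2 = sum(y * y for y in price)

# Raw-sums form of the correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 4900 636 397200 3150000 51606
print(round(r, 3))  # 0.614
```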

21 Let's test the hypothesis that there is no correlation in the population. Use a 0.02 significance level.
Step 1: State the null and alternate hypotheses. H0: ρ = 0, H1: ρ ≠ 0
Step 2: Select the level of significance. α = 0.02
Step 3: Identify the test statistic. The test uses t.
Step 4: State the decision rule. H0 is rejected if t > 3.143 or if t < -3.143. There are 6 df, found by n – 2 = 8 – 2 = 6.
Step 5: Compute the test statistic and make a decision.

22 Let's test the hypothesis that there is no correlation in the population. Use a 0.02 significance level. (continued)
Step 5: Compute the test statistic and make a decision.
$$ t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.614\sqrt{8 - 2}}{\sqrt{1 - (0.614)^2}} = 1.905 $$
H0 is not rejected. We cannot reject the hypothesis that there is no correlation in the population. The amount of association could be due to chance.
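The five steps can be sketched in Python as follows. The critical value 3.143 is taken from the slide as given rather than computed, and the variable names are illustrative.

```python
import math

r = 0.614        # sample correlation from the textbook example
n = 8            # number of textbooks sampled
t_crit = 3.143   # two-tailed critical value for alpha = 0.02 and 6 df, as stated on the slide

df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t) > t_crit

print(df, round(t, 3), reject_h0)  # 6 1.905 False
```

Because 1.905 lies between -3.143 and 3.143, the sketch reaches the same decision as the slide: H0 is not rejected.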

23 Regression Analysis
We use the independent variable (X) to estimate the dependent variable (Y).
… both variables must be at least interval scale
… the relationship between the variables is linear
… the least squares criterion is used to determine the equation, i.e. the term $\sum (y - \hat{y})^2$ is minimized

24 Regression Equation
$$ \hat{y} = a + bx $$
… $\hat{y}$ is the average predicted value of y for any x
… a is the Y-intercept; it is the estimated y value when x = 0
… b is the slope of the line, or the average change in $\hat{y}$ for each change of one unit in x
… the least squares principle is used to obtain a and b

25 Regression Equation
$$ \hat{y} = a + bx $$
$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} $$

26 Dan Ireland, the student body president, is concerned about the cost of textbooks to students. He believes there is a relationship between the number of pages in a text and the selling price of the book. To provide insight into the problem, he selects a sample of eight (8) textbooks currently on sale in the bookstore. Develop a regression equation that can be used to estimate the selling price based on the number of pages.

27 Regression Equation
Totals: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
$$ b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{8(397\,200) - (4\,900)(636)}{8(3\,150\,000) - (4\,900)^2} = 0.05143 $$
$$ a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{636}{8} - 0.05143\left(\frac{4\,900}{8}\right) = 48.0 $$
$$ \hat{y} = a + bx = 48.0 + 0.05143x $$
The slope suggests that each extra page adds about $0.05 to the price of a book; the y-intercept suggests that a book with 0 pages would cost $48.
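A minimal Python sketch of the same least squares calculation, working only from the column totals quoted on the slide. The function name `least_squares` is hypothetical.

```python
def least_squares(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for y_hat = a + b*x from the raw sums."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Totals for the eight textbooks.
a, b = least_squares(n=8, sum_x=4900, sum_y=636, sum_xy=397200, sum_x2=3150000)
print(round(a, 1), round(b, 5))  # 48.0 0.05143
```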

28 …continued
Find the estimated selling price of an 800 page book. Substituting 800 for x:
$$ \hat{y} = 48.0 + 0.05143x = 48.0 + 0.05143(800) = 89.14 $$
The estimated selling price of an 800 page book is $89.14.
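A short Python sketch of the substitution follows; `predict_price` is a hypothetical helper. It keeps the unrounded slope, because rounding the slope to 0.05 before substituting would give $88.00 rather than $89.14.

```python
def predict_price(pages, a=48.0, b=61200 / 1190000):
    """Estimated selling price (dollars) from y_hat = a + b*x.

    The default b is the unrounded slope, i.e. the ratio
    [8(397200) - (4900)(636)] / [8(3150000) - 4900**2] from the example.
    """
    return a + b * pages

print(round(predict_price(800), 2))   # 89.14
print(round(48.0 + 0.05 * 800, 2))    # 88.0, the result if the slope is rounded first
```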

29 Using Excel

30 Using Excel: Click on CHART WIZARD.

31 Using Excel: Click on XY (Scatter).

32 Using Excel: INPUT the DATA range. Click Next.

33 Using Excel: Complete INPUTTING of TITLES. Click Next. Click Finish.

34 Using Excel: To "format the axes scales", right mouse click on one of the axes and click on Format Axis. Complete INPUTTING of VALUES. Click OK.

35 Using Excel: To remove the Legend on the right side, right mouse click on it and click on Clear.

36 Using Excel: To add the Regression Line and equation to this scatter plot, right mouse click on one of the data points, scroll down to Add Trendline and click.

37 Using Excel: Choose Linear … then CLICK on OPTIONS TAB. Click OK.

38 Using Excel: Check EQUATION and R-squared Value. Click OK.

39 Using Excel: You can now interpret your results! Concerned about the y intercept?

40 Alternate Solution: Formatting the axes resulted in a distortion of the y-intercept.

41 Using Excel

42 Using Excel: Click on Tools. Click on DATA ANALYSIS.

43 Using Excel: Highlight REGRESSION. Click OK.

44 Using Excel: INPUT NEEDS. Click OK.

45 Using Excel

46 Using Excel: The regression equation is: y = -0.07x + 22.6

47 The Standard Error of Estimate
… this measures the scatter, or dispersion, of the observed values around the line of regression
The formulas that are used to compute the standard error are:
$$ s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} $$

48 The Standard Error of Estimate
Find the standard error of estimate for the problem involving the number of pages in a book and the selling price.
Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
$$ s_e = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}} = \sqrt{\frac{51\,606 - 48(636) - 0.05143(397\,200)}{8 - 2}} = 10.408 $$
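A minimal Python sketch of this calculation follows; `standard_error` is a hypothetical helper. It evaluates the formula twice: once with the slope rounded to 0.05143, which reproduces 10.408, and once with the unrounded slope, which gives the 10.41 reported in the MINITAB output later in the chapter.

```python
import math

n, sum_y, sum_xy, sum_y2 = 8, 636, 397200, 51606
a = 48.0
b_rounded = 0.05143            # slope as written in the hand calculation
b_exact = 61200 / 1190000      # unrounded least squares slope

def standard_error(a, b):
    """s_e = sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    sse = sum_y2 - a * sum_y - b * sum_xy
    return math.sqrt(sse / (n - 2))

print(round(standard_error(a, b_rounded), 3))  # 10.408
print(round(standard_error(a, b_exact), 2))    # 10.41
```

The shortcut formula is sensitive to rounding in b because it subtracts large, nearly equal quantities, which is why the two results differ slightly in the third decimal place.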

49 Assumptions Underlying Linear Regression
• For each value of x, there is a group of y values, and these y values are normally distributed.
• The means of these normal distributions of y values all lie on the straight line of regression.
• The standard deviations of these normal distributions are equal.
• The y values are statistically independent. This means that in the selection of a sample, the y values chosen for a particular x value do not depend on the y values for any other x values.

50 Confidence Interval
The confidence interval for the mean value of y for a given value of x is given by:
$$ \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
$$ 89.14 \pm 2.447(10.408)\sqrt{\frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 15.31 $$

51 Prediction Interval
The prediction interval for an individual value of y for a given value of x is given by:
$$ \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \frac{(\sum x)^2}{n}}} $$
Previously: Σx = 4 900, Σy = 636, Σxy = 397 200, Σx² = 3 150 000, Σy² = 51 606
$$ 89.14 \pm 2.447(10.408)\sqrt{1 + \frac{1}{8} + \frac{(800 - 612.5)^2}{3\,150\,000 - \frac{(4\,900)^2}{8}}} = 89.14 \pm 29.72 $$
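Both intervals can be reproduced with a short Python sketch. The inputs (fitted value 89.14, standard error 10.408, t value 2.447) are the figures used on the slides and are taken as given; the variable names, including `leverage`, are illustrative.

```python
import math

n = 8
sum_x, sum_x2 = 4900, 3150000
x0 = 800          # number of pages of interest
y_hat = 89.14     # fitted price at x0
s_e = 10.408      # standard error of estimate
t_crit = 2.447    # t value for 95% confidence and 6 df, as used on the slides

x_bar = sum_x / n                     # 612.5
s_xx = sum_x2 - sum_x ** 2 / n        # sum of squared deviations of x
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx

ci_half = t_crit * s_e * math.sqrt(leverage)      # mean of y at x0
pi_half = t_crit * s_e * math.sqrt(1 + leverage)  # individual y at x0

print(round(ci_half, 2), round(pi_half, 2))                  # 15.31 29.72
print(round(y_hat - ci_half, 2), round(y_hat + ci_half, 2))  # 73.83 104.45
print(round(y_hat - pi_half, 2), round(y_hat + pi_half, 2))  # 59.42 118.86
```

The only difference between the two intervals is the extra 1 under the square root, which accounts for the scatter of an individual book's price around the mean price.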

52 Summarizing the Results
The estimated selling price for a book with 800 pages is $89.14.
The standard error of estimate is $10.41.
The 95 percent confidence interval for the mean price of all books with 800 pages is $89.14 ± $15.31. This means the limits are between $73.83 and $104.45.
The 95 percent prediction interval for a particular book with 800 pages is $89.14 ± $29.72. This means the limits are between $59.42 and $118.86.
These results appear in the following MINITAB output.

53 Regression Analysis: Price versus Pages

The regression equation is
Price = 48.0 + 0.0514 Pages

Predictor      Coef   SE Coef      T      P
Constant      48.00     16.94   2.83  0.030
Pages       0.05143   0.02700   1.90  0.105

S = 10.41   R-Sq = 37.7%   R-Sq(adj) = 27.3%

Analysis of Variance
Source           DF      SS     MS     F      P
Regression        1   393.4  393.4  3.63  0.105
Residual Error    6   650.6  108.4
Total             7  1044.0

Predicted Values for New Observations
New Obs    Fit  SE Fit          95.0% CI          95.0% PI
1        89.14    6.26   (73.82, 104.46)   (59.41, 118.88)
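The analysis of variance figures in this output can be rebuilt from the raw data. The Python sketch below is one way to do it, assuming the eight (pages, price) pairs from the data slide; it does not compute the P values, which would need a t or F distribution function.

```python
pages = [500, 700, 800, 600, 400, 500, 600, 800]
price = [84, 75, 99, 72, 69, 81, 63, 93]

n = len(pages)
x_bar = sum(pages) / n
y_bar = sum(price) / n

s_xx = sum((x - x_bar) ** 2 for x in pages)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pages, price))

sst = sum((y - y_bar) ** 2 for y in price)  # total variation in price
b = s_xy / s_xx                             # least squares slope
ssr = b * s_xy                              # variation explained by the regression
sse = sst - ssr                             # residual (unexplained) variation

msr = ssr / 1                               # regression mean square, 1 df
mse = sse / (n - 2)                         # residual mean square, n - 2 df
f_stat = msr / mse

r_sq = ssr / sst
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)

print(round(ssr, 1), round(sse, 1), round(sst, 1))      # 393.4 650.6 1044.0
print(round(mse, 1), round(f_stat, 2))                  # 108.4 3.63
print(round(100 * r_sq, 1), round(100 * r_sq_adj, 1))   # 37.7 27.3
```

R-Sq here equals the square of the correlation coefficient found earlier (0.614² ≈ 0.377), which ties the regression output back to the correlation analysis.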

54 EXCEL output: Price vs. Pages

55 Test your learning: www.mcgrawhill.ca/college/lind
Click on Online Learning Centre for quizzes, extra content, data sets, searchable glossary, access to Statistics Canada's E-Stat data …and much more!

56 This completes Chapter 13

