1 Chapter 10 Correlation and Regression We deal with two variables, x and y. Main goal: Investigate how x and y are related, or correlated; how much they depend on each other.

2 Example: x is the height of a mother; y is the height of her daughter. Question: are the heights of daughters independent of the heights of their mothers, or is there a correlation between the heights of mothers and those of their daughters? If yes, how strong is it?

3 Example: This table includes a random sample of heights of mothers, fathers, and their daughters. Heights of mothers and their daughters in this sample seem to be strongly correlated… But heights of fathers and their daughters in this sample seem to be weakly correlated (if at all).

4 Section 10-2 Correlation between two variables (x and y)

5 Definition: A correlation exists between two variables when the values of one are somehow associated with the values of the other.

6 Key Concept: The linear correlation coefficient, r, is a numerical measure of the strength of the linear relationship between two variables, x and y, representing quantitative data. We then use that value to conclude that there is (or is not) a linear correlation between the two variables. Note: r always lies in the interval [-1, 1], i.e., -1 ≤ r ≤ 1.

7 Exploring the Data We can often see a relationship between two variables by constructing a scatterplot.

8 Scatterplots of Paired Data

11 Requirements: 1. The sample of paired (x, y) data is a random sample of quantitative data. 2. Visual examination of the scatterplot must confirm that the points approximate a straight-line pattern. 3. Outliers must be removed if they are known to be errors. (We will not do that in this course.)

12 Notation for the Linear Correlation Coefficient: n = number of pairs of sample data. Σ denotes the addition of the items indicated. Σx denotes the sum of all x-values. Σx² indicates that each x-value should be squared and then those squares added. (Σx)² indicates that the x-values should be added and then the total squared.

13 Notation for the Linear Correlation Coefficient: Σxy indicates that each x-value should first be multiplied by its corresponding y-value; after obtaining all such products, find their sum. r = linear correlation coefficient for sample data. ρ = linear correlation coefficient for population data, i.e. the linear correlation between the two variables in the population.

14 Formula: The linear correlation coefficient r measures the strength of the linear relationship between the paired values in a sample: r = [n(Σxy) − (Σx)(Σy)] / [√(n(Σx²) − (Σx)²) · √(n(Σy²) − (Σy)²)]. In practice, we should use computer software or a calculator to compute r.
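
The following is a minimal Python sketch of the formula above. The paired values are made-up stand-ins for illustration (not the textbook's Table 10-1 data); only the arithmetic of the formula is being shown.

```python
from math import sqrt

# Made-up paired sample values (not the textbook's Table 10-1 data)
x = [1.50, 2.00, 2.25, 2.50, 2.75]
y = [1.50, 2.00, 2.00, 2.25, 2.75]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# r = [n(Σxy) − (Σx)(Σy)] / [√(n(Σx²) − (Σx)²) · √(n(Σy²) − (Σy)²)]
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

print(round(r, 3))   # rounded to three decimal places, as recommended below
```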

15 Linear correlation by TI-83/84: Enter the x-values into list L1 and the y-values into list L2. Press STAT and select TESTS. Scroll down to LinRegTTest and press ENTER. Make sure that XList: L1 and YList: L2, choose β & ρ: ≠ 0, and press Calculate. Read r² = … and r = …; also read the P-value p = ….

16 Interpreting r Using Table A-6: If the absolute value of the computed value of r, denoted |r|, exceeds the value in Table A-6, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Note: In most cases we use the significance level α = 0.05 (the middle column of Table A-6).

17 Interpreting r Using the P-value computed by a calculator: If the P-value is ≤ α, conclude that there is a linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a linear correlation. Note: In most cases we use the significance level α = 0.05.
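
For reference, here is a short sketch of how the same r and two-sided P-value could be obtained in Python with SciPy's pearsonr, again using made-up paired values rather than the textbook data:

```python
from scipy.stats import pearsonr

# Made-up paired sample values, standing in for the textbook data
x = [1.50, 2.00, 2.25, 2.50, 2.75]
y = [1.50, 2.00, 2.00, 2.25, 2.75]

r, p_value = pearsonr(x, y)   # r and the two-sided P-value for H0: rho = 0

alpha = 0.05
if p_value <= alpha:
    print(f"r = {r:.3f}, P-value = {p_value:.3f}: conclude there is a linear correlation")
else:
    print(f"r = {r:.3f}, P-value = {p_value:.3f}: not sufficient evidence of a linear correlation")
```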

18 Caution Know that the methods of this section apply only to a linear correlation. If you conclude that there is no linear correlation, it is possible that there is some other association that is not linear.

19 Rounding the Linear Correlation Coefficient r: Round r to three decimal places so that it can be compared to critical values in Table A-6. Use a calculator or computer if possible.

20 Properties of the Linear Correlation Coefficient r: 1. −1 ≤ r ≤ 1. 2. If all values of either variable are converted to a different scale, the value of r does not change. 3. The value of r is not affected by the choice of x and y: interchange all x- and y-values and the value of r will not change. 4. r measures the strength of a linear relationship. 5. r is very sensitive to outliers; they can dramatically affect its value.
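
A quick way to convince yourself of properties 2 and 3 is to recompute r after rescaling one variable and after interchanging x and y. The sketch below does this with made-up values and SciPy's pearsonr:

```python
from scipy.stats import pearsonr

x = [1.50, 2.00, 2.25, 2.50, 2.75]   # made-up values
y = [1.50, 2.00, 2.00, 2.25, 2.75]

r_original, _ = pearsonr(x, y)
r_rescaled, _ = pearsonr([xi * 2.54 for xi in x], y)   # e.g. a change of units
r_swapped, _ = pearsonr(y, x)                          # interchange x and y

# All three values agree, illustrating properties 2 and 3
print(round(r_original, 3), round(r_rescaled, 3), round(r_swapped, 3))
```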

21 Example: The paired pizza/subway fare costs from Table 10-1 are shown here in a scatterplot. Find the value of the linear correlation coefficient r for the paired sample data.

22 Example - 1: Using software or a calculator, r is calculated automatically; for these paired pizza/subway fare data, r = 0.988.

23 Interpreting the Linear Correlation Coefficient r: Critical Values from Table A-6 and the Computed Value of r

24 Example - 2: Using a 0.05 significance level, interpret the value of r = 0.117 found using the 62 pairs of weights of discarded paper and glass listed in Data Set 22 in Appendix B. Is there sufficient evidence to support a claim of a linear correlation between the weights of discarded paper and glass?

25 Example: Using Table A-6 to Interpret r: If we refer to Table A-6 with n = 62 pairs of sample data, we obtain a critical value of approximately 0.254 (the table value for n = 60, the closest listed sample size) for α = 0.05. Because |0.117| does not exceed that value from Table A-6, we conclude that there is not sufficient evidence to support a claim of a linear correlation between weights of discarded paper and glass.
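
Critical values of r like those in Table A-6 can be reproduced (at least approximately) from the t distribution, using r = t / √(t² + n − 2) with t the two-tailed critical value on n − 2 degrees of freedom. The sketch below assumes α = 0.05, as in the example, and is only meant to show where such table values come from:

```python
from math import sqrt
from scipy.stats import t

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of r for n pairs, derived from the t distribution."""
    t_crit = t.ppf(1 - alpha / 2, df=n - 2)
    return t_crit / sqrt(t_crit ** 2 + n - 2)

print(round(critical_r(60), 3))   # about 0.254, the Table A-6 row used in the example
print(round(critical_r(62), 3))   # about 0.250 for the exact sample size n = 62
```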

26 Interpreting r: Explained Variation. The value of r² is the proportion of the variation in y that is explained by the linear relationship between x and y.

27 Example: Using the pizza/subway fare costs, we have found that the linear correlation coefficient is r = 0.988. What proportion of the variation in the subway fare can be explained by the variation in the costs of a slice of pizza? With r = 0.988, we get r² = 0.976. We conclude that 0.976 (or about 98%) of the variation in the cost of subway fares can be explained by the linear relationship between the costs of pizza and subway fares. This implies that about 2% of the variation in the cost of subway fares cannot be explained by the costs of pizza.

28 Common Errors Involving Correlation 1. Causation: It is wrong to conclude that correlation implies causality. 2. Linearity: There may be some relationship between x and y even when there is no linear correlation.

29 Caution Know that correlation does not imply causality. There may be correlation without causality.

30 Section 10-3 Regression

31 Regression: The typical equation of a straight line, y = mx + b, is expressed in the form ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope. The regression equation expresses a relationship between x (called the explanatory variable, predictor variable, or independent variable) and ŷ (called the response variable or dependent variable).

32 Definitions: Regression Equation: Given a collection of paired data, the regression equation ŷ = b₀ + b₁x algebraically describes the relationship between the two variables. Regression Line: The graph of the regression equation is called the regression line (or line of best fit, or least-squares line).

34 Notation for the Regression Equation: y-intercept of the regression equation: population parameter β₀, sample statistic b₀. Slope of the regression equation: population parameter β₁, sample statistic b₁. Equation of the regression line: population y = β₀ + β₁x, sample ŷ = b₀ + b₁x.

35 Requirements: 1. The sample of paired (x, y) data is a random sample of quantitative data. 2. Visual examination of the scatterplot shows that the points approximate a straight-line pattern. 3. Any outliers must be removed if they are known to be errors. Consider the effects of any outliers that are not known errors.

36 Rounding the y-intercept b₀ and the Slope b₁: Round to three significant digits. If you use the formulas from the book, do not round intermediate values.

37 Example: Refer to the sample data given in Table 10-1 in the Chapter Problem. Use technology to find the equation of the regression line in which the explanatory variable (or x variable) is the cost of a slice of pizza and the response variable (or y variable) is the corresponding cost of a subway fare. (CPI=Consumer Price Index, not used)

38 Example: The requirements are satisfied: simple random sample; the scatterplot approximates a straight line; no outliers. Here are results from four different technologies.

39 Example: Note: in the TI-83/84 output, a means b₀ and b means b₁. All of these technologies show the same regression equation ŷ = b₀ + b₁x, with the values of b₀ and b₁ read from the displayed output, where ŷ is the predicted cost of a subway fare and x is the cost of a slice of pizza. We should keep in mind that the regression equation is an estimate of the true regression equation.
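
As an alternative to the technologies shown, here is a hedged sketch of how b₀ and b₁ could be obtained in Python with NumPy's least-squares polynomial fit. The numbers are illustrative stand-ins, not the actual Table 10-1 costs, and predict_fare is a hypothetical helper name.

```python
import numpy as np

# Illustrative stand-in data: x = cost of a slice of pizza, y = subway fare
x = np.array([1.50, 2.00, 2.25, 2.50, 2.75])
y = np.array([1.50, 2.00, 2.00, 2.25, 2.75])

b1, b0 = np.polyfit(x, y, deg=1)   # degree-1 least-squares fit returns (slope, intercept)
print(f"y-hat = {b0:.3f} + {b1:.3f}x")

def predict_fare(pizza_cost):
    """Predicted subway fare (y-hat) for a given pizza cost; hypothetical helper."""
    return b0 + b1 * pizza_cost

print(round(predict_fare(2.00), 3))
```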

40 Example: Graph the regression equation (from the preceding Example) on the scatterplot of the pizza/subway fare data and examine the graph to subjectively determine how well the regression line fits the data.

41 Example: (graph of the regression line from the preceding example, drawn on the scatterplot of the pizza/subway fare data)

42 Using the Regression Equation for Predictions: 1. The predicted value of y is ŷ = b₀ + b₁x. 2. Use the regression equation for predictions only if the graph of the regression line on the scatterplot confirms that the regression line fits the points reasonably well. 3. Use the regression equation for predictions only if the linear correlation coefficient r indicates that there is a linear correlation between the two variables.

43 Using the Regression Equation for Predictions (continued): 4. Use the regression line for predictions only if the value of x does not go much beyond the scope of the available sample data. (Predicting too far beyond the scope of the available sample data is called extrapolation, and it could result in bad predictions.) 5. If the regression equation does not appear to be useful for making predictions, the best predicted value of a variable is its point estimate, which is its sample mean, ȳ.

44 Strategy for Predicting Values of Y

45 Using the Regression Equation for Predictions: If the regression equation is not a good model, the best predicted value of y is simply ȳ, the mean of the y-values. Remember, this strategy applies to linear patterns of points in a scatterplot.
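
Putting points 1 through 5 together, a small sketch of this prediction strategy might look like the following. The names best_predicted_y and r_significant are hypothetical, and the decision about the correlation is assumed to have been made already with Table A-6 or a P-value.

```python
import numpy as np

def best_predicted_y(x_new, x, y, b0, b1, r_significant):
    """Predict y for x_new; fall back to the sample mean when the model is not useful."""
    x = np.asarray(x)
    y = np.asarray(y)
    in_scope = x.min() <= x_new <= x.max()   # avoid extrapolation beyond the sample data
    if r_significant and in_scope:
        return b0 + b1 * x_new               # y-hat from the regression equation
    return y.mean()                          # otherwise the best prediction is y-bar

# Example call with made-up values (b0 and b1 from a previously fitted line)
print(best_predicted_y(2.10, [1.5, 2.0, 2.25, 2.5, 2.75],
                       [1.5, 2.0, 2.00, 2.25, 2.75],
                       b0=0.2, b1=0.9, r_significant=True))
```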

46 Definition: For a pair of sample x and y values, the residual is the difference between the observed sample value of y and the y-value that is predicted by using the regression equation. That is, residual = observed y − predicted y = y − ŷ.

47 Residuals

48 Definition: A straight line satisfies the least-squares property if the sum of the squares of the residuals is the smallest sum possible.
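
To see the least-squares property concretely, the sketch below (with made-up data) compares the sum of squared residuals of the fitted line with that of a slightly different line; any other line gives a sum at least as large.

```python
import numpy as np

x = np.array([1.50, 2.00, 2.25, 2.50, 2.75])
y = np.array([1.50, 2.00, 2.00, 2.25, 2.75])

b1, b0 = np.polyfit(x, y, deg=1)             # least-squares slope and intercept

def sum_of_squared_residuals(intercept, slope):
    residuals = y - (intercept + slope * x)  # observed y minus predicted y
    return float(np.sum(residuals ** 2))

print(sum_of_squared_residuals(b0, b1))          # smallest possible sum of squares
print(sum_of_squared_residuals(b0 + 0.1, b1))    # a different line gives a larger sum
```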