Correlation and Regression


Sections 9-2 and 9-3: Correlation and Regression

Linear Correlation Coefficient r
Definition: The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

Formula for b0 and b1

$$ b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2} \quad \text{(y-intercept)} $$

$$ b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad \text{(slope)} $$

Encourage the use of calculators for these formulas. Most inexpensive non-graphics calculators will compute these two values after the data have been entered into the calculator.

Review Calculations
Data from the Garbage Project. Find the correlation and the regression equation (line of best fit).

x  Plastic (lb)   0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
y  Household      2      3      3      6      4      2      1      5

Using a calculator:
r = 0.842
b0 = 0.549
b1 = 1.48
$\hat{y} = 0.549 + 1.48x$
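The calculator results can be cross-checked with a short script. This is a minimal sketch that applies the summation formulas for r, b0 and b1 directly. The eight (x, y) pairs are the Garbage Project values as read from the flattened slide table, with repeated entries restored so that n = 8; that reading is an assumption, checked only by the fact that it reproduces the slide's r, b0 and b1.

```python
from math import sqrt

# Garbage Project pairs (assumed reading of the slide's flattened table, n = 8)
x = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # plastic discarded (lb)
y = [2, 3, 3, 6, 4, 2, 1, 5]                          # household size

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# Shared pieces of the three formulas
numerator = n * sum_xy - sum_x * sum_y
denom_x = n * sum_x2 - sum_x ** 2
denom_y = n * sum_y2 - sum_y ** 2

r = numerator / (sqrt(denom_x) * sqrt(denom_y))     # linear correlation coefficient
b1 = numerator / denom_x                            # slope
b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denom_x    # y-intercept

print(f"r = {r:.3f}, b0 = {b0:.3f}, b1 = {b1:.2f}")
```

The slope and the correlation coefficient share the same numerator, which is why they always have the same sign.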

Notes on correlation
r represents the linear correlation coefficient for a sample.
ρ (rho) represents the linear correlation coefficient for a population.
-1 ≤ r ≤ 1
r measures the strength of a linear relationship: -1 is perfect negative correlation and 1 is perfect positive correlation.

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.
Discuss how large r needs to be in order to indicate a significant linear correlation.

Formal Hypothesis Test
Two methods. Both methods use:
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)

Method 1: Test Statistic is t (follows format of earlier chapters)

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

Critical values: use Table A-3 with degrees of freedom = n - 2.
This is the first example where the degrees of freedom for Table A-3 differ from n - 1. Special note should be made of this.
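As an illustration of Method 1, the sketch below computes the test statistic with the standard formula t = r / sqrt((1 - r^2) / (n - 2)), applied to the Garbage Project result r = 0.842 with n = 8. The critical value 2.447 is the usual two-tailed Student t value for α = 0.05 and 6 degrees of freedom; it is hard-coded here as an assumption standing in for a Table A-3 lookup, and the function name is illustrative.

```python
from math import sqrt

def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Two-tailed critical value for alpha = 0.05, df = n - 2 = 6
# (assumed to match the Table A-3 entry; hard-coded, not looked up)
t_critical = 2.447

reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, critical value = ±{t_critical}, reject H0: {reject_h0}")
```

Both methods should lead to the same decision for the same data and significance level, since they test the same hypothesis.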

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: refer to Table A-6 (no degrees of freedom)
This method is preferred by some instructors because the calculations are easier.

TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

n     α = .05   α = .01
4     .950      .999
5     .878      .959
6     .811      .917
7     .754      .875
8     .707      .834
9     .666      .798
10    .632      .765
11    .602      .735
12    .576      .708
13    .553      .684
14    .532      .661
15    .514      .641
16    .497      .623
17    .482      .606
18    .468      .590
19    .456      .575
20    .444      .561
25    .396      .505
30    .361      .463
35    .335      .430
40    .312      .402
45    .294      .378
50    .279      .361
60    .254      .330
70    .236      .305
80    .220      .286
90    .207      .269
100   .196      .256

Is there a significant linear correlation?
Data from the Garbage Project: x = plastic (lb), y = household size
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842
Using Method 2 to solve this problem: the critical values are r = -0.707 and r = 0.707 (Table A-6 with n = 8 and α = 0.05).

Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
[Figure: number line for r from -1 to 1. Reject ρ = 0 for r < -0.707 and for r > 0.707; fail to reject ρ = 0 between the two critical values. The sample value r = 0.842 lies in the right-hand rejection region.]
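The Method 2 decision rule reduces to a table lookup and one comparison. The sketch below encodes only the first few rows of Table A-6 as transcribed on the slides; the dictionary layout and the function name are illustrative choices, not part of the original material.

```python
# First rows of Table A-6 (critical values of r), keyed by alpha, then by n
CRITICAL_R = {
    0.05: {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632},
    0.01: {4: 0.999, 5: 0.959, 6: 0.917, 7: 0.875, 8: 0.834, 9: 0.798, 10: 0.765},
}

def is_significant(r: float, n: int, alpha: float = 0.05) -> bool:
    """Return True when |r| exceeds the Table A-6 critical value.

    Raises KeyError if (alpha, n) is not in the partial table above.
    """
    critical = CRITICAL_R[alpha][n]
    return abs(r) > critical

# Garbage Project example: n = 8, r = 0.842
print(is_significant(0.842, 8))        # compares 0.842 with 0.707
print(is_significant(0.842, 8, 0.01))  # compares 0.842 with 0.834
```

Using the absolute value of r handles both tails at once, which matches the two critical values -0.707 and 0.707 in the example.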

Regression
Definition
Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$
Regression Equation: $\hat{y} = b_0 + b_1 x$
Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.

Notation for Regression Equation

                                      Population Parameter                        Sample Statistic
y-intercept of regression equation    $\beta_0$                                   $b_0$
Slope of regression equation          $\beta_1$                                   $b_1$
Equation of the regression line       $y = \beta_0 + \beta_1 x + \varepsilon$     $\hat{y} = b_0 + b_1 x$

Regression
Definition
Regression Equation: given a collection of paired data, the regression equation $\hat{y} = b_0 + b_1 x$ algebraically describes the relationship between the two variables.
Regression Line (line of best fit or least-squares line): the graph of the regression equation.

Assumptions & Observations
1. We are investigating only linear relationships.
2. For each x value, y is a random variable having a normal distribution.
3. There are many methods for determining normality.
4. The regression line goes through $(\bar{x}, \bar{y})$.
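The last observation, that the regression line passes through the point of sample means, can be checked numerically. The sketch below uses the four-point sample and the line y-hat = 5 + 4x from the residuals slide in this deck; the variable names are illustrative.

```python
# Four-point sample and least-squares line from the residuals slide
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]
b0, b1 = 5, 4  # y-hat = 5 + 4x

x_bar = sum(xs) / len(xs)  # sample mean of x
y_bar = sum(ys) / len(ys)  # sample mean of y

# Prediction at x-bar should equal y-bar for a least-squares line
y_hat_at_mean = b0 + b1 * x_bar
print(x_bar, y_bar, y_hat_at_mean)
```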

Guidelines for Using the Regression Equation
1. If there is no significant linear correlation, don’t use the regression equation to make predictions.
2. Stay within the scope of the available sample data when making predictions.
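One way to make these two guidelines concrete is to build them into a prediction helper as guard conditions. This is a hypothetical sketch: the function `predict` and its parameters are invented for illustration, the coefficients are the Garbage Project values b0 = 0.549 and b1 = 1.48, and the scope 0.27 to 3.05 is the smallest and largest x in that sample.

```python
def predict(x_new: float, b0: float, b1: float,
            x_min: float, x_max: float, significant: bool) -> float:
    """Apply y-hat = b0 + b1 * x only when both guidelines are satisfied."""
    if not significant:
        # Guideline 1: no significant linear correlation, no prediction from the line
        raise ValueError("no significant linear correlation; do not use the regression equation")
    if not (x_min <= x_new <= x_max):
        # Guideline 2: stay within the scope of the sample data
        raise ValueError(f"x = {x_new} is outside the sample range [{x_min}, {x_max}]")
    return b0 + b1 * x_new

# Predicted household size for 2.0 lb of discarded plastic
print(round(predict(2.0, 0.549, 1.48, 0.27, 3.05, significant=True), 3))
```

Raising an error is only one design choice; a helper could instead return a flag or a warning alongside the value.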

Definitions
Outlier: a point lying far away from the other data points.
Influential points: points that strongly affect the graph of the regression line.
The slope b1 in the regression equation represents the marginal change in y that occurs when x changes by one unit.

Residuals and the Least-Squares Property
Definitions
Residual (error): for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation.
Least-Squares Property: a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.

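Residuals and the Least-Squares Property
Sample data and residuals for the line $\hat{y} = 5 + 4x$:

x    y     $\hat{y}$    Residual $(y - \hat{y})$
1    4     9            -5
2    24    13           11
4    8     21           -13
5    32    25           7

[Figure: scatterplot of the four points with the line $\hat{y} = 5 + 4x$; each residual is the vertical distance from a point to the line.]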
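The residuals on this slide, and the least-squares property itself, can be checked with a few lines of code. This sketch recomputes the four residuals for y-hat = 5 + 4x and compares the sum of squared residuals with that of two nearby lines; the alternative lines are arbitrary choices made only for illustration.

```python
xs = [1, 2, 4, 5]
ys = [4, 24, 8, 32]

def sum_squared_residuals(b0: float, b1: float) -> float:
    """Sum of (y - y_hat)^2 for the line y_hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Residuals for the regression line y-hat = 5 + 4x
residuals = [y - (5 + 4 * x) for x, y in zip(xs, ys)]
print(residuals)                    # [-5, 11, -13, 7]
print(sum_squared_residuals(5, 4))  # 364

# Two arbitrary nearby lines give larger sums
print(sum_squared_residuals(6, 4))    # 368
print(sum_squared_residuals(5, 4.5))  # 375.5
```

Checking two alternatives does not prove minimality; it only illustrates what the least-squares property claims.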

Definitions
Total Deviation from the mean of the particular point (x, y): the vertical distance $y - \bar{y}$, which is the distance between the point (x, y) and the horizontal line passing through the sample mean $\bar{y}$.
Explained Deviation: the vertical distance $\hat{y} - \bar{y}$, which is the distance between the predicted y value and the horizontal line passing through the sample mean $\bar{y}$.
Unexplained Deviation: the vertical distance $y - \hat{y}$, which is the vertical distance between the point (x, y) and the regression line. (The distance $y - \hat{y}$ is also called a residual, as defined in Section 9-3.)

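Unexplained, Explained, and Total Deviation
[Figure: the regression line $\hat{y} = 5 + 4x$ and the horizontal line $\bar{y} = 17$, with three points marked at x = 5: the observed point (5, 32), the predicted point (5, 25) on the regression line, and the point (5, 17) on the mean line. Total deviation $(y - \bar{y})$ runs from (5, 17) to (5, 32); explained deviation $(\hat{y} - \bar{y})$ runs from (5, 17) to (5, 25); unexplained deviation $(y - \hat{y})$ runs from (5, 25) to (5, 32).]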

Σ(y - y) 2 = Σ (y - y) 2 + Σ (y - y) 2 (total deviation) = (explained deviation) + (unexplained deviation) (y - y) = (y - y) + (y - y) ^ ^ (total variation) = (explained variation) + (unexplained variation) Σ(y - y) 2 = Σ (y - y) 2 + Σ (y - y) 2 ^ ^