2.2 Correlation Correlation measures the direction and strength of the linear relationship between two quantitative variables.

Idea: Given two quantitative variables, we would like to be able to associate some number to those variables which tells us “how close” the data is to forming a straight line. We will call this number the correlation coefficient. It is denoted by r.

Which graph has stronger correlation? We would like to be able to answer this mathematically rather than just appeal to the graphs.

The Formula The formula for the correlation coefficient is difficult to motivate, so take it for granted: given n pairs of data values (x_i, y_i), the correlation coefficient r is r = (1/(n−1)) Σ [(x_i − x̄)/s_x]·[(y_i − ȳ)/s_y], where x̄ and ȳ are the sample means and s_x and s_y are the sample standard deviations. For the two scatterplots just discussed, r = .636 in the first graph and r = .543 in the second.
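A minimal sketch of this computation in Python, using made-up (x, y) values since the slides' own scatterplot data are not reproduced in the transcript:

```python
# Sketch: correlation coefficient r straight from the definition
#   r = (1/(n-1)) * sum of (standardized x) * (standardized y)
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical explanatory values
y = [2.1, 3.9, 6.2, 8.1, 9.8]   # hypothetical response values

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
        for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))   # always lands between -1 and 1
```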

What does r tell us? Now that we have a paradigm case, we may discuss some properties of r, the correlation coefficient. r is always between -1 and 1 (inclusive); hence if your correlation coefficient falls outside of this range, something has gone awry. If r = 1 or r = -1, then all the points of the scatter diagram lie on the regression line. When r > 0, the slope of the regression line is positive (positive association). When r < 0, the slope of the regression line is negative (negative association). Thus the closer r is to -1 or 1, the stronger the linear relationship. If r = 0, then there is no linear relationship, and if r is close to 0, there is little to no linear relationship. Let’s draw a few examples on the board to illustrate.

Other Properties of r r is not a resistant measure: it is strongly influenced by outlying observations. r does not distinguish between the explanatory and the response variable; this is easily seen from the formula for r, which is symmetric in x and y.

2.3 Least-Squares Regression

The Questions Given a data set, does it seem to conform to some sort of pattern? In particular, can we find an equation that more or less “fits” the data? If so, this can be used to predict values. The easiest equation is a linear equation (a line), so this is what we concentrate on in this section. Here there is a distinction between explanatory and response variables.

Linear Equations Suppose Irving Oil charges a $40 flat rate to send someone out to a job and $6 for each hour they work on that job. What equation models the cost? If y is the total cost and x is the number of hours worked, then y = 40 + 6x. This is an example of a linear equation; that is, an equation of the form y = b₀ + b₁x, where b₀ and b₁ are fixed numbers. The number b₁ is the slope of the linear equation and b₀ is the y-intercept. When b₁ > 0, we say there is a positive linear relationship between x and y. When b₁ < 0, we say there is a negative linear relationship between x and y.

Tables and Graphs Let’s make a table for the linear equation we just found on the board. Now we’ll plot the points, draw the line through them, and identify the slope and y-intercept graphically. In this case, there is a perfect linear relationship between the x and y values.

Consider the following table of (x, y) data values. Draw a scatter plot of the data and a line that approximately fits the data. Use a simple method to write down an equation for the approximating line. Note that we can now make rudimentary predictions. How do we mathematically find an equation of such a line, and how do we find the best one?

Residuals We use the notation y to denote an observed value and ŷ (“y hat”) to denote an estimate of the observed value. It follows that the closer y is to ŷ, the better our estimate is. Hence, we define the residual (or error) of an estimate to be e = y − ŷ. Compute the residuals in the example on the board. It now becomes clear that coming up with the line of best fit is equivalent to minimizing the residuals in some way.

How to minimize What do we mean by “minimizing the residuals”? One idea is to add up all the residuals. But recall that when we were discussing variance, we ran into the problem of negatives cancelling with positives when we summed over all the differences; the case here is similar. So we agree instead to sum the squares of the residuals. This is the idea of the least-squares regression line of y on x, which is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
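To see the cancellation problem concretely, here is a small Python illustration with hypothetical data, where the candidate “fit” is simply the horizontal line through the mean of y: the raw residuals sum to zero even though the fit is poor, while the sum of squares exposes it.

```python
# Sketch: why summing raw residuals is a poor criterion.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

y_bar = sum(y) / len(y)
# Candidate "fit": the horizontal line y_hat = y_bar, which ignores x entirely.
residuals = [yi - y_bar for yi in y]

print(sum(residuals))                  # 0.0 -- positives and negatives cancel
print(sum(e ** 2 for e in residuals))  # 40.0 -- the squared criterion is not fooled
```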

Line of Best Fit Using technical machinery including calculus and magic, the line of best fit can be found as follows: suppose we have a scatter diagram with n points. The line of best fit or regression line has the form ŷ = b₀ + b₁x, where the slope is b₁ = r·(s_y/s_x) and the intercept is b₀ = ȳ − b₁·x̄. Here x̄ and ȳ are the means of the x and y values, s_x and s_y are their standard deviations, and r is the correlation coefficient.
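A minimal sketch of those two formulas in Python, again on made-up data; note that statistics.correlation and statistics.linear_regression require Python 3.10 or later:

```python
# Sketch: least-squares slope and intercept from r, the standard deviations,
# and the means of x and y.
import statistics as st

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data, as before
y = [2.1, 3.9, 6.2, 8.1, 9.8]

x_bar, y_bar = st.mean(x), st.mean(y)
s_x, s_y = st.stdev(x), st.stdev(y)
r = st.correlation(x, y)

b1 = r * s_y / s_x          # slope
b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")

# Cross-check against the library's own least-squares fit.
print(st.linear_regression(x, y))   # same slope and intercept
```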

Example Find the regression line for the data in the following (x, y) table, and write it in the form ŷ = b₀ + b₁x.

Correlation vs. Regression Recall that the correlation r ignores the distinction between explanatory and response variables, while regression does not. But r is in the formula for the regression line. It turns out that r² is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x. Let’s look at an example.
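One way to make that statement concrete, using the same hypothetical data as in the earlier sketches: r² agrees with 1 − SSE/SST, where SST is the total variation of the y values about their mean and SSE is the variation left unexplained about the regression line.

```python
# Sketch: r^2 equals the fraction of variation in y explained by the regression,
# i.e. 1 - SSE/SST  (Python 3.10+ for statistics.linear_regression).
import statistics as st

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data, as before
y = [2.1, 3.9, 6.2, 8.1, 9.8]

slope, intercept = st.linear_regression(x, y)
y_hat = [intercept + slope * xi for xi in x]

y_bar = st.mean(y)
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # variation left unexplained

print(1 - sse / sst)              # fraction of variation explained
print(st.correlation(x, y) ** 2)  # r^2 -- the same number
```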

Example Icicle growth data: a table of Time (min) and Length (cm).

r = .996 The straight-line relationship between the length of icicles and the time it takes them to grow that length explains about r² = (.996)² ≈ .992, or about 99.2%, of the vertical scatter in time.