Chapter 4. Correlation and Regression


Chapter 4

Correlation and Regression Correlation is a technique that measures the strength of the relationship between two continuous variables. For example, we could measure how strong the relationship is between people’s heights and their weights. A chart that defines a person’s “ideal” weight for a given height is constructed through the statistical technique called regression. Regression is a statistical technique that produces a model of the relationship (correlation) between two variables.

Linear Correlation Why would we want to know if there is a relationship between two variables? Possible correlation questions: Is there a relationship between a person’s income and intelligence? Is there a relationship between a country’s food supply and mortality rate? Is there a relationship between the average length of schooling for citizens in a country and the country’s life expectancy? Linear correlation will help us determine if there is a relationship, and how strong, or weak, that relationship is.

The Scatter Diagram Consider a sample of 12 randomly selected females attending Nassau Community College. We measure each female’s height and weight. Height and weight are the two continuous variables. We’ll label the height variable “x” and the weight variable “y.” For each female, we have a pair of numbers, height and weight, x and y. The pair of numbers can also be written as (x,y) which is called an ordered pair. The ordered pair (63, 123) would indicate that this student has height 63 inches and weight 123 pounds.

The Scatter Diagram A scatter diagram is a graph representing the ordered pairs of data on a set of axes. We start with two lines, a horizontal and a vertical line, to represent the two axes. The x-axis (horizontal) represents the x-values; these are the heights. The x-axis is labeled “height.” The y-axis (vertical) represents the y-values; these are the weights. The y-axis is labeled “weight.”

The Scatter Diagram We place a “dot” (a point) on the diagram to represent each ordered pair of measurements. For the first female, with the ordered pair (62, 123), we go to 62 on the x-axis (height), then up to 123 on the y-axis (weight).

The Scatter Diagram Do you think that the scatter diagram shows a relationship between a female’s height and her weight? Yes. Visual inspection of a scatter diagram can help to determine whether there is an apparent relationship (correlation) between the two variables and what type of relationship this is. There are 3 basic types of relationship we will encounter. Positive correlation Negative correlation No linear correlation

Positive correlation A positive correlation between two variables, x and y, occurs when high measurements for the x-variable tend to be associated with high measurements for the y-variable, and low measurements for the x-variable tend to be associated with low measurements for the y-variable. The female height/weight example is an example of positive correlation because high measurements of the x-variable (height) tend to be associated with high measurements of the y-variable (weight), and low measurements for height tend to be associated with low measurements for weight.

Positive correlation The appearance of positive correlation is one in which the points move up towards the right of the scatter diagram. If we approximate a line through the dots of the scatter diagram, we can see that they follow a straight-line path. A linear relationship is one whose graph forms a line.

Negative correlation A negative correlation between two variables, x and y, occurs when high measurements for the x-variable tend to be associated with low measurements for the y-variable, and low measurements for the x-variable tend to be associated with high measurements for the y-variable.

No Linear Correlation No linear correlation means there is no linear relationship between the two variables. That is, high and low measurements for the two variables are not associated in any predictable straight line pattern.
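One way to make the three cases concrete is to look at the sign of the sum of products of deviations from the means, which is the quantity that decides the sign of the correlation coefficient. The sketch below is a minimal illustration in Python; the function name and the sample numbers are hypothetical and are not data from this chapter.

```python
def direction_of_association(xs, ys):
    """Return the sum of (x - mean_x) * (y - mean_y) over all pairs.

    A positive sum means high x-values tend to go with high y-values
    (positive correlation); a negative sum means high x-values tend to
    go with low y-values (negative correlation); a sum of exactly zero
    corresponds to no linear correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


# Hypothetical heights (inches) and weights (pounds), for illustration only.
heights = [60, 62, 64, 66, 68]
weights = [105, 118, 122, 135, 150]
print(direction_of_association(heights, weights) > 0)  # True: positive association
```

The size of this sum depends on the units of measurement, so it only indicates direction. Measuring strength requires the correlation coefficient.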

Scatter Diagram on the TI 83/84 Put the x values into L1 on your calculator. (STAT->EDIT) Put the y-values into L2.

Scatter Diagram on the TI 83/84 Turn on STAT PLOT (2nd -> STAT PLOT) and make sure only one stat plot is on. Choose the scatter diagram from the Type menu. Xlist should be the list containing the x-values. Ylist should be the list containing the y-values.

Scatter Diagram on the TI 83/84 Clear out data from Y=. Press Zoom -> 9: Zoom Stat. Do these variables, x and y, have a positive correlation, negative correlation, or no linear correlation? Positive correlation.

Example 4.2 Use the sample data to construct a scatter diagram on your calculator. Indicate the type of correlation, if any exists.

Example 4.2 As we examine the scatter diagram from left to right, the pattern of the points is going in a downward direction. As the values for x increase, the values for y decrease. The scatter diagram indicates a negative correlation.

The Coefficient of Linear Correlation What type of correlation is shown in these two scatter diagrams? Both show negative correlations. What is the difference between them? The negative correlation for the scatter diagram on the right is stronger than that on the left. The more closely the points of the scatter diagram approximate a straight line, the stronger the linear correlation.

The Coefficient of Linear Correlation To measure how close the points on a scatter diagram come to forming a straight line, we use the formula for r. Here r is Pearson’s Correlation Coefficient, or just the correlation coefficient, and it measures the strength of a linear relationship between two variables for a sample. x represents the data values for the first variable, y represents the data values for the second variable, and n represents the number of pairs of data values.
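The slide's formula is not reproduced in this transcript. A standard computational form of Pearson's coefficient, written with the same symbols x, y, and n, is given below. It is offered as the usual textbook form, on the assumption that the slide showed an equivalent expression.

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;
          \sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
```

Each sum runs over all n pairs of data values.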

The Coefficient of Linear Correlation The values for r can range from -1 to 1. A value of r=1 represents the strongest positive linear correlation possible. It indicates a perfect positive linear correlation. This means that all points of the scatter diagram will lie on a straight line which is sloping upward from left to right. A value of r= -1 represents the strongest negative linear correlation possible. It indicates a perfect negative linear correlation. This means that all points of the scatter diagram will lie on a straight line which is sloping downward from left to right. A value of r=0 represents no linear correlation between the two variables.
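The extreme values of r can be checked numerically. The following is a minimal Python sketch of the standard computational formula for Pearson's r (assumed here to match the formula the slides refer to). The function name and the data points are hypothetical and chosen only to produce the three boundary cases.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient via the computational formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical points, chosen only to illustrate the extreme cases.
xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [3, 5, 7, 9, 11]))   # points on an upward-sloping line
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # points on a downward-sloping line
print(pearson_r(xs, [2, 1, 4, 1, 2]))    # no linear trend in these values
```

Up to floating-point rounding, the first two calls give 1 and -1, and the third gives 0.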

The Coefficient of Linear Correlation [Figure: five scatter diagrams illustrating r = 1, r = -1, r = 0, r ≈ -.98, and r ≈ .97]

Correlation Coefficient on the Calculator You can use the formula for r as shown on pages 177 through 179, but an equivalent method is to use LinReg on the calculator. Before starting, you must turn “Diagnostics On.” Once turned on, you won’t have to adjust this setting again unless you reset your calculator. Press 2nd -> Catalog -> D -> DiagnosticOn -> Enter -> Enter.

Example 4.5 Use the sample data in the table to calculate the sample correlation coefficient, r. 1. Put the x-values in L1 and the y-values in L2. 2. Press STAT -> CALC -> 4: LinReg(ax+b) -> ENTER. 3. Enter the two lists separated by a comma: LinReg(ax+b) L1, L2. 4. Press ENTER.

Example 4.5 The correlation coefficient, r, is approximately -.97. Remember, the correlation coefficient is a number between -1 and 1 that represents how strong a linear relationship the two variables have. The closer the number is to 1, the stronger the positive linear relationship. The closer to -1, the stronger the negative linear relationship. r ≈ -.97 is close to -1 and represents a strong negative linear relationship.

Real World Application Use the data in the table to calculate the correlation coefficient, r, to measure the strength of the relationship between the two variables. The data in the table is from 2005 and was gathered from the Earth Trends web site:

Country | Average length of schooling (in years) | Life expectancy
Australia | |
Bolivia | |
Botswana | |
China | |
Ethiopia | 6 | 50.7
Iraq | |
Mexico | |
India | |
Romania | |
Rwanda | 9 | 43.4
South Africa | |
Spain | |
Sweden | |
United States | 16 | 77.4

Real World Application Rounded to 2 decimal places, the correlation coefficient is r ≈.76. What type of correlation is this? Because r is positive, this is a positive correlation. To get a better view of the data, look at the scatter diagram.

Real World Application We can see that the dots are moving upward as we look at this diagram from left to right, but it is not a perfect positive correlation because the dots do not form a straight line. Very rarely will real-world variables form a perfect linear relationship.

Linear Regression Analysis In the real world application, we saw that a positive linear correlation exists between a country’s average schooling length and life expectancy. What if we wanted to estimate a country’s life expectancy by simply knowing the average length of schooling? Knowing that there is a significant linear correlation from the sample data, we can create a line that “best fits” the sample data. Then we can use the line to estimate other values for countries not part of the sample. A linear model (equation of a line) can be developed to predict a value for the dependent variable (y) given a value for the independent variable (x).

Linear Regression Analysis For example, a strong positive correlation has been shown to exist between high school students’ standardized test results and success in the first year of college as measured by the students’ GPAs. By creating a linear model (equation of a line), we can predict the 1st-year college success of a student with a particular standardized test score.

Linear Regression Analysis Linear regression analysis provides us with a linear model (an equation) that can be used to predict the value of the y variable (college GPA) given the value of the x variable (standardized test scores). The predicted value for y may not be exactly correct, but it will be a “close” estimate. The line that is created by linear regression analysis is the “best fit” line: the line positioned as closely as possible to all the sample points. This line is called the regression line.

Linear Regression Analysis Regression line formula: y’ = ax + b, where y’ is the predicted value of y (the dependent variable) given the value of x (the independent variable). a and b are the regression coefficients (a is the slope of the line and b is its y-intercept), obtained from the sample data by formula.
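The slide's formulas for a and b are not reproduced in this transcript. The standard least-squares formulas, written for the line y’ = ax + b with n pairs of data values, are given below. They are offered as the usual textbook form, on the assumption that the slide showed equivalent expressions.

```latex
a = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {n\sum x^{2} - \left(\sum x\right)^{2}},
\qquad
b = \frac{\sum y - a\sum x}{n}
```

The numerator of a is the same as the numerator in the computational formula for r, which is why a and r always have the same sign.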

Real World Application The data in the table above is from 2005 and was gathered from the Earth Trends web site.

Real World Example We’ve shown that there is a positive correlation between the average length of schooling and life expectancy for a country’s population. The data pairs in the previous table represent data from a random sample of 12 countries. Use the sample to develop a regression line to predict the life expectancy given the average length of schooling of a country. Use this line to predict the life expectancy for a country whose average length of schooling is (a) 15 years, (b) 17 years. Graph the scatter plot and regression line together.

Real World Example Which variable, schooling years or life expectancy, do we want to predict? This is the dependent variable. We want to predict life expectancy. It makes sense to try to predict the life expectancy for a given length of schooling. Values to be predicted (dependent variable) = life expectancies = y Values given (independent variable) = length of schooling = x

Real World Example The formulas for finding a and b are lengthy. To avoid errors that can occur by doing the calculations by hand, we’ll use the calculator to find a and b. Put all the x-values into L1 and all the y-values into L2. Press STAT -> CALC -> 4: LinReg(ax+b) -> ENTER. Remember to always use the order LinReg x-list, y-list.
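For readers without a TI calculator, the same computation can be sketched in a few lines of Python. This is a minimal illustration of the standard least-squares formulas, not the calculator's internal method. The function names and the data pairs are hypothetical, and the data is not the schooling data from the table.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b for the line y' = a*x + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("slope is undefined when all x-values are equal")
    a = (n * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - a * sum_x) / n
    return a, b


def predict(a, b, x):
    """Predicted value y' for a given x on the regression line."""
    return a * x + b


# Hypothetical (x, y) pairs, for illustration only.
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 14, 17, 20]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))     # 1.9 1.6
print(round(predict(a, b, 7), 2))   # 14.9
```

As on the calculator, the x-list comes first and the y-list second; swapping them fits a different line.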

Real World Example The values for a and b that you get can be rounded to 2 decimal places. Here a = 2.79. The equation of the regression line, y’ = ax + b, becomes y’ = 2.79x + b, with b replaced by the rounded value from the calculator.

Real World Example For the regression line found, y’ = 2.79x + b, predict the life expectancy for a country if the average length of schooling is (a) 15 years, (b) 17 years. (a) The average length of schooling is the x-variable, so we substitute 15 for x in our equation: y’ = 2.79(15) + b ≈ 71.3. The predicted life expectancy for a country where the average length of schooling is 15 years is about 71.3 years.

Real World Example (b) The average length of schooling is the x-variable, so we substitute 17 for x in our equation: y’ = 2.79(17) + b. The predicted life expectancy for a country where the average length of schooling is 17 years is about 78.9 years.

Real World Example Use the regression line equation and the sample data pairs from the real world example to graph, on the same axes, the scatter diagram of the sample data and the regression line. Make sure all the settings for STAT PLOT are correct and are using the 2 lists used for this problem. You can check the scatter diagram first by pressing Zoom -> 9: Zoom Stat.

Real World Example Press “Y=” at the top left. Enter the regression equation, 2.79x + b, using the calculator’s value for b. Press Graph at the top right. Notice that the regression line “best fits” the sample data.

The Coefficient of Determination We have shown that there is a positive linear correlation between the average length of schooling and life expectancy of a country’s population. But there are also other factors that influence the life expectancy that exist outside of our data. The degree of influence that one variable (schooling) has on another variable (life expectancy) can be found with a number called the coefficient of determination.

The Coefficient of Determination In other words, how much of an influence does average schooling length have on life expectancy? The answer to this question will be a percentage, visually shown here as the part of the pie chart in blue. The coefficient of determination measures the proportion of the variance of the dependent variable y that can be accounted for by the variance of the independent variable x. Simply put, how much does y (life expectancy) depend on x (average length of schooling)? We find the coefficient of determination by squaring the coefficient of correlation, r.

The Coefficient of Determination In our example, r = 0.76. The coefficient of determination is r² = (0.76)² ≈ 0.58. Expressed as a percentage, r² ≈ 58%. To interpret the meaning of the coefficient of determination, we can form the following general explanation: ___% of the variability in (dependent variable y) can be accounted for by the variability in (independent variable x). 58% of the variability in a country’s life expectancy can be accounted for by the variability in the average length of schooling.
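The arithmetic above, and the "proportion of variability accounted for" reading of r², can be checked with a short Python sketch. The value r = 0.76 comes from the example. The function name and the data pairs in the second part are hypothetical and serve only to show that, for a least-squares line, 1 minus the ratio of unexplained to total variation equals r².

```python
r = 0.76                      # correlation coefficient from the example
r_squared = r ** 2            # coefficient of determination
print(round(r_squared, 2))    # 0.58
print(f"{r_squared:.0%}")     # 58%


def explained_fraction(xs, ys):
    """1 - (unexplained variation / total variation) for the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                 # slope of the regression line
    b = mean_y - a * mean_x       # intercept of the regression line
    ss_residual = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical (x, y) pairs, for illustration only.
xs = [2, 4, 6, 8, 10]
ys = [5, 9, 14, 17, 20]
print(round(explained_fraction(xs, ys), 3))   # 0.989
```

For this hypothetical data the correlation coefficient squared is 5776/5840 ≈ 0.989, which matches the explained fraction.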

The Coefficient of Determination The coefficient of determination, r² = 58%, suggests that there are other reasons why a country’s life expectancy is a certain amount. Since the coefficient of determination is 58%, we may conclude that the remaining 42% of variability is due to other unexplained factors. The unexplained amount is outside the scope of the problem. We can just accept that there are other factors that contribute to the variable life expectancy.

A note of caution regarding the interpretation of correlation results Two variables may have a significant linear relationship, but this doesn’t imply that there is a cause-and-effect relationship. In other words, the presence of one variable does not (necessarily) cause the presence of the other variable. For example, the number of storks nesting in various European towns in the early 1900s and the number of human babies born in the same towns during this period have a very high correlation. However, we can’t conclude that an increase in storks will cause an increase in babies (or vice versa). A significant linear correlation should not be interpreted to mean that a change in one variable caused a change in the other variable. Rather, changes in one variable are accompanied by changes in the other variable.