# Overview: 4.2 Introduction to Correlation and 4.3 Introduction to Regression


Scatterplots: A scatterplot summarizes the relationship between two quantitative variables that have been measured on the same element. It is a graph of points (x, y), each of which represents one observation from the data set. One of the variables is measured along the horizontal axis and is called the x variable. The other variable is measured along the vertical axis and is called the y variable.

Predictor Variable and Response Variable: The value of the x variable can be used to predict or estimate the value of the y variable. The x variable is referred to as the predictor variable, and the y variable is called the response variable.

Scatterplot Terminology: Note the terminology in the caption to Figure 4.2. When describing a scatterplot, always indicate the y variable first and use the term versus (vs.) or against the x variable. This terminology reinforces the notion that the y variable depends on the x variable.

FIGURE 4.2 Scatterplot of sales price versus square footage.

Positive relationship: As the x variable increases in value, the y variable also tends to increase. FIGURE 4.3 (a) Scatterplot of a positive relationship.

Negative relationship: As the x variable increases in value, the y variable tends to decrease. FIGURE 4.3 (b) Scatterplot of a negative relationship.

No apparent relationship: As the x variable increases in value, the y variable tends to remain unchanged. FIGURE 4.3 (c) Scatterplot of no apparent relationship.

4.2 Introduction to Correlation Objective: By the end of this section, I will be able to… 1) Calculate and interpret the value of the correlation coefficient.

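Correlation Coefficient r: Measures the strength and direction of the linear relationship between two variables.

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

Here $s_x$ is the sample standard deviation of the x data values, $s_y$ is the sample standard deviation of the y data values, $\bar{x}$ and $\bar{y}$ are the sample means, and n is the number of (x, y) pairs.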
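As a rough illustration of this definition, the sketch below computes r as the sum of the products of the deviations divided by (n - 1) times the two sample standard deviations. The helper name `correlation` and the five data pairs are made up for illustration; they are not the Table 4.11 temperatures.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r from the definitional formula (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of the products of the deviations from each mean.
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # stdev() is the sample standard deviation (it divides by n - 1).
    return sum_products / ((n - 1) * stdev(xs) * stdev(ys))


# Made-up illustrative data, not taken from the presentation.
low = [10, 20, 30, 40, 50]
high = [22, 29, 41, 52, 58]
r = correlation(low, high)
print(round(r, 4))
```

A perfectly linear increasing data set gives r = 1 and a perfectly linear decreasing one gives r = -1, which is a quick sanity check for any implementation.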

Example 4.5 - Calculating the correlation coefficient r: Find the value of the correlation coefficient r for the temperature data in Table 4.11. Table 4.11: High and low temperatures, in degrees Fahrenheit, of 10 American cities.

Interpreting the Correlation Coefficient r

1) Values of r close to 1 indicate a positive relationship between the two variables. The variables are said to be positively correlated. As x increases, y tends to increase as well.
2) Values of r close to -1 indicate a negative relationship between the two variables. The variables are said to be negatively correlated. As x increases, y tends to decrease.
3) Values of r near 0 indicate the lack of either a positive or negative linear relationship between the two variables. The variables are said to be uncorrelated. As x increases, y tends to neither increase nor decrease linearly.

Guidelines for Interpreting the Correlation Coefficient r: If the correlation coefficient between two variables is

- greater than 0.7, the variables are positively correlated;
- between 0.33 and 0.7, the variables are mildly positively correlated;
- between –0.33 and 0.33, the variables are not correlated;
- between –0.7 and –0.33, the variables are mildly negatively correlated;
- less than –0.7, the variables are negatively correlated.
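The guidelines above can be sketched as a small lookup function. The name `describe_correlation` is hypothetical, and the handling of the exact boundary values (0.33 and 0.7) is an assumption, since the guidelines do not say which category they fall into; here a boundary value goes to the weaker category.

```python
def describe_correlation(r):
    """Map r to the guideline labels (hypothetical helper).

    Assumption: values exactly at +/-0.33 or +/-0.7 fall into the weaker category.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r > 0.7:
        return "positively correlated"
    if r > 0.33:
        return "mildly positively correlated"
    if r >= -0.33:
        return "not correlated"
    if r >= -0.7:
        return "mildly negatively correlated"
    return "negatively correlated"


# r = 0.9761 is the value reported for the temperature data in Example 4.5.
print(describe_correlation(0.9761))  # positively correlated
```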

Example 4.6 - Interpreting the correlation coefficient Interpret the correlation coefficient found in Example 4.5.

Example 4.6 continued. Solution: In Example 4.5, we found the correlation coefficient for the relationship between high and low temperature to be r = 0.9761. Because r = 0.9761 is very close to 1, we would say that high and low temperatures for these 10 American cities are strongly positively correlated. As low temperature increases, high temperature also tends to increase.

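Equivalent Computational Formula for Calculating the Correlation Coefficient r

$$r = \frac{\sum xy - \left(\sum x \sum y\right)/n}{\sqrt{\left[\sum x^2 - \left(\sum x\right)^2/n\right]\left[\sum y^2 - \left(\sum y\right)^2/n\right]}}$$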
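A minimal sketch of one common computational form follows. It assumes the formula r = [Σxy - (Σx)(Σy)/n] / sqrt([Σx² - (Σx)²/n][Σy² - (Σy)²/n]), which needs only running sums and no deviations from the means. The function name and the data are made up for illustration; they are not the Glen Ellyn lot data from Table 4.6.

```python
from math import sqrt


def correlation_computational(xs, ys):
    """Correlation coefficient r from sums only (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Made-up illustrative data, not taken from the presentation.
xs = [10, 20, 30, 40, 50]
ys = [22, 29, 41, 52, 58]
r_comp = correlation_computational(xs, ys)
print(round(r_comp, 4))
```

Because the two forms are algebraically equivalent, the result should match the definitional formula up to floating-point rounding.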

Example 4.7 Use the computational formula to calculate the correlation coefficient r for the relationship between square footage and sales price of the eight home lots for sale in Glen Ellyn from Table 4.6 (Example 4.3 in Section 4.1).

Summary Section 4.2 introduces the correlation coefficient r, a measure of the strength of linear association between two numeric variables. Values of r close to 1 indicate that the variables are positively correlated. Values of r close to –1 indicate that the variables are negatively correlated. Values of r close to 0 indicate that the variables are not correlated.

4.3 Introduction to Regression Objectives: By the end of this section, I will be able to… 1) Calculate the value and understand the meaning of the slope and the y intercept of the regression line. 2) Predict values of y for given values of x.

Equation of the Regression Line: The regression line approximates the relationship between x and y. The equation is $\hat{y} = b_1 x + b_0$, where the regression coefficients are the slope, $b_1$, and the y intercept, $b_0$. The “hat” over the y (pronounced “y-hat”) indicates that this is an estimate of y and not necessarily an actual value of y.

Example 4.8 - Calculating the regression coefficients $b_0$ and $b_1$: Find the value of the regression coefficients $b_0$ and $b_1$ for the temperature data in Table 4.11. Table 4.11: High and low temperatures, in degrees Fahrenheit, of 10 American cities.
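The intermediate steps of this example did not survive in the transcript. As a hedged sketch, the code below assumes the usual least-squares relations, slope = r · s_y / s_x and intercept = ȳ - slope · x̄, and applies them to made-up data (not the Table 4.11 temperatures). The function name `regression_coefficients` is hypothetical.

```python
from statistics import mean, stdev


def regression_coefficients(xs, ys):
    """Return (b0, b1): the y intercept and slope of the regression line."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    # Correlation coefficient r from the definitional formula.
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x       # slope (assumed relation: b1 = r * s_y / s_x)
    b0 = y_bar - b1 * x_bar  # y intercept (line passes through the point of means)
    return b0, b1


# Made-up illustrative data, not taken from the presentation.
b0, b1 = regression_coefficients([10, 20, 30, 40, 50], [22, 29, 41, 52, 58])
print(round(b0, 4), round(b1, 4))
```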

Example 4.8 continued. Step 4: Thus, the equation of the regression line for the temperature data is $\hat{y} = 0.9865x + 10.0533$.

Example 4.8 continued Since y and x represent high and low temperatures, respectively, this equation is read as follows: “The estimated high temperature for an American city is 10.0533 degrees Fahrenheit plus 0.9865 times the low temperature for that city.”

Using the Regression Equation to Make Predictions: For any particular value of x, the predicted value for y lies on the regression line. Example 4.11: Suppose we are considering moving to a city that has a low temperature of 47 degrees Fahrenheit (°F) on this particular winter’s day. What would the estimated high temperature be for this city?

Example 4.11 continued. Solution: Plug the value of 47°F for the variable low into the regression equation from Example 4.8: $\hat{y} = 0.9865(47) + 10.0533 = 56.4188$. We would say: “The estimated high temperature for an American city with a low of 47°F is 56.4188°F.”
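The prediction in this example can be reproduced with a few lines of code. The coefficients 10.0533 and 0.9865 come from Example 4.8; the constant and function names are hypothetical.

```python
# Regression coefficients reported in Example 4.8.
B0 = 10.0533  # y intercept, in degrees Fahrenheit
B1 = 0.9865   # slope: change in estimated high per one-degree change in low


def predict_high(low):
    """Estimated high temperature (deg F) for a given low temperature (deg F)."""
    return B1 * low + B0


estimate = predict_high(47)
print(round(estimate, 4))  # 56.4188
```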

Interpreting the Slope

Relationship Between Slope and Correlation Coefficient: The slope $b_1$ of the regression line and the correlation coefficient r always have the same sign. $b_1$ is positive if and only if r is positive, and $b_1$ is negative if and only if r is negative.
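One way to see why the signs agree, assuming the usual relation slope = r · s_y / s_x: both standard deviations are positive, so the slope inherits the sign of r. The sketch below checks this on two made-up data sets, one rising and one falling; the helper name `r_and_slope` is hypothetical.

```python
from statistics import mean, stdev


def r_and_slope(xs, ys):
    """Return (r, b1) for paired data, using b1 = r * s_y / s_x (assumed relation)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    return r, r * s_y / s_x


# Made-up illustrative data: y tends to rise with x, then y tends to fall with x.
rising = r_and_slope([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])
falling = r_and_slope([1, 2, 3, 4, 5], [9, 7, 8, 5, 4])

for label, (r, b1) in (("rising", rising), ("falling", falling)):
    # In each case r and b1 share the same sign.
    print(label, round(r, 4), round(b1, 4))
```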