# Chapter 3: Examining Relationships


Chapter 3: Examining Relationships
AP Statistics

Quantitative Variables vs. Categorical Variables
Quantitative variables take numerical values for which arithmetic operations such as adding and subtracting make sense. Categorical variables place an individual into one of several groups or categories. Often we want to know more than this: whether two variables are related, or whether changes in one variable cause changes in the other.

A response variable measures an outcome of a study. An explanatory variable attempts to explain the observed outcomes. Explanatory variables are sometimes called independent variables, and response variables are sometimes called dependent variables, because the response variable depends on the explanatory variable.

Example 3.1: Effect of Alcohol on Body Temperature
Alcohol has many effects on the body. One effect is a drop in body temperature. To study this effect, researchers give several different amounts of alcohol to mice, then measure the change in each mouse’s body temperature in the 15 minutes after taking the alcohol. Explanatory Variable? Response Variable?

Example 3.2: Are SAT Math and Verbal Scores Linked?
Jim wants to know how the median SAT Math and Verbal scores in the 50 states (plus the District of Columbia) are related to each other. He doesn’t think that either score explains or causes the other. Jim has two related variables, and neither is an explanatory variable. Julie looks at the same data. She asks, “Can I predict a state’s median SAT Math score if I know its median SAT Verbal score?”

Example 3.2 Continued
Julie is treating the Verbal score as the explanatory variable and the Math score as the response variable. Calling one variable explanatory and the other response doesn’t necessarily mean that one causes the other.

## 3.1 Scatterplots
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. The values of one variable appear on the horizontal axis, and the values of the other appear on the vertical axis. Each individual in the data appears as the point in the plot fixed by the values of both variables for that individual.

Example 3.3: State SAT Scores
Using Table 1.15 on page 70, create a scatterplot showing the relationship between the percent of graduates taking the SAT and the state average SAT Math score.

Example 3.3

How do we describe distributions? What four things do we consider?

Scatterplots have characteristics used to describe them as well: an overall pattern and deviations from that pattern.

- Overall pattern, described by:
  - Form
  - Direction
  - Strength of the relationship
- Deviations from the pattern:
  - Outliers

- Form: There are two distinct clusters with lots of space between them. In some states the ACT is the more common exam, while states with high SAT participation have lower SAT Math scores.
- Direction: States in which a higher percentage of graduates take the SAT tend to have lower SAT Math scores, a negative association.
- Strength: Not strong; we will develop a numerical measure of strength later.

Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together. Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa.
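
One way to check these definitions numerically is to multiply each point's deviation from the mean of x by its deviation from the mean of y: points that are above average on both variables, or below average on both, give positive products, and mixed points give negative ones. The Python sketch below uses the fossil measurements from Exercise 3.24 later in this chapter; the helper name `association_direction` is hypothetical, chosen only for illustration. The sign of this sum is the same as the sign of the correlation r.

```python
from statistics import mean

def association_direction(xs, ys):
    """Classify direction by the sign of the summed products of deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Above/above and below/below points add positive products;
    # above/below points add negative products.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no linear association"

femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]
print(association_direction(femur, humerus))        # positive
print(association_direction(femur, humerus[::-1]))  # negative (pairing reversed)
```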

Example 3.4 Heating Degree-Days
The Sanchez household is about to install solar panels to reduce the cost of heating their house. To see how much the solar panels help, they record their consumption of natural gas before the panels are installed. Gas consumption is higher in cold weather, so the relationship between outside temperature and gas consumption is important. The data appear in Table 3.1.

Example 3.4: Explanatory Variable? Response Variable?

Example 3.4 Heating Degree-Days

Example 3.4: Form? Direction? Strength?

Tips for Drawing Scatterplots

- Scale the horizontal and vertical axes. The intervals must be uniform, just as on a histogram. If an axis does not begin at zero, use a break symbol on the axis to show this.
- Label both axes.
- If you are given a grid, adopt a scale that uses the entire grid.
- Title the scatterplot.

3.1 Homework: Exercises 3.1 through 3.4 (all), 3.6, 3.7 (all), 3.9, 3.10, 3.11 (all)
Read section 3.1 and the introduction to Chapter 3. Note that the relationship in 3.11 is not linear.

To show a categorical variable on a scatterplot, use a different plotting symbol for each category. See Examples 3.5 and 3.6.

## 3.2 Correlation
A linear relationship is strong if the points lie close to a straight line and weak if they are widely scattered about a line. Figure 3.8 shows two plots of the same data: changing the scale of a graph can fool our eyes about the strength of a relationship, so we need a numerical measure of strength.

The correlation measures the direction and strength of the linear relationship between two quantitative variables. Correlation is usually written as r.

The mean of the x’s is x̄ (x-bar) and the mean of the y’s is ȳ (y-bar). The standard deviation of the x’s is sx and the standard deviation of the y’s is sy. The symbol xi denotes each individual x value and yi denotes each individual y value.
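
These symbols belong to the formula for r, which did not survive in this transcript. Written out for n individuals, the standard definition is the sum of the products of the standardized x and y values, divided by n - 1:

$$
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
$$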

Exercise 3.24: Classifying Fossils
The lengths of two bones were measured in five fossils of the extinct beast Archaeopteryx:

| Femur | Humerus |
|---|---|
| 38 | 41 |
| 56 | 63 |
| 59 | 70 |
| 64 | 72 |
| 74 | 84 |

Exercise 3.24 (a): Find the correlation r step by step. That is, find the mean and standard deviation of the femur lengths and of the humerus lengths. Then find the five standardized values for each variable and use the formula for r.
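
The steps in part (a) can be checked with a short Python sketch. It follows the hand method exactly: sample means and standard deviations (`statistics.stdev` divides by n - 1), then standardized values, then the average product of those values using n - 1. For these data r comes out at about 0.994, a very strong positive linear relationship.

```python
from statistics import mean, stdev

femur = [38, 56, 59, 64, 74]      # x
humerus = [41, 63, 70, 72, 84]    # y

# Step 1: means and sample standard deviations.
x_bar, y_bar = mean(femur), mean(humerus)
s_x, s_y = stdev(femur), stdev(humerus)

# Step 2: standardized values z = (value - mean) / standard deviation.
z_x = [(x - x_bar) / s_x for x in femur]
z_y = [(y - y_bar) / s_y for y in humerus]

# Step 3: r = sum of products of standardized values, divided by n - 1.
n = len(femur)
r = sum(zx * zy for zx, zy in zip(z_x, z_y)) / (n - 1)

print(f"x-bar = {x_bar}, s_x = {s_x:.4f}")
print(f"y-bar = {y_bar}, s_y = {s_y:.4f}")
print(f"r = {r:.4f}")
```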

Correlation makes no distinction between explanatory and response variables: it makes no difference which variable you call x and which you call y in calculating the correlation. Correlation requires that both variables be quantitative, so that the arithmetic in the formula for r makes sense. We cannot calculate a correlation between the incomes of a group of people and the city they live in, because city is a categorical variable.

Because r uses the standardized values of the observations, its value does not change when we change the units of measurement of x, y, or both. r has no unit of measurement; it is just a number. Positive r indicates a positive association between the variables, and negative r indicates a negative association.

Correlation is always between -1 and 1. The values r = -1 and r = 1 occur only when all points lie exactly on a straight line. The closer r is to 1 or -1, the stronger the linear relationship; the closer r is to 0, the weaker it is. Correlation measures the strength of linear relationships only, not curved relationships.

Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in a scatterplot (see Figure 3.9).

| Absolute value of r | Interpretation |
|---|---|
| 0.0 to 0.2 | Very weak to negligible correlation |
| 0.2 to 0.4 | Weak, low correlation (not very significant) |
| 0.4 to 0.7 | Moderate correlation |
| 0.7 to 0.9 | Strong, high correlation |
| 0.9 to 1.0 | Very strong correlation |

Negative values of r are read on the same scale, using the size of r without its sign.
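
The rule-of-thumb scale above can be written as a small lookup. Because the ranges in the table share their endpoints, this sketch assigns a boundary value such as 0.7 to the stronger category; that choice is an assumption, not part of the original scale. The function name `describe_strength` is hypothetical.

```python
def describe_strength(r: float) -> str:
    """Label r using the rule-of-thumb scale (boundaries go to the stronger label)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must be between -1 and 1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)  # negative r uses the same scale
    if size >= 0.9:
        label = "very strong"
    elif size >= 0.7:
        label = "strong"
    elif size >= 0.4:
        label = "moderate"
    elif size >= 0.2:
        label = "weak"
    else:
        label = "very weak to negligible"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"

print(describe_strength(0.9941))  # very strong positive correlation
print(describe_strength(-0.5))    # moderate negative correlation
```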

## 3.3 Least-Squares Regression
Correlation measures the strength and direction of the linear relationship between any two quantitative variables. Least-squares regression is a method for finding a line that summarizes the relationship between two variables.

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. Regression lines are used to predict the value of y for a given value of x. Regression requires an explanatory variable and a response variable.

Example 3.8: Predicting Natural Gas Consumption. Read and summarize the example with a partner.

error = observed y - predicted y, that is, error = y - ŷ. The least-squares regression line of y on x is the line that makes the sum of the squares of these vertical distances (errors) as small as possible.
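
The least-squares idea can be tested directly: compute the sum of squared errors (SSE) for the least-squares line, then for lines with a slightly different intercept or slope, and confirm that every alternative does worse. This sketch uses the Exercise 3.24 fossil data and computes the slope as Sxy/Sxx (sum of cross-products of deviations over sum of squared x deviations), which is algebraically the same as r·sy/sx. The helper name `sum_squared_errors` is hypothetical.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared errors (observed - predicted) for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

femur = [38, 56, 59, 64, 74]      # x
humerus = [41, 63, 70, 72, 84]    # y

# Least-squares slope and intercept: b = Sxy / Sxx, a = y-bar - b * x-bar.
n = len(femur)
x_bar = sum(femur) / n
y_bar = sum(humerus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(femur, humerus))
s_xx = sum((x - x_bar) ** 2 for x in femur)
b = s_xy / s_xx
a = y_bar - b * x_bar

best = sum_squared_errors(femur, humerus, a, b)
print(f"least-squares line: SSE = {best:.3f}")

# Nudging the intercept or the slope away from the least-squares values raises SSE.
for da, db in [(1, 0), (-1, 0), (0, 0.05), (0, -0.05)]:
    other = sum_squared_errors(femur, humerus, a + da, b + db)
    print(f"a{da:+}, b{db:+}: SSE = {other:.3f}")
```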

Figure 3.11b: Take 2 minutes and summarize how this figure represents the least-squares idea.

Equation of the least-squares line: We have data on an explanatory variable x and a response variable y for n individuals. From the data, calculate the means x̄ and ȳ and the standard deviations sx and sy of the two variables, and their correlation r.

The least-squares regression line is ŷ = a + bx, with slope b = r(sy/sx) and y-intercept a = ȳ - b·x̄.

y is the observed value and ŷ is the predicted value. Every least-squares regression line passes through the point (x̄, ȳ).

Example 3.9 Take 3 minutes with a partner. Be able to summarize this example for the class.

TI-84 Commands

1. Use the Catalog to select DiagnosticOn. This makes the calculator display r and r² along with the regression output.
2. Put the data into lists L1 and L2.
3. Press STAT, choose CALC, and select option 8, LinReg(a+bx), which fits y = a + bx.
4. Enter LinReg(a+bx) L1, L2, Y1 to store the regression line in Y1 so that it graphs with your data.
5. Round a and b to four decimal places.

The slope b of the regression line is the amount by which ŷ changes when x increases by 1. The intercept a is the value of ŷ when x = 0. To draw the line, plot two points: take x-values near the extremes of the observed data, compute their ŷ values, and connect the points.
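
The two-point method for drawing the line, and the meaning of the slope, can be sketched in a few lines. The coefficients below are the fossil line ŷ = -3.6596 + 1.1969x, obtained by fitting the Exercise 3.24 data and rounding to four decimals (computed for this illustration, not quoted from the text). The helper `predict` is hypothetical.

```python
a, b = -3.6596, 1.1969   # rounded fossil regression line (computed for illustration)

def predict(x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x

femur = [38, 56, 59, 64, 74]

# Two points at the extremes of the observed x-values are enough to draw the line.
left = (min(femur), predict(min(femur)))
right = (max(femur), predict(max(femur)))
print(left, right)

# Slope interpretation: each 1-unit increase in x raises y-hat by b.
print(round(predict(61) - predict(60), 4))   # 1.1969
```

Note that predict(0) returns the intercept a, but a femur length of 0 lies far outside the data, so the intercept has no practical meaning here.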

The role of r²: Read Example 3.10 with a partner and be ready to discuss it in 5 minutes.

r² is the proportion of the total sample variability in y that is explained by the least-squares regression of y on x. r² is called the coefficient of determination. SST is the total sample variation of the observations about the mean of the y’s. SSE is the remaining unexplained sample variability after fitting the regression line.

Example 3.11 Read through with a partner and be ready to discuss in 5 minutes.

Facts about Least-Squares Regression

- Fact 1: The distinction between explanatory and response variables is essential in regression. Read Example 3.12 and be able to explain it.
- Fact 2: There is a close connection between correlation and the slope of the least-squares regression line. Along the line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.

Facts about Least-Squares Regression (continued)

- Fact 3: The least-squares regression line always passes through the point (x̄, ȳ).
- Fact 4: The coefficient of determination r² is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.