Describing Relationships

Presentation on theme: "Describing Relationships"— Presentation transcript:

1 Describing Relationships
3.1 Scatterplots

2 Questions To Ask
What individuals do the data describe? What are the variables? How are they measured? Are all of the variables quantitative, or is at least one categorical?
What did we do before? Plot the data. Describe the overall distribution (SOCS: shape, outliers, center, spread). Look at numerical summaries. Check for Normality.

3 Explanatory vs. Response
Explanatory: Domain, Independent, x, Input, Cause. Predicts changes in the outcome.
Response: Range, Dependent, y, Output, Effect. The outcome.

4 Example p. 144 – Explanatory or Response?
Linking SAT Math and Critical Reading Scores
Julie asks, “Can I predict a state’s mean SAT Math score if I know its mean SAT Critical Reading score?” Jim wants to know how the mean SAT Math and Critical Reading scores this year in the 50 states are related to each other. For each student, identify the explanatory variable and the response variable, if possible.
Julie: she is treating the mean SAT Critical Reading score as the explanatory variable and the mean SAT Math score as the response variable.
Jim: he is just interested in exploring the relationship between the two variables, so there are no clear explanatory and response variables.

5 Be careful with “cause”
Just because two variables have a relationship does not mean that one causes the other!

6 Scatterplots
A scatterplot shows the relationship between two quantitative variables measured on the same individuals. One variable goes on the horizontal axis, the other on the vertical axis. (The eXplanatory variable goes on the x-axis.) Each individual is represented by a point on the plot. We had several ways to plot one-variable distributions. For two quantitative variables, the scatterplot is the standard graph.

7 How to make a Scatterplot
1. Decide which variable should go on each axis.
2. Label and scale your axes.
3. Plot individual data values.
Many students lose credit because they do not label their graphs. Including the proper labels is more important than graphing each point in precisely the right place. Pick nice values to mark each axis. The axes will not always start at 0.

8 Example p. 148 – The Endangered Manatee
Powerboats registered in Florida (in thousands) and number of manatees killed by boats, from 1977 to 2010. The identified point represents 1996: that year there were 732,000 powerboat registrations in Florida, and 60 manatees were killed by boats.

9 Describing Scatterplots - FODS
When we were describing one-variable distributions, we used SOCS. Now that we are describing two quantitative variables, we will use FODS.
Form: One big group? Clusters? Linear? Curved?
Outliers: Any points that deviate significantly from the overall pattern.
Direction: Positively associated (+ slope) or negatively associated (- slope).
Strength: How closely do the points follow the overall pattern?

10 Example p. 148 – The Endangered Manatee
Form: overall linear pattern. Outliers: no clear outliers. Direction: positive association. Strength: fairly strong.
USE MODIFIERS! Don’t worry too much about strength yet. We will discuss this more later.

11 Example p. 149
The scatterplot shows the relationship between the duration of an eruption and the time until the next eruption.
Form: roughly linear with two clusters. Outliers: no clear outliers. Direction: positive association. Strength: fairly strong.
Why are there clusters? There seem to be many shorter eruptions that last around 2 minutes and many longer eruptions that last around 4.5 minutes.

12 Adding Categorical Variables
To add a categorical variable to a scatterplot, use a different type of mark (●, ○, □, +) for each category. WV, GA, and SC have lower SAT scores than we would expect. DC is a city rather than a state.

13 Which one is stronger?
Our eyes are not always reliable when judging how strong a linear relationship is. This is why we rely on a number to help us.

14 Measuring Linear Association: Correlation
The correlation r measures the direction and strength of the linear relationship between two quantitative variables.
r is always a number between -1 and 1.
r > 0 indicates a positive association. r < 0 indicates a negative association.
Values of r near 0 indicate a very weak linear relationship. The strength of the linear relationship increases as r moves away from 0 toward -1 or 1.
r = -1 and r = 1 occur only in the case of a perfect linear relationship.

15

16

17

18 Correlation Practice
For each graph, estimate the correlation r and interpret it in context.
Answer choices:
1. r ≈
2. r ≈ -0.1
3. r ≈ 0.3
4. r ≈ 0.9
5. r ≈
6. r ≈ 0.5
(a) Choice 4, r ≈ 0.9: pretty strong, positive relationship.

19 Correlation Practice
(b) Choice 6, r ≈ 0.5: moderate, positive relationship.

20 Correlation Practice
(c) Choice 3, r ≈ 0.3: weak, positive relationship.

21 Correlation Practice
(d) Choice 2, r ≈ -0.1: weak, negative relationship.

22 Example, p. 153
r = 0.936. Interpret the value of r in context.
The correlation of 0.936 confirms what we see in the scatterplot: there is a strong, positive linear relationship between points per game and wins in the SEC.

23 Example, p. 153
r = 0.936. The point highlighted in red on the scatterplot is Mississippi. What effect does Mississippi have on the correlation? Justify your answer.
Mississippi makes the correlation closer to 1 (stronger). If Mississippi were not included, the remaining points wouldn’t be as tightly clustered in a linear pattern.

24 Calculating Correlation
How to Calculate the Correlation r
Suppose that we have data on variables x and y for n individuals. The values for the first individual are $x_1$ and $y_1$, the values for the second individual are $x_2$ and $y_2$, and so on. The means and standard deviations of the two variables are $\bar{x}$ and $s_x$ for the x-values and $\bar{y}$ and $s_y$ for the y-values. The correlation r between x and y is:
$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
Notice what the formula has in it: z-scores. Each factor is the standardized value of x or y for one individual.
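
The calculation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's own code: the function name `correlation` and the hours/score data are made up for the example, and the function assumes neither variable is constant (otherwise a standard deviation is 0).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    # One z-score product per individual
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical data, invented for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 77]
r = correlation(hours, score)
print(round(r, 3))
```

An individual with above-average x and above-average y contributes a positive product, as does one with both values below average, which is why a positive association produces r > 0.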

25 Facts About Correlation
Correlation makes no distinction between explanatory and response variables. r does not change when we change the units of measurement of x, y, or both. The correlation r itself has no unit of measurement.
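
These facts can be checked numerically. The sketch below uses made-up heights and weights (not data from the presentation) and a small helper named `correlation`, both assumptions for illustration. It swaps the roles of x and y and then converts both variables to metric units.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores (helper written for this example)."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical heights (inches) and weights (pounds)
height_in = [60, 62, 65, 68, 71, 74]
weight_lb = [110, 125, 130, 155, 160, 190]

r_original = correlation(height_in, weight_lb)

# Fact 1: swapping explanatory and response leaves r unchanged
r_swapped = correlation(weight_lb, height_in)

# Fact 2: changing units (inches -> cm, pounds -> kg) leaves r unchanged
height_cm = [h * 2.54 for h in height_in]
weight_kg = [w * 0.4536 for w in weight_lb]
r_metric = correlation(height_cm, weight_kg)

print(r_original, r_swapped, r_metric)  # equal up to floating-point rounding
```

The unit change has no effect because z-scores are unitless: rescaling a variable rescales its deviations and its standard deviation by the same factor.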

26 Cautions
Correlation requires that both variables be quantitative.
Correlation does not describe curved relationships between variables, no matter how strong the relationship is. Because of this, a value of r close to -1 or 1 does not guarantee a linear relationship.
Correlation is not resistant. r is strongly affected by a few outlying observations.
Correlation is not a complete summary of two-variable data.
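
Two of these cautions are easy to see with small made-up data sets. The sketch below is illustrative only: the helper `correlation` and all of the numbers are assumptions, not data from the presentation. The first data set is a perfect curve whose correlation is essentially 0; the second shows how one outlying point pulls r down.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores (helper written for this example)."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Caution: r does not describe curved relationships.
# y = x^2 is a perfect (curved) relationship, yet r is about 0.
xs_curve = [-3, -2, -1, 0, 1, 2, 3]
ys_curve = [x ** 2 for x in xs_curve]
r_curve = correlation(xs_curve, ys_curve)

# Caution: r is not resistant.
# Hypothetical points with a strong positive linear pattern...
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 3, 5, 4, 6, 7, 9, 8]
r_without = correlation(xs, ys)

# ...and the same points plus a single outlier at (9, 0)
r_with = correlation(xs + [9], ys + [0])

print(round(r_curve, 3), round(r_without, 3), round(r_with, 3))
```

A single added point is enough to turn a strong correlation into a weak one, which is the reason to look at the scatterplot before trusting r.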

27 Always plot your data!
This set of data has a correlation close to -1, but we can see a slight curve in the scatterplot.
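
The slide's data set is not reproduced in this transcript, so the sketch below uses a stand-in: the made-up curve y = 100 - x² for x = 1 to 10. The helper `correlation` is also an assumption written for the example. The correlation comes out close to -1 even though the relationship is clearly curved.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores (helper written for this example)."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical curved data: y falls faster and faster as x grows
xs = list(range(1, 11))
ys = [100 - x ** 2 for x in xs]

r = correlation(xs, ys)

# On a straight line the change in y per unit of x would be constant;
# here the step-to-step drops keep growing, so the pattern is curved.
drops = [ys[i] - ys[i + 1] for i in range(len(ys) - 1)]

print(round(r, 3))
print(drops)
```

The number alone would suggest a nearly perfect straight line; only the plot (or the changing drops) reveals the curve.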

28 Example p. 157 – Why correlation doesn’t tell the whole story
Scoring Figure Skaters. Until a scandal at the 2002 Olympics brought change, figure skating was scored by judges on a scale from 0.0 to 6.0. The scores were often controversial. We have the scores awarded by two judges, Pierre and Elena, for many skaters. How well do they agree? We calculate the correlation r between their scores, and it is high. But the mean of Pierre’s scores is 0.8 point lower than Elena’s mean.
These facts don’t contradict each other. They simply give different kinds of information. The mean scores show that Pierre awards lower scores than Elena. But because Pierre gives every skater a score about 0.8 point lower than Elena does, the correlation remains high. Adding the same number to all values of either x or y does not change the correlation.
If both judges score the same skaters, the competition is scored consistently because Pierre and Elena agree on which performances are better than others. The high r shows their agreement. But if Pierre scores some skaters and Elena others, we should add 0.8 point to Pierre’s scores to arrive at a fair comparison.
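
The point of this example can be sketched with invented numbers. The scores below are hypothetical (the judges' actual scores are not given in the presentation), chosen so that Pierre's mean is 0.8 point below Elena's. The helper `correlation` is also an assumption written for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from z-scores (helper written for this example)."""
    x_bar, y_bar, s_x, s_y = mean(xs), mean(ys), stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Hypothetical scores for six skaters, each scored by both judges
elena = [5.8, 5.2, 4.9, 5.5, 4.3, 5.0]
pierre = [5.1, 4.3, 4.1, 4.8, 3.4, 4.2]  # each about 0.8 point lower

mean_gap = mean(elena) - mean(pierre)    # the means differ...
r_raw = correlation(pierre, elena)       # ...but the judges rank skaters alike

# Adding 0.8 to every one of Pierre's scores removes the gap in means
# and leaves the correlation unchanged.
pierre_adjusted = [p + 0.8 for p in pierre]
r_adjusted = correlation(pierre_adjusted, elena)

print(round(mean_gap, 2), round(r_raw, 3), round(r_adjusted, 3))
```

The correlation measures how consistently the two judges order the skaters, while the means measure how generous each judge is; a full summary needs both.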

29 HW Due: Friday p # 5, 7, 15, 17, 21, 27, 28

