# Chapter 7: Scatterplots, Association, and Correlation


## Scatterplots

- Display the relationship between two quantitative variables measured on the same cases
- Very common
- A very effective way to display relationships
- Let you see patterns and trends

## Examples

- Relationships between variables are often at the heart of what we would like to learn from data.
- Are grades actually higher now than they used to be?
- Do people tend to reach puberty at a younger age than in previous generations?
- Does applying magnets to parts of the body relieve pain? If so, are stronger magnets more effective?
- Do students learn better with the use of computer technology?
- These questions relate two quantitative variables and ask whether there is an association between them.

## Direction

- Positive
- Negative

## Form

- Straight
- Curved

## Strength

How much scatter?

- Weak
- Strong

## Unusual Features

- Be sure to mention any outliers or subgroups

## Cartesian Plane

- Created by René Descartes (1596–1650)

## Variables

- x-variable
  - Explanatory variable
  - Predictor variable
  - Accounts for, explains, predicts, or is otherwise responsible for the y-variable
- y-variable
  - Response variable
  - The variable you hope to predict or explain

## Assigning the Variables

- We want to compare peak-period freeway speed to cost per person per year. Either assignment can be justified:
  - x = speed and y = cost: the slower you go, the more it costs in delays
  - x = cost and y = speed: the more you spend on highway improvements, the higher the speed

## Determining Variables

- Do heavier smokers develop lung cancer at younger ages?
- Is birth order an important factor in predicting future income?
- Can we estimate a person’s % body fat more simply by just measuring waist or wrist size?

## Examples: Describe what the scatterplot might look like

- Drug dosage and degree of pain relief
- Calories consumed and weight loss
- Hours of sleep and score on a test
- Shoe size and grade point average
- Time for a mile run and age
- Age of car and cost of repairs

## Calculator

- Making scatterplots
- Naming lists

## Correlation

- Measures the strength of the linear association between two quantitative variables
- The sign of the correlation coefficient gives the direction of the association
- Always between -1 and 1
- A value of exactly -1 or 1 means the points fall on a perfect straight line (possible but very rare)
- Treats x and y symmetrically
- Has no units
- NOT affected by changes in the center or scale of either variable
- Depends on the z-scores
- Measures the strength of LINEAR association ONLY
- Sensitive to outliers: a single value can drastically change the coefficient
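The slide says only that correlation "depends on the z-scores". A minimal sketch of what that can look like, assuming the standard textbook formula (the sum of the products of paired z-scores, divided by n - 1, using the sample standard deviation): the data and the helper names `z_scores` and `correlation` are made up for illustration and are not from the presentation.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Sum of paired z-score products divided by n - 1 (assumed formula)."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data, invented for this example only.
x = [1, 2, 3, 4, 5]
y_up = [3, 5, 7, 9, 11]     # exactly y = 2x + 1, a perfect rising line
y_down = [10, 8, 6, 4, 2]   # exactly y = -2x + 12, a perfect falling line
y_noisy = [2, 1, 4, 3, 5]   # rising trend with some scatter

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_noisy = correlation(x, y_noisy)
r_swapped = correlation(y_noisy, x)  # x and y exchanged

print(round(r_up, 3), round(r_down, 3), round(r_noisy, 3), round(r_swapped, 3))
```

The perfect lines give correlations at the two ends of the range, the scattered data falls strictly between them, and swapping the roles of x and y leaves the value unchanged, which is what "treats x and y symmetrically" means.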

## Correlation Conditions

- Quantitative Variables Condition: correlation applies only to quantitative variables. Check that you know the variables’ units and what they measure.
- Straight Enough Condition: the correlation coefficient measures the strength of LINEAR association only, so check that the scatterplot is reasonably straight.
- Outlier Condition: outliers can distort the correlation dramatically. When you see an outlier, report the correlation both with AND without it.
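To see why the outlier condition asks for both values, here is a small sketch that computes the correlation with and without one unusual point. The data, the outlier, and the `correlation` helper are all hypothetical, and the helper assumes the usual z-score formula with the sample standard deviation.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sum of paired z-score products divided by n - 1 (assumed formula)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = ((x - mx) / sx * (y - my) / sy for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Hypothetical data with a clear upward trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 6, 7]

# One made-up point far to the right and low, well away from the trend.
outlier_x, outlier_y = 20, 1

r_without = correlation(x, y)
r_with = correlation(x + [outlier_x], y + [outlier_y])

print(f"without outlier: {r_without:.2f}")
print(f"with outlier:    {r_with:.2f}")
```

In this made-up case a single point is enough to turn a strong positive correlation into a negative one, so reporting only one of the two numbers would misdescribe the data.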

## Checking In

- Your Statistics teacher tells you that the correlation between the scores (points out of 50) on Exam 1 and Exam 2 was 0.75.
- Before answering any questions about the correlation, what would you like to see? Why?
- If she adds 10 points to each Exam 1 score, how will this change the correlation?
- If she standardizes both sets of scores, how will this affect the correlation?
- In general, if someone does poorly on Exam 1, are they likely to do poorly or well on Exam 2? Explain.
- If someone does poorly on Exam 1, will they definitely do poorly on Exam 2 as well?
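The questions about adding points and standardizing can be checked numerically. This is a sketch with invented exam scores (the real class data is not given, so the correlation here will not be 0.75), again assuming the z-score formula with the sample standard deviation; `correlation` and `standardize` are illustrative helper names.

```python
from statistics import mean, stdev


def standardize(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Sum of paired z-score products divided by n - 1 (assumed formula)."""
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical scores out of 50 for seven students, invented for illustration.
exam1 = [22, 30, 35, 38, 41, 45, 48]
exam2 = [25, 28, 40, 33, 44, 42, 47]

r_original = correlation(exam1, exam2)

# Add 10 points to every Exam 1 score (a change of center).
r_shifted = correlation([score + 10 for score in exam1], exam2)

# Standardize both sets of scores (a change of center and scale).
r_standardized = correlation(standardize(exam1), standardize(exam2))

print(round(r_original, 4), round(r_shifted, 4), round(r_standardized, 4))
```

All three values agree up to floating-point rounding: shifting or rescaling a variable changes its z-scores not at all, so the correlation stays the same.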

## Looking at Association

- When your blood pressure is measured, it is reported as two values: systolic blood pressure and diastolic blood pressure. How are these variables related to each other? Do they tend to be both high or both low?

## Think!!

- Plan
  - I’ll examine the relationship between two measures of blood pressure.
- Variables
  - Systolic blood pressure and diastolic blood pressure, both measured in millimeters of mercury
  - W’s: 1406 participants in a health study in Framingham, MA
- Plot
  - Create a scatterplot

## Check the Conditions

- Quantitative Variables?
- Straight Enough?
- Outliers?

## Show!!

- Mechanics
  - We will calculate the correlation on the calculator
  - Correlation = 0.792

## Tell!!

- Conclusion
  - The scatterplot shows a positive direction, with higher systolic blood pressure (SBP) going with higher diastolic blood pressure (DBP). The plot is generally straight with a moderate amount of scatter. The correlation of 0.792 is consistent with what I saw in the scatterplot. A few cases stand out with unusually high SBP compared with their DBP. It seems far less common for the DBP to be high by itself.