 # Essential Statistics Chapter 41 Scatterplots and Correlation.

## Presentation on theme: "Essential Statistics Chapter 41 Scatterplots and Correlation."— Presentation transcript:

Essential Statistics Chapter 41 Scatterplots and Correlation

Essential Statistics Chapter 42 Explanatory and Response Variables u Studying the relationship between two variables. u Measuring both variables on the same individuals. –a response variable measures an outcome of a study –an explanatory variable explains or influences changes in a response variable –sometimes there is no distinction

Essential Statistics Chapter 43 Case study ♦ In a study to determine whether surgery or chemotherapy results in higher survival rates for some types of cancer. ♦ Whether or not the patient survived is one variable, and whether they received surgery or chemotherapy is the other variable. ♦ Which is the explanatory variable and which is the response variable?

Essential Statistics Chapter 44 u Graphs the relationship between two quantitative (numerical) variables measured on the same individuals. u Usually plot the explanatory variable on the horizontal (x) axis and plot the response variable on the vertical (y) axis. Scatterplot

Essential Statistics Chapter 45 Relationship between mean SAT verbal score and percent of high school grads taking SAT Scatterplot

Essential Statistics Chapter 46 u Look for overall pattern and deviations from this pattern u Describe pattern by form, direction, and strength of the relationship u Look for outliers Scatterplot

Essential Statistics Chapter 47 Linear Relationship Some relationships are such that the points of a scatterplot tend to fall along a straight line -- linear relationship

Essential Statistics Chapter 48 Direction u Positive association ◙ A positive value for the correlation implies a positive association. ◙ large values of X tend to be associated with large values of Y and small valuesof X tend to be associated with small values of Y. u Negative association ◙ A negative value for the correlation implies a negative or inverse association ◙ large values of X tend to be associated with small values of Y and vice versa.

Essential Statistics Chapter 49 Examples From a scatterplot of college students, there is a positive association between verbal SAT score and GPA. For used cars, there is a negative association between the age of the car and the selling price.

Essential Statistics Chapter 410 Examples of Relationships

Essential Statistics Chapter 411 Measuring Strength & Direction of a Linear Relationship u How closely does a straight line (non- horizontal) fit the points of a scatterplot? u The correlation coefficient (often referred to as just correlation): r –measure of the strength of the relationship: the stronger the relationship, the larger the magnitude of r. –measure of the direction of the relationship: positive r indicates a positive relationship, negative r indicates a negative relationship.

Essential Statistics Chapter 412 Correlation Coefficient u special values for r :  a perfect positive linear relationship would have r = +1  a perfect negative linear relationship would have r = -1  if there is no linear relationship, or if the scatterplot points are best fit by a horizontal line, then r = 0  Note: r must be between -1 and +1, inclusive u both variables must be quantitative; no distinction between response and explanatory variables u r has no units; does not change when measurement units are changed (ex: ft. or in.)

Essential Statistics Chapter 413 Examples of Correlations

Essential Statistics Chapter 414 Examples of Correlations u Husband’s versus Wife’s ages v r =.94 u Husband’s versus Wife’s heights v r =.36 u Professional Golfer’s Putting Success: Distance of putt in feet versus percent success v r = -.94

Essential Statistics Chapter 415 Not all Relationships are Linear Miles per Gallon versus Speed u Linear relationship? u Correlation is close to zero.

Essential Statistics Chapter 416 Not all Relationships are Linear Miles per Gallon versus Speed u Curved relationship. u Correlation is misleading.

Essential Statistics Chapter 417 Problems with Correlations u Outliers can inflate or deflate correlations (see next slide) u Groups combined inappropriately may mask relationships (a third variable) –groups may have different relationships when separated

Essential Statistics Chapter 418 Outliers and Correlation For each scatterplot above, how does the outlier affect the correlation? AB A: outlier decreases the correlation B: outlier increases the correlation

Essential Statistics Chapter 419 Correlation Calculation u Suppose we have data on variables X and Y for n individuals: x 1, x 2, …, x n and y 1, y 2, …, y n u Each variable has a mean and std dev:

Essential Statistics Chapter 420 Case Study Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe

Essential Statistics Chapter 421 Case Study CountryPer Capita GDP (x)Life Expectancy (y) Austria21.477.48 Belgium23.277.53 Finland20.077.32 France22.778.63 Germany20.877.17 Ireland18.676.39 Italy21.578.51 Netherlands22.078.15 Switzerland23.878.99 United Kingdom21.277.37

Essential Statistics Chapter 422 Case Study xy 21.477.48-0.078-0.3450.027 23.277.531.097-0.282-0.309 20.077.32-0.992-0.5460.542 22.778.630.7701.1020.849 20.877.17-0.470-0.7350.345 18.676.39-1.906-1.7163.271 21.578.51-0.0130.951-0.012 22.078.150.3130.4980.156 23.878.991.4891.5552.315 21.277.37-0.209-0.4830.101 = 21.52 = 77.754 sum = 7.285 s x =1.532s y =0.795

Essential Statistics Chapter 423 Case Study

Summary u Explanatory variable Vs responsible variable u Scatterplot: display the relationship between two quantitative variables measured on the same individuals u Overall pattern of scatterplot showing: ☻ Direction ☻Form☻ Strength ♦ Correlation coefficient r: Measures only straight-line linear relationship ♦ R indicate the direction of a linear relationship by sign R > 0, positive association R< 0, negative association ♦ The range of R:-1 ≤ R ≤ 1, Essential Statistics Chapter 424

Scatterplots Which of the following scatterplots displays the stronger linear relationship? a) Plot A b) Plot B c) Same for both

Correlation If two quantitative variables, X and Y, have a correlation coefficient r = 0.80, which graph could be a scatterplot of the two variables? a) Plot A b) Plot B c) Plot C Plot A ? Plot B? Plot C?