Scatterplots, Association, and Correlation


1 Scatterplots, Association, and Correlation
Week 4 Lecture 1 Chapter 6. Scatterplots, Association, and Correlation

2 Relationship Between Two Quantitative Variables
Previously, we explored single quantitative variables. Now we look at two quantitative variables and examine their relationship (association). The first tool is a graphical display, the scatterplot. Scatterplot: plots the values of two quantitative variables against each other. Sometimes one variable is an “outcome”, or response variable, and the other explains the outcome, an explanatory variable. When examining the relationship between two or more variables, first decide which variables are response (dependent) variables and which are explanatory (independent) variables. To decide, think about the context of the study and the research question(s) that the study aims to investigate.

3 Relationship Between Two Quantitative Variables
Let’s consider a data set on geographic socio-political areas in each state of the United States that includes information about two variables: births of low birthweight as a percent of all births (a quantitative variable), and percent of adults who smoke (a quantitative variable). Research question: Is there a relationship between percent of low birthweights and percent of adults who smoke? Response variable: ? Explanatory variable: ?

4 Relationship Between Two Quantitative Variables
Research question: Is there a relationship between percent of low birthweights and percent of adults who smoke? We can also phrase the question as: Does the percent of low birthweights depend on the percent of adults who smoke? In other words, can variation in the percent of adults who smoke explain the variation in the percent of low birthweights? Response variable: percent of low birthweights. Explanatory variable: percent of adults who smoke.

5 Describing a Scatterplot
Figure: scatterplot of percent of low birthweights against percent of adults who smoke. Describing a scatterplot: look for the overall pattern and for striking deviations from that pattern. The overall pattern of a scatterplot can be described by the form, direction, and strength of the relationship. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern. Obtaining a scatterplot in StatCrunch: Graph > Scatter Plot; X variable: Adult Smokers %; Y variable: Low Birthweights %; click Compute.

6 Describing the Scatterplot of Percent of Low Birthweights and Percent of Adult Smokers
Direction: The points form an upward trend, so the association is positive. Form: Overall, the points follow a trend that is close to a straight line. No curvature is observed in the overall pattern. Strength: There appears to be a moderate linear association. In the context of this study, we interpret the scatterplot as follows: there appears to be a moderate positive linear association between percent of adults who smoke and percent of low birthweights. That is, higher percentages of adult smokers are associated with higher percentages of low birthweights.

7 Correlation Between Two Quantitative Variables
Correlation measures the direction and the strength of the linear association between two quantitative variables. We estimate the correlation in the population based on our sample information. We denote the estimated correlation with the letter r. r is always between -1 and 1. Values of r near 0 indicate a very weak linear relationship. The strength of the linear relationship increases as r moves away from 0. Values of r close to -1 or 1 indicate that the points lie close to a straight line. r is not resistant: it is sensitive to extreme outliers in the data. When you calculate a correlation, it doesn’t matter which variable is 𝑥 (explanatory variable) and which is 𝑦 (response variable).
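
The properties listed above can be checked numerically. The following is a minimal sketch, not part of the original slides: it computes r as the sum of products of standardized values divided by n - 1, using small hypothetical data (not the state data from the lecture) and a helper name, `pearson_r`, chosen here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 divisor).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical values that lie close to an upward straight line.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0]

r_xy = pearson_r(x, y)   # close to 1: points lie near a line
r_yx = pearson_r(y, x)   # swapping x and y does not change r

# Append one extreme, hypothetical outlier far from the pattern.
r_out = pearson_r(x + [20.0], y + [0.0])

print(f"r(x, y)      = {r_xy:.3f}")
print(f"r(y, x)      = {r_yx:.3f}")
print(f"with outlier = {r_out:.3f}")
```

In this sketch a single outlier is enough to turn a strong positive correlation into a negative one, which is what "not resistant" means in practice.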

9 Correlation Between Percent of Low Birthweights and Percent of Adult Smokers
We can use StatCrunch to obtain the estimated correlation: Stat > Summary Stats > Correlation, then select the two variables, Adult Smokers % and Low Birthweights %. The correlation between Adult Smokers % and Low Birthweights % is about 0.38, so r = 0.38. r is positive here because the overall direction of the points is positive; thus, our interpretation of the relationship between percent of low birthweights and percent of adult smokers matches the r value. Also, an r of about 0.38 indicates a moderate association between the two variables.

10 Correlation Does Not Imply Causation
Important note: Correlation does not imply causation. Why? Because there are other variables (perhaps not included in the study’s data analysis) that could contribute to understanding the variation in the percent of low birthweights. These variables are often called lurking or hidden variables. Can you propose some variables that you think could contribute to understanding the variation in the percent of low birthweights?

11 Correlation Does NOT Prove Causation
In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable. Even a strong association between two variables does not necessarily imply a causal link between them. Some explanations for an observed association: The dashed double-arrow lines show an association. The solid arrows show a cause-and-effect link. The variable 𝑥 is explanatory, 𝑦 is response, and 𝑧 is a lurking variable.
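
One of these explanations, a lurking variable 𝑧 that drives both 𝑥 and 𝑦, can be illustrated with a small simulation. This is a hedged sketch with simulated data only: the variable names, the sample size, and the random seed are assumptions made for illustration, and nothing here comes from the lecture's data set.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

random.seed(0)  # fixed seed so the run is repeatable
n = 2000

# z is the lurking variable; x and y each depend on z plus independent
# noise. Neither x nor y has any direct effect on the other.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Subtract z's contribution from both variables: what remains is
# independent noise, so the association should largely disappear.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(f"r(x, y) with z lurking:     {r_xy:.2f}")
print(f"r(x, y) after removing z:   {r_rest:.2f}")
```

Here 𝑥 and 𝑦 are clearly correlated even though neither causes the other; the association comes entirely from their shared dependence on 𝑧.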

12 Correlation Does Not Depend On Units of Measurement
Correlation does not depend on the units of measurement of the response and explanatory variables. Why? Consider the relationship between ACT scores and percent of adult smokers. The units of measurement differ: ACT score is measured in score points, and percent of adult smokers is measured in percent. The correlation between Adult Smokers % and ACT score is negative because, overall, the direction of the points is negative. That means higher ACT scores go with lower percentages of adult smokers. Note: The estimated correlation uses the standardized values of the observations for the response and explanatory variables. Thus, r has no units of measurement.
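
Because r is built from standardized values, changing the units of either variable leaves it unchanged. The sketch below checks this with hypothetical numbers (they are not the ACT or smoking figures from the lecture): the percentages are re-expressed as proportions, and the scores are put on an arbitrary rescaled scale invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation computed from sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: percent of adult smokers and an average test score.
smokers_pct = [12.0, 15.5, 18.0, 20.5, 23.0, 25.5]
score = [23.5, 22.8, 22.0, 21.1, 20.9, 19.8]

r_original = pearson_r(smokers_pct, score)

# Change units: percent -> proportion, and score -> an arbitrary
# rescaled score (multiply by 10, add 5). Both changes are linear
# with a positive multiplier.
smokers_prop = [p / 100 for p in smokers_pct]
score_rescaled = [10 * s + 5 for s in score]

r_rescaled = pearson_r(smokers_prop, score_rescaled)

print(f"r in original units: {r_original:.4f}")
print(f"r in changed units:  {r_rescaled:.4f}")
```

The two printed values agree up to floating-point rounding. Note that this holds for unit changes with a positive multiplier; multiplying one variable by a negative number would flip the sign of r.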

