Describing Relationships: Scatter Plots and Correlation ● The world is an indivisible whole (butterfly effect and chaos theory; quantum entanglement, etc.)


1 Describing Relationships: Scatter Plots and Correlation ● The world is an indivisible whole (butterfly effect and chaos theory; quantum entanglement, etc.). Things (and people) exist in relationships. ● Here we study relationships between two quantitative variables (e.g., IQ test score and school GPA). ● Graphical description: scatter plots – Look for direction, form, strength, and outliers ● Numerical measure: correlation coefficient – Definition – Direction and strength of linear relationships

2 Scatter Plots Convention: Dependent variable on the vertical axis, independent (explanatory) variable on the horizontal axis.

3 Scatter Plot Example: IQ score and GPA

4 Scatter Plot Example: Wealth and Health

5 What to Look For in a Scatter Plot Functional form: Nonlinear? Linear? Direction: Positive? Negative? Strength: How clear is the pattern?

6 What to Look For in a Scatter Plot: Direction

7 Outliers

8 How Strong Is the Relationship? Two scatter plots of the same data, using different scales. Visual impressions are not very reliable!
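The point about scales can be checked numerically: changing the units of either axis changes how the plot looks, but not r. A minimal Python sketch, using made-up data and a helper `pearson_r` defined here for illustration (neither comes from the slides):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not data from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]

r_original = pearson_r(x, y)

# Change of units: stretch x by 100, shrink and shift y.
x_rescaled = [100 * v for v in x]
y_rescaled = [0.01 * v + 50 for v in y]
r_rescaled = pearson_r(x_rescaled, y_rescaled)

# The two values agree up to floating-point rounding.
print(round(r_original, 6), round(r_rescaled, 6))
```

Because r is unchanged by positive linear rescaling of either variable, it is a more reliable summary of strength than the visual impression from any one choice of axes.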

9 The Correlation Coefficient ● Numerical measure of direction and strength of a linear relationship ● Linear relationships are particularly important – Simplest, easiest to understand – Some nonlinear relationships can be made linear by transforming variables (e.g., using squared terms for curvilinear patterns: age and voting) – A “first-order approximation” of arbitrary relationships.
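The idea of linearizing by transformation can be illustrated with a small Python sketch. The data are made up (an exact parabola, chosen as an assumption for clarity) and `pearson_r` is a helper defined here, not something from the slides:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up curvilinear data: y depends on x perfectly, but not linearly.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v ** 2 for v in x]

# Correlating y with x misses the relationship entirely (symmetric parabola).
r_raw = pearson_r(x, y)

# Correlating y with the squared term recovers a perfect linear relationship.
x_squared = [v ** 2 for v in x]
r_transformed = pearson_r(x_squared, y)

print(r_raw, r_transformed)
```

Here r on the raw variables is zero even though y is completely determined by x, which is why r should be read as a measure of linear association only.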

10 Measuring Linear Correlation with r ● Direction: Does the scatter plot slope upward or downward? – Positive r indicates a positive relationship, negative r indicates a negative relationship. ● Strength: How strong is the association? How closely does a non-horizontal straight line fit the points of a scatter plot? – The stronger the relationship, the larger the magnitude of r. ● Formula:
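The formula on this slide was an image and did not survive in the transcript. The standard definition of the sample (Pearson) correlation coefficient, which is presumably what the slide showed, is given below; the notation (n observations, sample means \bar{x} and \bar{y}, sample standard deviations s_x and s_y computed with an n-1 divisor) is an assumption rather than the author's own.

```latex
r \;=\; \frac{1}{n-1}\sum_{i=1}^{n}
        \left(\frac{x_i-\bar{x}}{s_x}\right)
        \left(\frac{y_i-\bar{y}}{s_y}\right)
  \;=\; \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
             {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}\,\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}
```

The first form reads r as an average product of standardized scores; the second is the same quantity after the n-1 factors cancel, and is the form usually used for computation.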

11 Strength of linear correlation

12 Strength and Statistical Significance ● A strong relationship seen in the sample may indicate a strong relationship in the population, or the sample results may be due to chance and the relationship in the population is not strong or is zero. (We'll test whether the relationship is “significant” in the context of linear regression.) ● “Statistical significance” does not imply the relationship is strong enough to be considered “practically important”. (“Non-zero” is not necessarily “big” in size.) – Even weak relationships may be labeled statistically significant if the sample size is very large. – Even very strong relationships may not be labeled statistically significant if the sample size is very small.
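The dependence of significance on sample size can be sketched in Python. This uses the standard t statistic for testing whether a population correlation is zero, t = r·sqrt((n-2)/(1-r²)) with n-2 degrees of freedom, which is equivalent to the slope test in simple linear regression. The slides do not state this formula, and the values of r and n below are hypothetical:

```python
import math

def t_statistic(r, n):
    """t statistic for H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Weak correlation, very large sample (hypothetical numbers).
t_weak = t_statistic(0.05, 10_000)

# Strong correlation, very small sample (hypothetical numbers).
t_strong = t_statistic(0.80, 4)

# Two-sided 5% critical values of t: about 1.96 for very large df, 4.303 for df = 2.
print(f"r=0.05, n=10000: t={t_weak:.2f}, significant at 5%: {abs(t_weak) > 1.96}")
print(f"r=0.80, n=4:     t={t_strong:.2f}, significant at 5%: {abs(t_strong) > 4.303}")
```

In this sketch the weak correlation clears the 5% threshold while the strong one does not, which is the contrast the two sub-bullets describe.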

13 Properties of the Correlation Coefficient ● r is always between -1 and 1 ● r > 0: as one variable changes, the other variable tends to change in the same direction ● r < 0: as one variable changes, the other variable tends to change in the opposite direction ● r=+1: A perfect positive linear relationship: y=a+bx, b>0 ● r=-1: A perfect negative linear relationship y=a+bx, b<0 ● r=0: No linear relationship (the scatter plot points are best fit by a horizontal line) ● Limitations: – Outliers can inflate or deflate correlations – Correlation can be spurious due to confounding

14 The Effect of Outliers on the Correlation In this figure, which graphs the relationship between the length of a leg bone and an upper arm bone in 6 fossil specimens of an extinct beast, moving one point changes r from .994 to .64!
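The sensitivity of r to a single point is easy to reproduce. The Python sketch below uses six made-up points, not the fossil measurements from the figure, so the resulting values will differ from the .994 and .64 quoted on the slide; `pearson_r` is a helper defined here for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Six made-up points lying close to a straight line (assumed data).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]
r_before = pearson_r(x, y)

# Move only the last point far below the line, leaving the other five alone.
y_moved = y[:-1] + [1.0]
r_after = pearson_r(x, y_moved)

print(f"r before: {r_before:.3f}, r after moving one point: {r_after:.3f}")
```

With only six observations, one displaced point is enough to turn a near-perfect correlation into a weak one, so it is worth checking the scatter plot before trusting r.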

15 The Effect of Outliers on the Correlation

16 Using Software ● Stata: – For scatter plot: Graphics --> Two way graph – For r: “correlate”, “pwcorr” (pairwise) – Example: “sysuse lifeexp”; relationship between safe water access and life expectancy, country-level data. Commands: notes; twoway (scatter lexp safewater); correlate lexp safewater; pwcorr; pwcorr, sig ● “Correlation and Regression Demo” applet at the book's website (explore the effect of outliers, for example).

17 Statistical versus Deterministic Relationships ● y=a+bx is a deterministic relationship: knowing the value of x means knowing the value of y (assuming we know the intercept and the slope). The correlation coefficient is 1 (or -1) in this case, provided b is not zero. – e.g., distance traveled = speed × time. With speed fixed, there's a deterministic relationship between distance and time. ● In social science data, deterministic relationships are rare, e.g., time studying and exam grade. ● y=a+bx+e describes such imperfect linear relationships between two variables: the value of y is not completely determined by x (and the parameters a and b), but is also affected by something else, e. ● We discuss the fitting of straight lines to imperfect scatter plot data next time.
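The contrast between y=a+bx and y=a+bx+e can be simulated in a few lines of Python. The intercept, slope, noise level, and random seed below are arbitrary assumptions chosen for illustration, and `pearson_r` is a helper defined here:

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

a, b = 10.0, 2.0             # assumed intercept and slope
x = list(range(1, 51))       # 50 evenly spaced x values

# Deterministic relationship: y is fully determined by x.
y_deterministic = [a + b * v for v in x]

# Statistical relationship: the same line plus a random disturbance e.
rng = random.Random(0)       # fixed seed so the run is reproducible
y_noisy = [a + b * v + rng.gauss(0, 15) for v in x]

r_deterministic = pearson_r(x, y_deterministic)
r_noisy = pearson_r(x, y_noisy)

print(f"deterministic: r = {r_deterministic:.3f}")
print(f"with noise e:  r = {r_noisy:.3f}")
```

The deterministic series gives r equal to 1, while the series with the disturbance term gives a positive r below 1; a larger noise standard deviation would pull r further toward 0.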

