Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 6: Correlation.

1 Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 6: Correlation

2 Correlation One of the most basic questions asked in behavioral science is whether a relation exists between two variables. Do changes or scores on X correspond to changes or scores on Y? An easy way to ask this question visually is to use a scatter plot.

3 Breast Cancer and Solar Radiation Here, it is relatively easy to see that the rate of breast cancer decreases with exposure to increasing solar radiation.

4 Life Expectancy and Per Capita Health Expenditures In other cases, it may be difficult to tell if a relationship exists by simple “eyeballing.”

5 Correlation Correlation is one statistical technique that is used to examine whether any relation exists between two variables. The correlation coefficient provides a numerical measure of the degree and direction of the relation. Note that the existence of a correlation does not imply that one variable causes the other.

6 Correlation In essence, a correlation analysis fits a linear function to a scatterplot. A linear function is one with a constant slope (i.e., a straight line). The fitted function is the one that minimizes the prediction errors, or residuals, overall (conventionally, the sum of their squares). Residuals are represented by the vertical distance from the prediction line to the data points. The fitted line is also termed the regression line.
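The idea of a line that minimizes residuals can be sketched in a few lines of Python. This is only an illustration under the usual least-squares assumption (minimizing the sum of squared vertical distances); the data values and the helper names `fit_line`, `residuals`, and `sse` are made up for this example and do not come from the lecture.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting ys from xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def residuals(xs, ys, slope, intercept):
    """Vertical distances from each data point to the prediction line."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def sse(xs, ys, slope, intercept):
    """Sum of squared residuals for a given line."""
    return sum(e ** 2 for e in residuals(xs, ys, slope, intercept))

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = sse(xs, ys, slope, intercept)
# Any other line, e.g. one with a slightly steeper slope, has a larger error.
worse = sse(xs, ys, slope + 0.1, intercept)
print(round(slope, 3), round(intercept, 3), round(best, 3), round(worse, 3))
```

Nudging the slope or intercept away from the fitted values always increases the summed squared residuals, which is what "the function that minimizes prediction errors" means in practice.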

7 Negative Correlation

8 Visualizing Residuals

9

10 Correlation Remember that correlations only examine linear relationships. Two variables could possess a very strong curvilinear relation, and the correlation coefficient could still be zero.
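A small sketch makes this concrete. Here Y is completely determined by X (Y = X squared), yet the Pearson correlation is zero because the relation is symmetric and not linear. The data and the helper `pearson_r` are illustrative assumptions, not part of the lecture.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviation products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A perfect curvilinear (U-shaped) relation: y depends entirely on x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)
```

The positive and negative deviation products cancel exactly, so the linear measure reports no relation even though knowing X tells you Y perfectly.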

11 Calculating a Correlation The first step in assessing a relation is to examine the covariance of two variables. Covariance represents the degree to which two variables vary together. If larger deviations occur together, the covariance is maximized.

12 Covariance Patterns

Positive relation (Pos):
(X-MeanX)  (Y-MeanY)  Product
    +          +         +
    -          -         +
    +          +         +
    -          -         +

Negative relation (Neg):
(X-MeanX)  (Y-MeanY)  Product
    -          +         -
    +          -         -
    +          -         -
    -          +         -

No relation (Null):
(X-MeanX)  (Y-MeanY)  Product
    -          -         +
    +          -         -
    +          +         +
    -          +         -

13 Calculating a Correlation Covariance is at its maximum when X and Y are perfectly correlated (r=1). When there is no relation, the covariance will be zero. What is the covariance of a variable with itself?
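The covariance can be sketched as the average product of paired deviations from the means. The version below assumes the sample form with an N-1 denominator (the lecture does not state which denominator it uses), and the data are made up. It also answers the question above numerically: the covariance of a variable with itself equals its variance.

```python
from statistics import mean, variance

def covariance(xs, ys):
    """Sample covariance: summed deviation products divided by N - 1 (assumed)."""
    mx, my = mean(xs), mean(ys)
    products = [(x - mx) * (y - my) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Made-up data for illustration only.
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]

cov_xy = covariance(xs, ys)   # how X and Y vary together
cov_xx = covariance(xs, xs)   # a variable paired with itself
var_x = variance(xs)          # sample variance from the standard library

print(cov_xy, cov_xx, var_x)
```

Because each deviation is multiplied by itself, `cov_xx` is just the sum of squared deviations over N-1, which is the definition of the sample variance.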

14 Calculating a Correlation Interpreting covariance is difficult, as it depends on the metric and dispersion of the measures. Whether a covariance of 100 is interpreted as high depends on what level of variability exists in the data. To resolve this issue, we scale the covariance by dividing by the standard deviations of both measures. This is similar to what happens in z-transformations.

15 Calculating a Correlation Note that the maximum of cov_xy is (s_x)(s_y). A variable correlates perfectly with itself: (s_x)(s_x) = var(x).
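Dividing the covariance by both standard deviations, as described on the previous slide, gives r = cov_xy / (s_x s_y). The sketch below assumes N-1 denominators throughout and uses made-up data. It also shows why the scaling matters: changing the units of X changes the covariance but leaves r untouched.

```python
from math import sqrt
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance with an N - 1 denominator (an assumption here)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Correlation as covariance scaled by the two standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Made-up data for illustration only.
xs = [2, 4, 6, 8]
ys = [1, 3, 2, 6]

cov_xy = covariance(xs, ys)
r = pearson_r(xs, ys)

# Re-express X in different units (e.g. metres to centimetres).
xs_scaled = [x * 100 for x in xs]
cov_scaled = covariance(xs_scaled, ys)
r_scaled = pearson_r(xs_scaled, ys)

print(round(cov_xy, 3), round(cov_scaled, 3))
print(round(r, 3), round(r_scaled, 3))
```

The covariance grows by a factor of 100 with the change of units, while r stays the same, which is why r rather than the raw covariance is reported.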

16 Interpreting Pearson Correlations Correlation coefficients do not directly translate into meaningful values. Higher absolute values reflect greater prediction accuracy. Put differently, they imply smaller errors of prediction, or residuals, in the scatterplot.

17 Correlations and Data Types Correlations are usually calculated on interval or ratio scores. However, they can be calculated using ordinal or nominal data as well. Phi: both nominal. Point-biserial: one interval, one nominal. Spearman: ranked/ordinal data. In actuality, these are all Pearson correlations, just with different computational formulas. Interpretations will differ slightly, but all provide a measure of the relation between the two variables.
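The claim that these are all Pearson correlations can be checked directly: applying the ordinary Pearson formula to ranks gives Spearman's coefficient, and applying it to a 0/1 coded group variable gives the point-biserial. The sketch below uses made-up data, and the simple `ranks` helper assumes there are no tied scores.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Spearman: Pearson applied to ranks. Y rises with X, but not linearly.
xs = [10, 20, 30, 40, 50]
ys = [1, 8, 27, 64, 125]
r_pearson = pearson_r(xs, ys)
r_spearman = pearson_r(ranks(xs), ranks(ys))

# Point-biserial: Pearson with one variable coded 0/1 for group membership.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 6, 7, 8]
r_point_biserial = pearson_r(group, score)

print(round(r_pearson, 3), round(r_spearman, 3), round(r_point_biserial, 3))
```

The Spearman value is 1 because the ranks agree perfectly, while the Pearson value on the raw scores is below 1 because the relation is monotonic but curved.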

18 Factors that Affect the Correlation Range Restrictions. Heterogeneous Subsamples. Extreme Observations (two-dimensional outliers).

19 Range Restrictions If the full range of values of a variable is not included in the sample, the resulting correlation coefficient may be attenuated. For example, if only the upper range of SAT scores is used to predict college GPA, the relation may seem relatively small. SAT may explain a good deal of variability in GPAs, but not all. Within a small range of observed SAT scores, prediction may be minimal.
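Attenuation from range restriction can be illustrated with a small deterministic data set. The numbers below are invented purely for the demonstration (they are not SAT or GPA data): Y follows X with a fixed amount of scatter, and the correlation is computed once on the full range and once on only the top of the range.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: y tracks x, plus a fixed +3 / -3 wobble for scatter.
xs = list(range(1, 21))
ys = [x + (3 if x % 2 == 0 else -3) for x in xs]

r_full = pearson_r(xs, ys)

# Keep only the upper end of the X range (hypothetical cutoff of 17).
top = [(x, y) for x, y in zip(xs, ys) if x >= 17]
xs_top = [x for x, _ in top]
ys_top = [y for _, y in top]
r_restricted = pearson_r(xs_top, ys_top)

print(round(r_full, 3), round(r_restricted, 3))
```

The scatter around the line is the same in both cases, but within the narrow range there is much less variation in X for that scatter to be compared against, so the coefficient shrinks.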

20 Heterogeneous Subsamples If the sample contains heterogeneous subsamples, the correlation coefficient may be biased. Heterogeneous subsamples are distinct groups that may possess different relations among the variables (e.g., men vs. women). For example, one might investigate the relation between number of dependents and psychological wellbeing (this is made-up data).
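A minimal sketch of how pooling distinct groups can mislead, using invented numbers rather than the lecture's data: within each hypothetical group the relation is perfectly positive, yet the pooled correlation is strongly negative because the groups differ in their averages.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Two invented subsamples with the same positive within-group relation.
group_a_x, group_a_y = [1, 2, 3], [7, 8, 9]
group_b_x, group_b_y = [7, 8, 9], [1, 2, 3]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)

# Ignoring group membership and analysing everyone together.
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(round(r_a, 3), round(r_b, 3), round(r_pooled, 3))
```

Plotting the groups with different markers in the scatterplot is usually the quickest way to notice this pattern before computing anything.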

21 Extreme Values Here again, outliers may cause havoc. It is important to remember that we are now looking for outliers in 2 dimensions. For example, the following is hypothetical data on the relation of liberal ideology to attitudes toward gay marriage.

22 Liberal Ideology and Attitudes Toward Gay Marriage

23 If the Outlier is Removed
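The effect of a single extreme point can be sketched numerically. The values below are invented for the demonstration and are not the data plotted on these slides: five points fall on a perfectly negative line, and one point far from the rest in both dimensions is enough to flip the sign of r.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: a perfectly negative relation among the first five points.
xs = [1, 2, 3, 4, 5]
ys = [5, 4, 3, 2, 1]
r_without = pearson_r(xs, ys)

# Add one observation that is extreme on both X and Y.
r_with = pearson_r(xs + [20], ys + [20])

print(round(r_without, 3), round(r_with, 3))
```

Note that the added point is unusual on both variables at once, which is why outlier screening for correlations has to look at the scatterplot rather than at each variable separately.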

24 Testing the Significance of a Correlation Follows the logic of null hypothesis testing. The null assumes that the correlation between the two variables is zero in the population. We want to assess how probable a sample correlation at least as large as the one we found would be if this null hypothesis were true. Remember, even if the correlation is zero in the population, it will not be zero in every sample drawn from the population due to sampling error.

25 Testing the Significance of a Correlation Once we have an estimate of r, we turn to the correlation table of critical values. The degrees of freedom are N-2 when predicting one variable from another, where N is the number of observations in the sample. We select an alpha level and then determine if we can reject the null. What we are doing is comparing the size of the obtained correlation to the standard error of correlation coefficients. Time for an example.
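As a hedged illustration of the test, the sketch below uses the standard t statistic for a correlation, t = r * sqrt(N-2) / sqrt(1 - r^2) with N-2 degrees of freedom, which is equivalent to looking up a critical r in a table. The slides describe the table lookup, not this formula, so treat the formula as a swapped-in equivalent. The sample values (r = 0.70, N = 10) are hypothetical, and the critical t of 2.306 is the standard two-tailed table value for alpha = .05 with 8 degrees of freedom.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing H0: population correlation is zero (df = n - 2)."""
    df = n - 2
    return r * sqrt(df) / sqrt(1 - r ** 2)

def critical_r(t_crit, df):
    """Convert a critical t value into the matching critical r."""
    return t_crit / sqrt(t_crit ** 2 + df)

# Hypothetical sample result.
r, n = 0.70, 10
df = n - 2

# Two-tailed critical t for alpha = .05 and df = 8, read from a t table.
t_crit = 2.306

t = t_from_r(r, n)
r_crit = critical_r(t_crit, df)
reject_null = abs(t) > t_crit   # same decision as abs(r) > r_crit

print(round(t, 3), round(r_crit, 3), reject_null)
```

Comparing `abs(r)` with `r_crit` and comparing `abs(t)` with `t_crit` always lead to the same decision, so either a critical-r table or a t table can be used.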

