Issues in Interpreting Correlations:

Correlation does not imply causality: X might cause Y. Y might cause X. Z (or a whole set of factors) might cause both X and Y.

Factors affecting the size of a correlation coefficient: 1. Sample size and random variation: The larger the sample, the more stable the correlation coefficient. Correlations obtained with small samples are unreliable.

Conclusion: you need a large sample before you can be really sure that your sample r is an accurate reflection of the population r. The slide tabulated, for a range of sample sizes, the limits within which 80% of sample r's will fall when the true (population) correlation is 0: the limits narrow steadily as the sample grows, down to about ±.09 for the largest sample size shown.
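The effect of sample size on the stability of r can be checked by simulation. Below is a minimal sketch (Python with numpy; the function name and sample sizes are my own choices, not from the original slides) that draws many samples from two uncorrelated populations and reports the limits containing the middle 80% of the sample r's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_r(n, reps=10_000):
    """Pearson r for `reps` pairs of size-n samples drawn from
    two completely uncorrelated normal populations."""
    x = rng.standard_normal((reps, n))
    y = rng.standard_normal((reps, n))
    xc = x - x.mean(axis=1, keepdims=True)   # centre each sample
    yc = y - y.mean(axis=1, keepdims=True)
    return (xc * yc).sum(axis=1) / np.sqrt(
        (xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

for n in (10, 100, 1000):
    lo, hi = np.percentile(sample_r(n), [10, 90])   # 80% limits
    print(f"N = {n:4d}: 80% of sample r's fall within {lo:+.2f} to {hi:+.2f}")
```

With N = 10 the limits span roughly ±.4, while with N = 1000 they shrink to around ±.04: small-sample correlations really are unreliable.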

2. Linearity of the relationship: Pearson's r measures the strength of the linear relationship between two variables; r will be misleading if there is a strong but non-linear relationship, e.g. a U-shaped (curvilinear) relationship can produce an r close to zero despite a strong association.
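A noiseless U-shaped relationship makes the point concrete. In this sketch (Python, a hypothetical illustration rather than anything from the slides), Y is perfectly predictable from X, yet r comes out at essentially zero:

```python
import numpy as np

x = np.linspace(-3, 3, 101)    # X values symmetric about zero
y = x ** 2                     # perfect, but non-linear, U-shaped relationship
r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.3f}")          # essentially zero
```

The upward and downward halves of the curve cancel, so the linear trend, which is all r can see, is nil.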

3. Range of talent (variability): The smaller the amount of variability in X and/or Y, the lower the apparent correlation, e.g. restricting a sample to people with high scores on X will shrink the observed r, even when X and Y are strongly related over the full range.
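Restriction of range is also easy to demonstrate by simulation. The sketch below (Python with numpy; the population correlation of .7 and the cut-off are arbitrary choices of mine) correlates X and Y over the full range, and then again using only the high scorers on X:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.standard_normal(n)
y = 0.7 * x + np.sqrt(1 - 0.7 ** 2) * rng.standard_normal(n)  # population r = .7

full = np.corrcoef(x, y)[0, 1]
high = x > 1.0                                   # keep only high scorers on X
restricted = np.corrcoef(x[high], y[high])[0, 1]
print(f"full range: r = {full:.2f}   restricted range: r = {restricted:.2f}")
```

Nothing about the X-Y relationship has changed; merely narrowing the range of X drags the observed r down (here from about .70 to about .40).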

4. Homoscedasticity (equal variability): r describes the average strength of the relationship between X and Y. Hence scores should have a constant amount of variability at all points in their distribution. In a heteroscedastic scatterplot, one region shows low variability of Y around the regression line (small Y-Y' residuals) while another region shows high variability of Y (large Y-Y' residuals).

5. Effect of discontinuous distributions: A few outliers, lying far from the main cluster of points, can distort things considerably, producing an apparent correlation even though there is no real correlation between X and Y.

Deciding what is a "good" correlation: A moderate correlation could be due to either (a) sampling variation (and hence a "fluke"); or (b) a genuine association between the variables concerned. How can we tell which of these is correct?

Distribution of r's obtained using samples drawn from two uncorrelated populations of scores: small correlations (around r = 0) are likely to occur by chance; large negative or large positive correlations are unlikely to occur by chance.

For an N of 20, 5 out of 100 random samples are likely to produce an r of .44 or larger in magnitude, merely by chance (i.e., even though in the population, there was no correlation at all!): r = -.44 and r = +.44 mark the cutoffs in the two tails of this distribution.
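This 5% figure can be reproduced by brute force. The sketch below (Python with numpy, not part of the original slides) draws 100,000 pairs of size-20 samples from uncorrelated populations and counts how often |r| reaches .44 purely by chance:

```python
import numpy as np

rng = np.random.default_rng(42)
reps, n = 100_000, 20

# many independent samples from two uncorrelated normal populations
x = rng.standard_normal((reps, n))
y = rng.standard_normal((reps, n))
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
r = (xc * yc).sum(axis=1) / np.sqrt((xc ** 2).sum(axis=1) * (yc ** 2).sum(axis=1))

frac = np.mean(np.abs(r) >= 0.44)
print(f"proportion of chance |r| >= .44: {frac:.3f}")   # close to .05
```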

Thus we arbitrarily decide that: (a) If our sample correlation is so large that it would occur by chance only 5 times in a hundred, we will assume that it reflects a genuine correlation in the population from which the sample came. (b) If a correlation like ours is likely to occur by chance more often than this, we assume it has arisen merely by chance, and that it is not evidence for a correlation in the parent population.

How do we know how likely it is to obtain a sample correlation as large as ours by chance? Tables (in the back of many statistics books and on my website) give this information for different sample sizes. An illustration of how to use these tables: Suppose we take a sample of 20 people, and measure their eye-separation and back hairiness. Our sample r is .75. Does this reflect a true correlation between eye-separation and hairiness in the parent population, or has our r arisen merely by chance (i.e. because we have a freaky sample)?

Step 1: Calculate the "degrees of freedom" (df = the number of pairs of scores, minus 2). Here, we have 20 pairs of scores, so df = 18. Step 2: Find a table of "critical values for Pearson's r".

Part of a table of "critical values for Pearson's r" (level of significance, two-tailed, by df): with 18 df, a correlation of .4438 or larger will occur by chance with a probability of .05: i.e., if we took 100 samples of 20 people, about 5 of those samples are likely to produce an r of .4438 or larger (even though there is actually no correlation in the population!).

In the same table of critical values for Pearson's r: with 18 df, a correlation of .5614 or larger will occur by chance with a probability of .01: i.e., if we took 100 samples of 20 people, about 1 of those 100 samples is likely to give an r of .5614 or larger (again, even though there is actually no correlation in the population!).

Likewise, with 18 df, a correlation of .6787 or larger will occur by chance with a probability of .001: i.e., if we took 1000 samples of 20 people, about 1 of those 1000 samples is likely to give an r of .6787 or larger (again, even though there is actually no correlation in the population!).
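These critical values need not be taken on faith from a table: they follow from the t distribution via r_crit = t_crit / sqrt(t_crit^2 + df). A quick sketch (Python, assuming scipy is available; not part of the original slides):

```python
import math
from scipy.stats import t

df = 18
for alpha in (0.05, 0.01, 0.001):
    t_crit = t.ppf(1 - alpha / 2, df)              # two-tailed t cut-off
    r_crit = t_crit / math.sqrt(t_crit ** 2 + df)  # convert to a critical r
    print(f"alpha = {alpha}:  critical r = {r_crit:.4f}")
```

This reproduces .4438, .5614 and .6787 for the .05, .01 and .001 levels with 18 df.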

The table shows that an r of .6787 is likely to occur by chance only once in a thousand times. Our obtained r is .75. This is larger than .6787. Hence our obtained r of .75 is likely to occur by chance less than one time in a thousand (p < .001). Conclusion: Any sample correlation could in principle occur due to chance or because it reflects a true relationship in the population from which the sample was taken. Because our r of .75 is so unlikely to occur by chance, we can safely assume that there really is a relationship between eye-separation and back-hairiness.
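The same conclusion can be reached without a table by converting r to a t statistic, t = r * sqrt(df / (1 - r^2)), and asking the t distribution for the exact two-tailed probability. A sketch in Python (scipy assumed):

```python
import math
from scipy.stats import t

r, n = 0.75, 20
df = n - 2
t_val = r * math.sqrt(df / (1 - r ** 2))   # t statistic for a sample correlation
p = 2 * t.sf(t_val, df)                    # two-tailed probability under H0: rho = 0
print(f"t({df}) = {t_val:.2f},  p = {p:.5f}")   # p well below .001
```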

Important point: Do not confuse statistical significance with practical importance. We have just assessed "statistical significance" - the likelihood that our obtained correlation has arisen merely by chance. Our r of.75 is "highly significant" (i.e., highly unlikely to have arisen by chance). However, a weak correlation can be statistically significant, if the sample size is large enough...

e.g. with 100 df, an r of .1946 is "significant" in the sense that it is unlikely to have arisen by chance (r's bigger than this will occur by chance only 5 times in 100). The coefficient of determination (r²) shows that this is not a strong relationship in a practical sense; knowledge of one of the variables would account for only 3.79% of the variance in the other - completely useless for predictive purposes!
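The 3.79% figure is just the square of the correlation; a one-line check (Python):

```python
r = 0.1946                      # smallest "significant" r with 100 df
r_squared = r ** 2              # coefficient of determination
print(f"r^2 = {r_squared:.4f}   ({r_squared * 100:.2f}% of variance explained)")
```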