Correlation They go together like salt and pepper… like oil and vinegar… like bread and butter… etc.

Major Uses Correlational techniques are used for three major purposes: degree of association, prediction, and reliability.

Bivariate Distribution Bivariate distribution - a distribution in which two variables are presented simultaneously. Consider a small set of paired X and Y scores [the slide's two-column X-Y table is not fully recoverable here]. Ordinarily, we might construct a graph for each set of data. However, we can place both on a single "scatter diagram."

Scatter Diagram [scatter plot of the X-Y pairs from the previous slide, with both axes running from 0 to 8]

What Scatter Diagrams Can Tell Us A scatter diagram can tell us much about a bivariate distribution: - Presence of a relationship [panels: No Relationship vs. Relationship]

- Direction of relationship [panels: Positive Relationship vs. Negative Relationship]. There is a positive relationship between high school SAT scores and college GPA; there is a negative relationship between the number of missed classes and exam scores. Other examples?

- Linear or non-linear [panels: Linear vs. Non-linear]

- Homoscedasticity/Heteroscedasticity

- Exceptions to the relationship [panel contrasts a Perfect Relationship with one containing exceptions]

Conceptualizing rxy The correlation coefficient is built from the cross-products (xy) of deviation scores. Plotting the deviation scores divides the scatter diagram into four quadrants (I, II, III, IV): in the quadrants where the x and y deviations share the same sign the cross-products are positive (+) values, and where the signs differ they are negative (-) values. In deviation-score form:

rxy = Σxy / √[(Σx²)(Σy²)]

Computational Formula The same coefficient can be computed directly from the raw X and Y scores:

rxy = [ΣXY − (ΣX)(ΣY)/n] / √{[ΣX² − (ΣX)²/n][ΣY² − (ΣY)²/n]}
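
As a quick check on these two formulas, here is a minimal Python sketch (the paired scores are made up for illustration) showing that the deviation-score and raw-score forms give the same r:

    import math

    # Hypothetical paired scores, for illustration only
    X = [2, 4, 5, 7, 9]
    Y = [3, 5, 4, 8, 10]
    n = len(X)

    # Deviation-score (conceptual) form: r = Σxy / √(Σx²·Σy²)
    mx, my = sum(X) / n, sum(Y) / n
    Sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    Sx2 = sum((x - mx) ** 2 for x in X)
    Sy2 = sum((y - my) ** 2 for y in Y)
    r_dev = Sxy / math.sqrt(Sx2 * Sy2)

    # Raw-score (computational) form
    num = sum(x * y for x, y in zip(X, Y)) - sum(X) * sum(Y) / n
    den = math.sqrt((sum(x * x for x in X) - sum(X) ** 2 / n)
                    * (sum(y * y for y in Y) - sum(Y) ** 2 / n))
    r_raw = num / den

    print(r_dev, r_raw)  # both print 0.952..., the two forms agree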

Correlation and Causation "Correlation does not imply causation." Consider the following: there is a very high correlation (i.e., in the upper .90s) between the length of a person's big toe and the ability to spell! Several possibilities exist: changes in X cause changes in Y; changes in Y cause changes in X; or a third (or other) variable affects both X and Y (here, age is the likely third variable: older children have both bigger toes and better spelling).

Correlation and Causation How about this one? Children exposed to violent TV are more aggressive than children exposed to non-violent TV

Factors Influencing the Size of "r" Linearity of regression: the more closely scores follow a straight line, the higher the value of r; r underestimates the true degree of association in a non-linear relationship [panels: high-value r vs. low-value r].

Factors Influencing the Size of "r" Restriction of range (truncated range): if the correlation coefficient is calculated on only a portion of the data, r will usually be smaller than if all the data had been used [panels: higher-value r vs. lower-value r].

Factors Influencing the Size of "r" Discontinuous distribution: if the correlation coefficient is calculated on separated portions of the data, r will usually be higher than if all the data had been used [panels: lower-value r vs. higher-value r].

Factors Influencing the Size of "r" Homoscedasticity/heteroscedasticity: the correlation coefficient will adequately reflect the degree of association across the entire range of scores for a homoscedastic distribution, but not for a heteroscedastic one, where r overestimates the degree of association in some regions of X and underestimates it in others [panels: Homoscedastic vs. Heteroscedastic].

Factors Influencing the Size of "r" Pooled data: small samples may be combined if their means and standard deviations are similar; otherwise, "spurious correlations" may occur [panels: lower-value r vs. spuriously higher-value r].

Factors Influencing the Size of "r" Sampling variability: r computed on large samples (i.e., n > 100) is not greatly affected by sampling variability, but r computed on small samples will vary considerably, so one must take sample size into consideration when interpreting r. Each of the previous factors indicates the need to consider the conditions under which the correlation coefficient was calculated when interpreting r.

Interpreting Strength of Association The correlation coefficient is not the best way to interpret the strength of the association between X and Y: its scale is not linear, so r = .60 (for example) is not twice as strong a relationship as r = .30. The coefficient of determination, r², is a better index of strength: the proportion of variability in Y scores that can be explained by changes in X scores. For example, r = .60 explains r² = .36 of the variance in Y, four times the .09 explained by r = .30.

Regression

Prediction If two variables are correlated, you can predict Y from X with better than chance accuracy. Given r < 1, there will be predictive error - the difference between the actual Y score and the predicted score (Y′) for a given value of X. For example: predicted GPA Y′ = 3.40, actual GPA Y = 2.78, error = 2.78 − 3.40 = −.62. Predictive error = Y − Y′

Reducing Predictive Error Obviously, we would want our predictions to be as accurate as possible (i.e., have little predictive error). When Σ(Y − Y′)² is a minimum, we have met the least-squares criterion for the "best fitting straight line," called the regression line.

The regression line can be thought of as a "running mean": at each X value it gives the mean Y that would be expected given a large number of observations at that X. [Figure: regression line with overall mean Ȳ = 2.57; Y′ = 2.31 at X = 425 and Y′ = 2.78 at X = 650.]

Which Line is Best? Given a scatter plot of the data, where would we place the regression line? [slide shows a scatter plot with candidate lines]

The Regression Equation Fortunately, there is a simple way to determine precisely where the regression line should be placed so that the least-squares criterion is met:

Y′ = r(Sy/Sx)(X − X̄) + Ȳ

where X is the score we are predicting from, X̄ and Ȳ are the means, and Sx and Sy are the standard deviations.

The regression equation is really nothing more than the equation for a straight line, Y = aX + b, where a = slope and b = Y-intercept. Multiplying out, Y′ = [r(Sy/Sx)]X + [Ȳ − r(Sy/Sx)X̄], so the slope is a = r(Sy/Sx) and the Y-intercept is b = Ȳ − aX̄. As such, we can use the regression equation to predict Y from X.

An Example Consider the following data (n = 4):

Batting Avg (X)   HR (Y)
.219              8
.287              11
.306              12
.315              15

ΣX = 1.127        ΣY = 46
X̄ = .28175        Ȳ = 11.5
ΣX² = .323191     ΣY² = 554
Sx = .0376        Sy = 2.5
ΣXY = 13.306      r = .9186

Regression equation: Y′ = 61.06X − 5.71. For a batting average of .271, the predicted number of home runs is Y′ = 10.84.
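
As a sketch (using only the four data pairs above), a few lines of Python reproduce the summary statistics, the regression equation, and the prediction on this slide:

    import math

    avg = [.219, .287, .306, .315]   # X: batting average
    hr = [8, 11, 12, 15]             # Y: home runs
    n = len(avg)

    mx, my = sum(avg) / n, sum(hr) / n                   # .28175 and 11.5
    sx = math.sqrt(sum((x - mx) ** 2 for x in avg) / n)  # about .0376
    sy = math.sqrt(sum((y - my) ** 2 for y in hr) / n)   # 2.5

    Sxy = sum((x - mx) * (y - my) for x, y in zip(avg, hr))
    r = Sxy / (n * sx * sy)          # about .9186

    slope = r * sy / sx              # about 61.06
    intercept = my - slope * mx      # about -5.70 (the slide rounds to -5.71)
    y_pred = slope * .271 + intercept  # predicted HR for a .271 hitter
    print(round(slope, 2), round(intercept, 2), round(y_pred, 2))  # 61.06 -5.7 10.84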

Regression to the Mean Any time r < 1.00, the Y′ values will cluster more toward the overall mean of Y (Ȳ). The tendency for Y′ values to move closer to Ȳ is called regression to the mean. In the extreme case where r = 0, all of our Y′ values will equal Ȳ.

Measuring Predictive Error Since a predicted value is only a "best estimate," we would like to know how large the predictive error is overall. One way to measure it is to calculate the amount of variability of the Y scores around the regression line. Standard error of estimate (prediction):

Syx = √[Σ(Y − Y′)² / n]

Standard Error of Estimate The standard error of estimate is like a standard deviation, but one where the deviations are measured from the regression line rather than from the mean:

Standard deviation: Sx = √[Σ(X − X̄)² / n]
Standard error of estimate: Syx = √[Σ(Y − Y′)² / n]

Standard Error of Estimate An easier formula is as follows:

Syx = Sy√(1 − r²)

As r decreases, Syx increases [panels: high-value r (small Syx) vs. low-value r (large Syx)].
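
Continuing the batting-average example as a sketch (r and Sy are taken from that slide's data):

    import math

    Sy, r = 2.5, .9186
    Syx = Sy * math.sqrt(1 - r ** 2)  # standard error of estimate
    print(round(Syx, 3))              # about 0.988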

Confidence in Predictions We can also establish limits, with a specified probability, within which an individual's actual score is likely to fall. For example, given SAT = 650, Y′ (predicted GPA) = 2.78, and Syx = .45, the 95% limits are:

Upper limit: 2.78 + 1.96(Syx) = 2.78 + 1.96(.45) = 3.66
Lower limit: 2.78 − 1.96(Syx) = 2.78 − 1.96(.45) = 1.90
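
The limits on this slide are just Y′ ± 1.96(Syx); a quick check in Python:

    y_pred, Syx = 2.78, .45
    lower = y_pred - 1.96 * Syx   # 1.898, rounds to 1.90
    upper = y_pred + 1.96 * Syx   # 3.662, rounds to 3.66
    print(round(lower, 2), round(upper, 2))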

Confidence in Predictions Given an SAT = 650, we can be 95% confident the individual's actual GPA will fall between 1.90 and 3.66. For such "confidence intervals" to make sense: the relationship between X and Y must be linear; the bivariate distribution must be homoscedastic; Y values must be normally distributed about Y′; and n > 100.

Ordinal and Nominal Measures of Association

Spearman r When you have two ordinal variables (e.g., ranks of candidates from two admissions counselors), you can determine the degree of association between the variables with Spearman r:

rs = 1 − 6ΣD² / [n(n² − 1)]

where D = the difference between rankings and n = the number of pairs of ranks.
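
A minimal sketch of the formula in Python (the two judges' rankings are made up for illustration):

    # Hypothetical ranks assigned to six candidates by two counselors
    ranks_a = [1, 2, 3, 4, 5, 6]
    ranks_b = [2, 1, 4, 3, 6, 5]
    n = len(ranks_a)

    D2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))  # ΣD² = 6
    rs = 1 - 6 * D2 / (n * (n * n - 1))
    print(rs)  # 1 - 36/210, about 0.829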

Spearman r In the case of ties, it is usual to assign each tied observation the mean of the ranks the tied observations would otherwise have occupied. For example, if you cannot decide whether applicant #8 or applicant #3 should be your 7th choice, assign each a rank of 7.5, since they would have been your 7th and 8th choices had you been able to decide. It is best to have judges avoid ties, but if ties persist, it is better to calculate Pearson r on the ranks and interpret the value as a Spearman r corrected for ties.

Phi (φ) When you have two true dichotomous variables (e.g., gender and employment status), you can use the phi coefficient:

φ = (AD − BC) / √[(A+B)(C+D)(A+C)(B+D)]

For example (n = 200):

              M        F
Employed      75 (A)   25 (B)
Unemployed    40 (C)   60 (D)

φ = .35
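
A short sketch that reproduces the φ value from this 2×2 table:

    import math

    A, B = 75, 25   # employed:   male, female
    C, D = 40, 60   # unemployed: male, female

    phi = (A * D - B * C) / math.sqrt((A + B) * (C + D) * (A + C) * (B + D))
    print(round(phi, 2))  # 0.35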

Reliability The third major use of correlation is determining reliability - how consistently a measuring instrument measures over time. The most common form is test-retest reliability, in which a test is given at one time and, following some period (e.g., a week, month, or year), the same test is given a second time; the two sets of scores are then correlated. Other types of reliability include split-half and alternate forms.

Multiple Correlation and Regression Thus far we have examined the relationship between two variables, X and Y. Multiple correlation and multiple regression examine the relationship between several X variables and a single Y variable (more commonly called the "predictor" variables and the "criterion" variable). R = the multiple correlation coefficient; R² = the proportion of variability in Y scores that can be explained by the combined predictors Xi. With two predictors, the regression equation is Y′ = a + b1X1 + b2X2.
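
As a sketch of fitting Y′ = a + b1X1 + b2X2 by least squares (the predictor and criterion values below are made up for illustration), using numpy:

    import numpy as np

    # Hypothetical data: two predictors and one criterion
    X1 = np.array([520., 600., 650., 580., 700., 630.])  # e.g., SAT score
    X2 = np.array([3.1, 3.5, 3.9, 3.2, 4.0, 3.6])        # e.g., HS GPA
    Y = np.array([2.6, 3.0, 3.4, 2.9, 3.7, 3.2])         # e.g., college GPA

    # Design matrix: a column of 1s for the intercept a, then X1 and X2
    X = np.column_stack([np.ones_like(X1), X1, X2])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    a, b1, b2 = coef

    # R² = proportion of Y variance explained by the combined predictors
    Y_pred = X @ coef
    R2 = 1 - np.sum((Y - Y_pred) ** 2) / np.sum((Y - Y.mean()) ** 2)
    print(a, b1, b2, R2)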