1 Chapter 7, Part 1: Correlation

2 Correlation Topics
• Correlational research: what is it and how do you do “co-relational” research?
• The three questions:
  ◦ Is it a linear or curvilinear correlation?
  ◦ Is it a positive or negative relationship?
  ◦ How strong is the relationship?
• Solving these questions with t scores and r, the estimated correlation coefficient derived from the t_X and t_Y scores of individuals in a random sample.

3 Correlational research: how to start
• To begin a correlational study, we select a population or, far more frequently, a random sample from a population.
• (Since we use samples most of the time, we will mostly use the formulae and symbols for computing a correlation from a sample.)
• We then obtain two scores from each individual, one score on each of two variables. These are usually variables that we think might be related to each other for interesting reasons. We call one variable X and the other Y.

4 Correlational research: comparing t_X and t_Y scores
• We translate the raw scores on the X variable to t scores (called t_X scores) and the raw scores on the Y variable to t_Y scores.
  ◦ So each individual has a pair of scores, a t_X score and a t_Y score.
• You determine how similar or different the t_X and t_Y scores in the pairs are, on the average, by subtracting t_Y from t_X in each pair, squaring the differences, summing them, and averaging.
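The procedure on this slide can be sketched in a few lines of Python. This is only an illustration: the data (hours studied and quiz scores) are hypothetical, the helper name `t_scores` is an assumption, and the "average" divides by one less than the number of pairs, which is the convention the sample formula for r uses later in the chapter.

```python
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)  # sample standard deviation (divides by n - 1)
    return [(value - mean) / sd for value in raw]

# Hypothetical data: hours studied (X) and quiz score (Y) for five students.
hours = [1, 2, 3, 4, 5]
quiz = [52, 60, 61, 70, 77]

t_x = t_scores(hours)
t_y = t_scores(quiz)

# Subtract t_Y from t_X in each pair, square the differences, and sum them.
squared_diffs = [(a - b) ** 2 for a, b in zip(t_x, t_y)]
sum_sq = sum(squared_diffs)

# Average by dividing by the number of pairs minus one.
avg_sq_diff = sum_sq / (len(hours) - 1)
print(round(avg_sq_diff, 3))
```

A small average squared difference, as these made-up scores produce, means the t_X and t_Y scores in each pair are similar.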

5 The estimated correlation coefficient, Pearson’s r
• With a simple formula, you transform the average squared difference between the t scores into Pearson’s correlation coefficient, r.
• Pearson’s r indicates, with a single number, both the direction and the strength of the relationship between the two variables in your sample.
• r also estimates the correlation in the population from which the sample was drawn.
  ◦ In Ch. 8, you will learn when you can use r that way.

6 Going from pairs of raw scores to r: Linearity, a preliminary question
• Once you have scores on two variables, you ask, “Is this a linear or curvilinear relationship?”
• Psychology is a relatively new science and this is an intro stat course.
  ◦ For both reasons, you will only learn how to deal with linear relationships between two variables. Correlation with three or more variables and curvilinear relationships are saved for grad school.
BUT YOU MUST KNOW WHAT A LINEAR RELATIONSHIP IS, AND HOW TO RECOGNIZE A NONLINEAR (CURVILINEAR) CORRELATION.

7 Linearity vs. Curvilinearity
• In a linear relationship, as scores on one variable go from low to high, scores on the other variable either generally increase or generally decrease.
• In a curvilinear relationship, as scores on one variable go from low to high, scores on the other variable change direction. They can go (1) down and then up, (2) up and then down, (3) up and down and then up again, (4) up or down and then flat, etc.
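A quick numerical illustration of why the linearity question comes first. The sketch below applies the chapter's sample formula for r (presented later, on the slide "The formula for r") to two hypothetical data sets: one where Y rises steadily with X, and one where Y goes down and then up. The data and function names are assumptions made for the example.

```python
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)
    return [(value - mean) / sd for value in raw]

def pearson_r(xs, ys):
    """r = 1 - (1/2) * sum((t_X - t_Y)^2) / (n_P - 1)."""
    t_x, t_y = t_scores(xs), t_scores(ys)
    sum_sq = sum((a - b) ** 2 for a, b in zip(t_x, t_y))
    return 1 - 0.5 * sum_sq / (len(xs) - 1)

x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]  # rises steadily as X increases
y_curved = [v * v for v in x]      # goes down, then up (a U shape)

print(round(pearson_r(x, y_linear), 3))  # essentially +1
print(round(pearson_r(x, y_curved), 3))  # essentially 0
```

In the U-shaped data Y is completely determined by X, yet r comes out at essentially zero. That is why r should only be interpreted after checking that the relationship is linear.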

8 Examples of linear relationships
• For example, think of the relationship between the size of a pleasure boat (X) and its cost (Y). As one variable (boat size) increases, scores on the other variable (cost) also increase.
• Another example of a linear relationship: the size of a car and the number of miles per gallon it gets. In general, as cars get gradually larger (X), they tend to get fewer miles per gallon (Y).

9 A curvilinear relationship
• In a curvilinear relationship, as scores on the X variable go gradually from low to high, the Y variable changes direction.
• For example, think of the relationship between age (X) and height (Y).
• As age increases from 0 to 14 or so, height increases also.
• But then people stop growing. As age increases further, height stays the same.
• Thus the Y variable, height, changes direction. It goes from gradually rising to flat.
• If you graph age and height, the best fitting line is a curved line.

10 Correlation Characteristics: Which line best shows the relationship between age (X) and height (Y)?
[Figure: two candidate lines for age and height, labeled Linear and Curvilinear]

11 Another non-linear relationship: shortstops and linemen
Great shortstops may be too small to be great football linemen.
[Table: baseball skill and football potential ratings, on a scale from Terrible to Excellent, for Al, Ben, Chuck, David, Ed, Frank, and George]
Is this a linear relationship?

12 Plot the dots!
• To check whether a relationship is linear, make a graph and place the scores on it.
• That’s what I mean by “Plot the dots.”
• If you really want to know what is going on with data, plot the dots!
• Here is a graph for the baseball skill and football potential data.

13 When you plot the dots, is this linear?
[Figure: scatterplot of football skill against baseball skill, both axes running from Terrible to Excellent, with one dot each for Al, Ben, Chuck, David, Ed, Frank, and George]
NO! It is best described by a curved line. It is a curvilinear relationship!

14 After you know a correlation is linear, there are two other questions: the direction and the strength of the correlation. But first, a definition of high and low scores.
• Definition of high and low scores:
  ◦ High scores are scores above the mean of each variable. They are represented by positive t scores.
  ◦ Low scores are scores below the mean of each variable. They are represented by negative t scores.

15 Positive relationships
• In a positive relationship, as X scores gradually increase, Y scores tend to increase as well. Example: The longer a sailboat is, the more it tends to cost. As length goes up, price tends to go up.
• In a positive correlation, X and Y scores tend to be on the same side of their respective means.
• As a result, the t_X and t_Y scores tend to be similar and the difference between them, $(t_X - t_Y)$, tends to be small.
• Since $(t_X - t_Y)$ is small, the squared difference between them, $(t_X - t_Y)^2$, also tends to be small.

17 Graphing a positive relationship
• In a positive correlation, high scores on X tend to go with high scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores also generally get higher. So the line will tend to rise as it runs.
• Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right, or “runs,” along the X axis.
• If a line rises from left to right, “rise” is positive. Run is always positive. So a positive rise divided by an (always) positive run results in a positive slope. (That’s why we call it a “positive” correlation.)

18 Positive vs Negative scatterplot
[Figure: two scatterplots on t axes running from -3 to 3, one labeled Negative relationship and one labeled Positive relationship]

19 Graphic display of a strong POSITIVE correlation
[Figure: scatterplot on t axes running from -3 to 3]

20 Negative relationships
• In a negative relationship, as X scores gradually increase, Y scores tend to decrease. Example: The more years a sailboat is used, the less it tends to cost. As use goes up, price tends to go down.
• In a negative correlation, X and Y scores tend to be on opposite sides of their respective means.
• As a result, the t_X and t_Y scores tend to be dissimilar and the difference between them, $(t_X - t_Y)$, tends to be large.
• Since $(t_X - t_Y)$ is large, the squared difference between them, $(t_X - t_Y)^2$, also tends to be large.

22 Graphing a negative relationship
• In a negative correlation, high scores on X tend to go with low scores on Y. On a graph, as the line runs from left to right, scores increase on the X axis. At the same time, Y scores get lower. So the line will tend to fall as it runs.
• Remember from math: slope equals how far a line rises on the Y axis for each unit it moves from left to right, or “runs,” along the X axis.
• If a line falls from left to right, “rise” is negative. Run is always positive. So a negative rise divided by an (always) positive run results in a negative slope. (That’s why we call it a “negative” correlation.)

23 Positive vs Negative scatterplot
[Figure: two scatterplots on t axes running from -3 to 3, one labeled Negative relationship and one labeled Positive relationship]

24 Summary
• When t scores are consistently more similar than different, we have a positive correlation. On a graph the dots will rise from your left to your right.
• When t scores are consistently more different than similar, we have a negative correlation. On a graph the dots will fall from your left to your right.

25 Positive vs Negative scatterplot
[Figure: two scatterplots on t axes running from -3 to 3, one labeled Negative relationship and one labeled Positive relationship]

26 How strong is the relationship between the t_X and t_Y scores?
• Here the question is about the consistency with which t_X and t_Y scores are either similar or dissimilar.

27 t scores: sign and size
• There are two aspects to the consistency of the relationship between t_X and t_Y scores.
  ◦ First, are the t scores consistently of the same sign (positive correlation) or of opposite signs (negative correlation)?
  ◦ If they are almost always one way or the other, you have at least a moderately strong relationship.
  ◦ On the other hand, if you sometimes see t scores on the same side of the mean and sometimes on opposite sides, you have a relatively weak correlation.

28 t scores: sign and size
• Second, if there is a consistent pattern of same-signed t scores (positive correlation) or a consistent pattern of opposite-signed t scores (negative correlation), then whether the t_X and t_Y scores are about the same distance from the mean comes into play.
• The large majority of t scores (usually well over 95%) range from -2.50 to +2.50.
• Given a consistent positive or negative correlation, the more similar in size the t scores, the stronger the correlation.

29 Positive correlations
• Perfect: t_X and t_Y scores are all the same sign and are identical in size.
• Strong: t_X and t_Y scores are almost all the same sign and are fairly similar in size.
• Moderate: t_X and t_Y scores are predominantly the same sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
• Weak: t_X and t_Y scores are a little more often the same sign than opposite in sign. Nothing can be said about size.

30 Negative correlations
• Perfect: t_X and t_Y scores are all of opposite sign and are identical in size.
• Strong: t_X and t_Y scores are almost all of opposite sign and are fairly similar in size.
• Moderate: t_X and t_Y scores are predominantly opposite in sign. This is especially true for pairs in which one of the values is one or more standard deviations from the mean. Size may be fairly dissimilar.
• Weak: t_X and t_Y scores are a little more often of opposite sign than of the same sign. Nothing can be said about size.
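The sign part of these descriptions can be checked by counting. The sketch below is one possible way to do it, not a procedure from the chapter: it reports the fraction of pairs whose t_X and t_Y scores share a sign. The sailboat numbers are hypothetical and `same_sign_fraction` is a name invented for the example.

```python
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)
    return [(value - mean) / sd for value in raw]

def same_sign_fraction(xs, ys):
    """Fraction of pairs whose t_X and t_Y fall on the same side of their means."""
    pairs = zip(t_scores(xs), t_scores(ys))
    # Pairs with a score exactly at its mean (t = 0) have no sign and are skipped.
    signed = [(a, b) for a, b in pairs if a != 0 and b != 0]
    if not signed:
        raise ValueError("no pairs with two nonzero t scores")
    same = sum(1 for a, b in signed if (a > 0) == (b > 0))
    return same / len(signed)

# Hypothetical sailboats: length in feet, years of use, price in $1000s.
length_ft = [20, 24, 28, 32, 36, 40]
price_by_length = [15, 22, 30, 41, 55, 80]
years_used = [1, 3, 5, 7, 9, 11]
price_by_age = [80, 55, 41, 30, 22, 15]

print(same_sign_fraction(length_ft, price_by_length))  # all pairs share a sign
print(same_sign_fraction(years_used, price_by_age))    # no pair shares a sign
```

A fraction near 1 points toward a positive correlation, a fraction near 0 toward a negative one, and a fraction near one half toward a weak or absent relationship. The count ignores size, so it cannot separate "strong" from "perfect."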

31 Unrelated (independent) variables
• When the size and sign of the t_X scores bear no relationship to the size and sign of the t_Y scores, the variables are unrelated.
• We can also call the variables “independent of” or “orthogonal to” each other. The three terms, unrelated, independent, and orthogonal, are synonymous in this context.

32 Graphing it on t axes
The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.
• Perfect: scores fall exactly on a straight line whose slope will be +1.00 or -1.00.
• Strong: most scores fall near the line, whose slope will be close to +.750 or -.750.
• Moderate: some scores are near the line, some are not. The slope of the line will be close to +.500 or -.500.

33 Graphing it on t axes
The strength of a relationship tells us approximately how the dots representing pairs of t scores will fall around a best fitting line.
• Weak: some scores fall fairly close to the line, but others fall quite far from it. The slope of the line will be close to +.250 or -.250.
• Independent: the scores are not close to the line and form a circular or square pattern. The best fitting line will be the X axis, a line with a slope of 0.000.
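The slopes quoted on these two slides reflect a general fact: when both variables are expressed as t scores, the slope of the least-squares line equals r. The sketch below checks this on the small data set the chapter uses for its worked example (X = 2, 4, 6, 8, 10 and Y = 9, 11, 10, 12, 13). The function names are assumptions made for the illustration.

```python
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)
    return [(value - mean) / sd for value in raw]

def least_squares_slope(xs, ys):
    """Slope of the best fitting (least-squares) line for predicting ys from xs."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

def pearson_r(xs, ys):
    """r = 1 - (1/2) * sum((t_X - t_Y)^2) / (n_P - 1)."""
    t_x, t_y = t_scores(xs), t_scores(ys)
    sum_sq = sum((a - b) ** 2 for a, b in zip(t_x, t_y))
    return 1 - 0.5 * sum_sq / (len(xs) - 1)

x = [2, 4, 6, 8, 10]
y = [9, 11, 10, 12, 13]

slope_on_t_axes = least_squares_slope(t_scores(x), t_scores(y))
r = pearson_r(x, y)
print(round(slope_on_t_axes, 3), round(r, 3))
```

Both numbers agree, so reading the slope of the best fitting line on t axes is another way of reading r.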

34 Strength of a relationship: Perfect
[Figure: scatterplot on t axes running from -1.5 to 1.5]

35 Strength of a relationship: Very Strong
[Figure: scatterplot on t axes running from -3 to 3]

36 Strength of a relationship: Moderate
[Figure: scatterplot on t axes running from -3 to 3]

37 Strength of a relationship: Independent
[Figure: scatterplot on t axes running from -3 to 3]

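38 What is this relationship?
[Figure: scatterplot on t axes running from -3 to 3]

39 What is this?
[Figure: scatterplot on t axes running from -3 to 3]

40 What is this?
[Figure: scatterplot on t axes running from -3 to 3]

41 What is this?
[Figure: scatterplot on t axes running from -3 to 3]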

42 Computing the correlation coefficient.

43 Comparing apples to oranges? Use Z or t scores!
• You can use correlation to look for the relationship between ANY two values that you can measure on a single subject.
• However, there may not be any relationship (the variables may be independent).
• A correlation tells us whether scores are consistently similar on two measures, consistently different from each other, or have no real pattern.

44 Comparing apples to oranges? Use t scores!
• To compare scores on two different variables, you transform them into Z_X and Z_Y scores if you are studying a population, or into t_X and t_Y scores if you have a sample.
• Z_X and Z_Y scores (or t_X and t_Y scores) can be directly compared to each other to see whether they are consistently similar, consistently quite different, or show no consistent pattern of similarity or difference.

45 Comparing variables
• Anxiety symptoms, e.g., heartbeat, with number of hours driving to class.
• Hat size with drawing ability.
• Math ability with verbal ability.
• Number of children with IQ.
• Turn them all into Z or t scores.
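The difference between Z scores and t scores comes down to which standard deviation is used. A minimal sketch, assuming the usual definitions (sigma divides the sum of squares by N, while the sample estimate s divides by n - 1), using Python's `statistics.pstdev` and `statistics.stdev`. The five scores are the X values from the chapter's worked example, treated first as a whole population and then as a sample.

```python
import statistics

def z_scores(population):
    """Z = (X - mu) / sigma, for scores that make up an entire population."""
    mu = statistics.mean(population)
    sigma = statistics.pstdev(population)  # divides by N
    return [(value - mu) / sigma for value in population]

def t_scores(sample):
    """t = (X - mean) / s, for scores from a random sample."""
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # divides by n - 1
    return [(value - mean) / s for value in sample]

scores = [2, 4, 6, 8, 10]
print([round(z, 2) for z in z_scores(scores)])
print([round(t, 2) for t in t_scores(scores)])
```

Because s is a little larger than sigma for the same numbers, the t scores are pulled slightly toward zero compared with the Z scores.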

46 Pearson’s Correlation Coefficient
• coefficient: noun, a number that serves as a measure of some property.
• The correlation coefficient indexes BOTH the consistency and the direction of a correlation with a single number.

47 Pearson’s rho
• Pearson’s rho ($\rho$) is the parameter that characterizes the strength and direction of a linear relationship (and only a linear relationship) between two variables. To compute rho, you must have the entire population. Then you can compute sigma, mu, Z scores, and rho.
• The formula: $\rho = 1 - \frac{1}{2} \cdot \frac{\sum (Z_X - Z_Y)^2}{N_P}$, where $N_P$ is the number of pairs of Z scores in the population.
• In English: The correlation coefficient equals 1 minus half the average squared distance between the Z scores.

48 Pearson’s rho
• When you have a perfect positive correlation, the Z scores will be identical in size and sign. So the average squared distance will be zero and $\rho = 1.000 - \frac{1}{2}(0.000) = 1.000$.
• When you have a perfect negative correlation, the Z scores will be identical in size and opposite in sign. It can be proven algebraically that the average squared distance in that case will be 4.000, so $\rho = 1.000 - \frac{1}{2}(4.000) = -1.000$.
• When you have two totally independent variables, the average squared distance will be 2.000 (halfway between 0.000 and 4.000). Thus, $\rho = 1.000 - \frac{1}{2}(2.000) = 0.000$.
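The values 0, 4, and 2 follow from expanding the square. A short derivation, assuming the Z scores are computed with the population mean and standard deviation, so that each set of Z scores has a mean of 0 and a mean square of 1:

```latex
\begin{aligned}
\frac{\sum (Z_X - Z_Y)^2}{N_P}
  &= \frac{\sum Z_X^2}{N_P} + \frac{\sum Z_Y^2}{N_P} - 2\,\frac{\sum Z_X Z_Y}{N_P} \\
  &= 1 + 1 - 2\,\frac{\sum Z_X Z_Y}{N_P} \\[4pt]
\rho
  &= 1 - \frac{1}{2}\left(2 - 2\,\frac{\sum Z_X Z_Y}{N_P}\right)
   = \frac{\sum Z_X Z_Y}{N_P}
\end{aligned}
```

If $Z_Y = Z_X$ for every pair, the average cross product is 1 and the average squared distance is $2 - 2(1) = 0$. If $Z_Y = -Z_X$, the average cross product is $-1$ and the average squared distance is $2 + 2 = 4$. If the cross products average out to zero, the average squared distance is 2. The last line also shows that this definition of rho matches the more familiar "average cross product of Z scores" form.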

49 Pearson’s Correlation Coefficient
• Thus, rho varies from -1.000 (perfect negative correlation) to 0.000 (independent variables) to +1.000 (perfect positive correlation).
• A negative value indicates a negative relationship; a positive value indicates a positive relationship.
• Values of rho close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.

50 Estimating rho with r
• Computing rho involves finding the actual average squared distance between the Z_X and Z_Y scores in the whole population.
• In computing r, we are estimating rho.

51 The formula for r
• Pearson’s r is a least squares, unbiased estimate of rho, based on the relationships found between t_X and t_Y scores in a random sample.
• $r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$, where $n_P - 1$ equals one less than the number of pairs of t scores in the sample.
  ◦ In English: Pearson’s r equals 1.000 minus half the estimated average squared difference between the Z scores in the population, based on the squared differences between the t scores in the sample.

52 Look at those formulae again.
• $\rho = 1 - \frac{1}{2} \cdot \frac{\sum (Z_X - Z_Y)^2}{N_P}$, where $N_P$ is the number of pairs of Z scores in the population.
• $\frac{\sum (Z_X - Z_Y)^2}{N_P}$ is the average squared distance between the Z scores.
• The rest of the formula simply transforms the average squared distance between the Z scores into a variable that goes from +1.000 to -1.000.

53 Look at those formulae again.
$r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$, where $n_P - 1$ equals one less than the number of pairs of t scores in the sample. REMEMBER, t scores are estimated Z scores.
• $\frac{\sum (t_X - t_Y)^2}{n_P - 1}$ is a least squares, unbiased estimate of the average squared difference between the Z scores in the population, based on the differences between the t_X and t_Y scores in a random sample.
• The rest of the formula simply transforms the estimated average squared distance between the Z scores into a variable that goes from +1.000 to -1.000.

54 Thus, r, the least squares, unbiased estimate of rho, is basically an estimate of the average squared difference between the Z_X and Z_Y scores in the population, transformed into a variable that goes from -1.00 to +1.00.

55 Similarities of r and rho
• r and rho vary from -1.000 to +1.000.
• For both r and rho, a negative value indicates a negative relationship; a positive value indicates a positive relationship.
• Values of r or rho close to 1.000 or -1.000 indicate a strong (consistent) relationship; values close to 0.000 indicate a weak (inconsistent) or independent relationship.

56 Since we almost always are studying random samples, not populations, we almost always compute Pearson’s r, not Pearson’s rho.

57 r, strength and direction
Perfect, positive     +1.00
Strong, positive      +.75
Moderate, positive    +.50
Weak, positive        +.25
Independent            .00
Weak, negative        -.25
Moderate, negative    -.50
Strong, negative      -.75
Perfect, negative     -1.00
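One way to use this table as a lookup is sketched below. The rule for values that fall between the benchmarks (pick the nearest benchmark, and reserve "Perfect" for exactly +1 or -1) is an assumption made for the illustration, not something the chapter specifies, and `describe_r` is a hypothetical helper name.

```python
# Benchmark sizes below 1.00, taken from the table above.
BENCHMARKS = [(0.75, "Strong"), (0.50, "Moderate"), (0.25, "Weak"), (0.00, "Independent")]

def describe_r(r):
    """Label r by the nearest benchmark; 'Perfect' only for exactly +1 or -1."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1.0:
        return f"Perfect, {direction}"
    # Assumed rule: choose the benchmark whose size is closest to |r|.
    size, label = min(BENCHMARKS, key=lambda item: abs(abs(r) - item[0]))
    if size == 0.0:
        return label
    return f"{label}, {direction}"

print(describe_r(0.90))
print(describe_r(-0.30))
print(describe_r(0.05))
```

Treat the labels as rough guides only. Cutoffs between "weak," "moderate," and "strong" are conventions.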

58 Calculating Pearson’s r
• Select a random sample from a population; obtain scores on two variables, which we will call X and Y.
• Convert all the scores into t scores.

59 Calculating Pearson’s r
• First, subtract the t_Y score from the t_X score in each pair.
• Then square all of the differences and add them up, that is, compute $\sum (t_X - t_Y)^2$.

60 Calculating Pearson’s r
• Estimate the average squared distance between Z_X and Z_Y by dividing the sum of squared differences between the t scores by $(n_P - 1)$: $\frac{\sum (t_X - t_Y)^2}{n_P - 1}$
• To turn this estimate into Pearson’s r, use the formula $r = 1 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1}$
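The steps on the last three slides can be collected into one function. The sketch below also cross-checks the result against the common cross-product form of Pearson's r (sum of cross products divided by the square root of the product of the two sums of squares), which is not part of this chapter but should give the same number. The data are hypothetical and the function names are assumptions.

```python
import math
import statistics

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)
    return [(value - mean) / sd for value in raw]

def r_from_t_scores(xs, ys):
    """Pearson's r via the chapter's steps, using differences between t scores."""
    t_x, t_y = t_scores(xs), t_scores(ys)
    n_pairs = len(xs)
    sum_sq_diff = sum((a - b) ** 2 for a, b in zip(t_x, t_y))
    est_avg_sq_diff = sum_sq_diff / (n_pairs - 1)
    return 1 - 0.5 * est_avg_sq_diff

def r_from_cross_products(xs, ys):
    """Pearson's r via the common cross-product formula, for comparison."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores on two variables for five people.
x_demo = [3, 7, 8, 12, 15]
y_demo = [10, 4, 9, 2, 5]

print(round(r_from_t_scores(x_demo, y_demo), 3))
print(round(r_from_cross_products(x_demo, y_demo), 3))
```

The two routes agree, which is a useful sanity check when doing the computation by hand.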

61 Example: Calculate t scores for X
  X     X - X̄    (X - X̄)²    t_X = (X - X̄)/s_X
  2      -4        16          -1.26
  4      -2         4          -0.63
  6       0         0           0.00
  8       2         4           0.63
 10       4        16           1.26
ΣX = 30, N = 5, X̄ = 6.00
SS_W = 40.00, MS_W = 40.00/(5-1) = 10.00, s_X = 3.16

62 Calculate t scores for Y
  Y     Y - Ȳ    (Y - Ȳ)²    t_Y = (Y - Ȳ)/s_Y
  9      -2         4          -1.26
 11       0         0           0.00
 10      -1         1          -0.63
 12      +1         1           0.63
 13      +2         4           1.26
ΣY = 55, N = 5, Ȳ = 11.00
SS_W = 10.00, MS_W = 10.00/(5-1) = 2.50, s_Y = 1.58

63 Calculate r
  t_X      t_Y      t_X - t_Y    (t_X - t_Y)²
 -1.26    -1.26       0.00         0.00
 -0.63     0.00      -0.63         0.40
  0.00    -0.63       0.63         0.40
  0.63     0.63       0.00         0.00
  1.26     1.26       0.00         0.00
$\sum (t_X - t_Y)^2 = 0.80$
$\frac{\sum (t_X - t_Y)^2}{n_P - 1} = \frac{0.80}{4} = 0.200$
$r = 1.000 - \frac{1}{2} \cdot \frac{\sum (t_X - t_Y)^2}{n_P - 1} = 1.000 - \frac{1}{2}(0.200) = 1.000 - 0.100 = 0.900$
This is a very strong, positive relationship.
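The worked example can be reproduced step by step in Python as a check on the hand arithmetic. This is a sketch using only the standard library; the variable names are chosen for the illustration.

```python
import statistics

# Raw scores from the worked example.
x = [2, 4, 6, 8, 10]
y = [9, 11, 10, 12, 13]

def t_scores(raw):
    """Convert raw sample scores to t scores: (score - mean) / sample SD."""
    mean = statistics.mean(raw)
    sd = statistics.stdev(raw)  # square root of SS_W / (n - 1)
    return [(value - mean) / sd for value in raw]

t_x = t_scores(x)
t_y = t_scores(y)

# Difference and squared difference for each pair of t scores.
diffs = [a - b for a, b in zip(t_x, t_y)]
squared = [d ** 2 for d in diffs]

print("   t_X    t_Y   diff  diff^2")
for a, b, d, s in zip(t_x, t_y, diffs, squared):
    print(f"{a:6.2f} {b:6.2f} {d:6.2f} {s:7.2f}")

sum_sq = sum(squared)             # sum of squared differences
avg_sq = sum_sq / (len(x) - 1)    # divide by n_P - 1
r = 1 - 0.5 * avg_sq              # transform into Pearson's r
print(f"sum = {sum_sq:.2f}, average = {avg_sq:.3f}, r = {r:.3f}")
```

The unrounded t scores give a sum of squared differences of 0.80 and r = 0.900, matching the slide.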

64 By the way: True graphs
• Ch. 7 has true graphs, displays in which each dot stands for a score on two (in this case) or more (in more advanced cases) variables.
• In Ch. 1 through Ch. 6, most of the figures have represented the frequency of scores on a single variable.
• Formally, displays of frequencies are figures, but they are not graphs.

