The Pearson Product-Moment Correlation Coefficient.

Presentation on theme: "The Pearson Product-Moment Correlation Coefficient."— Presentation transcript:

1 The Pearson Product-Moment Correlation Coefficient

2 The regression coefficient is an asymmetrical statistic, one that gives different values for the model Y = f(X) and the model X = f(Y). The other major measure of bivariate association is the Pearson product-moment correlation coefficient (sometimes called "little r" for short). The correlation coefficient is a symmetrical statistic: it describes the association between X and Y without regard to whether Y = f(X) or X = f(Y), and it produces the same result in either case. Unlike the regression coefficient, whose values range from 0.0 to $\pm\infty$, the correlation coefficient ranges from 0.0 when there is NO linear association between X and Y to $\pm 1.00$ when there is PERFECT association (either direct or inverse).

3 To generate the second set of statistics describing association from the linear model, we partition the sum of squares. Graphically, we begin with a single data point, i, in two-dimensional space. $Y_i$ is its location on the scale of Y (on the y-axis); below that is the predicted location of Y, $\hat{Y}_i$. The dotted horizontal line (- - - -) is the location of the mean of Y, $\bar{Y}$. (When there is no association between X and Y, b = 0.0 and therefore a = $\bar{Y}$.)

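4 [Figure: a single observation i plotted at ($X_i$, $Y_i$). One brace marks the vertical distance from $Y_i$ down to $\hat{Y}_i$ on the line of best fit; a second brace marks the distance from $\hat{Y}_i$ down to the dotted horizontal line at $\bar{Y}$.]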

5

6 The vertical line represents the deviation of the i-th observation from the mean of Y (i.e., the difference between $Y_i$ and $\bar{Y}$). The line of best fit divides this deviation into its two mathematical components. The component ABOVE the line of best fit is the difference between $Y_i$ and $\hat{Y}_i$, that is, between the actual location of the i-th observation on the y-axis and its predicted location on the y-axis. This is the error (or residual) component.

7 The component BELOW the line of best fit is new. It is the difference between the predicted Y-value, $\hat{Y}_i$, and the mean of Y, $\bar{Y}$. This component is called the regression component. Since these two components together make up the deviation of the i-th observation from the mean of Y, the following is merely an algebraic summary of this relationship:

$$(Y_i - \bar{Y}) = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)$$

deviation = regression component + error (residual)

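8 Squaring both sides and summing across all observations yields

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

(the cross-product term sums to zero for the least-squares line), or

$$SS_{Total} = SS_{Regression} + SS_{Error}$$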
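The partition can be checked numerically. The following is a minimal sketch, not part of the original slides, that fits the least-squares line to the ten-family income and children data from the exercise later in this presentation and compares the three sums of squares. The variable names are illustrative, and the slope and intercept use the standard least-squares formulas.

```python
# Ten-family data from the exercise slide: income in $1,000 (X), children (Y).
x = [25, 17, 20, 14, 11, 10, 6, 8, 8, 4]
y = [0, 0, 1, 2, 2, 3, 4, 5, 6, 7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard least-squares slope and intercept for the model Y = a + bX.
sum_cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sum_sq_x = sum((xi - x_bar) ** 2 for xi in x)
b = sum_cross / sum_sq_x
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

# The three sums of squares from the partition.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(f"SS Total                 = {ss_total:.4f}")
print(f"SS Regression + SS Error = {ss_regression + ss_error:.4f}")
```

The two printed values should agree up to floating-point error, which is the numerical counterpart of the identity above.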

9 We can express the amount of association between X and Y as the ratio of the variance explained by the linear model to the total variance in Y to be explained. $SS_{Total}$ is the variance to be explained and $SS_{Regression}$ the variance accounted for by Y's relationship with X:

$$R^2_{YX} = \frac{SS_{Regression}}{SS_{Total}}$$

This is the Coefficient of Determination. Its values range from 0.0 when there is no linear association between X and Y (i.e., when $\hat{Y}_i - \bar{Y} = 0.0$ for every observation) to 1.0 with perfect association (i.e., $SS_{Regression} = SS_{Total}$). It is interpreted as the proportion (or, multiplied by 100, the percentage) of the total variance in Y explained by Y's association with X.

10 In algebraic form, the Coefficient of Determination is calculated as

$$R^2_{YX} = \frac{s_{XY}^2}{s_X^2 \, s_Y^2}$$

The denominator is the product of the variance (standard deviation squared) of X and the variance of Y. The numerator is the square of the covariance and can be obtained by squaring the value from the following short-cut equation:

$$s_{XY} = \frac{N \sum XY - (\sum X)(\sum Y)}{N(N - 1)}$$

11 In the time and temperature example, N = 3, the sum of X (time) was $\sum X = 23.5$, the sum of the squared time values was $\sum X^2 = 194.25$, the square of the sum of the time values was $(\sum X)^2 = 552.25$, the sum of Y (temperature) was $\sum Y = 248$, and the sum of the cross-products was $\sum XY = 1{,}911$.

$$
\begin{aligned}
s_{XY} &= \frac{(3)(1911) - (248)(23.5)}{(3)(3 - 1)} \\
       &= \frac{5733 - 5828}{6} \\
       &= \frac{-95}{6} \\
       &= -15.833
\end{aligned}
$$

Squaring to get the covariance squared, $s_{XY}^2 = 250.694$
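The short-cut covariance arithmetic can be reproduced with a few lines of Python. This is a sketch only; the function name `shortcut_covariance` is an illustrative choice, and the inputs are the sums quoted on this slide.

```python
def shortcut_covariance(n, sum_x, sum_y, sum_xy):
    """Sample covariance from summary totals: [N*SXY - SX*SY] / [N(N - 1)]."""
    return (n * sum_xy - sum_x * sum_y) / (n * (n - 1))


# Sums from the time and temperature example (N = 3).
s_xy = shortcut_covariance(n=3, sum_x=23.5, sum_y=248, sum_xy=1911)

print(f"{s_xy:.3f}")       # -15.833
print(f"{s_xy ** 2:.3f}")  # 250.694
```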

12 Next, we can use the short-hand equation to calculate the two variances:

$$s_X^2 = \frac{N \sum X^2 - (\sum X)^2}{N(N - 1)}$$

(Here, the absence of an index and counter on the summation sign implies summing from the first to the last value.)

$$
\begin{aligned}
s_X^2 &= \frac{(3)(194.25) - (23.5)^2}{(3)(3 - 1)} \\
      &= \frac{582.75 - 552.25}{(3)(2)} \\
      &= \frac{30.5}{6} \\
      &= 5.083
\end{aligned}
$$

13 And for the variance of Y:

$$
\begin{aligned}
s_Y^2 &= \frac{N \sum Y^2 - (\sum Y)^2}{N(N - 1)} \\
      &= \frac{(3)(20{,}600) - (248)^2}{(3)(3 - 1)} \\
      &= \frac{61{,}800 - 61{,}504}{6} \\
      &= \frac{296}{6} \\
      &= 49.333
\end{aligned}
$$
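The same short-hand variance formula applies to both variables, so it can be written once and reused. The sketch below assumes only the sums given on the slides; `shortcut_variance` is an illustrative name.

```python
def shortcut_variance(n, total, total_of_squares):
    """Sample variance from summary totals: [N*SX2 - (SX)^2] / [N(N - 1)]."""
    return (n * total_of_squares - total ** 2) / (n * (n - 1))


# Time (X): sum = 23.5, sum of squares = 194.25.
var_x = shortcut_variance(n=3, total=23.5, total_of_squares=194.25)
# Temperature (Y): sum = 248, sum of squares = 20,600.
var_y = shortcut_variance(n=3, total=248, total_of_squares=20600)

print(f"{var_x:.3f}")  # 5.083
print(f"{var_y:.3f}")  # 49.333
```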

14 Now we can solve for the Coefficient of Determination:

$$
\begin{aligned}
R^2_{YX} &= \frac{s_{XY}^2}{s_X^2 \, s_Y^2} \\
         &= \frac{250.694}{(5.083)(49.333)} \\
         &= \frac{250.694}{250.760} \\
         &= 0.9997
\end{aligned}
$$

This is interpreted as meaning that 99.97 percent of the variance in afternoon high temperature is statistically explained by the association of this variable with the time of the sun's first appearance. This is an extremely high, and extremely unlikely, value, since $R^2_{YX}$ varies from a minimum of 0.0 (no variance explained) to a maximum of 1.0 (ALL of the variance explained).

15 If the Coefficient of Determination is the percentage of the variance in Y explained by its association with X, then its complement is the percentage of variance in Y NOT explained by its association with X. This is called the Coefficient of Nondetermination, simply

$$K_{YX} = 1 - R^2_{YX}$$

In this example, the proportion of variance NOT explained is 1 - 0.9997 = 0.0003, or about 0.03 percent (less than 0.1 percent).
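The two coefficients can be computed together from the covariance and variances. The sketch below carries the unrounded fractions from the earlier slides (-95/6, 30.5/6 and 296/6) rather than the rounded values, so the final digits may differ slightly from a hand calculation; the variable names are illustrative.

```python
# Unrounded summary statistics from the time and temperature example.
cov_xy = -95 / 6   # covariance of time and temperature
var_x = 30.5 / 6   # variance of time
var_y = 296 / 6    # variance of temperature

# Coefficient of Determination and its complement.
r_squared = cov_xy ** 2 / (var_x * var_y)
k = 1 - r_squared

print(f"R^2 = {r_squared:.4f}")          # R^2 = 0.9997
print(f"K   = {100 * k:.3f} percent")    # K   = 0.033 percent
```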

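16 Conceptually, the Pearson product-moment correlation coefficient is the square root of the Coefficient of Determination, carrying the sign of the covariance:

$$r_{XY} = \pm\sqrt{R^2_{YX}}$$

For raw data, the correlation coefficient is found by

$$r_{XY} = \frac{s_{XY}}{s_X \, s_Y}$$

where the numerator is the covariance and the denominator is the product of the standard deviations of X and Y. In our example,

$$
\begin{aligned}
r_{XY} &= \frac{-15.833}{(2.255)(7.024)} \\
       &= \frac{-15.833}{15.839} \\
       &= -0.9996
\end{aligned}
$$

(Carrying full precision instead of the rounded standard deviations gives -0.99983, the value SAS reports.)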
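A short sketch of the raw-data formula, using the unrounded covariance and variances from the earlier slides, shows how much of the hand-calculated value is rounding. The names are illustrative; only the standard library `math.sqrt` is used.

```python
import math

# Unrounded summary statistics from the time and temperature example.
cov_xy = -95 / 6
var_x = 30.5 / 6
var_y = 296 / 6

# Standard deviations are the square roots of the variances.
s_x = math.sqrt(var_x)
s_y = math.sqrt(var_y)

# Pearson correlation: covariance over the product of standard deviations.
r_xy = cov_xy / (s_x * s_y)

print(f"s_X = {s_x:.3f}, s_Y = {s_y:.3f}")  # s_X = 2.255, s_Y = 7.024
print(f"r_XY = {r_xy:.5f}")                  # r_XY = -0.99983
```

Because the covariance is negative, the coefficient is negative, which is why the sign has to be carried separately when taking the square root of the Coefficient of Determination.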

17 Notice that, unlike the Coefficient of Determination, which cannot be negative, the correlation coefficient varies between -1.00 and +1.00, with 0.0 indicating no linear association. Here, a correlation of -0.9996 shows an extremely STRONG INVERSE relationship. Finally, in the bivariate situation, the regression coefficient (i.e., slope, b) and the correlation coefficient ($r_{XY}$) are related as follows:

$$b = r_{XY} \left( \frac{s_Y}{s_X} \right) \quad \text{and} \quad r_{XY} = b \left( \frac{s_X}{s_Y} \right)$$

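18 In the present little example,

$$
\begin{aligned}
b &= (-0.9996) \left( \frac{7.024}{2.255} \right) \\
  &= (-0.9996)(3.115) \\
  &= -3.114
\end{aligned}
$$

and

$$
\begin{aligned}
r_{XY} &= (-3.115) \left( \frac{2.255}{7.024} \right) \\
       &= (-3.115)(0.321) \\
       &= -0.9999
\end{aligned}
$$

Both results agree, within rounding error, with the slope (b = -3.115) and the correlation ($r_{XY}$ = -0.9996) obtained directly.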
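The round trip between slope and correlation can be sketched in Python with unrounded inputs. One assumption not stated on these slides: the slope is computed directly as covariance divided by the variance of X, which is the standard bivariate least-squares formula. Variable names are illustrative.

```python
import math

# Unrounded summary statistics from the time and temperature example.
cov_xy = -95 / 6
var_x = 30.5 / 6
var_y = 296 / 6
s_x = math.sqrt(var_x)
s_y = math.sqrt(var_y)

# Direct calculations (slope formula is an assumption, see lead-in).
b = cov_xy / var_x
r_xy = cov_xy / (s_x * s_y)

# Conversions from the slide: b = r (s_Y / s_X) and r = b (s_X / s_Y).
b_from_r = r_xy * (s_y / s_x)
r_from_b = b * (s_x / s_y)

print(f"b = {b:.3f}, recovered from r: {b_from_r:.3f}")
print(f"r = {r_xy:.5f}, recovered from b: {r_from_b:.5f}")
```

With full precision both conversions reproduce the directly computed values, so any mismatch in a hand calculation comes from rounding the intermediate figures.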

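19 SAS Time and Temperature Example

/* Point both library references at the a: drive. */
LIBNAME perm 'a:\';
LIBNAME library 'a:\';
OPTIONS NODATE NONUMBER PS=66;

/* NOSIMPLE suppresses the descriptive statistics table. */
PROC CORR DATA=perm.weather NOSIMPLE;
  VAR temp time;
  TITLE1 'Time and Temperature Example';
RUN;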

20 Time and Temperature Example

Correlation Analysis

2 'VAR' Variables: TIME TEMP

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 3

              TIME        TEMP
TIME       1.00000    -0.99983
           0.0         0.0116
TEMP      -0.99983     1.00000
           0.0116      0.0

21 Time and Temperature Example

Correlation Analysis

2 'VAR' Variables: TIME TEMP

Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / Number of Observations

              TIME        TEMP
TIME       1.00000    -0.99983
           0.0         0.0116
           2           3
TEMP      -0.99983     1.00000
           0.0116      0.0
           3           2

22 Correlation Example

For the following data on ten families, answer the questions below.

X = annual income (in $1,000); Y = number of children.

------------------------------------------------------------------------------
Family     X    (X_i - X-bar)^2     Y    (Y_i - Y-bar)^2    (X_i - X-bar)(Y_i - Y-bar)
------------------------------------------------------------------------------
   1      25                        0
   2      17                        0
   3      20                        1
   4      14                        2
   5      11                        2
   6      10                        3
   7       6                        4
   8       8                        5
   9       8                        6
  10       4                        7
         ---                      ---
Sum of X =                 Sum of Y =
X-bar =                    Y-bar =
------------------------------------------------------------------------------

1. What is the value of the correlation coefficient? ______________
2. What is the value of the Coefficient of Determination? ______________
3. What is the value of the Coefficient of Nondetermination? ______________

23 Correlation Example Answers

For the following data on ten families, answer the questions below.

X = annual income (in $1,000); Y = number of children.

------------------------------------------------------------------------------
Family     X    (X_i - X-bar)^2     Y    (Y_i - Y-bar)^2    (X_i - X-bar)(Y_i - Y-bar)
------------------------------------------------------------------------------
   1      25        161.29          0           9                   -38.1
   2      17         22.09          0           9                   -14.1
   3      20         59.29          1           4                   -15.4
   4      14          2.89          2           1                    -1.7
   5      11          1.69          2           1                     1.3
   6      10          5.29          3           0                     0.0
   7       6         39.69          4           1                    -6.3
   8       8         18.49          5           4                    -8.6
   9       8         18.49          6           9                   -12.9
  10       4         68.89          7          16                   -33.2
         ---       -------        ---         ---                  ------
Sum      123        398.1          30          54                  -129.0
------------------------------------------------------------------------------
X-bar = 12.3     Y-bar = 3.0

1. What is the value of the correlation coefficient? -0.880
2. What is the value of the Coefficient of Determination? 0.774
3. What is the value of the Coefficient of Nondetermination? 0.226
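The three answers can be checked with a short script. This is a sketch using the deviation-score columns of the table; because the N - 1 divisors cancel, the correlation is the sum of cross-products divided by the square root of the product of the two sums of squares. Variable names are illustrative.

```python
import math

# Ten-family data: income in $1,000 (X) and number of children (Y).
x = [25, 17, 20, 14, 11, 10, 6, 8, 8, 4]
y = [0, 0, 1, 2, 2, 3, 4, 5, 6, 7]

n = len(x)
x_bar = sum(x) / n   # 12.3
y_bar = sum(y) / n   # 3.0

# Column totals from the worked table.
ss_x = sum((xi - x_bar) ** 2 for xi in x)                        # about 398.1
ss_y = sum((yi - y_bar) ** 2 for yi in y)                        # 54
sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) # about -129

r = sp_xy / math.sqrt(ss_x * ss_y)
r_squared = r ** 2
k = 1 - r_squared

print(f"r   = {r:.3f}")          # r   = -0.880
print(f"R^2 = {r_squared:.3f}")  # R^2 = 0.774
print(f"K   = {k:.3f}")          # K   = 0.226
```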

