# Correlation and Covariance R. F. Riesenfeld (Based on web slides by James H. Steiger)



Goals

- Introduce concepts of
  - Covariance
  - Correlation
- Develop computational formulas

R F Riesenfeld Sp 2010 CS5961 Comp Stat

Covariance

- Variables may change in relation to each other.
- Covariance measures how much the movement in one variable predicts the movement in a corresponding variable.

Smoking and Lung Capacity

- Example: investigate the relationship between cigarette smoking and lung capacity.
- Data: sample group response data on smoking habits, and measured lung capacities, respectively.

Smoking v Lung Capacity Data

| N | Cigarettes (X) | Lung Capacity (Y) |
|---|---|---|
| 1 | 0 | 45 |
| 2 | 5 | 42 |
| 3 | 10 | 33 |
| 4 | 15 | 31 |
| 5 | 20 | 29 |

[Figure: Smoking and Lung Capacity]

Smoking v Lung Capacity

- Observe that as smoking exposure goes up, corresponding lung capacity goes down.
- The variables covary inversely.
- Covariance and correlation quantify the relationship.

Covariance

- Variables that covary inversely, like smoking and lung capacity, tend to appear on opposite sides of their group means.
- When smoking is above its group mean, lung capacity tends to be below its group mean.
- The average product of deviations measures the extent to which variables covary, the degree of linkage between them.

The Sample Covariance

Similar to the variance, for theoretical reasons, the average is typically computed using $N - 1$, not $N$. Thus,

$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})$$
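As a minimal sketch of the $(N - 1)$ average described above, the function below computes a sample covariance in plain Python. The name `sample_covariance` and its input checks are illustrative choices, not part of the original slides.

```python
from typing import Sequence


def sample_covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Average product of deviations, dividing by N - 1 rather than N."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the respective means.
    total = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return total / (n - 1)


# Small check: y = 2x, so the deviations always share a sign.
print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```

Dividing by $N$ instead would give a smaller magnitude (here $4/3$ rather than $2$), which is the same distinction as between the population and sample variance.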

Calculating Covariance

| | Cigs (X) | Lung Cap (Y) |
|---|---|---|
| | 0 | 45 |
| | 5 | 42 |
| | 10 | 33 |
| | 15 | 31 |
| | 20 | 29 |
| Mean | 10 | 36 |

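Calculating Covariance

| Cigs (X) | $X - \bar{X}$ | $(X - \bar{X})(Y - \bar{Y})$ | $Y - \bar{Y}$ | Cap (Y) |
|---|---|---|---|---|
| 0 | -10 | -90 | 9 | 45 |
| 5 | -5 | -30 | 6 | 42 |
| 10 | 0 | 0 | -3 | 33 |
| 15 | 5 | -25 | -5 | 31 |
| 20 | 10 | -70 | -7 | 29 |
| | | $\sum = -215$ | | |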

Covariance Calculation (2)

Evaluation yields

$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y}) = \frac{-215}{4} = -53.75$$
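The hand calculation above can be reproduced in a few lines. This sketch assumes the five (X, Y) pairs from the data table; the variable names are illustrative.

```python
cigarettes = [0, 5, 10, 15, 20]       # X, from the data table
lung_capacity = [45, 42, 33, 31, 29]  # Y, from the data table

n = len(cigarettes)
mean_x = sum(cigarettes) / n      # 10.0
mean_y = sum(lung_capacity) / n   # 36.0

# Deviations from the group means, and their pairwise products.
dev_x = [x - mean_x for x in cigarettes]
dev_y = [y - mean_y for y in lung_capacity]
products = [dx * dy for dx, dy in zip(dev_x, dev_y)]

sum_products = sum(products)          # -215.0
covariance = sum_products / (n - 1)   # -53.75
print(sum_products, covariance)
```

Every nonzero product is negative because each pair sits on opposite sides of the two means, which is what "covary inversely" looks like numerically.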

Covariance under Affine Transformation

Covariance under Affine Transformation (2)
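Only the titles of these two slides are given here. As a sketch, assuming they cover the standard result for affine maps applied to both variables, the derivation runs as follows. The symbols $L_i$, $M_i$, $a$, $b$, $c$, $d$ are names introduced here for illustration, and $S_{xy}$ denotes the sample covariance of $X$ and $Y$.

```latex
\begin{aligned}
L_i &= aX_i + b, \qquad M_i = cY_i + d \\
\bar{L} &= a\bar{X} + b, \qquad \bar{M} = c\bar{Y} + d \\
L_i - \bar{L} &= a\,(X_i - \bar{X}), \qquad M_i - \bar{M} = c\,(Y_i - \bar{Y}) \\
S_{LM} &= \frac{1}{N-1}\sum_{i=1}^{N}(L_i - \bar{L})(M_i - \bar{M})
        = \frac{ac}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})(Y_i - \bar{Y})
        = ac\,S_{xy}
\end{aligned}
```

The additive constants $b$ and $d$ drop out because they shift each value and its mean equally, while the scale factors $a$ and $c$ multiply the covariance. Covariance therefore depends on the units of measurement.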

(Pearson) Correlation Coefficient $r_{xy}$

Like covariance, but uses Z-values instead of deviations. Hence, it is invariant under linear transformation of the raw data (up to sign, which flips when the two scale factors have opposite signs).
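The invariance claim can be checked numerically. The sketch below assumes Z-values are computed with the sample standard deviation (dividing by $N - 1$), and the unit changes applied to the smoking data are hypothetical, chosen only for illustration.

```python
import math


def z_scores(values):
    """Standardize values using the sample standard deviation (N - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """Pearson r as the (N - 1)-averaged product of Z-values."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]
r_raw = correlation(cigarettes, lung_capacity)

# Hypothetical change of units: packs instead of cigarettes (X / 20)
# and an arbitrary affine rescaling of capacity (2 * Y + 7).
packs = [x / 20 for x in cigarettes]
rescaled = [2 * y + 7 for y in lung_capacity]
r_transformed = correlation(packs, rescaled)

print(round(r_raw, 4), round(r_transformed, 4))
```

Both values agree to floating-point precision, whereas the covariance of the transformed data would be scaled by the product of the two scale factors.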

Alternative (common) Expression
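Assuming this slide gives the usual covariance-over-standard-deviations form, it follows directly from the Z-value definition. Here $S_{xy}$ is the sample covariance, $S_x$ and $S_y$ are the sample standard deviations, and $z_{X_i} = (X_i - \bar{X})/S_x$; this notation is an assumption, since the slide's own symbols are not shown.

```latex
r_{xy}
  = \frac{1}{N-1}\sum_{i=1}^{N} z_{X_i}\, z_{Y_i}
  = \frac{1}{N-1}\sum_{i=1}^{N}
      \frac{(X_i - \bar{X})}{S_x}\,\frac{(Y_i - \bar{Y})}{S_y}
  = \frac{S_{xy}}{S_x\, S_y}
```

For the smoking data, $S_{xy} = -53.75$, $S_x^2 = 250/4 = 62.5$ and $S_y^2 = 200/4 = 50$, so $r_{xy} = -53.75/\sqrt{3125} \approx -0.96$.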

Computational Formula 1

Computational Formula 2
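Two standard raw-score forms of the correlation coefficient are sketched below. Which of them each slide showed is an assumption; both use only the column sums $\sum X$, $\sum Y$, $\sum X^2$, $\sum Y^2$ and $\sum XY$, and the second is the first with numerator and denominator multiplied by $N$.

```latex
r_{xy} =
  \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}
       {\sqrt{\left(\sum X^2 - \dfrac{(\sum X)^2}{N}\right)
              \left(\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right)}}
\qquad\text{and}\qquad
r_{xy} =
  \frac{N\sum XY - \sum X \sum Y}
       {\sqrt{\left[N\sum X^2 - (\sum X)^2\right]
              \left[N\sum Y^2 - (\sum Y)^2\right]}}
```

Both follow from the identity $\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{1}{N}\sum X \sum Y$, applied to the numerator and, with $Y = X$, to each variance term. The $(N - 1)$ factors cancel between numerator and denominator.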

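Table for Calculating $r_{xy}$

| Cigs (X) | $X^2$ | $XY$ | $Y^2$ | Cap (Y) |
|---|---|---|---|---|
| 0 | 0 | 0 | 2025 | 45 |
| 5 | 25 | 210 | 1764 | 42 |
| 10 | 100 | 330 | 1089 | 33 |
| 15 | 225 | 465 | 961 | 31 |
| 20 | 400 | 580 | 841 | 29 |
| $\sum = 50$ | 750 | 1585 | 6680 | 180 |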

Computing $r_{xy}$ from Table

Computing Correlation
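The computation from the table sums can be sketched as below. It assumes the raw-score form $r_{xy} = \dfrac{N\sum XY - \sum X\sum Y}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$, which may differ in layout from the formula on the original slides; the variable names are illustrative.

```python
import math

xs = [0, 5, 10, 15, 20]     # cigarettes (X)
ys = [45, 42, 33, 31, 29]   # lung capacity (Y)

n = len(xs)
sum_x = sum(xs)                               # 50
sum_y = sum(ys)                               # 180
sum_x2 = sum(x * x for x in xs)               # 750
sum_y2 = sum(y * y for y in ys)               # 6680
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 1585

numerator = n * sum_xy - sum_x * sum_y        # 7925 - 9000 = -1075
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # 1250 * 1000
)
r_xy = numerator / denominator
print(round(r_xy, 2))  # -0.96
```

The column sums match the table's bottom row, and the result agrees with the value quoted in the conclusion.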

Conclusion

- $r_{xy} = -0.96$ implies near certainty that a smoker will have diminished lung capacity.
- Greater smoking exposure implies greater likelihood of lung damage.

End Covariance & Correlation Notes
