1 G89.2228 Lect 8b
G89.2228 Lecture 8b
Correlation: quantifying linear association between random variables
Example: Okazaki's inferences from a survey
Review of covariance
Covariance and correlation
Correlation as a parameter
Correlation in data analysis
Correlation when one or more variables are binary

2 G89.2228 Lect 8b
Correlation
The correlation coefficient is the best-known measure of association between two variables
»It measures linear association
»It ranges from –1 (perfect inverse association), through 0 (no linear association), to +1 (perfect positive association)
The correlation coefficient is also related to an important parameter of the bivariate normal distribution

3 G89.2228 Lect 8b
Example: Okazaki's inferences from a survey
Does self-construal account for the relation of adverse functioning with Asian status?
Survey of 348 students (simple random sample)
Self-reported Interdependence was correlated .53 with self-reported Fear of Negative Evaluation
[Figure: illustrative plot (simulated) of r = .53]
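A minimal sketch of how such an illustrative plot could be produced: simulate bivariate normal data with population correlation .53 for n = 348 and check the sample r. The variable names, seed, and unit variances are assumptions for illustration; this is simulated data, not Okazaki's.

```python
# Minimal sketch (simulated, not Okazaki's data): draw n = 348 bivariate-normal
# observations with population correlation rho = .53 and check the sample r.
import numpy as np

rng = np.random.default_rng(8)                 # seed chosen arbitrarily
n, rho = 348, 0.53
cov = [[1.0, rho], [rho, 1.0]]                 # unit variances, so covariance = rho
interdep, fne = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

r = np.corrcoef(interdep, fne)[0, 1]           # sample product-moment correlation
print(f"sample r = {r:.2f}")                   # close to, but not exactly, .53
```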

4 G89.2228 Lect 8b
Review of covariance as a statistical concept
We discussed covariance as a bivariate moment: E[(X - μ_x)(Y - μ_y)] = Cov(X,Y) = σ_XY is called the population covariance
Covariance provides an index of the linear dependence of two variables
It is an expectation that depends on the joint bivariate density of X and Y, f(X,Y)
»f(X,Y) says how likely any pair of values of X and Y is
»When X and Y are binary, f(X,Y) represents joint probabilities (see the sketch below)
»Scatterplots give an impression of the joint density
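To make the binary case concrete, here is a minimal sketch that computes a population covariance directly from a joint pmf f(X, Y); the cell probabilities are invented for illustration.

```python
# Minimal sketch: population covariance from a joint pmf f(X, Y) for binary
# (0/1) variables. The cell probabilities are made up for illustration.
f = {(0, 0): 0.40, (0, 1): 0.10,    # f(x, y) = P(X = x, Y = y)
     (1, 0): 0.15, (1, 1): 0.35}

EX  = sum(x * p for (x, y), p in f.items())        # E[X]
EY  = sum(y * p for (x, y), p in f.items())        # E[Y]
EXY = sum(x * y * p for (x, y), p in f.items())    # E[XY]

cov = EXY - EX * EY        # E[(X - mu_X)(Y - mu_Y)] = E[XY] - E[X]E[Y]
print(cov)                 # 0.35 - 0.50 * 0.45 = 0.125 > 0
```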

5 G89.2228 Lect 8b
Interpreting covariance as an index of linear association
When X and Y tend to increase together, Cov(X,Y) > 0
When high levels of X go with low levels of Y, Cov(X,Y) < 0
When X and Y are independent, Cov(X,Y) = 0
Note that there are cases where Cov(X,Y) takes the value zero even though X and Y are related nonlinearly (see the sketch below)
[Figure: scatterplot split into quadrants at the means of X and Y, with the quadrants labeled (+,+), (-,-), (-,+), (+,-)]
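A minimal sketch of the nonlinear caveat: with X symmetric about zero and Y = X², Y is completely determined by X, yet the covariance is essentially zero. The choice of distribution and sample size is arbitrary.

```python
# Minimal sketch: a nonlinear dependence that covariance misses.
# X is symmetric about 0 and Y = X**2, so Cov(X, Y) = E[X^3] = 0
# even though Y is a deterministic function of X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = x ** 2

print(np.cov(x, y)[0, 1])    # approximately 0 despite perfect (nonlinear) dependence
```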

6 G89.2228 Lect 8b
Correlation and covariance
Beyond noticing its sign and whether it is zero, it is difficult to interpret the absolute magnitude of a covariance
Note that Cov(X,Y) is bounded by V(X) and V(Y): Cov(X,Y)² ≤ V(X)·V(Y), so |Cov(X,Y)| ≤ σ_X σ_Y
Correlation, Corr(X,Y), is a rescaled version of covariance that is bounded by –1 and +1
»It is the covariance of two variables that have variances of 1
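A minimal sketch of the rescaling idea: standardize X and Y to variance 1, and the covariance of the z-scores equals Corr(X, Y). The simulated data and the linear relation used are arbitrary.

```python
# Minimal sketch: correlation is the covariance of standardized (variance-1) variables.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)     # arbitrary linear relation plus noise

zx = (x - x.mean()) / x.std(ddof=1)    # standardize X to variance 1
zy = (y - y.mean()) / y.std(ddof=1)    # standardize Y to variance 1

print(np.cov(zx, zy)[0, 1])            # covariance of the z-scores ...
print(np.corrcoef(x, y)[0, 1])         # ... equals Corr(X, Y)
```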

7 G89.2228 Lect 8b
Estimating covariance
Since covariance is the expected product of deviations from the means of X and Y, we estimate it using an average of products of deviations in the sample
If μ_x and μ_y are not known, we use the sample means and divide by n - 1:
s_XY = Σ (X_i - X̄)(Y_i - Ȳ) / (n - 1)
This is an unbiased estimator
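A minimal sketch of this estimator, checked against numpy's built-in np.cov (which uses the same n - 1 denominator by default); the simulated data are arbitrary.

```python
# Minimal sketch of the unbiased sample covariance
#   s_XY = sum((x_i - xbar) * (y_i - ybar)) / (n - 1)
import numpy as np

def sample_cov(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    return np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = x + rng.normal(size=50)
print(sample_cov(x, y), np.cov(x, y)[0, 1])    # the two values agree
```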

8 G89.2228 Lect 8b
Product moment estimate of correlation
The population correlation is defined as ρ = σ_XY / (σ_X σ_Y)
The sample product moment correlation is obtained by inserting the sample estimates of the moments: r = s_XY / (s_X s_Y)
e.g., .69 = 43.38 / (11.36 × 5.49)
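A minimal sketch that plugs the slide's numbers into r = s_XY / (s_X s_Y); with these rounded inputs the quotient comes out nearer .70, so the slide's .69 presumably reflects less-rounded moments.

```python
# Minimal sketch: sample product-moment correlation from the slide's plug-in values.
s_xy, s_x, s_y = 43.38, 11.36, 5.49    # sample covariance and standard deviations
r = s_xy / (s_x * s_y)
print(round(r, 2))                     # ~.70 with these rounded inputs (slide reports .69)
```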

9 G89.2228 Lect 8b
Correlation as a parameter
Bivariate distribution functions describe not only the marginal distributions of each variable, but also the pattern of association between the variables
The bivariate normal distribution function is parameterized by the means, the variances, and an index of linear association (covariance or correlation)
In such cases we can think of the population correlation, ρ (rho), as a parameter to be estimated
The estimate is obtained from a sample of multivariate normal observations
The product moment correlation r provides a reasonable (but biased) estimate of ρ; the adjusted estimate r_adj is less biased (see the simulation sketch below)
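A minimal simulation sketch of the bias claim: with small bivariate-normal samples the average r falls slightly below ρ. The sample size, ρ, and number of replications are arbitrary choices.

```python
# Minimal sketch: small-sample bias of r as an estimate of rho.
# With bivariate-normal samples of size n = 10 and rho = .50, the mean of r
# over many replications comes out a bit below .50.
import numpy as np

rng = np.random.default_rng(3)
rho, n, reps = 0.50, 10, 10_000
cov = [[1.0, rho], [rho, 1.0]]

rs = [np.corrcoef(*rng.multivariate_normal([0.0, 0.0], cov, size=n).T)[0, 1]
      for _ in range(reps)]
print(np.mean(rs))                 # slightly less than .50, illustrating the bias
```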

10 G89.2228 Lect 8b
Correlation as a summary of data
Pearson product moment (PPM) correlations (r) can be computed as summaries of linear association even when the population parameter ρ is not of central interest
If one or both variables are binary, r may be affected by the marginal variances
»Only under special conditions will r take the value +1 or -1
»r is related to test statistics
»When both variables are binary, the PPM correlation is called phi (φ)
»When only one variable is binary, the PPM correlation is called a point-biserial correlation (see the sketch below)
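A minimal sketch showing that phi and the point-biserial coefficient are simply Pearson r computed on 0/1-coded data; the small data vectors are invented for illustration.

```python
# Minimal sketch: phi and point-biserial correlations are Pearson r on 0/1-coded data.
import numpy as np

x_bin  = np.array([0, 0, 0, 1, 1, 1, 1, 0, 1, 0])      # binary X (made-up data)
y_bin  = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0])      # binary Y
y_cont = np.array([2.1, 3.0, 1.8, 4.2, 3.9, 2.5, 4.8, 2.0, 4.1, 2.2])  # continuous Y

phi  = np.corrcoef(x_bin, y_bin)[0, 1]     # both binary -> phi coefficient
r_pb = np.corrcoef(x_bin, y_cont)[0, 1]    # one binary  -> point-biserial correlation
print(phi, r_pb)
```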

11 G89.2228 Lect 8b
Other kinds of correlation for categorical data
Biserial, tetrachoric, and polychoric correlations are alternatives to r that estimate what the bivariate normal ρ might have been if the categories had been formed by cutting a truly normal continuum into "High", "Low", and so on
These estimates are often unstable, but they can be useful if the sample is large (a tetrachoric sketch follows)
[Figure: 2×2 table with cell counts a, b, c, d, alongside a bivariate normal continuum cut into the same four regions]
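A minimal sketch of the tetrachoric idea, assuming scipy is available and that the 2×2 cell counts a, b, c, d are laid out with a = Low/Low and d = High/High (the counts below are invented). It finds the ρ for which a standard bivariate normal, cut at thresholds matching the observed margins, reproduces the observed High/High proportion; this is an illustration of the principle, not a production estimator.

```python
# Minimal sketch (illustrative only): tetrachoric rho for a 2x2 table by
# matching the High/High cell under a threshold-cut bivariate normal.
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

a, b, c, d = 40, 10, 15, 35          # made-up counts: a = Low/Low, d = High/High
n = a + b + c + d

h = norm.ppf((a + b) / n)            # threshold cutting the latent X continuum
k = norm.ppf((a + c) / n)            # threshold cutting the latent Y continuum
p_hh_obs = d / n                     # observed High/High proportion

def gap(rho):
    """Model-implied minus observed High/High proportion at a given rho."""
    mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])
    p_hh = 1 - norm.cdf(h) - norm.cdf(k) + mvn.cdf([h, k])   # P(Z1 > h, Z2 > k)
    return p_hh - p_hh_obs

rho_tetrachoric = brentq(gap, -0.99, 0.99)   # root where the model matches the data
print(rho_tetrachoric)
```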

12 G89.2228 Lect 8b
Example
[Figure: scatterplots of ZZ1 and ZZ2 (continuous) and of CZ1 and CZ2 (discrete)]
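In the spirit of this example (the exact variables behind ZZ1/ZZ2 and CZ1/CZ2 are not recoverable from the transcript), a minimal sketch: draw continuous bivariate-normal variables and discretize them at their medians. The product-moment correlation of the discrete versions is attenuated relative to the continuous one, which is the situation the tetrachoric correlation on the previous slide tries to undo.

```python
# Minimal sketch: continuous bivariate-normal variables (zz1, zz2) versus
# median-split discrete versions (cz1, cz2). The correlation of the discrete
# versions (a phi coefficient) is attenuated relative to the continuous one.
import numpy as np

rng = np.random.default_rng(4)
rho = 0.60                                     # illustrative value, not from the slide
cov = [[1.0, rho], [rho, 1.0]]
zz1, zz2 = rng.multivariate_normal([0.0, 0.0], cov, size=1000).T

cz1 = (zz1 > np.median(zz1)).astype(int)       # dichotomize at the median
cz2 = (zz2 > np.median(zz2)).astype(int)

print(np.corrcoef(zz1, zz2)[0, 1])             # near .60
print(np.corrcoef(cz1, cz2)[0, 1])             # noticeably smaller (attenuated)
```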

