
1 Correlation

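2 The sample covariance matrix: $S = [s_{ij}] = \frac{1}{n-1}\sum_{k=1}^{n}(\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})'$, where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{k=1}^{n}\mathbf{x}_k$ and $\mathbf{x}_1, \ldots, \mathbf{x}_n$ are the n observed p-dimensional data vectors.

3 The sample correlation matrix: $R = [r_{ij}]$, where $r_{ij} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}$.

4 Note: $R = D^{-1/2} S D^{-1/2}$, where $D = \operatorname{diag}(s_{11}, \ldots, s_{pp})$.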
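The formulas on slides 2 to 4 can be checked numerically. The sketch below assumes NumPy is available and uses a small made-up data matrix; `sample_cov_corr` is a hypothetical helper name, not something from the slides. It computes S with the 1/(n - 1) divisor and then rescales it as $R = D^{-1/2} S D^{-1/2}$.

```python
import numpy as np

def sample_cov_corr(X):
    """Return (xbar, S, R) for an n x p data matrix X (rows = observations).

    S uses the 1/(n - 1) divisor; R = D^{-1/2} S D^{-1/2} with D = diag(S).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)              # sample mean vector
    centered = X - xbar                # x_k - xbar for every row k
    S = centered.T @ centered / (n - 1)
    d = 1.0 / np.sqrt(np.diag(S))      # 1 / sqrt(s_ii)
    R = S * np.outer(d, d)             # r_ij = s_ij / sqrt(s_ii * s_jj)
    return xbar, S, R

# Small illustrative data set (made up for this sketch): n = 5, p = 3.
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 1.0, 1.5],
              [3.0, 4.0, 2.0],
              [4.0, 3.0, 3.5],
              [5.0, 6.0, 4.0]])
xbar, S, R = sample_cov_corr(X)
```

The results should agree with NumPy's own `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)`, and R always has ones on its diagonal.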

5 Tests for Independence and Non-zero correlation

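6 Tests for Independence. Test for zero correlation (independence between two variables): the test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$. If independence is true, then the test statistic t has a t distribution with $\nu = n - 2$ degrees of freedom. The test is to reject independence if $|t| > t_{\alpha/2}$, the upper $\alpha/2$ critical point of the t distribution with n - 2 degrees of freedom.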
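A minimal sketch of the slide 6 statistic in plain Python (the function name is hypothetical). It returns only t; the critical value $t_{\alpha/2}$ still has to come from a t table or, if SciPy is installed, from `scipy.stats.t.ppf(1 - alpha/2, n - 2)`.

```python
import math

def t_for_zero_correlation(r, n):
    """t statistic for H0: rho = 0 from a sample correlation r on n pairs.

    Under H0 (bivariate Normal data) t has a t distribution with n - 2 df.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 2:
        raise ValueError("need n > 2")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Example: r = 0.5 from n = 27 pairs gives t = 0.5 * 5 / sqrt(0.75).
t = t_for_zero_correlation(0.5, 27)
```

With r = 0.5 and n = 27 this gives t of about 2.887 on 25 degrees of freedom.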

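7 Test for non-zero correlation ($H_0: \rho = \rho_0$): the test statistic is $z = \sqrt{n-3}\left[\frac{1}{2}\ln\frac{1+r}{1-r} - \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}\right]$. If $H_0$ is true, the test statistic z has approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.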
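The Fisher z test on slide 7 needs only the standard library, since `math.atanh(u)` equals $\frac{1}{2}\ln\frac{1+u}{1-u}$ and `statistics.NormalDist` supplies Normal probabilities and quantiles. The function name and the returned tuple layout are choices made for this sketch.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n, rho0=0.0, alpha=0.05):
    """Approximate test of H0: rho = rho0 via Fisher's z transformation.

    z = sqrt(n - 3) * (atanh(r) - atanh(rho0)), where
    atanh(u) = 0.5 * ln((1 + u) / (1 - u)).
    Returns (z, two-sided p-value, whether H0 is rejected at level alpha).
    """
    z = math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # z_{alpha/2}
    return z, p_value, abs(z) > z_crit

# Example: r = 0.5 from n = 28 pairs, testing H0: rho = 0.
z, p, reject = fisher_z_test(0.5, 28, rho0=0.0)
```

Here z = 5 atanh(0.5), about 2.75, so $H_0: \rho = 0$ is rejected at the 5% level.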

8 Partial Correlation: Conditional Independence

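9 Recall: suppose $\mathbf{x} = \begin{pmatrix}\mathbf{x}^{(1)}\\ \mathbf{x}^{(2)}\end{pmatrix}$ has a p-variate Normal distribution with mean vector $\boldsymbol\mu = \begin{pmatrix}\boldsymbol\mu^{(1)}\\ \boldsymbol\mu^{(2)}\end{pmatrix}$ and covariance matrix $\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$, where $\mathbf{x}^{(1)} = (x_1, \ldots, x_q)'$. Then the conditional distribution of $\mathbf{x}^{(2)}$ given $\mathbf{x}^{(1)}$ is a (p - q)-variate Normal distribution with mean vector $\boldsymbol\mu^{(2)} + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}^{(1)} - \boldsymbol\mu^{(1)})$ and covariance matrix $\Sigma_{22\cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$.

10 $\Sigma_{22\cdot 1}$ is called the matrix of partial variances and covariances. Its entry $\sigma_{ij\cdot 1,\ldots,q}$ for the variables $x_i$ and $x_j$ (i, j > q) is called the partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$. $\rho_{ij\cdot 1,\ldots,q} = \frac{\sigma_{ij\cdot 1,\ldots,q}}{\sqrt{\sigma_{ii\cdot 1,\ldots,q}\,\sigma_{jj\cdot 1,\ldots,q}}}$ is called the partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

11 Let $S = \begin{pmatrix}S_{11} & S_{12}\\ S_{21} & S_{22}\end{pmatrix}$ denote the sample covariance matrix, partitioned in the same way, and let $S_{22\cdot 1} = S_{22} - S_{21}S_{11}^{-1}S_{12}$. Its entry $s_{ij\cdot 1,\ldots,q}$ is called the sample partial covariance (variance if i = j) between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.

12 Also $r_{ij\cdot 1,\ldots,q} = \frac{s_{ij\cdot 1,\ldots,q}}{\sqrt{s_{ii\cdot 1,\ldots,q}\,s_{jj\cdot 1,\ldots,q}}}$ is called the sample partial correlation between $x_i$ and $x_j$ given $x_1, \ldots, x_q$.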
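Slides 9 to 12 reduce to one matrix computation: partition the covariance matrix, form $S_{22} - S_{21}S_{11}^{-1}S_{12}$, and rescale it to a correlation matrix. The sketch assumes NumPy, places the conditioning variables first, and uses a made-up equicorrelated matrix for the usage example; `partial_cov_corr` is a hypothetical name.

```python
import numpy as np

def partial_cov_corr(S, q):
    """Partial covariance and correlation of variables q+1..p given 1..q.

    S is a p x p (population or sample) covariance matrix; the first q
    variables form the conditioning set. Returns (S_22.1, R_22.1).
    """
    S = np.asarray(S, dtype=float)
    S11, S12 = S[:q, :q], S[:q, q:]
    S21, S22 = S[q:, :q], S[q:, q:]
    # S_22.1 = S22 - S21 S11^{-1} S12 (solve avoids forming the inverse)
    S22_1 = S22 - S21 @ np.linalg.solve(S11, S12)
    d = 1.0 / np.sqrt(np.diag(S22_1))
    # r_ij.1..q = s_ij.1..q / sqrt(s_ii.1..q * s_jj.1..q)
    R22_1 = S22_1 * np.outer(d, d)
    return S22_1, R22_1

# Equicorrelated example: all pairwise correlations 0.5, condition on x1.
S = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
S22_1, R22_1 = partial_cov_corr(S, q=1)
```

Here the partial correlation between $x_2$ and $x_3$ given $x_1$ is $0.25/0.75 = 1/3$, matching the familiar three-variable formula $(r_{23} - r_{21}r_{31})/\sqrt{(1-r_{21}^2)(1-r_{31}^2)}$. Applied to a sample covariance matrix, the same value is obtained by correlating the residuals of $x_2$ and $x_3$ after regressing each on $x_1$.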

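13 Test for zero partial correlation (conditional independence between two variables given a set of p independent variables). Let $r_{y_iy_j\cdot x_1,\ldots,x_p}$ = the sample partial correlation between $y_i$ and $y_j$ given $x_1, \ldots, x_p$. The test statistic is $t = \frac{r_{y_iy_j\cdot x_1,\ldots,x_p}\sqrt{n-p-2}}{\sqrt{1-r_{y_iy_j\cdot x_1,\ldots,x_p}^2}}$. If independence is true, then the test statistic t has a t distribution with $\nu = n - p - 2$ degrees of freedom. The test is to reject independence if $|t| > t_{\alpha/2}$ (n - p - 2 degrees of freedom).

14 Test for non-zero partial correlation ($H_0: \rho_{y_iy_j\cdot x_1,\ldots,x_p} = \rho_0$). The test statistic is $z = \sqrt{n-p-3}\left[\frac{1}{2}\ln\frac{1+r_{y_iy_j\cdot x_1,\ldots,x_p}}{1-r_{y_iy_j\cdot x_1,\ldots,x_p}} - \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}\right]$. If $H_0$ is true, the test statistic z has approximately a standard Normal distribution. We then reject $H_0$ if $|z| > z_{\alpha/2}$.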
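Slides 13 and 14 are the slide 6 and 7 tests with the degrees of freedom reduced by the number p of conditioning variables. A standard-library sketch (the function name is hypothetical) that returns both statistics:

```python
import math
from statistics import NormalDist

def partial_corr_tests(r, n, p, rho0=0.0):
    """Tests for a sample partial correlation r from n observations,
    controlling for p variables.

    t = r sqrt(n - p - 2) / sqrt(1 - r^2): t with n - p - 2 df under H0: rho = 0.
    z = sqrt(n - p - 3) (atanh(r) - atanh(rho0)): approx. N(0, 1) under H0: rho = rho0.
    Returns (t, z, two-sided Normal p-value for z).
    """
    if n - p - 3 <= 0:
        raise ValueError("need n > p + 3")
    t = r * math.sqrt(n - p - 2) / math.sqrt(1.0 - r * r)
    z = math.sqrt(n - p - 3) * (math.atanh(r) - math.atanh(rho0))
    p_z = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return t, z, p_z

# Example: partial correlation 1/3 from n = 30 observations, p = 2 controls.
t, z, p_z = partial_corr_tests(r=1/3, n=30, p=2)
```

For r = 1/3, n = 30 and p = 2 this gives t = sqrt(3.25), about 1.80 on 26 degrees of freedom, and z = 2.5 ln 2, about 1.73; neither is significant at the 5% level.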

15 The Multiple Correlation Coefficient: testing independence between a single variable and a group of variables

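16 Suppose $\begin{pmatrix}y\\ \mathbf{x}\end{pmatrix}$, with $\mathbf{x} = (x_1, \ldots, x_p)'$, has a (p + 1)-variate Normal distribution with mean vector $\begin{pmatrix}\mu_y\\ \boldsymbol\mu_x\end{pmatrix}$ and covariance matrix $\begin{pmatrix}\sigma_{yy} & \boldsymbol\sigma_{xy}'\\ \boldsymbol\sigma_{xy} & \Sigma_{xx}\end{pmatrix}$. We are interested in whether the variable y is independent of the vector $\mathbf{x}$. Definition: the multiple correlation coefficient $\rho_{y\cdot\mathbf{x}}$ is the maximum correlation between y and a linear combination $\mathbf{a}'\mathbf{x}$ of the components of $\mathbf{x}$.

17 Derivation: for a fixed vector $\mathbf{a}$, the vector $\begin{pmatrix}y\\ \mathbf{a}'\mathbf{x}\end{pmatrix}$ has a bivariate Normal distribution with mean vector $\begin{pmatrix}\mu_y\\ \mathbf{a}'\boldsymbol\mu_x\end{pmatrix}$ and covariance matrix $\begin{pmatrix}\sigma_{yy} & \mathbf{a}'\boldsymbol\sigma_{xy}\\ \mathbf{a}'\boldsymbol\sigma_{xy} & \mathbf{a}'\Sigma_{xx}\mathbf{a}\end{pmatrix}$.

18 The correlation between y and $\mathbf{a}'\mathbf{x}$ is $\frac{\mathbf{a}'\boldsymbol\sigma_{xy}}{\sqrt{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}}$. Thus we want to choose $\mathbf{a}$ to maximize this correlation or, equivalently, to maximize $\frac{(\mathbf{a}'\boldsymbol\sigma_{xy})^2}{\sigma_{yy}\,\mathbf{a}'\Sigma_{xx}\mathbf{a}}$. The multiple correlation coefficient is the maximum correlation between y and $\mathbf{a}'\mathbf{x}$.

19 Note: the maximum is attained at $\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol\sigma_{xy}$ for any constant k > 0, giving $\rho_{y\cdot\mathbf{x}} = \sqrt{\boldsymbol\sigma_{xy}'\Sigma_{xx}^{-1}\boldsymbol\sigma_{xy}/\sigma_{yy}}$.

20 The multiple correlation coefficient is independent of the value of k.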
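For $(y, \mathbf{x}')'$ with covariance matrix partitioned into $\sigma_{yy}$, $\boldsymbol\sigma_{xy}$ and $\Sigma_{xx}$, the standard result is that $\mathbf{a} = k\,\Sigma_{xx}^{-1}\boldsymbol\sigma_{xy}$ (k > 0) maximizes the correlation and $\rho_{y\cdot\mathbf{x}} = \sqrt{\boldsymbol\sigma_{xy}'\Sigma_{xx}^{-1}\boldsymbol\sigma_{xy}/\sigma_{yy}}$; it follows from the Cauchy-Schwarz inequality. The NumPy sketch below checks this numerically on a hypothetical 3 x 3 covariance matrix, with y stored first: rescaling $\mathbf{a}$ leaves the correlation unchanged, and no randomly drawn direction does better. The function names are made up for this sketch.

```python
import numpy as np

def multiple_correlation(Sigma):
    """rho_{y.x} = sqrt(sigma_xy' Sigma_xx^{-1} sigma_xy / sigma_yy).

    Sigma is the (p+1) x (p+1) covariance matrix of (y, x_1, ..., x_p),
    with y in position 0. Also returns the maximizing direction a (k = 1).
    """
    Sigma = np.asarray(Sigma, dtype=float)
    s_yy, s_xy, S_xx = Sigma[0, 0], Sigma[1:, 0], Sigma[1:, 1:]
    a = np.linalg.solve(S_xx, s_xy)          # a = Sigma_xx^{-1} sigma_xy
    rho = np.sqrt(s_xy @ a / s_yy)
    return rho, a

def corr_y_ax(Sigma, a):
    """corr(y, a'x) = a' sigma_xy / sqrt(sigma_yy * a' Sigma_xx a)."""
    Sigma = np.asarray(Sigma, dtype=float)
    return (a @ Sigma[1:, 0]) / np.sqrt(Sigma[0, 0] * (a @ Sigma[1:, 1:] @ a))

# Hypothetical covariance matrix of (y, x1, x2), chosen only for illustration.
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 1.0, 0.3],
                  [0.8, 0.3, 1.0]])
rho, a = multiple_correlation(Sigma)

# Correlations of y with 1000 random linear combinations a'x.
rng = np.random.default_rng(0)
random_corrs = [corr_y_ax(Sigma, rng.normal(size=2)) for _ in range(1000)]
```

For this matrix $\rho_{y\cdot\mathbf{x}}^2 = 1.504/(0.91 \times 4) \approx 0.413$, so $\rho_{y\cdot\mathbf{x}} \approx 0.643$.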

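21 We are interested in whether the variable y is independent of the vector $\mathbf{x}$. Let the sample covariance matrix be partitioned as $S = \begin{pmatrix}s_{yy} & \mathbf{s}_{xy}'\\ \mathbf{s}_{xy} & S_{xx}\end{pmatrix}$. Then the sample multiple correlation coefficient is $R_{y\cdot\mathbf{x}} = \sqrt{\frac{\mathbf{s}_{xy}'S_{xx}^{-1}\mathbf{s}_{xy}}{s_{yy}}}$.

22 Testing for independence between y and $\mathbf{x}$: the test statistic is $F = \frac{R_{y\cdot\mathbf{x}}^2/p}{(1 - R_{y\cdot\mathbf{x}}^2)/(n - p - 1)}$. If independence is true, then the test statistic F has an F distribution with $\nu_1 = p$ degrees of freedom in the numerator and $\nu_2 = n - p - 1$ degrees of freedom in the denominator. The test is to reject independence if $F > F_\alpha(p, n - p - 1)$.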
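A sketch of slides 21 and 22 on simulated data (NumPy assumed; the data and the helper name are made up for illustration and are not the study data used later in these slides). It also cross-checks a known identity: the sample multiple correlation equals the correlation between y and its least-squares fitted values when the regression includes an intercept.

```python
import numpy as np

def sample_multiple_corr_F(y, X):
    """Sample multiple correlation R_{y.x} and the F statistic for H0: y independent of x.

    y: length-n vector; X: n x p matrix of the x variables.
    F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) with (p, n - p - 1) df under H0.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    S = np.cov(np.column_stack([y, X]), rowvar=False)   # y is variable 0
    s_yy, s_xy, S_xx = S[0, 0], S[1:, 0], S[1:, 1:]
    R2 = s_xy @ np.linalg.solve(S_xx, s_xy) / s_yy
    F = (R2 / p) / ((1.0 - R2) / (n - p - 1))
    return np.sqrt(R2), F, (p, n - p - 1)

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = X @ np.array([0.5, -0.3, 0.0]) + rng.normal(size=n)
R, F, df = sample_multiple_corr_F(y, X)

# Cross-check: R equals corr(y, fitted values) from least squares with intercept.
design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
r_fit = np.corrcoef(y, design @ beta)[0, 1]
```

The F value is then compared with $F_\alpha(p, n - p - 1)$ from tables.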

23 Canonical Correlation Analysis

24 The problem. Quite often, when one has collected data on several variables, the variables are grouped into two (or more) sets, and the researcher is interested in whether one set of variables is independent of the other. In addition, if the two sets of variates are found to be dependent, it is then important to describe and understand the nature of this dependence. The appropriate statistical procedure in this case is called Canonical Correlation Analysis.

25 Canonical Correlation: An Example. In the following study the researcher was interested in whether specific instructions on how to relax when taking tests and how to increase motivation would affect performance on standardized achievement tests in Reading, Language and Mathematics.

26 A group of 65 third- and fourth-grade students were rated, after the instruction and immediately prior to taking the Scholastic Achievement tests, on how relaxed they were ($X_1$) and how motivated they were ($X_2$). In addition, data were collected on the three achievement tests: Reading ($Y_1$), Language ($Y_2$) and Mathematics ($Y_3$). The data were tabulated on the next slide.

27

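28 Definition (canonical variates and canonical correlations): Let $\mathbf{x} = \begin{pmatrix}\mathbf{x}^{(1)}\\ \mathbf{x}^{(2)}\end{pmatrix}$ have a p-variate Normal distribution, with $\mathbf{x}^{(1)}$ of dimension q, $\mathbf{x}^{(2)}$ of dimension p - q, and $\Sigma = \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}$. Let $U_1 = \mathbf{a}_1'\mathbf{x}^{(1)}$ and $V_1 = \mathbf{b}_1'\mathbf{x}^{(2)}$ be such that $U_1$ and $V_1$ achieve the maximum possible correlation $\phi_1$. Then $U_1$ and $V_1$ are called the first pair of canonical variates and $\phi_1$ is called the first canonical correlation coefficient.

29 Derivation (1st pair of canonical variates and canonical correlation): $\begin{pmatrix}U\\ V\end{pmatrix} = \begin{pmatrix}\mathbf{a}'\mathbf{x}^{(1)}\\ \mathbf{b}'\mathbf{x}^{(2)}\end{pmatrix}$ has covariance matrix $\begin{pmatrix}\mathbf{a}'\Sigma_{11}\mathbf{a} & \mathbf{a}'\Sigma_{12}\mathbf{b}\\ \mathbf{b}'\Sigma_{21}\mathbf{a} & \mathbf{b}'\Sigma_{22}\mathbf{b}\end{pmatrix}$.

30 Hence $\operatorname{corr}(U, V) = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\sqrt{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})}}$.

31 Thus we want to choose $\mathbf{a}$ and $\mathbf{b}$ so that corr(U, V) is at a maximum or, equivalently, so that $g(\mathbf{a}, \mathbf{b}) = \frac{(\mathbf{a}'\Sigma_{12}\mathbf{b})^2}{(\mathbf{a}'\Sigma_{11}\mathbf{a})(\mathbf{b}'\Sigma_{22}\mathbf{b})}$ is at a maximum.

32 Computing the derivatives and setting $\partial g/\partial\mathbf{a} = \mathbf{0}$ and $\partial g/\partial\mathbf{b} = \mathbf{0}$ gives $\Sigma_{12}\mathbf{b} = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{a}'\Sigma_{11}\mathbf{a}}\Sigma_{11}\mathbf{a}$ and $\Sigma_{21}\mathbf{a} = \frac{\mathbf{a}'\Sigma_{12}\mathbf{b}}{\mathbf{b}'\Sigma_{22}\mathbf{b}}\Sigma_{22}\mathbf{b}$.

33 Thus $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\mathbf{a} = k\mathbf{a}$, where $k = g(\mathbf{a}, \mathbf{b})$. This shows that $\mathbf{a}$ is an eigenvector of $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ with eigenvalue k. Since k is the quantity being maximized, k is the largest eigenvalue of this matrix and $\mathbf{a}$ is the eigenvector associated with the largest eigenvalue.

34 Also, by the same argument, $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\mathbf{b} = k\mathbf{b}$, and $\mathbf{b}$ is proportional to $\Sigma_{22}^{-1}\Sigma_{21}\mathbf{a}$.

35 Summary: the first pair of canonical variates, $U_1 = \mathbf{a}_1'\mathbf{x}^{(1)}$ and $V_1 = \mathbf{b}_1'\mathbf{x}^{(2)}$, is found by taking $\mathbf{a}_1$ and $\mathbf{b}_1$ to be the eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, respectively, associated with the largest eigenvalue (the same for both matrices). The largest eigenvalue of the two matrices is the square of the first canonical correlation coefficient $\phi_1$.

36 Note: if $A = \Sigma_{11}^{-1}\Sigma_{12}$ and $B = \Sigma_{22}^{-1}\Sigma_{21}$, then $AB = \Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $BA = \Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$ have exactly the same nonzero eigenvalues. Proof: if $AB\mathbf{v} = \lambda\mathbf{v}$ with $\lambda \neq 0$, then $B\mathbf{v} \neq \mathbf{0}$ and $BA(B\mathbf{v}) = \lambda(B\mathbf{v})$; similarly, if $BA\mathbf{w} = \lambda\mathbf{w}$ with $\lambda \neq 0$, then $A\mathbf{w} \neq \mathbf{0}$ and $AB(A\mathbf{w}) = \lambda(A\mathbf{w})$.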
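The slide 36 result can be seen numerically. The NumPy sketch below builds a hypothetical positive definite covariance matrix split into q = 2 and p - q = 3 variables; then $AB$ is 2 x 2 and $BA$ is 3 x 3, and $BA$ has the two eigenvalues of $AB$ plus one zero, since its rank is at most 2.

```python
import numpy as np

# Hypothetical partitioned covariance matrix with q = 2 and p - q = 3.
rng = np.random.default_rng(2)
M = rng.normal(size=(5, 5))
Sigma = M @ M.T + 5 * np.eye(5)     # symmetric positive definite by construction
q = 2
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

A = np.linalg.solve(S11, S12)       # A = Sigma_11^{-1} Sigma_12, q x (p - q)
B = np.linalg.solve(S22, S21)       # B = Sigma_22^{-1} Sigma_21, (p - q) x q

# Eigenvalues sorted in decreasing order; the imaginary parts are only roundoff.
eig_AB = np.sort(np.linalg.eigvals(A @ B).real)[::-1]   # q eigenvalues
eig_BA = np.sort(np.linalg.eigvals(B @ A).real)[::-1]   # p - q eigenvalues
```

The shared nonzero eigenvalues are the squared canonical correlations, so they lie between 0 and 1.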

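37 The remaining canonical variates and canonical correlation coefficients are found in turn. The second pair of canonical variates, $U_2 = \mathbf{a}_2'\mathbf{x}^{(1)}$ and $V_2 = \mathbf{b}_2'\mathbf{x}^{(2)}$, is found by choosing $\mathbf{a}_2$ and $\mathbf{b}_2$ so that 1. $(U_2, V_2)$ is independent of $(U_1, V_1)$, and 2. the correlation between $U_2$ and $V_2$ is maximized. The correlation $\phi_2$ between $U_2$ and $V_2$ is called the second canonical correlation coefficient.

38 The ith pair of canonical variates, $U_i = \mathbf{a}_i'\mathbf{x}^{(1)}$ and $V_i = \mathbf{b}_i'\mathbf{x}^{(2)}$, is found by choosing $\mathbf{a}_i$ and $\mathbf{b}_i$ so that 1. $(U_i, V_i)$ is independent of $(U_1, V_1), \ldots, (U_{i-1}, V_{i-1})$, and 2. the correlation between $U_i$ and $V_i$ is maximized. The correlation $\phi_i$ between $U_i$ and $V_i$ is called the ith canonical correlation coefficient.

39 Derivation (2nd pair of canonical variates and canonical correlation): $(U_1, V_1, U_2, V_2)'$ has covariance matrix $\begin{pmatrix}\mathbf{a}_1'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_1'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_1'\Sigma_{12}\mathbf{b}_2\\ \mathbf{b}_1'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_1'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_1'\Sigma_{22}\mathbf{b}_2\\ \mathbf{a}_2'\Sigma_{11}\mathbf{a}_1 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_1 & \mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 & \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2\\ \mathbf{b}_2'\Sigma_{21}\mathbf{a}_1 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_1 & \mathbf{b}_2'\Sigma_{21}\mathbf{a}_2 & \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2\end{pmatrix}$. Now, since these variates are jointly Normal, $(U_2, V_2)$ is independent of $(U_1, V_1)$ if and only if $\mathbf{a}_2'\Sigma_{11}\mathbf{a}_1 = \mathbf{a}_2'\Sigma_{12}\mathbf{b}_1 = \mathbf{b}_2'\Sigma_{21}\mathbf{a}_1 = \mathbf{b}_2'\Sigma_{22}\mathbf{b}_1 = 0$.

40 Maximizing $\operatorname{corr}(U_2, V_2)$ is then equivalent to maximizing $\mathbf{a}_2'\Sigma_{12}\mathbf{b}_2$ subject to $\mathbf{a}_2'\Sigma_{11}\mathbf{a}_2 = \mathbf{b}_2'\Sigma_{22}\mathbf{b}_2 = 1$ and the independence conditions above. This is done using the Lagrange multiplier technique.

41 Setting the derivatives of the Lagrangian with respect to $\mathbf{a}_2$ and $\mathbf{b}_2$ to zero, and also using the independence conditions, gives the restrictions $\Sigma_{12}\mathbf{b}_2 = \phi_2\Sigma_{11}\mathbf{a}_2$ and $\Sigma_{21}\mathbf{a}_2 = \phi_2\Sigma_{22}\mathbf{b}_2$, where $\phi_2 = \mathbf{a}_2'\Sigma_{12}\mathbf{b}_2$.

42 These equations can be used to show that $\mathbf{a}_2$ and $\mathbf{b}_2$ are eigenvectors of the matrices $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and $\Sigma_{22}^{-1}\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, respectively, associated with the 2nd largest eigenvalue (the same for both matrices). The 2nd largest eigenvalue of the two matrices is the square of the 2nd canonical correlation coefficient $\phi_2$.

43 Continuing in this way, the coefficients $\mathbf{a}_i$ and $\mathbf{b}_i$ for the ith pair of canonical variates are eigenvectors of the same two matrices associated with the ith largest eigenvalue (the same for both matrices). The ith largest eigenvalue of the two matrices is the square of the ith canonical correlation coefficient $\phi_i$.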
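The eigenvector recipe on slides 35, 42 and 43 can be sketched with NumPy as below. One numerical choice here is mine, not the slides': instead of calling a general eigensolver on $\Sigma_{11}^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$, it factors $\Sigma_{11} = LL'$ and solves the equivalent symmetric problem $L^{-1}\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}L^{-T}\mathbf{w} = \lambda\mathbf{w}$, whose solutions give $\mathbf{a} = L^{-T}\mathbf{w}$ with the same eigenvalues (the squared canonical correlations). The function name, the hypothetical covariance matrix and the unit-variance scaling of the variates are assumptions of this sketch.

```python
import numpy as np

def canonical_correlations(Sigma, q):
    """Canonical correlations and coefficient vectors from a covariance matrix.

    Sigma: p x p covariance of (x^(1), x^(2)), with x^(1) = the first q variables.
    Returns (phi, A, B): phi is in decreasing order, and columns A[:, i], B[:, i]
    are a_i, b_i scaled so that Var(U_i) = Var(V_i) = 1. Only the first
    min(q, p - q) pairs are returned; phi values of exactly 0 are not handled.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    m = min(q, Sigma.shape[0] - q)
    L = np.linalg.cholesky(S11)                 # S11 = L L'
    Linv = np.linalg.inv(L)
    M = S12 @ np.linalg.solve(S22, S21)         # Sigma_12 Sigma_22^{-1} Sigma_21
    T = Linv @ M @ Linv.T                       # symmetric, same eigenvalues as K
    vals, W = np.linalg.eigh(T)                 # ascending eigenvalues
    order = np.argsort(vals)[::-1][:m]          # largest first
    phi = np.sqrt(np.clip(vals[order], 0.0, None))
    A = Linv.T @ W[:, order]                    # a_i = L^{-T} w_i, so a_i' S11 a_i = 1
    B = np.linalg.solve(S22, S21 @ A)           # b_i proportional to S22^{-1} S21 a_i
    B /= np.sqrt(np.einsum("ij,ij->j", B, S22 @ B))   # b_i' S22 b_i = 1
    return phi, A, B

# Usage with a hypothetical covariance matrix: 2 variables in x^(1), 3 in x^(2).
rng = np.random.default_rng(3)
M0 = rng.normal(size=(5, 5))
Sigma = M0 @ M0.T + 5 * np.eye(5)
phi, A, B = canonical_correlations(Sigma, q=2)
```

Passing a sample covariance matrix instead of $\Sigma$ gives the sample canonical correlations and coefficients. The coefficient vectors are determined only up to scale, so they need not match the scaling used in the example output that follows.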

44 Example. Variables: relaxation score ($X_1$), motivation score ($X_2$), Reading ($Y_1$), Language ($Y_2$) and Mathematics ($Y_3$).

45 Summary Statistics

46 Canonical Correlation Statistics

47 continued

48 Summary
$U_1$ = 0.197 Relax + 0.979 Mot
$V_1$ = 0.504 Read + 0.900 Lang + 0.565 Math
$\phi_1$ = 0.592
$U_2$ = 0.980 Relax + 0.203 Mot
$V_2$ = 0.391 Math - 0.361 Read - 0.354 Lang
$\phi_2$ = 0.159

