# V. Multivariate Linear Regression


## A. The Basic Principle

We consider the multivariate extension of multiple linear regression: modeling the relationship between m responses $Y_1, \ldots, Y_m$ and a single set of r predictor variables $z_1, \ldots, z_r$. Each of the m responses is assumed to follow its own regression model, i.e.,

$$\begin{aligned}
Y_1 &= \beta_{01} + \beta_{11} z_1 + \beta_{21} z_2 + \cdots + \beta_{r1} z_r + \varepsilon_1 \\
Y_2 &= \beta_{02} + \beta_{12} z_1 + \beta_{22} z_2 + \cdots + \beta_{r2} z_r + \varepsilon_2 \\
&\;\;\vdots \\
Y_m &= \beta_{0m} + \beta_{1m} z_1 + \beta_{2m} z_2 + \cdots + \beta_{rm} z_r + \varepsilon_m
\end{aligned}$$

Conceptually, we can let $\mathbf{z}_j' = [z_{j0}, z_{j1}, \ldots, z_{jr}]$ denote the values of the predictor variables for the j-th trial, and let $\mathbf{Y}_j' = [Y_{j1}, \ldots, Y_{jm}]$ and $\boldsymbol{\varepsilon}_j' = [\varepsilon_{j1}, \ldots, \varepsilon_{jm}]$ be the responses and errors for the j-th trial. Thus we have an $n \times (r+1)$ design matrix

$$\mathbf{Z} = \begin{bmatrix} z_{10} & z_{11} & \cdots & z_{1r} \\ z_{20} & z_{21} & \cdots & z_{2r} \\ \vdots & \vdots & & \vdots \\ z_{n0} & z_{n1} & \cdots & z_{nr} \end{bmatrix}$$

If we now set

$$\mathbf{Y}_{n \times m} = \left[\mathbf{Y}_{(1)} \mid \mathbf{Y}_{(2)} \mid \cdots \mid \mathbf{Y}_{(m)}\right], \qquad \boldsymbol{\beta}_{(r+1) \times m} = \left[\boldsymbol{\beta}_{(1)} \mid \boldsymbol{\beta}_{(2)} \mid \cdots \mid \boldsymbol{\beta}_{(m)}\right], \qquad \boldsymbol{\varepsilon}_{n \times m} = \left[\boldsymbol{\varepsilon}_{(1)} \mid \boldsymbol{\varepsilon}_{(2)} \mid \cdots \mid \boldsymbol{\varepsilon}_{(m)}\right]$$

then the multivariate linear regression model is

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad E\big(\boldsymbol{\varepsilon}_{(i)}\big) = \mathbf{0}, \qquad \mathrm{Cov}\big(\boldsymbol{\varepsilon}_{(i)}, \boldsymbol{\varepsilon}_{(k)}\big) = \sigma_{ik}\mathbf{I}, \qquad i, k = 1, \ldots, m$$

Note also that the m observed responses on the j-th trial have covariance matrix $\boldsymbol{\Sigma} = \{\sigma_{ik}\}$.

The ordinary least squares estimates $\hat{\boldsymbol{\beta}}$ are found in a manner analogous to the univariate case: we begin by taking $\hat{\boldsymbol{\beta}}_{(i)} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}_{(i)}$ for each response, and collecting the univariate least squares estimates yields

$$\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)}\right] = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

Now for any choice of parameters $\mathbf{B} = \left[\mathbf{b}_{(1)} \mid \cdots \mid \mathbf{b}_{(m)}\right]$, the resulting matrix of errors is $\mathbf{Y} - \mathbf{Z}\mathbf{B}$.

The resulting Error Sums of Squares and Crossproducts matrix is

$$(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})$$

We can show that the selection $\mathbf{b}_{(i)} = \hat{\boldsymbol{\beta}}_{(i)}$ minimizes the i-th diagonal sum of squares $\big(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\big)'\big(\mathbf{Y}_{(i)} - \mathbf{Z}\mathbf{b}_{(i)}\big)$, i.e., both the trace and the generalized variance (determinant) of $(\mathbf{Y} - \mathbf{Z}\mathbf{B})'(\mathbf{Y} - \mathbf{Z}\mathbf{B})$ are minimized at $\mathbf{B} = \hat{\boldsymbol{\beta}}$.

So we have the matrix of predicted values

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y}$$

and a resulting matrix of residuals

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y}$$

Note that the orthogonality conditions among residuals, predicted values, and columns of the design matrix which hold in the univariate case are also true in the multivariate case because

$$\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{Z}'\left[\mathbf{I} - \mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\right]\mathbf{Y} = \left[\mathbf{Z}' - \mathbf{Z}'\right]\mathbf{Y} = \mathbf{0}$$

… which means the residuals are perpendicular to the columns of the design matrix and to the predicted values ($\hat{\mathbf{Y}}'\hat{\boldsymbol{\varepsilon}} = \hat{\boldsymbol{\beta}}'\mathbf{Z}'\hat{\boldsymbol{\varepsilon}} = \mathbf{0}$). Furthermore, because $\mathbf{Y} = \hat{\mathbf{Y}} + \hat{\boldsymbol{\varepsilon}}$, we have the decomposition

$$\underbrace{\mathbf{Y}'\mathbf{Y}}_{\substack{\text{total sums of squares} \\ \text{and crossproducts}}} = \underbrace{\hat{\mathbf{Y}}'\hat{\mathbf{Y}}}_{\substack{\text{predicted sums of squares} \\ \text{and crossproducts}}} + \underbrace{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}_{\substack{\text{residual (error) sums of} \\ \text{squares and crossproducts}}}$$
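The least squares fit and the orthogonality conditions above can be checked numerically. Below is a minimal NumPy sketch (assuming NumPy is available) using the six-observation data set from the example that follows; `Z` is the design matrix and `Y` the n × m response matrix.

```python
import numpy as np

# Design matrix Z (n x (r+1)) and response matrix Y (n x m), taken from
# the six-observation example below (z1 = palatability, z2 = texture,
# y1 = overall quality, y2 = purchase intent).
Z = np.array([[1, 65, 71], [1, 72, 77], [1, 77, 73],
              [1, 68, 78], [1, 81, 76], [1, 73, 87]], dtype=float)
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

# Multivariate least squares: beta_hat = (Z'Z)^{-1} Z'Y.  Each column of
# beta_hat is exactly the univariate OLS fit for the corresponding response.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)

Y_hat = Z @ beta_hat   # matrix of predicted values
resid = Y - Y_hat      # matrix of residuals

# Orthogonality: residuals are perpendicular to the columns of Z and to
# the predicted values, so both products below are (numerically) zero.
print(np.allclose(Z.T @ resid, 0.0, atol=1e-8))     # True
print(np.allclose(Y_hat.T @ resid, 0.0, atol=1e-8)) # True
```

Because the estimator is computed column by column, fitting the m responses jointly gives exactly the same coefficients as m separate univariate regressions; the multivariate structure matters for inference, not for point estimation.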

Example – suppose we had the following six sample observations on two independent variables (palatability and texture) and two dependent variables (overall quality and purchase intent):

| Palatability (z1) | Texture (z2) | Overall Quality (y1) | Purchase Intent (y2) |
|---|---|---|---|
| 65 | 71 | 63 | 67 |
| 72 | 77 | 70 | 70 |
| 77 | 73 | 72 | 70 |
| 68 | 78 | 75 | 72 |
| 81 | 76 | 89 | 88 |
| 73 | 87 | 76 | 77 |

Use these data to estimate the multivariate linear regression model in which palatability and texture are the independent variables and purchase intent and overall quality are the dependent variables.

We wish to estimate $Y_1 = \beta_{01} + \beta_{11} z_1 + \beta_{21} z_2$ and $Y_2 = \beta_{02} + \beta_{12} z_1 + \beta_{22} z_2$ jointly. The design matrix is

$$\mathbf{Z} = \begin{bmatrix} 1 & 65 & 71 \\ 1 & 72 & 77 \\ 1 & 77 & 73 \\ 1 & 68 & 78 \\ 1 & 81 & 76 \\ 1 & 73 & 87 \end{bmatrix}$$

so

$$\mathbf{Z}'\mathbf{Z} = \begin{bmatrix} 6 & 436 & 462 \\ 436 & 31852 & 33591 \\ 462 & 33591 & 35728 \end{bmatrix} \qquad \text{and} \qquad \mathbf{Z}'\mathbf{Y} = \begin{bmatrix} 445 & 444 \\ 32536 & 32430 \\ 34345 & 34260 \end{bmatrix}$$

and so

$$\hat{\boldsymbol{\beta}} = (\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{Y} = \begin{bmatrix} -37.5012 & -21.4323 \\ 1.1346 & 0.9409 \\ 0.3795 & 0.3514 \end{bmatrix}$$

(these coefficient estimates agree with the SAS parameter estimates reported at the end of these notes).

This gives us the estimated values matrix

$$\hat{\mathbf{Y}} = \mathbf{Z}\hat{\boldsymbol{\beta}} = \begin{bmatrix} 63.19 & 64.68 \\ 73.41 & 73.37 \\ 77.57 & 76.67 \\ 69.25 & 69.96 \\ 83.24 & 81.49 \\ 78.34 & 77.83 \end{bmatrix}$$

and the residuals matrix

$$\hat{\boldsymbol{\varepsilon}} = \mathbf{Y} - \hat{\mathbf{Y}} = \begin{bmatrix} -0.19 & 2.32 \\ -3.41 & -3.37 \\ -5.57 & -6.67 \\ 5.75 & 2.04 \\ 5.76 & 6.51 \\ -2.34 & -0.83 \end{bmatrix}$$

Note that each column sums to zero!
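As a numerical check, the worked example above can be reproduced with NumPy (assuming it is available); the estimates match the hand computation and the SAS parameter estimates reported later in these notes.

```python
import numpy as np

# Six observations: columns z1 (palatability), z2 (texture) and
# y1 (overall quality), y2 (purchase intent).
Z = np.array([[1, 65, 71], [1, 72, 77], [1, 77, 73],
              [1, 68, 78], [1, 81, 76], [1, 73, 87]], dtype=float)
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
# rounds to [[-37.5012, -21.4323], [1.1346, 0.9409], [0.3795, 0.3514]]
print(np.round(beta_hat, 4))

resid = Y - Z @ beta_hat
# Each residual column sums to zero because the model has an intercept.
print(np.allclose(resid.sum(axis=0), 0.0))  # True
```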

## B. Inference in Multivariate Regression

The least squares estimators $\hat{\boldsymbol{\beta}} = \left[\hat{\boldsymbol{\beta}}_{(1)} \mid \hat{\boldsymbol{\beta}}_{(2)} \mid \cdots \mid \hat{\boldsymbol{\beta}}_{(m)}\right]$ of the multivariate regression model have the following properties if the model is of full rank, i.e., $\mathrm{rank}(\mathbf{Z}) = r + 1 < n$:

$$E\big(\hat{\boldsymbol{\beta}}_{(i)}\big) = \boldsymbol{\beta}_{(i)}, \qquad \mathrm{Cov}\big(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\big) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}, \qquad i, k = 1, \ldots, m$$

$$E(\hat{\boldsymbol{\varepsilon}}) = \mathbf{0}, \qquad E\left(\frac{\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}}}{n - r - 1}\right) = \boldsymbol{\Sigma}$$

Note that $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\varepsilon}}$ are also uncorrelated.

This means that, for any observation $\mathbf{z}_0$, $\hat{\boldsymbol{\beta}}'\mathbf{z}_0$ is an unbiased estimator, i.e., $E\big(\hat{\boldsymbol{\beta}}'\mathbf{z}_0\big) = \boldsymbol{\beta}'\mathbf{z}_0$. We can also determine from these properties that the estimation errors have covariances

$$E\left[\mathbf{z}_0'\big(\hat{\boldsymbol{\beta}}_{(i)} - \boldsymbol{\beta}_{(i)}\big)\big(\hat{\boldsymbol{\beta}}_{(k)} - \boldsymbol{\beta}_{(k)}\big)'\mathbf{z}_0\right] = \sigma_{ik}\,\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0$$

Furthermore, we can easily ascertain that $E\big(\mathbf{Y}_0 - \hat{\mathbf{Y}}_0\big) = \mathbf{0}$, i.e., the forecasted vector $\hat{\mathbf{Y}}_0 = \hat{\boldsymbol{\beta}}'\mathbf{z}_0$ associated with the values of the predictor variables $\mathbf{z}_0$ is an unbiased estimator of $\mathbf{Y}_0$. The forecast errors have covariance

$$E\left[\big(Y_{0i} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)}\big)\big(Y_{0k} - \mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(k)}\big)\right] = \sigma_{ik}\left(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\right)$$

Thus, for the multivariate regression model with full $\mathrm{rank}(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\boldsymbol{\varepsilon}$, $\hat{\boldsymbol{\beta}}$ is the maximum likelihood estimator of $\boldsymbol{\beta}$, and $\hat{\boldsymbol{\beta}}$ has a normal distribution with

$$E\big(\hat{\boldsymbol{\beta}}\big) = \boldsymbol{\beta}, \qquad \mathrm{Cov}\big(\hat{\boldsymbol{\beta}}_{(i)}, \hat{\boldsymbol{\beta}}_{(k)}\big) = \sigma_{ik}(\mathbf{Z}'\mathbf{Z})^{-1}$$

where the elements of $\boldsymbol{\Sigma}$ are $\sigma_{ik}$.

Also, the maximum likelihood estimator of $\boldsymbol{\beta}$ is independent of the maximum likelihood estimator of the positive definite matrix $\boldsymbol{\Sigma}$, which is given by

$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\hat{\boldsymbol{\varepsilon}}'\hat{\boldsymbol{\varepsilon}} = \frac{1}{n}\big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big)'\big(\mathbf{Y} - \mathbf{Z}\hat{\boldsymbol{\beta}}\big)$$

and $n\hat{\boldsymbol{\Sigma}}$ is distributed as a Wishart random matrix $W_{m,\,n-r-1}(\boldsymbol{\Sigma})$ with $n - r - 1$ degrees of freedom. All of this provides additional support for using the least squares estimate: when the errors are normally distributed, $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\Sigma}}$ are the maximum likelihood estimators of $\boldsymbol{\beta}$ and $\boldsymbol{\Sigma}$.
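For the example data, $\hat{\boldsymbol{\Sigma}}$ can be computed directly; the NumPy sketch below (NumPy assumed) recovers $n\hat{\boldsymbol{\Sigma}}$, which matches the error SSCP matrix E in the SAS output later in these notes.

```python
import numpy as np

n, r = 6, 2
Z = np.array([[1, 65, 71], [1, 72, 77], [1, 77, 73],
              [1, 68, 78], [1, 81, 76], [1, 73, 87]], dtype=float)
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
resid = Y - Z @ beta_hat

E = resid.T @ resid           # n * Sigma_hat, the error SSCP matrix
Sigma_mle = E / n             # maximum likelihood estimator (biased)
Sigma_unb = E / (n - r - 1)   # unbiased estimator from the properties above

# rounds to [[114.313, 99.335], [99.335, 108.509]]
print(np.round(E, 3))
```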

These results can be used to develop likelihood ratio tests for the multivariate regression parameters. The hypothesis that the responses do not depend on the predictor variables $z_{q+1}, z_{q+2}, \ldots, z_r$ is $H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$, where we partition

$$\boldsymbol{\beta}_{(r+1) \times m} = \begin{bmatrix} \boldsymbol{\beta}_{(1)} \\ \boldsymbol{\beta}_{(2)} \end{bmatrix} \quad \begin{matrix} (q+1) \times m \\ (r-q) \times m \end{matrix}$$

If we partition $\mathbf{Z}$ in a similar manner, $\mathbf{Z} = [\mathbf{Z}_1 \mid \mathbf{Z}_2]$, where $\mathbf{Z}_1$ is $n \times (q+1)$ and $\mathbf{Z}_2$ is $n \times (r-q)$,

we can write the general model as

$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon} = \mathbf{Z}_1\boldsymbol{\beta}_{(1)} + \mathbf{Z}_2\boldsymbol{\beta}_{(2)} + \boldsymbol{\varepsilon}$$

The extra sums of squares and crossproducts associated with $\boldsymbol{\beta}_{(2)}$ are

$$n\big(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\big)$$

where

$$\hat{\boldsymbol{\beta}}_{(1)} = (\mathbf{Z}_1'\mathbf{Z}_1)^{-1}\mathbf{Z}_1'\mathbf{Y} \qquad \text{and} \qquad \hat{\boldsymbol{\Sigma}}_1 = \frac{1}{n}\big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big)'\big(\mathbf{Y} - \mathbf{Z}_1\hat{\boldsymbol{\beta}}_{(1)}\big)$$

The likelihood ratio for the test of the hypothesis $H_0: \boldsymbol{\beta}_{(2)} = \mathbf{0}$ is given by the ratio of generalized variances

$$\Lambda = \frac{\max_{\boldsymbol{\beta}_{(1)},\, \boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}_{(1)}, \boldsymbol{\Sigma}\big)}{\max_{\boldsymbol{\beta},\, \boldsymbol{\Sigma}} L\big(\boldsymbol{\beta}, \boldsymbol{\Sigma}\big)} = \left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right)^{n/2}$$

which is often converted to Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}$$

Finally, for the multivariate regression model with full $\mathrm{rank}(\mathbf{Z}) = r + 1$, $n \geq r + 1 + m$, and normally distributed errors $\boldsymbol{\varepsilon}$, if the null hypothesis is true (so that $n\big(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\big) \sim W_{m,\,r-q}(\boldsymbol{\Sigma})$), then

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\left(\frac{|\hat{\boldsymbol{\Sigma}}|}{|\hat{\boldsymbol{\Sigma}}_1|}\right) \;\approx\; \chi^2_{m(r-q)}$$

when $n - r$ and $n - m$ are both large.

If we again refer to the Error Sum of Squares and Crossproducts as $\mathbf{E} = n\hat{\boldsymbol{\Sigma}}$ and the Hypothesis Sum of Squares and Crossproducts as $\mathbf{H} = n\big(\hat{\boldsymbol{\Sigma}}_1 - \hat{\boldsymbol{\Sigma}}\big)$, then we can define Wilks' lambda as

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \prod_{i=1}^{s} \frac{1}{1 + \eta_i}$$

where $\eta_1 \geq \eta_2 \geq \cdots \geq \eta_s$ are the ordered eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$ and $s = \min(m, r - q)$.

There are other similar tests (as we have seen in our discussion of MANOVA):

- Pillai's Trace: $\mathrm{tr}\big[\mathbf{H}(\mathbf{H} + \mathbf{E})^{-1}\big]$
- Hotelling-Lawley Trace: $\mathrm{tr}\big[\mathbf{H}\mathbf{E}^{-1}\big]$
- Roy's Greatest Root: the largest eigenvalue $\eta_1$ of $\mathbf{H}\mathbf{E}^{-1}$ (some texts instead report $\eta_1/(1 + \eta_1)$)

Each of these statistics is an alternative to Wilks' lambda, and they perform in a very similar manner (particularly for large sample sizes).
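All four statistics can be computed from the eigenvalues of $\mathbf{H}\mathbf{E}^{-1}$. The sketch below (NumPy assumed) uses the H and E matrices for the no-overall-z1-effect test, taken from the SAS output later in these notes.

```python
import numpy as np

# H and E for the "no overall z1 effect" test, from the SAS output below.
H = np.array([[214.96186763, 178.26225891],
              [178.26225891, 147.82823253]])
E = np.array([[114.31302415, 99.335143683],
              [99.335143683, 108.5094298]])

# Ordered eigenvalues eta_1 >= ... >= eta_s of H E^{-1}.
eta = np.sort(np.linalg.eigvals(H @ np.linalg.inv(E)).real)[::-1]

wilks = np.linalg.det(E) / np.linalg.det(E + H)  # Wilks' lambda
pillai = np.trace(H @ np.linalg.inv(H + E))      # Pillai's trace
hotelling_lawley = eta.sum()                     # tr(H E^{-1})
roy = eta[0]                                     # largest root, as SAS reports it

print(round(wilks, 4), round(pillai, 4),
      round(hotelling_lawley, 4), round(roy, 4))
# 0.3453 0.6547 1.8957 1.8957
```

These values agree with the MANOVA test criteria printed by PROC GLM for the z1 effect.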

Example – for our previous data (the six sample observations on two independent variables, palatability and texture, and two dependent variables, purchase intent and overall quality), test the hypotheses that i) palatability has no joint relationship with purchase intent and overall quality, and ii) texture has no joint relationship with purchase intent and overall quality.

We first test the hypothesis that palatability has no joint relationship with purchase intent and overall quality, i.e., $H_0: \beta_{11} = \beta_{12} = 0$. The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances; for ease of computation, we'll use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$

The error sum of squares and crossproducts matrix is

$$\mathbf{E} = \begin{bmatrix} 114.313 & 99.335 \\ 99.335 & 108.509 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 214.962 & 178.262 \\ 178.262 & 147.828 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{2536.6}{7345.3} = 0.3453$$

The transformation to a chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\Lambda^{2/n} = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.3453) = 2.127$$

At $\alpha = 0.01$ and $m(r - q) = 2(2 - 1) = 2$ degrees of freedom, the critical value is 9.210351, so we clearly fail to reject the null hypothesis. The approximate p-value of this chi-square test is about 0.345; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).
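The chi-square approximation above can be reproduced directly from E and H (NumPy assumed); with n = 6, as noted, the approximation is very rough.

```python
import math
import numpy as np

n, r, m, q = 6, 2, 2, 1   # six trials, r = 2 predictors, m = 2 responses,
                          # q = 1 predictor retained under H0

E = np.array([[114.31302415, 99.335143683],
              [99.335143683, 108.5094298]])
H = np.array([[214.96186763, 178.26225891],
              [178.26225891, 147.82823253]])

wilks = np.linalg.det(E) / np.linalg.det(E + H)

# -[n - r - 1 - (m - r + q + 1)/2] ln(Lambda) ~ chi-square with m(r-q) df.
chi2_stat = -(n - r - 1 - 0.5 * (m - r + q + 1)) * math.log(wilks)
df = m * (r - q)

# Lambda is about 0.345; the statistic is about 2.13, far below the
# alpha = 0.01 critical value of 9.21 for 2 df, so we fail to reject H0.
print(round(wilks, 4), round(chi2_stat, 3), df)
```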

We next test the hypothesis that texture has no joint relationship with purchase intent and overall quality, i.e., $H_0: \beta_{21} = \beta_{22} = 0$. The likelihood ratio for the test of this hypothesis is given by the ratio of generalized variances; for ease of computation, we'll again use the Wilks' lambda statistic

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|}$$

The error sum of squares and crossproducts matrix is again

$$\mathbf{E} = \begin{bmatrix} 114.313 & 99.335 \\ 99.335 & 108.509 \end{bmatrix}$$

and the hypothesis sum of squares and crossproducts matrix for this null hypothesis is

$$\mathbf{H} = \begin{bmatrix} 21.872 & 20.255 \\ 20.255 & 18.758 \end{bmatrix}$$

so the calculated value of the Wilks' lambda statistic is

$$\Lambda^{2/n} = \frac{|\mathbf{E}|}{|\mathbf{E} + \mathbf{H}|} = \frac{2536.6}{3030.1} = 0.8371$$

The transformation to a chi-square distributed statistic (which is actually valid only when $n - r$ and $n - m$ are both large) is

$$-\left[n - r - 1 - \frac{1}{2}(m - r + q + 1)\right]\ln\Lambda^{2/n} = -\left[6 - 2 - 1 - \frac{1}{2}(2 - 2 + 1 + 1)\right]\ln(0.8371) = 0.356$$

At $\alpha = 0.01$ and $m(r - q) = 2(2 - 1) = 2$ degrees of freedom, the critical value is 9.210351, so again we clearly fail to reject the null hypothesis. The approximate p-value of this chi-square test is about 0.837; note that this is an extremely gross approximation (since $n - r = 4$ and $n - m = 4$).

SAS code for a Multivariate Linear Regression Analysis:

    OPTIONS LINESIZE=72 NODATE PAGENO=1;
    DATA stuff;
      INPUT z1 z2 y1 y2;
      LABEL z1='Palatability Rating'
            z2='Texture Rating'
            y1='Overall Quality Rating'
            y2='Purchase Intent';
    CARDS;
    65 71 63 67
    72 77 70 70
    77 73 72 70
    68 78 75 72
    81 76 89 88
    73 87 76 77
    ;
    PROC GLM DATA=stuff;
      MODEL y1 y2 = z1 z2;
      MANOVA H=z1 z2 / PRINTE PRINTH;
      TITLE4 'Using PROC GLM for Multivariate Linear Regression';
    RUN;

SAS output for a Multivariate Linear Regression Analysis:

    Dependent Variable: y1   Overall Quality Rating

                                      Sum of
    Source            DF             Squares     Mean Square    F Value    Pr > F
    Model              2         256.5203092     128.2601546       3.37    0.1711
    Error              3         114.3130241      38.1043414
    Corrected Total    5         370.8333333

    R-Square     Coeff Var      Root MSE       y1 Mean
    0.691740      8.322973      6.172871      74.16667

    Source            DF           Type I SS     Mean Square    F Value    Pr > F
    z1                 1         234.6482940     234.6482940       6.16    0.0891
    z2                 1          21.8720152      21.8720152       0.57    0.5037

    Source            DF         Type III SS     Mean Square    F Value    Pr > F
    z1                 1         214.9618676     214.9618676       5.64    0.0980
    z2                 1          21.8720152      21.8720152       0.57    0.5037

                                      Standard
    Parameter         Estimate           Error     t Value    Pr > |t|
    Intercept     -37.50120546     48.82448511       -0.77      0.4984
    z1              1.13458373      0.47768661        2.38      0.0980
    z2              0.37949941      0.50090335        0.76      0.5037

SAS output for a Multivariate Linear Regression Analysis:

    Dependent Variable: y2   Purchase Intent

                                      Sum of
    Source            DF             Squares     Mean Square    F Value    Pr > F
    Model              2         181.4905702      90.7452851       2.51    0.2289
    Error              3         108.5094298      36.1698099
    Corrected Total    5         290.0000000

    R-Square     Coeff Var      Root MSE       y2 Mean
    0.625830      8.127208      6.014134      74.00000

    Source            DF           Type I SS     Mean Square    F Value    Pr > F
    z1                 1         162.7322835     162.7322835       4.50    0.1241
    z2                 1          18.7582867      18.7582867       0.52    0.5235

    Source            DF         Type III SS     Mean Square    F Value    Pr > F
    z1                 1         147.8282325     147.8282325       4.09    0.1364
    z2                 1          18.7582867      18.7582867       0.52    0.5235

                                      Standard
    Parameter         Estimate           Error     t Value    Pr > |t|
    Intercept     -21.43229335     47.56894895       -0.45      0.6829
    z1              0.94088063      0.46540276        2.02      0.1364
    z2              0.35144979      0.48802247        0.72      0.5235

SAS output for a Multivariate Linear Regression Analysis:

    The GLM Procedure
    Multivariate Analysis of Variance

    E = Error SSCP Matrix

                      y1                y2
    y1      114.31302415      99.335143683
    y2      99.335143683       108.5094298

    Partial Correlation Coefficients from the
    Error SSCP Matrix / Prob > |r|

    DF = 3          y1          y2
    y1        1.000000    0.891911
                            0.1081
    y2        0.891911    1.000000
                0.1081

SAS output for a Multivariate Linear Regression Analysis:

    The GLM Procedure
    Multivariate Analysis of Variance

    H = Type III SSCP Matrix for z1

                      y1                y2
    y1      214.96186763      178.26225891
    y2      178.26225891      147.82823253

    Characteristic Roots and Vectors of: E Inverse * H, where
    H = Type III SSCP Matrix for z1
    E = Error SSCP Matrix

    Characteristic              Characteristic Vector  V'EV=1
              Root   Percent              y1              y2
        1.89573606    100.00      0.10970859     -0.01905206
        0.00000000      0.00     -0.17533407      0.21143084

    MANOVA Test Criteria and Exact F Statistics
    for the Hypothesis of No Overall z1 Effect
    H = Type III SSCP Matrix for z1
    E = Error SSCP Matrix

    S=1  M=0  N=0

    Statistic                     Value    F Value    Num DF    Den DF    Pr > F
    Wilks' Lambda            0.34533534       1.90         2         2    0.3453
    Pillai's Trace           0.65466466       1.90         2         2    0.3453
    Hotelling-Lawley Trace   1.89573606       1.90         2         2    0.3453
    Roy's Greatest Root      1.89573606       1.90         2         2    0.3453

SAS output for a Multivariate Linear Regression Analysis:

    The GLM Procedure
    Multivariate Analysis of Variance

    H = Type III SSCP Matrix for z2

                      y1                y2
    y1      21.872015222      20.255407498
    y2      20.255407498      18.758286731

    Characteristic Roots and Vectors of: E Inverse * H, where
    H = Type III SSCP Matrix for z2
    E = Error SSCP Matrix

    Characteristic              Characteristic Vector  V'EV=1
              Root   Percent              y1              y2
        0.19454961    100.00      0.06903935      0.02729059
        0.00000000      0.00     -0.19496558      0.21052601

    MANOVA Test Criteria and Exact F Statistics
    for the Hypothesis of No Overall z2 Effect
    H = Type III SSCP Matrix for z2
    E = Error SSCP Matrix

    S=1  M=0  N=0

    Statistic                     Value    F Value    Num DF    Den DF    Pr > F
    Wilks' Lambda            0.83713560       0.19         2         2    0.8371
    Pillai's Trace           0.16286440       0.19         2         2    0.8371
    Hotelling-Lawley Trace   0.19454961       0.19         2         2    0.8371
    Roy's Greatest Root      0.19454961       0.19         2         2    0.8371

We can also build confidence intervals for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. If the model is of full rank and has normal errors, then

$$\hat{\boldsymbol{\beta}}'\mathbf{z}_0 \sim N_m\big(\boldsymbol{\beta}'\mathbf{z}_0,\; \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\,\boldsymbol{\Sigma}\big) \qquad \text{independent of} \qquad n\hat{\boldsymbol{\Sigma}} \sim W_{m,\,n-r-1}(\boldsymbol{\Sigma})$$

so

$$T^2 = \left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)'\left(\frac{n}{n - r - 1}\hat{\boldsymbol{\Sigma}}\right)^{-1}\left(\frac{\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0}{\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right) \sim \frac{m(n - r - 1)}{n - r - 1 - m}F_{m,\,n-r-1-m}$$

Thus the 100(1 – α)% confidence region for the predicted mean value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$, i.e., for $\boldsymbol{\beta}'\mathbf{z}_0$, is given by

$$\big(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\big)'\left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}\big(\hat{\boldsymbol{\beta}}'\mathbf{z}_0 - \boldsymbol{\beta}'\mathbf{z}_0\big) \leq \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\left[\frac{m(n-r-1)}{n-r-1-m}\right]F_{m,\,n-r-1-m}(\alpha)$$

and the 100(1 – α)% simultaneous confidence intervals for the mean values of the $Y_i$ associated with $\mathbf{z}_0$, i.e., for $\mathbf{z}_0'\boldsymbol{\beta}_{(i)}$, are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \pm \sqrt{\frac{m(n-r-1)}{n-r-1-m}F_{m,\,n-r-1-m}(\alpha)}\;\sqrt{\mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\left(\frac{n}{n-r-1}\hat{\sigma}_{ii}\right)}, \qquad i = 1, \ldots, m$$
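As an illustration with the example data, the simultaneous intervals can be evaluated at a chosen point, say $\mathbf{z}_0' = [1, 72, 77]$ (a hypothetical new trial with palatability 72 and texture 77). NumPy is assumed, and the critical value $F_{2,1}(0.05) = 199.5$ is taken from an F table; with only n = 6 observations the intervals are extremely wide.

```python
import numpy as np

n, r, m = 6, 2, 2
Z = np.array([[1, 65, 71], [1, 72, 77], [1, 77, 73],
              [1, 68, 78], [1, 81, 76], [1, 73, 87]], dtype=float)
Y = np.array([[63, 67], [70, 70], [72, 70],
              [75, 72], [89, 88], [76, 77]], dtype=float)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
resid = Y - Z @ beta_hat
Sigma_hat = (resid.T @ resid) / n            # MLE of Sigma

z0 = np.array([1.0, 72.0, 77.0])             # hypothetical new point
center = beta_hat.T @ z0                     # predicted mean responses at z0

F_crit = 199.5   # F_{m, n-r-1-m}(0.05) = F_{2,1}(0.05), from an F table
scale = m * (n - r - 1) / (n - r - 1 - m) * F_crit
lever = z0 @ np.linalg.solve(Z.T @ Z, z0)    # z0'(Z'Z)^{-1} z0

for i in range(m):
    half = np.sqrt(scale) * np.sqrt(lever * (n / (n - r - 1)) * Sigma_hat[i, i])
    print(f"Y{i+1}: {center[i]:.2f} +/- {half:.2f}")
```

For the prediction intervals below, the only change is replacing `lever` with `1 + lever` inside the second square root.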

Finally, we can build prediction intervals for the predicted value of $\mathbf{Y}_0$ associated with $\mathbf{z}_0$. Here the prediction error is

$$\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0 = \boldsymbol{\varepsilon}_0 - \big(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}\big)'\mathbf{z}_0 \sim N_m\Big(\mathbf{0},\; \big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)\boldsymbol{\Sigma}\Big)$$

independent of $n\hat{\boldsymbol{\Sigma}} \sim W_{m,\,n-r-1}(\boldsymbol{\Sigma})$, so

$$T^2 = \left(\frac{\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0}{\sqrt{1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right)'\left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}\left(\frac{\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0}{\sqrt{1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0}}\right) \sim \frac{m(n-r-1)}{n-r-1-m}F_{m,\,n-r-1-m}$$

Thus the 100(1 – α)% prediction region for $\mathbf{Y}_0$ associated with $\mathbf{z}_0$ is given by

$$\big(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\big)'\left(\frac{n}{n-r-1}\hat{\boldsymbol{\Sigma}}\right)^{-1}\big(\mathbf{Y}_0 - \hat{\boldsymbol{\beta}}'\mathbf{z}_0\big) \leq \big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)\left[\frac{m(n-r-1)}{n-r-1-m}\right]F_{m,\,n-r-1-m}(\alpha)$$

and the 100(1 – α)% simultaneous prediction intervals associated with $\mathbf{z}_0$ are

$$\mathbf{z}_0'\hat{\boldsymbol{\beta}}_{(i)} \pm \sqrt{\frac{m(n-r-1)}{n-r-1-m}F_{m,\,n-r-1-m}(\alpha)}\;\sqrt{\big(1 + \mathbf{z}_0'(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{z}_0\big)\left(\frac{n}{n-r-1}\hat{\sigma}_{ii}\right)}, \qquad i = 1, \ldots, m$$
