# Analysis of multivariate transformations

Analysis of multivariate transformations

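Transformation of the response in regression. The normalized power transformation is

$$
z(\lambda) =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\lambda - 1}}, & \lambda \neq 0, \\[1ex]
\dot{y}\,\log y, & \lambda = 0,
\end{cases}
$$

where $\dot{y}$ is the geometric mean of the observations. The purpose is to find an estimate of $\lambda$ for which the errors in $z(\lambda)$ are approximately normally distributed with constant variance.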
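As a minimal sketch of this transformation, assuming the standard Box-Cox normalized form, $z(\lambda) = (y^{\lambda} - 1)/(\lambda \dot{y}^{\lambda-1})$ for $\lambda \neq 0$ and $\dot{y} \log y$ for $\lambda = 0$, the function below transforms a positive sample. The function names are illustrative, not taken from the slides.

```python
import math

def geometric_mean(y):
    """Geometric mean of strictly positive observations."""
    return math.exp(sum(math.log(v) for v in y) / len(y))

def normalized_power_transform(y, lam):
    """Normalized power (Box-Cox) transformation z(lambda) of a positive sample.

    Assumed form: (y**lam - 1) / (lam * gm**(lam - 1)) for lam != 0,
    and gm * log(y) for lam == 0, where gm is the geometric mean of y.
    """
    if any(v <= 0 for v in y):
        raise ValueError("all observations must be strictly positive")
    gm = geometric_mean(y)
    if lam == 0:
        return [gm * math.log(v) for v in y]
    scale = lam * gm ** (lam - 1)
    return [(v ** lam - 1) / scale for v in y]

y = [1.0, 2.0, 4.0]
z_identity = normalized_power_transform(y, 1.0)  # lambda = 1: just y - 1
z_log = normalized_power_transform(y, 0.0)       # lambda = 0: gm * log(y)
```

Dividing by $\dot{y}^{\lambda-1}$ keeps $z(\lambda)$ on a comparable scale for different values of $\lambda$, which is what allows residual sums of squares to be compared directly across transformations.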

Score test for transformation. The score test $T_{sc}(\lambda = \lambda_0)$ is the t-statistic on the constructed variable $w(\lambda_0)$.

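Multivariate transformations. In this case $y_i$ is a $v \times 1$ vector of responses at observation $i$, with $y_{ij}$ the observation on response $j$. The normalized transformation of $y_{ij}$ is given by

$$
z_{ij}(\lambda_j) =
\begin{cases}
\dfrac{y_{ij}^{\lambda_j} - 1}{\lambda_j\,\dot{y}_j^{\lambda_j - 1}}, & \lambda_j \neq 0, \\[1ex]
\dot{y}_j\,\log y_{ij}, & \lambda_j = 0,
\end{cases}
$$

where $\dot{y}_j$ is the geometric mean of the $j$th response.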

Multivariate transformations We assume a multivariate linear regression model of the form

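Multivariate transformations to normality. If the transformed observations $z_i$ are normally distributed with mean $\mu_i$ and covariance matrix $\Sigma$, the loglikelihood is given by

$$
l(\lambda) = -\frac{n}{2}\log\lvert 2\pi\Sigma\rvert - \frac{1}{2}\sum_{i=1}^{n}(z_i - \mu_i)^{T}\,\Sigma^{-1}(z_i - \mu_i).
$$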

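Multivariate transformations to normality. If the explanatory variables are the same for all responses, the maximum likelihood estimator of $\Sigma$ is given by

$$
\hat{\Sigma}(\lambda) = \frac{1}{n}\sum_{i=1}^{n} e_i(\lambda)\,e_i(\lambda)^{T},
$$

where $e_i(\lambda)$ is the $v \times 1$ vector of residuals for observation $i$ for some value of $\lambda$.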

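The profile loglikelihood (i.e. maximized over $\mu$ and $\Sigma$), with $\hat{\Sigma}(\lambda)$ the maximum likelihood estimator of $\Sigma$, is

$$
l_{\max}(\lambda) = -\frac{n}{2}\log\lvert 2\pi\hat{\Sigma}(\lambda)\rvert - \frac{nv}{2}.
$$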

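Multivariate likelihood ratio test. The multivariate generalization of $T_{SC}$ is given by

$$
T_{LR} = n \log\left\{ \frac{\lvert\hat{\Sigma}(\lambda_0)\rvert}{\lvert\hat{\Sigma}(\hat{\lambda})\rvert} \right\}.
$$

This statistic must be compared with a $\chi^2$ distribution with $v$ degrees of freedom.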
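The likelihood ratio statistic can be sketched in a few lines. The example below assumes a mean-only model (no explanatory variables, so the residuals are deviations from the column means), assumes the profile loglikelihood $-\tfrac{n}{2}\log\lvert 2\pi\hat{\Sigma}(\lambda)\rvert - \tfrac{nv}{2}$, and replaces the true maximum likelihood estimate of $\lambda$ with the best point on a coarse grid. The data and all function names are made up for illustration.

```python
import math
from itertools import product

def transform_columns(Y, lams):
    """Normalized power transformation applied column by column."""
    n = len(Y)
    cols = []
    for j, lam in enumerate(lams):
        col = [row[j] for row in Y]
        gm = math.exp(sum(math.log(x) for x in col) / n)
        if lam == 0:
            cols.append([gm * math.log(x) for x in col])
        else:
            cols.append([(x ** lam - 1) / (lam * gm ** (lam - 1)) for x in col])
    return cols  # v columns, each of length n

def ml_covariance(cols):
    """ML estimate of Sigma (divisor n), residuals taken about column means."""
    n = len(cols[0])
    res = []
    for c in cols:
        mean = sum(c) / n
        res.append([x - mean for x in c])
    return [[sum(a * b for a, b in zip(ri, rj)) / n for rj in res] for ri in res]

def determinant(A):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in A]
    size, det = len(A), 1.0
    for k in range(size):
        p = max(range(k, size), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0:
            return 0.0
        if p != k:
            A[k], A[p] = A[p], A[k]
            det = -det
        det *= A[k][k]
        for r in range(k + 1, size):
            f = A[r][k] / A[k][k]
            for c in range(k, size):
                A[r][c] -= f * A[k][c]
    return det

def profile_loglik(Y, lams):
    """Loglikelihood maximized over the mean and Sigma for fixed lambda."""
    n, v = len(Y), len(lams)
    S = ml_covariance(transform_columns(Y, lams))
    return -0.5 * n * (v * math.log(2 * math.pi) + math.log(determinant(S))) - 0.5 * n * v

def lr_statistic(Y, lam0, lam_hat):
    """T_LR = 2 {l(lam_hat) - l(lam0)} = n log(|S(lam0)| / |S(lam_hat)|)."""
    return 2.0 * (profile_loglik(Y, lam_hat) - profile_loglik(Y, lam0))

# Illustrative bivariate data (not from the slides).
Y = [[1.0, 2.0], [2.0, 1.5], [4.0, 6.0], [8.0, 5.0], [16.0, 20.0], [3.0, 9.0]]
GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)
lam0 = (1.0, 1.0)
# Assumption: grid maximizer used in place of the continuous MLE of lambda.
lam_best = max(product(GRID, repeat=2), key=lambda lam: profile_loglik(Y, lam))
t_lr = lr_statistic(Y, lam0, lam_best)
```

Because the maximization is restricted to a grid, `t_lr` can only understate the statistic computed at the true maximum likelihood estimate. A numerical optimizer over $\lambda$ would be the natural replacement.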

Swiss heads: monitoring the likelihood ratio test for the transformation, $H_0: \lambda = 1$. The last two units to enter (104 and 111) provide all the evidence for a transformation.

Boxplots of the 6 variables, with univariate outliers labelled

Swiss heads. The marginal distribution of $y_4$ has the two outliers (units 104 and 111). We want to test whether all the evidence for a transformation is due to $y_4$. We recalculate the likelihood ratio, but now testing whether $\lambda_4$ is equal to 1.

Forward plot of the likelihood ratio test for $H_0: \lambda_4 = 1$. The last two units to enter provide all the evidence for a transformation.

Mussels data. 82 observations on horse mussels from New Zealand, with five variables. Purpose: to see whether multivariate normality can be obtained by joint transformation of all 5 variables.

Mussels data: scatterplot matrix

Forward plot of the likelihood ratio test for $H_0: \lambda = 1$

Finding a multivariate transformation with the forward search. With just one variable to transform, it is easy to use the fan plot from the forward search to find satisfactory transformations and to identify influential observations. With $v$ variables there are $5^v$ combinations of the 5 standard values $\lambda = (-1, -0.5, 0, 0.5, 1)$.
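To make the size of the candidate set concrete, the short sketch below enumerates every vector whose components are drawn from the five standard values. The function name is illustrative.

```python
from itertools import product

STANDARD_VALUES = (-1.0, -0.5, 0.0, 0.5, 1.0)

def candidate_transformations(v):
    """All length-v vectors whose components are standard values of lambda."""
    return list(product(STANDARD_VALUES, repeat=v))

grid = candidate_transformations(5)
print(len(grid))  # 5 ** 5 = 3125 candidate vectors for v = 5
```

With $v = 5$ there are already 3125 candidates, which is why an exhaustive set of fan plots is impractical and a guided procedure is needed.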

Suggested procedure for finding multivariate transformations. Run the forward search (FS) through the untransformed data, ordering the observations at each subset size $m$ by the Mahalanobis distances (MD) calculated from the untransformed observations. Estimate $\lambda$ at each step. Select a preliminary set of transformation parameters.
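The ordering step of this procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: the starting subset is supplied by the caller (a real forward search chooses it robustly), the estimation of $\lambda$ at each step is omitted, and the data and function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0:
            raise ValueError("singular covariance matrix")
        M[k], M[p] = M[p], M[k]
        for r in range(n):
            if r != k:
                f = M[r][k] / M[k][k]
                for c in range(k, n + 1):
                    M[r][c] -= f * M[k][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def squared_md(Y, subset):
    """Squared Mahalanobis distances of all units, using the mean and
    covariance estimated from the units in `subset` only."""
    m, v = len(subset), len(Y[0])
    mean = [sum(Y[i][j] for i in subset) / m for j in range(v)]
    cov = [[sum((Y[i][a] - mean[a]) * (Y[i][b] - mean[b]) for i in subset) / (m - 1)
            for b in range(v)] for a in range(v)]
    out = []
    for row in Y:
        d = [row[j] - mean[j] for j in range(v)]
        x = solve(cov, d)
        out.append(sum(di * xi for di, xi in zip(d, x)))
    return out

def forward_search(Y, start):
    """Grow the subset one unit at a time; at each step the new subset is
    the m + 1 units with the smallest distances from the current fit."""
    subset = sorted(start)
    path = [subset]
    while len(subset) < len(Y):
        d2 = squared_md(Y, subset)
        order = sorted(range(len(Y)), key=lambda i: d2[i])
        subset = sorted(order[:len(subset) + 1])
        path.append(subset)
    return path

# Illustrative data: six well-behaved units and one gross outlier (unit 6).
Y = [[1.0, 2.0], [2.0, 1.5 - 0.5], [3.0, 4.0], [4.0, 3.0],
     [5.0, 6.0], [6.0, 5.0], [30.0, 1.0]]
path = forward_search(Y, start=[0, 1, 2])
```

In this toy example the outlying unit joins the subset only at the final step, which mirrors the way outliers and influential observations tend to enter at the end of the search.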

Monitoring of the MLE of $\lambda$. $H_0: \lambda = 1$; $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$

Monitoring of the MLE of $\lambda$ for $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$

Forward plot of the likelihood ratio test for $H_0: \lambda = (0.5, 0, 0.5, 0, 0)$

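Validation of the transformation. In univariate analysis the likelihood ratio test is

$$
T_{LR} = n \log\left\{ \frac{R(\lambda_0)}{R(\hat{\lambda})} \right\},
$$

where $R(\lambda)$ is the residual sum of squares of $z(\lambda)$. Asymptotically, the null distribution of $T_{LR}$ is chi-squared on one degree of freedom.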

Signed square root of $T_{LR}$. This test statistic is asymptotically $N(0, 1)$. Including the sign of the difference between $\hat{\lambda}$ and $\lambda_0$ gives an indication of the direction of any departure from the hypothesised value.
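A minimal sketch of the signed square root, assuming the sign is taken from the difference between the estimate and the hypothesised value, is shown below. The function name and arguments are illustrative.

```python
import math

def signed_sqrt_lr(t_lr, lam_hat, lam0):
    """Signed square root of a likelihood ratio statistic.

    Assumption: the sign is that of (lam_hat - lam0), so a negative value
    suggests the data favour a smaller lambda than the hypothesised one.
    """
    if t_lr < 0:
        raise ValueError("the likelihood ratio statistic cannot be negative")
    sign = (lam_hat > lam0) - (lam_hat < lam0)  # -1, 0 or +1
    return sign * math.sqrt(t_lr)

below = signed_sqrt_lr(4.0, 0.2, 1.0)   # estimate below the hypothesised value
above = signed_sqrt_lr(9.0, 1.5, 1.0)   # estimate above the hypothesised value
```

Since the statistic is referred to a standard normal distribution, values can be read against the usual normal percentage points, with the sign showing in which direction the hypothesised value is wrong.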

Multivariate version of the signed square root likelihood ratio test. We test just one component of $\lambda$ while all the others are kept at some specified value. We calculate a set of tests by varying each component of $\lambda$ about $\lambda_0$.

Example: mussels data, validation of $\lambda_0 = (0.5, 0, 0.5, 0, 0)$. Purpose: to validate in a multivariate way $\lambda_1 = 0.5$ for the first variable. To form the likelihood ratio test we need an estimator $\hat{\lambda} = (\lambda_1, \ldots, \lambda_v)$ found by maximization only over $\lambda_1$. The other parameters keep their values in $\lambda_0$ (in this example 0, 0.5, 0, 0). $\lambda_1$ takes the 5 standard values of $\lambda$: $(-1, -0.5, 0, 0.5, 1)$.

Example: validation of $\lambda_1$. We perform 5 independent forward searches with $\lambda_0 = (-1, 0, 0.5, 0, 0)$, $\lambda_0 = (-0.5, 0, 0.5, 0, 0)$, $\lambda_0 = (0, 0, 0.5, 0, 0)$, $\lambda_0 = (0.5, 0, 0.5, 0, 0)$ and $\lambda_0 = (1, 0, 0.5, 0, 0)$. For each search we monitor the signed square root likelihood ratio test.

Version for multivariate data of the signed square root likelihood ratio test. $\lambda_j$ is the parameter under test; $\lambda_S$ is one of the 5 standard values of $\lambda$; $\lambda_{0j}$ is the vector of parameter values in which $\lambda_j$ takes one of the 5 standard values $\lambda_S$ while the other parameters keep their values in $\lambda_0$. There is one plot for each $j$, $j = 1, \ldots, v$.

Mussels data: validation of $\lambda_0 = (0.5, 0, 0.5, 0, 0)$

Forward plot of the likelihood ratio test for $H_0: \lambda = (1/3, 1/3, 1/3, 0, 0)$

Mussels data: scatterplot matrix (transformed observations)

Monitoring MD before transforming

Monitoring MD after transforming

Minimum MD before and after transforming. The transformation has separated the outliers from the bulk of the data.

Gap before and after transforming

Conclusions. This was an example of our approach to finding a multivariate transformation in the presence of potentially influential observations and outliers. Procedure: start the search with untransformed data to suggest a transformation, and repeat the analysis until you find an acceptable transformation. In this example only 3 searches were necessary to find a transformation which is stable throughout the search, any changes being at the end.

Exercises

Exercise 1. The next slide gives two sets of bivariate data. Which of the two has to be transformed to achieve bivariate normality? Consider a forward search in which you monitor the likelihood ratio test for the hypothesis of no transformation. Describe the plot you would expect to get for each of the two sets of data.

Two sets of simulated bivariate data