ERCIM May 2001 Analysis of variance, general balance and large data sets Roger Payne Statistics Department, IACR-Rothamsted, Harpenden, Herts AL5 2JQ.





Email: roger.payne@bbsrc.ac.uk

3 General balance

General balance:
- is a very useful concept for small to medium-sized data sets;
- caters for several sources of variation;
- leads to very efficient algorithms and to clear output, test statistics, etc.

But what about large to very large data sets?

4 General balance

General balance fits a mixed model with several error (or block) terms. The total sum of squares is partitioned into components known as strata, one for each block term. Each stratum contains:
- the sums of squares for the treatment terms estimated between the units of that stratum;
- a residual representing the random variability of those units.
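As a standard illustration, not taken from the slides, consider a randomized complete block design. It has $b$ blocks, each containing the $t$ treatments once ($N = bt$), and $y_{jl}$ is the response to treatment $l$ in block $j$. There are three strata (grand mean, blocks, plots). Because treatments are orthogonal to blocks, they are estimated only in the plots stratum:

$$\sum_{j,l} y_{jl}^2 = \underbrace{N\bar y^2}_{\text{grand mean}} + \underbrace{t\sum_j(\bar y_{j\cdot}-\bar y)^2}_{\text{blocks stratum}} + \underbrace{\sum_{j,l}(y_{jl}-\bar y_{j\cdot})^2}_{\text{plots stratum}}$$

$$\sum_{j,l}(y_{jl}-\bar y_{j\cdot})^2 = \underbrace{b\sum_l(\bar y_{\cdot l}-\bar y)^2}_{\text{treatments}} + \underbrace{\sum_{j,l}(y_{jl}-\bar y_{j\cdot}-\bar y_{\cdot l}+\bar y)^2}_{\text{residual}}$$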

5 General balance: example

[Example figure not reproduced in this transcript]

6 General balance: properties

- block (error) terms are mutually orthogonal;
- treatment terms are mutually orthogonal;
- the contrasts of each treatment term all have equal efficiency factors in each of the strata where they are estimated.

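7 General balance: theory

Mixed model: $y = \sum_\alpha Z_\alpha \nu_\alpha + X\tau$.

Dispersion structure: $\mathrm{Var}(y) = V = \sum_\alpha \xi_\alpha \check S_\alpha$, where the $\check S_\alpha$ are known symmetric matrices with $\check S_\alpha \check S_\beta = \delta_{\alpha\beta}\check S_\alpha$ (i.e. orthogonal) and $\sum_\alpha \check S_\alpha = I$.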
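The simplest nested block structure makes these conditions concrete. This is an illustration, not from the slides. Take $b$ blocks of $k$ plots ($N = bk$), a block incidence matrix $Z_B$, and $\mathbf 1$ the vector of ones. The projectors onto the grand mean and the block means are $S_0 = \frac1N\mathbf 1\mathbf 1'$ and $S_B = \frac1k Z_B Z_B'$, and the stratum matrices are

$$\check S_0 = S_0,\qquad \check S_B = S_B - S_0,\qquad \check S_P = I - S_B.$$

Because $S_0 S_B = S_B S_0 = S_0$, each matrix is idempotent and products of distinct ones vanish. For example, $\check S_B\check S_P = S_B - S_B^2 - S_0 + S_0 S_B = 0$. Finally, $\check S_0 + \check S_B + \check S_P = I$.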

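8 General balance: theory

Random-effects model: $y - E(y) = \sum_\alpha Z_\alpha \nu_\alpha$, where $E(\nu_\alpha) = 0$ and $\mathrm{Var}(\nu_\alpha) = \sigma^2_\alpha I$. Then $\mathrm{Var}(y) = \sum_\alpha \sigma^2_\alpha Z_\alpha Z_\alpha'$.

If the terms are orthogonal:
- $S_\alpha = Z_\alpha (Z_\alpha' Z_\alpha)^- Z_\alpha' = (1/n_\alpha) Z_\alpha Z_\alpha'$ (if equally replicated, with $n_\alpha$ units per level of term $\alpha$). $S_\alpha$ is the projection operator (form and project means);
- $\check S_\alpha = S_\alpha \big(I - \sum_{\{\beta:\ \text{term } \beta \text{ marginal to term } \alpha\}} \check S_\beta\big)$;
- $\xi_\alpha = n_\alpha \sigma^2_\alpha + \sum_{\{\beta:\ \text{term } \alpha \text{ marginal to term } \beta\}} n_\beta \sigma^2_\beta$.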
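Below is a minimal pure-Python sketch of "form and project means" for a hypothetical nested design of 3 blocks × 4 plots. $S_\alpha y$ is obtained by replacing each value with its level mean, and the $\check S_\alpha$ follow from the marginality formula. Two checks follow: the strata partition $y'y$, and $V y$ computed directly from the random-effects model matches $\sum_\alpha \xi_\alpha \check S_\alpha y$. The function names, data and variance components are illustrative assumptions. The grand-mean term is treated as fixed, so it contributes no variance component of its own.

```python
# Sketch: "form and project means" for a hypothetical 3 blocks x 4 plots design.
def project_means(levels, v):
    """S v: replace each element of v by the mean of v over its factor level."""
    totals, counts = {}, {}
    for lev, x in zip(levels, v):
        totals[lev] = totals.get(lev, 0.0) + x
        counts[lev] = counts.get(lev, 0) + 1
    return [totals[lev] / counts[lev] for lev in levels]

def sub(a, b):
    return [x - z for x, z in zip(a, b)]

k = 4                                          # plots per block, i.e. n_B = k
blocks = [j for j in range(3) for _ in range(k)]
grand = [0] * len(blocks)                      # grand-mean term: a single level

def strata_components(y):
    """Return Š_0 y, Š_B y and Š_P y, keyed by stratum."""
    s0 = project_means(grand, y)               # S_0 y
    sb = project_means(blocks, y)              # S_B y
    # Marginality: Š_B = S_B (I - Š_0) and Š_P = I - Š_0 - Š_B = I - S_B
    return {"0": s0, "B": sub(sb, s0), "P": sub(y, sb)}

def var_times(y, sig2_b, sig2_p):
    """V y from the model: sig2_b * Z_B Z_B' y + sig2_p * y."""
    sb = project_means(blocks, y)
    # Z_B Z_B' y gives each unit its block total, i.e. k times its block mean
    return [sig2_b * k * m + sig2_p * x for m, x in zip(sb, y)]

def var_times_strata(y, sig2_b, sig2_p):
    """V y as the sum over strata of xi_alpha * Š_alpha y."""
    xi = {"0": k * sig2_b + sig2_p,            # assumes no variance for the mean
          "B": k * sig2_b + sig2_p,            # n_B sig2_b + n_P sig2_p
          "P": sig2_p}                         # n_P = 1 for Block.Plot
    comps = strata_components(y)
    return [sum(xi[a] * comps[a][u] for a in xi) for u in range(len(y))]

y = [10.2, 11.5, 9.8, 10.9, 12.1, 13.0, 12.4, 11.8, 9.1, 8.7, 9.9, 9.4]
comps = strata_components(y)
total_ss = sum(x * x for x in y)
strata_ss = {a: sum(x * x for x in v) for a, v in comps.items()}
print(total_ss, sum(strata_ss.values()))       # agree up to rounding
```

Only tables of means are formed. No matrix is ever built or inverted.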

9 General balance: theory

Treatment structure: $E(y) = X\tau = \sum_i X_i \tau_i$, with $X = [X_0 \mid X_1 \mid X_2 \mid \dots]$. Hence $E(y) = \mu = T\mu$, where $T = X(X'X)^- X'$.

The treatment terms are orthogonal, so $E(y) = \sum_i \check T_i \mu$, where:
- $T_i = X_i (X_i' X_i)^- X_i' = (1/n_i) X_i X_i'$ (if equally replicated);
- $\check T_i = T_i \big(I - \sum_{\{j:\ \text{term } j \text{ marginal to term } i\}} \check T_j\big)$.
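The same means-based construction gives the treatment projectors. Below is a sketch for a hypothetical 2 × 2 factorial (factors A and B, two replicates of each combination). Each $\check T_i y$ comes from a table of means with the marginal terms' contributions removed. The components add up to $T y$, the table of A.B cell means, and distinct components are orthogonal. The data and names are illustrative assumptions.

```python
# Sketch: Ť_i y for a hypothetical 2 x 2 factorial, built from tables of means.
def project_means(levels, v):
    """T v: replace each element of v by the mean of v over its factor level."""
    totals, counts = {}, {}
    for lev, x in zip(levels, v):
        totals[lev] = totals.get(lev, 0.0) + x
        counts[lev] = counts.get(lev, 0) + 1
    return [totals[lev] / counts[lev] for lev in levels]

def sub(a, b):
    return [x - z for x, z in zip(a, b)]

A = [0, 0, 1, 1] * 2
B = [0, 1, 0, 1] * 2
AB = list(zip(A, B))                  # levels of the A.B interaction
const = [0] * len(A)                  # grand-mean term

def treatment_components(y):
    t0 = project_means(const, y)                      # Ť_0 y = T_0 y
    ta = sub(project_means(A, y), t0)                 # Ť_A y = T_A (I - Ť_0) y
    tb = sub(project_means(B, y), t0)                 # Ť_B y = T_B (I - Ť_0) y
    # Ť_AB y = T_AB (I - Ť_0 - Ť_A - Ť_B) y
    tab = sub(sub(sub(project_means(AB, y), t0), ta), tb)
    return {"0": t0, "A": ta, "B": tb, "A.B": tab}

y = [4.1, 5.3, 6.0, 8.2, 3.9, 5.5, 6.4, 7.8]
comps = treatment_components(y)
fitted = [sum(c[u] for c in comps.values()) for u in range(len(y))]
print(fitted)                         # equals T y, the A.B cell means
```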

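10 General balance: theory

The orthogonal block structure implies an independent least-squares analysis within each stratum $\alpha$:
- residual s.s.: $(\check S_\alpha y - \check S_\alpha T\mu)'(\check S_\alpha y - \check S_\alpha T\mu)$;
- normal equations: $T\check S_\alpha T\hat\mu = T\check S_\alpha y$.

Final condition: $\check T_i \check S_\alpha \check T_j = \delta_{ij}\lambda_{\alpha i}\check T_i$.

The normal equations are now $\sum_i \lambda_{\alpha i}\check T_i\hat\mu_i = \sum_i \check T_i\check S_\alpha y$. They are solved by $\hat\mu_i = (1/\lambda_{\alpha i})\check T_i\check S_\alpha y$, with variance-covariance matrix $\xi_\alpha\check T_i/\lambda_{\alpha i}$. Here $\lambda_{\alpha i}$ is the efficiency factor of term $i$ in stratum $\alpha$. It is also the eigenvalue of $\check T_i\check S_\alpha\check T_i$ on the space of term $i$.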
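Here is a small numerical check of the efficiency factor as an eigenvalue, under stated assumptions. The hypothetical design is a balanced incomplete block design with 3 treatments in 3 blocks of 2 plots. For this design every treatment contrast should have $\lambda = t(k-1)/(k(t-1)) = 3/4$ in the plots (within-block) stratum. The sketch applies $\check T\check S_P\check T$ to a contrast vector and reads off the ratio. It then forms $\hat\mu = (1/\lambda)\check T\check S_P y$ on noise-free data. Names and data are illustrative.

```python
# Sketch: efficiency factor and intra-block estimates in a hypothetical BIBD.
def project_means(levels, v):
    """Replace each element of v by the mean of v over its factor level."""
    totals, counts = {}, {}
    for lev, x in zip(levels, v):
        totals[lev] = totals.get(lev, 0.0) + x
        counts[lev] = counts.get(lev, 0) + 1
    return [totals[lev] / counts[lev] for lev in levels]

def sub(a, b):
    return [x - z for x, z in zip(a, b)]

blocks = [0, 0, 1, 1, 2, 2]
trts = ["A", "B", "A", "C", "B", "C"]
const = [0] * len(blocks)

def s_plots(v):
    """Š_P v = (I - S_B) v: the plots (within-block) stratum."""
    return sub(v, project_means(blocks, v))

def t_check(v):
    """Ť v = T (I - Ť_0) v: treatment means with the grand mean removed."""
    return sub(project_means(trts, v), project_means(const, v))

# Contrast A - B written as a unit-level vector u (already satisfies Ť u = u).
contrast = {"A": 1.0, "B": -1.0, "C": 0.0}
u = [contrast[t] for t in trts]
w = t_check(s_plots(t_check(u)))              # Ť Š_P Ť u
ratios = [wi / ui for wi, ui in zip(w, u) if ui != 0]
lam = ratios[0]                               # w = lam * u

# Noise-free data: block effects 5, 8, 2 plus treatment effects A=1, B=3, C=-4.
y = [6.0, 8.0, 9.0, 4.0, 5.0, -2.0]
mu_hat = [x / lam for x in t_check(s_plots(y))]   # (1/lam) Ť Š_P y
print(lam, mu_hat)
```

All ratios equal 0.75, and `mu_hat` recovers the per-unit treatment effects 1, 3, 1, -4, 3, -4. These effects were chosen to sum to zero, so removing the grand mean leaves them unchanged.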

11 Analysis by “sweeps”

This requires first-order balance: all effects of each model term have an equal efficiency factor at each point where the term is estimated (see Wilkinson 1970, Biometrika; Payne & Wilkinson 1977, Applied Statistics).

First-order balance is similar to general balance, but general balance additionally has:
- block (i.e. error) terms mutually orthogonal (note: always true if nested);
- treatment terms mutually orthogonal

(see Payne & Tobias 1992, Scandinavian J. Stats.).

12 Analysis by “sweeps”

This requires a working vector $v$, which initially contains the data values and finally contains the residuals.

Terms are fitted sequentially. Sweeps estimate and remove the effects of a term $i$ in stratum $\alpha$ by
$v^{(k+1)} = \{I - (1/\lambda_{\alpha i})\,T_i\}\,v^{(k)}$.
They are then followed by a repeat of the sweeps up to this point (a reanalysis sequence).
- The reanalysis sweeps of terms orthogonal to term $i$ can be omitted. None are needed if $\lambda_{\alpha i} = 1$, and the sequence is much simpler with general balance.
- The projection operator $T_i$ simply calculates tables of means, so no matrix inversion is needed (unless there are covariates).
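The sweep and reanalysis steps can be sketched for the plots stratum of the same hypothetical 3-treatment BIBD, where the treatment term has efficiency $3/4$. The sketch makes several assumptions. The helper name is hypothetical. Only one non-orthogonal treatment term is present. The reanalysis sequence is taken to be the re-projection into the stratum, i.e. removing block means again. After the sweeps, the residuals have zero block means and zero treatment means, which is what the within-stratum normal equations require.

```python
# Sketch (hypothetical helper): sweeps plus a reanalysis sequence for one
# treatment term in the plots stratum of a block design with first-order balance.
def project_means(levels, v):
    """Replace each element of v by the mean of v over its factor level."""
    totals, counts = {}, {}
    for lev, x in zip(levels, v):
        totals[lev] = totals.get(lev, 0.0) + x
        counts[lev] = counts.get(lev, 0) + 1
    return [totals[lev] / counts[lev] for lev in levels]

def sub(a, b):
    return [x - z for x, z in zip(a, b)]

def sweep_plots_stratum(y, blocks, trts, eff):
    """Return per-unit treatment effects and residuals in the plots stratum.

    eff is the efficiency factor (lambda) of the treatment term there.
    """
    v = sub(y, project_means(blocks, y))                  # pivot: v = Š_P y
    effects = [m / eff for m in project_means(trts, v)]   # (1/lambda) T_i v
    v = sub(v, effects)                                   # sweep the term
    v = sub(v, project_means(blocks, v))                  # reanalysis sweep
    return effects, v

blocks = [0, 0, 1, 1, 2, 2]
trts = ["A", "B", "A", "C", "B", "C"]
y = [6.3, 8.1, 8.7, 4.4, 5.2, -1.9]
effects, resid = sweep_plots_stratum(y, blocks, trts, 0.75)
print(sum(r * r for r in resid))      # residual sum of squares, plots stratum
```

For this design, the reanalysis step is a single table of block means rather than a matrix solve. That is the point of the slide's remark about needing no matrix inversion.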

13 Pictorial representation

Efficiency factor $= \sin^2\theta$ (Payne & Wilkinson 1977, Applied Statist.), where $\theta$ is the angle shown in the accompanying diagram (not reproduced in this transcript).

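14 Analysis by “sweeps”

With general balance, the initial working vector for stratum $\alpha$ is calculated by $S_\alpha\prod_{\beta<\alpha}(I - S_\beta)\,y$. Here $S_\alpha$ is a pivot (calculate means, and insert them into the vector). The fitted values for treatment term $i$ in stratum $\alpha$ are calculated by
$$\frac{1}{\lambda_{\alpha i}}\,T_i\prod_{j:\,0<j<i}\Big\{R_j\Big(I - \frac{1}{\lambda_{\alpha j}}T_j\Big)\Big\}\,S_\alpha\prod_{\beta<\alpha}(I - S_\beta)\,y,$$
where
- $R_j = I$ if $\lambda_{\alpha j} = 1$;
- $R_j = S_\alpha\prod_{\beta<\alpha}(I - S_\beta)$ if $\lambda_{\alpha j} < 1$.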

15 Other issues

Analysis of covariance:
- analyse the response (y) variate and the covariates;
- calculate the covariate regression coefficients (regression of the y residuals on the covariate residuals);
- adjust the treatment estimates and sums of squares.

Combination of information:
- form treatment (and covariate) estimates combining information from all the strata where each is estimated;
- use a weighted combination of the estimates;
- with general balance, estimate the stratum variances $\xi_\alpha$ to calculate the weights;
- see Payne & Tobias (1992, Scand. J. Stats.).
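Two of the calculations above can be sketched with hypothetical inputs. The first is the covariate regression coefficient from residual vectors. The second is an inverse-variance combination of one contrast estimated in two strata. It uses the fact that the estimate from stratum $\alpha$ has variance proportional to $\xi_\alpha/\lambda_\alpha$, which gives weight $\lambda_\alpha/\xi_\alpha$. The $\xi_\alpha$ are simply assumed known here. In practice they must be estimated, as the slide notes, so this is not the full procedure of Payne & Tobias.

```python
# Sketch: covariate coefficient and combination of information (hypothetical inputs).
def covariate_coefficient(res_y, res_x):
    """Regression coefficient of y residuals on covariate residuals."""
    sxy = sum(a * b for a, b in zip(res_y, res_x))
    sxx = sum(b * b for b in res_x)
    return sxy / sxx

def combine_estimates(estimates):
    """Inverse-variance combination of one contrast estimated in several strata.

    estimates: list of (estimate, lam, xi) tuples, one per stratum. The variance
    of the estimate from stratum alpha is proportional to xi / lam, so its
    weight is lam / xi.
    """
    weights = [lam / xi for _, lam, xi in estimates]
    return sum(w * est for w, (est, _, _) in zip(weights, estimates)) / sum(weights)

b = covariate_coefficient([1.0, -1.0, 2.0, -2.0], [0.5, -0.5, 1.0, -1.0])
# Hypothetical inter-block (lam = 1/4) and intra-block (lam = 3/4) estimates of
# the same contrast, with assumed stratum variances xi = 4.0 and 1.0.
combined = combine_estimates([(2.6, 0.25, 4.0), (2.0, 0.75, 1.0)])
print(b, combined)
```

The combined value lies between the two stratum estimates. It sits much closer to the intra-block one, because that estimate has the larger weight $\lambda/\xi$.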

16 Workspace requirements

Sweep algorithm:
- working vector: $N$;
- vectors for the effects of each term: $n_\alpha$ or $n_i$;
- analysis of covariance (symmetric matrices): $n_{cov}(n_{cov}+1)/2$, $n_i(n_i+1)/2$.

Compare multiple-regression-style algorithms (including REML), which typically require:
- a matrix of $n_{effects}(n_{effects}+1)/2$, where $n_{effects}$ is the total number of block and treatment effects, excluding the residual;
- vector(s): $N$.

The sweep algorithm is therefore much more efficient for large models (see Payne & Welham 1990, COMPSTAT).
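A back-of-envelope comparison of these storage counts, ignoring covariates, makes the difference concrete. The design sizes below are hypothetical. Only the counting rules come from the slide.

```python
# Rough storage counts (number of stored values) for the two approaches.
def sweep_storage(n_units, effects_per_term):
    """Working vector of length N plus one vector of effects per term."""
    return n_units + sum(effects_per_term)

def regression_storage(n_units, effects_per_term):
    """Symmetric matrix over all effects plus vector(s) of length N."""
    n_eff = sum(effects_per_term)
    return n_eff * (n_eff + 1) // 2 + n_units

terms = [1, 500, 200, 20, 4000]          # hypothetical numbers of effects per term
print(sweep_storage(100_000, terms))       # 104721
print(regression_storage(100_000, terms))  # 11246281
```

The sweep count grows linearly with the number of effects. The regression-style count grows quadratically, which is why the gap widens quickly for large models.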

17 Large data sets

Data may be unbalanced:
- take a balanced sample?
- adapt the algorithm?

Wilkinson (1970, Biometrika) “general recursive algorithm”: requires as many sweep sequences for each term $i$ in stratum $\alpha$ as $\check T_i\check S_\alpha\check T_i$ has eigenvalues.

Iterative algorithms: Hemmerle (1974, JASA); Worthington (1975, Biometrika).

18 Large data sets

Hemmerle (1974, JASA):
- also uses sweep-type operations;
- does not require first-order balance. Instead it performs a sequence of “balanced sweeps” for each term until estimation converges;
- but allows only one error term;
- fits the whole model at once, so a sequence of increasingly large models is required to assess individual terms;
- requires the data to be “connected” (i.e. no aliasing);
- does not provide standard errors of differences (s.e.d.s).

19 Large data sets

Worthington (1975, Biometrika):
- performs a sequence of operations analogous to projections and sweeps;
- assumes an additional (unspecified) algorithm to determine the strata (and their projectors);
- assumes an orthogonal (equally replicated) block structure;
- assumes equally replicated treatment combinations;
- again fits the whole model at once.

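20 Generalizing the algorithms

Both algorithms are based on the result
$(I - M)^{-1} = I + M + M^2 + \dots$ (when this converges).

Apply this to the general form $T\check S_\alpha T\hat\mu = T\check S_\alpha y$, i.e.
$$T S_\alpha\prod_{\beta<\alpha}(I - S_\beta)\,T\hat\mu = T S_\alpha\prod_{\beta<\alpha}(I - S_\beta)\,y,$$
giving
$$\hat\mu = \Big(I + \sum_{m\ge1}\Big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\Big)^m\Big)\,TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y,$$
that is,
$$\hat\mu^{(1)} = TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y,\qquad
\hat\mu^{(m+1)} = \hat\mu^{(m)} + \Big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\Big)^m\,TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y.$$

This is a relatively straightforward algorithm: sequences of sweeps and pivots with efficiency 1.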
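Under general balance the series can be summed term by term, which shows how it reduces to the $1/\lambda_{\alpha i}$ solution of slide 10. Write $\check S_\alpha = S_\alpha\prod_{\beta<\alpha}(I - S_\beta)$, and use $T = \sum_i \check T_i$ together with $\check T_i\check S_\alpha\check T_j = \delta_{ij}\lambda_{\alpha i}\check T_i$:

$$T\check S_\alpha T\,(\check T_i\check S_\alpha y) = \sum_j \check T_j\check S_\alpha\check T_i\,\check S_\alpha y = \lambda_{\alpha i}\,\check T_i\check S_\alpha y,$$

$$\hat\mu = \sum_{m\ge0}\big(I - T\check S_\alpha T\big)^m \sum_i \check T_i\check S_\alpha y = \sum_i\Big(\sum_{m\ge0}(1-\lambda_{\alpha i})^m\Big)\check T_i\check S_\alpha y = \sum_i \frac{1}{\lambda_{\alpha i}}\,\check T_i\check S_\alpha y.$$

The geometric series converges for $0 < \lambda_{\alpha i} \le 1$, and it converges slowly when $\lambda_{\alpha i}$ is small. A term with $\lambda_{\alpha i} = 0$ contributes nothing, because then $\check S_\alpha\check T_i = 0$ and hence $\check T_i\check S_\alpha y = 0$.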

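21 Generalizing the algorithms

Iterative scheme:
$$\hat\mu^{(1)} = TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y,\qquad \hat\mu^{(m+1)} = \hat\mu^{(m)} + \Big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\Big)^m\,TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y$$

Implementation: $\hat\mu^{(m+1)} = \hat\mu^{(m)} + \delta^{(m)}$, where
$$\delta^{(m)} = \Big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\Big)^m\,TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,y = \Big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\Big)\,\delta^{(m-1)}.$$

Calculation of $\delta^{(m)}$:
- project $\delta^{(m-1)}$ into the treatment space: $T\delta^{(m-1)}$;
- project into stratum $\alpha$: $S_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\delta^{(m-1)}$;
- project into the treatment space: $TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\delta^{(m-1)}$;
- subtract the result from $\delta^{(m-1)}$: $\big(I - TS_\alpha\prod_{\beta<\alpha}(I-S_\beta)\,T\big)\,\delta^{(m-1)}$.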
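The iterative scheme can be sketched in pure Python on the hypothetical 3-treatment BIBD from earlier. $T$ is applied as a table of treatment means. The plots-stratum projector $S_P\prod_{\beta<P}(I - S_\beta) = I - S_B$ is applied by removing block means. Each update follows the four projection steps listed above. The function names and the stopping rule are assumptions. Every contrast here has $\lambda = 3/4$, so each $\delta^{(m)}$ should be a quarter of the previous one, and the limit should match the direct $(1/\lambda)$ solution.

```python
# Sketch of the iterative scheme; helper names and tolerances are assumptions.
def project_means(levels, v):
    """Replace each element of v by the mean of v over its factor level."""
    totals, counts = {}, {}
    for lev, x in zip(levels, v):
        totals[lev] = totals.get(lev, 0.0) + x
        counts[lev] = counts.get(lev, 0) + 1
    return [totals[lev] / counts[lev] for lev in levels]

def sub(a, b):
    return [x - z for x, z in zip(a, b)]

def iterate_stratum(y, T, S, tol=1e-12, max_iter=1000):
    """Iterate mu^(m+1) = mu^(m) + delta^(m) until delta is negligible.

    T applies the treatment projector; S applies the stratum projector.
    """
    delta = T(S(y))                   # delta^(0) = mu^(1) = T S y
    mu = list(delta)
    for _ in range(max_iter):
        t = T(delta)                  # project into the treatment space
        s = S(t)                      # project into the stratum
        ts = T(s)                     # project back into the treatment space
        delta = sub(delta, ts)        # subtract: (I - T S T) delta
        mu = [m + d for m, d in zip(mu, delta)]
        if max(abs(d) for d in delta) < tol:
            break
    return mu

blocks = [0, 0, 1, 1, 2, 2]
trts = ["A", "B", "A", "C", "B", "C"]

def T(v):
    return project_means(trts, v)                 # table of treatment means

def S(v):
    return sub(v, project_means(blocks, v))       # plots stratum: (I - S_B) v

mu_hat = iterate_stratum([6.0, 8.0, 9.0, 4.0, 5.0, -2.0], T, S)
print(mu_hat)    # approaches the per-unit effects 1, 3, 1, -4, 3, -4
```

Only tables of means are needed, and first-order balance is not used by the iteration itself. The balanced design is chosen only so the limit can be checked against slide 10.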




