SJS SDI_21 Design of Statistical Investigations Stephen Senn 2 Background Stats
SJS SDI_22 Linear combinations:
1) If $X_i$ is a random variable with expected value $E[X_i] = \mu_i$ and variance $V[X_i] = \sigma_i^2$, and $a$ and $b$ are two constants, then $E[a + bX_i] = a + b\mu_i$ and $V[a + bX_i] = b^2\sigma_i^2$.
2) If $X_i$ and $X_j$ are two random variables, then $E[aX_i + bX_j] = a\mu_i + b\mu_j$ and $V[aX_i + bX_j] = a^2\sigma_i^2 + b^2\sigma_j^2 + 2ab\sigma_{ij}$, where $\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)]$ is known as the covariance of $X_i$ and $X_j$.
3) If $X_1, X_2, \ldots, X_n$ are $n$ independent random variables with expectations $\mu_1, \mu_2, \ldots, \mu_n$ and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$ respectively, then $\sum a_i X_i$ has expectation $\sum a_i \mu_i$ and variance $\sum a_i^2 \sigma_i^2$.
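As a rough numerical check of rule 2, the sketch below computes the mean and variance of $aX_i + bX_j$ directly from a small joint distribution and again from the rule. The joint probabilities and the constants `a` and `b` are made-up illustrative values, not taken from the slides.

```python
# Hypothetical joint distribution of (X_i, X_j): outcome -> probability.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def expect(g):
    """Expected value of g(x_i, x_j) under the joint distribution."""
    return sum(p * g(xi, xj) for (xi, xj), p in joint.items())

a, b = 2.0, -3.0  # arbitrary constants chosen for illustration

# Means, variances and covariance of the two variables.
mu_i = expect(lambda xi, xj: xi)
mu_j = expect(lambda xi, xj: xj)
var_i = expect(lambda xi, xj: (xi - mu_i) ** 2)
var_j = expect(lambda xi, xj: (xj - mu_j) ** 2)
cov_ij = expect(lambda xi, xj: (xi - mu_i) * (xj - mu_j))

# Direct calculation for the linear combination a*X_i + b*X_j.
mean_direct = expect(lambda xi, xj: a * xi + b * xj)
var_direct = expect(lambda xi, xj: (a * xi + b * xj - mean_direct) ** 2)

# The same quantities from the rule for linear combinations.
mean_rule = a * mu_i + b * mu_j
var_rule = a ** 2 * var_i + b ** 2 * var_j + 2 * a * b * cov_ij

print(mean_direct, mean_rule)
print(var_direct, var_rule)
```

The two routes agree up to floating-point rounding. Note that the covariance term matters here because the two variables in this toy example are not independent.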
SJS SDI_23 Expected value of a corrected sum of squares
If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from a population with variance $\sigma^2$, then $\sum_{i=1}^{n} (X_i - \bar{X})^2$ is known as the corrected sum of squares and has expected value $(n - 1)\sigma^2$.
NB The factor $(n - 1)$, known as the degrees of freedom, arises because the correction point (in this case the sample mean) is estimated from the data. In general we lose one degree of freedom for every constant fitted.
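The expected value $(n - 1)\sigma^2$ can be checked exactly for a small case by enumerating every possible sample. The sketch below assumes a hypothetical population in which the values 0, 1 and 2 are equally likely and a sample size of 3; both choices are purely illustrative.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]  # hypothetical population, each value equally likely
n = 3               # sample size

# Population mean and variance, kept as exact fractions.
pop_mean = Fraction(sum(values), len(values))
sigma2 = sum((Fraction(v) - pop_mean) ** 2 for v in values) / len(values)

def css(sample):
    """Corrected sum of squares about the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample)

# Every ordered sample of size n is equally likely under random sampling.
samples = list(product(values, repeat=n))
expected_css = sum(css(s) for s in samples) / len(samples)

print(expected_css, (n - 1) * sigma2)
```

Dividing the corrected sum of squares by $n$ instead of $n - 1$ would therefore underestimate $\sigma^2$ on average.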
SJS SDI_24 Distribution of a corrected sum of squares
If a corrected sum of squares, CSS, with $\nu$ degrees of freedom is calculated from a random sample from a Normal distribution with variance $\sigma^2$, then $CSS/\sigma^2$ has a chi-square distribution with $\nu$ degrees of freedom.
Chi-square statistics
If $Y_1$ has a chi-square distribution with $\nu_1$ degrees of freedom and $Y_2$ is independently distributed as a chi-square with $\nu_2$ degrees of freedom, then $Y = Y_1 + Y_2$ has a chi-square distribution with $\nu_1 + \nu_2$ degrees of freedom.
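A simulation can make both statements plausible. The sketch below draws two independent Normal samples (sizes 4 and 6, so 3 and 5 degrees of freedom), adds the two scaled corrected sums of squares, and compares the result with the known mean ($\nu$) and variance ($2\nu$) of a chi-square with $3 + 5 = 8$ degrees of freedom. The seed, the value of `sigma`, the sample sizes and the number of replications are all arbitrary assumptions.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2024)  # fixed seed so the sketch is repeatable
sigma = 2.0                # hypothetical population standard deviation

def scaled_css(n):
    """CSS / sigma^2 for one Normal sample of size n (n - 1 d.f.)."""
    sample = [rng.gauss(0.0, sigma) for _ in range(n)]
    mean = fmean(sample)
    return sum((x - mean) ** 2 for x in sample) / sigma ** 2

reps = 20000
# Y1 has 3 d.f. and Y2 has 5 d.f.; the two samples are independent.
y = [scaled_css(4) + scaled_css(6) for _ in range(reps)]

print(fmean(y))      # should be close to 8, the total degrees of freedom
print(pvariance(y))  # should be close to 16, twice the degrees of freedom
```

Matching the first two moments does not prove the distributional result, but a clear mismatch would indicate a mistake in the degrees of freedom.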
SJS SDI_25 t-statistics
If $Z$ is a random variable which is Normally distributed with mean 0 and variance 1, and $Y$ is independently distributed as a chi-square with $\nu$ degrees of freedom, then $t = Z/\sqrt{Y/\nu}$ has a t distribution with $\nu$ degrees of freedom.
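The construction can be sketched by simulation. Here the chi-square variable is built as a sum of $\nu$ squared independent standard Normal draws, and the simulated values are compared with the known variance $\nu/(\nu - 2)$ of a t distribution (valid for $\nu > 2$). The choice $\nu = 10$, the seed and the number of draws are illustrative assumptions.

```python
import math
import random
from statistics import fmean, pvariance

rng = random.Random(7)  # fixed seed so the sketch is repeatable
nu = 10                 # hypothetical degrees of freedom

def t_draw():
    """One draw of Z / sqrt(Y / nu) with Z ~ N(0, 1) and Y ~ chi-square(nu)."""
    z = rng.gauss(0.0, 1.0)
    # Sum of nu squared independent N(0, 1) draws is chi-square with nu d.f.
    y = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(y / nu)

draws = [t_draw() for _ in range(20000)]

print(fmean(draws))      # should be close to 0
print(pvariance(draws))  # should be close to nu / (nu - 2) = 1.25
```

The variance exceeds 1 because dividing by an estimated, rather than known, scale adds extra spread; the excess shrinks as $\nu$ grows.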
SJS SDI_26 Further Variate Relations
The square of a $t_\nu$ is distributed $F_{1,\nu}$.
The square of a Normal (0,1) is distributed $\chi^2_1$.
The sum of a series of independent (or, more generally, jointly Normal) Normally distributed random variables is itself Normally distributed, with mean and variance given by the rule for linear combinations.
The ratio of two independent chi-square random variables, each divided by its degrees of freedom, is an F r.v. with corresponding degrees of freedom. (If the numerator chi-square has $\nu_1$ d.f. and the denominator has $\nu_2$ d.f., then the resulting r.v. is $F_{\nu_1,\nu_2}$.)
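The F relations can be illustrated in the same spirit. The sketch below builds an F variable as a ratio of scaled chi-squares, builds a squared t separately, and compares both sample means with the known mean of an F distribution, $\nu_2/(\nu_2 - 2)$ for $\nu_2 > 2$, which does not depend on the numerator degrees of freedom. The degrees of freedom (3 and 12), the seed and the number of draws are arbitrary assumptions.

```python
import random
from statistics import fmean

rng = random.Random(11)  # fixed seed so the sketch is repeatable

def chi_square(df):
    """Chi-square draw built as a sum of df squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

nu1, nu2 = 3, 12  # hypothetical numerator and denominator d.f.
reps = 20000

# Ratio of independent chi-squares, each divided by its d.f.: F(nu1, nu2).
f_draws = [(chi_square(nu1) / nu1) / (chi_square(nu2) / nu2)
           for _ in range(reps)]

# Squared t with nu2 d.f., which should behave like F(1, nu2).
t_squared = [rng.gauss(0.0, 1.0) ** 2 / (chi_square(nu2) / nu2)
             for _ in range(reps)]

target = nu2 / (nu2 - 2)  # mean of any F with nu2 denominator d.f.
print(fmean(f_draws), fmean(t_squared), target)
```

Both averages should land near 1.2, consistent with the squared t and the general ratio sharing the same denominator degrees of freedom.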
SJS SDI_27 Regression
A model of the form $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \epsilon_i$, $i = 1, \ldots, n$, where $Y_i$ is a response measured on the $i$th individual (for example patient $i$), $X_{1i}$, $X_{2i}$ etc. are measurements of linear predictors (covariates) for the $i$th individual and $\epsilon_i$ is a stochastic disturbance term, is known as a general linear model and may be expressed in matrix form $Y = X\beta + \epsilon$, where $Y$ is an $n \times 1$ vector of responses,
SJS SDI_28 $X$ is an $n \times (k + 1)$ matrix of predictors consisting of $k + 1$ column vectors of length $n$, where the first column vector has all $n$ elements equal to 1 and the next $k$ columns represent the linear predictors $X_1$ to $X_k$. $\epsilon$ is an $n \times 1$ vector of disturbance terms with $E(\epsilon) = 0$, usually assumed independent and of constant variance $\sigma^2$, so that $E(\epsilon\epsilon^T) = \sigma^2 I$, where $I$ is an $n \times n$ identity matrix.
The Ordinary Least Squares (OLS) estimator of $\beta$ is $b = (X^T X)^{-1} X^T Y$ (1.1) and its variance (variance-covariance matrix) is $V(b) = \sigma^2 (X^T X)^{-1}$ (1.2).
SJS SDI_29 If the further assumption is made that the $\epsilon_i$ terms are Normally distributed, then $b$ has a multivariate Normal distribution and individual elements of $b$ are Normally distributed with variance identifiable from (1.2). In practice, $\sigma^2$ will be unknown but has unbiased estimate $s^2 = e^T e/(n - k - 1)$, where $e = Y - Xb$ is the vector of residuals from the fitted model.
If $\beta_j = 0$, the ratio of $b_j$ to $s\sqrt{a_{jj}}$ has a t-distribution with $n - k - 1$ degrees of freedom, where $b_j$ is the $j$th element of $b$ and $a_{jj}$ is the $j$th diagonal element of $A = (X^T X)^{-1}$; more generally, $(b_j - \beta_j)/(s\sqrt{a_{jj}})$ has this distribution. This fact may be used to test hypotheses about any element of $\beta$ and to construct confidence intervals for it.
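The OLS calculations can be followed through on a tiny example. The sketch below fits a single predictor ($k = 1$) to five made-up observations, forming $X^T X$, its inverse (taken here to be the matrix $A$), the estimate $b$, the residual variance $s^2$ and the ratio of each $b_j$ to its estimated standard error $s\sqrt{a_{jj}}$. The data are invented for illustration, and the closed-form 2 x 2 inverse only works because there are two columns in $X$.

```python
import math

# Made-up data: one predictor and one response for five individuals.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, k = len(x), 1

# Design matrix: a column of ones followed by the predictor.
X = [[1.0, xi] for xi in x]

# X^T X (2 x 2) and X^T Y (length 2).
xtx = [[sum(row[r] * row[c] for row in X) for c in range(2)] for r in range(2)]
xty = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(2)]

# A = (X^T X)^{-1}, using the closed-form inverse of a 2 x 2 matrix.
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
A = [[xtx[1][1] / det, -xtx[0][1] / det],
     [-xtx[1][0] / det, xtx[0][0] / det]]

# OLS estimate b = A X^T Y.
b = [sum(A[r][c] * xty[c] for c in range(2)) for r in range(2)]

# Residuals and the unbiased variance estimate s^2 = e'e / (n - k - 1).
e = [yi - (b[0] + b[1] * xi) for xi, yi in zip(x, y)]
s2 = sum(ei ** 2 for ei in e) / (n - k - 1)

# t-ratios: b_j divided by its estimated standard error s * sqrt(a_jj).
t_ratios = [b[j] / math.sqrt(s2 * A[j][j]) for j in range(2)]

print(b, s2, t_ratios)
```

Each ratio would be referred to a t distribution with $n - k - 1 = 3$ degrees of freedom. For larger $k$ a linear-algebra library would normally replace the hand-written inverse.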
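SJS SDI_210 The Bivariate Normal
The bivariate Normal first received extensive application in statistical analysis in the work of Francis Galton, who was a UCL man! These are some brief notes about some mathematical aspects of it.
If the joint probability density function of two random variables is given by
$f(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho \left(\frac{x - \mu_X}{\sigma_X}\right)\left(\frac{y - \mu_Y}{\sigma_Y}\right) + \left(\frac{y - \mu_Y}{\sigma_Y}\right)^2 \right] \right\}$ (1.3)
where $-\infty < x, y < \infty$, $\sigma_X, \sigma_Y > 0$ and $-1 < \rho < 1$, then $X$ and $Y$ are said to have a bivariate Normal distribution.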
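SJS SDI_211 $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$ and $\rho$ are parameters of the distribution. Since (1.3) is a p.d.f., $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$. A contour plot of a bivariate Normal is given on the next slide.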
SJS SDI_212 Contour plot
SJS SDI_213 Surface plot viewed from NE corner
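SJS SDI_214 If we integrate out $Y$, we obtain the marginal distribution of $X$, and this is in fact a Normal with mean $\mu_X$ and variance $\sigma_X^2$. Thus
$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{1}{\sigma_X\sqrt{2\pi}} \exp\left\{ -\frac{(x - \mu_X)^2}{2\sigma_X^2} \right\}$ (1.4)
and similarly, by integrating out $X$, we obtain the marginal distribution of $Y$, which is also a Normal distribution:
$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \frac{1}{\sigma_Y\sqrt{2\pi}} \exp\left\{ -\frac{(y - \mu_Y)^2}{2\sigma_Y^2} \right\}$ (1.5)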
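SJS SDI_215 From (1.4) and (1.5) we see that $\mu_X, \mu_Y$ and $\sigma_X^2, \sigma_Y^2$ are respectively the means of $X$ and $Y$ and the variances of $X$ and $Y$. The parameter $\rho$ is known as the correlation coefficient and was studied extensively by Galton.
We are often interested in the conditional distribution of $Y$ given $X$ and vice versa. These also turn out to be Normal distributions. In fact we have
$f(y \mid x) = \frac{1}{\sigma_{Y|X}\sqrt{2\pi}} \exp\left\{ -\frac{(y - \mu_{Y|X})^2}{2\sigma_{Y|X}^2} \right\}$ (1.6)
where
$\mu_{Y|X} = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X)$ (1.7)
and
$\sigma_{Y|X}^2 = \sigma_Y^2(1 - \rho^2)$ (1.8)
and, of course, an analogous expression exists for the conditional distribution of $X$ given $Y$, exchanging $Y$ for $X$ and vice versa.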
SJS SDI_216 Note that (1.7) is the equation of a straight line with intercept $\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X$ and slope $\rho\frac{\sigma_Y}{\sigma_X}$. Given a bivariate Normal, for particular values of $X$ we will find that the average value of $Y$ lies on this straight line. The degree of scatter about this line is constant and is given by (1.8).
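The straight-line property can be illustrated by simulating from a bivariate Normal and fitting a least-squares line of $Y$ on $X$. In the sketch below the parameter values, the seed and the sample size are arbitrary assumptions, and the pairs are generated by the standard device of mixing two independent standard Normal draws so that the correlation is $\rho$.

```python
import math
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical parameter values for the bivariate Normal.
mu_x, mu_y = 10.0, 20.0
sigma_x, sigma_y, rho = 2.0, 3.0, 0.6

def draw_pair():
    """One (x, y) pair with the required means, variances and correlation."""
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x = mu_x + sigma_x * z1
    y = mu_y + sigma_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return x, y

pairs = [draw_pair() for _ in range(20000)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

# Least-squares line of y on x fitted to the simulated pairs.
x_bar, y_bar = fmean(xs), fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
slope_hat = sxy / sxx
intercept_hat = y_bar - slope_hat * x_bar
scatter_hat = fmean([(y - intercept_hat - slope_hat * x) ** 2 for x, y in pairs])

# Theoretical slope, intercept and scatter about the line.
slope = rho * sigma_y / sigma_x
intercept = mu_y - slope * mu_x
scatter = sigma_y ** 2 * (1.0 - rho ** 2)

print(slope_hat, slope)
print(intercept_hat, intercept)
print(scatter_hat, scatter)
```

With these values the theoretical line has slope 0.9 and intercept 11, and the variance about the line is 5.76; the fitted quantities should be close to these but will not match exactly.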
SJS SDI_217 Further Reading
Clarke and Kempson, Introduction to the Design and Analysis of Experiments, Arnold, London, 1997, Chapter 2.
Senn, S.J., Cross-over Trials in Clinical Research (2nd edition), Wiley, Chichester, 2002, Chapter 2.