# Linear and generalised linear models

This presentation covers:

- Purpose of linear models
- Solution for linear models
- Some statistics related to the linear model
- Additive and non-linear models
- Generalised linear models

## Purpose of linear models

The purpose of regression is to reveal statistical relations between input and output variables. Statistics cannot reveal a functional relationship; that is the purpose of other scientific studies. Statistics can, however, help to validate various proposed functional relationships. Let us assume that we suspect the functional relationship is

$$y = f(x, \beta) + \varepsilon$$

where $\beta$ is a vector of unknown parameters, $x = (x_1, x_2, \dots, x_p)$ is a vector of controllable parameters, $y$ is the output, and $\varepsilon$ is the error associated with the experiment. Then we can set up experiments for various values of $x$ and obtain an output (or response) for each. If the number of experiments is $n$ then we will have $n$ output values; denote them as a vector $y = (y_1, y_2, \dots, y_n)$. The purpose of statistics is to estimate the parameter vector using the input and output values.

If the function $f$ is a linear function of the parameters and the errors are additive, then we are dealing with a linear model. For this model we can write

$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \dots, n$$

Note that a linear model is linear in the parameters but not necessarily in the input variables. For example $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ is a linear model, but $y = \beta_0 + e^{\beta_1 x} + \varepsilon$ is not.
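As a small illustration of the last point, a quadratic curve is non-linear in $x$ but still linear in its parameters, so ordinary least squares fits it exactly. This is a minimal numpy sketch with hypothetical parameter values, not part of the original slides:

```python
import numpy as np

# A quadratic curve y = b0 + b1*x + b2*x^2 is still a *linear* model:
# the design-matrix columns (1, x, x^2) are fixed functions of the input,
# and the model is linear in the parameters b.
x = np.linspace(0.0, 1.0, 20)
beta_true = np.array([1.0, -2.0, 3.0])           # hypothetical parameters
X = np.column_stack([np.ones_like(x), x, x**2])  # design matrix
y = X @ beta_true                                # noise-free response

# Because the model is linear in beta, least squares recovers it exactly.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The same mechanism works for any fixed set of basis functions of $x$; only linearity in $\beta$ matters.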

## Assumptions

The basic assumptions for analysis of a linear model are:

1. the model is linear in the parameters;
2. the error structure is additive;
3. the random errors have zero mean and equal variance, and are uncorrelated.

These assumptions are sufficient to deal with linear models. The equal-variance and uncorrelatedness assumptions can be removed, but then the treatment becomes a little more complicated. Note that the normality assumption is not used for the general solution; it is needed only to design test statistics. These assumptions can be written in vector form:

$$y = X\beta + \varepsilon, \qquad E(\varepsilon) = 0, \qquad V(\varepsilon) = \sigma^2 I$$

where $y$, $0$ and $\varepsilon$ are vectors and $X$ is a matrix, called the design matrix (or input matrix); $I$ is the $n \times n$ identity matrix.

## Solution

The solution under the given model and assumptions is:

$$\hat\beta = (X^T X)^{-1} X^T y$$

If we use the form of the model, write the least-squares criterion (since we want the solution with minimum least-squares error)

$$S(\beta) = (y - X\beta)^T (y - X\beta)$$

and take the first and second derivatives and solve the resulting equation, we can verify that this solution is correct. The solution is unbiased: using the formula for the solution and the expression for $y$ we can write

$$E(\hat\beta) = (X^T X)^{-1} X^T E(y) = (X^T X)^{-1} X^T X \beta = \beta$$

The variance of the estimator is:

$$V(\hat\beta) = (X^T X)^{-1} X^T V(y) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$$

Here we used the form of the solution and assumption 3.
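The normal-equations solution can be checked numerically. Below is a small sketch on simulated data with hypothetical parameter values; solving $X^TX\hat\beta = X^Ty$ directly agrees with a library least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))              # design matrix (full rank w.p. 1)
beta = np.array([2.0, -1.0, 0.5])        # hypothetical true parameters
y = X @ beta + 0.1 * rng.normal(size=n)  # y = X beta + error

# Least-squares solution: beta_hat = (X^T X)^{-1} X^T y,
# computed by solving the normal equations rather than inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

In practice one solves the normal equations (or uses a QR/SVD routine) instead of forming the explicit inverse, which is cheaper and numerically safer.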

## Variance

To calculate the variance we need to be able to estimate $\sigma^2$. Since it is the variance of the error term, we can find it using the form of the solution. For the estimated errors (residuals, denoted by $e$) we can write:

$$e = y - X\hat\beta$$

Using $\hat\beta = (X^T X)^{-1} X^T y$ immediately gives

$$e = \left(I - X(X^T X)^{-1} X^T\right) y = M y, \qquad M = I - X(X^T X)^{-1} X^T$$

Since the matrix $M$ is idempotent and symmetric, i.e. $M^2 = M = M^T$, we obtain for the estimator of the variance the formula:

$$\hat\sigma^2 = \frac{e^T e}{n - p}$$

where $n$ is the number of observations and $p$ is the number of fitted parameters.
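The properties of the residual-maker matrix $M$ and the variance estimate can be verified on simulated data. A sketch, again with arbitrary simulated inputs:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(size=n)  # simulated linear model

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                    # estimated errors (residuals)

# M = I - X (X^T X)^{-1} X^T is idempotent and symmetric, and e = M y.
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

# Unbiased estimate of the error variance: e^T e / (n - p).
sigma2_hat = (e @ e) / (n - p)
```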

## Test of hypothesis

Sometimes the question arises whether some of the parameters are significant. To test this type of hypothesis it is necessary to understand the elements of the likelihood ratio test. Let us assume that we want to test the following null hypothesis against the alternative:

$$H_0: \beta_1 = 0 \qquad \text{vs} \qquad H_1: \beta_1 \neq 0$$

where $\beta_1$ is a subvector of the parameter vector. This is equivalent to testing whether one or several parameters are 0 or not. The likelihood ratio test works as follows. Assume we have the likelihood function for the parameters, where the parameter vector is partitioned into two subvectors:

$$L(\beta) = L(\beta_1, \beta_2)$$

Maximum likelihood estimators are then found for two cases. In the first case the whole parameter vector is treated as variable; in the second case the subvector $\beta_1$ is fixed to the value defined by the null hypothesis. The values of the likelihood function for these two cases are found and their ratio is calculated. Let $L_0$ be the maximum of the likelihood under the null hypothesis (with the subvector fixed to the given value) and $L_1$ under the alternative hypothesis. The ratio is:

$$\lambda = \frac{L_0}{L_1}$$

If this ratio is sufficiently small, the null hypothesis is rejected. It is not always possible to find the distribution of this ratio.

## Singular case

These forms of the solution hold if the matrices $X$ and $X^TX$ are non-singular, i.e. the rank of $X$ is equal to the number of parameters. If that is not true, then either singular value decomposition (SVD) or eigenvalue filtering techniques are used. Fortunately, most of the good properties of the linear model remain.

Singular value decomposition: any $n \times p$ matrix can be decomposed in the form

$$X = U D V^T$$

where $U$ is an $n \times n$ and $V$ a $p \times p$ orthogonal matrix (i.e. multiplication of the transpose of the matrix with itself gives the unit matrix), and $D$ is an $n \times p$ diagonal matrix of the singular values. If $X$ is singular, then the number of non-zero diagonal elements of $D$ is less than $p$. Then for $X^TX$ we can write:

$$X^T X = V D^T D V^T$$

where $D^TD$ is a $p \times p$ diagonal matrix. If the matrix is non-singular, we can write:

$$(X^T X)^{-1} = V (D^T D)^{-1} V^T$$

Since $D^TD$ is diagonal, its inverse is the diagonal matrix with the diagonal elements inverted. The main trick used in SVD techniques for equation solving is that when diagonal elements are 0 or close to 0, 0 is used instead of their inverse. I.e. if $E$ is the inverse of $D^TD$, then the pseudo-inverse $E^{+}$ is calculated as:

$$E^{+}_{ii} = \begin{cases} 1/d_i^2 & \text{if } d_i > \text{tolerance} \\ 0 & \text{otherwise} \end{cases}$$

## Likelihood ratio test for the linear model

Let us assume that we have found the maximum likelihood values of the variances under the null and the alternative hypothesis:

$$\hat\sigma_0^2 = \frac{RSS_0}{n}, \qquad \hat\sigma_1^2 = \frac{RSS_1}{n}$$

where $RSS_0$ and $RSS_1$ are the residual sums of squares of the restricted and the full model. Furthermore, let $n$ be the number of observations, $p$ the number of all parameters, and $r$ the number of parameters we want to test. It turns out that the relevant likelihood ratio test statistic for this case is related to the F distribution. The relevant random variable is:

$$F = \frac{(RSS_0 - RSS_1)/r}{RSS_1/(n-p)}$$

This random variable has an F distribution with $(r, n-p)$ degrees of freedom. This holds if the distribution of the errors is normal; as we know, in this case maximum likelihood and least squares coincide.

Note: the distribution is an F distribution if the null hypothesis is true. If it is not true, the distribution becomes a non-central F distribution.

Note: if two independent random variables have $\chi^2$ distributions with $n$ and $m$ degrees of freedom respectively, then the ratio of these variables, each divided by its degrees of freedom, has an F distribution with $(n, m)$ degrees of freedom.
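Computing this statistic only requires fitting the full and the restricted model. A sketch on simulated data where the null hypothesis actually holds (the first $r$ components of the hypothetical parameter vector are zero):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, r = 80, 5, 2                   # observations, all params, tested params
X = rng.normal(size=(n, p))
beta = np.array([0.0, 0.0, 1.0, -1.0, 0.5])   # first r components zero: H0 true
y = X @ beta + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of the least-squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

rss0 = rss(X[:, r:], y)              # restricted model (beta_1 = 0)
rss1 = rss(X, y)                     # full model

# F = ((RSS0 - RSS1)/r) / (RSS1/(n-p)), ~ F(r, n-p) under H0.
F = ((rss0 - rss1) / r) / (rss1 / (n - p))
```

One would compare $F$ against the upper quantile of the $F(r, n-p)$ distribution to decide whether to reject $H_0$.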

## Additive and non-linear models

Let us briefly consider several non-linear models.

1. Additive model. If the model is described as

$$y = \beta_0 + \sum_{i=1}^{p} s_i(x_i) + \varepsilon$$

then it is called an additive model, where the $s_i$ can be some set of functions, usually smooth functions. These types of models are used for smoothing.

2. Non-linear model. If the model is a non-linear function of the parameters and the input variables, then it is called a non-linear model. In general form it can be written:

$$y = f(x, \beta) + \varepsilon$$

The form of the function depends on the subject studied. These models do not have closed-form, elegant solutions. Non-linear least squares may not have a unique solution, or may have many local minima. Such models are usually solved iteratively: initial values of the parameters are found and then iteratively updated using some optimisation technique. The statistical properties of non-linear models are not straightforward to derive; although the bootstrap technique can be used to derive some sort of approximation, it is not exact in general.
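The iterative idea can be sketched with a Gauss-Newton loop for a hypothetical non-linear model $y = a\,e^{bx} + \varepsilon$ (the model, starting values, and data are all illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 1.0, 40)
a_true, b_true = 2.0, -1.5
y = a_true * np.exp(b_true * x) + 0.01 * rng.normal(size=x.size)

# Gauss-Newton: linearise the model around the current parameters and
# solve a least-squares problem for the parameter update.
a, b = 1.0, 0.0                      # initial values
for _ in range(50):
    f = a * np.exp(b * x)            # current model prediction
    # Jacobian: partial derivatives of f w.r.t. a and b.
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])
    step, *_ = np.linalg.lstsq(J, y - f, rcond=None)
    a, b = a + step[0], b + step[1]
```

With poor starting values such a loop may diverge or land in a local minimum, which is exactly why initial values matter for non-linear fitting.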

## Generalised linear model

If the distribution of the errors is one of the distributions from the exponential family, and some function of the expected value of the observations is a linear function of the parameters, then generalised linear models are used:

$$g(E(y)) = X\beta$$

The function $g$ is called the link function. Here is a list of popular distributions and their corresponding link functions:

| Distribution | Link function |
| --- | --- |
| binomial | logit: $\ln(p/(1-p))$ |
| normal | identity |
| Gamma | inverse |
| Poisson | log |

All good statistical packages implement many generalised linear models; to use them, finding initial values might be necessary. The additive model can also be generalised, i.e. the function of the expected value of the observations can have the form:

$$g(E(y)) = \beta_0 + \sum_{i=1}^{p} s_i(x_i)$$
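Packages typically fit GLMs by iteratively reweighted least squares (IRLS). As a minimal illustrative sketch (not a production implementation), here is IRLS for a Poisson GLM with log link on simulated data with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.8])         # hypothetical parameters
mu = np.exp(X @ beta_true)               # log link: log(mu) = X beta
y = rng.poisson(mu)

beta = np.zeros(2)                       # initial values
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    # Working response and weights for the canonical (log) link:
    # each IRLS step is a weighted least-squares fit.
    z = eta + (y - mu) / mu
    W = mu                               # Poisson variance function V(mu) = mu
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
```

Each iteration solves a weighted linear least-squares problem, which is why the linear-model machinery above carries over almost unchanged.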

## Exercise: linear model

Consider hypothesis testing. We have $n$ observations and the parameter vector has dimension $p$. We partition the parameter vector as follows (the dimension of $\beta_1$ is $r$):

$$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$$

The corresponding partitioning of the design (input) matrix is:

$$X = (X_1, X_2)$$

Assume that all observations are normally distributed with equal variance and are uncorrelated. Find the maximum likelihood estimators for the parameters and the variance under the null and the alternative hypothesis:

$$H_0: \beta_1 = 0 \qquad \text{vs} \qquad H_1: \beta_1 \neq 0$$

Hint: the negative log-likelihood under the null hypothesis is (since $\beta_1 = 0$):

$$-\ell_0(\beta_2, \sigma^2) = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}(y - X_2\beta_2)^T(y - X_2\beta_2)$$

and under the alternative hypothesis:

$$-\ell_1(\beta, \sigma^2) = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{1}{2\sigma^2}(y - X\beta)^T(y - X\beta)$$

Find the minima of these functions; they give the maximum likelihood estimators.