Linear and generalised linear models


Linear and generalised linear models

- Purpose of linear models
- Solution for linear models
- Some statistics related to the linear model
- Analysis of diagnostics
- Exponential family and generalised linear models

Reason for linear models

The purpose of regression is to reveal statistical relations between input and output variables. Statistics cannot reveal a functional relationship; that is the purpose of other scientific studies. Statistics can, however, help to validate various functional relationships. Let us assume that we suspect the functional relationship is

$y = f(x, \beta) + \epsilon$,

where $\beta$ is a vector of unknown parameters, $x = (x_1, x_2, \dots, x_p)$ is a vector of controllable variables, $y$ is the output, and $\epsilon$ is an error associated with the experiment. We can then run experiments for various values of $x$ and record the output (or response) for each of them. If the number of experiments is $n$, we will have $n$ output values; denote them as a vector $y = (y_1, y_2, \dots, y_n)$. The purpose of statistics is to estimate the parameter vector using the input and output values. If the function $f$ is a linear function of the parameters and the errors are additive, then we are dealing with a linear model. For this model we can write

$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \epsilon_i$.

Note that a linear model is linear in the parameters but not necessarily in the input variables. For example, $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \epsilon$ is a linear model, but $y = \beta_0 + e^{\beta_1 x} + \epsilon$ is not.

Assumptions

The basic assumptions for the analysis of a linear model are:

1) the model is linear in the parameters;
2) the error structure is additive;
3) the random errors have zero mean, equal variances, and are uncorrelated.

These assumptions are sufficient to deal with linear models. The assumption of uncorrelated errors with equal variances can be removed, but then the treatment becomes a little more complicated. Note that the normality assumption is not used for the general solution; it is needed only to design test statistics. These assumptions can be written in vector form:

$y = X\beta + \epsilon, \qquad E(\epsilon) = 0, \qquad V(\epsilon) = \sigma^2 I$,

where $y$, $0$ and $\epsilon$ are vectors and $X$ is a matrix, called the design matrix, input matrix, etc.; $I$ is the $n \times n$ identity matrix.

Solution

The solution with the given model and assumptions is

$\hat{\beta} = (X^T X)^{-1} X^T y$.

If we use the form of the model and write the least-squares criterion (since we want the solution with minimum least-squares error)

$S(\beta) = (y - X\beta)^T (y - X\beta)$,

take the first and second derivatives, and solve the resulting equation $X^T X \beta = X^T y$, we can verify that this solution is correct. The solution is unbiased: using the formula for the solution and the expression for $y$, we can write

$E(\hat{\beta}) = (X^T X)^{-1} X^T E(y) = (X^T X)^{-1} X^T X \beta = \beta$.

The variance of the estimator is

$V(\hat{\beta}) = (X^T X)^{-1} X^T V(y) X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$.

Here we used the form of the solution and assumption 3).
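A minimal numerical sketch of this solution (the simulated data, dimensions and noise level are our own choices, not from the slides):

```python
import numpy as np

# Simulated data: n = 50 observations, p = 3 parameters (including intercept).
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)           # y = X beta + eps

# Least-squares solution: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Covariance of the estimator: sigma^2 (X^T X)^{-1}, with sigma^2 estimated
# by the unbiased residual variance (see the next slide).
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - p)
cov_beta = sigma2_hat * np.linalg.inv(X.T @ X)
print(beta_hat, np.sqrt(np.diag(cov_beta)))
```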

Variance

To use the variance of the estimator we need to be able to estimate $\sigma^2$. Since it is the variance of the error term, we can find it using the form of the solution. For the estimated errors (the residuals, denoted by $e$) we can write

$e = y - X\hat{\beta} = (I - X(X^T X)^{-1} X^T)\, y = M y$, where $M = I - X(X^T X)^{-1} X^T$.

Using $y = X\beta + \epsilon$ immediately gives $e = M\epsilon$. Since the matrix $M$ is idempotent and symmetric, i.e. $M^2 = M = M^T$, we can write

$E(e^T e) = E(\epsilon^T M \epsilon) = \sigma^2 \,\mathrm{tr}(M) = \sigma^2 (n - p)$,

where $n$ is the number of observations and $p$ is the number of fitted parameters. Then the unbiased estimator for the variance of the residuals is

$\hat{\sigma}^2 = \frac{e^T e}{n - p}$.
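The step $\mathrm{tr}(M) = n - p$ deserves one line of justification, filled in here using the cyclic property of the trace:

$\mathrm{tr}(M) = \mathrm{tr}(I_n) - \mathrm{tr}\bigl(X(X^T X)^{-1} X^T\bigr) = n - \mathrm{tr}\bigl((X^T X)^{-1} X^T X\bigr) = n - \mathrm{tr}(I_p) = n - p$.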

Singular case

This form of the solution is valid if the matrices $X$ and $X^T X$ are non-singular, i.e. the rank of $X$ is equal to the number of parameters. If that is not true, then either singular value decomposition (SVD) or eigenvalue filtering techniques are used. Fortunately, most of the good properties of the linear model remain.

Singular value decomposition: any $n \times p$ matrix can be decomposed in the form

$X = U D V^T$,

where $U$ is an $n \times n$ and $V$ a $p \times p$ orthogonal matrix (i.e. multiplication of the transpose of the matrix with itself gives the identity matrix), and $D$ is an $n \times p$ diagonal matrix of the singular values. If $X$ is singular, then the number of non-zero diagonal elements of $D$ is less than $p$. Then for $X^T X$ we can write

$X^T X = V D^T D V^T$,

where $D^T D$ is a $p \times p$ diagonal matrix. If this matrix is non-singular, then we can write

$(X^T X)^{-1} = V (D^T D)^{-1} V^T$.

Since $D^T D$ is diagonal, its inverse is the diagonal matrix with the diagonal elements inverted. The main trick used in SVD techniques for equation solving is that when diagonal elements are 0 or close to 0, then 0 is used instead of their inverse. I.e. if $E$ is the inverse of $D^T D$, then the pseudo-inverse is calculated with

$E_{ii} = \begin{cases} 1/d_i^2 & \text{if } d_i \neq 0 \\ 0 & \text{if } d_i = 0 \end{cases}$

and $(X^T X)^{+} = V E V^T$.
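A short sketch of this trick in code (the tolerance rule and the example matrix are our own assumptions):

```python
import numpy as np

def svd_solve(X, y, tol=1e-10):
    """Least-squares solution via SVD, zeroing (near-)zero singular values."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(d) V^T
    d_inv = np.where(d > tol * d.max(), 1.0 / d, 0.0)  # pseudo-inverse of diag
    return Vt.T @ (d_inv * (U.T @ y))                  # beta = V D^+ U^T y

# Rank-deficient design matrix: the third column duplicates the second,
# so X^T X is singular, yet a (minimum-norm) solution is still returned.
X = np.array([[1.0, 2.0, 2.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 4.0],
              [1.0, 5.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(svd_solve(X, y))
```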

Test of hypothesis

Sometimes the question arises whether some of the parameters are significant. To test this type of hypothesis it is necessary to understand elements of the likelihood ratio test. Let us assume that we want to test the following null hypothesis against the alternative:

$H_0: \beta_1 = 0 \qquad \text{vs} \qquad H_1: \beta_1 \neq 0$,

where $\beta_1$ is a subvector of the parameter vector. This is equivalent to testing whether one or several parameters are 0 or not. The likelihood ratio test for this case works as follows. Assume we have the likelihood function for the parameters

$L(\beta) = L(\beta_1, \beta_2)$,

where the parameter vector is partitioned into two subvectors: $\beta = (\beta_1, \beta_2)$. Maximum likelihood estimators are then found for two cases: in the first case the whole parameter vector is allowed to vary; in the second case the subvector $\beta_1$ is fixed to the value defined by the null hypothesis. The values of the likelihood function for these two cases are found and their ratio is calculated. Let $L_0$ be the value of the likelihood under the null hypothesis (the subvector fixed to the given value) and $L_1$ the value under the alternative hypothesis. The ratio is

$\lambda = \frac{L_0}{L_1}$.

If this ratio is sufficiently small, then the null hypothesis is rejected. It is not always possible to find the exact distribution of this ratio.
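A standard fact is worth recalling here, though it is not stated on the slide: under regularity conditions, Wilks' theorem gives the asymptotic distribution of the ratio, which is why the test remains usable even when the exact distribution is intractable:

$-2\ln\lambda = -2(\ln L_0 - \ln L_1) \;\xrightarrow{d}\; \chi^2_r \quad \text{under } H_0$,

where $r$ is the number of parameters fixed by the null hypothesis.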

Likelihood ratio test for linear model

Let us assume that we have found the maximum likelihood values of the variances under the null and alternative hypotheses, $\hat{\sigma}_0^2$ and $\hat{\sigma}_1^2$. Furthermore, let $n$ be the number of observations, $p$ the number of all parameters, and $r$ the number of parameters we want to test. Then it turns out that the relevant likelihood ratio test statistic for this case is related to the F distribution. The relevant random variable is

$F = \frac{(\hat{\sigma}_0^2 - \hat{\sigma}_1^2)/r}{\hat{\sigma}_1^2/(n-p)} = \frac{(RSS_0 - RSS_1)/r}{RSS_1/(n-p)}$,

where $RSS = n\hat{\sigma}^2$ is the residual sum of squares. This random variable has an F distribution with $(r, n-p)$ degrees of freedom. This is true if the distribution of the errors is normal; as we know, in this case maximum likelihood and least squares coincide.

Note: the distribution is an F distribution if the null hypothesis is true. If it is not true, then the distribution becomes a non-central F distribution.

Note: if there are two random variables distributed by the $\chi^2$ distribution with $n$ and $m$ degrees of freedom respectively, then the ratio of these variables, each divided by its degrees of freedom, has an F distribution with $(n, m)$ degrees of freedom.
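A hedged sketch of this F test in code, assuming `X_full` and `X_null` are nested design matrices (names and helper are ours):

```python
import numpy as np
from scipy import stats

def f_test(X_full, X_null, y):
    """F test of H0: the extra columns of X_full (beyond X_null) have zero coefficients."""
    def rss(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return r @ r
    n, p = X_full.shape
    r_extra = p - X_null.shape[1]   # r: number of tested parameters
    F = ((rss(X_null) - rss(X_full)) / r_extra) / (rss(X_full) / (n - p))
    p_value = stats.f.sf(F, r_extra, n - p)   # upper tail of F(r, n-p)
    return F, p_value
```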

Analysis of diagnostics

Residuals and hat matrix: residuals are the differences between the observed and the fitted values:

$e = y - \hat{y} = y - X(X^T X)^{-1} X^T y = (I - H)\, y$, where $H = X(X^T X)^{-1} X^T$.

$H$ is called the hat matrix. The diagonal terms $h_i$ are the leverages of the observations. If one of these values is close to one, then the corresponding fitted value is determined by that single observation. Sometimes $h_i' = h_i/(1-h_i)$ is used to enhance high leverages.

A Q-Q plot can be used to check the normality assumption. A Q-Q plot is a plot of the quantiles of two distributions (in this case, the empirical distribution of the residuals vs the normal distribution). If the distributional assumption is correct, then this plot should be nearly linear.

Cook's distance: each observation in turn is removed (which is equivalent to removing the corresponding row from the design matrix $X$), and the parameters are estimated without that observation. Cook's distance is defined as

$D_i = \frac{(\hat{\beta}_{(i)} - \hat{\beta})^T X^T X (\hat{\beta}_{(i)} - \hat{\beta})}{p\, s^2}$.
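An illustrative sketch of these diagnostics, using the standard closed form of Cook's distance so no refitting is needed (the helper name is ours; `X`, `y` are an assumed design matrix and response):

```python
import numpy as np

def diagnostics(X, y):
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X^T X)^{-1} X^T
    h = np.diag(H)                          # leverages h_i
    e = y - H @ y                           # residuals e = (I - H) y
    s2 = e @ e / (n - p)                    # unbiased estimate of sigma^2
    # Cook's distance via the closed form D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2)
    cooks = (e**2 / (p * s2)) * h / (1 - h)**2
    return h, e, cooks
```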

Analysis of diagnostics: cont.

Other analysis tools include the standardised residuals

$r_i = \frac{e_i}{s\sqrt{1-h_i}}$

and the studentised (jackknife) residuals

$t_i = \frac{e_i}{s_{(i)}\sqrt{1-h_i}}$,

where $h_i$ is the leverage, $h_i'$ the enhanced leverage, $s^2$ the unbiased estimator of $\sigma^2$, and $s_{(i)}^2$ the unbiased estimator of $\sigma^2$ after removal of the $i$-th observation.

Exponential family

The exponential family of distributions has the form

$f(y; \theta, \phi) = \exp\!\left(\frac{A(y)\,\theta - b(\theta)}{S(\phi)} + c(y, \phi)\right)$,

where $S(\phi)$ is called a scale parameter. When $A$ is the identity, the family is called the natural exponential family. By a change of variables $A$ can be reduced to the identity, so it is usual to use

$f(y; \theta, \phi) = \exp\!\left(\frac{y\theta - b(\theta)}{S(\phi)} + c(y, \phi)\right)$.

Many distributions, including the normal, binomial, Poisson and exponential distributions, belong to this family. The moment generating function, if it exists for this distribution, is

$M(t) = \exp\!\left(\frac{b(\theta + t S(\phi)) - b(\theta)}{S(\phi)}\right)$.

Then the first moment (mean value) and the second central moment can be calculated:

$E(y) = b'(\theta), \qquad V(y) = b''(\theta)\, S(\phi)$.
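As a worked check (our own example, not on the slide), the normal distribution with known variance fits this form:

$f(y; \mu, \sigma^2) = \exp\!\left(\frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \tfrac{1}{2}\ln(2\pi\sigma^2)\right)$,

so $\theta = \mu$, $b(\theta) = \theta^2/2$ and $S(\phi) = \sigma^2$, and the moment formulas give $E(y) = b'(\theta) = \mu$ and $V(y) = b''(\theta) S(\phi) = \sigma^2$, as expected.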

Generalised linear model

If the distribution of the errors is one of the distributions from the exponential family, and some function of the expected value of the observations is a linear function of the parameters, then a generalised linear model is used:

$g(\mu) = X\beta$, where $\mu = E(y)$.

The function $g$ is called the link function. Here is a list of popular distributions and their corresponding link functions:

binomial - logit: $g(p) = \ln(p/(1-p))$
normal - identity
Gamma - inverse
Poisson - log

All good statistical packages have implementations of many generalised linear models. To use them, finding initial values might be necessary. To fit a generalised linear model the likelihood function is used (if $\phi$ is constant and we ignore constants not depending on the parameters of interest):

$l(\theta) = \sum_i \frac{y_i \theta_i - b(\theta_i)}{S(\phi)}$.

Then the expression for the natural exponential family can be used to fit the model. The most natural way is to use $\theta = X\beta$.
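One possible sketch of such a fit using statsmodels (the package choice and the simulated data are our own, not from the slide):

```python
import numpy as np
import statsmodels.api as sm

# Simulated binary data for a logistic model.
rng = np.random.default_rng(1)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))   # design matrix with intercept
p = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0, 2.0]))))
y = rng.binomial(1, p)                         # binomial response

# Binomial family with its canonical logit link: ln(p/(1-p)) = X beta.
model = sm.GLM(y, X, family=sm.families.Binomial())
result = model.fit()                           # iteratively reweighted least squares
print(result.params, result.deviance)
```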

Additive model and non-linear models

Let us briefly consider several non-linear models.

1) Additive model. If the model is described as

$y = \beta_0 + \sum_j s_j(x_j) + \epsilon$,

then the model is called an additive model, where the $s_j$ can be some set of functions, usually smooth functions. These types of models are used for smoothing.

2) If the model is a non-linear function of the parameters and the input variables, then it is called a non-linear model. In general form it can be written

$y = f(x, \beta) + \epsilon$.

The form of the function depends on the subject studied. These types of models do not have closed-form, elegant solutions. Non-linear least squares may not have a unique solution or may have many local minima. Such models are usually solved iteratively: initial values of the parameters are found and then they are iteratively updated using some optimisation technique. The statistical properties of non-linear models are not straightforward to derive.
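An illustrative non-linear least-squares fit (the exponential-decay model, starting values and data are our own assumptions):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)   # non-linear in the parameter b

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 40)
y = model(x, 2.5, 1.3) + rng.normal(scale=0.1, size=x.size)

# Iterative fit from initial values p0; the result can depend on p0
# when the objective has several local minima.
params, cov = curve_fit(model, x, y, p0=(1.0, 1.0))
print(params, np.sqrt(np.diag(cov)))
```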

Exercise: linear model

Consider hypothesis testing. We have $n$ observations, and the parameter vector has dimension $p$. We partition the parameter vector as follows (the dimension of $\beta_1$ is $r$):

$\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$,

with the corresponding partitioning of the design (input) matrix

$X = (X_1 \; X_2)$.

Assume that all observations are normally distributed with equal variance and are uncorrelated. Find the maximum likelihood estimators for the parameters and the variance under the null and alternative hypotheses:

$H_0: \beta_1 = 0 \qquad \text{vs} \qquad H_1: \beta_1 \neq 0$.

Hint: the minus log-likelihood function under the null hypothesis is (since $\beta_1 = 0$)

$-l_0 = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{(y - X_2\beta_2)^T (y - X_2\beta_2)}{2\sigma^2}$,

and under the alternative hypothesis

$-l_1 = \frac{n}{2}\ln(2\pi\sigma^2) + \frac{(y - X\beta)^T (y - X\beta)}{2\sigma^2}$.

Find the minima of these functions; they will give the maximum likelihood estimators.
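For reference, a sketch of where the minimisation leads (the slide leaves this as an exercise; note the maximum likelihood variance divides by $n$, not $n-p$):

$\hat{\beta}_2^{(0)} = (X_2^T X_2)^{-1} X_2^T y, \qquad \hat{\sigma}_0^2 = \frac{(y - X_2\hat{\beta}_2^{(0)})^T (y - X_2\hat{\beta}_2^{(0)})}{n}$,

$\hat{\beta} = (X^T X)^{-1} X^T y, \qquad \hat{\sigma}_1^2 = \frac{(y - X\hat{\beta})^T (y - X\hat{\beta})}{n}$.

These estimators feed directly into the F statistic from the likelihood ratio test slide above.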