Log-linear and logistic models

Contents: generalised linear model; ANOVA revisited; log-linear model (Poisson distribution); logistic model (binomial distribution); deviances; R commands for log-linear and logistic models.

Generalised linear model
Linear models are useful when the distribution of the observations is normal or can be approximated by a normal distribution. Even when it is not, for a large number of observations the normal approximation is often adequate. However, there are many cases where a different model should be used. The generalised linear model is a way of extending linear models to a wide range of distributions: it can be used if the distribution of the observations belongs to the generalised exponential family and some function of the mean of this distribution is linear in the input parameters.
Recall the generalised exponential family:
$f(y \mid \theta) = \exp\{A(\theta)B(y) + C(\theta) + D(y)\}$
The following distributions belong to the generalised exponential family (note that the parameters we are considering are the mean values):
Normal (known $\sigma^2$): $A(\mu) = \mu/\sigma^2$
Binomial: $A(\pi) = \log\{\pi/(1-\pi)\}$
Poisson: $A(\lambda) = \log\lambda$
Other members of this family include the gamma, the exponential and many others. If some function of the mean ($\mu$, $\log\{\pi/(1-\pi)\}$ and $\log\lambda$ for the above cases) is a linear function of the explanatory variables, i.e. linear in the model parameters, then the model can be handled as a generalised linear model. Usually this function is taken to be $A(\theta)$ (the canonical link).
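To make the exponential-family form concrete, here is a small Python sketch (standard library only; the slides themselves use R) that writes the Poisson and binomial probabilities both in their usual form and in the form exp{A(θ)B(y) + C(θ) + D(y)}, and checks that the two agree. The decomposition into A, B, C and D is the standard one; the function names are illustrative.

```python
import math

def poisson_pmf(y: int, lam: float) -> float:
    """Poisson probability written the usual way: lam^y * exp(-lam) / y!."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

def poisson_expfam(y: int, lam: float) -> float:
    """Same probability in the form exp{A(lam)B(y) + C(lam) + D(y)}."""
    A = math.log(lam)          # A(lambda) = log(lambda): the log link
    B = y                      # B(y) = y
    C = -lam                   # C(lambda) = -lambda
    D = -math.lgamma(y + 1)    # D(y) = -log(y!)
    return math.exp(A * B + C + D)

def binomial_pmf(y: int, n: int, p: float) -> float:
    """Binomial probability written the usual way."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def binomial_expfam(y: int, n: int, p: float) -> float:
    """Same probability in exponential-family form."""
    A = math.log(p / (1 - p))      # A(pi) = logit(pi)
    B = y                          # B(y) = y
    C = n * math.log(1 - p)        # C(pi) = n * log(1 - pi)
    D = math.log(math.comb(n, y))  # D(y) = log of the binomial coefficient
    return math.exp(A * B + C + D)

# The two forms agree for every value checked.
for lam in (0.5, 2.0, 7.3):
    for y in range(15):
        assert math.isclose(poisson_pmf(y, lam), poisson_expfam(y, lam), rel_tol=1e-9)
for p in (0.1, 0.5, 0.83):
    for y in range(11):
        assert math.isclose(binomial_pmf(y, 10, p), binomial_expfam(y, 10, p), rel_tol=1e-9)
```

In each case A is exactly the function of the mean that the generalised linear model makes linear: the logarithm for the Poisson and the logit for the binomial.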

ANOVA revisited
Let us recall the purpose of ANOVA: we want to know whether different treatments (levels of one or more factors) have different effects. If there are only two levels, the t-test is suitable for testing the difference between the means. If there are more than two, we design the experiment according to one of the standard schemes (n-way crossed, n-fold nested, or a mixture of them). Once we have the results of the experiment, we fit linear models under different hypotheses and calculate likelihood ratio (LR) tests. The LR statistic turns out to be a function of the ratio of the residual sums of squares under the different hypotheses, and under the null hypothesis this ratio follows an F-distribution (if the observations are normally distributed). If the F-value is large enough, we say that the differences between the means are significant; if it is small, we say that they are not significant and we can remove the corresponding parameters from the model.
One of the assumptions of the ANOVA model is that the observations are normally distributed. Another, hidden, assumption is that the observations are continuous. If the number of observations is large enough, these assumptions work very well. However, there are cases where ANOVA based on a linear model is not adequate. Examples are:
Outcomes are success or failure. In this case the binomial distribution is more adequate.
Outcomes are numbers of occurrences (counts). In this case the Poisson distribution is more adequate.
A further feature of the binomial and Poisson distributions is that, being discrete, they are natural models for counts of observations falling into categories.
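As a reminder of how the F-value is built from sums of squares, here is a minimal Python sketch of the one-way ANOVA F statistic (standard library only; the function name is illustrative). It compares the between-group mean square with the within-group (residual) mean square.

```python
def one_way_anova_f(groups):
    """F statistic for H0: all group means are equal (one-way ANOVA).

    groups: list of lists of observations, one list per factor level.
    Returns (F, numerator df, denominator df).
    """
    all_obs = [x for g in groups for x in g]
    n_total, k = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far the group means lie from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (residual) sum of squares under the full model
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# With two groups, F equals the square of the pooled two-sample t statistic.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```

For the two-group example the pooled t statistic is 3 / sqrt(2/3), whose square is 13.5, which illustrates why the t-test and ANOVA agree when there are only two levels.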

R commands for ANOVA
First decide what type of ANOVA it is. Then decide what the result of the experiment (the response) is and what the factors are, and define the factors. There are several commands to define factors:
f1 <- gl(k, n, N)   # factor with k levels, each repeated n times, total length N
or it can be done directly:
g1 <- c(...)        # vector of level codes
f1 <- factor(g1)
Then use a linear model to fit the data:
result <- lm(data ~ formula)
Here data is the result of the experiment and formula specifies what we want to fit. If we have two factors f1 and f2: to fit the effects of f1 and f2, use the formula f1 + f2; to fit the effects of f1 and f2 and their interaction, use f1*f2 (equivalent to f1 + f2 + f1:f2); to fit only the interaction, use f1:f2. To fit the effect of f1 and the interaction between f1 and f2 (f2 nested within f1), use f1*f2 - f2, which is equivalent to f1 + f1:f2 and to f1/f2. Once we have the result of the linear model, we can use the following commands:
anova(result)
plot(result)
summary(result)
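The formula operators can be checked term by term. The following Python sketch uses hypothetical helpers (they are not part of R) that mimic how R expands a formula into a set of terms, to confirm which terms remain when a main effect is subtracted from a crossed formula.

```python
# Hypothetical helpers (not part of R) that mimic how R expands model formulas
# into sets of terms, so two formulas can be compared term by term.

def crossed(a: str, b: str) -> set:
    """Terms of a*b: both main effects and their interaction."""
    return {a, b, f"{a}:{b}"}

def nested(a: str, b: str) -> set:
    """Terms of a/b: the main effect of a plus b within a."""
    return {a, f"{a}:{b}"}

def drop(terms: set, term: str) -> set:
    """Terms of `formula - term`."""
    return terms - {term}

print(sorted(drop(crossed("f1", "f2"), "f2")))  # ['f1', 'f1:f2']
print(sorted(drop(crossed("f1", "f2"), "f1")))  # ['f1:f2', 'f2']
```

Removing f2 from f1*f2 leaves the nested model f1 + f1:f2, whereas removing f1 leaves f2 + f1:f2 (f1 nested within f2).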

Poisson distribution: log-linear model
If the observations follow a Poisson distribution, a log-linear model should be used. Recall that the Poisson distribution belongs to the exponential family and that the function A of the mean value is the logarithm, so the model can be handled as a generalised linear model:
$\log\lambda_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$
When is the log-linear model appropriate? When the outcomes are frequencies (counts, expressed as integers) and the parameters are categorical. Once the log-linear model has been fitted, the estimated mean is found using the exponential function:
$\hat\lambda_i = \exp(\hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_p x_{ip})$
Example: relation between gray hair and age. The counts are cross-classified by gray hair (yes, no) and age (under 40, over 40), giving a 2×2 contingency table. This is similar to a two-way crossed ANOVA model, and we could analyse this type of data using the log-linear model.
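For a two-way table like the gray hair and age example, the log-linear model without interaction, log μ_ij = u + a_i + b_j (the independence model), has a closed-form fit: the fitted counts are row total × column total / grand total. The Python sketch below uses hypothetical counts (the slide's numbers were not preserved) and checks that the log fitted means are additive, i.e. the interaction contrast is zero.

```python
import math

def independence_fit(table):
    """Fitted counts under log(mu_ij) = u + a_i + b_j (no interaction).

    For the Poisson log-linear independence model these are the maximum
    likelihood estimates: row_total_i * col_total_j / n.
    """
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

# Hypothetical counts, for illustration only.
# Rows: gray hair yes / no; columns: under 40 / over 40.
counts = [[10, 40], [60, 20]]
fitted = independence_fit(counts)

# Under independence the log fitted means are additive, so the
# interaction contrast log mu11 - log mu12 - log mu21 + log mu22 is zero.
contrast = (math.log(fitted[0][0]) - math.log(fitted[0][1])
            - math.log(fitted[1][0]) + math.log(fitted[1][1]))
print(fitted, contrast)
```

Whether the data support dropping the interaction term is then a question for the deviance, comparing these fitted counts with the observed ones.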

Binomial distribution: logistic model
If the result of the experiment follows a binomial distribution, i.e. each outcome is 0 or 1 (failure or success), then the logistic model can be used. Recall that the function A of the mean value has the form
$A(\pi) = \log\frac{\pi}{1-\pi}$
This function has a special name: the logit. It has several advantages:
If $\mathrm{logit}(\pi)$ has been estimated, then we can recover $\pi$, and it is always between 0 and 1.
If the probability of success is larger than that of failure, this function is positive; otherwise it is negative.
Swapping success and failure changes only the sign of this function.
This model can be used when outcomes are binary (0 and 1). If $\mathrm{logit}(\pi)$ is linear, then we can find $\pi$:
$\pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}$
For the logistic model, either grouped data (the fraction of successes in each group) or individual items (each individual has success (1) or failure (0)) can be used. The ratio of the probability of success to the probability of failure, $\pi/(1-\pi)$, is called the odds, so the logit is the log-odds.
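The properties of the logit listed above are easy to check numerically. This Python sketch (function names are illustrative) computes the logit and its inverse; the inverse is arranged so that large negative arguments do not overflow.

```python
import math

def logit(p: float) -> float:
    """Log-odds log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(eta: float) -> float:
    """Inverse logit exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

p = 0.8
odds = p / (1 - p)                   # about 4 to 1 in favour of success
print(odds, logit(p), logit(1 - p))  # logit(0.2) is minus logit(0.8)
print(inv_logit(logit(0.3)))         # recovers 0.3 (up to rounding)
```

Any real value of the linear predictor maps back to a probability strictly inside (0, 1), which is why a linear model for the logit never produces impossible probabilities.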

Deviances
In the linear model we maximise the likelihood under the full model and under the hypothesis. The ratio of the likelihoods under the two hypotheses (null and alternative) is then related to the F-distribution. The interpretation is how much the residual variance would increase if we removed part of the model (the null hypothesis). In logistic and log-linear model analysis, the likelihood function is again maximised under the null and alternative hypotheses. Then minus twice the logarithm of the ratio of the likelihoods under these two hypotheses is asymptotically chi-squared distributed:
$-2\log\frac{L(\hat\theta_0)}{L(\hat\theta_1)} \sim \chi^2_{p_1 - p_0}$ (asymptotically, under the null hypothesis)
where $p_1$ and $p_0$ are the numbers of free parameters under the alternative and null hypotheses. That is why, in log-linear and logistic regression, it is usual to talk about deviances and chi-squared statistics instead of variances and F-statistics. Analysis based on log-linear and logistic models (in general, on generalised linear models) is usually called analysis of deviance, because the statistic measures the deviation between the fitted model and the observations. Another test is based on Pearson's chi-squared statistic; the two tests behave similarly as the number of observations increases. Pearson's chi-squared is calculated using
$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ are the observed counts and $E_i$ the counts expected under the fitted model.
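Both statistics can be computed directly for a contingency table. The Python sketch below tests independence in a 2×2 table of hypothetical counts, using the likelihood-ratio (deviance) statistic G² = 2 Σ O log(O/E) and Pearson's X². With one degree of freedom, a chi-squared variable is the square of a standard normal, so its tail probability can be written with the complementary error function and no statistics library is needed.

```python
import math

def expected_independence(table):
    """Expected counts row_total * col_total / n under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]

def deviance_g2(observed, expected):
    """Likelihood-ratio statistic G^2 = 2 * sum O * log(O / E), with 0 * log 0 = 0."""
    return 2 * sum(o * math.log(o / e)
                   for orow, erow in zip(observed, expected)
                   for o, e in zip(orow, erow) if o > 0)

def pearson_x2(observed, expected):
    """Pearson's X^2 = sum (O - E)^2 / E."""
    return sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))

def chi2_sf_1df(x):
    """P(chi^2_1 > x), using chi^2_1 = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical 2x2 counts, for illustration only.
observed = [[10, 40], [60, 20]]
expected = expected_independence(observed)
g2, x2 = deviance_g2(observed, expected), pearson_x2(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1) = 1 here
print(g2, x2, chi2_sf_1df(g2), chi2_sf_1df(x2))
```

The tail-probability shortcut only covers one degree of freedom; larger tables would need a general chi-squared distribution function.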

R commands for log-linear model
The log-linear model can be analysed using the generalised linear model. Once the factors, the data and the formula have been decided, we can use:
result <- glm(data ~ formula, family = poisson)
This gives us the fitted model. Then we can use:
anova(result, test = "Chisq")   # for a glm object this calls anova.glm
plot(result)
summary(result)
Interpretation of the results is similar to the linear model ANOVA table, and degrees of freedom are defined similarly. The only difference is that deviances are used instead of sums of squares.
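To show what a Poisson fit does internally, here is a Python sketch of Newton-Raphson for log(μ_i) = b0 + b1·x_i. For the canonical log link this coincides with the iteratively reweighted least squares that R's glm uses, but this is an illustrative sketch, not R's implementation, and the data are hypothetical. With a 0/1 covariate the fitted means should equal the two group means.

```python
import math

def fit_poisson_glm(x, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of log(mu_i) = b0 + b1 * x_i for counts y_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the overall mean
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector U = X^T (y - mu)
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information I = X^T diag(mu) X (a 2x2 matrix)
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: delta = I^{-1} U
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def poisson_deviance(y, mu):
    """2 * sum(y * log(y / mu) - (y - mu)), with y * log(y / mu) = 0 when y = 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

x = [0, 0, 0, 1, 1, 1]    # hypothetical 0/1 covariate (e.g. a two-level factor)
y = [2, 3, 4, 6, 8, 10]   # hypothetical counts
b0, b1 = fit_poisson_glm(x, y)
mu = [math.exp(b0 + b1 * xi) for xi in x]
null_mu = [sum(y) / len(y)] * len(y)    # intercept-only model
print(math.exp(b0), math.exp(b0 + b1))  # group means, approximately 3 and 8
print(poisson_deviance(y, mu), poisson_deviance(y, null_mu))
```

The drop in deviance from the intercept-only model to the two-group model, compared with a chi-squared distribution on 1 degree of freedom, is the kind of comparison reported by anova(result, test = "Chisq").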

R commands for logistic regression
This is similar to the log-linear model: decide what the data, the factors and the formula are, then use the generalised linear model to fit:
result <- glm(data ~ formula, family = binomial)
For individual items the response is a 0/1 vector; for grouped data it can be given as a two-column matrix, cbind(successes, failures). Then analyse using:
anova(result, test = "Chisq")
summary(result)
plot(result)
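In the same spirit, here is a Python sketch of a logistic fit for 0/1 outcomes by Newton-Raphson (illustrative only, not R's code; the data are hypothetical). The weights π(1 − π) are the binomial variances. With a 0/1 covariate, the fitted probabilities should equal the observed proportions in each group, and exp(b1) is the odds ratio. The sketch assumes the overall proportion is strictly between 0 and 1 and that the groups are not perfectly separated; otherwise the estimates diverge.

```python
import math

def inv_logit(eta):
    """exp(eta) / (1 + exp(eta)), arranged to avoid overflow."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def fit_logistic(x, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of logit(pi_i) = b0 + b1 * x_i for 0/1 outcomes y_i."""
    ybar = sum(y) / len(y)
    b0, b1 = math.log(ybar / (1 - ybar)), 0.0  # start from the overall proportion
    for _ in range(iters):
        p = [inv_logit(b0 + b1 * xi) for xi in x]
        w = [pi * (1 - pi) for pi in p]  # binomial variances (IRLS weights)
        # Score U = X^T (y - p) and information I = X^T diag(w) X
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        i00 = sum(w)
        i01 = sum(xi * wi for xi, wi in zip(x, w))
        i11 = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def binomial_deviance(y, p):
    """-2 * log-likelihood for 0/1 data (the saturated model has likelihood 1)."""
    return -2 * sum(math.log(pi) if yi == 1 else math.log(1 - pi)
                    for yi, pi in zip(y, p))

x = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical group indicator
y = [0, 0, 1, 0, 1, 1, 0, 1]   # hypothetical successes (1) and failures (0)
b0, b1 = fit_logistic(x, y)
print(inv_logit(b0), inv_logit(b0 + b1), math.exp(b1))  # approximately 0.25, 0.75 and 9
```

Comparing binomial_deviance for this model with that of the intercept-only model (all probabilities 0.5 here) gives the change in deviance that the analysis of deviance tests.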