Workshop in R & GLMs: #3 Diane Srivastava University of British Columbia
Housekeeping: ls() lists the variables in the global environment. rm(list=ls()) gets rid of EVERY variable. q() quits, with a prompt to save the workspace or not.
Janka exercise. Conclusion: the y transformation that optimizes the model fit (highest log likelihood) is not the y transformation that gives the most normal residuals.
This workshop: Linear, general linear, and generalized linear models. Understand how GLMs work [Excel simulation]. Definitions, e.g. deviance, link functions. Poisson GLMs [R exercise]. Binomial distribution and logistic regression. Fit GLMs in R! [Exercise]
In the beginning there were… Linear models: a normally-distributed y fit to a continuous x. But wait… couldn't we just code a categorical variable to be continuous? [Figure: y plotted against x]
Then there were… General Linear Models: a normally-distributed y fit to a continuous OR categorical x. But wait… why do we force our data to be normal when often it isn't?
Generalized linear models: No more need for tedious transformations! Proud to be Poisson! All variances are unequal, but some are more unequal than others… Because most things in life aren't normal! Distribution solution!
What linear models do: 1. Transform y. 2. Fit a line to the transformed y. 3. Back-transform to linear y. [Figure: Y plotted against X, and log Y plotted against X]
What GLMs do: 1. Start with an arbitrary fitted line. 2. Back-transform the line into linear space. 3. Calculate residuals. 4. Improve the fitted line to maximize likelihood. Repeat over many iterations. [Figure: Y plotted against X, and log fitted values plotted against X]
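The two approaches can be put side by side in R. This is a minimal sketch on simulated data, not the workshop's own example: the data, the true coefficients (0.5 and 0.2), and the names fit.lm and fit.glm are all assumptions made for illustration.

```r
# Hypothetical data: y grows exponentially with x, with multiplicative noise
set.seed(1)
x <- seq(1, 10, length.out = 50)
y <- exp(0.5 + 0.2 * x) * exp(rnorm(50, sd = 0.1))

# Linear-model route: transform y first, then fit a line by least squares
fit.lm <- lm(log(y) ~ x)

# GLM route: leave y untransformed and fit iteratively through a log link
fit.glm <- glm(y ~ x, family = gaussian(link = "log"))

coef(fit.lm)    # intercept and slope on the log scale
coef(fit.glm)   # also on the log (link) scale
fit.glm$iter    # number of iterations the GLM fit needed
```

Both sets of coefficients are on the log scale, but they need not match exactly, because the two routes make different assumptions about where the error sits.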
Maximum likelihood: an iterative process is used to find the model equation that has the highest probability (likelihood) of producing the observed y values given the x values. The equation for the likelihood depends on the error distribution chosen. Least squares, by contrast, minimizes the sum of squared deviations from the model. If the errors are normally distributed, maximum likelihood gives the same answer as least squares.
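The claim that least squares and maximum likelihood agree under normal errors can be checked numerically. A rough sketch, assuming simulated data and, to keep the search one-dimensional, an intercept and error standard deviation held at their simulated values (all names and numbers here are hypothetical):

```r
# Hypothetical data with normal errors around a straight line
set.seed(2)
x <- runif(40, 0, 10)
y <- 2 + 3 * x + rnorm(40, sd = 1)
intercept <- 2   # assumption: held at the simulated value

# Try a grid of candidate slopes
slopes <- seq(2, 4, by = 0.001)

# Least squares criterion: sum of squared deviations for each candidate
ss <- sapply(slopes, function(m) sum((y - (intercept + m * x))^2))

# Likelihood criterion: normal log likelihood for each candidate
ll <- sapply(slopes, function(m)
  sum(dnorm(y, mean = intercept + m * x, sd = 1, log = TRUE)))

slopes[which.min(ss)]   # slope with the smallest sum of squares
slopes[which.max(ll)]   # slope with the highest log likelihood
```

The two criteria pick the same slope because the normal log likelihood is a constant minus the sum of squares divided by twice the error variance.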
GLM simulation exercise: Simulates fitting a model with normal errors and a log link to data. Your task: (1) understand how the spreadsheet works; (2) find the best slope through an iterative process.
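For those who prefer R to Excel, the same kind of search can be sketched in a few lines. This is not the workshop spreadsheet: the data, the fixed intercept, the known error standard deviation, and the step-halving rule are all assumptions chosen to keep the illustration short.

```r
# Hypothetical data: normal errors around an exponential curve (log link)
set.seed(42)
x <- 1:20
true.slope <- 0.15
intercept <- 1                     # assumption: intercept is known and fixed
y <- exp(intercept + true.slope * x) + rnorm(length(x), sd = 0.5)

# Log likelihood of a candidate slope
loglik <- function(slope) {
  mu <- exp(intercept + slope * x) # back-transform the line into linear space
  sum(dnorm(y, mean = mu, sd = 0.5, log = TRUE))
}

# Iterative search: try a step either side, keep whichever improves the
# likelihood, and halve the step when neither neighbour is better
slope <- 0
step <- 0.1
for (i in 1:60) {
  candidates <- c(slope - step, slope, slope + step)
  best <- candidates[which.max(sapply(candidates, loglik))]
  if (best == slope) step <- step / 2 else slope <- best
}
slope   # should land close to true.slope
```

glm() uses a more efficient algorithm than this, but the idea is the same: adjust the line, recompute the likelihood, repeat.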
Generalized linear models: In least squares, we fit: y = mx + b + error. In a GLM, the model is fit more indirectly: expected y = g(mx + b), where g is a function, the inverse of which is called the "link function": link fn(expected y) = mx + b. The error is not a term inside g; it enters through the distribution chosen for y around its expected value.
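In R, a family object carries both the link function and its inverse, which makes the relationship easy to inspect. A small sketch (the input values are arbitrary):

```r
# A family object stores the link function and its inverse
fam <- poisson(link = "log")

expected.y <- c(1, 10, 100)
eta <- fam$linkfun(expected.y)   # link fn(expected y): here, the natural log
eta
fam$linkinv(eta)                 # inverse link (g): recovers expected y

# The same components exist for other families, e.g. the logit link
binomial(link = "logit")$linkfun(0.5)   # logit of 0.5
```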
LMs vs GLMs
LMs: use least squares; assume normality; based on sums of squares; fit the model to transformed y.
GLMs: use maximum likelihood; specify one of several distributions; based on deviance; fit the model to untransformed y by means of a link function.
All that really matters… By using a log link function, we do not need to calculate log(0): the link is applied to the expected y, not to the data. Be careful! A log link model predicts log y, not y! The error distribution need not be normal: Poisson, binomial, gamma, Gaussian (=normal).
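Both points can be seen in a short R sketch. The count data below are simulated (an assumption for illustration), with low means so that zeros are likely to occur.

```r
# Hypothetical count data with low means, so zeros are likely
set.seed(7)
x <- seq(0, 5, length.out = 40)
y <- rpois(40, lambda = exp(-1 + 0.6 * x))

any(is.infinite(log(y)))   # TRUE whenever y contains a zero: log(y) breaks

# The GLM never takes log(y); the log link applies to the expected y
fit <- glm(y ~ x, family = poisson(link = "log"))

eta <- predict(fit, type = "link")      # predictions on the log scale
mu  <- predict(fit, type = "response")  # predictions on the scale of y
head(cbind(eta, exp.eta = exp(eta), mu))
```

By default predict() returns values on the link scale, so remember to back-transform or ask for type = "response".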
Exercise 1. Open the file Rlecture.csv: diane<-read.table(file.choose(),sep=",",header=TRUE) 2. Look at the dataframe. Make treat a factor ("treat"). 3. Fit this model: my.first.glm<-glm(growth~size*treat, family=poisson(link=log), data=diane); summary(my.first.glm) 4. Model diagnostics: par(mfrow=c(2,2)); plot(my.first.glm)
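If Rlecture.csv is not to hand, the steps can be rehearsed on a stand-in dataframe. Everything about the data below is an assumption: the column names are taken from the model formula, and the values are simulated rather than read from the workshop file.

```r
# Stand-in for Rlecture.csv (simulated; the real file is not reproduced here)
set.seed(3)
n <- 60
diane <- data.frame(size  = runif(n, 1, 10),
                    treat = rep(c(0, 1), each = n / 2))
diane$growth <- rpois(n, lambda = exp(0.5 + 0.15 * diane$size +
                                      0.3 * diane$treat))

# Step 2: look at the dataframe and make treat a factor
str(diane)
diane$treat <- factor(diane$treat)

# Step 3: fit the Poisson GLM with a log link
my.first.glm <- glm(growth ~ size * treat,
                    family = poisson(link = log), data = diane)
summary(my.first.glm)
```

The diagnostic plots from step 4 work on this object in the same way as on the real data.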
Binomial errors: Variance gets constrained near the limits; the binomial distribution accounts for this. Type 1: the classic example is a series of trials, each resulting in success (value=1) or failure (value=0). Type 2: data that look continuous but are bounded (e.g. % mortality, bounded between 0% and 100%).
Logistic regression: Least squares: arcsine transformations. GLMs: use a logit (or probit) link with binomial errors. [Figure: y plotted against x]
Logit: p = proportion of successes. If $p = e^{ax+b} / (1 + e^{ax+b})$, calculate: $\log_e(p/(1-p))$, which equals $ax + b$.
Logits continued: Output from logistic regression with a logit link: predicted $\log_e(p/(1-p)) = ax + b$. To obtain any expected value of p, input a and b into the original equation: $p = e^{ax+b} / (1 + e^{ax+b})$.
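The back-transformation is easy to get wrong by hand, so here is a small R sketch. The slope a and intercept b are made-up values, not output from any workshop model; plogis() and qlogis() are R's built-in inverse logit and logit.

```r
# Hypothetical coefficients from a logistic regression
a <- 0.8    # slope (assumed value)
b <- -2     # intercept (assumed value)
x <- c(0, 1, 2.5, 5)

logit.p <- a * x + b                      # what the model predicts
p <- exp(logit.p) / (1 + exp(logit.p))    # back-transform by hand
p

plogis(logit.p)   # the same back-transformation, built in
qlogis(p)         # and the logit, log(p / (1 - p)), going the other way
```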
Binomial GLMs: Type 1 binomial: simply set family = binomial(link=logit). Type 2 binomial: first create a vector of % not parasitized. Then "cbind" into a matrix (% parasitized, % not parasitized). Then run your binomial glm (link = logit) with the matrix as your y. Note that glm reads the two columns as numbers of successes and failures, so use the underlying counts where you have them.
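Both types can be tried on simulated data. This is a sketch under stated assumptions: the dataframe d, the number of hosts per replicate, and all coefficients are invented for illustration, and the Type 2 response uses counts rather than percentages.

```r
# Hypothetical data for both types of binomial response
set.seed(11)
n <- 80
d <- data.frame(size  = runif(n, 1, 10),
                treat = factor(rep(c(0, 1), each = n / 2)))

# Type 1: each row is a single trial, success (1) or failure (0)
d$survival <- rbinom(n, size = 1, prob = plogis(-2 + 0.5 * d$size))
fit1 <- glm(survival ~ size * treat,
            family = binomial(link = logit), data = d)

# Type 2: each row summarizes several trials (assumed 20 hosts per row)
hosts <- 20
d$parasitized     <- rbinom(n, size = hosts, prob = plogis(2 - 0.4 * d$size))
d$not.parasitized <- hosts - d$parasitized
fit2 <- glm(cbind(parasitized, not.parasitized) ~ size * treat,
            family = binomial(link = logit), data = d)

coef(fit1)
coef(fit2)
```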
Homework: 1. Fit the binomial glm survival ~ size*treat. 2. Fit the binomial glm parasitism ~ size*treat. 3. Predict what size has 50% parasitism in treatment "0".
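A hint for question 3, as a worked equation. It assumes that treatment "0" is the reference level of the factor, so that within that treatment the fitted logit is a straight line in size with slope a and intercept b:

$$
p = 0.5 \;\Rightarrow\; \log_e\!\left(\frac{p}{1-p}\right) = \log_e(1) = 0 \;\Rightarrow\; ax + b = 0 \;\Rightarrow\; x = -\frac{b}{a}
$$

So the size with 50% parasitism is minus the intercept divided by the size coefficient, both read from the model summary.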