# © Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.

## Presentation on theme: "© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32."— Presentation transcript:

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 2 The Exam! 25 multiple choice questions, similar to term test (60% for 330, 50% for 762) 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions (STATS 762) You have to do a compulsory extra question Held on am of Wed 31 th October 2012

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 3 Help Schedule I will be available in Rm 265 from 10:30 am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting –Tutors: as per David Smiths’s email

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 4 STATS 330: Course Summary The course was about Graphics for data analysis Regression models for data analysis

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 5 Graphics Important ideas: Visualizing multivariate data –Pairs plots –3d plots –Coplots –Trellis plots Same scales Plots in rows and columns Diagnostic plots for model criticism

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 6 Regression models We studied 3 types of regression: –“Ordinary” (normal, least squares) regression for continuous responses –Logistic regression for binomial responses –Poisson regression for count responses (log-linear models)

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 7 Normal regression Response is assumed to be N( ,  2 ) Mean is a linear function of the covariates     x   k x k Covariates can be either continuous or categorical Observations independent, same variance

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 8 Logistic regression Response ( s “successes” out of n) is assumed to be Binomial Bin(n,  ) Logit of Probability log(  is a linear function of the covariates  log(      x   k x k Covariates can be either continuous or categorical Observations independent

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 9 Poisson regression Response is assumed to be Poisson(  ) Log of mean log(  is a linear function of the covariates (log-linear models)  log(     x   k x k  (Or, equivalently  exp     x   k x k ) Covariates can be either continuous or categorical Observations independent

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 10 Interpretation of  - coefficients For continuous covariates: –In normal regression,  is the increase in mean response associated with a unit increase in x –In logistic regression,  is the increase in log odds associated with a unit increase in x –In Poisson regression,  is the increase in log mean associated with a unit increase in x –In logistic regression, if x is increased by 1, the odds are increased by a factor of exp(  –In Poisson regression, if x is increased by 1, the mean is increased by a factor of exp( 

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 11 Interpretation of  - coefficients For categorical covariates (main effects only): –In normal regression,  is the increase in mean response relative to the baseline –In logistic regression,  is the increase in log odds relative to the baseline –In logistic regression, if we change from baseline to some level, the odds are increased by a factor of exp( parameter for that level  relative to the baseline –In Poisson regression, if we change from baseline to some level, the mean is increased by a factor of exp( parameter for that level  relative to the baseline

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 12 Measures of Fit R 2 (for normal regression) Residual Deviance (for Logistic and Poisson regression) –But not for ungrouped data in logistic, or Poisson with very small means (cell counts)

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 13 Prediction For normal regression, –Predict response at covariates x1,...,xk –Estimate mean response at covariates x1,...,xk For logistic regression, –estimate log-odds at covariates x1,...,xk –Estimate probability of “success” at covariates x1,...,xk For Poisson regression, –Estimate mean at covariates x1,...,xk

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 14 Inference Summary table –Estimates of regression coefs –Standard errors –Test stats for coef = 0 –R 2 etc (normal regression) –F-test for null model –Null and residual deviances (logistic/Poisson)

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 15 Testing model vs sub-model Use and interpretation of both forms of anova –Comparing model with a sub-model –Adding successive terms to a model

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 16 Topics specific to normal regression Collinearity –VIF’s –Correlation –Added variable plots Model selection –Stepwise procedures: FS, BE, stepwise –All possible regressions approach AIC, BIC, CP, adjusted R 2, CV

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 17 Factors (categorical explanatory variables) Factors –Baselines –Levels –Factor level combinations –Interactions –Dummy variables –Know how to express interactions in terms of means, means in terms of interactions –Know how to interpret zero interactions

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 18 Fitting and Choosing models Fit a separate plane (mean if no continuous covariates) to each combination of factor levels Search for a simpler submodel (with some interactions zero) using stepwise and anova

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 19 Diagnostics For non-planar data –Plot res/fitted, res/x’s, partial residual plots, gam plots, box-cox plot –Transform either x’s or response, fit polynomial terms For unequal variance –Plot res/ fitted, look for funnel effect –Weighted least squares –Transform response

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 20 Diagnostics (2) For outliers and high-leverage points –Hat matrix diagonals –Standardised residuals, –Leave-one-out diagnostics Independent observations –Acf plots –Residual/previous residual –Time series plot of residuals –Durbin-Watson test

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 21 Diagnostics (3) Normality –Normal plot –Weisberg-Bingham test –Box Cox (select power)

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 22 Specifics for Logistic Regression Log-likelihood is

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 23 Deviance Deviance = 2(log L MAX - log L MOD ) –log L MAX : replace  ’s with frequencies s i /n i –log L MOD : replace  ’s with estimated  ’s from logistic model i.e. Can’t use as goodness of fit measure in ungrouped case

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 24 Odds and log-odds Probabilities  : Pr(Y=1) Odds  /(1-  ) Log-odds: log 

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 25 Residuals Pearson Deviance

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 26 Topics specific to Poisson regression Offsets Interpretation of regression coefficients –(same as for odds in logistic regression) Correspondence between Poisson regression (Log-linear models) and the multinomial model for contingency tables –The “Poisson trick”

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 27 Contingency tables Cells 1,2,…, m Cell probabilities  1,...,  m Counts y 1,..., y m Log-likelihood is

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 28 Contingency tables (2) A “model” for the table is anything that specifies the form of the probabilities, possibly up to k unknown parameters Test if the model is OK by –Calculate Deviance = 2(log L MAX - log L MOD ) log L MAX : replace  ’s with table frequencies log L MOD : replace  ’s with estimated  ’s from the model –Model OK if deviance is small,(p-value > 0.05) –Degrees of freedom m - 1 - k –k = number of parameters in the model

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 29 Independence models Correspond to interactions being zero Fit a “saturated” model using Poisson regression Use anova, stepwise to see which interactions are zero Identify the appropriate model Models can be represented by graphs

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 30 Odds Ratios Definition and interpretation Connection to independence Connection with interactions Relationship between conditional OR’s and interactions Homogeneous association model

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 31 Association graphs Each node is a factor Factors joined by lines if an interaction between them Interpretation in terms of conditional independence Interpretation in terms of collapsibility

© Department of Statistics 2012 STATS 330 Lecture 32: Slide 32 Contingency tables: final topics Association reversal –Simpson’s paradox –When can you collapse Product multinomial –comparing populations –populations the same if certain interactions are zero Goodness of fit to a distribution –Special case of 1-dimensional table

Similar presentations