Linear statistical models 2008: Model diagnostics


Model diagnostics
- Residual analysis
- Outliers
- Dependence
- Heteroscedasticity
- Violations of distributional assumptions
- Identification of influential observations
- Examination of over- and under-dispersion

A simple model of water clarity
Inputs: year, temperature, salinity, station dummies
Output: Secchi depth (water clarity)

Sampling sites for water quality in the Stockholm archipelago
[Map of sampling sites; labels: Stockholm, Baltic Sea]

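Raw residuals in generalized linear models
In a general linear model the predicted values are linear combinations of the observed values, i.e. $\hat{y} = Hy$, where $H = X(X^T X)^{-1} X^T$ is a symmetric idempotent matrix ($H = HH$). The vector of raw residuals can then be written $r = y - \hat{y} = (I - H)y$. In glims an analogous hat matrix is $H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$, where $W$ is the diagonal weight matrix from the final iteration of the iteratively reweighted least squares (IRLS) fit. In contrast to residuals in general linear models, the raw residuals in glims may have a variance that is strongly related to the size of the expected response.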
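A minimal NumPy sketch of the glim hat matrix $H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$, assuming a Poisson model with log link, for which the IRLS weights are $w_i = \hat\mu_i$. The design matrix and weights are made-up illustration values, not data from the slides; the checks (symmetry, idempotence, trace equal to the number of parameters $p$) hold for any positive weights.

```python
import numpy as np

def glm_hat_matrix(X, w):
    """H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2} for positive weights w."""
    Xw = X * np.sqrt(w)[:, None]              # rows of X scaled by sqrt(w_i)
    return Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)

# Hypothetical example: intercept plus one covariate, arbitrary positive weights
X = np.column_stack([np.ones(6), np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])])
w = np.array([1.0, 1.5, 2.0, 3.0, 4.0, 8.0])  # e.g. fitted Poisson means

H = glm_hat_matrix(X, w)
print(np.allclose(H, H.T))                    # symmetric
print(np.allclose(H, H @ H))                  # idempotent
print(np.trace(H))                            # equals p, the number of parameters
print(np.round(np.diag(H), 3))                # leverages h_ii
```

The observation at $x = 10$ gets by far the largest $h_{ii}$, which anticipates the leverage discussion later in the slides.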

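Pearson residuals
The Pearson residual is the raw residual standardized by the estimated standard deviation of the response, evaluated at the fitted value:
$r_{P,i} = \dfrac{y_i - \hat\mu_i}{\sqrt{V(\hat\mu_i)}}$
Special cases: Poisson and binomial models
Poisson: $r_{P,i} = \dfrac{y_i - \hat\mu_i}{\sqrt{\hat\mu_i}}$
Binomial ($y_i$ successes in $n_i$ trials): $r_{P,i} = \dfrac{y_i - n_i \hat\pi_i}{\sqrt{n_i \hat\pi_i (1 - \hat\pi_i)}}$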
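The two special cases can be computed directly from observed values and fitted values. A short NumPy sketch; the function names are hypothetical helpers, and the numbers are toy values chosen so that the results are easy to check by hand.

```python
import numpy as np

def pearson_poisson(y, mu):
    """Pearson residuals (y - mu) / sqrt(mu) for a Poisson model."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return (y - mu) / np.sqrt(mu)

def pearson_binomial(y, n, pi):
    """Pearson residuals for y successes in n trials with fitted probability pi."""
    y, n, pi = (np.asarray(a, float) for a in (y, n, pi))
    return (y - n * pi) / np.sqrt(n * pi * (1.0 - pi))

print(pearson_poisson([2, 0, 5], [1.0, 1.0, 4.0]))  # residuals 1.0, -1.0 and 0.5
print(pearson_binomial([3], [10], [0.2]))           # 1 / sqrt(1.6), about 0.7906
```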

Adjusted Pearson residuals
The Pearson residual can be adjusted by computing
$r'_{P,i} = \dfrac{r_{P,i}}{\sqrt{1 - h_{ii}}}$
where $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$. The adjusted Pearson residuals can often be assumed to be approximately standard normal.
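A sketch of the adjustment for a Poisson log-linear model, assuming the glim hat matrix $W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$ with $W = \operatorname{diag}(\hat\mu)$. The IRLS fitter `fit_poisson_irls` and the simulated data are illustrative assumptions, not part of the original material; in practice a GLM package would supply the fit.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by IRLS; returns (beta, mu)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X^T W with W = diag(mu) (log link)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, np.exp(X @ beta)

def glm_leverages(X, w):
    """Diagonal of W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}."""
    Xw = X * np.sqrt(w)[:, None]
    return np.einsum("ij,ji->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T))

# Simulated data (not from the slides): counts increasing with one covariate
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=40)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x)).astype(float)

beta, mu = fit_poisson_irls(X, y)
h = glm_leverages(X, mu)                       # w_i = mu_i for Poisson with log link
r_p = (y - mu) / np.sqrt(mu)                   # Pearson residuals
r_p_adj = r_p / np.sqrt(1.0 - h)               # adjusted Pearson residuals
print(np.round(r_p_adj[:5], 3))
```

Because $0 < h_{ii} < 1$, the adjustment always enlarges the residual, and most strongly for high-leverage observations.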

Deviance
The deviance is defined as
$D = 2\left[\ell(y; y) - \ell(\hat\mu; y)\right]$
where $\ell(y; y)$ is the log likelihood of the full (saturated) model, and $\ell(\hat\mu; y)$ is the log likelihood of the current model at the ML estimates of its parameters. The deviance is a sum of the contributions from the individual observations: $D = \sum_i d_i$.
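For a Poisson model the contributions are $d_i = 2[y_i \log(y_i/\hat\mu_i) - (y_i - \hat\mu_i)]$, with $0 \log 0 = 0$. The sketch below, using hypothetical counts and fitted means, checks numerically that summing the $d_i$ gives the same value as twice the log-likelihood difference between the saturated and the current model.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """Poisson log likelihood: sum of y*log(mu) - mu - log(y!), using 0*log(0) = 0."""
    safe_mu = np.where(mu > 0, mu, 1.0)        # avoids log(0) where y = mu = 0
    ylogmu = np.where(y > 0, y * np.log(safe_mu), 0.0)
    return float(np.sum(ylogmu - mu) - sum(lgamma(v + 1.0) for v in y))

def poisson_deviance_contributions(y, mu):
    """d_i = 2 * [y_i*log(y_i/mu_i) - (y_i - mu_i)], using 0*log(0) = 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * (y * np.log(ratio) - (y - mu))

y = np.array([0.0, 1.0, 3.0, 6.0, 2.0])       # hypothetical counts
mu = np.array([0.8, 1.5, 2.5, 4.0, 3.2])      # hypothetical fitted means

d = poisson_deviance_contributions(y, mu)
D_sum = d.sum()                                                # D as a sum of d_i
D_ll = 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))    # saturated vs current
print(D_sum, D_ll)                            # the two values agree
```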

Deviance residuals
The (unadjusted) deviance residuals are defined as
$r_{D,i} = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{d_i}$
The adjusted deviance residuals are defined as
$r'_{D,i} = \dfrac{r_{D,i}}{\sqrt{1 - h_{ii}}}$
where $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$.
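A Poisson version of both definitions, as a sketch. The observations, fitted means and leverages below are hypothetical values used only to exercise the formulas; in a real analysis $\hat\mu_i$ and $h_{ii}$ come from the fitted model.

```python
import numpy as np

def poisson_deviance_residuals(y, mu):
    """r_D = sign(y - mu) * sqrt(d_i), with d_i the Poisson deviance contribution."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    ratio = np.where(y > 0, y / mu, 1.0)       # 0 * log(0) is taken as 0
    d = 2.0 * (y * np.log(ratio) - (y - mu))
    return np.sign(y - mu) * np.sqrt(d)

# Hypothetical observations, fitted means and leverages (illustration only)
y = np.array([0.0, 1.0, 3.0, 6.0, 2.0])
mu = np.array([0.8, 1.5, 2.5, 4.0, 3.2])
h = np.array([0.30, 0.20, 0.15, 0.45, 0.25])

r_d = poisson_deviance_residuals(y, mu)
r_d_adj = r_d / np.sqrt(1.0 - h)               # adjusted deviance residuals
print(np.round(r_d, 3), np.round(r_d_adj, 3))
print(np.sum(r_d ** 2))                        # the squared residuals sum to D
```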

Score residuals
The score equations, $\sum_i U_i = 0$, involve sums of terms $U_i$, one for each observation. Properly standardized, these terms can be regarded as residuals.
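For a Poisson model with log link (a canonical link, scale parameter 1), the contribution of observation $i$ to the score vector is $U_i = x_i(y_i - \hat\mu_i)$, and at the ML estimate these contributions sum to zero. A sketch on simulated data with a hypothetical IRLS helper. For this model, standardizing the score for a mean shift in observation $i$ by its variance $V(\hat\mu_i)(1 - h_{ii})$ gives $(y_i - \hat\mu_i)/\sqrt{V(\hat\mu_i)(1 - h_{ii})}$, i.e. the adjusted Pearson residual.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by IRLS; returns (beta, mu)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X^T W with W = diag(mu) (log link)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, np.exp(X @ beta)

# Simulated data (not from the slides)
rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=40)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x)).astype(float)

beta, mu = fit_poisson_irls(X, y)
U = X * (y - mu)[:, None]                      # row i holds U_i = x_i * (y_i - mu_i)
print(np.round(U[:5], 3))
print(U.sum(axis=0))                           # numerically zero at the ML estimate
```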

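Approximate likelihood residuals
Likelihood residuals may, in principle, be computed by comparing the deviance for a model based on all observations with the deviance for a model based on all but the $i$th observation. An approximation of these residuals is given by the formula
$r_{L,i} = \operatorname{sign}(y_i - \hat\mu_i)\sqrt{h_{ii}\, r'^2_{P,i} + (1 - h_{ii})\, r'^2_{D,i}}$
where $r'_{P,i}$ and $r'_{D,i}$ are the adjusted Pearson and deviance residuals.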
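A sketch comparing an approximation of this kind with the exact likelihood residuals $\operatorname{sign}(y_i - \hat\mu_i)\sqrt{D - D_{(i)}}$, where $D_{(i)}$ is the deviance after refitting without observation $i$. It assumes a Poisson model with log link, simulated data and a hypothetical IRLS helper; the approximation used is $\operatorname{sign}(y_i - \hat\mu_i)\sqrt{h_{ii}\, r'^2_{P,i} + (1 - h_{ii})\, r'^2_{D,i}}$, built from adjusted Pearson and deviance residuals.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by IRLS; returns (beta, mu)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X^T W with W = diag(mu) (log link)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, np.exp(X @ beta)

def poisson_deviance(y, mu):
    """Per-observation deviance contributions d_i (0 * log 0 = 0)."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * (y * np.log(ratio) - (y - mu))

# Simulated data (not from the slides)
rng = np.random.default_rng(3)
x = rng.uniform(0, 2, size=30)
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x)).astype(float)

beta, mu = fit_poisson_irls(X, y)
d = poisson_deviance(y, mu)
D = d.sum()

# Leverages and adjusted Pearson / deviance residuals
Xw = X * np.sqrt(mu)[:, None]
h = np.einsum("ij,ji->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T))
s = np.sign(y - mu)
r_p_adj = (y - mu) / np.sqrt(mu * (1.0 - h))
r_d_adj = s * np.sqrt(np.maximum(d, 0.0) / (1.0 - h))

# Approximate likelihood residuals
r_l_approx = s * np.sqrt(h * r_p_adj ** 2 + (1.0 - h) * r_d_adj ** 2)

# Exact version: refit without observation i and compare deviances
D_loo = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    _, mu_i = fit_poisson_irls(X[keep], y[keep])
    D_loo[i] = poisson_deviance(y[keep], mu_i).sum()
r_l_exact = s * np.sqrt(np.maximum(D - D_loo, 0.0))

print(np.round(np.column_stack([r_l_exact, r_l_approx])[:5], 3))
```

The approximation avoids the $n$ refits in the loop, which is the main reason to use it on larger data sets.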

Choice of residuals
Type of residuals: Pearson residuals, deviance residuals, score residuals, likelihood residuals
Tests: likelihood ratio test, Wald tests, score tests

Influential observations
The leverage (influence) of observation $i$ on the fitted value $\hat{y}_i$ is the derivative of this estimate with respect to $y_i$. Because $\hat{y} = Hy$, these derivatives are given by the diagonal elements of the hat matrix $H$: $\partial \hat{y}_i / \partial y_i = h_{ii}$.
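In an ordinary linear model, $\hat{y} = Hy$ with $H = X(X^T X)^{-1} X^T$, so perturbing $y_i$ by a small amount $\delta$ moves $\hat{y}_i$ by exactly $h_{ii}\delta$. This sketch, on simulated data with one point placed far out in $x$, compares finite-difference derivatives with the diagonal of $H$.

```python
import numpy as np

def fitted_values(X, y):
    """Least-squares fitted values X @ beta_hat."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

rng = np.random.default_rng(2)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 12.0])  # last point far from the rest
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)  # simulated responses

h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))  # leverages h_ii

delta = 1e-3
base = fitted_values(X, y)
dfit = np.empty_like(y)
for i in range(len(y)):
    y_pert = y.copy()
    y_pert[i] += delta                          # nudge only observation i
    dfit[i] = (fitted_values(X, y_pert)[i] - base[i]) / delta

print(np.round(h, 4))
print(np.round(dfit, 4))                        # matches h up to rounding
```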

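Cook's distance
The combined change in all parameters when observation $i$ is omitted can be computed as
$D_i = \dfrac{(\hat\beta_{(i)} - \hat\beta)^T X^T W X (\hat\beta_{(i)} - \hat\beta)}{p\,\hat\phi}$
where $\hat\beta_{(i)}$ is the estimate obtained without observation $i$, $p$ is the number of parameters and $\hat\phi$ is the estimated scale parameter.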
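A sketch comparing Cook's distance computed by actually refitting without each observation with the commonly used one-step approximation $D_i \approx r'^2_{P,i}\, h_{ii} / (p(1 - h_{ii}))$, where $r'_{P,i}$ is the adjusted Pearson residual. Assumptions: Poisson model with log link ($\phi = 1$), simulated data with one deliberately planted outlier at a high-leverage point, and a hypothetical IRLS helper.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by IRLS; returns (beta, mu)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X^T W with W = diag(mu) (log link)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, np.exp(X @ beta)

# Simulated data; observation 0 sits far out in x and gets a planted zero count
rng = np.random.default_rng(4)
x = np.append(4.0, rng.uniform(0, 2, size=39))
X = np.column_stack([np.ones_like(x), x])
y = rng.poisson(np.exp(0.3 + 0.8 * x)).astype(float)
y[0] = 0.0

p = X.shape[1]
beta, mu = fit_poisson_irls(X, y)
XtWX = (X.T * mu) @ X

# Exact Cook's distance: refit without observation i (phi = 1 for Poisson)
cook_exact = np.empty(len(y))
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    beta_i, _ = fit_poisson_irls(X[keep], y[keep])
    diff = beta_i - beta
    cook_exact[i] = diff @ XtWX @ diff / p

# One-step approximation from leverages and adjusted Pearson residuals
Xw = X * np.sqrt(mu)[:, None]
h = np.einsum("ij,ji->i", Xw, np.linalg.solve(Xw.T @ Xw, Xw.T))
r_p_adj = (y - mu) / np.sqrt(mu * (1.0 - h))
cook_approx = r_p_adj ** 2 * h / (p * (1.0 - h))

print(np.argmax(cook_exact), np.argmax(cook_approx))
```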

Over-dispersion
Over-dispersion occurs when the variance of the response is larger than would be expected for the chosen distribution. Example: in a model involving Poisson distributions, the estimated variance is considerably larger than the estimated mean.
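A quick check for over-dispersion, assuming an intercept-only Poisson model, whose ML fitted mean is simply $\bar{y}$. The Pearson statistic divided by its degrees of freedom then reduces to the sample variance over the sample mean: about 1 for Poisson data and clearly above 1 otherwise. The counts are simulated, one Poisson sample and one negative binomial sample with the same mean 5 but variance 17.5.

```python
import numpy as np

def dispersion_ratio(y):
    """Pearson X^2 / (n - 1) for an intercept-only Poisson model (fitted mean = ybar)."""
    mu = y.mean()
    return np.sum((y - mu) ** 2 / mu) / (len(y) - 1)

rng = np.random.default_rng(6)
n = 2000
y_pois = rng.poisson(5.0, size=n).astype(float)
# NumPy's negative_binomial(r, p) has mean r(1-p)/p; r = 2, p = 2/7 gives mean 5, variance 17.5
y_nb = rng.negative_binomial(2.0, 2.0 / 7.0, size=n).astype(float)

print(dispersion_ratio(y_pois))   # close to 1
print(dispersion_ratio(y_nb))     # roughly 3.5: variance well above the mean
```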

Possible causes of over-dispersion
- Lack of homogeneity (the distribution of the target variable varies within experiments that are assumed to be replicates)
- Dependence (the response levels in experiments assumed to be replicates are actually positively correlated)

Modelling over-dispersion
Introduce an extra scale parameter $\phi$ in the variance function of the response $Y$, so that $\operatorname{Var}(Y) = \phi V(\mu)$. Note that the variance is a function of the mean for all members of the exponential family.
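A sketch of the usual quasi-likelihood treatment, assuming $\operatorname{Var}(Y) = \phi\mu$ (the Poisson variance function with an extra scale parameter): $\phi$ is estimated by the Pearson statistic divided by $n - p$, and the Poisson standard errors are multiplied by $\sqrt{\hat\phi}$, leaving the estimates themselves unchanged. The negative binomial data and the IRLS helper are illustrative assumptions.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with log link fitted by IRLS; returns (beta, mu)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # assumes column 0 is the intercept
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu                # working response
        XtW = X.T * mu                         # X^T W with W = diag(mu) (log link)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    return beta, np.exp(X @ beta)

# Simulated over-dispersed counts: negative binomial with Var = mean + mean**2 / r
rng = np.random.default_rng(5)
n, r = 500, 2.0
x = rng.uniform(0, 2, size=n)
X = np.column_stack([np.ones(n), x])
mean = np.exp(0.5 + 0.7 * x)
y = rng.negative_binomial(r, r / (r + mean)).astype(float)

beta, mu = fit_poisson_irls(X, y)
p = X.shape[1]
phi_hat = np.sum((y - mu) ** 2 / mu) / (n - p)               # Pearson estimate of phi
se_naive = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))   # Poisson SEs (phi = 1)
se_quasi = np.sqrt(phi_hat) * se_naive                       # SEs under Var(Y) = phi * mu
print(phi_hat, se_naive, se_quasi)
```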

Software recommendations
- General linear models: MINITAB
- Generalized linear models: SAS, PROC GENMOD