Negative Binomial Regression: NASCAR Lead Changes, 1975-1979


Negative Binomial Regression NASCAR Lead Changes

Data Description
Units: 151 NASCAR races during the 1975-1979 seasons
Response: # of Lead Changes in a Race
Predictors:
- # Laps in the Race
- # Drivers in the Race
- Track Length (Circumference, in miles)
Models:
- Poisson (assumes E(Y) = V(Y))
- Negative Binomial (allows for V(Y) > E(Y))

Poisson Regression
Random Component: Poisson distribution for # of Lead Changes
Systematic Component: Linear function of the predictors Laps, Drivers, Trklength
Link Function: log: g(μ) = ln(μ)

Regression Coefficients – Z-tests
Note: All predictors are highly significant. Holding all other factors constant:
- As # of laps increases, lead changes increase
- As # of drivers increases, lead changes increase
- As track length increases, lead changes increase

Testing Goodness-of-Fit
Break the races into 10 groups of approximately equal size based on their fitted values, and compute the Pearson residuals within each group; their sum of squares gives the statistic X². Under the hypothesis that the model is adequate, X² is approximately chi-square with 10 - 4 = 6 degrees of freedom (10 cells, 4 estimated parameters). The critical value for an α = 0.05 level test is 12.59. The data (next slide) clearly are not consistent with the model: the variances within each group are orders of magnitude larger than the means.
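The exact residual formula on the slide did not survive the transcript; a common construction for this kind of grouped check compares the observed and fitted totals in each cell. A minimal sketch with stand-in data (the grouping formula and the toy counts are assumptions, not the slides' computation):

```python
# Hedged sketch of a grouped Pearson goodness-of-fit check:
# sort by fitted value, split into 10 cells, compare observed vs.
# fitted totals, and reference a chi-square(6) critical value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.poisson(10.0, 150)       # stand-in counts, not the race data
mu_hat = rng.uniform(8, 12, 150) # stand-in fitted values

order = np.argsort(mu_hat)
groups = np.array_split(order, 10)  # 10 groups of ~equal size

# One common grouped Pearson statistic: sum of (O_g - E_g)^2 / E_g
x2 = sum((y[g].sum() - mu_hat[g].sum()) ** 2 / mu_hat[g].sum()
         for g in groups)

crit = stats.chi2.ppf(0.95, df=10 - 4)  # ~12.59: 10 cells, 4 parameters
print(x2, crit)
```

For the actual race data the slides report X² far above this critical value, which is what rejects the Poisson model.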

Testing Goodness-of-Fit
The observed X² far exceeds the critical value ⇒ the data are not consistent with the Poisson model.

Negative Binomial Regression
Random Component: Negative Binomial distribution for # of Lead Changes
Systematic Component: Linear function of the predictors Laps, Drivers, Trklength
Link Function: log: g(μ) = ln(μ)

Regression Coefficients – Z-tests Note that SAS and STATA estimate 1/k in this model.

Goodness-of-Fit Test
Clearly this model fits better than the Poisson regression model. For the negative binomial model, SD/mean is estimated to be 0.43 = sqrt(1/k). For these 10 cells, the observed ratios range from 0.24 to 0.67, consistent with that value.

Computational Aspects - I
k is restricted to be positive, so we estimate k* = log(k), which can take on any value. Note that software packages estimating 1/k are estimating -k* on the log scale.
Likelihood Function (the slide's equation did not survive the transcript; this is the standard form with mean μ_i and dispersion k):
L(β, k) = Π_i [Γ(y_i + k) / (Γ(k) y_i!)] · (k/(k+μ_i))^k · (μ_i/(k+μ_i))^{y_i}
Log-Likelihood Function:
ln L(β, k) = Σ_i [ln Γ(y_i + k) - ln Γ(k) - ln(y_i!) + k ln(k/(k+μ_i)) + y_i ln(μ_i/(k+μ_i))]
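The log-likelihood above can be evaluated numerically via gammaln for the Gamma terms. A small sketch with toy values (the function name nb_loglike and the inputs are illustrative assumptions):

```python
# Evaluate the negative binomial log-likelihood with mean mu and
# dispersion k, so that V(Y) = mu + mu^2/k. Toy values only.
import numpy as np
from scipy.special import gammaln

def nb_loglike(y, mu, k):
    """Sum of log NB probabilities: ln Gamma(y+k) - ln Gamma(k)
    - ln(y!) + k*ln(k/(k+mu)) + y*ln(mu/(k+mu))."""
    return np.sum(
        gammaln(y + k) - gammaln(k) - gammaln(y + 1)
        + k * np.log(k / (k + mu))
        + y * np.log(mu / (k + mu))
    )

y = np.array([3.0, 7.0, 12.0])
mu = np.array([5.0, 5.0, 10.0])
print(nb_loglike(y, mu, k=2.0))
```

This agrees with scipy's nbinom.logpmf under the parameterization n = k, p = k/(k+μ), which is a convenient cross-check when coding the likelihood by hand.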

Computational Aspects - II
Derivatives of the log-likelihood with respect to k* and β

Computational Aspects - III
Newton-Raphson Algorithm Steps:
- Step 1: Set k* = 0 (k = 1) and iterate to obtain an estimate of β
- Step 2: Hold β fixed at the estimate from Step 1 and iterate to obtain an estimate of k*
- Step 3: Use the results from Steps 1 and 2 as starting values (software packages seem to use different intercepts) to jointly obtain estimates of k* and β
- Step 4: Back-transform k* to get the estimate of k: k = exp(k*)
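The staged strategy above can be sketched numerically. This sketch substitutes scipy's general-purpose optimizer for hand-coded Newton-Raphson updates, and uses simulated stand-in data, so it illustrates the staging (fit β with k fixed, then k* with β fixed, then jointly) rather than reproducing the slides' algorithm:

```python
# Staged maximum likelihood for the NB model: k* = log(k) keeps k > 0.
# scipy.optimize.minimize stands in for the Newton-Raphson iterations.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu_true = np.exp(X @ np.array([1.0, 0.5]))
k_true = 4.0
y = rng.negative_binomial(k_true, k_true / (k_true + mu_true))

def negloglike(theta):
    kstar, beta = theta[0], theta[1:]
    k = np.exp(kstar)                        # back-transform: k = exp(k*)
    mu = np.exp(np.clip(X @ beta, -30, 30))  # clip for numerical safety
    ll = (gammaln(y + k) - gammaln(k) - gammaln(y + 1)
          + k * np.log(k / (k + mu)) + y * np.log(mu / (k + mu)))
    return -ll.sum()

# Step 1: set k* = 0 (k = 1) and estimate beta
step1 = minimize(lambda b: negloglike(np.r_[0.0, b]), np.zeros(p))
# Step 2: hold beta fixed at the Step 1 estimate and estimate k*
step2 = minimize(lambda ks: negloglike(np.r_[ks, step1.x]), np.zeros(1))
# Step 3: joint fit from the Step 1/2 starting values
fit = minimize(negloglike, np.r_[step2.x, step1.x])
# Step 4: back-transform k* to k
k_hat = np.exp(fit.x[0])
print(k_hat, fit.x[1:])
```

The joint fit can only improve on the staged starting values, which is the point of using Steps 1 and 2 to initialize Step 3.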