THE POISSON & NEGATIVE BINOMIAL MODELS By: ALVARD AYRAPETYAN.


1 THE POISSON & NEGATIVE BINOMIAL MODELS By: ALVARD AYRAPETYAN

2 OUTLINE OF PRESENTATION
- Poisson Regression
  - Model Assumptions, Assessment, and Interpretations
  - Applications in SAS and R
  - Quick Programming in SPSS and MINITAB
- Negative Binomial
  - Model Assumptions, Assessment, and Interpretations
  - Applications in SAS and R
  - Quick Programming in SPSS

3 ASSUMPTIONS FOR POISSON MODEL
- Events are counted over a fixed period of time
- Events occur at a constant rate
- Events are independent of one another
- The dependent variable's conditional mean and variance are equal
- The dependent variable is a non-negative integer (a count)

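4 THE POISSON MODEL
- Random Component: Poisson distribution for the count response $Y$
- Systematic Component: $\eta = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1}$
- Mass Function: $P(Y = y) = \frac{e^{-\mu} \mu^{y}}{y!}$, $y = 0, 1, 2, \dots$
- $E(Y) = \mu$ and $V(Y) = \mu$
- Link Function: $g(\mu) = \log(\mu)$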
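A minimal numerical sketch of the equal mean and variance property, using the standard Poisson mass function $e^{-\mu}\mu^{y}/y!$. The mean of 4 and the truncation point are assumptions chosen only for illustration; they do not come from the NBA data.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """P(Y = y) for a Poisson variable with mean mu, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


mu = 4.0                 # hypothetical mean, for illustration only
support = range(0, 100)  # truncation point; the tail beyond 100 is negligible for mu = 4
probs = [poisson_pmf(y, mu) for y in support]

# Moments computed directly from the mass function.
total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"sum of probabilities = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Both moments come out equal to the chosen mean, which is the property that real count data often violate.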

5 EXAMPLES OF POISSON DISTRIBUTION
- Number of earthquakes in a region
- Number of accidents on a highway in a certain area in a specified time
- Number of telephone calls received in one hour
- Number of customers that enter a bank in one hour
- Number of times an elderly person will fall in a month

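6 INTERPRETING COEFFICIENTS
CONTINUOUS PREDICTOR
- Keeping all other predictors constant, when $X_j$ is increased by one unit, the expected log count of Y increases/decreases (+/-) by $\beta_j$
- Keeping all other predictors constant, when $X_j$ is increased by one unit, the expected number of Y goes up/down (+/-) by $100\,(e^{\beta_j} - 1)\%$
CATEGORICAL PREDICTOR
- Keeping all other predictors constant, when $X_j = 1$ (versus the reference category), the expected log count of Y increases/decreases (+/-) by $\beta_j$
- Keeping all other predictors constant, when $X_j = 1$ (versus the reference category), the expected number of Y goes up/down (+/-) by $100\,(e^{\beta_j} - 1)\%$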
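The step from a coefficient on the log scale to a percent change in the expected count can be written out as follows. This is a short derivation under the log link $g(\mu) = \log(\mu)$; the notation $\mu(x_j)$ for the expected count at a given value of predictor $x_j$, with all other predictors fixed, is an assumption introduced here.

```latex
\begin{align*}
\log \mu(x_j + 1) - \log \mu(x_j) &= \beta_j \\
\frac{\mu(x_j + 1)}{\mu(x_j)} &= e^{\beta_j} \\
\text{percent change in } E(Y) &= 100\,\bigl(e^{\beta_j} - 1\bigr)\%
\end{align*}
```

For small coefficients, $e^{\beta_j} - 1 \approx \beta_j$, so the percent change is close to $100\,\beta_j$.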

7 POTENTIAL PROBLEM WITH POISSON
- OVERDISPERSION: the variance is much larger than the mean
- Negative Binomial is the solution!

8 THE DATA
- Goal: predict the number of field goal attempts (FGA) in the NBA
- Extracted the 100 highest-scoring players in the NBA for the season
- The following were used as predictors:
  - Number of games played (GP)
  - Number of defensive rebounds (DREB)
  - Number of assists (AST)
  - Number of steals (STL)
  - Number of blocks (BLK)
  - Number of turnovers (TOV)
  - Number of free throws made (FTM)

9 SAMPLE OF THE DATA
Columns: Rank, Player, GP, FGA, DREB, AST, STL, FTM, TOV
Players shown (numeric values not preserved in this transcript):
- Kevin Love (MIN)
- Kevin Durant (OKC)
- Monta Ellis (DAL)
- Blake Griffin (LAC)
- LeBron James (MIA)
- Evan Turner (PHI)
- Kevin Martin (MIN)
- Paul George (IND)
- LaMarcus Aldridge (POR)
- Carmelo Anthony (NYK)
- Kyrie Irving (CLE)
- Klay Thompson (GSW)
- Dirk Nowitzki (DAL)
- James Harden (HOU)
- Chris Paul (LAC)
- Arron Afflalo (ORL)
- Damian Lillard (POR)
- DeMarcus Cousins (SAC)

10 POISSON-EXAMPLE WITH SAS
proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=poisson;
run;

/* check goodness of fit for model */
data pvalue;
  df = 93;
  chisq = ; /* deviance from the GENMOD goodness-of-fit table (value missing in this transcript) */
  pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;
/* pvalue is significant (close to 0), so the model does not fit well;
   dispersion parameter >> 1, major overdispersion */

11 EXAMPLE RESULTS-GOODNESS OF FIT
The GENMOD Procedure
Model Information
  Data Set                      WORK.NBA
  Distribution                  Poisson
  Link Function                 Log
  Dependent Variable            FGA
  Number of Observations Read   100
  Number of Observations Used   100
Criteria For Assessing Goodness Of Fit (columns: Criterion, DF, Value, Value/DF; numeric values not preserved)
  Deviance
  Scaled Deviance
  Pearson Chi-Square
  Scaled Pearson X2
  Log Likelihood
  Full Log Likelihood
  AIC (smaller is better)
  AICC (smaller is better)
  BIC (smaller is better)

12 RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Columns in the original table: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq. Only the values below are preserved.
  Parameter   Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
  Intercept   (4.0396, 4.3332)                               <.0001
  GP          (0.0310, 0.0534)             54.93             <.0001
  DREB        ( , 0.0010)
  AST         ( , 0.0005)
  STL         (0.0004, 0.0052)
  TOV         (0.0038, 0.0077)             33.53             <.0001
  FTM         (0.0032, 0.0048)             98.23             <.0001
  Scale       (1.0, 1.0)

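13 ASSESSMENT OF RESULTS
- Ratio of Deviance/DF is far greater than 1, which indicates major overdispersion
- The model does not fit well: the deviance is so large relative to its 93 degrees of freedom that pvalue = 1 - probchi(chisq, df) is significant (close to 0)
- Every term is significant except for AST and DREB
- Standard errors and significance tests can be misleading when the model is misspecified
- A NEGATIVE BINOMIAL regression is needed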
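The deviance check can be sketched outside SAS as follows. This is an illustration only: the two deviance values are hypothetical (the actual values are not preserved in the transcript), and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution rather than an exact routine such as SAS `probchi`.

```python
import math


def chi2_sf_wilson_hilferty(x: float, df: int) -> float:
    """Approximate P(chi-square with df degrees of freedom > x).

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    adequate for moderate to large df but is not an exact calculation.
    """
    scale = math.sqrt(2.0 / (9.0 * df))
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df))) / scale
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


def assess_fit(deviance: float, df: int) -> tuple[float, float]:
    """Return (deviance / df, approximate goodness-of-fit p-value)."""
    return deviance / df, chi2_sf_wilson_hilferty(deviance, df)


df = 93  # residual degrees of freedom: 100 observations minus 7 parameters
for deviance in (95.0, 930.0):  # hypothetical deviances, not the deck's values
    ratio, p = assess_fit(deviance, df)
    verdict = "no evidence of lack of fit" if p > 0.05 else "poor fit, likely overdispersion"
    print(f"deviance = {deviance:7.1f}  ratio = {ratio:5.2f}  p = {p:.4g}  ({verdict})")
```

A small p-value here is evidence against the fitted model, so a ratio far above 1 and a p-value near 0 point in the same direction.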

14 POISSON-EXAMPLE WITH R
nba <- read.csv("F:/STATS544/nba.csv", header = TRUE)
poiss <- glm(FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)
summary(poiss)

15 R-GOODNESS OF FIT
Numeric values are not preserved in this transcript; the output reported:
- Deviance Residuals: Min, 1Q, Median, 3Q, Max
- (Dispersion parameter for poisson family taken to be 1)
- Null deviance on 99 degrees of freedom
- Residual deviance on 93 degrees of freedom
- AIC

16 R-ANALYSIS OF PARAMETER ESTIMATES
Call: glm(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)
Coefficients (columns: Estimate, Std. Error, z value, Pr(>|z|); only the values below are preserved):
  Term          Pr(>|z|)
  (Intercept)   < 2e-16 ***
  GP            e-13 ***
  DREB
  AST
  STL           *
  TOV           e-09 ***
  FTM           < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

17 POISSON WITH SPSS & MINITAB
SPSS
genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES distribution = poisson link = log
  /print FIT SUMMARY SOLUTION.
MINITAB
Stat > Regression > Poisson Regression > Fit Poisson Model.

18 Detecting over-dispersion with SAS
- A Poisson regression whose ratio of DEVIANCE to DF is well above 1 signals over-dispersion
proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=poisson;
run;
- PROC MEANS shows that the variance of FGA (Y) is much higher than its mean
proc means data = nba n mean var min max;
  var FGA;
run;

19 Detecting over-dispersion with R
- A Poisson regression whose ratio of RESIDUAL DEVIANCE to DF is well above 1 signals over-dispersion
poiss <- glm(FGA ~ GP + DREB + AST + STL + TOV + FTM, family = "poisson", data = nba)
summary(poiss)
mean(nba$FGA)
[1]
var(nba$FGA)
[1]
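The raw mean-versus-variance comparison can be sketched as below. The counts are made up for illustration and are not the NBA data; the helper name `variance_to_mean_ratio` is hypothetical.

```python
import statistics

# Hypothetical count data, used only to illustrate the check.
counts = [2, 0, 7, 1, 15, 3, 0, 9, 4, 22]


def variance_to_mean_ratio(values: list[int]) -> float:
    """Sample variance divided by sample mean; about 1 for Poisson-like data."""
    return statistics.variance(values) / statistics.mean(values)


sample_mean = statistics.mean(counts)
sample_var = statistics.variance(counts)  # sample variance (n - 1 denominator), like R's var()
ratio = variance_to_mean_ratio(counts)

print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}, ratio = {ratio:.2f}")
```

This marginal comparison is only a first look, because the Poisson assumption concerns the conditional mean and variance given the predictors; the Deviance/DF ratio from the fitted model is the more direct check.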

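20 NEGATIVE BINOMIAL REGRESSION
- Generalization of Poisson regression
- Used for over-dispersed count data
- PMF: $P(Y = y) = \frac{\Gamma(y + 1/k)}{\Gamma(1/k)\, y!} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k}$, $y = 0, 1, 2, \dots$
- $E(Y) = \mu$, $V(Y) = \mu + k\mu^{2}$
- $k$ = dispersion parameter
- As $k \to 0$, $V(Y) \to \mu$: the model approaches Poisson, with $V(Y) = E(Y) = \mu$
- Link function same as Poisson: $g(\mu) = \log(\mu)$
- Equation: $\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1}$
- Goodness-of-fit test: same as Poisson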
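A short numerical sketch of the negative binomial variance, assuming the mean-dispersion parameterization in which the variance is $\mu + k\mu^{2}$. The values of `mu` and `k` are hypothetical, and the function names are introduced here for illustration.

```python
import math


def negbin_pmf(y: int, mu: float, k: float) -> float:
    """Negative binomial P(Y = y) with mean mu and dispersion k > 0 (log scale)."""
    r = 1.0 / k
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + y * math.log(k * mu / (1.0 + k * mu))
        - r * math.log(1.0 + k * mu)
    )
    return math.exp(log_p)


def negbin_variance(mu: float, k: float) -> float:
    """Variance implied by the model: mu + k * mu^2 (equals mu when k = 0)."""
    return mu + k * mu ** 2


mu, k = 4.0, 0.5         # hypothetical mean and dispersion
support = range(0, 500)  # truncation point; the remaining tail mass is negligible here
probs = [negbin_pmf(y, mu, k) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"numerical mean = {mean:.4f}, numerical variance = {variance:.4f}")
print(f"formula variance = {negbin_variance(mu, k):.4f}")
for small_k in (0.1, 0.01, 0.001):
    print(f"k = {small_k}: variance = {negbin_variance(mu, small_k):.4f}")
```

As `k` shrinks, the variance moves toward the mean, which is the Poisson limit described on the slide.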

21 NEGATIVE BINOMIAL-EXAMPLE WITH SAS
proc genmod data = nba;
  model FGA = GP DREB AST STL TOV FTM / dist=negbin; /* ONLY DIFFERENCE FROM POISSON */
run;

/* check goodness of fit for model */
data pvalue;
  df = 93;
  chisq = ; /* deviance from the GENMOD goodness-of-fit table (value missing in this transcript) */
  pvalue = 1 - probchi(chisq, df);
run;

proc print data = pvalue noobs;
run;

22 EXAMPLE RESULTS-GOODNESS OF FIT
Model Information
  Data Set                      WORK.NBA
  Distribution                  Negative Binomial
  Link Function                 Log
  Dependent Variable            FGA
  Number of Observations Read   100
  Number of Observations Used   100
Criteria For Assessing Goodness Of Fit (columns: Criterion, DF, Value, Value/DF; numeric values not preserved)
  Deviance
  Scaled Deviance
  Pearson Chi-Square
  Scaled Pearson X2
  Log Likelihood
  Full Log Likelihood
  AIC (smaller is better)
  AICC (smaller is better)
  BIC (smaller is better)

23 RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Columns in the original table: Parameter, DF, Estimate, Standard Error, Wald 95% Confidence Limits, Wald Chi-Square, Pr > ChiSq. Only the values below are preserved.
  Parameter    Wald 95% Confidence Limits   Wald Chi-Square   Pr > ChiSq
  Intercept    (3.8525, 4.4958)             647.01            <.0001
  GP           (0.0181, 0.0671)
  DREB         ( , 0.0016)
  AST          ( , 0.0014)
  STL          ( , 0.0077)
  TOV          (0.0015, 0.0105)
  FTM          (0.0023, 0.0061)             19.32             <.0001
  Dispersion   (0.0163, 0.0325)

24 Assessment of Results
- Ratio of Deviance/DF ≈ 1 (over-dispersion fixed!)
- The model now fits well: with the deviance close to its degrees of freedom, pvalue = 1 - probchi(chisq, df) is NOT significant, so there is no evidence of lack of fit
- There is an extra parameter in the "Analysis of Maximum Likelihood Parameter Estimates" table called "Dispersion" (aka ALPHA)
  - It accounts for the over-dispersion we came across in the Poisson regression
  - This estimate has a value of 0.0230 with a Wald confidence interval of (0.0163, 0.0325). Because the 95% confidence limits exclude 0, the dispersion is significantly different from 0, which justifies the negative binomial model as more appropriate
- GP, TOV, and FTM are the only significant predictors

25 NEGATIVE BINOMIAL-EXAMPLE WITH R
nba <- read.csv("F:/STATS544/nba.csv", header = TRUE)
install.packages('MASS')
library(MASS)
nb <- glm.nb(FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba)
summary(nb)

26 EXAMPLE RESULTS-GOODNESS OF FIT
Numeric values are not preserved in this transcript; the output reported:
- Deviance Residuals: Min, 1Q, Median, 3Q, Max
- (Dispersion parameter for Negative Binomial( ) family taken to be 1)
- Null deviance on 99 degrees of freedom
- Residual deviance on 93 degrees of freedom
- AIC
- Number of Fisher Scoring iterations: 1
- Theta and its Std. Err.
- 2 x log-likelihood

27 RESULTS: Analysis of Maximum Likelihood Parameter Estimates
Call: glm.nb(formula = FGA ~ GP + DREB + AST + STL + TOV + FTM, data = nba, init.theta = , link = log)
Coefficients (columns: Estimate, Std. Error, z value, Pr(>|z|); only the values below are preserved):
  Term          Pr(>|z|)
  (Intercept)   < 2e-16 ***
  GP            ***
  DREB
  AST
  STL
  TOV           **
  FTM           e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

28 INTERPRETATION OF SIGNIFICANT COEFFICIENTS
- GP: Holding all other variables constant, for every additional game played, the expected log number of field goal attempts goes up by the GP coefficient. Equivalently, for every additional game played, the expected number of field goal attempts increases by 4.35%.
- TOV: Holding all other variables constant, for every additional turnover, the expected log number of field goal attempts goes up by the TOV coefficient. Equivalently, for every additional turnover committed, the expected number of field goal attempts increases by 0.60%.
- FTM: Holding all other variables constant, for every additional free throw made, the expected log number of field goal attempts goes up by the FTM coefficient. Equivalently, for every additional free throw made, the expected number of field goal attempts increases by 0.42%.
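The percentages above follow from $100\,(e^{\beta} - 1)$. Because the coefficient estimates themselves are not preserved in this transcript, the sketch below approximates each one by the midpoint of its Wald 95% confidence limits from the SAS negative binomial table. That is an assumption: it relies on those Wald intervals being symmetric around the estimate, and on the rounding in the printed limits.

```python
import math

# Wald 95% confidence limits from the SAS negative binomial output (slide 23).
wald_limits = {
    "GP": (0.0181, 0.0671),
    "TOV": (0.0015, 0.0105),
    "FTM": (0.0023, 0.0061),
}


def percent_change(beta: float) -> float:
    """Percent change in the expected count for a one-unit increase in the predictor."""
    return 100.0 * (math.exp(beta) - 1.0)


results = {}
for name, (lower, upper) in wald_limits.items():
    beta = (lower + upper) / 2.0  # assumption: interval midpoint approximates the estimate
    results[name] = (beta, percent_change(beta))
    print(f"{name}: beta ~ {beta:.4f}, percent change ~ {results[name][1]:.2f}%")
```

The recovered percentages agree with the 4.35%, 0.60%, and 0.42% quoted on the slide, which supports reading those figures as $100\,(e^{\beta} - 1)$.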

29 NEGATIVE BINOMIAL WITH SPSS & MINITAB
SPSS
genlin FGA with GP DREB AST STL TOV FTM
  /model GP DREB AST STL TOV FTM INTERCEPT=YES distribution = negbin(mle) link = log
  /print FIT SUMMARY SOLUTION.
MINITAB
Not available.

30 SUMMARY
- Use Poisson regression when dealing with COUNT data
- If there is overdispersion, switch to negative binomial
- Assumptions for Poisson and NB are the same, except that NB does not require the variance to equal the mean
- Coefficients of both models are interpreted in the same manner
- Both regressions can be performed in SAS, R, and SPSS
