Copyright © 2003, SAS Institute Inc. All rights reserved.
Fitting Extended Count Data Models to Insurance Claims
CAS 2005 Ratemaking Seminar
Matt Flynn
March 2005
Introduction
This presentation discusses several example GLM count data models. The models are first fit with Poisson and Negative Binomial distributions using Proc GENMOD, and the same models are then refit using Proc NLMIXED. More flexible count data model variations, such as the Generalized Negative Binomial, Generalized Poisson, and Zero-Inflated Poisson (ZIP) models, are then introduced using Proc NLMIXED.
Introductory example: Ridout, Hinde, and Demetrio
Proc TABULATE code to examine our example data

proc tabulate data=ridout format=comma6.;
  class photo bap roots;
  var n;
  table (roots all),
        (photo*bap=' ' all)*n=' '*sum=' '
        / box=' BAP (muM)' rts=25;
run;
A little EDA.

proc univariate data=ridout2;
  class photo;
  var roots;
  histogram roots / midpoints=0 to 17 cframe=ligr cfill=blue;
run;
Proc UNIVARIATE histogram – number of roots
number of roots by photoperiod
Poisson model
Fit a Poisson model via Proc GENMOD
Coding break … CountDataModels_2.sas

proc genmod data=ridout2;
  /* roots = photo2 : linear model for the mean          */
  /* link=log       : link function                      */
  /* dist=poisson   : response probability distribution  */
  model roots = photo2 / link=log dist=poisson;
run;
quit;
Fit a Poisson model via Proc GENMOD, cont
Fit a Poisson model via Proc NLMIXED
Coding break... CountDataModels_3.sas

proc nlmixed data=ridout2;
  eta_mu = b_0 + b_1*photo2;                          /* linear model for the mean */
  mu = exp(eta_mu);                                   /* log link function */
  loglike = - mu + roots*log(mu) - log(fact(roots));  /* response probability distribution */
  model roots ~ general(loglike);
  *model roots ~ poisson(mu);                         /* equivalent built-in form */
run;
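The log-likelihood coded in the NLMIXED step follows directly from the Poisson probability function. As a brief sketch, writing y_i for the observed root count and x_i for the photo2 indicator (symbols chosen here for illustration, not taken from the slides):

```latex
\[
\Pr(Y_i = y_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!},
\qquad \log \mu_i = \eta_i = \beta_0 + \beta_1 x_i
\]
\[
\ell_i = \log \Pr(Y_i = y_i) = -\mu_i + y_i \log \mu_i - \log(y_i!),
\qquad \mathrm{E}(Y_i) = \mathrm{Var}(Y_i) = \mu_i
\]
```

Each term of the per-observation log-likelihood corresponds to one term of the loglike assignment, and the equality of mean and variance is the restriction that the later models relax.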
Negative Binomial model Note: when k -> 0 then Pr(y) -> Poisson
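For reference, a sketch of the Negative Binomial probability function in the mean/dispersion form, which appears to be the parameterization used by the NLMIXED code in this deck (mu is the mean and k the dispersion; the symbol names are assumptions made here):

```latex
\[
\Pr(Y = y) = \frac{\Gamma\!\left(y + \tfrac{1}{k}\right)}{\Gamma(y+1)\,\Gamma\!\left(\tfrac{1}{k}\right)}
\;\frac{(k\mu)^{y}}{(1 + k\mu)^{\,y + 1/k}},
\qquad y = 0, 1, 2, \ldots
\]
\[
\mathrm{E}(Y) = \mu, \qquad \mathrm{Var}(Y) = \mu + k\mu^{2}
\]
```

As k tends to 0 the variance collapses to mu and the distribution approaches the Poisson, which is the limit stated in the note.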
Fit a Negative Binomial model via Proc GENMOD
Coding break... CountDataModels_4.sas

proc genmod data=ridout2;
  /* dist=negbin : response probability distribution */
  model roots = photo2 / link=log dist=negbin;
run;
quit;
Fit a Negative Binomial model via Proc NLMIXED
Coding break … CountDataModels_5.sas

proc nlmixed data=ridout2;
  eta = b_0 + b_1*photo2;
  mean = exp(eta);
  loglike = (lgamma(roots + (1/k)) - lgamma(roots+1) - lgamma(1/k)
             + roots*log(k*mean) - (roots+(1/k))*log(1+k*mean));
  model roots ~ general(loglike);
run;
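The loglike expression in this step can be read term by term as the log of the Negative Binomial probability function. A short sketch, with y for roots and mu for the mean (notation assumed here):

```latex
\[
\ell(y;\mu,k) = \log\Gamma\!\left(y + \tfrac{1}{k}\right) - \log\Gamma(y+1) - \log\Gamma\!\left(\tfrac{1}{k}\right)
+ y\,\log(k\mu) - \left(y + \tfrac{1}{k}\right)\log(1 + k\mu)
\]
```

The expression is only defined for k > 0. One way to enforce that, offered as a suggestion rather than something shown on the slides, is a BOUNDS statement such as `bounds k > 0;` in the NLMIXED step.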
Generalized Poisson model Note variance is proportional to the mean.
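The slide's own formula is not recoverable from the text. As a hedged reference, one common form of the Generalized Poisson distribution (due to Consul and Jain) that is consistent with the note about the variance is sketched below; theta and lambda are symbols assumed here, not taken from the deck:

```latex
\[
\Pr(Y = y) = \frac{\theta\,(\theta + \lambda y)^{\,y-1}\, e^{-\theta - \lambda y}}{y!},
\qquad y = 0, 1, 2, \ldots, \quad \theta > 0,\; 0 \le \lambda < 1
\]
\[
\mathrm{E}(Y) = \frac{\theta}{1-\lambda}, \qquad
\mathrm{Var}(Y) = \frac{\theta}{(1-\lambda)^{3}} = \frac{\mathrm{E}(Y)}{(1-\lambda)^{2}}
\]
```

In this form the variance is the mean multiplied by the constant 1/(1-lambda)^2, and lambda = 0 recovers the ordinary Poisson.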
Fit a Generalized Poisson model via Proc NLMIXED
Coding break... CountDataModels_9.sas

proc nlmixed data=ridout2;
  /* assumes the Consul form of the Generalized Poisson: mean mu, variance mu/(1-lam)**2 */
  parms b_0=0 b_1=0 lam=0.1;
  bounds 0 <= lam < 1;
  eta = b_0 + b_1*photo2;
  mu = exp(eta);
  theta = mu*(1 - lam);
  loglike = log(theta) + (roots-1)*log(theta + lam*roots)
            - theta - lam*roots - lgamma(roots+1);
  model roots ~ general(loglike);
run;
Fit a Generalized Negative Binomial model via Proc NLMIXED (with parameters in the dispersion function)
Coding break... CountDataModels_9.sas

proc nlmixed data=ridout2;
  parms b_0=0 b_1=0 b_2=0 b_3=0 b_4=0 b_5=0 b_6=0 b_7=0
        a_0=0 a_1=0;
  eta_lambda = b_0 + b_1*photo2 + b_2*bap2 + b_3*bap3 + b_4*bap4
               + b_5*bp1 + b_6*bp2 + b_7*bp3;
  mean = exp(eta_lambda);
  eta_k = a_0 + a_1*photo2;  * estimate a parameter in the NB dispersion;
  k = exp(eta_k);
  loglike = (lgamma(roots+(1/k)) - lgamma(roots+1) - lgamma(1/k)
             + roots*log(k*mean) - (roots+(1/k))*log(1+k*mean));
  model roots ~ general(loglike);
  title 'Generalized Negative Binomial model';
run;
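The generalization in this step is that the dispersion is no longer a single constant but has its own log-linear model. In symbols (alpha_0, alpha_1 and x_i are assumed names for a_0, a_1 and photo2):

```latex
\[
\log k_i = \alpha_0 + \alpha_1 x_i
\quad\Longrightarrow\quad
k_i = \exp(\alpha_0 + \alpha_1 x_i) > 0,
\qquad
\mathrm{Var}(Y_i) = \mu_i + k_i\,\mu_i^{2}
\]
```

Modelling k on the log scale keeps it positive without a bounds constraint, and alpha_1 = 0 reduces the model to the ordinary Negative Binomial with constant dispersion.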
Comparative histograms – Poisson, NegBin, GPD
Zero-Inflated Poisson (ZIP) model
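As a reference sketch, the ZIP distribution mixes a point mass at zero (with probability p_0) and a Poisson count with mean lambda; p_0 and lambda match the names used in the NLMIXED code, and the moment formulas are standard results rather than something stated on the slide:

```latex
\[
\Pr(Y = y) =
\begin{cases}
p_0 + (1 - p_0)\,e^{-\lambda}, & y = 0 \\[4pt]
(1 - p_0)\,\dfrac{e^{-\lambda}\lambda^{y}}{y!}, & y = 1, 2, \ldots
\end{cases}
\]
\[
\mathrm{E}(Y) = (1 - p_0)\,\lambda, \qquad
\mathrm{Var}(Y) = (1 - p_0)\,\lambda\,(1 + p_0\lambda)
\]
```

Setting p_0 = 0 recovers the Poisson, and any p_0 > 0 makes the variance exceed the mean.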
Fit a Zero-Inflated Poisson (ZIP) model via Proc NLMIXED

proc nlmixed data=ridout2;
  /* p_0 : additional parameter for zero-inflation (constant here) */
  parms bll_0=0 bll_1=0 p_0=0;
  eta_lambda = bll_0 + bll_1*photo2;
  lambda = exp(eta_lambda);
  if roots=0 then loglike = log(p_0 + (1-p_0)*exp(-lambda));
  else loglike = log(1-p_0) + roots*log(lambda) - lambda - lgamma(roots+1);
  model roots ~ general(loglike);
run;
Fit a Zero-Inflated Poisson (ZIP) model via Proc NLMIXED
Generalized with parameters in the Zero-Inflation probability

proc nlmixed data=ridout2;
  parms bp_0=0 bp_1=0 bll_0=0 bll_1=0;
  /* logit model for p_0 */
  eta_prob = bp_0 + bp_1*photo2;
  p_0 = exp(eta_prob)/(1 + exp(eta_prob));
  /* Poisson model for conditional mean */
  eta_lambda = bll_0 + bll_1*photo2;
  lambda = exp(eta_lambda);
  if roots=0 then loglike = log(p_0 + (1-p_0)*exp(-lambda));
  else loglike = log(1-p_0) + roots*log(lambda) - lambda - lgamma(roots+1);
  model roots ~ general(loglike);
run;
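The two linear predictors and the two branches of loglike in this step can be summarized as follows. This is a sketch in assumed notation: gamma_0, gamma_1 stand for bp_0, bp_1; beta_0, beta_1 for bll_0, bll_1; x_i for photo2.

```latex
\[
p_{0i} = \frac{\exp(\gamma_0 + \gamma_1 x_i)}{1 + \exp(\gamma_0 + \gamma_1 x_i)},
\qquad
\lambda_i = \exp(\beta_0 + \beta_1 x_i)
\]
\[
\ell_i =
\begin{cases}
\log\!\left\{ p_{0i} + (1 - p_{0i})\,e^{-\lambda_i} \right\}, & y_i = 0 \\[4pt]
\log(1 - p_{0i}) + y_i \log \lambda_i - \lambda_i - \log\Gamma(y_i + 1), & y_i > 0
\end{cases}
\]
```

The logit form keeps p_0 strictly between 0 and 1 for any parameter values, and gamma_1 = 0 gives back the constant zero-inflation model.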
Compare a series of models
Each model has three components: 1) mean model, 2) zero-inflation, 3) dispersion parameter (for NB models). (c means constant.)
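The slides do not say which criteria were used for the comparison. As a hedged reminder, the fit statistics that Proc NLMIXED prints for each model include -2 log L, AIC and BIC, which for a model with p estimated parameters and n observations are usually defined as:

```latex
\[
\mathrm{AIC} = -2\log L + 2p,
\qquad
\mathrm{BIC} = -2\log L + p\,\log n
\]
\[
\mathrm{LR} = -2\left(\log L_{\text{restricted}} - \log L_{\text{full}}\right)
\]
```

Smaller AIC or BIC indicates a better trade-off between fit and complexity. The likelihood ratio statistic applies to nested pairs such as constant versus modelled dispersion. When the restriction lies on the boundary of the parameter space (for example k = 0 or p_0 = 0), the usual chi-square reference distribution is only approximate.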
ZIP Models
Yip and Yau Auto claim count data

proc gbarline data=outfreq;
  bar clm_freq / sumvar=percent discrete;
  plot / sumvar=pred raxis=axis1;
run;
quit;
Yip and Yau Auto claim count data – Poisson model
Yip and Yau Auto claim count data – ZIP model
ZIP model log-likelihood surface – p_0 & lambda
Bibliography/Resources
Ridout, M.S., Hinde, J.P. and Demetrio, C.G.B., Models for count data with many zeros, Proceedings of the XIXth International Biometric Conference, Cape Town, Invited Papers, 1998.
Yip, Karen C. H. and Kelvin K. W. Yau, Application of zero-inflated models for claim frequency data in general insurance, European Applied Business Conference, Venice, Italy, 2003.
Flynn, Matt, Modeling event count data with Proc GENMOD and the SAS system, SUGI 24, 1999.
Bibliography/Resources, cont.
SAS-L: search for NLMIXED, ZIP models, Dale McLerran 8&group=comp.soft-sys.sas
SAS Online Docs: Proc GENMOD
  Overview
  What is a Generalized Linear Model?
  Response Probability Distributions & Log-likelihood functions
Matt Flynn (860) x8764 (806) cell