Copyright © 2003, SAS Institute Inc. All rights reserved.

Fitting Extended Count Data Models to Insurance Claims
CAS 2005 Ratemaking Seminar
Matt Flynn
March 2005

Introduction
This presentation discusses several example GLM count data models. The models are first fit with Poisson and Negative Binomial distributions using Proc GENMOD, and the same models are then fit using Proc NLMIXED. More flexible count data model variations, such as the Generalized Negative Binomial, Generalized Poisson, and Zero-Inflated Poisson (ZIP) models, are then introduced using Proc NLMIXED.

Introductory example – Ridout, Hinde, and Demetrio
Proc TABULATE code to examine our example data

    proc tabulate data=ridout format=comma6.;
      class photo bap roots;
      var n;
      table (roots all),
            (photo*bap=' ' all)*n=' '*sum=' '
            / box=' BAP (muM)' rts=25;
    run;

A little EDA.

    proc univariate data=ridout2;
      class photo;
      var roots;
      histogram roots / midpoints=0 to 17 cframe=ligr cfill=blue;
    run;

Proc UNIVARIATE histogram – number of roots

number of roots by photoperiod

Poisson model
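For reference, the Poisson model can be sketched in its standard form as below. The symbols y (the root count), x (the photo2 indicator), mu, and beta_0, beta_1 are notation assumed here for illustration. The last line is the per-observation log-likelihood, which has the same form as the one coded in the Proc NLMIXED version of this model.

```latex
\begin{align*}
\Pr(Y = y) &= \frac{e^{-\mu}\,\mu^{y}}{y!}, \qquad y = 0, 1, 2, \dots \\
\log \mu &= \beta_0 + \beta_1 x \\
\ell(\mu; y) &= -\mu + y \log \mu - \log(y!)
\end{align*}
```

Under this model the mean and the variance are both equal to mu.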

Fit a Poisson model via Proc GENMOD
Coding break ... CountDataModels_2.sas

    proc genmod data=ridout2;
      model roots = photo2 / link=log dist=poisson;
    run;
    quit;

Callouts: linear model for mean (roots = photo2); link function (link=log); response probability distribution (dist=poisson).

Fit a Poisson model via Proc GENMOD, cont.

Fit a Poisson model via Proc NLMIXED
Coding break ... CountDataModels_3.sas

    proc nlmixed data=ridout2;
      eta_mu = b_0 + b_1*photo2;
      mu = exp(eta_mu);
      loglike = -mu + roots*log(mu) - log(fact(roots));
      model roots ~ general(loglike);
      * model roots ~ poisson(mu);
    run;

Callouts: linear model for mean (eta_mu); log link function (mu = exp(eta_mu)); response probability distribution (loglike).

Negative Binomial model
Note: as k -> 0, Pr(y) approaches the Poisson distribution.
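As a reference sketch, one common way to write the Negative Binomial model with mean mu and dispersion parameter k is shown below. The symbols y and mu are notation assumed here; k is the dispersion parameter named on the slide. Taking logs gives the log-likelihood expression used in the Proc NLMIXED code for this model.

```latex
\begin{align*}
\Pr(Y = y) &= \frac{\Gamma\!\left(y + \tfrac{1}{k}\right)}{\Gamma(y + 1)\,\Gamma\!\left(\tfrac{1}{k}\right)}
              \,\frac{(k\mu)^{y}}{(1 + k\mu)^{\,y + 1/k}}, \qquad y = 0, 1, 2, \dots \\
\ell(\mu, k; y) &= \log\Gamma\!\left(y + \tfrac{1}{k}\right) - \log\Gamma(y + 1) - \log\Gamma\!\left(\tfrac{1}{k}\right)
              + y \log(k\mu) - \left(y + \tfrac{1}{k}\right)\log(1 + k\mu) \\
\operatorname{E}(Y) &= \mu, \qquad \operatorname{Var}(Y) = \mu + k\mu^{2}
\end{align*}
```

The variance formula shows why the Poisson model is the limiting case: as k goes to 0, the extra term k mu^2 vanishes and the variance equals the mean.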

Fit a Negative Binomial model via Proc GENMOD
Coding break ... CountDataModels_4.sas

    proc genmod data=ridout2;
      model roots = photo2 / link=log dist=negbin;
    run;
    quit;

Callout: response probability distribution (dist=negbin).

Fit a Negative Binomial model via Proc NLMIXED
Coding break ... CountDataModels_5.sas

    proc nlmixed data=ridout2;
      eta = b_0 + b_1*photo2;
      mean = exp(eta);
      loglike = (lgamma(roots + (1/k)) - lgamma(roots+1) - lgamma(1/k)
                 + roots*log(k*mean) - (roots+(1/k))*log(1+k*mean));
      model roots ~ general(loglike);
    run;

Generalized Poisson model
Note: variance is proportional to the mean.

Fit a Generalized Poisson model via Proc NLMIXED
Coding break ... CountDataModels_9.sas

    proc nlmixed data=ridout2;
      eta = b_0 + b_1*photo2;
      mu = exp(eta);
      * this log-likelihood has the same form as the negative binomial
        model in CountDataModels_5.sas;
      loglike = (lgamma(roots + (1/k)) - lgamma(roots+1) - lgamma(1/k)
                 + roots*log(k*mu) - (roots+(1/k))*log(1+k*mu));
      model roots ~ general(loglike);
    run;

Fit a Generalized Negative Binomial model via Proc NLMIXED (with parameters in the dispersion function)
Coding break ... CountDataModels_9.sas

    proc nlmixed data=ridout2;
      parms b_0=0 b_1=0 b_2=0 b_3=0 b_4=0 b_5=0 b_6=0 b_7=0
            a_0=0 a_1=0;
      eta_lambda = b_0 + b_1*photo2 + b_2*bap2 + b_3*bap3 + b_4*bap4
                   + b_5*bp1 + b_6*bp2 + b_7*bp3;
      mean = exp(eta_lambda);
      eta_k = a_0 + a_1*photo2;  * estimate a parameter in the NB dispersion;
      k = exp(eta_k);
      loglike = (lgamma(roots+(1/k)) - lgamma(roots+1) - lgamma(1/k)
                 + roots*log(k*mean) - (roots+(1/k))*log(1+k*mean));
      model roots ~ general(loglike);
      title 'Generalized Negative Binomial model';
    run;

Comparative histograms – Poisson, NegBin, GPD

Zero-Inflated Poisson (ZIP) model
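As a reference sketch, the ZIP model is usually written as a mixture of a point mass at zero and a Poisson distribution. The names p_0 (zero-inflation probability) and lambda (Poisson mean) follow the Proc NLMIXED code in this presentation; y is notation assumed here. The two cases correspond to the two branches of the log-likelihood in that code.

```latex
\begin{align*}
\Pr(Y = 0) &= p_0 + (1 - p_0)\,e^{-\lambda} \\
\Pr(Y = y) &= (1 - p_0)\,\frac{e^{-\lambda}\,\lambda^{y}}{y!}, \qquad y = 1, 2, \dots \\
\ell(p_0, \lambda; y) &=
\begin{cases}
\log\!\left(p_0 + (1 - p_0)\,e^{-\lambda}\right), & y = 0 \\
\log(1 - p_0) + y \log \lambda - \lambda - \log\Gamma(y + 1), & y \ge 1
\end{cases} \\
\operatorname{E}(Y) &= (1 - p_0)\,\lambda, \qquad \operatorname{Var}(Y) = (1 - p_0)\,\lambda\,(1 + p_0\lambda)
\end{align*}
```

Whenever p_0 > 0 the variance exceeds the mean, so zero inflation is one source of overdispersion relative to the Poisson model.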

Fit a Zero-Inflated Poisson (ZIP) model via Proc NLMIXED

    proc nlmixed data=ridout2;
      * bp_0 and bp_1 are not used in this model, so they are left out of parms;
      parms bll_0=0 bll_1=0 p_0=0;
      eta_lambda = bll_0 + bll_1*photo2;
      lambda = exp(eta_lambda);
      if roots=0 then loglike = log(p_0 + (1-p_0)*exp(-lambda));
      else loglike = log(1-p_0) + roots*log(lambda) - lambda - lgamma(roots+1);
      model roots ~ general(loglike);
    run;

Callout: additional parameter for zero-inflation (p_0).

Fit a Zero-Inflated Poisson (ZIP) model via Proc NLMIXED
Generalized with parameters in the zero-inflation probability

    proc nlmixed data=ridout2;
      parms bp_0=0 bp_1=0 bll_0=0 bll_1=0;
      eta_prob = bp_0 + bp_1*photo2;
      p_0 = exp(eta_prob)/(1 + exp(eta_prob));
      eta_lambda = bll_0 + bll_1*photo2;
      lambda = exp(eta_lambda);
      if roots=0 then loglike = log(p_0 + (1-p_0)*exp(-lambda));
      else loglike = log(1-p_0) + roots*log(lambda) - lambda - lgamma(roots+1);
      model roots ~ general(loglike);
    run;

Callouts: logit model for p_0 (eta_prob); Poisson model for conditional mean (eta_lambda).

Compare a series of models
* Each model has three components:
  1) mean model
  2) zero-inflation
  3) dispersion parameter (for NB models)
(c means constant)

ZIP Models
Yip and Yau Auto claim count data

    proc gbarline data=outfreq;
      bar clm_freq / sumvar=percent discrete;
      plot / sumvar=pred raxis=axis1;
    run;
    quit;

Yip and Yau Auto claim count data – Poisson model

Yip and Yau Auto claim count data – ZIP model

ZIP model log-likelihood surface – p_0 & lambda

Bibliography/Resources

Ridout, M.S., Hinde, J.P. and Demetrio, C.G.B., Models for count data with many zeros, Proceedings of the XIXth International Biometric Conference, Cape Town, Invited Papers, 1998.
Yip, Karen C. H. and Kelvin K. W. Yau, Application of zero-inflated models for claim frequency data in general insurance, European Applied Business Conference, Venice, Italy, 2003.
Flynn, Matt, Modeling event count data with Proc GENMOD and the SAS system, SUGI 24, 1999.

Bibliography/Resources, cont.

SAS-L – search for NLMIXED, ZIP models, Dale McLerran 8&group=comp.soft-sys.sas
SAS Online Docs – Proc GENMOD
  Overview
  What is a Generalized Linear Model?
  Response Probability Distributions & Log-likelihood functions

Matt Flynn (860) x8764 (806) cell