Modeling risk of accidents in the Property and Casualty Insurance Industry, by Torna Omar Soro


2 DEPENDENT VARIABLE: number of claims (0, 1, 2)
ATTRIBUTES:
1. Total number of vehicles on a policy
2. Total number of drivers on a policy
3. Anti-theft
4. Driver with training (1 if the driver has training, 0 otherwise)
5. Age of the oldest driver
6. Age of the youngest driver
7. Territory (driver's location), coded as the cost of the territory (numerical value; take the log)
8. SDIP: the Safe Driver Insurance Plan
9. Credit score flag (1 if the driver has a credit score, 0 otherwise)
10. Credit score (numerical value; take the log)
11. Business source (1 if it is a book transfer, 0 if walk-in)
12. Group insurance flag (1 if the driver is covered by group insurance, 0 otherwise)
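A minimal sketch of how the attributes above could be assembled into a design matrix with the two log-transformed columns, assuming numpy and a fixed column order. The sample records, the column order, and the choice to set the log credit score to 0 when the driver has no score are all hypothetical.

```python
import numpy as np

# Hypothetical policy records (illustrative values only). Column order:
# vehicles, drivers, anti_theft, trained, oldest_age, youngest_age,
# territory_cost, sdip, credit_flag, credit_score, book_transfer, group_flag
policies = [
    (2, 2, 1, 0, 52, 19, 310.0, 0, 1, 720.0, 1, 0),
    (1, 1, 0, 1, 34, 34, 150.0, 2, 0, 0.0, 0, 1),
]

def build_design_matrix(rows):
    """Return an (n, 12) float matrix with log territory cost and log credit score."""
    X = np.array(rows, dtype=float)
    X[:, 6] = np.log(X[:, 6])  # log of the territory cost
    has_score = X[:, 8] == 1
    # Assumption: take the log only where a credit score exists and use 0 otherwise,
    # so the credit-score flag carries the "no score" information.
    safe_scores = np.where(has_score, X[:, 9], 1.0)  # avoids log(0)
    X[:, 9] = np.where(has_score, np.log(safe_scores), 0.0)
    return X

X = build_design_matrix(policies)
print(X.round(3))
```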


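5 Poisson model:
$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$
- where $\mu$ is the expected value (mean) of y,
- and $\ln \mu = x\beta$.
- An unusual property: $\operatorname{Var}(y) = E(y) = \mu$.
- This model can be estimated by maximum likelihood.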
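The Poisson probabilities and the mean-equals-variance property can be checked numerically. This sketch assumes a log link and uses made-up coefficients; it sums the probability mass function over a truncated range of counts, which is accurate for small means.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson variable with mean mu, computed on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def poisson_mean(x, beta):
    """Log link (assumed): mu = exp(x . beta)."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))

def moments(pmf, max_y=200):
    """Mean and variance by summing a pmf over 0..max_y (truncation is negligible here)."""
    probs = [pmf(y) for y in range(max_y + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical intercept (x0 = 1) and one covariate
mu = poisson_mean([1.0, 0.5], [-1.2, 0.8])
mean, var = moments(lambda y: poisson_pmf(y, mu))
print(round(mu, 4), round(mean, 4), round(var, 4))  # all three agree
```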

6
- Overdispersion (often Var(y) > E(y)) or underdispersion (Var(y) < E(y)).
- While overdispersion doesn't bias the coefficients, it does lead to underestimated standard errors.
- Overdispersion also implies that conventional ML estimates are not efficient.
- One reason for overdispersion: excess zeros.
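To see why excess zeros produce overdispersion, the sketch below mixes a Poisson distribution with extra zeros and computes the mean and variance by summation. The inflation probability (0.3) and the Poisson mean (2) are illustrative values, not estimates from the insurance data.

```python
import math

def zero_inflated_pmf(y, pi, mu):
    """Poisson(mu) mixed with a point mass at zero of weight pi."""
    poisson = math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))
    return (pi if y == 0 else 0.0) + (1.0 - pi) * poisson

def mean_var(pi, mu, max_y=200):
    """Mean and variance by summing the mixture pmf over 0..max_y."""
    probs = [zero_inflated_pmf(y, pi, mu) for y in range(max_y + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Hypothetical values: 30% extra zeros on top of a Poisson with mean 2.
mean, var = mean_var(pi=0.3, mu=2.0)
# Closed form: E(y) = (1 - pi) mu, Var(y) = (1 - pi) mu (1 + pi mu) > E(y)
print(round(mean, 3), round(var, 3))  # 1.4 2.24
```

The variance exceeds the mean whenever pi > 0, even though each component on its own is equidispersed.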

7
- State wildlife biologists want to model how many fish are being caught by fishermen at a state park.
- Visitors are asked how long they stayed, how many people were in the group, whether there were children in the group, and how many fish were caught.
- Some visitors do not fish, but there is no data on whether a person fished or not.
- Some visitors who did fish caught nothing, and the visitors who did not fish at all add excess zeros to the data.

8 Property and casualty insurance: the excess zeros may come from different sources:
1. Censored data create more zeros.
2. Some drivers drive little or only occasionally because they prefer taking public transportation.

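9
- It is a generalization of the Poisson model.
- It allows for correction of overdispersion.
- A disturbance term $\varepsilon_i$ is included in the model, which accounts for the overdispersion: $y_i \mid \varepsilon_i \sim \text{Poisson}(\lambda_i)$ with $\ln \lambda_i = x_i\beta + \varepsilon_i$.
- $e^{\varepsilon_i}$ has a standard gamma distribution (mean 1, variance $\alpha$), so with $\mu_i = e^{x_i\beta}$: $E(y_i) = \mu_i$ and $\operatorname{Var}(y_i) = \mu_i(1 + \alpha\mu_i)$.
- $\alpha$ is a constant (Poisson: $\alpha = 0$).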
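The negative binomial can be simulated as a gamma-Poisson mixture to check the variance formula. This sketch assumes numpy and the parameterization in which the gamma disturbance has mean 1 and variance alpha; mu, alpha, and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, alpha, n = 2.0, 0.5, 200_000  # hypothetical mean, dispersion, sample size

# Disturbance exp(eps) ~ Gamma(shape=1/alpha, scale=alpha): mean 1, variance alpha
gamma_noise = rng.gamma(shape=1.0 / alpha, scale=alpha, size=n)
# Poisson counts given the disturbance: lambda_i = mu * exp(eps_i)
y = rng.poisson(mu * gamma_noise)

# Sample moments should be close to mu = 2 and mu * (1 + alpha * mu) = 4
print(y.mean(), y.var())
```

Setting alpha close to 0 makes the gamma disturbance collapse to 1, and the counts become plain Poisson.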

10
- An alternative response to modeling overdispersion.
- Some zeros result from fishing and not catching any fish. In the case of insurance, some zeros may result from driving without causing accidents.
- Some zeros result from not fishing at all. In the case of insurance, some zeros result from driving very little; censored data may also produce more zeros.
- Zero-inflated models allow one to model each process separately.

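11 The ZIP model has two parts: a Poisson count model and a logit model for predicting excess zeros. With probability $\pi_i$ an observation is an excess zero; with probability $1 - \pi_i$ it comes from the Poisson count model:
$$P(y_i = 0) = \pi_i + (1 - \pi_i)e^{-\mu_i}, \qquad P(y_i = k) = (1 - \pi_i)\frac{e^{-\mu_i}\mu_i^{k}}{k!}, \quad k \ge 1$$
A logit model, $\ln\frac{\pi_i}{1 - \pi_i} = z_i\gamma$, is used together with the count model, $\ln \mu_i = x_i\beta$.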
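A sketch of the ZIP log-likelihood, assuming a log link for the count part and a logit link for the excess-zero probability. The names `beta` and `gamma` and the covariate vectors `x` and `z` are hypothetical; in practice this function would be maximized numerically or the model fitted with a statistics package.

```python
import math

def zip_loglik(y, x, z, beta, gamma):
    """Log-likelihood of a zero-inflated Poisson model.

    Count part: mu_i = exp(x_i . beta); inflation part: logit(pi_i) = z_i . gamma.
    """
    total = 0.0
    for yi, xi, zi in zip(y, x, z):
        mu = math.exp(sum(a * b for a, b in zip(xi, beta)))
        pi = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(zi, gamma))))
        if yi == 0:
            # A zero can be an excess zero or a Poisson zero
            total += math.log(pi + (1.0 - pi) * math.exp(-mu))
        else:
            # A positive count must come from the Poisson part
            total += math.log(1.0 - pi) + yi * math.log(mu) - mu - math.lgamma(yi + 1)
    return total

# Tiny illustrative call: intercept-only models with beta = gamma = 0 (mu = 1, pi = 0.5)
ll = zip_loglik([0, 2], [[1.0], [1.0]], [[1.0], [1.0]], [0.0], [0.0])
print(ll)
```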


15
- The degrees of freedom (DF) are large relative to the deviance: DF = 55595 > Deviance = 19119 (Deviance/DF ≈ 0.34), indicating underdispersion (less variation than the Poisson model assumes).
- Overestimated standard errors
- MLE not efficient
- Inadequate fit of the Poisson model
- Estimation of negative binomial and zero-inflated Poisson models
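The deviance-to-DF comparison can be reproduced from the reported numbers. The `poisson_deviance` helper is a hypothetical illustration of how the deviance is defined for a Poisson fit with fitted means `mu`; it is not the code that produced the slide's values.

```python
import math

def poisson_deviance(y, mu):
    """D = 2 * sum[y ln(y / mu) - (y - mu)], with y ln(y / mu) taken as 0 when y = 0."""
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )

def dispersion_ratio(deviance, df):
    """Deviance / DF: roughly 1 when the Poisson model fits well."""
    return deviance / df

ratio = dispersion_ratio(19119, 55595)  # values reported on this slide
print(round(ratio, 2))  # 0.34, well below 1
```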


17 Quality of fit:

Model                    Deviance   AIC (Akaike Information Criterion)
Poisson                  19119      25703
Negative binomial (NB)   16222      25581

- AIC(NB) < AIC(Poisson), so NB is better.
- AIC = 2k - 2 ln(L), where k is the number of parameters in the model and L is the maximum value of the likelihood function.
- The preferred model is the one with the minimum AIC.
- AIC rewards goodness of fit and imposes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting.
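A small helper for the AIC formula, followed by the model choice using the AIC values from the table. Only the reported AICs are used for the comparison; the arguments to `aic` are placeholders for a fitted model's maximized log-likelihood and parameter count.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L), where ln(L) is the maximized log-likelihood."""
    return 2 * k - 2 * log_likelihood

# AIC values reported on this slide
reported = {"Poisson": 25703, "Negative binomial": 25581}
best = min(reported, key=reported.get)  # smallest AIC is preferred
print(best, reported["Poisson"] - reported["Negative binomial"])  # Negative binomial 122
```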


20 The Vuong test is a likelihood-ratio-based test for selecting between non-nested models.
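A sketch of the uncorrected Vuong statistic, computed from the per-observation log-likelihoods of two already-fitted models. Some software applies an AIC- or BIC-style correction for the number of parameters, which this version omits; the sample log-likelihoods below are made up.

```python
import math
import statistics

def vuong_statistic(loglik_1, loglik_2):
    """Vuong z-statistic from per-observation log-likelihoods of two non-nested models.

    Large positive values favor model 1, large negative values favor model 2;
    compare with standard normal critical values (e.g. +/-1.96 at the 5% level).
    """
    m = [a - b for a, b in zip(loglik_1, loglik_2)]  # pointwise log-likelihood ratios
    return math.sqrt(len(m)) * statistics.mean(m) / statistics.stdev(m)

v = vuong_statistic([-1.0, -2.0, -1.5], [-1.5, -2.2, -1.4])  # hypothetical values
print(round(v, 3))
```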

21
- Given our unbalanced data, we cannot use SVM.
- A conditional random field (CRF) is a particular case of log-linear models.

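22
- A log-linear model can be written as:
$$p(y \mid x; w) = \frac{\exp\left(\sum_j w_j F_j(x, y)\right)}{Z(x, w)}$$
- Special case: Poisson regression, $\ln \mu = x\beta$.
- Where the partition function is $Z(x, w) = \sum_{y'} \exp\left(\sum_j w_j F_j(x, y')\right)$.
- Given x, the label predicted by the model is $\hat{y} = \arg\max_y p(y \mid x; w) = \arg\max_y \sum_j w_j F_j(x, y)$.
- $F_j(x, y)$ is called a feature-function.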

23
- A feature-function is any mapping $F_j : X \times Y \to \mathbb{R}$.
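The log-linear formulas can be illustrated with a toy binary label. The labels, feature functions, and weights are hypothetical; the point is to show how the partition function Z(x, w) normalizes the scores and why the prediction needs only the unnormalized score.

```python
import math

LABELS = ["no_claim", "claim"]  # hypothetical label set

def feature_functions(x, y):
    """Hypothetical feature functions F_j(x, y), each mapping (x, y) to a real number."""
    return [
        1.0 if y == "claim" else 0.0,                   # label bias
        x["young_driver"] if y == "claim" else 0.0,     # young driver and a claim
        x["trained"] if y == "no_claim" else 0.0,       # trained driver and no claim
    ]

def score(x, y, w):
    """Unnormalized score sum_j w_j F_j(x, y)."""
    return sum(wj * fj for wj, fj in zip(w, feature_functions(x, y)))

def conditional_probs(x, w):
    """p(y | x; w) = exp(score) / Z(x, w)."""
    z = sum(math.exp(score(x, y, w)) for y in LABELS)  # partition function Z(x, w)
    return {y: math.exp(score(x, y, w)) / z for y in LABELS}

def predict(x, w):
    """argmax_y p(y | x; w); Z(x, w) does not depend on y, so the score suffices."""
    return max(LABELS, key=lambda y: score(x, y, w))

w = [-1.0, 1.5, 0.5]  # hypothetical weights
x = {"young_driver": 1.0, "trained": 0.0}
print(predict(x, w), conditional_probs(x, w))
```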

24
- Linear-chain CRF
- A case of multilevel labeling: given a sentence, each word can be tagged as a noun, verb, adjective, preposition, etc.
- $F_j$ is a sum along the sentence, for i = 1 to i = n, where n is the length of the sentence:
$$F_j(\bar{x}, \bar{y}) = \sum_{i=1}^{n} f_j(y_{i-1}, y_i, \bar{x}, i)$$
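A toy linear-chain example: each global feature F_j sums a local feature over the positions of the sentence, and the best tagging is found by brute force. The tag set, local features, and weights are hypothetical, and a real CRF implementation would use the Viterbi algorithm instead of enumerating every tagging.

```python
import itertools

TAGS = ["NOUN", "VERB", "ADJ"]  # hypothetical tag set

def local_features(prev_tag, tag, words, i):
    """Hypothetical local features f_j(y_{i-1}, y_i, x, i)."""
    return [
        1.0 if prev_tag == "NOUN" and tag == "VERB" else 0.0,      # NOUN -> VERB transition
        1.0 if words[i].endswith("s") and tag == "VERB" else 0.0,  # word ending in 's' tagged VERB
        1.0 if i == 0 and tag == "NOUN" else 0.0,                  # sentence starts with a noun
    ]

def global_features(words, tags):
    """F_j(x, y): sum of f_j over all positions (0-based here; START precedes the first tag)."""
    totals = [0.0, 0.0, 0.0]
    prev = "START"
    for i, tag in enumerate(tags):
        for j, f in enumerate(local_features(prev, tag, words, i)):
            totals[j] += f
        prev = tag
    return totals

def best_tagging(words, w):
    """Brute-force argmax over all |TAGS|^n taggings of the sentence."""
    def score(tags):
        return sum(wj * fj for wj, fj in zip(w, global_features(words, tags)))
    return max(itertools.product(TAGS, repeat=len(words)), key=score)

print(best_tagging(["dog", "runs"], [1.0, 1.0, 1.0]))  # ('NOUN', 'VERB')
```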

25
1. CRF++
2. CRFSGD (stochastic gradient descent)
3. MALLET (UMass Amherst)

26 The current CRF may not be best suited for a model where the response variable is a count, because of the way the feature functions are built: they describe the interactions between response variables and covariates. I found (below) an extension of the CRF that can be applied to count models, and I may try it.
- Eunho Yang, Pradeep K. Ravikumar, Genevera I. Allen, Zhandong Liu (UT Austin; UT Austin; Rice University; Baylor College of Medicine): “Conditional Random Fields via Univariate Exponential Families”, 2013 (Neural Information Processing Systems Foundation).

27
- They introduced a “novel subclass of CRFs”, derived by imposing node-wise conditional distributions of response variables, conditioned on the rest of the responses and the covariates, as arising from univariate exponential families.
- This allows them to derive novel multivariate CRFs given any univariate exponential distribution, including the Poisson, negative binomial, and exponential distributions.

28
- http://www.ats.ucla.edu/stat/r/dae/zipoisson.htm
- http://videolectures.net/cikm08_elkan_llmacrf/
- http://nips.cc/Conferences/2013/Program/event.php?ID=38 11

