Toward a unified approach to fitting loss models Jacques Rioux and Stuart Klugman, for presentation at the IAC, Feb. 9, 2004.


Handout/slides: email me

Overview
- What problem is being addressed?
- The general idea
- The specific ideas
  - Models to consider
  - Recording the data
  - Representing the data
  - Testing a model
  - Selecting a model

The problem
- Too many models
  - Two books – 26 distributions!
  - Can mix or splice to get even more
- Data can be confusing
  - Deductibles, limits
- Too many tests and plots
  - Chi-square, K-S, A-D, p-p, q-q, D

The general idea
- A limited number of distributions
- A standard way to present data
- Retain flexibility in testing and selection

Distributions
Should be:
- Familiar
- Few
- Flexible

A few familiar distributions
- Exponential: only one parameter
- Gamma: two parameters, a mode if the shape parameter exceeds 1
- Lognormal: two parameters, a mode
- Pareto: two parameters, a heavy right tail

Flexible
Add flexibility by allowing mixtures:
  f(x) = w_1 f_1(x) + ... + w_k f_k(x), where w_1 + ... + w_k = 1 and all w_j > 0.
Some restrictions:
- Only the exponential can be used more than once.
- Cannot use both the gamma and the lognormal.
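A mixture cdf or pdf is just the weighted sum of the component functions. A minimal sketch (not the authors' software), using scipy distributions with made-up illustrative parameters:

```python
# Sketch: evaluating a finite mixture, here lognormal + exponential.
# The weights and parameters below are illustrative, not fitted values.
import numpy as np
from scipy import stats

def mixture_cdf(x, weights, components):
    """CDF of a finite mixture: F(x) = sum_j w_j * F_j(x)."""
    x = np.asarray(x, dtype=float)
    return sum(w * c.cdf(x) for w, c in zip(weights, components))

def mixture_pdf(x, weights, components):
    """PDF of a finite mixture: f(x) = sum_j w_j * f_j(x)."""
    x = np.asarray(x, dtype=float)
    return sum(w * c.pdf(x) for w, c in zip(weights, components))

# Illustrative parameters only -- the weights must sum to 1.
weights = [0.7, 0.3]
components = [stats.lognorm(s=1.0, scale=np.exp(7.0)),  # lognormal
              stats.expon(scale=500.0)]                 # exponential

print(mixture_cdf(2000.0, weights, components))
```

Because each component is a proper distribution and the weights sum to one, the mixture is again a proper distribution.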

Why mixtures?
- Allows different shapes at the beginning and the end (e.g., mode from the lognormal, tail from the Pareto).
- Using several exponentials allows almost any tail weight (see Keatinge).

Estimating parameters
Use only maximum likelihood:
- Asymptotically optimal
- Can be applied in all settings, regardless of the nature of the data
- The likelihood value can be used to compare different models

Representing the data
Why do we care?
- Graphical tests require a graph of the empirical density or distribution function.
- Hypothesis tests require the functions themselves.

What is the issue?
There is none if:
- All observations are discrete or grouped, and
- There is no truncation or censoring.
But if not:
- For discrete data the Kaplan-Meier product-limit estimator provides the empirical distribution function (and is the nonparametric mle as well).
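The product-limit estimator multiplies, at each observed event time, the fraction of the at-risk group that survives. A minimal sketch for right-censored individual data (no truncation), on made-up values:

```python
# Sketch: Kaplan-Meier product-limit estimate of S(t) for
# right-censored individual data. Values are illustrative.
import numpy as np

def kaplan_meier(times, events):
    """Return (event times, S(t) just after each event time).
    events[i] is True for an observed loss, False for a censored one."""
    times = np.asarray(times, float)
    events = np.asarray(events, bool)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    s, out_t, out_s = 1.0, [], []
    i = 0
    while i < n:
        t = times[i]
        at_risk = n - i                  # number still at risk at time t
        d = 0
        while i < n and times[i] == t:   # count events (not censorings) at t
            d += events[i]
            i += 1
        if d:                            # t is an event time
            s *= (at_risk - d) / at_risk
            out_t.append(t)
            out_s.append(s)
    return out_t, out_s

t, s = kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False])
print(list(zip(t, s)))
```

With no censoring this reduces to one minus the usual empirical cdf.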

Issue – grouped data
- If completely grouped, the histogram represents the pdf and the ogive the cdf.
- If some observations are grouped and some are not, or there are multiple deductibles and limits, our suggestion is to replace the observations in each interval with that many equally spaced points.
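One concrete reading of "that many equally spaced points" (an assumption, as the slide does not pin down the spacing rule) is to place the k observations at the midpoints of k equal subintervals:

```python
# Sketch of the grouped-data suggestion: replace k observations
# recorded in the interval (a, b] with k equally spaced points,
# here taken as midpoints of k equal subintervals (an assumption).
import numpy as np

def spread_interval(a, b, k):
    """Return k equally spaced points inside (a, b)."""
    edges = np.linspace(a, b, k + 1)
    return (edges[:-1] + edges[1:]) / 2.0

print(spread_interval(0.0, 100.0, 4))  # -> [12.5, 37.5, 62.5, 87.5]
```

The spread-out points can then be treated as individual observations by the Kaplan-Meier estimator and the test statistics.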

Review
Given a data set, we have the following:
- A way to represent the data.
- A limited set of models to consider.
- Parameter estimates for each model.
The remaining tasks are:
- Decide which models are acceptable.
- Decide which model to use.

Example
The paper has two examples; we will look only at the second one. Data are individual payments, but the policies that produced them had different deductibles (100, 250, 500) and different maximum payments (1,000, 3,000, 5,000). There are 100 observations.

Empirical cdf

Distribution function plot
Plot the empirical and model cdfs together. Because the smallest deductible in this example is 100, the empirical cdf begins there. To be comparable, the model cdf is calculated as
  F*(x) = (F(x) - F(100)) / (1 - F(100)), for x >= 100.
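The truncation adjustment is a one-liner; a sketch with an exponential model and an illustrative mean (not a fitted value):

```python
# Sketch: shift the model cdf for left truncation at d so it is
# comparable with an empirical cdf that starts at the smallest
# deductible. Model and parameter are illustrative.
import numpy as np

def truncated_cdf(F, x, d):
    """F*(x) = (F(x) - F(d)) / (1 - F(d)) for x >= d."""
    return (F(x) - F(d)) / (1.0 - F(d))

F = lambda x: 1.0 - np.exp(-np.asarray(x, float) / 1000.0)  # Exp(1000)
print(truncated_cdf(F, 1100.0, 100.0))
```

For the memoryless exponential, F*(d + t) equals F(t), which makes a handy sanity check.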

Example model
All plots and tests that follow are for a mixture of a lognormal and an exponential distribution. The parameters are

Distribution function plot

Confidence bands
It is possible to create 95% confidence bands; that is, we are 95% confident that the true distribution lies entirely within the bands. The formulas are adapted from Klein and Moeschberger, with a modification for multiple truncation points (their formula allows only multiple censoring points).

CDF plot with bounds

Other CDF pictures
Any function of the cdf, such as the limited expected value, could be plotted. The only one shown here is the difference plot, which magnifies the previous plot by plotting the difference between the empirical and model distribution functions.

CDF difference plot

Histogram plot
Plot a histogram of the data against the density function of the model. For data that were not grouped, the empirical cdf can be used to get cell probabilities.

Histogram plot

Hypothesis tests
Null hypothesis: the model fits. Alternative: it does not.
Three tests:
- Kolmogorov-Smirnov
- Anderson-Darling
- Chi-square

Kolmogorov-Smirnov
- The test statistic is the maximum difference between the empirical and model cdfs.
- Each difference is multiplied by a scaling factor related to the sample size at that point.
- The usual critical values are far off when the parameters are estimated from the data.
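For complete individual data the unscaled statistic can be computed directly; this sketch shows that basic version only (the paper's variant adds the per-point scaling and handles truncation/censoring):

```python
# Sketch: basic Kolmogorov-Smirnov statistic, the maximum distance
# between the empirical and model cdfs, for complete data.
import numpy as np

def ks_statistic(sample, model_cdf):
    x = np.sort(np.asarray(sample, float))
    n = len(x)
    F = model_cdf(x)
    # The empirical cdf jumps at each point: compare the model cdf
    # against both the upper (i/n) and lower ((i-1)/n) step values.
    d_plus = np.max(np.arange(1, n + 1) / n - F)
    d_minus = np.max(F - np.arange(0, n) / n)
    return max(d_plus, d_minus)

sample = [0.1, 0.2, 0.5, 0.9]
print(ks_statistic(sample, lambda x: x))  # against Uniform(0,1)
```

Checking both step values matters: taking only the differences at the observations themselves understates the maximum.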

Anderson-Darling
The test statistic looks complex:
  A^2 = n * integral of [F_e(x) - F_m(x)]^2 / ( F_m(x) [1 - F_m(x)] ) f_m(x) dx,
where the subscript e denotes the empirical function and m the model. The paper shows how to turn this into a sum. The test places more emphasis on fit in the tails than the K-S test does.
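For complete data the integral collapses to the standard sum form, shown below; the paper derives an analogous sum for truncated and censored data, which this sketch does not cover:

```python
# Sketch: Anderson-Darling statistic for complete data via the
# standard sum form  A^2 = -n - (1/n) sum (2i-1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))].
import numpy as np

def anderson_darling(sample, model_cdf):
    x = np.sort(np.asarray(sample, float))
    n = len(x)
    F = model_cdf(x)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1 - F[::-1])))

sample = [0.1, 0.2, 0.5, 0.9]
print(anderson_darling(sample, lambda x: x))  # against Uniform(0,1)
```

The 1/(F(1-F)) weighting in the integral is what pushes the emphasis toward the tails, where F is near 0 or 1.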

Chi-square test
You have seen this one before. It is the only one of the three with an adjustment for estimating parameters: the degrees of freedom are reduced by the number of parameters estimated.
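A sketch of the test with the degrees-of-freedom adjustment; the cell counts and expected probabilities below are made up for illustration:

```python
# Sketch: chi-square goodness-of-fit test, with degrees of freedom
# k - 1 - r, where r parameters were estimated from the data.
# Counts and probabilities are illustrative.
import numpy as np
from scipy.stats import chi2

observed = np.array([25, 30, 25, 20])              # counts in k = 4 cells
expected = np.array([0.3, 0.3, 0.2, 0.2]) * observed.sum()
r = 1                                              # parameters estimated

stat = np.sum((observed - expected) ** 2 / expected)
dof = len(observed) - 1 - r
p_value = chi2.sf(stat, dof)
print(stat, dof, p_value)
```

With grouped data the expected counts come from the fitted (and, with deductibles, truncation-adjusted) model cdf evaluated at the cell boundaries.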

Results
- K-S:
- A-D:
- Chi-square p-value:
The model is clearly acceptable. A simulation study is needed to get p-values for the K-S and A-D tests; simulation indicates that the p-values are over 0.9.

Comparing models
- A good picture
- Better test numbers
- A likelihood criterion such as the Schwarz Bayesian criterion (SBC): the loglikelihood minus (r/2) ln(n), where r is the number of parameters and n is the sample size.
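The SBC as defined on the slide is one line of arithmetic; under this sign convention, larger is better. The numbers below are illustrative, not the paper's fitted values:

```python
# Sketch: Schwarz Bayesian criterion as defined on the slide,
# SBC = loglikelihood - (r/2) * ln(n). Inputs are illustrative.
import math

def sbc(loglike, r, n):
    """r = number of parameters, n = sample size; larger SBC is better."""
    return loglike - (r / 2.0) * math.log(n)

# A 1-parameter model vs a 5-parameter model on n = 100 observations:
print(sbc(-611.2, 1, 100), sbc(-608.0, 5, 100))
```

Note how a modest likelihood gain can be wiped out by the (r/2) ln(n) penalty, which is exactly the parsimony argument on the next slide.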

Several models

Model  | Loglike | A-D | K-S | Chi-sq | SBC
Exp    |         |     |     |        |
Ln     |         |     |     |        |
Gam    |         |     |     |        |
L/E    |         |     |     |        |
G/E    |         |     |     |        |
L/E/E  |         |     |     |        |
G/E/E  |         |     |     |        |

Which is the winner?
- Referee A – loglikelihood rules – pick the gamma/exp/exp mixture. This is a world of one big model; the best is the best, and simplicity is never an issue.
- Referee B – SBC rules – pick the exponential. Parsimony is most important; pay a penalty for extra parameters.
- Me – the lognormal/exp mixture. Great pictures and better numbers than the exponential, but simpler than the three-component mixture.

Can this be automated?
- We are working on software. A test version can be downloaded at
- The MLEs are good. The pictures and test statistics are not quite right. It may crash.
- Here is a quick demo.