A Primer on the Exponential Family of Distributions David Clark & Charles Thayer American Re-Insurance GLM Call Paper - 2004.

Presentation transcript:

A Primer on the Exponential Family of Distributions David Clark & Charles Thayer American Re-Insurance GLM Call Paper

2 Agenda
- Brief Introduction to GLM
- Overview of the Exponential Family
- Some Specific Distributions
- Suggestions for Insurance Applications

3 Context for GLM
- Linear Regression: Y ~ Normal
- Generalized Linear Models: Y ~ Exponential Family
- Maximum Likelihood: Y ~ Any Distribution

4 Advantages over Linear Regression
- Instead of a linear combination of covariates, we can use a function of a linear combination of covariates
- Response variable stays in original units
- Great flexibility in variance structure

5 Transforming the Response versus Transforming the Covariates
Linear Regression: E[g(y)] = X·β
GLM: E[y] = g⁻¹(X·β)
Note that if g(y) = ln(y), then Linear Regression cannot handle any points where y ≤ 0.
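The contrast on this slide can be illustrated with a minimal sketch. It assumes a hypothetical intercept-only model and made-up data containing one zero; neither comes from the presentation. With a log link the GLM models ln(E[y]), which exists whenever the mean is positive, whereas log-transforming the response needs ln(y) for every single observation.

```python
import math

# Hypothetical responses (illustrative only); note the zero observation.
y = [0.0, 2.0, 4.0, 6.0]


def log_transformed_mean(values):
    """Intercept-only linear regression on ln(y): estimates E[ln(y)]."""
    return sum(math.log(v) for v in values) / len(values)


def log_link_intercept(values):
    """Intercept-only GLM with log link: beta solves exp(beta) = mean(y).

    For the Poisson family this is the maximum likelihood estimate.
    """
    return math.log(sum(values) / len(values))


try:
    log_transformed_mean(y)
    transform_ok = True
except ValueError:
    # math.log(0.0) raises ValueError, so the transformed regression fails.
    transform_ok = False

beta = log_link_intercept(y)
fitted_mean = math.exp(beta)  # back on the original scale of y
```

The fitted mean is reported in the original units of y, which is the "response variable stays in original units" advantage listed earlier.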

6 Advantages of this Special Case of Maximum Likelihood
- Pre-programmed in many software packages
- Direct calculation of standard errors of key parameters
- Convenient separation of the mean parameter from “nuisance” parameters

7 Advantages of this Special Case of Maximum Likelihood
GLM is useful when theory is immature but experience gives clues about:
- How the mean response is affected by external influences (covariates)
- How variability relates to the mean
- Independence of observations
- Skewness/symmetry of the response distribution

8 General Form of the Exponential Family Note that y_i can be transformed with any function e().

9 “Natural” Form of the Exponential Family Note that y_i is no longer within a function. That is, e(y_i) = y_i.
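The density formula shown on this slide did not survive in the transcript. As a hedged reconstruction, the natural form is usually written as below in the GLM literature. The function names b and c are the conventional textbook choices and are an assumption here, not necessarily the authors' notation; a(φ) matches the symbol used on the Unit Variance slide.

```latex
f(y_i;\theta_i,\phi)
  = \exp\!\left[\frac{y_i\,\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi)\right],
\qquad
E[y_i] = \mu_i = b'(\theta_i),
\qquad
\mathrm{Var}(y_i) = a(\phi)\,b''(\theta_i).
```

Under this convention the general form is obtained by replacing y_i with e(y_i) inside the exponent, and expressing b''(θ) as a function of μ gives the variance function of the distribution.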

10 Specific Members of the Exponential Family
- Normal (Gaussian)
- Poisson
- Negative Binomial
- Gamma
- Inverse Gaussian

11 Some Other Members of the Exponential Family
Natural Form:
- Binomial
- Logarithmic
- Compound Poisson/Gamma (Tweedie)
General Form [use ln(y) instead of y]:
- Lognormal
- Single Parameter Pareto

12 Normal Distribution Natural Form: The dispersion parameter, φ, is replaced with σ² in the more familiar form of the Normal Distribution.

13 Poisson Distribution Natural Form: “Over-dispersed” Poisson allows φ ≠ 1. Variance/Mean ratio = φ

14 Negative Binomial Distribution Natural Form: The parameter k must be selected by the user of the model.

15 Gamma Distribution Natural Form: Constant Coefficient of Variation (CV): CV = α^(-1/2), where α is the Gamma shape parameter (equivalently φ = 1/α when Var(y) = φ·μ²).

16 Inverse Gaussian Distribution Natural Form:
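The natural-form densities on the five distribution slides were images and are missing from the transcript. As a hedged summary, writing each density as exp{[y·θ - b(θ)]/a(φ) + c(y, φ)} with mean μ = b'(θ), the standard textbook results are tabulated below. The symbols θ, b and c are conventional names assumed here rather than taken from the slides.

```latex
\begin{array}{llll}
\text{Distribution} & \theta(\mu) & b(\theta) & V(\mu) = b''(\theta) \\ \hline
\text{Normal} & \mu & \theta^{2}/2 & 1 \\
\text{Poisson} & \ln\mu & e^{\theta} & \mu \\
\text{Negative Binomial (fixed } k\text{)} & \ln\dfrac{\mu}{\mu+k} & -k\,\ln\!\left(1-e^{\theta}\right) & \mu + \mu^{2}/k \\
\text{Gamma} & -1/\mu & -\ln(-\theta) & \mu^{2} \\
\text{Inverse Gaussian} & -1/(2\mu^{2}) & -\sqrt{-2\theta} & \mu^{3}
\end{array}
```

Differentiating each b(θ) twice and substituting θ(μ) reproduces the last column, which can be checked row by row.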

17 Table of Variance Functions
Distribution        Variance Function
Normal              Var(y) = φ
Poisson             Var(y) = φ·μ
Negative Binomial   Var(y) = φ·μ + (φ/k)·μ²
Gamma               Var(y) = φ·μ²
Inverse Gaussian    Var(y) = φ·μ³
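The table of variance functions translates directly into code. The following is a minimal sketch; the function names and the string keys are hypothetical choices made for illustration, and only the formulas themselves come from the slide.

```python
def variance(dist, mu, phi=1.0, k=None):
    """Var(y) for each distribution in the table of variance functions."""
    if dist == "normal":
        return phi
    if dist == "poisson":
        return phi * mu
    if dist == "negative_binomial":
        if k is None:
            raise ValueError("the negative binomial needs a user-selected k")
        return phi * mu + (phi / k) * mu ** 2
    if dist == "gamma":
        return phi * mu ** 2
    if dist == "inverse_gaussian":
        return phi * mu ** 3
    raise ValueError(f"unknown distribution: {dist}")


def unit_variance(dist, mu, k=None):
    """V(mu): the same table evaluated with phi = 1."""
    return variance(dist, mu, phi=1.0, k=k)
```

For example, `variance("gamma", 2.0, phi=0.5)` evaluates 0.5·2² and `unit_variance("inverse_gaussian", 2.0)` evaluates 2³.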

18 The Unit Variance Function We define the “Unit Variance” function as V(μ) = Var(y) / a(φ). That is, φ = 1 in the previous table.

19 Uniqueness Property The unit variance function V(μ) uniquely identifies its parent distribution type within the natural exponential family. f(y) ↔ V(μ)

20 Table of Skewness Coefficients
Distribution        Skewness
Normal              0
Poisson             CV
Negative Binomial   [1 + μ/(μ+k)]·CV
Gamma               2·CV
Inverse Gaussian    3·CV
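The skewness table can be sketched as a small helper that takes the mean and variance, forms the CV, and applies the multiplier for each distribution. The function name and string keys are hypothetical; the multipliers are those listed on the slide.

```python
import math


def skewness(dist, mu, var, k=None):
    """Skewness coefficient as a multiple of CV = sqrt(var) / mu."""
    cv = math.sqrt(var) / mu
    if dist == "normal":
        return 0.0
    if dist == "poisson":
        return cv
    if dist == "negative_binomial":
        if k is None:
            raise ValueError("the negative binomial needs a user-selected k")
        return (1.0 + mu / (mu + k)) * cv
    if dist == "gamma":
        return 2.0 * cv
    if dist == "inverse_gaussian":
        return 3.0 * cv
    raise ValueError(f"unknown distribution: {dist}")
```

As a sanity check, a standard Poisson with mean 4 has variance 4, so CV = 0.5, which agrees with the familiar Poisson skewness of 1/sqrt(mean).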

21 Graph of Skewness versus CV

22 The Big Question: What should the variance function look like for insurance applications?

23 What is the Response Variable?
- Number of Claims
- Frequency (# claims per unit of exposure)
- Severity
- Aggregate Loss Dollars
- Loss Ratio (Aggregate Loss / Premium)
- Loss Rate (Aggregate Loss per unit of exposure)

24 An Example for Considering Variance Structure How would you calculate the mean and variance in these loss ratios?

25 Defining a Variance Structure We intuitively know that variance changes with loss volume, but how? This is the same as asking “V(μ) = ?”

26 Defining a Variance Structure We want CV to decrease with loss size, but not too quickly. GLM provides several approaches:
Negative Binomial   Var(y) = φ·μ + (φ/k)·μ²
Tweedie             Var(y) = φ·μ^p, 1 < p < 2
Weighted L-S        Var(y) = φ/w
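To see what "decrease, but not too quickly" means, the CV implied by the first two variance structures can be computed across a range of means. This is a minimal sketch: the parameter values (φ = 1, k = 25, p = 1.5) and the list of means are hypothetical and chosen only to make the pattern visible.

```python
import math


def cv_negative_binomial(mu, phi, k):
    # Var = phi*mu + (phi/k)*mu^2  =>  CV^2 = phi/mu + phi/k
    return math.sqrt(phi / mu + phi / k)


def cv_tweedie(mu, phi, p):
    # Var = phi*mu^p  =>  CV^2 = phi * mu^(p - 2)
    return math.sqrt(phi * mu ** (p - 2.0))


means = [1e2, 1e4, 1e6]          # hypothetical expected loss volumes
phi, k, p = 1.0, 25.0, 1.5       # hypothetical parameter choices

nb_cvs = [cv_negative_binomial(m, phi, k) for m in means]
tweedie_cvs = [cv_tweedie(m, phi, p) for m in means]
cv_floor = math.sqrt(phi / k)    # limit of the negative binomial CV
```

Under these assumptions the negative binomial CV falls toward a floor of sqrt(φ/k) and never goes below it, while the Tweedie CV keeps shrinking toward zero at the rate μ^((p-2)/2), slower than the μ^(-1/2) rate of the Poisson.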

27 The Negative Binomial The variance function: Var(y) = φ·μ + (φ/k)·μ², where φ·μ is the random variance and (φ/k)·μ² is the systematic variance.

28 The “Tweedie” Distribution
             Tweedie                            Neg. Binomial
Frequency    Poisson                            Poisson
Severity     Gamma (exponential when p=1.5)     Logarithmic
Both the Tweedie and the Negative Binomial can be thought of as intermediate cases between the Poisson and Gamma distributions.
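The frequency/severity reading of the Tweedie can be made concrete with the standard mapping from (μ, φ, p) to a Poisson claim count and a Gamma severity. This mapping is a well-known result from the Tweedie literature rather than something stated on the slide, and the function names below are hypothetical.

```python
def tweedie_to_compound(mu, phi, p):
    """Map Tweedie (mu, phi, p), 1 < p < 2, to compound Poisson/Gamma parameters."""
    if not 1.0 < p < 2.0:
        raise ValueError("the compound Poisson/Gamma case requires 1 < p < 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # expected number of claims
    alpha = (2.0 - p) / (p - 1.0)               # Gamma shape of each claim
    scale = phi * (p - 1.0) * mu ** (p - 1.0)   # Gamma scale of each claim
    return lam, alpha, scale


def compound_moments(lam, alpha, scale):
    """Mean and variance of a Poisson(lam) sum of Gamma(alpha, scale) claims."""
    mean = lam * alpha * scale
    # Var of a compound Poisson sum = lam * E[X^2], and
    # E[X^2] = alpha * (alpha + 1) * scale^2 for a Gamma severity.
    var = lam * alpha * (alpha + 1.0) * scale ** 2
    return mean, var


lam, alpha, scale = tweedie_to_compound(mu=100.0, phi=2.0, p=1.5)
mean, var = compound_moments(lam, alpha, scale)
```

With p = 1.5 the Gamma shape is 1, which is the exponential severity noted in the table, and the compound moments recover the mean μ and the variance φ·μ^p.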

29 Defining a Variance Structure Negative Binomial Tweedie

30 Defining a Variance Structure

31 Weighted Least-Squares Use the Normal Distribution but set a(φ) = φ/w_i, so that the variance is inversely proportional to some external exposure weight w_i. This is equivalent to weighted least-squares: L-S = Σ(y_i - μ_i)²·w_i
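The equivalence follows because the Normal log-likelihood with variance φ/w_i depends on the means only through the term -Σ w_i·(y_i - μ_i)² / (2φ), so maximizing it over the means is the same as minimizing L-S. A minimal sketch for a straight-line mean μ_i = b0 + b1·x_i is given below; the function names and the data are hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Fit mu_i = b0 + b1*x_i by minimising sum(w_i * (y_i - mu_i)**2)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def weighted_sse(x, y, w, b0, b1):
    """The L-S objective: sum of w_i * (y_i - mu_i)**2."""
    return sum(wi * (yi - (b0 + b1 * xi)) ** 2 for xi, yi, wi in zip(x, y, w))


# Hypothetical loss ratios by period, weighted by premium volume.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.90, 0.70, 0.65, 0.60]
w = [10.0, 50.0, 200.0, 400.0]

b0, b1 = weighted_least_squares(x, y, w)
best = weighted_sse(x, y, w, b0, b1)
```

Observations with large weights pull the fitted line toward themselves, reflecting their smaller assumed variance φ/w_i.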

32 Conclusion A model fitted to insurance data should reflect the variance structure of the phenomenon being modeled. GLM provides a flexible tool for doing this.