Modeling the Loss Process for Medical Malpractice Bill Faltas GE Insurance Solutions CAS Special Interest Seminar … Predictive Modeling “GLM and the Medical.

Presentation transcript:

Slide 1: Modeling the Loss Process for Medical Malpractice
Bill Faltas, GE Insurance Solutions
CAS Special Interest Seminar … Predictive Modeling
“GLM and the Medical Malpractice Crisis” Session
October 4, 2004, Chicago, IL
© Employers Reinsurance Corporation, Patent Pending

Slide 2
“The work of science is to substitute facts for appearances and demonstrations for impressions.” (John Ruskin)

Slide 3: Regression Modeling
- Simply: a functional relationship between one unknown (Y) and one or more knowns (X's):
  $Y = f(X_1, X_2, \ldots, X_n) + \text{error}$
- Statistically: a distribution for Y with parameters that vary with X.

Example: Ordinary Least Squares (OLS) ("linear regression")
- $Y \sim \mathrm{Normal}(\mu_i, \sigma^2)$
- Estimate both $\mu_i$ and $\sigma^2$, which gives a measure of variability ($\sigma^2$).
- E(Y) is a linear combination of X's: $\mu_i = a + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_n X_{ni}$
- Estimate parameters $a, b_1, b_2, \ldots, b_n$.
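As a minimal sketch of the OLS example above, the single-predictor case can be fitted in closed form. The function name `fit_simple_ols` and the data values are hypothetical and chosen only for illustration; the slide itself does not give data or code.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Fit y = a + b*x + error by least squares; return (a, b, sigma2)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Unbiased estimate of sigma^2: residual sum of squares over (n - 2).
    sigma2 = sum(r * r for r in residuals) / (len(x) - 2)
    return a, b, sigma2

# Hypothetical illustration data (not from the presentation).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b, sigma2 = fit_simple_ols(x, y)
print(f"a={a:.3f}, b={b:.3f}, sigma2={sigma2:.4f}")
```

The third return value is the estimate of $\sigma^2$, the "measure of variability" the slide refers to.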

Slide 4: Terminology
X's (explanatory / covariate / predictor / independent variables) could be:
(1) Numerical:
    (a) Continuous [e.g., years of practice, square feet]
    (b) Discrete [e.g., # past claims]
(2) Categorical:
    (a) Ordinal [e.g., income or state group (H / M / L)]
    (b) Nominal [e.g., gender (M/F), state]

Y (response / dependent variable) could be:
(1) Continuous [e.g., total $ losses from an insurance policy]
(2) Discrete [e.g., # of insurance claims]
(3) Binary [e.g., whether an insurance policy is likely to have a claim (Y/N)]

Slide 5: Popular Regression Modeling Choices
Form of Y                        Model
Y continuous                     Ordinary Least Squares (OLS) model
Y binary (0,1)                   Logistic model
Y positive (Y > 0)               Exponential model
Y discrete {0, 1, 2, 3, …}       Poisson model

Slide 6: GLM, OLS, and Logistic

Form of Y
- GLM: Any
- OLS: Continuous
- Logistic: Binary (0,1)

Distribution of Y
- GLM: $Y \sim$ Exponential Family
- OLS: $Y \sim \mathrm{Normal}(\mu, \sigma^2)$ (in exponential family)
- Logistic: $Y$ (= 1/0) $\sim \mathrm{Bernoulli}(P)$ (in exponential family)

Model [E(Y)]
- GLM: $\mathrm{Mean}(Y) = h(X\beta)$, that is, $\mathrm{Mean}(Y_i) = f(a + b_1 X_{1i} + \cdots + b_n X_{ni})$, a function f of a linear combination of X's
- OLS: $\mu = X\beta$, that is, $\mu_i = a + b_1 X_{1i} + \cdots + b_n X_{ni}$ (linear combination of X's)
- Logistic: $P = e^{X\beta} / (1 + e^{X\beta})$, that is, $P_i = P(Y_i = 1) = e^{L_i} / (1 + e^{L_i})$ where $L_i = a + b_1 X_{1i} + \cdots + b_n X_{ni}$

Method of estimating $a, b_1, \ldots, b_n$
- GLM: M.L.E.
- OLS: Method of Least Squares (same as M.L.E. for Normal)
- Logistic: M.L.E.
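The comparison above comes down to one linear predictor passed through different mean functions. The following sketch illustrates that idea only; the function names and the coefficient values are hypothetical, not taken from the presentation.

```python
import math

def linear_predictor(a, b, x):
    """L = a + b_1*x_1 + ... + b_n*x_n."""
    return a + sum(bj * xj for bj, xj in zip(b, x))

def ols_mean(L):
    """OLS: the mean of Y is the linear predictor itself."""
    return L

def logistic_mean(L):
    """Logistic: P = e^L / (1 + e^L), written to avoid overflow for large |L|."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)
    return e / (1.0 + e)

# Hypothetical coefficients and covariates, for illustration only.
a, b = -2.0, [0.5, 1.0]
x = [2.0, 1.0]
L = linear_predictor(a, b, x)
print(ols_mean(L), logistic_mean(L))
```

Both models share the same `linear_predictor`; only the mapping from L to the mean of Y differs, which is what makes each a special case of the GLM form.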

Slide 7: Loss Process Model for Medical Malpractice
- Line characteristic: low frequency / high severity
- Objective: build models to forecast emergence and ultimate values for (Y's):
  - # notices (a.k.a. incidents)
  - # notices that turn into claims with indemnity payment
  - $ losses
- Based on four types of X's:
  - Policyholder attributes: state, specialty, years of practice, etc.
  - Policy attributes: form type, limit, etc.
  - Environmental attributes: lawyers per 1000, births per 1000, etc.
  - Time: e.g., policy age measures time since effective date

Slide 8: Likelihood of Notice
[Figure: pdf, dependence of likelihood on X_1; horizontal axis X_1; chart annotation: "Not significantly different"]
- Y is binary (1/0): "whether there is a notice or not"
- Likelihood is a function of many (X) variables, including policy age
- Claim likelihood for a doctor rises and falls with age
- Likelihood at policy age 2.5 years (the mode) increases with X_1
- Likelihood at policy age 2.5 years (the mode) rises and falls with X_2
- Likelihood changes with X_1 and X_2, so include both in the model

Slide 9: Likelihood of Notice: A Logistic Model (a GLM application)
To model: P = Likelihood of Notice = Pr(Y = 1)
$P = P(Y = 1) = e^{L} / (1 + e^{L})$ where $L = a + b_1 X_1 + \cdots + b_n X_n$
- Transform some of the X variables, including policy age
- Develop model based on 70% of the data
- Validate model on the remaining 30% of the data
- Compare actual vs. modeled triangles of "# policies with notices"
- Finalize parameters on 100% of the data
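The develop-on-70%, validate-on-30% workflow can be sketched as follows. Everything here is an assumption made for illustration: the data are simulated with a single covariate, the "true" coefficients are invented, and plain gradient ascent on the Bernoulli log-likelihood stands in for whatever M.L.E. routine the author used.

```python
import math
import random

def sigmoid(L):
    """P = e^L / (1 + e^L), computed stably."""
    if L >= 0:
        return 1.0 / (1.0 + math.exp(-L))
    e = math.exp(L)
    return e / (1.0 + e)

def fit_logistic(xs, ys, steps=1000, lr=1.0):
    """Maximize the Bernoulli log-likelihood of P = sigmoid(a + b*x) by gradient ascent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            resid = y - sigmoid(a + b * x)   # observed minus modeled probability
            grad_a += resid
            grad_b += resid * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Simulated data (hypothetical): one covariate, invented true coefficients.
rng = random.Random(0)
TRUE_A, TRUE_B = -1.0, 1.5
xs = [rng.uniform(-2.0, 2.0) for _ in range(1000)]
ys = [1 if rng.random() < sigmoid(TRUE_A + TRUE_B * x) else 0 for x in xs]

cut = len(xs) * 7 // 10                      # 70% to develop, 30% to validate
train_x, train_y = xs[:cut], ys[:cut]
hold_x, hold_y = xs[cut:], ys[cut:]

a_hat, b_hat = fit_logistic(train_x, train_y)
actual_notices = sum(hold_y)
modeled_notices = sum(sigmoid(a_hat + b_hat * x) for x in hold_x)
print(f"a={a_hat:.2f}, b={b_hat:.2f}, actual={actual_notices}, modeled={modeled_notices:.1f}")
```

Comparing `actual_notices` with `modeled_notices` on the held-out 30% is a one-number analogue of the actual vs. modeled triangle comparison; a final fit on all rows would follow once the validation looks acceptable.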

Slide 10: Model Validation Approaches

Sampling
- Set aside sample
- Develop parameters using remaining data
- Verify model works against sample
- Finalize model using all data

Resampling
- Set aside 1st sample
- Develop parameters using remaining data
- Verify model works against 1st sample
- Resample and redo … n times
- Finalize model using all data

Partitioning
- Divide data into n partitions (often 4-6)
- Set aside 1st partition
- Develop parameters using other partitions
- Verify model works against 1st partition
- Repeat process for all other partitions
- Finalize model using all data

Slide annotation: "Uses all data"
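The partitioning approach can be sketched as an index generator that sets aside each partition in turn. The function name `partition_indices` and the interleaved assignment of rows to partitions are illustrative assumptions; the slide does not say how rows are assigned.

```python
def partition_indices(n_rows, n_partitions):
    """Yield (develop_idx, holdout_idx) pairs, one pair per partition."""
    # Assumption: rows are dealt to partitions round-robin (row i -> partition i % n).
    folds = [list(range(k, n_rows, n_partitions)) for k in range(n_partitions)]
    for k, holdout in enumerate(folds):
        develop = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield develop, holdout

splits = list(partition_indices(10, 5))
for develop, holdout in splits:
    print("develop on", develop, "verify on", holdout)
```

Each row lands in exactly one holdout set, so every observation is used for verification once before the model is finalized on all data.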

Slide 11: Notice to Claim … Waiting Time Approach
- Waiting time is defined as the time from notice to claim
- The waiting time approach enables lack of claim data to be used as information
- # Claims = (# notices) x (prob. of notice turning into a claim)
- "Waiting time" varies by different values of attribute X_2, so include X_2 in the notice-to-claim model
[Figure note: Area represents probability of turning into a claim years after receiving notice (no actual data prior to 1.0 year).]
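The relation "# Claims = (# notices) x (prob. of notice turning into a claim)" can be illustrated with a discrete waiting-time table. The yearly probabilities below are invented for illustration and the function name is hypothetical; the presentation does not give the fitted waiting-time distribution.

```python
def expected_claims(n_notices, yearly_conversion_probs, years_elapsed=None):
    """Expected claims emerged after `years_elapsed` years (ultimate if None).

    yearly_conversion_probs[k] is the assumed probability that a notice turns
    into a claim during year k+1 after the notice is received.
    """
    probs = yearly_conversion_probs
    if years_elapsed is not None:
        probs = probs[:years_elapsed]
    return n_notices * sum(probs)

# Hypothetical waiting-time probabilities (no conversions in the first year).
probs = [0.0, 0.05, 0.10, 0.08, 0.04]
emerged_after_3 = expected_claims(200, probs, years_elapsed=3)
ultimate = expected_claims(200, probs)
print(emerged_after_3, ultimate)
```

Summing the table up to a given lag gives emergence to date, and summing the whole table gives the ultimate probability of a notice becoming a claim.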

Slide 12: Estimate Claim Closing Values (Claim Sizes)
- Model trended claim sizes using standard actuarial approaches
  - Closed claims, without regard to closing lag
  - Closed claims by closing lag
  - Closed claims by policyholder attributes
- Compare company data and models with external benchmarks
- Select model(s)
- Test modeled severities against actual severities
  - Actual severities in development triangles
  - Modeled severities: f(policyholder, policy, closing year)

Slide 13: Claim Size Distribution
[Figure: P.D.F. of log of claim sizes by 8 groups of X_1; horizontal axis: LN(Claim Size); vertical axis: Density; annotation: modes]
- Claim size distribution varies by different values of attribute X_1, so include X_1 in claim size modeling
- Claim size model parameters are a function of significant attributes
- Model location and shape vary with attributes
- A way to introduce distributional variation
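One simple way to see location and shape varying by attribute is to summarize log claim sizes within each group. The sketch below assumes a lognormal form (the slide plots the density of LN(claim size) but does not name a distribution), and the group labels and claim amounts are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev

def lognormal_params_by_group(claims):
    """Return {group: (mu, sigma)} from the mean and std. dev. of log claim size."""
    logs = defaultdict(list)
    for group, size in claims:
        logs[group].append(math.log(size))
    return {g: (mean(v), stdev(v)) for g, v in logs.items()}

# Hypothetical (group of X_1, claim size) pairs, for illustration only.
claims = [
    ("A", math.exp(10.0)), ("A", math.exp(12.0)),
    ("B", math.exp(11.0)), ("B", math.exp(11.5)), ("B", math.exp(12.0)),
]
params = lognormal_params_by_group(claims)
for group, (mu, sigma) in sorted(params.items()):
    print(group, round(mu, 3), round(sigma, 3))
```

Groups whose `mu` or `sigma` differ materially are candidates for entering the claim size model as attributes.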

Slide 14: Modeling Summary
Inputs:
- Policyholder attributes
- Policy attributes
- Environmental attributes

CLAIM COUNTS
- # Notices (Logistic Model)
- Notices Becoming Claims (Waiting Time)
- # Claims = # Notices x Prob of Notice to Claim

CLAIM SIZES
- Claim Size Distribution

$ LOSSES
- $ Losses = # Claims x Claim Size
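The summary chains three quantities, which the sketch below multiplies through for expected values. All input numbers are hypothetical, and using a single mean claim size in place of a full claim size distribution is a simplifying assumption.

```python
def expected_losses(n_policies, p_notice, p_notice_to_claim, mean_claim_size):
    """Chain the component models: notices -> claims -> $ losses."""
    n_notices = n_policies * p_notice             # logistic model output
    n_claims = n_notices * p_notice_to_claim      # waiting-time model output
    losses = n_claims * mean_claim_size           # claim size model output
    return n_notices, n_claims, losses

# Hypothetical inputs, for illustration only.
n_notices, n_claims, losses = expected_losses(
    n_policies=1000, p_notice=0.08, p_notice_to_claim=0.25, mean_claim_size=300_000.0
)
print(n_notices, n_claims, losses)
```

In practice each probability and the claim size would vary by policyholder, policy, and environmental attributes rather than being a single portfolio-wide number.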

Slide 15: GLM Application Advantages
- Useful for all lines, including low frequency / high severity
- Identifies and uses significant variables simultaneously
- Effective in dealing with interacting variables
- Can use the time element to model emergence and ultimates
- Variability of modeled estimates can be a byproduct, useful for measurements of risk/uncertainty
- Multiple applications:
  - Underwriting
  - Pricing
  - Reserving
  - Risk