1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary.

1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University

2 Classical Multiple Linear Regression  Y i = a 0 + a 1 X i1 + a 2 X i2 …+ a m X im + e i  Y i are the response variables  X ij are predictors  i subscript denotes i th observation  j subscript identifies j th predictor  Y i = a 0 + a 1 X i1 + a 2 X i2 …+ a m X im + e i  Y i are the response variables  X ij are predictors  i subscript denotes i th observation  j subscript identifies j th predictor

3 One Predictor: Y i = a 0 + a 1 X i

4 Classical Multiple Linear Regression: Solving for a i

5 Classical Multiple Linear Regression  μ i = E[Y i ] = a 0 + a 1 X i1 + …+ a m X im  Y i is Normally distributed random variable with constant variance σ 2  Want to estimate μ i = E[Y i ] for each i  μ i = E[Y i ] = a 0 + a 1 X i1 + …+ a m X im  Y i is Normally distributed random variable with constant variance σ 2  Want to estimate μ i = E[Y i ] for each i

7 Problems with Traditional Model  Number of claims is discrete  Claim sizes are skewed to the right  Probability of an event is in [0,1]  Variance is not constant across data points i  Nonlinear relationship between X’s and Y’s  Number of claims is discrete  Claim sizes are skewed to the right  Probability of an event is in [0,1]  Variance is not constant across data points i  Nonlinear relationship between X’s and Y’s

8 Generalized Linear Models - GLMs  Fewer restrictions  Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.  Large and small policies can be put into one model  Y can be nonlinear function of X’s  Classical linear regression model is a special case  Fewer restrictions  Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.  Large and small policies can be put into one model  Y can be nonlinear function of X’s  Classical linear regression model is a special case

9 Generalized Linear Models - GLMs  Same goal Predict : μ i = E[Y i ]  Same goal Predict : μ i = E[Y i ]

10 Generalized Linear Models - GLMs  g(μ i )= a 0 + a 1 X i1 + …+ a m X im  g( ) is the link function  E[Y i ] = μ i = g -1 (a 0 + a 1 X i1 + …+ a m X im )  Y i can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …  Variance can be modeled  g(μ i )= a 0 + a 1 X i1 + …+ a m X im  g( ) is the link function  E[Y i ] = μ i = g -1 (a 0 + a 1 X i1 + …+ a m X im )  Y i can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …  Variance can be modeled

11 GLMs Extend Classical Linear Regression  If link function is identity: g(μ i ) = μ i  And Y i has Normal distribution → GLM gives same answer as Classical Linear Regression* * Least squares and MLE equivalent for Normal dist.  If link function is identity: g(μ i ) = μ i  And Y i has Normal distribution → GLM gives same answer as Classical Linear Regression* * Least squares and MLE equivalent for Normal dist.

12 Exponential Family of Distributions – Canonical Form

13 Some Math Rules: Refresher

14 Normal Distribution in Exponential Family

15 Normal Distribution in Exponential Family θb(θ)b(θ)

16 Poisson Distribution in Exponential Family θ

17 Poisson Distribution in Exponential Family

18 Compound Poisson Distribution  Y = C 1 + C 2 +... + C N  N is Poisson random variable  C i are i.i.d. with Gamma distribution  This is an example of a Tweedie distribution  Y is member of Exponential Family  Y = C 1 + C 2 +... + C N  N is Poisson random variable  C i are i.i.d. with Gamma distribution  This is an example of a Tweedie distribution  Y is member of Exponential Family

19 Members of the Exponential Family Normal Poisson Binomial Gamma Inverse Gaussian Compound Poisson Normal Poisson Binomial Gamma Inverse Gaussian Compound Poisson

20 Variance Structure  E[Y i ] = μ i = b' ( θ i ) → θ i = b' (-1) (μ i )  Var[ Y i ] = a(Φ i ) b' ' ( θ i ) = a(Φ i ) V(μ i )  Common form: Var[ Y i ] = Φ V(μ i )/w i Φ is constant across data but weights applied to data points  E[Y i ] = μ i = b' ( θ i ) → θ i = b' (-1) (μ i )  Var[ Y i ] = a(Φ i ) b' ' ( θ i ) = a(Φ i ) V(μ i )  Common form: Var[ Y i ] = Φ V(μ i )/w i Φ is constant across data but weights applied to data points

21 V(μ)  Normal  0  Poisson   Binomial  (1-  )  Tweedie  p, 1<p<2  Gamma  2  Inverse Gaussian  3  Recall: Var[ Y i ] = Φ V(μ i )/w i V(μ)  Normal  0  Poisson   Binomial  (1-  )  Tweedie  p, 1<p<2  Gamma  2  Inverse Gaussian  3  Recall: Var[ Y i ] = Φ V(μ i )/w i Variance Functions V(μ)

22 Variance at Point and Fit A Practioner’s Guide to Generalized Linear Models: A CAS Study Note

23 Variance of Y i and Fit at Data Point i  Var(Y i ) is big → looser fit at data point i  Var(Y i ) is small → tighter fit at data point i  Var(Y i ) is big → looser fit at data point i  Var(Y i ) is small → tighter fit at data point i

24 Why Exponential Family?  Distributions in Exponential Family can model a variety of problems  Standard algorithm for finding coefficients a 0, a 1, …, a m  Distributions in Exponential Family can model a variety of problems  Standard algorithm for finding coefficients a 0, a 1, …, a m

25 Modeling Number of Claims Number of PolicySexTerritoryClaims in 5 Years 1M020 2F010 3F 0 4F021 5F010 6F021 7M 2 8M 2 9M 1 10F011 ::::

26 Assume a Multiplicative Model  μ i = expected number of claims in five years  μ i = B F,01 x C Sex(i) x C Terr(i)  If i is Female and Terr 01 → μ i = B F,01 x 1.00 x 1.00  μ i = expected number of claims in five years  μ i = B F,01 x C Sex(i) x C Terr(i)  If i is Female and Terr 01 → μ i = B F,01 x 1.00 x 1.00

27 Multiplicative Model  μ i = exp(a 0 + a S X S(i) + a T X T(i) )  μ i = exp(a 0 ) x exp(a S X S(i) ) x exp( a T X T(i) )  i is Female → X S(i) = 0; Male → X S(i) = 1  i is Terr 01 → X T(i) = 0; Terr 02 → X T(i) = 1  μ i = exp(a 0 + a S X S(i) + a T X T(i) )  μ i = exp(a 0 ) x exp(a S X S(i) ) x exp( a T X T(i) )  i is Female → X S(i) = 0; Male → X S(i) = 1  i is Terr 01 → X T(i) = 0; Terr 02 → X T(i) = 1

28 Values of Predictor Variables Policy Sex X S(i) Territory X T(i) 1 M 1 02 1 2 F 0 01 0 3 F 0 0 4 F 0 02 1 5 F 0 01 0 6 F 0 02 1 7 M 1 1 8 M 1 1 9 M 1 1 10 F 0 01 0

29 Natural Log Link Function  ln(μ i ) = a 0 + a S X S(i) + a T X T(i)  μ i is in (0, ∞ )  ln(μ i ) is in ( - ∞, ∞ )  ln(μ i ) = a 0 + a S X S(i) + a T X T(i)  μ i is in (0, ∞ )  ln(μ i ) is in ( - ∞, ∞ )

30 Poisson Distribution in Exponential Family

31 Natural Log is Canonical Link for Poisson  θ i = ln ( μ i )  θ i = a 0 + a S X S(i) + a T X T(i)  θ i = ln ( μ i )  θ i = a 0 + a S X S(i) + a T X T(i)

32 Estimating Coefficients a 1, a 2,.., a m  Classical linear regression uses least squares  GLMs use Maximum Likelihood Method  Solution will exist for distributions in exponential family  Classical linear regression uses least squares  GLMs use Maximum Likelihood Method  Solution will exist for distributions in exponential family

33 Likelihood and Log Likelihood

34 Find a 0, a S, and a T for Poisson

35 Iterative Numerical Procedure to Find a i ’s  Use statistical package or actuarial software  Specify link function and distribution type  “Iterative weighted least squares” is the numerical method used  Use statistical package or actuarial software  Specify link function and distribution type  “Iterative weighted least squares” is the numerical method used

36 Solution to Our Example  a 0 = -.288 → exp(-.288) =.75  a S =.262 → exp(.262) = 1.3  a T =.095 → exp(.095) = 1.1  μ i = exp(a 0 ) x exp(a S X S(i) ) xexp( a T X T(i) )  μ i =.75 x 1.3 X S(i) x 1.1 X T(i)  i is Male,Terr 01 → μ i =.75 x 1.3 1 x 1.1 0  a 0 = -.288 → exp(-.288) =.75  a S =.262 → exp(.262) = 1.3  a T =.095 → exp(.095) = 1.1  μ i = exp(a 0 ) x exp(a S X S(i) ) xexp( a T X T(i) )  μ i =.75 x 1.3 X S(i) x 1.1 X T(i)  i is Male,Terr 01 → μ i =.75 x 1.3 1 x 1.1 0

37 Testing New Drug Treatment

38 Multiple Linear Regression

39 Logistic Regression Model  p = probability of cure, p in [0,1]  odds ratio: p/(1-p) in [0, + ∞ ]  ln[p/(1-p)] in [ - ∞, + ∞ ]  ln[p/(1-p)] = a + b 1 X 1 + b 2 X 2  p = probability of cure, p in [0,1]  odds ratio: p/(1-p) in [0, + ∞ ]  ln[p/(1-p)] in [ - ∞, + ∞ ]  ln[p/(1-p)] = a + b 1 X 1 + b 2 X 2 Link function

40 Logistic Regression Model

41 Which Exponential Family Distribution?  Frequency: Poisson, {Negative Binomial}  Severity: Gamma  Loss ratio: Compound Poisson  Pure Premium: Compound Poisson  How many policies will renew: Binomial  Frequency: Poisson, {Negative Binomial}  Severity: Gamma  Loss ratio: Compound Poisson  Pure Premium: Compound Poisson  How many policies will renew: Binomial

42 What link function?  Additive model: identity  Multiplicative model: natural log  Modeling probability of event: logistic  Additive model: identity  Multiplicative model: natural log  Modeling probability of event: logistic

1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary.

Similar presentations

Presentation on theme: "1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary.

Similar presentations

Presentation on theme: "1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary."— Presentation transcript:

Similar presentations

About project

Feedback