Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary.

Similar presentations


Presentation on theme: "1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary."— Presentation transcript:

1 1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University

2 2 Classical Multiple Linear Regression  Y i = a 0 + a 1 X i1 + a 2 X i2 …+ a m X im + e i  Y i are the response variables  X ij are predictors  i subscript denotes i th observation  j subscript identifies j th predictor  Y i = a 0 + a 1 X i1 + a 2 X i2 …+ a m X im + e i  Y i are the response variables  X ij are predictors  i subscript denotes i th observation  j subscript identifies j th predictor

3 3 One Predictor: Y i = a 0 + a 1 X i

4 4 Classical Multiple Linear Regression: Solving for a i

5 5 Classical Multiple Linear Regression  μ i = E[Y i ] = a 0 + a 1 X i1 + …+ a m X im  Y i is Normally distributed random variable with constant variance σ 2  Want to estimate μ i = E[Y i ] for each i  μ i = E[Y i ] = a 0 + a 1 X i1 + …+ a m X im  Y i is Normally distributed random variable with constant variance σ 2  Want to estimate μ i = E[Y i ] for each i

6 6

7 7 Problems with Traditional Model  Number of claims is discrete  Claim sizes are skewed to the right  Probability of an event is in [0,1]  Variance is not constant across data points i  Nonlinear relationship between X’s and Y’s  Number of claims is discrete  Claim sizes are skewed to the right  Probability of an event is in [0,1]  Variance is not constant across data points i  Nonlinear relationship between X’s and Y’s

8 8 Generalized Linear Models - GLMs  Fewer restrictions  Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.  Large and small policies can be put into one model  Y can be nonlinear function of X’s  Classical linear regression model is a special case  Fewer restrictions  Y can model number of claims, probability of renewing, loss severity, loss ratio, etc.  Large and small policies can be put into one model  Y can be nonlinear function of X’s  Classical linear regression model is a special case

9 9 Generalized Linear Models - GLMs  Same goal Predict : μ i = E[Y i ]  Same goal Predict : μ i = E[Y i ]

10 10 Generalized Linear Models - GLMs  g(μ i )= a 0 + a 1 X i1 + …+ a m X im  g( ) is the link function  E[Y i ] = μ i = g -1 (a 0 + a 1 X i1 + …+ a m X im )  Y i can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …  Variance can be modeled  g(μ i )= a 0 + a 1 X i1 + …+ a m X im  g( ) is the link function  E[Y i ] = μ i = g -1 (a 0 + a 1 X i1 + …+ a m X im )  Y i can be Normal, Poisson, Gamma, Binomial, Compound Poisson, …  Variance can be modeled

11 11 GLMs Extend Classical Linear Regression  If link function is identity: g(μ i ) = μ i  And Y i has Normal distribution → GLM gives same answer as Classical Linear Regression* * Least squares and MLE equivalent for Normal dist.  If link function is identity: g(μ i ) = μ i  And Y i has Normal distribution → GLM gives same answer as Classical Linear Regression* * Least squares and MLE equivalent for Normal dist.

12 12 Exponential Family of Distributions – Canonical Form

13 13 Some Math Rules: Refresher

14 14 Normal Distribution in Exponential Family

15 15 Normal Distribution in Exponential Family θb(θ)b(θ)

16 16 Poisson Distribution in Exponential Family θ

17 17 Poisson Distribution in Exponential Family

18 18 Compound Poisson Distribution  Y = C 1 + C 2 +... + C N  N is Poisson random variable  C i are i.i.d. with Gamma distribution  This is an example of a Tweedie distribution  Y is member of Exponential Family  Y = C 1 + C 2 +... + C N  N is Poisson random variable  C i are i.i.d. with Gamma distribution  This is an example of a Tweedie distribution  Y is member of Exponential Family

19 19 Members of the Exponential Family Normal Poisson Binomial Gamma Inverse Gaussian Compound Poisson Normal Poisson Binomial Gamma Inverse Gaussian Compound Poisson

20 20 Variance Structure  E[Y i ] = μ i = b' ( θ i ) → θ i = b' (-1) (μ i )  Var[ Y i ] = a(Φ i ) b' ' ( θ i ) = a(Φ i ) V(μ i )  Common form: Var[ Y i ] = Φ V(μ i )/w i Φ is constant across data but weights applied to data points  E[Y i ] = μ i = b' ( θ i ) → θ i = b' (-1) (μ i )  Var[ Y i ] = a(Φ i ) b' ' ( θ i ) = a(Φ i ) V(μ i )  Common form: Var[ Y i ] = Φ V(μ i )/w i Φ is constant across data but weights applied to data points

21 21 V(μ)  Normal  0  Poisson   Binomial  (1-  )  Tweedie  p, 1<p<2  Gamma  2  Inverse Gaussian  3  Recall: Var[ Y i ] = Φ V(μ i )/w i V(μ)  Normal  0  Poisson   Binomial  (1-  )  Tweedie  p, 1<p<2  Gamma  2  Inverse Gaussian  3  Recall: Var[ Y i ] = Φ V(μ i )/w i Variance Functions V(μ)

22 22 Variance at Point and Fit A Practioner’s Guide to Generalized Linear Models: A CAS Study Note

23 23 Variance of Y i and Fit at Data Point i  Var(Y i ) is big → looser fit at data point i  Var(Y i ) is small → tighter fit at data point i  Var(Y i ) is big → looser fit at data point i  Var(Y i ) is small → tighter fit at data point i

24 24 Why Exponential Family?  Distributions in Exponential Family can model a variety of problems  Standard algorithm for finding coefficients a 0, a 1, …, a m  Distributions in Exponential Family can model a variety of problems  Standard algorithm for finding coefficients a 0, a 1, …, a m

25 25 Modeling Number of Claims Number of PolicySexTerritoryClaims in 5 Years 1M020 2F010 3F 0 4F021 5F010 6F021 7M 2 8M 2 9M 1 10F011 ::::

26 26 Assume a Multiplicative Model  μ i = expected number of claims in five years  μ i = B F,01 x C Sex(i) x C Terr(i)  If i is Female and Terr 01 → μ i = B F,01 x 1.00 x 1.00  μ i = expected number of claims in five years  μ i = B F,01 x C Sex(i) x C Terr(i)  If i is Female and Terr 01 → μ i = B F,01 x 1.00 x 1.00

27 27 Multiplicative Model  μ i = exp(a 0 + a S X S(i) + a T X T(i) )  μ i = exp(a 0 ) x exp(a S X S(i) ) x exp( a T X T(i) )  i is Female → X S(i) = 0; Male → X S(i) = 1  i is Terr 01 → X T(i) = 0; Terr 02 → X T(i) = 1  μ i = exp(a 0 + a S X S(i) + a T X T(i) )  μ i = exp(a 0 ) x exp(a S X S(i) ) x exp( a T X T(i) )  i is Female → X S(i) = 0; Male → X S(i) = 1  i is Terr 01 → X T(i) = 0; Terr 02 → X T(i) = 1

28 28 Values of Predictor Variables Policy Sex X S(i) Territory X T(i) 1 M 1 02 1 2 F 0 01 0 3 F 0 0 4 F 0 02 1 5 F 0 01 0 6 F 0 02 1 7 M 1 1 8 M 1 1 9 M 1 1 10 F 0 01 0

29 29 Natural Log Link Function  ln(μ i ) = a 0 + a S X S(i) + a T X T(i)  μ i is in (0, ∞ )  ln(μ i ) is in ( - ∞, ∞ )  ln(μ i ) = a 0 + a S X S(i) + a T X T(i)  μ i is in (0, ∞ )  ln(μ i ) is in ( - ∞, ∞ )

30 30 Poisson Distribution in Exponential Family

31 31 Natural Log is Canonical Link for Poisson  θ i = ln ( μ i )  θ i = a 0 + a S X S(i) + a T X T(i)  θ i = ln ( μ i )  θ i = a 0 + a S X S(i) + a T X T(i)

32 32 Estimating Coefficients a 1, a 2,.., a m  Classical linear regression uses least squares  GLMs use Maximum Likelihood Method  Solution will exist for distributions in exponential family  Classical linear regression uses least squares  GLMs use Maximum Likelihood Method  Solution will exist for distributions in exponential family

33 33 Likelihood and Log Likelihood

34 34 Find a 0, a S, and a T for Poisson

35 35 Iterative Numerical Procedure to Find a i ’s  Use statistical package or actuarial software  Specify link function and distribution type  “Iterative weighted least squares” is the numerical method used  Use statistical package or actuarial software  Specify link function and distribution type  “Iterative weighted least squares” is the numerical method used

36 36 Solution to Our Example  a 0 = -.288 → exp(-.288) =.75  a S =.262 → exp(.262) = 1.3  a T =.095 → exp(.095) = 1.1  μ i = exp(a 0 ) x exp(a S X S(i) ) xexp( a T X T(i) )  μ i =.75 x 1.3 X S(i) x 1.1 X T(i)  i is Male,Terr 01 → μ i =.75 x 1.3 1 x 1.1 0  a 0 = -.288 → exp(-.288) =.75  a S =.262 → exp(.262) = 1.3  a T =.095 → exp(.095) = 1.1  μ i = exp(a 0 ) x exp(a S X S(i) ) xexp( a T X T(i) )  μ i =.75 x 1.3 X S(i) x 1.1 X T(i)  i is Male,Terr 01 → μ i =.75 x 1.3 1 x 1.1 0

37 37 Testing New Drug Treatment

38 38 Multiple Linear Regression

39 39 Logistic Regression Model  p = probability of cure, p in [0,1]  odds ratio: p/(1-p) in [0, + ∞ ]  ln[p/(1-p)] in [ - ∞, + ∞ ]  ln[p/(1-p)] = a + b 1 X 1 + b 2 X 2  p = probability of cure, p in [0,1]  odds ratio: p/(1-p) in [0, + ∞ ]  ln[p/(1-p)] in [ - ∞, + ∞ ]  ln[p/(1-p)] = a + b 1 X 1 + b 2 X 2 Link function

40 40 Logistic Regression Model

41 41 Which Exponential Family Distribution?  Frequency: Poisson, {Negative Binomial}  Severity: Gamma  Loss ratio: Compound Poisson  Pure Premium: Compound Poisson  How many policies will renew: Binomial  Frequency: Poisson, {Negative Binomial}  Severity: Gamma  Loss ratio: Compound Poisson  Pure Premium: Compound Poisson  How many policies will renew: Binomial

42 42 What link function?  Additive model: identity  Multiplicative model: natural log  Modeling probability of event: logistic  Additive model: identity  Multiplicative model: natural log  Modeling probability of event: logistic

43 43


Download ppt "1 GLM I: Introduction to Generalized Linear Models By Curtis Gary Dean Distinguished Professor of Actuarial Science Ball State University By Curtis Gary."

Similar presentations


Ads by Google