
1 Risk Analysis & Modelling Lecture 4: The Aggregate Claims Distribution

2 Course Website: ram.edu-sys.net Course Email: riskcoursehq@hotmail.com

3 Recap of Last Week In last week's class we looked at two concepts central to the statistical modelling of Underwriting Risk: the Frequency and Severity of claims. The Frequency Model describes the number of losses experienced by the insurer; in our model we used the discrete Poisson Distribution. The Severity Model describes the size of these losses; in our model we used the continuous Pareto Distribution.

4 Frequency-Severity Model [Diagram] The Poisson Distribution answers "How Many Claims?"; the Pareto Distribution answers "How Large Are Claims?"

5 Modelling Frequency One risk an Insurance Company faces is that the number or frequency of claims is higher than average. In last week's class we saw how we can use the Poisson Distribution to simulate the behaviour of the frequency of losses about its average. For example, for a class of policies sold there might be an average frequency of 9.5 claims per 100 years insured, so the average frequency is 0.095 (9.5/100) claims per year per policy. If an Insurance Company has sold 2,000 policies providing annual coverage then on average it expects to receive 190 (2000 * 0.095) claims over the next year (average frequency). Using the RandomPoisson function from last week's class we can generate a sample for the number of losses experienced about this average…

6 Simulating Loss Frequency Risk [Chart: Poisson Distribution with an average of 190 claims per year] How does the frequency of losses fluctuate about its average? Using the RandomPoisson function to sample from the Poisson Distribution we can simulate how the Frequency of losses will behave about the average of 190; a sketch of such a function is below.
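
The RandomPoisson function is defined on the course spreadsheet. As a reminder, a minimal VBA sketch of how such a function could be written is below, using Knuth's multiplication method (the spreadsheet's actual implementation may differ):

Public Function RandomPoisson(Lambda)
    ' Knuth's method: multiply uniform random numbers together until the
    ' product falls below Exp(-Lambda); the number of multiplications minus
    ' one is a sample from the Poisson distribution with mean Lambda
    L = Exp(-Lambda)
    k = 0
    p = 1
    Do
        k = k + 1
        p = p * Rnd
    Loop While p > L
    RandomPoisson = k - 1
End Function

Placing =RandomPoisson(190) in a cell would then simulate one year's claim count for the 2,000-policy portfolio.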

7 Modelling Severity of Losses Although the insurer may know the average claim size there is a risk that claims will be larger than average. By assuming the size or severity of claims follows a Pareto Distribution the fluctuations in size can be modelled using just the Average and Minimum Claim Size. We can then generate a feasible sample of losses using the Inverse Transform Method. For example, if we wanted to generate a sample of Pareto distributed claims with an average of 4300 and minimum of 1000 we would use the formula ParetoICDF(rand(),4300,1000). If we wanted to sum 10 random claim severities, each randomly generated from a Pareto Distribution with an average of 4300 and minimum of 1000, we would use the formula AggregateParetoLoss(4300,1000,10). Possible implementations of both functions are sketched below.
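
Both functions are defined on the course spreadsheet; a hedged VBA sketch of how they could be implemented follows. It assumes the course's average-and-minimum parameterisation of the Pareto, under which the implied shape is α = μ/(μ − M) (the spreadsheet's actual code may differ):

Public Function ParetoICDF(Probability, Average, Minimum)
    ' Recover the shape parameter from the average and minimum:
    ' for a Pareto with minimum M and shape a, the mean is a*M/(a-1),
    ' so a = mean/(mean - M)
    Shape = Average / (Average - Minimum)
    ' Invert the CDF F(x) = 1 - (M/x)^a to get x = M / (1-p)^(1/a)
    ParetoICDF = Minimum / (1 - Probability) ^ (1 / Shape)
End Function

Public Function AggregateParetoLoss(Average, Minimum, NumberOfClaims)
    ' Sum NumberOfClaims random severities, each drawn by the
    ' Inverse Transform Method
    Total = 0
    For i = 1 To NumberOfClaims
        Total = Total + ParetoICDF(Rnd, Average, Minimum)
    Next i
    AggregateParetoLoss = Total
End Function

With an average of 4300 and minimum of 1000 the implied shape is 4300/3300 ≈ 1.3, which matches the shape quoted later in this lecture.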

8 Simulating the Size of Losses [Chart: Pareto Distribution] Average loss is £4300 and the minimum is £1000. Generate a sample of possible losses from the Distribution. How will each claim fluctuate about its Average of £4300?

9 The Aggregate Loss Distribution The Severity Distribution (Pareto) describes the range of losses the insurer will experience on each individual claim. Often the real risk for most Insurers is not that a single claim is greater than expected but that the Total or Aggregate Claim across its Underwriting Portfolio is greater than expected. The Total or Aggregate Loss experienced by the Insurance company depends on the number of claims AND the size of each of those claims. Using the Monte Carlo method we can simulate a large range of values for the Aggregate Claim to directly observe the range of values it could take. To make sense of this large number of random simulated values we can sort them from the Smallest to the Largest Value…

10 Sorted Simulated Values [Diagram: the output of the Frequency-Severity Model sorted from the Smallest Aggregate Loss to the Largest Aggregate Loss]

11 Empirical CDF The Monte Carlo Simulation gives us 1000s of possible scenarios, some larger than expected, some smaller. This output is much more useful if we sort it from the smallest to the largest. From the sorted sample we can easily estimate how large the Aggregate Loss could become. For example, if we have 1000 sorted simulated values we could easily find the 950th smallest of those sorted values and say that 95% of outcomes are less than or equal to this (the 95% Quantile). These sorted values provide a way of finding the probability of the Aggregate Claim being less than or equal to some value – they are essentially a Cumulative Distribution Function (CDF). Because this CDF is made from observations or empirical data it is called the Empirical CDF (ECDF). As we will see today the Empirical CDF has a lot of other uses…
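
As a minimal sketch (not part of the course spreadsheet), a VBA function that reads the p-Quantile straight off a range of simulated values might look like this:

Public Function EmpiricalQuantile(SimRange As Range, p)
    ' The p-quantile is approximately the ceiling(p*n)th smallest value
    n = SimRange.Count
    k = Application.WorksheetFunction.RoundUp(p * n, 0)
    EmpiricalQuantile = Application.WorksheetFunction.Small(SimRange, k)
End Function

For example, =EmpiricalQuantile(B17:B1016,0.95) would return the 950th smallest of 1000 simulated values – the 95% Quantile – assuming the simulated output is stored in B17:B1016, as in the Monte Carlo loop later in this lecture.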

12 Empirical CDF Rather than a mathematical function, the Empirical CDF is made up of a large number of sorted observations – in this case the output from a Frequency-Severity Monte Carlo Simulation. [Chart] 95% of the values in our sample of simulated values are less than 1180777 (the 95% Aggregate PML).

13 Exceedence Probability The CDF gives the Quantile, or the Probability of a random variable being less than or equal to some value. From a risk measurement perspective it is often more natural to think of the probability of the outcome being greater than some value, or the Exceedence Probability. As we saw in last week's class the Exceedence Probability is simply 1 minus the probability of the variable being less than or equal to some value. If we graph the Exceedence Probabilities for the various outcomes we obtain the EP Curve…

14 EP Curve for Aggregate Claims Only 10% of Losses are greater than 1005702 – this is the 10% EP Loss

15 Use of Aggregate Loss Distribution Example An Insurance Company has the opportunity to underwrite the portfolio of 2000 policies we just modelled. The company will only Underwrite this risk if the PML 95% on the entire portfolio is under £800,000 – should they agree to sell this portfolio of policies? The Maximum Aggregate Loss its capital can sustain is £1 million – using the EP Curve, what is the probability of the Aggregate Loss on the Portfolio being greater than £1 million?

16 Applying a Per Occurrence Limit Even though the Risk of Underwriting this Portfolio of Policies is too great, the Insurer might be able to Underwrite part of the Risk. One way to achieve this would be for the Insurer to place a Limit on the Size or Severity of each loss. This Limit could simply be written into the Policy, stating that there is a maximum loss the insurer will pay on any given claim. Another way this limit could be introduced would be for the insurer to purchase Per Risk Excess of Loss Reinsurance. For example, if a limit of £25,000 is set then on a loss of £26,000 the insurer will only pay the first £25,000, while a loss of £14,000, which is below the limit, would be paid in full.
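
In a simulation this capping is a one-line rule. A hedged VBA sketch (illustrative, not taken from the course spreadsheet):

Public Function LimitedLoss(Loss, Limit)
    ' The insurer pays the full loss up to the Limit, and the Limit thereafter
    If Loss > Limit Then
        LimitedLoss = Limit
    Else
        LimitedLoss = Loss
    End If
End Function

For example, =LimitedLoss(ParetoICDF(rand(),4300,1000),25000) would simulate one claim capped at £25,000.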

17 Effect of a Limit of £25,000 on the Severity Distribution If a limit of £25,000 is placed on the severity of each loss then all losses above £25,000 are limited to £25,000.

18 Effect of the Limit on the EP Curve By applying a Limit of £25,000 on each loss the EP Curve is shifted inwards and the PML 95% is greatly reduced. By experimenting with different limits it would be possible to see how high the limit could be set while maintaining an acceptable level of risk.

19 Underwriting Profit and Loss Distribution The relationship between the Underwriting Profit and Loss and the Aggregate Claim level (excluding costs) on business already written is:

U = P − C

Where U is the Underwriting Profit (or Loss), P is the Premium Income (Earned) and C is the Aggregate Claims Level (Incurred). We will model this from the perspective of business already written – the Earned Premium Income is fixed or known. By subtracting the Simulated Aggregate Claim from the known Premium we can obtain the Simulated Profit (or Loss).

20 By sorting these Simulated Profits we can obtain the Distribution for the Underwriting Profit. When using the Underwriting Profit Distribution it is important to remember that the largest losses now occur at the lower end of the distribution. For example, if we wanted to estimate the loss with a 5% Exceedence Probability we would take the 5% Quantile (lower tail of the distribution).

21 Underwriting Profit and Loss Distribution [Chart] The largest losses occur at the lower tail of the PL Distribution; 5% of the outcomes are less than (worse than) -£291665.

22 Return to VBA – The Subroutine In last week's class we looked at how to make our own Functions in Excel. Functions perform a calculation, like adding two numbers or evaluating the CDF of the Pareto Distribution. In this week's class we will learn how to make our own Buttons and Subroutines. Subroutines automate actions on a Spreadsheet, like putting a number in Cell A1 or pressing F9 on the keyboard. Functions go in a Module; Subroutines are attached to the Spreadsheet on which they do their "work". If you combine Subroutines with Functions you can get Excel to do pretty much anything you would want…

23 The Cells Keyword Most Subroutines access the Cells on a spreadsheet. In VBA the cells on a spreadsheet are accessed using the Cells keyword, which takes two parameters: Cells(Row, Column). For example, Cells(3,2) would access the cell B3 (row 3, column 2). We can use the Cells keyword to either set a value in a cell or check the value in that cell.

24 Two Cells Examples The following subroutine will place the value 4 in cell C2 (row 2, column 3):

Sub SetNumber()
    Cells(2, 3) = 4
End Sub

The following subroutine will check the value in cell A1 and if it is greater than or equal to 5 then it will add one to the value in B1:

Sub CheckNumber()
    If Cells(1, 1) >= 5 Then
        Cells(1, 2) = Cells(1, 2) + 1
    End If
End Sub

25 The For Loop One of the most important concepts in computer programming is the loop. It basically tells the computer to do something a number of times, or to keep doing something until something happens. The loop is central to the Monte Carlo simulation because it allows us to instruct the computer to run the simulation thousands of times. On this course we will use the For loop, which instructs the computer to perform a task for every number (integer) in a range.

26 Loop Examples The following subroutine will recalculate the spreadsheet 100 times:

Sub RecalculateLoop()
    For i = 1 To 100
        Application.ActiveSheet.Calculate
    Next i
End Sub

The following subroutine will put the numbers from 1 to 1000 in column A, starting at cell A1:

Sub PutNumbersLoop()
    For i = 1 To 1000
        Cells(i, 1) = i
    Next i
End Sub

27 Monte Carlo Loop The following subroutine will press F9 1000 times and copy the resulting Simulated Profit and Loss into Column B:

Sub RunMonteCarlo()
    For i = 1 To 1000
        ' Recalculate the worksheet to rerun the Monte Carlo Simulation
        Application.ActiveSheet.Calculate
        ' Copy the value in B12 into Column B, starting at B17
        Cells(16 + i, 2) = Cells(12, 2)
    Next i
End Sub

The loop runs 1000 times, so the simulated values fill B17:B1016.

28 Severity Distribution 2: The Gamma Distribution The Gamma Distribution is another distribution widely used in the modelling of Claim Severities. It is generally used for losses which are high in Frequency and low in Severity (Attritional Losses); unlike the Pareto Distribution it cannot be used to model Catastrophic or Extreme Losses. It is a flexible distribution that can fit a wide variety of loss patterns, and it is frequently used to model the claims experienced by Motor Insurers. The Gamma Distribution is related to the Exponential Distribution in that the sum of Exponentially Distributed random variables results in a Gamma Distributed random variable. Its PDF and CDF formulas are mathematically complex…

29 Gamma Distribution Formula The formula for the CDF of the Gamma Distribution is:

F(x) = γ(α, (x − M)/θ) / Γ(α)

Where γ is the incomplete gamma function and Γ is the gamma function; these are special mathematical functions (see Appendix). α is called the shape parameter, θ is the scale parameter and M is the minimum value for the gamma random variable. The PDF of the Gamma Distribution is:

f(x) = (x − M)^(α−1) · e^(−(x − M)/θ) / (Γ(α) · θ^α)

30 The Average (μ) and Variance (σ²) of a Gamma distributed random variable can be calculated as follows:

μ = M + α·θ
σ² = α·θ²

These formulas can be inverted to get the shape and scale in terms of the average, variance and minimum:

α = (μ − M)² / σ²
θ = σ² / (μ − M)

31 Gamma CDF and Inverse CDF in VBA Excel has limited built-in support for the Gamma Distribution, so we will create our own functions in VBA:

Public Function GammaCDF(X, Average, Variance, Min)
    ShapeParam = (Average - Min) ^ 2 / Variance
    ScaleParam = Variance / (Average - Min)
    GammaCDF = Application.WorksheetFunction.GammaDist(X - Min, ShapeParam, ScaleParam, True)
End Function

Public Function GammaICDF(Probability, Average, Variance, Min)
    ShapeParam = (Average - Min) ^ 2 / Variance
    ScaleParam = Variance / (Average - Min)
    GammaICDF = Application.WorksheetFunction.GammaInv(Probability, ShapeParam, ScaleParam) + Min
End Function

32 Gamma CDF and PDF where μ = 7, σ² = 6 and Min = 3 [Charts] =GammaCDF(6,7,6,3) =GammaICDF(0.95,7,6,3)

33 Gamma Review Question A Gamma Distributed Loss has an Average of 250, Variance of 1000 and Minimum of 200. Calculate the Probability the loss will be less than or equal to 350 (GammaCDF). Calculate the PML 95% for this loss (GammaICDF). Calculate the 1% EP Loss (GammaICDF). Simulate the SUM of 5 losses from this distribution (rand and GammaICDF). Possible worksheet formulas are sketched below.
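
One hedged set of answers using the functions defined above (note the 1% EP Loss is the 99% Quantile, since EP = 1 − CDF):

=GammaCDF(350,250,1000,200)        ' Probability the loss is less than or equal to 350
=GammaICDF(0.95,250,1000,200)      ' The PML 95%
=GammaICDF(0.99,250,1000,200)      ' The 1% EP Loss (99% Quantile)
=GammaICDF(rand(),250,1000,200)+GammaICDF(rand(),250,1000,200)+GammaICDF(rand(),250,1000,200)+GammaICDF(rand(),250,1000,200)+GammaICDF(rand(),250,1000,200)    ' The SUM of 5 simulated losses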

34 Deciding the Best Fit Distribution Using the Empirical CDF The Empirical CDF is also commonly used to help decide which distribution best fits the data. Just as we sorted the output from a Monte Carlo Simulation, we can sort actual data, such as historical claim data, to obtain its Empirical Distribution. We can then compare the fit provided by a Distribution such as the Pareto or Gamma to the data's Empirical CDF. This could be used to decide which Severity distribution to use in your model. Another visual method is known as the Quantile-Quantile or Q-Q plot. Numerical Methods such as Log-Likelihood, Chi-Squared and Kolmogorov-Smirnov can also be used to test the degree of fit (see Appendix).

35 Review Question On the worksheet "Which Distribution to Use" compare the fit of the Pareto and Gamma distributions to the sorted loss data. First calculate the average and variance of the data required to fit the distributions. The probability or proportion of outcomes less than a value X for a Pareto Distributed Random Variable is =ParetoCDF(X,Average,Minimum). The probability or proportion of outcomes less than a value X for a Gamma Distributed Random Variable is =GammaCDF(X,Average,Variance,Minimum). For each sorted value calculate the proportion of observations the Pareto and Gamma distributions predict will be equal to or less than that value, and compare it to the observed proportions (one possible layout is sketched below).
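
A hedged sketch of one possible layout, assuming the sorted losses sit in column A starting at A2, the fitted parameters are held in named cells Average, Variance and Minimum, and the observation count in a named cell N (the actual worksheet layout may differ). For the sorted value in row 2:

=(ROW()-1)/N                               ' Observed (empirical) proportion <= this value
=ParetoCDF(A2,Average,Minimum)             ' Proportion predicted by the Pareto fit
=GammaCDF(A2,Average,Variance,Minimum)     ' Proportion predicted by the Gamma fit

Filling these formulas down alongside the sorted data gives the three CDF curves compared on the next slide.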

36 Comparing the Fit of Gamma and Pareto From visual inspection we can see that the Gamma Distribution provides a much closer fit to the distribution of the dataset (Empirical CDF)

37 Alternatives to the Poisson Distribution Although the Poisson Distribution is by far the most widely used distribution to model the Frequency of losses, there are alternatives. The Poisson Distribution has very specific properties which might not always be suitable for modelling the number of claims. One important property of a Poisson Distributed random variable is that it has no upper limit and allows multiple claims per policy (which might not always be true – such as for Life Insurance). The mean and the variance of a Poisson Random Variable are equal; this is not always observed in actual claims data. Two common alternatives to the Poisson Distribution are the Binomial and the Negative Binomial. The Binomial is commonly used to model the number of losses on a Portfolio of Life Insurance, where the number of claims is limited by the number of policies. The Negative Binomial is used to model the frequency of losses when the risks of the policies in the portfolio are Heterogeneous or different, and it can be viewed as a mixture of Poisson Distributions.

38 Monte Carlo Frequency-Severity [Diagram] Frequency Models: Poisson (μ = σ²), Binomial (μ > σ²), Negative Binomial (μ < σ²). Severity Models: Pareto, Gamma, EVT.

39 The Normal Distribution We will now look at the most important Distribution in Statistics and see how it can be used to radically simplify the Frequency-Severity Model (sometimes!). The Normal Distribution or Bell Curve was first introduced by the mathematician Abraham de Moivre in 1733. The Normal Distribution is frequently observed in the real world: returns on stock markets, aggregate levels of claims on certain classes of insurance, measurement errors in an experiment, individuals' heights etc. Its occurrence in the World about us is explained by the remarkable Central Limit Theorem (CLT), which states that the average or sum of a large number of relatively small independent random variables will be Normally Distributed…

40 Normal Distribution PDF and CDF Normally distributed random variables are unbounded and can take on any value between minus and plus infinity. The PDF (Probability Density Function) for the Normal Distribution is defined as:

f(x) = (1 / (σ·√(2π))) · e^(−(x − μ)² / (2σ²))

Where μ is the mean or average of the random variable and σ is the standard deviation. The CDF of the Normal Distribution does not have a closed-form formula and must be evaluated numerically, but can be written as:

F(x) = ∫ from −∞ to x of (1 / (σ·√(2π))) · e^(−(t − μ)² / (2σ²)) dt

41 Normal CDF & PDF Where  = 0 and  = 2 Notice the Normal Distribution is symmetrical about its mean (ie Skew is zero)

42 NORMDIST and NORMINV The PDF and CDF for the Normal Distribution can be calculated in Excel using the built-in NORMDIST function. To calculate the PDF for a value X we use the formula: =NORMDIST(X,μ,σ,FALSE). To calculate the CDF for a value X we use the formula: =NORMDIST(X,μ,σ,TRUE). For example, if we wanted to calculate the probability that a normally distributed random variable with a mean of 2 and a standard deviation of 4 is less than 3 (CDF at 3): =NORMDIST(3,2,4,TRUE)

43 The Inverse of the Normal CDF is calculated using the built-in NORMINV: =NORMINV(P,μ,σ) Where P is the probability of the random variable being less than the level. For example, if we want to calculate the level such that a Normally Distributed random variable with a mean of 2 and a standard deviation of 4 will be less than or equal to it 5% (0.05) of the time: =NORMINV(0.05,2,4)

44 NORMINV and NORMDIST [Chart: CDF of the Standard Normal (Mean 0 and Std Dev 1)] NORMINV(0.95,0,1) = 1.644 (the level with 95% of outcomes below it). NORMDIST(0.5,0,1,TRUE) = 0.691 (the probability of an outcome below 0.5).

45 Normally Distributed Random Numbers [Diagram: μ = 0, σ = 1] The computer generates a uniform random number, e.g. 0.91, using rand(). We use the inverse of the Normal CDF, NORMINV, to transform it: NORMINV(0.91,0,1) = 1.34. The transformed random variable 1.34 is Normally Distributed. =NORMINV(rand(),0,1)

46 Normal Distribution Review Questions If the Aggregate Claim Distribution is Normally Distributed with mean μ and standard deviation σ: Using NORMINV calculate the 99% PML. Using NORMDIST calculate the probability of the Aggregate Claim being less than or equal to 250000. Using NORMINV and rand() generate a random sample for this Aggregate Claim.

47 Central Limit Theorem Experiment The Normal Distribution is something that occurs in nature – it is not something that just exists in statistics text books! The reason we observe the Normal Distribution in the World about us is due to the Central Limit Theorem (CLT), which states the remarkable fact that if a random variable is the sum of a large number of other independent random variables:

Y = X₁ + X₂ + … + X_N

48 Then the distribution of the sum Y will be normal regardless of the distributions of the individual X's. We notice that a portfolio is essentially the sum of the "random" assets and liabilities it contains. We will not prove the Central Limit Theorem mathematically – instead we will perform an experiment using the Monte Carlo Method. We will create a random variable made up from the average of 8 highly non-normal, uniformly distributed random variables:

Y = (U₁ + U₂ + … + U₈) / 8
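
As a minimal worksheet sketch (assuming a plain Excel implementation; the course file may differ), each simulated observation can be produced with:

=(RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND()+RAND())/8

Recalculating and storing thousands of these values with a loop like RunMonteCarlo above produces the Empirical CDF compared to the Normal on the next slide.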

49 Empirical CDF vs Normal CDF Fit The Empirical CDF for the sum is a very close match for the Normal Distribution

50 Our Experiment [Diagram: adding together several uniform random variables produces an approximately Normal, bell-shaped distribution]

51 Understanding the Central Limit Theorem To understand what is happening intuitively in the Central Limit Theorem (CLT) we have to look at what happens to the Moments of a sum of independent random variables. The first Moment is the Average: if we sum N independent random variables each with average μ then the average of the sum (μₛ) will be:

μₛ = N·μ

We see that the Average of the sum increases at a linear rate (N). The second moment is the Standard Deviation: if we sum N independent random variables each with standard deviation σ then the standard deviation of their sum σₛ is:

σₛ = σ·√N

52 It is interesting to note that if we look at the relative size of the risk or standard deviation to the average (the Coefficient of Variation):

σₛ / μₛ = (σ·√N) / (N·μ) = σ / (μ·√N)

We observe that as N increases the relative level of Risk declines. The third moment is the Skew, which measures the degree of asymmetry of a random variable. If we sum N independent random variables each with skew γ then the skew of their sum γₛ is:

γₛ = γ / √N

We note that the skew actually decreases for the sum and approaches zero as N increases – the distribution of the sum becomes symmetric like the Normal Distribution. We could proceed to show that the higher order moments (such as the Kurtosis) converge to the moments of the Normal Distribution at an even faster rate. Generally speaking, if two random variables have the same Moments they have the same Distribution (assuming they have an MGF or Moment Generating Function).

53 Alternative to Estimating the Aggregate Claims Distribution Using the Normal Distribution The Aggregate Claim Level is often the sum of a large number of claims on individual policies. Does the Central Limit Theorem suggest that the Aggregate Claims Distribution is Normal? Let's test our earlier Aggregate Claims distribution where each individual loss was Gamma Distributed.

54 Empirical vs Normal CDF for Aggregate Claims Simulation Near Perfect Match Between Empirical and Normal CDF

55 Normal Approximation of the Aggregate Claims Distribution If the Central Limit Theorem holds then all we need to know is the Mean and Variance of the Aggregate Claims Level. These can be calculated using the following formulas:

E(S) = λ·μ
Var(S) = λ·(μ² + σ²)

Where μ is the average size of each individual claim, σ² is the variance of each individual claim and λ is the average number of claims over the period. Using these formulas we can fit the Normal Distribution to the Aggregate Claims Distribution. These formulas are derived in the Appendix.
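
A hedged VBA sketch of this approximation (illustrative, not from the course spreadsheet), returning the PML at a given probability:

Public Function AggregateNormalPML(Prob, Lambda, SevAverage, SevVariance)
    ' Compound Poisson moments: mean = lambda*mu, variance = lambda*(mu^2 + sigma^2)
    AggMean = Lambda * SevAverage
    AggVariance = Lambda * (SevAverage ^ 2 + SevVariance)
    ' Read the required quantile off the fitted Normal distribution
    AggregateNormalPML = Application.WorksheetFunction.NormInv(Prob, AggMean, Sqr(AggVariance))
End Function

For example, =AggregateNormalPML(0.95,AvgFrequency,AvgSeverity,VarSeverity) would estimate the PML 95% directly, with no Monte Carlo loop required (the cell names are illustrative).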

56 Review Question Using the Normal Approximation formula calculate the Mean and Variance of the Annual Aggregate Claims Distribution from the "Aggregate Claims Simulation" sheet. Information given on the sheet: the Average Annual Frequency, the Average Claim Severity and the Variance of the Claim Severity.

57 Answer Using the Normal Approximation Formula:

58 What the Central Limit Theorem Tells Us About the Aggregate Claims Distribution The Central Limit Theorem makes two very important statements about the nature of Underwriting Risk. Firstly, that the Aggregate Loss Distribution across a portfolio of policies should be Normal, and therefore we only need to know the Mean and Variance of the individual losses – the exact nature of their distribution is not important. Secondly, that the Average Aggregate Claim will grow faster than the Standard Deviation of the Aggregate Claim, and eventually the Underwriting Risk will become insignificant – the ratio of the Standard Deviation to the Average (Coefficient of Variation) will converge to zero. This is essentially the Law of Large Numbers.

59 Central Limit Theorem and Underwriting Risk [Diagram] Individual Loss: the Distribution of individual losses does not matter because their sum will be Normal – all we need to know is the Average and Variance of each loss. Aggregate Loss: as the size of the Portfolio and the number of claims increases, the standard deviation or risk becomes insignificant relative to the size of the average loss – the Law of Large Numbers.

60 Central Limit Theorem Does Not Apply to the Aggregate Loss in our Pareto-Poisson Simulation! The Normal Distribution no longer fits the Aggregate Claims Distribution – the Central Limit Theorem does not apply!

61 When the Central Limit Theorem Applies If the Central Limit Theorem always applied to the Aggregate Claims Distribution, Underwriting Insurance would be a lot less risky! In general, the Central Limit Theorem applies when there are a large number of claims, each of these claims is independent (uncorrelated) and the size of each claim is a small proportion of the Aggregate or Total Claim. These types of claims are known as Attritional Claims; our Gamma Distribution modelled Attritional Claims. The Pareto Distribution we used in our first simulation had a shape of 1.3, which suggests each loss has the potential to be Catastrophic in size – a single loss has the potential to be a large proportion of the total. Mathematically, the reason this Pareto Distribution will not converge to Normal is that the shape is less than 2 and therefore the Variance and Standard Deviation are Infinite…

62 Combining Attritional & Catastrophic Claims We have seen that the Central Limit Theorem has the potential to greatly simplify the Frequency-Severity Model. The Central Limit Theorem will apply to Attritional Claims, which are high in Frequency and low in Severity. For losses which have the potential to be Catastrophic in size – such as losses described by our Pareto Distribution – the Normal Distribution is unlikely to provide a good estimate for the Aggregate Loss. To estimate the Aggregate Loss Distribution of these CAT losses we need to use a Monte Carlo Simulation. A popular and efficient modelling technique is to use the Normal Distribution to model the Aggregate of Attritional Claims and to use the Monte Carlo Method for the Aggregate Loss of Larger Catastrophic Claims (or Atypical Losses).

63 Splitting up the Data into Attritional and Large Claims [Diagram] The Frequency of All Claims is split into the Frequency of Attritional Claims and the Frequency of CAT/Atypical Claims. Severity of Attritional Losses: Mean & Variance with the Central Limit Theorem (CLT). Severity of Larger Losses: Pareto Distribution & Monte Carlo.

64 Modelling Attritional & Large Claims [Diagram] Aggregate Attritional Claims are subject to the CLT and are modelled directly using the Normal Distribution, plus Large Atypical or Catastrophic claims modelled with the Frequency-Severity model (Poisson frequency, Pareto severity). A sketch of the combined simulation is below.
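
A hedged VBA sketch of the combined model, reusing the RandomPoisson and ParetoICDF sketches from earlier in this lecture (all parameter names are illustrative assumptions):

Public Function CombinedAggregateLoss(AttrMean, AttrStdDev, CatLambda, CatAverage, CatMinimum)
    ' Attritional aggregate drawn straight from the Normal (the CLT applies)
    Attritional = Application.WorksheetFunction.NormInv(Rnd, AttrMean, AttrStdDev)
    ' CAT aggregate from a Poisson frequency and Pareto severities
    NumCats = RandomPoisson(CatLambda)
    CatTotal = 0
    For i = 1 To NumCats
        CatTotal = CatTotal + ParetoICDF(Rnd, CatAverage, CatMinimum)
    Next i
    CombinedAggregateLoss = Attritional + CatTotal
End Function

Each call simulates one year's total loss; running it inside a Monte Carlo loop gives the combined Aggregate Loss Distribution.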

65 Appendix: The Log Normal The Log Normal Distribution is widely used in actuarial science. It is very closely related to the Normal Distribution and we will look at it in more detail in a later class. Like the Normal it can be entirely determined by the mean and variance; unlike the Normal Distribution it is asymmetric or skewed. In Excel we could find the probability of a Log Normal random variable with an average of 10 and a variance of 5 being less than 11 using =LogNormalCDF(11,10,5). It is sometimes used in place of the Normal Distribution to fit the Aggregate Claims Distribution when there is Positive Skew.

66 Log Normal vs Normal Density [Charts: Log-Normal Density and Normal Density] The Normal Density is symmetric; the Log-Normal Density is asymmetric or Skewed.

67 Empirical vs LogNormal The Aggregate Loss Distribution is Slightly Skewed so the Log Normal Distribution Provides a Better Fit

68 Appendix Alternative Frequency Model: Negative Binomial An alternative for modelling the frequency of claims for Non-Life Insurance is the Negative Binomial Distribution. It is used when the variance of the frequency is higher than the average frequency (over-dispersion). Like the Poisson Distribution, the Sum of Negative Binomials is Negative Binomial. For this distribution we have to specify both the mean and the variance of the frequency. For example, the probability that the frequency is 1, given that the average frequency is 0.12 and the variance of the frequency is 0.13, would be =NegBinomial(1,0.12,0.13). If we compare the fit provided by the Negative Binomial to the Poisson we can see that it provides a better fit; this is because the frequency of claims is over-dispersed (variance higher than average).

69 Comparing Poisson and Negative Binomial Average Frequency Per Policy: 0.123. Variance of Frequency Per Policy: 0.128. The Negative Binomial Distribution with an average of 0.123 and Variance of 0.128 predicts a frequency of 2 with probability 0.806%: =NegBinomial(2,0.123,0.128). The Negative Binomial provides a better match to the Empirical Observations. Variance > Average: Over-Dispersion!

70 Appendix: Generating Negative Binomial Random Variables If a random variable is sampled from a Poisson Distribution whose Average is itself a random variable sampled from a Gamma Distribution whose shape (α) and scale (θ) are equal to:

α = A² / (V − A)
θ = (V − A) / A

Then the resulting random variable will have a Negative Binomial Distribution with Average A and Variance V. This method is implemented in the RandomNegativeBinomial function on this week's Spreadsheet. The Negative Binomial is a "Mixture" of Poisson Distributed Random Variables.
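
A hedged VBA sketch of how RandomNegativeBinomial could be implemented as a Gamma mixture of Poissons, reusing the RandomPoisson sketch from earlier (the spreadsheet's own code may differ; note the method requires V > A):

Public Function RandomNegativeBinomial(A, V)
    ' Shape and scale of the mixing Gamma distribution (requires V > A)
    Shape = A ^ 2 / (V - A)
    Scale = (V - A) / A
    ' Draw a random Poisson mean from the Gamma distribution...
    Lambda = Application.WorksheetFunction.GammaInv(Rnd, Shape, Scale)
    ' ...then sample the frequency from a Poisson with that mean
    RandomNegativeBinomial = RandomPoisson(Lambda)
End Function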

71 Appendix: The Chi-Square Test For discrete distributions such as the Poisson Distribution the Chi-Square Test is a popular Goodness of Fit Test. The test is based on the observation that if a sample of N observed counts (Oᵢ) is taken from some underlying distribution with expected counts (Eᵢ) then the following statistic should have a Chi-Square Distribution with N − 1 degrees of freedom:

χ² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

The larger this statistic, the larger the disparity between the Observed and Expected outcomes.

72 Using the fact that this statistic follows a Chi-Square Distribution, we can calculate the probability of having observed the disparity between the observed and expected distributions purely by chance – this is known as the "p-value". The lower the p-value, the less likely it is that these observations were sampled from the hypothetical underlying distribution.
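
A hedged worksheet sketch, assuming observed counts in B2:B10 and expected counts in C2:C10 (9 categories, so N − 1 = 8 degrees of freedom; the layout is illustrative):

=SUMPRODUCT((B2:B10-C2:C10)^2/C2:C10)    ' The Chi-Square statistic
=CHIDIST(D2,8)                           ' The p-value, assuming the statistic is in D2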

73 Chi-Squared Diagram [Chart] The p-value – what is the probability of observing a Chi-Square statistic of this size or greater just by chance, assuming the underlying distribution is Poisson?

74 Appendix: Kolmogorov-Smirnov Test For continuous distributions such as the Pareto Distribution the Kolmogorov-Smirnov is a popular Goodness of Fit Test. It is based on the observation that the Maximum difference between an underlying hypothetical CDF and an Empirical CDF sampled from that hypothetical distribution follows the Kolmogorov Distribution. The Kolmogorov-Smirnov (K) statistic is defined as:

K = √n · D

Where D is the maximum absolute difference between the Empirical and Hypothetical underlying distribution and n is the number of observations.

75 It is possible to calculate a p-value from this statistic using the Kolmogorov Distribution. It is also possible to compare the Kolmogorov-Smirnov Statistic to critical values. For example, if the Empirical Distribution has been calculated from a reasonably large sample (over 35) taken from the hypothetical distribution, the Kolmogorov-Smirnov Statistic will only be greater than 1.358 5% of the time and will only be greater than 1.628 1% of the time. These critical values can be used to see if it is reasonable to assume that the hypothetical distribution describes the data. If the hypothetical distribution has been fitted using the data from the Empirical Distribution an adjustment needs to be made to the confidence levels and a Statistics Package should be used. The Anderson-Darling and Cramer-von Mises tests are closely related to the Kolmogorov-Smirnov test; however they are based on a weighted average of the squared difference between all values, not just the maximum difference.
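
A hedged worksheet sketch, assuming the sorted sample sits in column A starting at A2, with its empirical proportions in column B and the hypothetical (e.g. Pareto) CDF values in column C, for an illustrative n = 100 observations:

=ABS(B2-C2)                 ' Absolute difference at each sorted value (fill down column D)
=SQRT(100)*MAX(D2:D101)     ' D is the largest difference; scaling by sqrt(n) gives the K-S statistic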

76 Kolmogorov-Smirnov Test The Kolmogorov-Smirnov test is calculated by taking the maximum difference between the Empirical and Hypothetical distribution and comparing this to critical values taken from the Kolmogorov Distribution

77 Appendix: Normal Approximation Formula A compound Poisson Random Variable (S) is the sum of a random number (N) of random variables (X):

S = X₁ + X₂ + … + X_N

The Law of Total Expectation tells us that for this random variable:

E(S) = Σᵢ E(S | i)·pᵢ

Where E(S | i) is the average value of the sum given that there is a frequency of i and pᵢ is the probability that the frequency is i. For example, E(S | 2) would be the average value of S if the frequency happens to be 2, and p₂ the probability the frequency is 2.

78 It is very simple to see that:

E(S | i) = i·μ

Where μ is the average claim size; for example, if the frequency is known to be 2 then E(S | 2) = 2·μ. Substituting this in gives:

E(S) = Σᵢ i·μ·pᵢ = μ·Σᵢ i·pᵢ = λ·μ

Where λ is the average frequency. For the variance we can derive (via the Law of Total Variance):

Var(S) = E(N)·Var(X) + E(X)²·Var(N) = λ·σ² + μ²·Var(N)

79 We can simplify this: the last term contains the variance of the frequency, Var(N), which for a Poisson Distribution is in fact the same as its average frequency λ. Substituting back:

Var(S) = λ·σ² + μ²·λ = λ·(σ² + μ²)

80 Appendix: Pareto Variance The Variance of the Pareto Distribution is defined by the following integral (where M is the minimum, α the shape and μ the mean):

σ² = ∫ from M to ∞ of (x − μ)² · (α·M^α / x^(α+1)) dx

81 When α is greater than 2 this integral converges to:

σ² = α·M² / ((α − 1)²·(α − 2))

otherwise the variance is infinite.

