1 Abdel H. El-Shaarawi National Water Research Institute and Department of Mathematics and Statistics, McMaster University Data-driven.


1 1 Abdel H. El-Shaarawi National Water Research Institute and Department of Mathematics and Statistics, McMaster University Data-driven and Physically-based Models for Characterization of Processes in Hydrology, Hydraulics, Oceanography and Climate Change January 6-28, 2008 IMS, Singapore Modeling Extreme Events Data

2 2 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

3 3 References Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004), Statistics of Extremes: Theory and Applications, New York: John Wiley & Sons. Castillo, E. and Hadi, A. S. (1994), Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution, Environmetrics, 5, 417–432. Castillo, E. and Hadi, A. S. (1995), A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables, Computational Statistics and Data Analysis, 20, 421–439.

4 4 References Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. (2006), Extreme Value and Related Models in Engineering and Science Applications, New York: John Wiley & Sons. Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, London: Springer-Verlag. El-Shaarawi, A. H. and Hadi, A. S., Modified Likelihood Function for Parameter and Quantile Estimation, work in progress. Nadarajah, S. and El-Shaarawi, A. H. (2006), On the Ratios for Extreme Value Distributions with Applications to Rainfall Modeling, Environmetrics. Kotz, S. and Nadarajah, S. (2000), Extreme Value Distributions: Theory and Applications, London: Imperial College Press.

5 5 Software: S-plus & R Stuart Coles S-plus package available at URL:http://www.math.lancs.ac.uk./~coless extRemes R package available at

6 6 Examples of Extreme Events Data In many statistical applications, the interest is centered on estimating some population characteristics based on random samples taken from a population under study. For example, we wish to estimate: the average rainfall, the average temperature, the median income, … etc.

7 7 Examples of Extreme Events Data In other areas of application, we are interested in estimating not the average but the maximum or the minimum. Some Examples: 1. Ocean Engineering: In the design of offshore platforms, breakwaters, dikes and other harbor works, engineers rely upon knowledge of the probability distribution of the maximum wave height, not the average.

8 8 Examples of Extreme Events Data 2. Structural Engineering: Modern building codes and standards require: Estimation of extreme wind speeds and their recurrence intervals during the lifetime of the building. Knowledge of the largest loads acting on the structure during its lifetime. Seismic incidence: the maximum earthquake intensity during the lifetime of the building.

9 9 Examples of Extreme Events Data 3. Designing Dams: Engineers are interested not in the probability distribution of the average flood but in that of the maximum flood. 4. Agriculture: Farmers are interested in both the minimum and the maximum rainfall (drought versus flooding). 5. Insurance: Insurance companies are interested in the maximum insurance claims.

10 10 Examples of Extreme Events Data 6. Pollution Control: The pollution of air and water has become a common problem in many countries due to large concentrations of people, traffic, and industries (producing smoke and human, chemical, and nuclear wastes, etc.). Government regulations require pollution indices to remain below a given critical level. Thus, the regulations are satisfied if, and only if, the largest pollution concentration during the period of interest is less than the critical level.

11 11 Nile meter

12 12 U.S. Bureau of the census, Watson and Pauly (2002) Living resources: food security

13 13 Niagara River Fraser River

14 14 Upstream-Downstream Water Quality Monitoring Human and Ecosystem Health: Regulations and Control

15 15 Time Plots: Fraser Hope

16 16 Evolution of the Flow along the Fraser River Hansard/Red Pass

17 17 Max of log (Flow) at Hope

18 18 Some Results for Max (Hope)

19 19 Two More Examples: Wave Height & Temperature (Basel) Yearly maximum significant wave-height data

20 20 Two Stations: Ratio of GEV Distributions W=X/(X+Y)

21 21

22 22

23 23 Seoul Rainfall Data

24 24 Microbiological Regulations (Human health)

25 25 Approximate expression for probability of compliance with the regulations

26 26 Sample size n=5 and 10 # of simulations =10000

27 27 Ratio of single sample rejection probability to that of the mean rule (n = 5,10 and 20)

28 28 The Temperature Data: Change-Point

29 29 Relative Likelihood Function for the Change Point

30 30 Relative Likelihood function for the Change Point (Temp. Data)

31 31 Q-Q plots for the two segments

32 32 Return Levels

33 33 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

34 34 Types of Extreme Events Data The choice of model and estimation methods depends on the type of available data. Data, $x_1, x_2, \ldots, x_n$, drawn from a possibly unknown population, are available. We wish to: 1. Find an appropriate parametric model, $F(x; \theta)$, that fits the data reasonably well 2. Estimate the parameters, $\theta$, and quantiles, $X(p)$, of such a model

35 35 Types of Extreme Events Data Examples: 1.Complete Data: All n observations are available. Daily/Monthly energy consumption Daily/Monthly rain fall, stream discharge or flood flow

36 36 Types of Extreme Events Data Examples: 2.Maxima/Minima: Only maxima or minima are available. Maximum/minimum daily/monthly temperatures Maximum daily/monthly wave heights Maximum daily/monthly wind speeds, pollution concentrations, etc.

37 37 Types of Extreme Events Data 3. Exceedances over/under a threshold: When only yearly maxima (minima) are used, an important part of the information contained in the large (small) values other than the extreme of each year is lost. The alternative is to use the exceedances over (under) a given threshold.

38 38 Exceedances Over/Under a Threshold We are interested in events that cause failure such as exceedances of a random variable over a threshold value. For example, waves can destroy a breakwater when their heights exceed a given value, say 9 meters. Then it does not matter whether the height of a wave is 9.5, 10 or 12 meters because the consequences of these events are similar.

39 39 Exceedances Over/Under a Threshold So, only the failure-causing observations, those exceeding a given threshold, are available. Definition: Let $X$ be a random variable and $u$ be a given threshold value. The event $\{X = x\}$ is said to be an exceedance at the level $u$ if $x > u$.

40 40 Summary: Types of Data Extreme events data come in one of three types: 1. Complete observations, 2. Maxima/Minima, or 3. Exceedances over/under a threshold value
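
The three data types can be illustrated by deriving the second and third from a complete series. The sketch below is only an illustration: the simulated exponential "daily" series, the block length of 365 and the threshold of 3.0 are hypothetical choices, not values from the talk.

```python
import random

def block_maxima(values, block_size):
    """Maximum of each consecutive, complete block of `block_size` observations."""
    return [max(values[i:i + block_size])
            for i in range(0, len(values) - block_size + 1, block_size)]

def exceedances(values, threshold):
    """Pairs (x, x - u) for every observation x strictly above the threshold u."""
    return [(x, x - threshold) for x in values if x > threshold]

random.seed(0)
# Type 1: complete data (hypothetical daily series covering three "years").
complete = [random.expovariate(1.0) for _ in range(3 * 365)]
# Type 2: maxima only (one value per block of 365 observations).
yearly_max = block_maxima(complete, 365)
# Type 3: exceedances over a threshold u = 3.0, with their excesses x - u.
over_u = exceedances(complete, 3.0)
```

Block maxima keep one value per year, while the threshold approach keeps every large observation, which is the information loss the earlier slide refers to.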

41 41 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

42 42 Commonly Used Models for Extremes The choice of model depends on the type of available data: Distributions of Order Statistics (DOS): Used when we have complete data Generalized Extreme Value (GEV) Distribution (AKA: Von Mises Family): Used for maxima/minima type of data Generalized Pareto Distribution (GPD): Used for exceedances over/under threshold type of data

43 43 Distributions of Order Statistics Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ from a possibly unknown cdf $F(x; \theta)$, depending on an unknown vector-valued parameter $\theta$. Let $X_{1:n} < X_{2:n} < \ldots < X_{n:n}$ be the corresponding order statistics. $X_{i:n}$ is called the $i$th order statistic. Of particular interest are the minimum, $X_{1:n}$, and the maximum, $X_{n:n}$, order statistics.

44 44 Distributions of Order Statistics The distributions of the order statistics are well known. For example: The cdf of the maximum order statistic is: $P(X_{n:n} \le x) = [F(x; \theta)]^n$. The cdf of the minimum order statistic is: $P(X_{1:n} \le x) = 1 - [1 - F(x; \theta)]^n$.
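
For an independent sample of size n from a known parent cdf F, the cdfs of the two extreme order statistics are F(x)^n for the maximum and 1 - [1 - F(x)]^n for the minimum. A minimal sketch follows; the standard exponential parent and n = 10 are hypothetical choices made only for the illustration.

```python
import math

def cdf_max(F, n):
    """cdf of the maximum X_{n:n} of n iid draws with parent cdf F."""
    return lambda x: F(x) ** n

def cdf_min(F, n):
    """cdf of the minimum X_{1:n} of n iid draws with parent cdf F."""
    return lambda x: 1.0 - (1.0 - F(x)) ** n

def exp_cdf(x):
    """Standard exponential cdf, used here as a known parent distribution."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

F_max = cdf_max(exp_cdf, 10)
F_min = cdf_min(exp_cdf, 10)
```

With this parent, the minimum of 10 draws is again exponential with rate 10, so `F_min(0.1)` equals `1 - exp(-1)`.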

45 45 Problems with Distributions of OS The distributions of the order statistics have the following practical problems: 1. The cdf of the parent population, $F(x; \theta)$, is usually unknown 2. When the data consist only of maxima or minima, the sample sizes are usually unknown

46 46 Non-Degenerate Limiting Distributions The answer to the above problem is: Theorem: 1. The only non-degenerate cdf family satisfying (1) is the Maximal Generalized Extreme Value Distribution ($\mathrm{GEV}_M$). 2. The only non-degenerate cdf family satisfying (2) is the Minimal Generalized Extreme Value Distribution ($\mathrm{GEV}_m$).
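
Conditions (1) and (2) are presumably the usual limit conditions for normalized maxima and minima. Under that assumption, with normalizing constants $a_n$, $b_n > 0$, $c_n$, $d_n > 0$ (symbols chosen here for illustration), they would read:

```latex
\lim_{n \to \infty} \left[ F(a_n + b_n x) \right]^{n} = H(x) \quad \text{for all } x, \tag{1}

\lim_{n \to \infty} \Bigl( 1 - \left[ 1 - F(c_n + d_n x) \right]^{n} \Bigr) = L(x) \quad \text{for all } x. \tag{2}
```

Here $H$ and $L$ denote the non-degenerate limits for maxima and minima, respectively.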

47 47 Generalized Extreme Value Distributions Thus, there are two GEV distributions, one maximal, $\mathrm{GEV}_M$, and one minimal, $\mathrm{GEV}_m$. The GEV (AKA, Von Mises) distributions were introduced by Jenkinson (1955). They are used when we have a large sample or the observations themselves are either minima or maxima. Their cdfs are given later.

48 48 Generalized Extreme Value Distributions The GEV distributions are now widely used to model extremes of natural and environmental data. Examples are found in: Flood Studies Report of the UK's Natural Environment Research Council (1975) Several articles in Tiago de Oliveira (1984) Hosking, Wallis, and Wood (1985) Castillo et al. (2006)

49 49 Maximal Generalized Extreme Value The cumulative distribution function (cdf) of the maximal $\mathrm{GEV}_M$ distribution is:

50 50 Minimal Generalized Extreme Value The cumulative distribution function (cdf) of the minimal $\mathrm{GEV}_m$ distribution is:
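
A parametrization consistent with the rest of the talk (Weibull for positive shape, Frechet for negative shape, and the relation between $X$ and $-X$) is the one used by Castillo and Hadi. Assuming that convention, and writing $\lambda$ for location, $\delta > 0$ for scale and $\kappa$ for shape (symbol names assumed here), the two cdfs would be:

```latex
H(x; \lambda, \delta, \kappa) =
\begin{cases}
\exp\left\{ -\left[ 1 - \kappa \dfrac{x - \lambda}{\delta} \right]^{1/\kappa} \right\}, & 1 - \kappa \dfrac{x - \lambda}{\delta} \ge 0, \quad \kappa \ne 0, \\[2ex]
\exp\left\{ -\exp\left[ -\dfrac{x - \lambda}{\delta} \right] \right\}, & -\infty < x < \infty, \quad \kappa = 0,
\end{cases}

L(x; \lambda, \delta, \kappa) =
\begin{cases}
1 - \exp\left\{ -\left[ 1 + \kappa \dfrac{x - \lambda}{\delta} \right]^{1/\kappa} \right\}, & 1 + \kappa \dfrac{x - \lambda}{\delta} \ge 0, \quad \kappa \ne 0, \\[2ex]
1 - \exp\left\{ -\exp\left[ \dfrac{x - \lambda}{\delta} \right] \right\}, & -\infty < x < \infty, \quad \kappa = 0.
\end{cases}
```

$H$ is the maximal and $L$ the minimal form; the $\kappa = 0$ rows are the limits of the $\kappa \ne 0$ rows.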

51 51 Relationship Between $\mathrm{GEV}_M$ and $\mathrm{GEV}_m$ Theorem: If the cdf of $X$ is $L(\lambda, \delta, \kappa)$, then the cdf of $Y = -X$ is $H(-\lambda, \delta, \kappa)$. Implication: One form of the cdf can be obtained from the other.
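
The relationship can be checked numerically. The sketch below assumes the Castillo-Hadi form of the two cdfs (location, scale and shape written as `lam`, `delta`, `kappa`); the function names and the parameter values are hypothetical. It compares P(-X <= y) = 1 - L(-y) with H(y) evaluated at location -lam.

```python
import math

def gev_max_cdf(x, lam, delta, kappa):
    """Maximal GEV cdf H(x; lam, delta, kappa), assumed parametrization."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        # Outside the support: above the upper endpoint when kappa > 0,
        # below the lower endpoint when kappa < 0.
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-t ** (1.0 / kappa))

def gev_min_cdf(x, lam, delta, kappa):
    """Minimal GEV cdf L(x; lam, delta, kappa), assumed parametrization."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return 1.0 - math.exp(-math.exp(z))
    t = 1.0 + kappa * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint when kappa > 0,
        # above the upper endpoint when kappa < 0.
        return 0.0 if kappa > 0 else 1.0
    return 1.0 - math.exp(-t ** (1.0 / kappa))

lam, delta, kappa = 2.0, 1.5, 0.3          # hypothetical parameter values
# For X with cdf L(lam, delta, kappa): P(-X <= y) = 1 - L(-y).
pairs = [(1.0 - gev_min_cdf(-y, lam, delta, kappa),
          gev_max_cdf(y, -lam, delta, kappa))
         for y in (-4.0, -2.0, 0.0, 1.0)]
```

Each pair in `pairs` holds the two sides of the identity and should agree up to rounding.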

52 52 Maximal Generalized Extreme Value The $\mathrm{GEV}_M$ family has three parameters: $\lambda$ is a location parameter, $\delta$ is a scale parameter ($\delta > 0$), and $\kappa$ is a shape parameter. The parameter $\kappa$ is the most important of the three. The $p$th quantile is ($0 < p < 1$):
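
Inverting the maximal cdf in the assumed Castillo-Hadi parametrization gives the quantile $x_p = \lambda + \delta[1 - (-\log p)^{\kappa}]/\kappa$ for $\kappa \ne 0$ and $x_p = \lambda - \delta \log(-\log p)$ for $\kappa = 0$. The sketch below implements this under that assumption; the parameter values are hypothetical, and treating the $T$-period return level as the quantile at $p = 1 - 1/T$ is the usual convention for block maxima rather than something stated on the slide.

```python
import math

def gev_max_quantile(p, lam, delta, kappa):
    """p-th quantile of the maximal GEV, assumed parametrization, 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    c = -math.log(p)
    if kappa == 0.0:
        return lam - delta * math.log(c)
    return lam + delta * (1.0 - c ** kappa) / kappa

def gev_max_cdf(x, lam, delta, kappa):
    """Maximal GEV cdf in the same parametrization, used for a round-trip check."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-t ** (1.0 / kappa))

lam, delta, kappa = 10.0, 2.0, 0.2          # hypothetical parameter values
# 100-period return level: the level exceeded with probability 1/100 per period.
x_100 = gev_max_quantile(1.0 - 1.0 / 100.0, lam, delta, kappa)
```

Because the shape is positive here, `x_100` stays below the upper endpoint `lam + delta / kappa`.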

53 53 Special Cases of the Maximal GEV The family of $\mathrm{GEV}_M$ has three special cases: 1. The Maximal Weibull distribution is obtained when $\kappa > 0$. Its cdf is:

54 54 Special Cases of the Maximal GEV 2. The Maximal Gumbel distribution is obtained when $\kappa = 0$. Its cdf is:

55 55 Special Cases of the Maximal GEV 3. The Maximal Frechet distribution is obtained when $\kappa < 0$. Its cdf is:

56 56 Weibull, Gumbel, and Frechet Weibull and Frechet converge to Gumbel
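
The convergence to the Gumbel case can be seen numerically by letting the shape parameter shrink toward zero from either side. This is a sketch under the assumed Castillo-Hadi parametrization with location 0 and scale 1; the grid and the shape values are arbitrary choices.

```python
import math

def gev_max_cdf(z, kappa):
    """Standardized maximal GEV cdf (location 0, scale 1), assumed parametrization."""
    if kappa == 0.0:
        return math.exp(-math.exp(-z))          # Gumbel case
    t = 1.0 - kappa * z
    if t <= 0.0:
        return 1.0 if kappa > 0 else 0.0        # outside the support
    return math.exp(-t ** (1.0 / kappa))

def max_gap_to_gumbel(kappa, grid):
    """Largest absolute difference between the GEV and Gumbel cdfs on the grid."""
    return max(abs(gev_max_cdf(z, kappa) - gev_max_cdf(z, 0.0)) for z in grid)

grid = [-2.0 + 0.1 * i for i in range(71)]      # z from -2 to 5
gaps_weibull = [max_gap_to_gumbel(k, grid) for k in (0.1, 0.01, 0.001)]
gaps_frechet = [max_gap_to_gumbel(-k, grid) for k in (0.1, 0.01, 0.001)]
```

Both lists shrink as the shape approaches zero, from the Weibull side and from the Frechet side.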

57 57 Summary The GEV family can be used when: 1. The cdf of the parent population, $F(x; \theta)$, is unknown 2. The sample size is very large (no degeneracy problems) 3. The data consist only of maxima or minima (we do not need to know the sample sizes)

58 58 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

59 59 Types of Extreme Events Data Recall the three types of extreme events data: 1. Complete Data: All n observations are available. Use distributions of order statistics if we know F(x) and n is not too large; else, use GEV. 2. Maxima/Minima: Only maxima or minima are available. Use GEV. 3. Exceedances over/under a threshold: Only observations exceeding a given threshold are available. Use GPD.

60 60 Exceedances Over/Under a Threshold As mentioned earlier, we are interested in events that cause failure such as exceedances of a random variable over a threshold value. The differences between the actual values and the threshold value are called exceedances over/under the threshold.

61 61 Generalized Maximal Pareto Distributions Pickands (1975) demonstrates that when the threshold tends to the upper end of the random variable, the exceedances follow a generalized Pareto distribution, $\mathrm{GPD}_M(\delta, \kappa)$, with cdf

62 62 Generalized Maximal Pareto Distribution The $\mathrm{GPD}_M$ family has two parameters: $\delta$ is a scale parameter ($\delta > 0$) and $\kappa$ is a shape parameter. The $p$th quantile is ($0 < p < 1$): Note that when

63 63 Special Cases of the Maximal GPD The $\mathrm{GPD}_M$ has three special cases: 1. When $\kappa = 0$, the $\mathrm{GPD}_M$ reduces to the Exponential distribution with mean $\delta$. 2. When $\kappa = 1$, the $\mathrm{GPD}_M$ reduces to the Uniform $U(0, \delta)$. 3. When $\kappa < 0$, the $\mathrm{GPD}_M$ becomes the Pareto distribution.
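
The special cases can be checked against a direct implementation. The sketch below assumes the form $F(x) = 1 - (1 - \kappa x/\delta)^{1/\kappa}$ for $\kappa \ne 0$ and $F(x) = 1 - e^{-x/\delta}$ for $\kappa = 0$, which is consistent with the three cases listed; the symbol used for the scale, the function names and the numerical values are assumptions made for illustration.

```python
import math

def gpd_cdf(x, delta, kappa):
    """Maximal GPD cdf for excesses x > 0, assumed parametrization."""
    if x <= 0.0:
        return 0.0
    if kappa == 0.0:
        return 1.0 - math.exp(-x / delta)       # exponential with mean delta
    t = 1.0 - kappa * x / delta
    if t <= 0.0:
        return 1.0                              # kappa > 0 and x beyond delta / kappa
    return 1.0 - t ** (1.0 / kappa)

def gpd_quantile(p, delta, kappa):
    """p-th quantile of the maximal GPD, 0 < p < 1, same parametrization."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if kappa == 0.0:
        return -delta * math.log(1.0 - p)
    return delta * (1.0 - (1.0 - p) ** kappa) / kappa

delta = 4.0                                     # hypothetical scale
uniform_value = gpd_cdf(1.0, delta, 1.0)        # kappa = 1: U(0, delta), so x / delta
exponential_value = gpd_cdf(1.0, delta, 0.0)    # kappa = 0: 1 - exp(-x / delta)
pareto_value = gpd_cdf(1.0, delta, -0.5)        # kappa < 0: 1 - (1 + x / (2 delta))**(-2)
```

For positive shape the support is bounded above by `delta / kappa`, which is why the cdf is set to 1 beyond that point.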

64 64 Generalized Minimal Pareto Distribution A similar family exists for the case of exceedances under a threshold. These are called the Generalized Minimal Pareto distributions or the Reversed Generalized Pareto distributions.

65 65 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

66 66 Parameter and Quantile Estimation Available estimation methods include: 1. The maximum likelihood (MLE): Jenkinson (1969), Prescott and Walden (1980, 1983), Smith (1984, 1985) 2. The method of moments (MOM)

67 67 Parameter and Quantile Estimation 3. The probability weighted moments (PWM): Greenwood et al. (1979), Hosking et al. (1985) 4. The Elemental Percentile method (EPM): Castillo and Hadi (1995) 5. Order Statistics (Least Squares): El-Shaarawi 6. Modified Likelihood Function (MLF): El-Shaarawi and Hadi (work in progress).

68 68 Problems With Traditional Estimators Traditional methods of estimation (MLE and the moments-based methods) have problems because: The range of the distribution depends on the parameters: $x \le \lambda + \delta/\kappa$ for $\kappa > 0$, and $x \ge \lambda + \delta/\kappa$ for $\kappa < 0$. So, the MLE does not have the usual asymptotic properties.

69 69 Problems With Traditional Estimators The MLE requires numerical solutions. For some samples, the likelihood may not have a local maximum. For $\kappa > 1$, the MLE does not exist (the likelihood can be made infinite).

70 70 Problems With Traditional Estimators When $\kappa < -1$, the mean and higher moments do not exist. So, the MOM and PWM estimators do not exist when $\kappa < -1$. The PWM estimators are good for cases where $-0.5 < \kappa < 0.5$. Outside this range of $\kappa$, the PWM estimates may not exist, and if they do exist their performance worsens as $\kappa$ increases.

71 71 Recently Proposed Estimation Methods This leaves us with two recently proposed methods for estimating the parameters and quantiles of the extreme models: the Elemental Percentile method (EPM), Castillo and Hadi (1995), and the Modified Likelihood Function (MLF), El-Shaarawi and Hadi (work in progress).

72 72 Elemental Percentile method (EPM) 1. Initial estimates are obtained by equating three distinct order statistics to their corresponding percentiles:

73 73 Elemental Percentile method (EPM) 2. Substituting the cdf of the $\mathrm{GEV}_M$, we obtain: These are three equations in three unknowns: $\lambda$, $\delta$, and $\kappa$.

74 74 Elemental Percentile method (EPM) To solve these equations, we eliminate $\lambda$ and $\delta$, and obtain: where Solving this equation for $\kappa$ by the bisection method, we obtain an initial estimate

75 75 Elemental Percentile method (EPM) Substituting this estimate in two of the above equations, we solve for $\lambda$ and $\delta$:

76 76 Elemental Percentile method (EPM) Theorem: The initial estimates are asymptotically normal and consistent. Final estimates of $\lambda$, $\delta$, and $\kappa$ are obtained by computing the initial estimates for all possible triplets and combining them with a suitable function, such as the trimmed mean, to obtain efficient estimates.
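
The EPM steps can be sketched in code. This is an illustrative reading of the method, not the authors' implementation, and it rests on several assumptions: the Castillo-Hadi parametrization of the maximal GEV, plotting positions (i - 0.35)/n, a bisection bracket of [-5, 5] for the shape, and, to keep the example short, only the triplets made of the smallest, one intermediate and the largest order statistic instead of all possible triplets. All function names are hypothetical.

```python
import math

def gev_max_quantile(p, lam, delta, kappa):
    """Maximal GEV quantile in the assumed parametrization."""
    c = -math.log(p)
    if abs(kappa) < 1e-12:
        return lam - delta * math.log(c)
    return lam + delta * (1.0 - c ** kappa) / kappa

def plotting_positions(n):
    """Assumed plotting positions p_{i:n} = (i - 0.35) / n, i = 1..n."""
    return [(i - 0.35) / n for i in range(1, n + 1)]

def epm_triplet(xs, ps, i, j, k, bound=5.0, iters=80):
    """Initial (lam, delta, kappa) from order statistics i < j < k (0-based)."""
    xi, xj, xk = xs[i], xs[j], xs[k]
    if not xi < xj < xk:
        return None
    ci, cj, ck = (-math.log(ps[m]) for m in (i, j, k))
    a, b = math.log(cj / ck), math.log(ci / ck)      # 0 < a < b
    d = (xj - xk) / (xi - xk)                        # observed ratio, in (0, 1)

    def g(kappa):
        # Model ratio minus observed ratio after eliminating lam and delta.
        if abs(kappa) < 1e-12:
            return a / b - d
        return math.expm1(a * kappa) / math.expm1(b * kappa) - d

    lo, hi = -bound, bound
    if g(lo) <= 0.0 or g(hi) >= 0.0:
        return None                                  # root not bracketed
    for _ in range(iters):                           # bisection; g is decreasing
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    kappa = 0.5 * (lo + hi)
    if abs(kappa) < 1e-8:                            # Gumbel limit
        delta = (xi - xk) / (math.log(ck) - math.log(ci))
        lam = xi + delta * math.log(ci)
    else:
        delta = kappa * (xi - xk) / (ck ** kappa - ci ** kappa)
        lam = xi - delta * (1.0 - ci ** kappa) / kappa
    return lam, delta, kappa

def trimmed_mean(values, frac=0.1):
    """Mean after dropping a fraction `frac` of the values from each tail."""
    v = sorted(values)
    cut = int(frac * len(v))
    core = v[cut:len(v) - cut]
    return sum(core) / len(core)

def epm_fit(sample):
    """Combine triplet estimates (smallest, j-th, largest) by a trimmed mean."""
    xs = sorted(sample)
    n = len(xs)
    ps = plotting_positions(n)
    fits = [epm_triplet(xs, ps, 0, j, n - 1) for j in range(1, n - 1)]
    fits = [f for f in fits if f is not None]
    if not fits:
        raise ValueError("no triplet produced an estimate")
    return tuple(trimmed_mean([f[m] for f in fits]) for m in range(3))

true_params = (10.0, 2.0, 0.2)                       # hypothetical lam, delta, kappa
sample = [gev_max_quantile(p, *true_params) for p in plotting_positions(30)]
lam_hat, delta_hat, kappa_hat = epm_fit(sample)
```

The artificial sample is placed exactly on the model quantiles, so every triplet recovers the generating parameters; with real data the triplet estimates scatter, which is where the robust combining function matters.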

77 77 The Modified Likelihood Function (MLF) The MLF method can be thought of as a marriage between the maximum likelihood method and the method of moments. The ideas behind the method are: 1. The log likelihood function is:

78 78 The Modified Likelihood Function (MLF) 2. The modified likelihood: A Taylor series expansion of around gives

79 79 The Modified Likelihood Function (MLF) 3. Let where are plotting positions. 4. Substitute these in the modified likelihood and solve for .

80 80 The Modified Likelihood Function (MLF) We think this will be a happy marriage, but to be sure we are: Investigating (analytically and using simulation) the properties of the proposed estimators and their dependence on the choice of the plotting positions $p_{i:n}$. This is still work in progress.

81 81 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

82 82 Summary The choice of models for extremes depends on the type of data available: 1. Complete Data: All n observations are available. Use distributions of order statistics if we know F(x) and n is not too large; else, use GEV. 2. Maxima/Minima: Only maxima or minima are available. Use GEV. 3. Exceedances over/under a threshold: Only observations exceeding a given threshold are available. Use GPD.

