A major Hungarian project for flood risk assessment. A. Zempléni (Eötvös Loránd University, Budapest, visiting the TU Munich as a DAAD grantee), Technical University of Munich.

The project
Title: “Establishing the engineering and scientific bases on flood risk assessment, development of new methods of flood frequency and risk estimation” (one of the approximately 100 national research and development programs accepted in 2001).
Participants: Water Resources Research Centre, three universities, regional water directorates.

Main reason for it being proposed: there were several major flood waves on the river Tisza in the last decade.

Hungary

Current situation
- The last major analysis of flood data was done 30 years ago.
- During the dry period, the rivers were narrowed at several places.
- There is a natural tendency for the riverbed to deepen.
- The current flood estimates are based on water level data rather than on the streamflow time series.

Water-level data example: Vásárosnamény
- At least two observations per day for each station (there are approx. 50 stations) for 100 years.
- Reduction: one observation per day.

Main questions
- Return level estimators (extreme value analysis).
- Time series methods for daily observations (in order to run simulations), to be presented in the next talk.

Analysis of extreme values
Probably the most important part.
- Classical methods: based on annual maxima.
- Peaks-over-threshold methods: utilize all floods higher than a given (high) threshold.
- Multivariate modelling
  – Bayesian approach (dependence among parameters)
  – Joint behaviour of extremes

Extreme-value distributions
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed random variables. If we can find norming constants $a_n$, $b_n$ such that $[\max(X_1, X_2, \dots, X_n) - a_n]/b_n$ has a nondegenerate limit, then this limit is necessarily a max-stable or so-called extreme value distribution.

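Characterisation of extreme-value distributions
Limit distributions of normalised maxima:
- Fréchet: $\Phi_\alpha(x) = \exp(-x^{-\alpha})$ $(x>0)$, where $\alpha$ is a positive parameter.
- Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$ $(x<0)$.
- Gumbel: $\Lambda(x) = \exp(-e^{-x})$.
(Location and scale parameters can be incorporated.)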

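Another parametrisation
The distribution function of the generalised extreme-value (GEV) distribution:
$F(x) = \exp\left\{-\left[1 + \xi\,\frac{x-\mu}{\sigma}\right]^{-1/\xi}\right\}$ if $1 + \xi\,\frac{x-\mu}{\sigma} > 0$.
$\mu$: location, $\sigma$: scale, $\xi$: shape parameters; $\xi>0$ corresponds to the Fréchet, $\xi=0$ to the Gumbel and $\xi<0$ to the Weibull distribution.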
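The GEV distribution function can be written down directly in code. The following is a minimal sketch, not the project's software: the function name `gev_cdf` and the argument names `mu`, `sigma`, `xi` (location, scale, shape) are assumptions chosen for illustration, and the Gumbel case is handled as the limit of the shape parameter tending to zero.

```python
import math

def gev_cdf(x, mu=0.0, sigma=1.0, xi=0.0):
    """GEV distribution function with location mu, scale sigma > 0, shape xi."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case, the limit xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

For a negative shape parameter the function returns 1 for every x above mu - sigma/xi, which is the finite right endpoint of the distribution.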

Example: annual maxima of water levels [figure, two panels: Vásárosnamény, Záhony]

The fitted models to water level data [figure, two panels]. Estimated right endpoints: Vásárosnamény 940 cm, Záhony 778 cm.

Check the conditions
Are the observations (annual maxima)
– independent? This can be accepted for most of the stations.
– identically distributed? Check by
   comparing different parts of the sample,
   fitting models where time is a covariate.
– GEV distributed?

A typical example (water level data, Vásárosnamény)

Tests for GEV distributions
Motivation: the limit distribution of the normalised maximum of iid random variables is GEV, but
– the conditions are not always fulfilled
– in our finite world the asymptotics are not always realistic
Usual goodness-of-fit tests:
– Kolmogorov-Smirnov
– $\chi^2$
These are not sensitive in the tails.

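Alternatives
Anderson-Darling test: $A^2 = n\int_{-\infty}^{\infty} \frac{(F_n(x)-F(x))^2}{F(x)(1-F(x))}\,dF(x)$.
Computation: $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log z_i + \log(1-z_{n+1-i})\right]$, where $z_i=F(X_i)$ for the ordered sample $X_1 \le \dots \le X_n$. Sensitive in both tails.
Modification: $B = n\int_{-\infty}^{\infty} \frac{(F_n(x)-F(x))^2}{1-F(x)}\,dF(x)$ (for maximum; upper tails).
Its computation: $B = \frac{n}{2} - 2\sum_{i=1}^{n} z_i - \sum_{i=1}^{n}\left(2-\frac{2i-1}{n}\right)\log(1-z_i)$.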
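The two statistics can be sketched in a few lines of code once the values z_i = F(X_i) are available. The sketch below assumes that the null distribution F is fully specified, and it assumes that the upper-tail modification uses the weight 1/(1-F) in place of 1/(F(1-F)); the function names `anderson_darling` and `upper_tail_ad` are hypothetical.

```python
import math

def anderson_darling(z):
    """A^2 computed from z_i = F(X_i); F is the fully specified null cdf."""
    z = sorted(z)
    n = len(z)
    total = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def upper_tail_ad(z):
    """Upper-tail variant, assuming the weight 1/(1-F) instead of 1/(F(1-F))."""
    z = sorted(z)
    n = len(z)
    return (n / 2.0 - 2.0 * sum(z)
            - sum((2.0 - (2 * i - 1) / n) * math.log(1.0 - z[i - 1])
                  for i in range(1, n + 1)))
```

Both functions sort their input, so the values may be passed in any order. They require every z_i to lie strictly between 0 and 1.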

Further alternatives
Another test can be based on the stability property of the GEV distributions: for any $m \in \mathbb{N}$ there exist $a_m$, $b_m$ such that $F(x) = F^m(a_m x + b_m)$ $(x \in \mathbb{R})$.
The test statistic $h(a,b)$ is built on this property.
Alternatives for estimation:
- Find a, b that minimize h(a,b) (a computer-intensive algorithm is needed).
- Estimate the GEV parameters by maximum likelihood and plug these into the stability property.
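As a worked instance of the stability property, consider the standard Gumbel distribution; the choice of this special case is ours, made only because the constants come out in closed form.

```latex
\[
\Lambda(x) = \exp\left(-e^{-x}\right), \qquad
\Lambda^{m}(x + \log m)
  = \exp\left(-m\,e^{-x-\log m}\right)
  = \exp\left(-e^{-x}\right)
  = \Lambda(x),
\]
so the property $F(x) = F^{m}(a_m x + b_m)$ holds with $a_m = 1$ and $b_m = \log m$.
```

For the Fréchet and Weibull types the same computation gives a scaling constant different from 1 and a zero shift.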

Limit distributions
Distribution-free for the case of known parameters. For example, for the Anderson-Darling statistic, $A^2 \to \int_0^1 \frac{B^2(t)}{t(1-t)}\,dt$ in distribution, where B denotes the Brownian bridge over [0,1].
As the limits are functionals of the normal distribution, the effect of parameter estimation by maximum likelihood can be taken into account by transforming the covariance structure.
In practice, simulated critical values can also be used (advantage: they remain valid for small samples).
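Simulated critical values are easy to obtain in the known-parameter case, because the statistic is then distribution-free and uniform samples suffice. The following sketch assumes exactly that case; the helper names and the defaults (`n_sim`, `seed`) are illustrative assumptions.

```python
import math
import random

def anderson_darling(z):
    """A^2 computed from z_i = F(X_i), with F fully specified."""
    z = sorted(z)
    n = len(z)
    total = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def simulated_critical_value(stat, n, level=0.95, n_sim=2000, seed=0):
    """Empirical `level` quantile of `stat` over uniform samples of size n."""
    rng = random.Random(seed)
    sims = sorted(stat([rng.random() for _ in range(n)]) for _ in range(n_sim))
    return sims[int(math.ceil(level * n_sim)) - 1]

# Critical value of A^2 for samples of size 50 at the 95% level.
cv = simulated_critical_value(anderson_darling, n=50)
```

When the parameters are estimated, the same idea would need the estimation step repeated inside every simulated sample, which this sketch does not do.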

Power studies
- For typical alternatives, there is no major difference between the tests A-D and B.
- The power of h depends very much on the shape of the underlying distribution.
The probability of a correct decision: [table by sample size n, test (B, A-D, h) and distribution (NB, exp, Normal)]

Applications
- For specific cases where the upper tails play the important role (e.g. modified maximal values of real flood data), B is the most sensitive.
- When applying the above tests to the flood data (annual maxima; windows of size 50), there were a couple of cases where the GEV hypothesis had to be rejected at the level of 95%.
- Possible reasons: changes in river bed properties (shape, vegetation etc.).

An example for rejection: Szolnok water level,

Further investigations
Confidence bounds should be calculated; possible methods:
– based on asymptotic properties of the maximum likelihood estimator
– profile likelihood
– resampling methods (bootstrap, jackknife)
– Bayesian approach
Estimates for return levels

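Confidence intervals
For maximum likelihood:
– By asymptotic normality of the estimator: $\hat\theta_i \pm z_{1-\alpha/2}\sqrt{v_{ii}}$, where $v_{ii}$ is the (i,i)th element of the inverse of the information matrix and $z_{1-\alpha/2}$ is the standard normal quantile
– By profile likelihood
For other (nonparametric) methods: by bootstrap.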
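The bootstrap option can be illustrated with a percentile interval. This is a minimal sketch under our own assumptions (iid data, the percentile method, hypothetical names and defaults); it is not necessarily the resampling scheme used in the project.

```python
import random

def bootstrap_percentile_ci(data, statistic, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for statistic(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    reps = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                  for _ in range(n_boot))
    lo = reps[int((1 - level) / 2 * n_boot)]
    hi = reps[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi

def mean(values):
    return sum(values) / len(values)

# Made-up data, used only to show the call.
sample = [float(v) for v in range(1, 101)]
ci_low, ci_high = bootstrap_percentile_ci(sample, mean)
```

The naive bootstrap is known to behave poorly for the sample maximum itself, so in an extreme-value setting it is usually applied to fitted parameters or quantile estimators.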

Profile likelihood
One coordinate of the parameter vector is fixed and the maximization is with respect to the other components: $l_p(\theta_i) = \max_{\theta_{-i}} l(\theta_i, \theta_{-i})$, where $l(\theta)$ is the log-likelihood function and $\theta = (\theta_i, \theta_{-i})$.
Let $X_1, \dots, X_n$ be iid observations. Under the regularity conditions for the maximum likelihood estimator, asymptotically $D_p(\theta_i) = 2\{l(\hat\theta) - l_p(\theta_i)\} \sim \chi^2_k$ (a chi-squared distribution with k degrees of freedom, if $\theta_i$ is a k-dimensional vector).

Use of the profile likelihood
Confidence interval construction for a parameter of interest: $\{\theta_i : 2[l(\hat\theta) - l_p(\theta_i)] \le c_\alpha\}$, where $l_p$ is the profile log-likelihood and $c_\alpha$ is the $1-\alpha$ quantile of the $\chi^2_1$ distribution.
Testing nested models: $M_1(\theta)$ vs. $M_0$ (the first k components of $\theta$ equal 0). $l_1(M_1)$, $l_0(M_0)$ are the maximized log-likelihood functions and $D := 2\{l_1(M_1) - l_0(M_0)\}$. $M_0$ is rejected in favor of $M_1$ if $D > c_\alpha$ ($c_\alpha$ is the $1-\alpha$ quantile of the $\chi^2_k$ distribution).
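The interval construction can be tried out on a model where the profile log-likelihood has a closed form. The sketch below deliberately swaps the GEV for an iid normal sample with the mean as parameter of interest and the variance profiled out; this substitution, the grid search and all names are assumptions made to keep the example short.

```python
import math
from statistics import NormalDist

def profile_deviance_mean(x, mu):
    """2*[l(theta_hat) - l_p(mu)] for an iid normal sample, sigma^2 profiled out."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / n   # ML estimate of sigma^2
    s2_mu = s2 + (xbar - mu) ** 2              # ML estimate of sigma^2 for fixed mu
    return n * math.log(s2_mu / s2)

def profile_ci_mean(x, alpha=0.05, grid=2001):
    """Grid approximation of {mu : profile deviance <= c_alpha}."""
    # The 1-alpha quantile of chi^2_1 is the squared normal 1-alpha/2 quantile.
    c = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    n = len(x)
    xbar = sum(x) / n
    sd = math.sqrt(sum((v - xbar) ** 2 for v in x) / n)
    half = 10 * sd / math.sqrt(n)              # search window around the MLE
    pts = [xbar - half + 2 * half * k / (grid - 1) for k in range(grid)]
    inside = [m for m in pts if profile_deviance_mean(x, m) <= c]
    return min(inside), max(inside)
```

For the GEV the inner maximization has no closed form, so each grid point would require a numerical optimization over the remaining parameters.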

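Return levels
$z_p$: return level associated with the return period 1/p (the expected time for a level higher than $z_p$ to appear is 1/p): $F(z_p) = 1-p$.
The quantiles of the GEV:
$z_p = \mu - \frac{\sigma}{\xi}\left[1 - y_p^{-\xi}\right]$ if $\xi \neq 0$,
$z_p = \mu - \sigma \log y_p$ if $\xi = 0$,
where $y_p = -\log(1-p)$.
Remark: the probability that it actually appears before time 1/p is more than 0.5 (approximately $1 - e^{-1} \approx 0.63$ if p is small).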
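The quantile formula and the remark about the return period can both be checked numerically. In this sketch the function names are hypothetical and the remark is evaluated under the assumption of independent years.

```python
import math

def gev_return_level(p, mu, sigma, xi):
    """z_p with F(z_p) = 1 - p for the GEV(mu, sigma, xi) distribution."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-12:
        return mu - sigma * math.log(y)        # Gumbel case
    return mu - sigma / xi * (1.0 - y ** (-xi))

def prob_exceed_within_period(p):
    """Probability of at least one exceedance of z_p in round(1/p) independent years."""
    return 1.0 - (1.0 - p) ** round(1.0 / p)
```

With p = 0.01, `prob_exceed_within_period` is close to 1 - 1/e, so a "100-year" level is more likely than not to be exceeded within 100 years.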

Backtests: return level estimators based on parts of the dataset

Investigation of the backtest
- Too many floods above the estimated level (mean=3.95 for the first 21 years); simulation studies confirm that this is a significant deviation from the iid case.
- Linear trend in the mean: m= (t-50); the other parameters are assumed to be constant.
- Estimated endpoint for 2000: 960 cm (actual observed value, not in the sample: 970 cm).

Peaks over threshold methods
If the conditions of the Fisher-Tippett theorem hold, the conditional distribution of X-u, under the condition that X>u, can be given as $P(X-u<y \mid X>u) \approx H(y) = 1 - \left(1 + \frac{\xi y}{\tilde\sigma}\right)^{-1/\xi}$ if $y>0$ and $1 + \frac{\xi y}{\tilde\sigma} > 0$, where H(y) is the so-called generalized Pareto distribution (GPD). $\xi$ is the same as the shape parameter of the corresponding GEV distribution.
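A short sketch of the GPD distribution function follows, together with the finite upper endpoint it implies for a negative shape parameter. The names `gpd_cdf` and `upper_endpoint` and the argument names (threshold `u`, scale `sigma`, shape `xi`) are assumptions for illustration.

```python
import math

def gpd_cdf(y, sigma, xi):
    """H(y) = P(X - u <= y | X > u) for the generalized Pareto distribution."""
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)      # exponential limit, xi -> 0
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        # Only possible for xi < 0: y lies beyond the upper endpoint.
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def upper_endpoint(u, sigma, xi):
    """Upper endpoint u - sigma/xi of the fitted tail; infinite unless xi < 0."""
    return u - sigma / xi if xi < 0 else math.inf
```

A negative fitted shape therefore translates directly into an estimated maximal possible water level above the threshold.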

Peaks over threshold methods
Advantages:
– More data can be used
– Estimators are not affected by the small “floods”
Disadvantages:
– Dependence on threshold choice
– Original daily observations are dependent; declustering is not always obvious (see Ferro and Segers, 2003, for a recent method).
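To show what declustering means in practice, here is a sketch of simple runs declustering. This is a swapped-in, simpler technique and not the Ferro and Segers method cited on the slide; the run length `r` and the function name are assumptions.

```python
def runs_decluster(series, u, r=3):
    """Cluster maxima above threshold u; a cluster ends after r consecutive values <= u."""
    peaks = []
    current_max = None   # maximum of the cluster currently open, if any
    below = 0            # consecutive values <= u seen inside the open cluster
    for value in series:
        if value > u:
            current_max = value if current_max is None else max(current_max, value)
            below = 0
        elif current_max is not None:
            below += 1
            if below >= r:
                peaks.append(current_max)
                current_max = None
                below = 0
    if current_max is not None:
        peaks.append(current_max)   # close a cluster still open at the end
    return peaks
```

The resulting cluster maxima, minus the threshold, are the values to which a GPD would be fitted. The result depends on both `u` and `r`, which is the threshold-choice sensitivity listed among the disadvantages.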

Inference
Similar to the annual maxima method:
– Maximum likelihood is to be preferred
– Confidence bounds can be based on profile likelihood
– Model fit can be analyzed by P-P plots and Q-Q plots or formal tests (similar to those presented earlier)
– Return levels/upper bounds can be estimated
Our results for the flood data: very much the same as for the GEV.

GPD fit: Vásárosnamény, water level
shape = -0.51, estimated upper endpoint = 940 cm; the upper endpoint of its 95% confidence interval: 1085 cm.

Return level estimators by parts of the dataset: Vásárosnamény

Some references
1. Ferro, C. A. T. – Segers, J. (2003): Inference for clusters of extreme values. Journal of the Royal Statistical Society, Ser. B 65.
2. Kotz, S. – Nadarajah, S. (2000): Extreme Value Distributions. Imperial College Press.
3. Zempléni, A. (1996): Inference for Generalized Extreme Value Distributions. Journal of Applied Statistical Science 4.
4. Zempléni, A.: Goodness-of-fit tests in extreme value theory. (In preparation.)