Introduction to modelling extremes Marian Scott (with thanks to Clive Anderson, Trevor Hoey) NERC August 2009.



Introduction Examples of extremes in environmental contexts Some statistical models for extremes –Block maxima, Peak over threshold –Including return levels –Return period Statistical models for extremes are concerned with the tails of the distributions

Problems Normal distribution inappropriate Bulk of data not informing us about extremes Extremes are rare, so not much data But there are some special statistical models for extremes –Block maxima, Peak over threshold Require parameter estimation which may prove difficult

Introduction Modelling extremes, because we need to know about maxima and minima in many environmental systems to ensure that we know –How strong to make buildings –How high to make sea walls –How to plan for floods –etc

Stream flow

Background Assume typically that we have a time series of observations (e.g. maximum daily temperature for the last 20 years). Assume that the data are independent and identically distributed (e.g. might a Normal or Exponential be sensible, or do we need other types of distributions?). Interest is in predicting unusually high (or low) temperatures. Our statistical model needs to be good for the tails of the distribution. Meet the distribution and cumulative distribution function

Background The usual notation is to assume we have a series of random variables $X_1, X_2, \dots$ each with cumulative distribution function F. Then $F(x)$ is the probability $\Pr(X \le x)$. Values $x_p$ with a specified probability p of values lying above them in a distribution are known as quantiles –$x_p$ is the (1-p) quantile. The inverse cumulative distribution function gives $x_p = F^{-1}(1-p)$, i.e. $x_p$ is the value of X such that $\Pr(X \le x_p) = 1-p$

How to communicate risk Return level $x_p$ is the value associated with the return period $1/p$. That is, $x_p$ is the level expected to be exceeded on average once every $1/p$ years: $x_p = F^{-1}(1-p)$. $p = 0.01$ corresponds to the 100 year return period. The return level and return period are some of the most important quantities to derive from the fitted model (and, being estimates, are subject to uncertainty). A plot of $x_p$ vs $-\log(-\log(1-p))$ is called a return level plot
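
The relation $x_p = F^{-1}(1-p)$ can be sketched numerically. The example below assumes a Gumbel distribution purely for illustration; the parameter values `mu` and `sigma` are hypothetical and do not come from the slides.

```python
import math

def gumbel_cdf(x, mu, sigma):
    """F(x) = exp(-exp(-(x - mu) / sigma)) for a Gumbel distribution."""
    return math.exp(-math.exp(-(x - mu) / sigma))

def gumbel_return_level(p, mu, sigma):
    """x_p = F^{-1}(1 - p): the level exceeded with probability p in any one year."""
    return mu - sigma * math.log(-math.log(1.0 - p))

# Hypothetical parameter values, chosen only for illustration.
mu, sigma = 30.0, 2.5
for p in (0.1, 0.02, 0.01):
    x_p = gumbel_return_level(p, mu, sigma)
    reduced = -math.log(-math.log(1.0 - p))  # x-axis of a return level plot
    print(f"return period {1 / p:5.0f} years, reduced variate {reduced:.2f}, level {x_p:.2f}")
```

For a Gumbel distribution the return level is linear in $-\log(-\log(1-p))$, which is why the return level plot uses that scale.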

Background There exists a class of statistical models developed specifically for dealing with this situation. The Generalised Extreme Value (GEV) distribution has three parameters and, depending on their values, simplifies to the Gumbel, Fréchet or Weibull distribution for the maximum over particular blocks of time. Assumptions relating to the original time series: should be stationary (i.e. no trend)

Some simulations From the extremes script –A) simulate 1000 values from different distributions and draw histograms –Expect to see very different shapes –B) use block maxima to look at the distributional shapes for the maximum
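
The extremes script itself is not reproduced in this transcript. As a rough stand-in for steps A and B, the sketch below simulates 1000 values from two distributions and compares text histograms of the raw values with those of the block maxima. The choice of distributions, the block size of 20 and the helper names are assumptions made for illustration.

```python
import random
import statistics

random.seed(2009)  # fixed seed so the sketch is repeatable

def block_maxima(values, block_size):
    """Split values into consecutive blocks and return the maximum of each full block."""
    n_blocks = len(values) // block_size
    return [max(values[i * block_size:(i + 1) * block_size]) for i in range(n_blocks)]

def text_histogram(values, n_bins=8, width=40):
    """Print a crude histogram: one row per bin, labelled by the bin's left edge."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[k] += 1
    peak = max(counts)
    for k, c in enumerate(counts):
        left = lo + k * (hi - lo) / n_bins
        print(f"{left:8.2f} | {'#' * round(width * c / peak)}")

# A) 1000 simulated values from two different distributions
simulated = {
    "Normal(0, 1)": [random.gauss(0.0, 1.0) for _ in range(1000)],
    "Exponential(1)": [random.expovariate(1.0) for _ in range(1000)],
}
for name, values in simulated.items():
    print(name, "raw values, mean", round(statistics.fmean(values), 2))
    text_histogram(values)
    # B) distribution of the maximum within blocks of 20 values
    maxima = block_maxima(values, 20)
    print(name, "block maxima, mean", round(statistics.fmean(maxima), 2))
    text_histogram(maxima)
```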

GEV distribution The Generalised Extreme Value (GEV) distribution has three parameters: location, scale and shape (usually written as $\mu$, $\sigma$ ($>0$) and $\xi$), and $G(z) = \exp\{-[1 + \xi (z-\mu)/\sigma]^{-1/\xi}\}$. The Gumbel, Fréchet and Weibull are all special cases depending on the value of $\xi$
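
A minimal sketch of the GEV distribution function may help show how the shape parameter selects the special cases. It uses the standard convention that $\xi \to 0$ gives the Gumbel, $\xi > 0$ the Fréchet type and $\xi < 0$ the Weibull type; the function name `gev_cdf` and the tolerance used to detect $\xi = 0$ are choices made here, not part of the slides.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """G(z) = exp{-[1 + xi (z - mu) / sigma]^(-1/xi)}, with the Gumbel limit at xi = 0."""
    if sigma <= 0:
        raise ValueError("scale parameter sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case: the limit of the general formula as xi -> 0
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # z lies outside the support: below the lower end point when xi > 0
        # (Frechet type), above the upper end point when xi < 0 (Weibull type)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

for xi, label in ((0.0, "Gumbel"), (0.3, "Frechet type"), (-0.3, "Weibull type")):
    print(label, [round(gev_cdf(z, 0.0, 1.0, xi), 4) for z in (-1.0, 0.0, 2.0, 5.0)])
```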

Block maxima We can also break our time series $X_1, X_2, \dots$ into blocks of size n and only deal with the maximum or minimum in the block. E.g. if we have a daily series for 50 years, we could calculate the annual maximum and fit one of the statistical models mentioned earlier to the 50 realisations of the maxima. GEV can then be applied to the block maxima etc. Quite wasteful of data (throws lots away)

Fitting and model diagnostics for GEV Fitting by maximum likelihood (may need to be done numerically, so convergence can be an issue). Diagnostics: probability plot, quantile plot, return level plot, density plot. Probability and quantile plots should be close to straight lines if the model fits. All possible in the ismev library
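
To illustrate what numerical maximum likelihood involves, the sketch below fits the Gumbel special case ($\xi = 0$) rather than the full three-parameter GEV, and it does not use or imitate the ismev library. The likelihood equation for the scale is solved by bisection; the data are synthetic, and the function name and tolerances are assumptions.

```python
import math
import random

def fit_gumbel_ml(data, tol=1e-10, max_iter=200):
    """Maximum likelihood estimates (mu, sigma) for a Gumbel distribution."""
    n = len(data)
    x_min, x_max = min(data), max(data)
    x_bar = sum(data) / n
    if x_max == x_min:
        raise ValueError("data must not all be equal")

    def score(sigma):
        # Likelihood equation for sigma:
        #   sigma = x_bar - sum(x * w) / sum(w),  w = exp(-x / sigma).
        # Shifting by x_min keeps the exponentials in [0, 1].
        w = [math.exp(-(x - x_min) / sigma) for x in data]
        weighted_mean = sum(x * wi for x, wi in zip(data, w)) / sum(w)
        return sigma - x_bar + weighted_mean

    # score is increasing in sigma, negative near 0 and positive at the data range
    lo, hi = (x_max - x_min) * 1e-6, x_max - x_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    sigma = 0.5 * (lo + hi)
    mean_w = sum(math.exp(-(x - x_min) / sigma) for x in data) / n
    mu = x_min - sigma * math.log(mean_w)
    return mu, sigma

# Synthetic block maxima from a Gumbel(mu=10, sigma=2), for illustration only
random.seed(42)
maxima = [10.0 - 2.0 * math.log(random.expovariate(1.0)) for _ in range(5000)]
mu_hat, sigma_hat = fit_gumbel_ml(maxima)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```

With the full GEV there is no such one-dimensional reduction, which is one reason convergence needs checking in practice.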

POT modelling There exists another type of statistical model developed specifically for dealing with this situation, known as Peak over threshold (POT) modelling. Again we assume that we have a time series of observations, and define (somehow) a threshold u. Typical distributions used here are Pareto, Beta and Exponential, derived from the Generalised Pareto distribution (GPD) for the exceedances. How to define the threshold u is a practical issue.

GPD model Asymptotic (so as $u \to \infty$): the distribution of the excess $y = X - u$ (given $X > u$) is $H(y) = 1 - (1 + \xi y/\sigma)^{-1/\xi}$, where $\xi$ and $\sigma$ are shape and scale parameters. $\xi = 0$ gives the exponential distribution with mean $\sigma$. How to define the threshold u is the big practical question
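
The GPD distribution function and the extraction of threshold excesses can be sketched as follows. The fitting step covers only the exponential special case ($\xi = 0$), where the maximum likelihood estimate of $\sigma$ is the mean excess; the data, the threshold `u = 3.0` and the function names are illustrative assumptions.

```python
import math
import random

def gpd_cdf(y, sigma, xi):
    """H(y) = 1 - (1 + xi * y / sigma)^(-1/xi) for an excess y >= 0."""
    if y <= 0:
        return 0.0
    if abs(xi) < 1e-12:
        return 1.0 - math.exp(-y / sigma)  # exponential case, mean sigma
    t = 1.0 + xi * y / sigma
    if t <= 0.0:
        return 1.0  # only reachable when xi < 0: y is beyond the upper end point
    return 1.0 - t ** (-1.0 / xi)

def excesses_over(data, u):
    """Return the excesses x - u for the observations that exceed the threshold u."""
    return [x - u for x in data if x > u]

# Synthetic exponential data with mean 2, so excesses over any u also have mean 2
random.seed(11)
data = [random.expovariate(0.5) for _ in range(5000)]
u = 3.0
excesses = excesses_over(data, u)
zeta_u = len(excesses) / len(data)         # empirical Pr(X > u)
sigma_hat = sum(excesses) / len(excesses)  # ML estimate of sigma when xi = 0
print(f"{len(excesses)} excesses, Pr(X > u) = {zeta_u:.3f}, sigma_hat = {sigma_hat:.3f}")
```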

Definition of return levels for POT The level $x_m$ that is exceeded on average once every m observations is the solution x of $\zeta_u [1 + \xi (x-u)/\sigma]^{-1/\xi} = 1/m$, where $\zeta_u = \Pr(X > u)$. Choose u such that GPD is a good fit
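
Solving the return level equation for x gives $x_m = u + (\sigma/\xi)[(m \zeta_u)^{\xi} - 1]$, with limit $x_m = u + \sigma \log(m \zeta_u)$ at $\xi = 0$. A small sketch, with hypothetical parameter values and $\zeta_u$ written as `zeta_u` for $\Pr(X > u)$:

```python
import math

def pot_return_level(m, u, sigma, xi, zeta_u):
    """Level exceeded on average once every m observations under a GPD tail model."""
    if m * zeta_u < 1.0:
        raise ValueError("m * zeta_u < 1: the level would lie below the threshold")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1.0)

# Hypothetical values, for illustration only
u, sigma, xi, zeta_u = 3.0, 2.0, 0.1, 0.2
for m in (100, 1000, 10000):
    print(m, round(pot_return_level(m, u, sigma, xi, zeta_u), 3))
```

For daily data, the N-year return level corresponds to m = 365 N observations (ignoring leap years).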

Issues Non-stationarity: e.g. in climate change there are trends in frequency and intensity of extreme weather events. There are also cycles (annual, diurnal etc.); these are rather common. What should be done? If there is a trend or cyclical component, then we need to de-trend/deseasonalise. Perhaps introduce covariates that can explain the non-stationarity
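
One very simple way to de-trend and deseasonalise is sketched below: subtract a least-squares straight line, then subtract the mean of each position in the cycle. The slides do not prescribe a method, so both the approach and the function names are assumptions; covariate modelling within the extreme value model is an alternative.

```python
def detrend_linear(values):
    """Remove a least-squares straight-line trend; return (residuals, slope)."""
    n = len(values)
    t_bar = (n - 1) / 2.0
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(values))
    slope = sxy / sxx
    residuals = [y - (y_bar + slope * (t - t_bar)) for t, y in enumerate(values)]
    return residuals, slope

def deseasonalise(values, period):
    """Subtract the mean of each position in the cycle (e.g. period=12 for monthly data)."""
    cycle_means = []
    for k in range(period):
        same_phase = values[k::period]
        cycle_means.append(sum(same_phase) / len(same_phase))
    return [y - cycle_means[t % period] for t, y in enumerate(values)]

# Illustrative series: linear trend plus a repeating 4-step cycle
series = [0.5 * t + [1.0, -1.0, 2.0, -2.0][t % 4] for t in range(40)]
residuals, slope = detrend_linear(series)
cleaned = deseasonalise(residuals, 4)
print(f"estimated slope {slope:.3f}, largest remaining value {max(abs(c) for c in cleaned):.3f}")
```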

Issues specifically for POT modelling Often threshold exceedances are not independent. Various ways to deal with this –Model the dependence –declustering. Another approach (depending on the application) might be to model the frequency and intensity of threshold excesses. Mean number of events in an interval [0, T] is $\lambda T$, where $\lambda$ is the frequency of occurrence of an event (so a rate)

Example: Flood Estimation AIM: to estimate the probability of an extreme event occurring in a given time period In hydrology, there is a long history of methods designed to deal with extremes

Annual Floods $p_q$ = the probability that discharge equals or exceeds q at least once in any given year; $p_q$ = annual exceedance probability. $(1 - p_q)$ = probability that this flood does NOT occur in a given year. Assume: stationarity; no long-memory

Recurrence Interval Often refer to recurrence interval of floods (e.g. 1 in 200 year flood). Recurrence interval: the average time between floods equaling or exceeding q. Recurrence interval ($RI_q$) is the inverse of the exceedance probability ($1/p_q$)
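
A short worked example connects the annual exceedance probability to the stated aim of estimating the probability of an extreme event in a given time period. It assumes years are independent and stationary, so the probability of at least one exceedance in n years is $1 - (1 - p_q)^n$; the function names are illustrative.

```python
def recurrence_interval(p_q):
    """RI_q = 1 / p_q, the average time in years between floods equaling or exceeding q."""
    return 1.0 / p_q

def prob_at_least_one(p_q, n_years):
    """Probability that q is equaled or exceeded at least once in n_years independent years."""
    return 1.0 - (1.0 - p_q) ** n_years

p_q = 1.0 / 200.0  # a "1 in 200 year" flood
print("recurrence interval:", recurrence_interval(p_q), "years")
for n_years in (1, 30, 200):
    print(f"P(at least one such flood in {n_years} years) = {prob_at_least_one(p_q, n_years):.3f}")
```

Note that the chance of seeing the 200-year flood at least once in 200 years is well below 1, which is a common source of confusion when communicating recurrence intervals.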

Flow frequency distributions River Dove

Flow frequency distributions

Estimating $RI_q$ One approach to estimate the q-year flood from N years of data: rank the data from highest ($q_1$) to lowest ($q_N$). The exceedance probability and recurrence interval can be estimated from the rank order. With N = 50, what is the rarest flood that can be estimated?

Estimating Extremes: Graphical Method Rank the data from highest (rank=1) to lowest (rank=N). Estimate plotting positions from the ranks. Compute recurrence intervals. Plot $q_{(m)}$ vs $RI_{q(m)}$. Fit a line to the data. Extrapolate the best-fit line to the required RI
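
The graphical steps can be sketched in code. The slides do not say which plotting-position formula is used; the sketch assumes the Weibull formula $p_m = m/(N+1)$, so $RI = (N+1)/m$, and fits a least-squares line of discharge against log(RI). The annual maxima are synthetic.

```python
import math
import random

def plotting_positions(flows):
    """Rank flows from highest (m = 1) to lowest (m = N); return (flow, RI) pairs.

    Assumes the Weibull plotting position p_m = m / (N + 1), so RI = (N + 1) / m.
    """
    ranked = sorted(flows, reverse=True)
    n = len(ranked)
    return [(q, (n + 1) / m) for m, q in enumerate(ranked, start=1)]

def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic annual maximum discharges for N = 50 years (illustration only)
random.seed(7)
annual_max = [100.0 - 30.0 * math.log(random.expovariate(1.0)) for _ in range(50)]

pairs = plotting_positions(annual_max)
print("largest observed flood has estimated RI of", pairs[0][1], "years")

intercept, slope = fit_line([math.log(ri) for _, ri in pairs], [q for q, _ in pairs])
q_100 = intercept + slope * math.log(100.0)  # extrapolation beyond the data
print(f"extrapolated 100-year flood: {q_100:.1f}")
```

Under this plotting position, N = 50 years of data gives a largest estimable recurrence interval of 51 years; anything rarer relies on extrapolating the fitted line.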

Example: annual maximum data, Skykomish R, Gold Bar

Analytical Techniques Fit an appropriate cumulative distribution function (CDF) to the data. Fitting requires use of estimation procedures (distribution shapes are not known in advance). Use the CDF to estimate the discharge for a particular RI
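
As one hedged illustration of the analytical route, the sketch below fits a Gumbel distribution by the method of moments and inverts its CDF to get the discharge for a chosen RI. The slides do not specify the distribution or the estimation procedure, so both are assumptions here, and the data are synthetic.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_moments(data):
    """Method-of-moments Gumbel fit: sigma = s * sqrt(6) / pi, mu = mean - gamma * sigma."""
    sigma = statistics.stdev(data) * math.sqrt(6.0) / math.pi
    mu = statistics.fmean(data) - EULER_GAMMA * sigma
    return mu, sigma

def discharge_for_ri(ri, mu, sigma):
    """Invert the fitted CDF at non-exceedance probability 1 - 1/RI."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / ri))

# Synthetic annual maxima from a Gumbel(mu=100, sigma=30), for illustration only
random.seed(3)
annual_max = [100.0 - 30.0 * math.log(random.expovariate(1.0)) for _ in range(2000)]
mu_hat, sigma_hat = fit_gumbel_moments(annual_max)
for ri in (10, 50, 200):
    print(f"RI = {ri:3d} years: estimated discharge {discharge_for_ri(ri, mu_hat, sigma_hat):.1f}")
```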

Example: Gulungul Ck

Use the extremes.r script to try out some of the simpler of these analyses

Summary Estimating extremes is inherently unreliable, even with large data sets. Many environmental data sets are short. Various distributions may be used for estimation; which ones fit best in a particular situation is difficult to assess, but diagnostic tools exist. Data are assumed to be stationary; changing driving conditions, and long memory processes, may violate this assumption for many environmental data

Software and references Some R packages available –ismev, evd. Good book –Coles S, An Introduction to Statistical Modeling of Extreme Values (Springer, 2001). Lots of very recent work looking at statistical models for extremes over space and time