Stochastic Hydrology Fundamentals of Hydrological Frequency Analysis


Stochastic Hydrology Fundamentals of Hydrological Frequency Analysis Professor Ke-sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University 8/3/2019 Dept. of Bioenvironmental Systems Engineering, National Taiwan University

General concept of hydrological frequency analysis
Hydrological frequency analysis is the work of determining the magnitude of a hydrological variable that corresponds to a given exceedance probability. Frequency analysis can be conducted for many hydrological variables, including floods, rainfall, and droughts. The work is easier to understand if the variable of interest is treated as a random variable.

Let X represent the hydrological (random) variable under investigation. A value $x_c$ is chosen such that an event is said to occur if X assumes a value exceeding $x_c$. Each time a random experiment (or trial) is conducted, the event may or may not occur, so each trial is a Bernoulli trial. We are interested in the number of Bernoulli trials needed for the first success to occur. This number can be described by the geometric distribution.

Geometric distribution
The geometric distribution gives the probability that the first success occurs on the x-th trial in a sequence of independent and identical Bernoulli trials.

Recurrence interval vs return period
Average number of trials to achieve the first success.
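The slide equations for the geometric distribution and the return period were not preserved in this transcript. As a hedged sketch, the Python snippet below uses the standard geometric probability mass function P(X = x) = (1 - p)^(x - 1) p, where p = P(X > x_c) is the exceedance probability in a single trial, and checks numerically that its mean equals 1/p, the quantity usually called the return period T. The function names are illustrative and do not come from the slides.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(first exceedance occurs on trial x), per-trial exceedance probability p."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    return (1.0 - p) ** (x - 1) * p


def return_period(p: float) -> float:
    """Mean number of trials until the first exceedance, T = 1 / p."""
    return 1.0 / p


def mean_trials_numeric(p: float, n_terms: int = 10_000) -> float:
    """Approximate E[X] by summing x * P(X = x) over the first n_terms trials."""
    return sum(x * geometric_pmf(x, p) for x in range(1, n_terms + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical exceedance probability per year
    print(return_period(p), mean_trials_numeric(p))
```

With annual trials, p = 0.01 corresponds to what is usually called the 100-year event.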

The general equation of frequency analysis

Collecting the required data.
Estimating the mean, standard deviation, and coefficient of skewness.
Determining the appropriate distribution.
Calculating $x_T$ using the general equation.
It is apparent that calculation of $x_T$ involves determining the type of distribution for X and estimating its mean and standard deviation. The former can be done by goodness-of-fit (GOF) tests and the latter is accomplished by parametric point estimation.
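The general equation itself was not preserved in the transcript. It is commonly written in frequency-factor form as x_T = mean + K_T * standard deviation. The sketch below assumes that form and, purely for illustration, a normal distribution, for which K_T is the standard normal quantile at non-exceedance probability 1 - 1/T. For skewed distributions K_T also depends on the coefficient of skewness, which this sketch does not cover. Function names and the sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def frequency_factor_normal(T: float) -> float:
    """K_T for a normal distribution: standard normal quantile at 1 - 1/T."""
    if T <= 1:
        raise ValueError("return period T must be greater than 1")
    return NormalDist().inv_cdf(1.0 - 1.0 / T)


def design_value(sample, T: float) -> float:
    """x_T = sample mean + K_T * sample standard deviation (normal assumed)."""
    return mean(sample) + frequency_factor_normal(T) * stdev(sample)


if __name__ == "__main__":
    annual_max = [112.0, 95.5, 140.2, 88.1, 123.7, 101.3]  # made-up data
    print(design_value(annual_max, T=100))
```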

Data series for frequency analysis
Complete duration series: a complete duration series consists of all the observed data.
Partial duration series: a partial duration series is a series of data selected so that their magnitude is greater than a predefined base value. If the base value is selected so that the number of values in the series equals the number of years of the record, the series is called an "annual exceedance series".

Extreme value series: an extreme value series is a data series that includes the largest or smallest values occurring in each of the equally long time intervals of the record. If the time interval is taken as one year and the largest values are used, we have an "annual maximum series".
Annual exceedance series and annual maximum series are different.
Peak-over-threshold series
Data independence: why is it important?
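The difference between the series types defined above can be seen in a small sketch. The record layout, a list of (year, value) pairs, and the function names are assumptions made for illustration. The code does not check independence, so several values taken from the same storm could all enter the partial duration or annual exceedance series.

```python
def annual_maximum_series(records):
    """Largest value in each year; records is a list of (year, value) pairs."""
    best = {}
    for year, value in records:
        if year not in best or value > best[year]:
            best[year] = value
    return [best[y] for y in sorted(best)]


def partial_duration_series(records, base):
    """All values strictly greater than the predefined base value."""
    return [value for _, value in records if value > base]


def annual_exceedance_series(records):
    """The n largest values of the whole record, n = number of years."""
    n_years = len({year for year, _ in records})
    values = sorted((value for _, value in records), reverse=True)
    return values[:n_years]


if __name__ == "__main__":
    records = [(2001, 50), (2001, 80), (2001, 75),
               (2002, 30), (2002, 40), (2003, 90)]  # made-up data
    print(annual_maximum_series(records))     # one value per year
    print(annual_exceedance_series(records))  # may take several from one year
```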

Parameter estimation
Method of moments
Maximum likelihood method
Method of L-moments (gaining more attention in recent years)
Depending on the distribution type, parameter estimation may involve estimating the mean, standard deviation, and/or coefficient of skewness.
Parameter estimation is exemplified here by the gamma distribution.

Gamma distribution parameter estimation
The gamma distribution is a special case of the Pearson type III distribution (with zero location parameter).
Gamma density: the density is written in terms of a scale parameter and a shape parameter, which are related to the mean $\mu$, standard deviation $\sigma$, and coefficient of skewness $\gamma$ of X (or Y).

MOM estimators
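The estimator formulas on this slide were not preserved. A minimal sketch of method-of-moments estimation for a two-parameter gamma distribution follows, assuming the standard relations mean = shape * scale and variance = shape * scale^2 (so that the skewness is 2 / sqrt(shape)). Using the divisor-n variance is one possible choice, and the function name is hypothetical.

```python
from statistics import fmean, pvariance


def gamma_mom(sample):
    """Method-of-moments estimates (scale, shape) for a two-parameter gamma.

    Solves mean = shape * scale and variance = shape * scale**2.
    """
    m = fmean(sample)
    v = pvariance(sample, mu=m)  # divisor-n sample variance
    if m <= 0 or v <= 0:
        raise ValueError("sample must have positive mean and variance")
    scale = v / m
    shape = m * m / v
    return scale, shape


if __name__ == "__main__":
    scale, shape = gamma_mom([2.0, 4.0, 6.0])  # made-up data
    print(scale, shape, 2.0 / shape ** 0.5)    # last value: implied skewness
```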

Maximum likelihood estimator

Evaluating bias of different estimators of coefficient of skewness

Evaluating mean square error of different estimators of coefficient of skewness
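The estimators compared on these slides were not preserved. One way such an evaluation can be carried out is by Monte Carlo simulation. The sketch below compares two common skewness estimators, the moment estimator g1 and the adjusted estimator g1 * sqrt(n(n - 1)) / (n - 2). The choice of estimators, the exponential parent distribution (true skewness 2), and all names are assumptions for illustration.

```python
import math
import random


def skew_g1(x):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def skew_adjusted(x):
    """Adjusted sample skewness: g1 * sqrt(n (n - 1)) / (n - 2)."""
    n = len(x)
    return skew_g1(x) * math.sqrt(n * (n - 1)) / (n - 2)


def bias_and_mse(estimator, true_value, n, reps, seed=0):
    """Monte Carlo bias and mean square error for exponential samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([rng.expovariate(1.0) for _ in range(n)])
                 for _ in range(reps)]
    bias = sum(estimates) / reps - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / reps
    return bias, mse


if __name__ == "__main__":
    for est in (skew_g1, skew_adjusted):
        print(est.__name__, bias_and_mse(est, 2.0, n=20, reps=2000))
```

Both estimators tend to underestimate the skewness of small samples from strongly skewed distributions, which is why bias and mean square error are examined together.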

Techniques for goodness-of-fit test
A good reference for a detailed discussion of GOF tests is: Goodness-of-Fit Techniques, edited by R.B. D'Agostino and M.A. Stephens, 1986.
Probability plotting
Chi-square test
Kolmogorov-Smirnov test
Moment-ratio diagram method
L-moments-based GOF tests

Probability plotting
Fundamental concept
Probability papers
Empirical CDF vs theoretical CDF
Misuse of probability plotting

Suppose the true underlying distribution depends on a location parameter $\mu$ and a scale parameter $\sigma$ (they need not be the mean and standard deviation, respectively). The CDF of such a distribution can be written as
$F(x) = G\left(\frac{x-\mu}{\sigma}\right) = G(z)$,
where $Z = (X-\mu)/\sigma$ is referred to as the standardized variable and $G(z)$ is the CDF of Z. If the random sample is truly from a distribution with CDF $F(x)$, then $Z = G^{-1}(F(X))$ and X are linearly related. In practice, Z can be found by using $Z = G^{-1}(F_n(X))$.

Let $F_n(x)$ represent the empirical cumulative distribution function (ECDF) of X based on a random sample of size n. A probability plot is a plot of $z = G^{-1}(F_n(x))$ on x, where x represents the observed values of the random variable X.
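The construction of a probability plot can be sketched as follows, assuming a standard normal G and estimating the ECDF value of the i-th smallest observation by i / (n + 1) so that the inverse CDF is never evaluated at 1. Both choices, and the function names, are assumptions for illustration. A correlation coefficient close to 1 between x and z indicates the near-linear relation described above.

```python
from statistics import NormalDist


def probability_plot_points(sample, g_inv=NormalDist().inv_cdf):
    """Return (x, z) pairs with z = G^{-1}(F_n(x)) and F_n(x_(i)) = i / (n + 1)."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, g_inv(i / (n + 1))) for i, x in enumerate(xs, start=1)]


def pearson_r(pairs):
    """Linear correlation coefficient between the two coordinates."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    mz = sum(z for _, z in pairs) / n
    sxz = sum((x - mx) * (z - mz) for x, z in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    szz = sum((z - mz) ** 2 for _, z in pairs)
    return sxz / (sxx * szz) ** 0.5


if __name__ == "__main__":
    data = [88.1, 95.5, 101.3, 112.0, 123.7, 140.2]  # made-up data
    print(pearson_r(probability_plot_points(data)))
```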

Most of the plotting position methods are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, the exceedance probability of the mth largest value, $x_m$, is, for large n, $P(X \ge x_m) = m/n$. Plotting position formulas are shown in the following table.
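The table referred to above was not preserved in the transcript. As a hedged illustration, many widely cited plotting position formulas share the form P = (m - a) / (n + 1 - 2a), with a = 0 for Weibull, 0.375 for Blom, 0.4 for Cunnane, 0.44 for Gringorten, and 0.5 for Hazen. Whether the slide's table listed exactly these formulas is not known. The sketch below assigns exceedance probabilities to a sample ranked in descending order.

```python
# Parameter a in P = (m - a) / (n + 1 - 2a) for some common formulas.
PLOTTING_POSITION_A = {
    "weibull": 0.0,
    "blom": 0.375,
    "cunnane": 0.4,
    "gringorten": 0.44,
    "hazen": 0.5,
}


def exceedance_probabilities(sample, method="weibull"):
    """Return (value, exceedance probability) pairs, largest value first."""
    a = PLOTTING_POSITION_A[method]
    n = len(sample)
    ordered = sorted(sample, reverse=True)
    return [(x, (m - a) / (n + 1 - 2 * a))
            for m, x in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for name in PLOTTING_POSITION_A:
        print(name, exceedance_probabilities([3, 1, 2], name))
```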

Misuse of probability plotting
Log Pearson Type III?

Misuse of probability plotting
48-hr rainfall depth. Log Pearson Type III?

Fitting a probability distribution to annual maximum series (non-parametric GOF tests)
How do we fit a probability distribution to a random sample?
What type of distribution should be adopted?
What are the parameter values for the distribution?
How good is our fit?

Chi-square GOF test
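The slides detailing the chi-square test were not preserved. A minimal sketch of the usual statistic, the sum over classes of (observed - expected)^2 / expected, is given below. It assumes k equiprobable classes whose boundaries are quantiles of the hypothesized distribution, and it uses Python's NormalDist only as an example of a fully specified hypothesis. The statistic would then be compared with a chi-square quantile with k - 1 degrees of freedom, reduced by the number of parameters estimated from the data. That comparison is not computed here.

```python
from statistics import NormalDist


def chi_square_statistic(sample, dist, k):
    """Chi-square GOF statistic using k equiprobable classes under dist.

    dist must provide inv_cdf(p), as statistics.NormalDist does.
    """
    n = len(sample)
    expected = n / k  # same expected count in every class
    edges = [dist.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in sample:
        idx = sum(1 for e in edges if x > e)  # index of the class holding x
        observed[idx] += 1
    return sum((o - expected) ** 2 / expected for o in observed)


if __name__ == "__main__":
    data = [-1.2, -0.4, -0.1, 0.2, 0.3, 0.9, 1.1, 1.8]  # made-up data
    print(chi_square_statistic(data, NormalDist(0.0, 1.0), k=4))
```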

Chi-square Goodness-of-fit test in R

Kolmogorov-Smirnov GOF test
The chi-square test compares the empirical histogram against the theoretical histogram. By contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.

In order to measure the difference between $F_n(x)$ and $F(x)$, ECDF statistics based on the vertical distances between $F_n(x)$ and $F(x)$ have been proposed.
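The best-known statistic of this kind is the Kolmogorov-Smirnov statistic, the largest vertical distance between the ECDF and the hypothesized CDF. Because the ECDF is a step function, the supremum is attained at an observation, just before or just after a jump. The sketch below is a plain-Python illustration with a hypothetical function name. It is not the implementation used by R's ks.test.

```python
def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    # Distance just after each jump of the ECDF (ECDF above the CDF).
    d_plus = max(i / n - cdf(x) for i, x in enumerate(xs, start=1))
    # Distance just before each jump of the ECDF (CDF above the ECDF).
    d_minus = max(cdf(x) - (i - 1) / n for i, x in enumerate(xs, start=1))
    return max(d_plus, d_minus)


if __name__ == "__main__":
    # Hypothesized distribution: Uniform(0, 1), whose CDF is F(x) = x.
    print(ks_statistic([0.1, 0.4, 0.7], lambda x: x))
```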

Stochastic convergence
Almost-sure convergence, or convergence with probability 1

Hypothesis test using Dn

Critical values of $D_n$ for the Kolmogorov-Smirnov test

K-S Goodness-of-fit test in R (ks.test)

Interpretation of the probability distribution of the test statistic

IDF curve fitting using Horner's equation
The intensity-duration-frequency (IDF) relationship of the design storm depths

DDF curves

IDF curves

Alternative IDF fitting (Return-period specific)
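The fitting equations on these slides were not preserved. As a hedged sketch of a return-period-specific fit, the code below assumes the commonly used Horner form I = a / (t + b)^c for a single return period. For a fixed b the relation ln I = ln a - c ln(t + b) is linear, so the code runs a simple linear regression for each trial b on a user-supplied grid and keeps the best one. Fitting in log space, the grid search, and all names and numbers are assumptions, not the author's procedure.

```python
import math


def horner_intensity(t, a, b, c):
    """Rainfall intensity for duration t using I = a / (t + b)**c."""
    return a / (t + b) ** c


def fit_horner(durations, intensities, b_grid):
    """Fit (a, b, c): regress ln I on ln(t + b) for each trial b and keep
    the b giving the smallest sum of squared residuals in log space."""
    y = [math.log(i) for i in intensities]
    n = len(y)
    y_bar = sum(y) / n
    best = None
    for b in b_grid:
        x = [math.log(t + b) for t in durations]
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), b, -slope)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    t = [10, 30, 60, 120, 360, 720, 1440]  # durations in minutes (made up)
    i = [horner_intensity(v, 1000.0, 20.0, 0.7) for v in t]
    print(fit_horner(t, i, range(0, 61, 5)))
```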

Further discussions on frequency analysis
Extracting annual maximum series
Probabilistic interpretation of the design total depth
Joint distribution of duration and total depth
Selection of the best-fit distribution

Annual maximum series
Data in an annual maximum series are considered IID and therefore form a random sample. For a given design duration $t_r$, we continuously move a window of size $t_r$ along the time axis and select the maximum total value within the window in each year. Determination of the annual maximum rainfall is NOT based on the real storm duration; instead, an artificially chosen design duration is used for this purpose.
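The moving-window extraction described above can be sketched as follows. The input layout, a dictionary mapping each year to its list of hourly rainfall depths, is an assumption, as is the simplification that windows spanning two calendar years are ignored.

```python
def annual_max_depth(hourly_by_year, tr):
    """Annual maximum rainfall depth for a design duration of tr hours.

    hourly_by_year maps year -> list of hourly depths for that year.
    """
    series = {}
    for year, depths in hourly_by_year.items():
        if len(depths) < tr:
            raise ValueError(f"year {year} has fewer than {tr} hourly values")
        window = sum(depths[:tr])  # total depth in the first window
        best = window
        for k in range(tr, len(depths)):
            window += depths[k] - depths[k - tr]  # slide the window one hour
            best = max(best, window)
        series[year] = best
    return series


if __name__ == "__main__":
    data = {2001: [0, 5, 10, 0, 0, 3], 2002: [1, 1, 1, 8, 0, 0]}  # made up
    print(annual_max_depth(data, tr=2))
    print(annual_max_depth(data, tr=3))
```

Note that the window total is taken regardless of where actual storms begin or end, which is the point made in the slide about design duration versus real storm duration.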

Random sample for estimation of design storm depth
The design storm depth of a specified duration $t_r$ with return period T is the value of $D(t_r)$ whose probability of exceedance equals 1/T. Estimation of the design storm depth requires collecting a random sample of size n, i.e., {x1, x2, …, xn}. A random sample is a collection of independently observed and identically distributed (IID) data.

Probabilistic interpretation of the design storm depth

It should also be noted that, since the total depth in the depth-duration-frequency relationship only represents the total amount of rainfall of the design duration (not the real storm duration), the probability distributions in the preceding figure do not represent distributions of total depth of real storm events. More specifically, the preceding figure does not represent the bivariate distribution of duration and total depth of real storm events.

The use of annual maximum series for rainfall frequency analysis is more of an intelligent and convenient engineering practice; the annual maximum data do not provide much information about the characteristics of the duration and total depth of real storm events.

Joint distribution of the total depth and duration
The total rainfall depth of a storm event varies with its storm duration. [A bivariate distribution for (D, tr).]
For a given storm duration tr, the total depth D(tr) is considered a random variable, and its magnitudes corresponding to specific exceedance probabilities are estimated. [Conditional distribution]
In general,

Selection of the best-fit distribution
Methods of model selection based on loss of information.
Akaike information criterion (AIC)
Schwarz's Bayesian information criterion (BIC)
Hannan-Quinn information criterion (HQIC)
Anderson-Darling criterion (ADC)
Common practices of WRA-Taiwan:
SE and U
SSE and SE

Information-criteria-based model selection
$\mathrm{AIC} = -2\ln L(\hat{\theta}) + 2p$
$\mathrm{BIC} = -2\ln L(\hat{\theta}) + p\ln n$
$\mathrm{HQIC} = -2\ln L(\hat{\theta}) + 2p\ln(\ln n)$
where $\ln L(\theta)$ is the log-likelihood function for the parameter $\theta$ associated with the model, evaluated at its estimate $\hat{\theta}$, n is the sample size, and p is the dimension of the parametric space.

WRA practice
p: number of distribution parameters.
The Weibull plotting position formula is used for calculation of cumulative probability.

Model selection based on information criteria using R
The nsRFA package: MSClaio2008(x)

MSClaio2008

When the sample size, n, is small with respect to the number of estimated parameters, p, the AIC may perform inadequately. In those cases a second-order variant of AIC, called AICc, should be used:
$\mathrm{AICc} = -2\ln L(\hat{\theta}) + 2p\,\frac{n}{n-p-1}$,
with $\ln L(\hat{\theta})$ the maximized log-likelihood. Indicatively, AICc should be used when (n/p) < 40.
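As a hedged illustration, the criteria can be computed from a maximized log-likelihood using their standard textbook definitions. The function below is a plain-Python sketch with hypothetical names and made-up numbers. It is not the nsRFA implementation. The candidate distribution with the smallest value of a given criterion is preferred.

```python
import math


def information_criteria(log_likelihood, n, p):
    """AIC, AICc, BIC and HQIC from a maximized log-likelihood.

    n is the sample size and p the number of estimated parameters.
    """
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    aic = -2.0 * log_likelihood + 2.0 * p
    aicc = -2.0 * log_likelihood + 2.0 * p * n / (n - p - 1)
    bic = -2.0 * log_likelihood + p * math.log(n)
    hqic = -2.0 * log_likelihood + 2.0 * p * math.log(math.log(n))
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "HQIC": hqic}


if __name__ == "__main__":
    # Hypothetical fit: 30 annual maxima, 2 parameters, log-likelihood -100.
    print(information_criteria(-100.0, n=30, p=2))
```

Here n/p = 15 is below 40, so AICc rather than AIC would be the indicated choice.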

Rationale of the information criteria
The Akaike information criterion uses the Kullback-Leibler divergence as the discrepancy measure between the true model f(x) and the approximating model g(x).
Information and entropy

What is information?
Consider the following statements:
I will eat some food tomorrow.
A major earthquake will strike Taiwan tomorrow.
Which statement conveys more information?

Definition of entropy
侯如真, 2001. A theoretical study of applying information entropy to rain gauge network design. Master's thesis, Graduate Institute of Agricultural Engineering, National Taiwan University.
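The entropy definition on this slide was not preserved. The sketch below assumes Shannon's definition for a discrete distribution, H = -sum p_i log2 p_i, with the information content of a single event taken as -log2 p. The base-2 logarithm and the probabilities in the example are assumptions. Under this definition, a nearly certain statement carries little information and a rare one carries a lot.

```python
import math


def self_information(p):
    """Information content, in bits, of an event with probability p."""
    return -math.log2(p)


def entropy(probs):
    """Shannon entropy H = -sum p_i log2 p_i of a discrete distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(self_information(0.999))  # almost certain event (made-up probability)
    print(self_information(0.001))  # rare event (made-up probability)
    print(entropy([0.5, 0.5]))
```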

Kullback-Leibler Divergence
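The defining equation was not preserved in the transcript. For discrete distributions the Kullback-Leibler divergence is usually written D(p || q) = sum p_i ln(p_i / q_i), which is what the sketch below computes. The discrete setting and the function name are assumptions for illustration. Splitting the logarithm gives one term that depends only on the true distribution p and one cross term that depends on the candidate q, so only the cross term matters when candidates are ranked.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum p_i ln(p_i / q_i) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # q rules out an outcome that p allows
        total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    true_p = [0.5, 0.5]
    print(kl_divergence(true_p, [0.25, 0.75]))
    print(kl_divergence([0.25, 0.75], true_p))  # not symmetric
```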

If there are several candidate distributions, we only need to calculate $H(X|q_i(X))$, since $H(X|p(X))$ is a constant. In practical applications, the above term is estimated as (Akaike, 1973)
where $p_j$ is the number of parameters of the jth model.