
Making rating curves - the Bayesian approach

Rating curves – what is wanted?
A best estimate of the relationship between stage and discharge at a given place in a river. The relationship should be of the form Q = C(h - h_0)^b, or a segmented version of that, where Q = discharge and h = stage. It should be possible to deal with the uncertainty in such estimates, and there should be other statistical measures of the quality of such a curve. These measures should be easy for non-statisticians to interpret.

Making rating curves the old-fashioned way
For a known zero-stage, the rating curve can be written as q = a + bx, where q = log(Q), x = log(h - h_0) and a = log(C). For a set of measurements, one can then do linear regression with q as response, x as covariate, and a and b as unknown linear parameters. The sum of squares (SS) is minimized analytically (standard linear regression), as in the sketch below.
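A minimal sketch of this step, assuming Python with NumPy; the function name and the use of np.polyfit are illustrative choices, not the original implementation:

```python
import numpy as np

def fit_loglinear(h, Q, h0):
    """Least-squares fit of q = a + b*x, with q = log(Q) and
    x = log(h - h0), for a known zero-stage h0."""
    x = np.log(h - h0)
    q = np.log(Q)
    b, a = np.polyfit(x, q, 1)   # polyfit returns (slope, intercept)
    return np.exp(a), b          # C = exp(a), exponent b
```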

The old approach – handling c = -h_0
The problem is that the effective bottom level, h_0 = -c, is not known. Solution: minimize the SS by stepping through all possible values of c, as sketched below. The advantage: this is the same as maximizing the likelihood for the regression problem q_i = a + b·log(h_i + c) + ε_i, or equivalently Q_i = C(h_i - h_0)^b · E_i, where ε_i ~ N(0, σ²) is iid noise and E_i = e^{ε_i}. This model makes hydraulic and statistical sense!
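A sketch of the stepping procedure, again assuming NumPy; the grid bounds and resolution are arbitrary illustrative choices:

```python
import numpy as np

def fit_rating_curve(h, Q, n_grid=1000):
    """Profile the residual SS over c = -h0 on a grid.  For each
    candidate c, the optimal (a, b) come from linear regression of
    q = log(Q) on x = log(h + c); under Gaussian noise, the c with
    the smallest SS is the maximum-likelihood estimate."""
    q = np.log(Q)
    # candidates must keep h + c > 0; the upper bound is arbitrary
    cs = np.linspace(-h.min() + 1e-6, -h.min() + 10.0, n_grid)
    best_ss, best_fit = np.inf, None
    for c in cs:
        x = np.log(h + c)
        b, a = np.polyfit(x, q, 1)
        ss = np.sum((q - a - b * x) ** 2)
        if ss < best_ss:
            best_ss, best_fit = ss, (np.exp(a), b, -c)  # (C, b, h0)
    return best_fit
```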

Problems with the old approach
We have prior information about curves that we would like to use in the estimation. Inference and statistical quality measures are difficult to interpret. It is difficult to get a grip on the uncertainty of the discharge estimates. And there is a chance of getting infinite parameter estimates with this method!

Bayesian statistics
Frequentist: treats the parameters as fixed and finds estimators that will catch their values approximately. Bayesian: treats the parameters as having a stochastic distribution, which is derived from the observations and from prior knowledge. Bayes' theorem: f(θ | D) = f(D | θ)f(θ)/f(D), where f stands for a distribution, D is the data set and θ is the parameter set.
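As a toy illustration of the theorem (not part of the rating-curve model), the posterior for a single parameter can be computed on a grid; the binomial data below are hypothetical:

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over a probability parameter
prior = np.ones_like(theta)              # flat prior f(theta)
lik = theta**7 * (1 - theta)**3          # f(D|theta): 7 successes in 10 trials
post = lik * prior
post /= post.sum()                       # normalizing plays the role of f(D)
```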

Prior knowledge
Prior info about a and b can be obtained from already generated rating curves (using the frequentist approach) or from hydraulic principles. Prior info about the noise can be obtained from knowledge about the measurements. Problem: it is difficult to set the prior for the location parameter h_0 = -c, but we know it will not be far below the stage measurements.

Prior knowledge of a and b from the database
[Figure: histogram of a estimates generated from the database; a normal approximation seems OK.]
[Figure: histogram of b estimates generated from the database; the normal approximation seems less fine, but is used for practical reasons.]

Bayesian regression
The data given the parameters are the same here: q_i = a + b·log(h_i + c) + ε_i, with D = {h_i, q_i}, i = 1,…,n. Problem: even though we have prior info, this does not give us the form of the prior f(θ), θ = (a, b, c, σ²). If the priors are of a certain form, one can do Bayesian linear regression analytically: q_i = a + b·x_i + ε_i with x_i = log(h_i + c) for a given c. The thought is the same as in the frequentist approach: handle a, b and σ² using a linear model, and handle c using discretization. A sketch of the analytic step follows below.
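A minimal sketch of the analytic step for one fixed c, assuming the fully conjugate normal-inverse-gamma prior (the slide only says priors "of a certain form"; this is one standard choice):

```python
import numpy as np

def nig_posterior(h, q, c, m0, V0, a0, b0):
    """Posterior for (a, b) and sigma^2 in q_i = a + b*x_i + eps_i,
    x_i = log(h_i + c), under the conjugate prior
    (a, b) | sigma^2 ~ N(m0, sigma^2 * V0), sigma^2 ~ IG(a0, b0)."""
    X = np.column_stack([np.ones_like(h), np.log(h + c)])
    V0inv = np.linalg.inv(V0)
    Vn_inv = V0inv + X.T @ X
    Vn = np.linalg.inv(Vn_inv)
    mn = Vn @ (V0inv @ m0 + X.T @ q)
    an = a0 + len(q) / 2
    bn = b0 + 0.5 * (q @ q + m0 @ V0inv @ m0 - mn @ Vn_inv @ mn)
    # posterior: (a, b) | sigma^2 ~ N(mn, sigma^2*Vn), sigma^2 ~ IG(an, bn)
    return mn, Vn, an, bn
```

Repeating this over a grid of c values would then handle the discretized c.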

Problems with Bayesian regression
While this gives us the form of f(a, b, σ²), it does not give us the form of f(c). We know that the stage levels are not too far above the zero-level. We would like to code this prior info, but we do not want to use the stage measurements twice (both in the prior and in the likelihood). Jeffreys priors containing the covariates are a general problem with the Bayesian regression approach! This is OK if you really are in a regression setting, but that is not the case here.

Problems with the first Bayesian approach
The prior form that makes the linear regression analytical is rather strange: it ties the form of the prior for σ² to the priors for (a, b). However, prior info about these two would be better kept separate, and the coupled prior is difficult for users to set. Also, the expected discharge is infinite in this approach! (The median will be finite.)

A new Bayesian regression approach
Using a semi-conjugate prior, (a, b) ~ N_2 independent of σ² ~ IG, we separate the prior knowledge about (a, b) from that about σ². We can no longer handle (a, b, σ²) analytically for known c. However, (a, b, c, σ²) can be sampled using MCMC methods. The sampling method must be efficient, since users do not want to wait too long for the results.

A graphical overview of the new model
[Figure: directed graphical model. Hyper-parameters (μ_a, V_a, μ_b, V_b, α, β) sit above the parameters (a, b, c, σ²), which generate the measurements q_i given the stages h_i, for i in {1,…,number of measurements}.]

Sampling methods and efficiency
Naïve MCMC: the Metropolis algorithm. Problem: (a, b, c) are extremely mutually dependent. Metropolis or an independence sampler for c, with Gibbs sampling for (a, b, σ²): the dependency of (a, b, c) makes trouble here, too. Solution: sample (a, b, c, σ²) together and then do a single Metropolis-Hastings accept/reject. Sample c using first adaptive Metropolis, then an independence sampler. Sample (a, b, σ²) given c and the previous σ² using Gibbs-like sampling. Then accept/reject all four. A sketch of the simpler Metropolis-within-Gibbs variant follows below.
[Figure: one iteration of the sampler, moving from σ²_{i-1} to c_i to (a_i, b_i) to σ²_i.]
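A minimal sketch of the Metropolis-within-Gibbs variant mentioned above, not the authors' full joint accept/reject scheme. The prior values, the flat prior on c, and the step size are illustrative assumptions:

```python
import numpy as np

def mwg_rating_curve(h, q, n_iter=5000, m0=np.zeros(2),
                     V0=np.eye(2) * 10.0, alpha0=2.0, beta0=0.5,
                     step_c=0.05, seed=0):
    """Metropolis-within-Gibbs for (a, b, c, sigma^2) in
    q_i = a + b*log(h_i + c) + eps_i, eps_i ~ N(0, sigma^2),
    with prior (a, b) ~ N2(m0, V0) independent of sigma^2 ~ IG."""
    rng = np.random.default_rng(seed)
    n, V0inv = len(h), np.linalg.inv(V0)
    a, b, c, sig2 = 0.0, 1.0, 1.0 - h.min(), 1.0   # crude starting values
    samples = np.empty((n_iter, 4))
    for it in range(n_iter):
        # Gibbs step for (a, b) | c, sigma^2: normal prior x normal likelihood
        X = np.column_stack([np.ones(n), np.log(h + c)])
        Vn = np.linalg.inv(V0inv + X.T @ X / sig2)
        mn = Vn @ (V0inv @ m0 + X.T @ q / sig2)
        a, b = rng.multivariate_normal(mn, Vn)
        # Gibbs step for sigma^2 | a, b, c: inverse-gamma
        resid = q - a - b * np.log(h + c)
        sig2 = 1.0 / rng.gamma(alpha0 + n / 2, 1.0 / (beta0 + resid @ resid / 2))
        # random-walk Metropolis step for c (flat prior on c > -min(h) assumed)
        c_prop = c + step_c * rng.standard_normal()
        if c_prop > -h.min():
            r_prop = q - a - b * np.log(h + c_prop)
            log_acc = (resid @ resid - r_prop @ r_prop) / (2 * sig2)
            if np.log(rng.uniform()) < log_acc:
                c = c_prop
        samples[it] = a, b, c, sig2
    return samples
```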

Estimation based on simulations
We can estimate the parameters from the sampled parameters by taking either their mean or their median. We can estimate the discharge for a given stage value either as the mean or median of the discharges computed from the sampled parameters, or as the discharge computed from the mean or median parameters. Simulations show that the median is better than the mean.

Inference based on simulations
Uncertainty in the parameters can be established by looking at the variance of the sampled parameters. Credibility intervals can be arrived at from the quantiles of the samples. Discharge uncertainty and credibility intervals can be obtained by applying the same approach to the discharges computed from the drawn parameters, as sketched below.
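A small sketch of both steps, assuming an array of draws (a, b, c, σ²) such as the one returned by the hypothetical mwg_rating_curve above, and a stage value h_new of interest:

```python
import numpy as np

def discharge(h_new, a, b, c):
    """Q = C*(h - h0)^b with C = e^a and h0 = -c."""
    return np.exp(a) * (h_new + c) ** b

rng = np.random.default_rng(2)
samples = rng.normal([1.5, 2.0, 0.3, 0.05], 0.1, size=(4000, 4))
# ^ stand-in for MCMC draws; in practice use e.g. mwg_rating_curve(h, q)
h_new = 2.0                                     # hypothetical stage of interest

# point estimates: medians of the sampled parameters...
a_med, b_med, c_med, s2_med = np.median(samples, axis=0)
# ...or the median of the per-draw discharges at stage h_new
Q_draws = discharge(h_new, samples[:, 0], samples[:, 1], samples[:, 2])
Q_est = np.median(Q_draws)
# 95% credibility interval for the discharge, from the quantiles
Q_lo, Q_hi = np.percentile(Q_draws, [2.5, 97.5])
```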

Example – rating curve with uncertainty
[Figure: estimated rating curve with uncertainty bounds.]

Example – prior to posterior
[Figure: prior of b.] [Figure: posterior of b.]

Example – diagnostic plots
[Figure: scatter plot of simultaneous samples of a and b. Note the extreme correlation between the parameters.]
[Figure: residual plot. Note the "trumpet" form: there is heteroscedasticity here, which the model does not catch.]

What has been achieved
Discharge estimates with lower RMSE than the frequentist estimates. Measures of estimation uncertainty that are easy to interpret. Hopefully, the quality measures will be less difficult to understand. The distribution of the parameters can be used for decision problems. (Should we do more measurements?)

What remains
Multiple segmentation. A need to find good quality measures in addition to the estimation uncertainty; one possibility is to calculate the posterior probability of more advanced models. Learning about the priors: a hierarchical approach. There is still some prior knowledge that has not found its way into the model, namely the distance between the zero-stage and the stage measurements. The heteroscedasticity ought to be removed. We should have a prior on b that more closely reflects both the prior knowledge (positive b) and the database collection of estimates, for example b ~ logN; but this introduces problems with efficiency.

A graphical view of the model and a tool for a hierarchical approach
[Figure: hierarchical graphical model. Hyper-parameters (μ_a, V_a, μ_b, V_b, α, β), shown as distributions with or without hyper-parameters of their own, sit above the station-level parameters (a_j, b_j, c_j, σ_j²), which generate the measurements (h_{j,i}, q_{j,i}) for j in {1,…,number of stations} and i in {1,…,number of measurements for station j}.]

Solution to the prior for h + c
It is possible to go from a regression situation to a model in which both the discharge and the stage values are stochastic. Possibility: a structural model where the real discharge has a distribution, and the real stage is a deterministic function of the curve parameters (a, b, c). The observations, D = (q_i, h_i), are the real values plus noise. The model gives a more realistic description of what happens in the real world. It also codes the prior knowledge about the difference between the stage measurements and the zero-stage, through the distribution of q and the distribution of (a, b). A small generative sketch follows below.
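A minimal generative sketch of this structural model; every distribution and parameter value below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
a, b, c = 1.5, 2.0, 0.3                        # hypothetical curve parameters
q_true = rng.normal(2.0, 1.0, size=n)          # latent true log-discharge has a distribution
h_true = np.exp((q_true - a) / b) - c          # true stage: deterministic inverse of q = a + b*log(h + c)
q_obs = q_true + rng.normal(0, 0.05, size=n)   # observed log-discharge = truth + noise
h_obs = h_true + rng.normal(0, 0.01, size=n)   # observed stage = truth + noise
```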

Structural model – a graphical view
[Figure: graphical view of the structural model. Hyper-parameters (with or without distributions of their own) sit above the parameters (μ_q, σ_q², a, b, c, σ², σ_h²); these generate the latent true discharges and stages, which in turn generate the measurements (q_i, h_i).]

Advantages and problems of a structural model
Advantages: more realistic modelling of the measurements and the underlying structure. It codes prior knowledge about the relationship between the stage measurements and the zero-stage. It can solve the heteroscedasticity and gives a more detailed picture of how measurement errors occur. Since b cannot be sampled using Gibbs, we might as well use a form that ensures a positive exponent. Problems: it is difficult to make an efficient algorithm, and the model is more complex; thus, even though it codes more prior knowledge, the estimates might be more uncertain. This has not been tested.