1 Alberto Montanari University of Bologna Simulation of synthetic series through stochastic processes.

2 Stochastic simulation Stochastic (random) processes can be used to generate river flow data directly. A realisation of a stochastic process is a time series drawn at random from the process. The statistics of the synthetic series are similar to those of the observed time series. Be careful: nature is not stochastic. It follows just one trajectory, which is not random. Describing trajectories that we cannot describe deterministically by means of a random process is our own modelling assumption.

3 Some basic concepts of statistics and probability Statistics is the science of the collection, organization, and interpretation of data. It deals with all aspects of this, including the planning of data collection in terms of the design of surveys and experiments (from Wikipedia). Probability is a way of expressing knowledge or belief that an event will occur or has occurred. Probability theory is used in statistics. The word probability does not have a single consistent definition.

4 Some basic concepts of statistics and probability There are two broad categories of probability interpretations: 1. Frequentists talk about probabilities only when dealing with experiments that are random and well defined. The probability of a random event denotes the relative frequency of occurrence of an experiment's outcome when the experiment is repeated; frequentists consider probability to be the relative frequency of outcomes "in the long run". 2. Bayesians assign probabilities to any statement whatsoever, even when no random process is involved. Probability, for a Bayesian, is a way to represent an individual's degree of belief in a statement, or an objective degree of rational belief, given the evidence.

5 Some basic concepts of statistics and probability Kolmogorov axioms of probability: 1. Probability lies in the range [0,1]. 2. The probability of the certain event is 1; the probability of the impossible event is 0. 3. P(A ∪ B ∪ C ∪ ...) = P(A) + P(B) + P(C) + ..., where A, B, C, ... are mutually exclusive. Engineering interpretation: probability expresses the likelihood of an event with a measure varying between 0 and 1. Random variable: a numerical description of the outcome of an experiment.

6 Some basic concepts of statistics and probability Frequentist interpretation: define an experiment and estimate the probability of each outcome by computing its frequency in the long run. Be careful: understanding the difference between Bayesian and frequentist probability is not easy. Basically, the former does not require the definition of a formal experiment and therefore allows for subjective estimation of probability. The two interpretations are not mutually exclusive; they should reach the same conclusions when applied to well defined experiments (well defined according to the frequentist interpretation).
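The frequentist recipe above can be illustrated with a short simulation. This is a minimal sketch; the fair-coin experiment and the number of trials are illustrative choices, not part of the slides.

```python
# Frequentist probability: repeat a well defined random experiment many
# times and estimate the probability of an event as its relative frequency.
import random

random.seed(42)  # fixed seed, for reproducibility

n_trials = 100_000
heads = sum(random.random() < 0.5 for _ in range(n_trials))  # count "heads"
freq = heads / n_trials  # relative frequency "in the long run"
```

With a large number of trials, `freq` settles near the true probability 0.5, which is the frequentist notion of probability in action.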

7 Some basic concepts of statistics and probability Probability distribution of a random variable X, with outcome x: it associates with each x its probability (when the variable is discrete), or with each range of values the probability of X falling within that range (when the variable is continuous). Probability distributions have different shapes and depend on one or more parameters. Example: the Gaussian probability distribution (or Normal distribution). Central limit theorem.

8 Some basic concepts of statistics and probability When the random variable is discrete, the meaning of the probability distribution is clear: it gives the probability P(X = x). For continuous variables the physical interpretation is more difficult, because the probability of getting any individual value of a real random variable is infinitesimally small (practically zero). Therefore we may refer to P(X = x) as the probability for X to fall in an infinitesimally small range around x. The probability density p of X around an individual value x within a certain range Δx can be approximately computed by estimating the probability of X falling in Δx and dividing it by Δx itself (we have infinite possible outcomes in Δx, so we divide by the range length to get the probability of each of them). As Δx tends to 0 we converge to the estimate of p(x): the probability of X falling into the range tends to zero with Δx, and their ratio tends to p(x).
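The approximation described above (probability of falling in a small range, divided by the range length) can be checked numerically. This is a minimal sketch; the standard Gaussian is an illustrative choice, since its density at 0 is known exactly.

```python
# Approximate the density of a continuous random variable at a point x as
# P(x < X <= x + dx) / dx, estimated from a large sample.
import math
import random

random.seed(0)

sample = [random.gauss(0.0, 1.0) for _ in range(200_000)]

x, dx = 0.0, 0.1
# probability of X falling in the small range (x, x + dx], by frequency
prob_in_range = sum(x < s <= x + dx for s in sample) / len(sample)
p_hat = prob_in_range / dx  # divide by the range length to get the density

p_true = 1.0 / math.sqrt(2.0 * math.pi)  # exact N(0,1) density at x = 0
```

Shrinking `dx` (while enlarging the sample) drives `p_hat` toward the exact density, mirroring the limit argument in the slide.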

9 Some basic concepts of statistics and probability Let's refer to continuous variables. Distinction between the cumulative distribution function (CDF, F_X(x)) and the probability density function (pdf, f_X(x)): they contain the same amount of information! The CDF gives the non-exceedance probability of a random variable, i.e. F_X(x) = P(X ≤ x). The CDF has a very important meaning in hydrology. The CDF can be derived by integrating the pdf. If the random variable is defined in the range (-∞, +∞), one can write: F_X(x) = ∫_{-∞}^{x} f_X(ξ) dξ

10 Gaussian probability distribution Premise: recall the meaning of the probability distribution for continuous random variables. The Gaussian distribution is a symmetric distribution.

11 Uniform probability distribution

12 Probability and Return Period Return period: also known as recurrence interval. It is a statistical measure denoting the average recurrence interval between two events whose magnitude is equal to or greater than a certain level. It is related to the non-exceedance probability: P(X ≤ x) = [T(x) - 1] / T(x), that is, T(x) = 1 / [1 - P(X ≤ x)]. Usual design return periods: - Sewer systems: 2-10 yrs - Road drainage systems: yrs - Bridges: yrs - Dams: yrs
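The two formulas relating return period and non-exceedance probability translate directly into code. This is a minimal sketch; the function names are illustrative.

```python
# Return period T(x) and non-exceedance probability P(X <= x) are linked by
# T(x) = 1 / (1 - P(X <= x)) and, conversely, P(X <= x) = (T(x) - 1) / T(x).

def return_period(non_exceedance_prob):
    """Average recurrence interval of events with magnitude >= level x."""
    return 1.0 / (1.0 - non_exceedance_prob)

def non_exceedance(T):
    """Non-exceedance probability of the T-year event."""
    return (T - 1.0) / T

T = return_period(0.99)    # the 100-year event is not exceeded in 99% of years
p = non_exceedance(100.0)  # and vice versa
```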

13 Independence Independence: two events are independent if the occurrence of one event makes it neither more nor less probable that the other occurs. A random variable is said to be independent if one outcome does not have any influence on the subsequent one(s). Remember: CDF and pdf can be used to describe independent random variables only. Remember: so far, we are talking about independent events, like annual maximum rainfall or discharge, coin tossing, etc.

14 Generation of independent random variables Under the above assumptions, it is easy to generate outcomes from a probability distribution and therefore to generate random variables. 1) Generate N outcomes u from the Uniform distribution in the range [0,1]. 2) Generate independent outcomes of the random variable X as: x = F_X^-1(u)
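A minimal sketch of the two steps above, using the exponential distribution as an illustrative choice because its CDF inverts in closed form: F(x) = 1 - exp(-lam*x), so F^-1(u) = -ln(1 - u)/lam.

```python
# Inverse-CDF (inverse transform) sampling: uniform outcomes mapped through
# the quantile function of the target distribution.
import math
import random

random.seed(1)

def inverse_cdf_exponential(u, lam):
    """Quantile function F^-1 of the exponential distribution."""
    return -math.log(1.0 - u) / lam

lam = 2.0
N = 100_000
uniforms = [random.random() for _ in range(N)]                # step 1: U[0,1]
sample = [inverse_cdf_exponential(u, lam) for u in uniforms]  # step 2: x = F^-1(u)

mean = sum(sample) / N  # should approach the exponential mean 1/lam = 0.5
```

For distributions without a closed-form inverse CDF (e.g. the Gaussian), the same scheme works with a numerical inverse.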

15 Processes Time series such as mean daily discharges can be regarded as collections of events. However, these events are not independent: daily discharges keep memory of themselves, for a time span that depends on the catchment's properties. Therefore such time series cannot be generated by simply generating (independent) random variables. Collections (families) of random variables are called "stochastic processes". A stochastic process is a family of random variables ("stochastic" is a synonym for "random"). Natural processes can be treated as "realisations" of stochastic processes.

16 Processes, ergodicity, stationarity Ergodicity: the average of a process parameter over time and the average over the statistical ensemble are the same. Right or not, the analyst assumes that observing a process for a long time is as good as sampling many independent realisations of the same process. Stationarity: a process is stationary when its statistical properties do not change over time or position. Remember: ergodicity and stationarity are necessary assumptions for making inference about a stochastic process.

17 How to generate realisations from stochastic processes? One needs to generate synthetic time series with the same statistical properties as the observed sample, including its dependence properties. One has to match both the probability distribution of the data and the dependence structure of the data. One way to decipher the dependence structure of the data is to compute the autocorrelation, which is a measure of linear dependence (a measure of the extent to which the dependence can be approximated by a linear relationship). Autocorrelation at lag k for a stationary process: R(k) = E[(X(t) - μ)(X(t + k) - μ)] / σ², where μ and σ² are the mean and variance of the process.

18 How to generate realisations from stochastic processes? Plotting estimates of R(k) against k gives the autocorrelation function (ACF). Correlation can also be computed between two different processes X(t) and Y(t).
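A sample estimate of R(k) can be computed directly from a series. This is a minimal sketch; the white-noise example is an illustrative check, since its ACF should be 1 at lag 0 and near zero elsewhere.

```python
# Sample autocorrelation r(k): lag-k covariance divided by the variance.
import random

random.seed(2)

def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

noise = [random.gauss(0.0, 1.0) for _ in range(5_000)]
r = acf(noise, 5)  # r[0] = 1 by construction; r[1:] near zero for white noise
```

Plotting `r` against the lag index is exactly the ACF plot mentioned in the slide.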

19 How to generate realisations from stochastic processes? Let us assume that the dependence structure of the considered process can be represented by a linear relationship; that is, we assume a linear stochastic process. The process can then be written as: X(t) = φ1 X(t-1) + φ2 X(t-2) + ... + φn X(t-n) + ε(t) thereby obtaining an autoregressive process.

20 Autoregressive (AR) stochastic processes Assumptions: 1) Linear process and Gaussian process (only the Gaussian distribution is preserved under linear transformations). 2) X(t) is zero mean. 3) ε(t) is zero mean and uncorrelated. 4) The cross-correlation between X(t) and ε(t + k) is null for any positive k.

21 Generation of synthetic series from an autoregressive (AR) stochastic process 1) Fit the stochastic process X(t); namely, fit its parameters and the variance of ε(t). 2) Generate outcomes of ε(t) from a Gaussian distribution with zero mean and the proper variance. 3) Compute the synthetic variables according to the definition of the stochastic process. Problems: non-Gaussianity, non-stationarity.

22 Generation of synthetic series from an autoregressive (AR) stochastic process 1) Generate a time series of ε(t). 2) Compute the realisation of X(t) according to the formulation of the given stochastic process. 3) Remember: whenever a needed past value of X(t) is not available, we should assume that the missing observation is equal to the mean value of X(t), namely 0.
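The steps above can be sketched for an AR(1) process, X(t) = φ X(t-1) + ε(t). This is a minimal sketch: the coefficient and the innovation standard deviation are illustrative values, as if they had already been fitted to an observed series.

```python
# Generation of a synthetic AR(1) series: X(t) = phi * X(t-1) + eps(t).
import random

random.seed(3)

phi = 0.7          # autoregressive coefficient (|phi| < 1 for stationarity)
sigma_eps = 1.0    # standard deviation of the innovations eps(t)
N = 50_000

x = [0.0]  # the unavailable past value is set to the mean of X(t), i.e. 0
for _ in range(N - 1):
    eps = random.gauss(0.0, sigma_eps)  # generate eps(t)
    x.append(phi * x[-1] + eps)         # apply the autoregressive recursion

# check: the lag-1 sample autocorrelation of the synthetic series
# should be close to phi
m = sum(x) / N
c0 = sum((v - m) ** 2 for v in x)
r1 = sum((x[t] - m) * (x[t + 1] - m) for t in range(N - 1)) / c0
```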

23 Stationarity assumption To meet the assumption of stationarity, conditions must be imposed on the values of the coefficients; we do not go into details here. For instance, for the AR(1) process it is necessary that the absolute value of the autoregressive coefficient is less than 1. If the process is non-stationary or non-Gaussian, a possible solution is to apply a preliminary transformation to the data.

24 Non-stationarity From a physical point of view, non-stationarity can be caused by changes in climate or land use. A classical example of non-stationarity is the presence of a linear trend in the data, or the presence of seasonality (also called cyclo-stationarity). Trend and seasonality can be removed with a preliminary transformation.

25 Removal of seasonality The literature proposes many techniques for removing seasonality from data. The most classical techniques are based on estimating a periodic component in the mean value and in the variance of the data. Let's indicate these periodic components with the symbols μ(τ), 1 ≤ τ ≤ 365, and σ(τ), 1 ≤ τ ≤ 365. The deseasonalised time series is computed as: X_d(t, τ) = [X(t, τ) - μ(τ)] / σ(τ) where t indicates the usual time index, while τ indicates the position in the year (day, month, ...) at which the observation is collected.
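A minimal sketch of this deseasonalisation, with a generic seasonal period instead of 365 days for brevity; the function name and the toy series are illustrative.

```python
# Deseasonalisation: X_d = (X - mu(tau)) / sigma(tau), with mu and sigma
# estimated separately for each seasonal position tau.
import math

def deseasonalise(series, period):
    """Remove the periodic mean and standard deviation from a series."""
    n = len(series)
    mu, sigma = [], []
    for tau in range(period):
        vals = [series[i] for i in range(tau, n, period)]  # same-season values
        m = sum(vals) / len(vals)
        s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
        mu.append(m)
        sigma.append(s)
    return [(series[i] - mu[i % period]) / sigma[i % period] for i in range(n)]

# toy series with period 2: seasonal means (2, 15), seasonal std devs (1, 5)
xd = deseasonalise([1, 10, 3, 20, 1, 10, 3, 20], 2)
```

After the transformation each seasonal position has zero mean and unit variance, which is what the AR fitting step then works on.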

26 Removal of trend A similar technique is applied for removing the trend. A linear trend is estimated on the data, and the value of the trend at time t is subtracted from the corresponding observation collected at the same time t.
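A minimal sketch of linear trend removal by least squares; the function name is illustrative.

```python
# Linear trend removal: fit x(t) ~ a*t + b by least squares, then subtract
# the fitted trend value at each time t from the observation at that t.

def detrend(series):
    n = len(series)
    t_mean = (n - 1) / 2.0
    x_mean = sum(series) / n
    num = sum((t - t_mean) * (series[t] - x_mean) for t in range(n))
    den = sum((t - t_mean) ** 2 for t in range(n))
    a = num / den            # estimated trend slope
    b = x_mean - a * t_mean  # estimated trend intercept
    return [series[t] - (a * t + b) for t in range(n)]

# a purely linear series is reduced to (numerical) zeros
residual = detrend([2.0 * t + 1.0 for t in range(100)])
```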

27 Non-Gaussianity Non-Gaussianity can also be resolved by applying a preliminary transformation. The logarithmic transformation is often applied and is often successful. Alternative: the normal quantile transform (NQT). 1) Compute the cumulative frequency of the data, namely Fr[X(t)] = r[X(t)] / (N + 1), where r is the rank of X(t) in the sample rearranged in descending order. 2) Compute X_T(t) = F^-1[Fr[X(t)]], where F^-1 indicates the inverse of the Gaussian CDF with mean 0 and standard deviation 1 (standard Gaussian distribution).
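The two NQT steps can be sketched as follows; the ranks follow the slide's descending-order convention, and the three sample values are illustrative. `statistics.NormalDist(0, 1)` is the standard Gaussian, so its `inv_cdf` plays the role of F^-1 in step 2.

```python
# Normal quantile transform (NQT): cumulative frequency, then inverse
# standard Gaussian CDF.
from statistics import NormalDist

def nqt(sample):
    n = len(sample)
    norm = NormalDist(0.0, 1.0)
    # rank r of each value in the sample rearranged in descending order
    order = sorted(range(n), key=lambda i: sample[i], reverse=True)
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    # step 1: Fr = r / (N + 1); step 2: X_T = F^-1(Fr)
    return [norm.inv_cdf(ranks[i] / (n + 1)) for i in range(n)]

transformed = nqt([3.1, 0.4, 7.9])  # three illustrative observations
```

Whatever the original distribution, the transformed values follow standard Gaussian quantiles, which is what makes the AR machinery of the previous slides applicable.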

28 Statistical tests at the 95% confidence level for independence and Gaussianity Independence: 1) compute the ACF; 2) check that every autocorrelation coefficient (at nonzero lag) is lower, in absolute value, than 1.96 (1/N)^0.5, where N is the sample size. Gaussianity: 1) Compute the cumulative frequency of the data, namely Fr[X(t)] = r[X(t)] / (N + 1), where r is the rank of X(t) in the sample rearranged in descending order. 2) Compute |F[X(t)] - Fr[X(t)]|, where F indicates the Gaussian CDF with mean and standard deviation equal to those of the sample. 3) Check that the above difference is lower than the critical values given in a table.
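The independence check can be sketched as follows. This is a minimal sketch; the strongly persistent AR(1) series used to exercise the test is an illustrative counter-example, not from the slides.

```python
# Independence test: every autocorrelation coefficient at nonzero lag must be
# smaller in absolute value than 1.96 * (1/N)**0.5 (95% confidence bound).
import math
import random

random.seed(4)

def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    return [sum((x[t] - m) * (x[t + k] - m) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def passes_independence_test(x, max_lag):
    """95% confidence check: |r(k)| < 1.96 / sqrt(N) at all lags k > 0."""
    bound = 1.96 / math.sqrt(len(x))
    return all(abs(r) <= bound for r in acf(x, max_lag)[1:])

# a strongly persistent AR(1) series should clearly fail the test
ar1 = [0.0]
for _ in range(1999):
    ar1.append(0.9 * ar1[-1] + random.gauss(0.0, 1.0))
dependent_rejected = not passes_independence_test(ar1, 5)
```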