Module 3: Introduction to Time Series Methods and Models.


1 Module 3: Introduction to Time Series Methods and Models

2 What is a Time Series? …a sequence of observations taken sequentially over time (Box and Jenkins, 1976).

3 What is a Time Series? Mathematically, these are observations on {Y(t)}, which is a stochastic process. A given set of observations is sometimes referred to as a realization of the stochastic process: e.g., the 1st time we watched: {y_1(1), y_1(2), …}; the 2nd time we watched: {y_2(1), y_2(2), …}. Generally, no distinction is made, and we write {y(t)}.

4 What is a Stochastic Process? …a sequence of random variables {Y(t)}. As a random variable, Y(t) »has a probability distribution »has a mean, variance, … associated with each time.

5 Stochastic Processes. Visually: [plot: several realizations of y vs. time] The initial condition is random; it comes from the set of possible values for Y (the sample space).

6 Stochastic Processes. Visually: [plot: realizations of y vs. time, cut at time t, with the density f_Y(t)(y)] If we “cut” (i.e., observe) at time t, the possible values follow a histogram, or probability density function.

7 Describing Stochastic Processes. We can consider - »the distribution of Y(t) at a given t »the joint distribution of several Y(t)'s, e.g., the joint distribution of Y(t), Y(t+s). Question #1 - does the probability of an outcome of Y(t) depend on the outcome of Y(t+s)? i.e., are Y(t), Y(t+s) statistically independent? Question #2 - does the distribution of Y(t), or the joint distribution of {Y(t), Y(t+s)}, depend on the reference time t? »e.g., Y(1) vs. Y(5) - does the mean change with time, or does the variance change with time?

8 The Autocovariance. The autocovariance provides a measure of the systematic linear "self" relationship between values of y(t) that are k time steps apart. In its general form, the covariance depends on the reference time t at which it is being calculated. We will see that for a certain class of processes, the covariance depends only on the lag k.
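The slide's own equation is not reproduced in the transcript; a standard form consistent with its description (a covariance between values k time steps apart, with a possibly time-dependent mean), written in LaTeX notation, is:

    \gamma_y(t,k) \;=\; \mathrm{Cov}\{Y(t),\,Y(t+k)\}
                 \;=\; E\!\left[(Y(t)-\mu_t)\,(Y(t+k)-\mu_{t+k})\right]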

9 Stationarity. Analysis is simplified if some, or all, statistical properties are fixed in time. Examples: »the mean of Y(t) is constant for all time »the variance of Y(t) is constant for all time »the auto-covariance of Y(t), Y(t+k) depends only on the lag k and not on the reference time »the entire joint distribution for {Y(t), Y(s), Y(p)} is completely independent of the reference time, and depends only on the relative times.

10 Stationarity. Strict Stationarity »joint probability distributions are identical, and depend only on relative times »most demanding and restrictive. Weak Stationarity »mean is constant over time »variance is finite, and constant »autocovariance depends only on the lag, not on the reference time »sometimes referred to as "second order stationarity", or "wide sense stationarity" »weak stationarity can also be defined up to order p - the first p moments are time-invariant.

11 Our Situation. We will assume "second order stationarity" »constant mean »constant variance »autocovariances depend only on the lags. The autocovariance function then depends only on the lag k, and can be written γ_y(k).

12 Autocorrelation. Problem - the autocovariance has scale »think of units - squared units of the output (e.g., kPa²). Solution - normalize by an indication of the range, the standard deviation of each signal; because of stationarity, the two standard deviations are equal.
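A hedged reconstruction of the normalization the slide describes (the transcript omits the equation); since the two standard deviations are equal under stationarity, the autocorrelation is, in LaTeX notation:

    \rho_y(k) \;=\; \frac{\gamma_y(k)}{\sigma_y\,\sigma_y} \;=\; \frac{\gamma_y(k)}{\gamma_y(0)}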

13 Autocorrelation & Autocovariance. Notes: 1) the variance is simply the autocovariance at lag 0, σ_y² = γ_y(0); 2) the autocorrelation and autocovariance are symmetric in the lag k: γ_y(k) = γ_y(-k) and ρ_y(k) = ρ_y(-k).

14 Autocorrelation & Autocovariance. 3) the autocorrelation is bounded and normalized: ρ_y(0) = 1 and |ρ_y(k)| ≤ 1; 4) the autocorrelation and autocovariance are parameters summarizing the probability behaviour of the stochastic process Y(t) »values must be estimated from data, using appropriate "statistics" - functions of the data - to estimate the true values »the sample autocorrelation »the sample autocovariance.

15 Autocorrelation & Autocovariance. 5) For a Gaussian (i.e., normally distributed) stochastic process, second order stationarity implies strict stationarity »the normal distribution is fully determined by its mean and variance/covariance structure; 6) the autocorrelation can be expressed in terms of the autocovariance: ρ_y(k) = γ_y(k)/γ_y(0).

16 Let's look at some examples… Disturbance Example #1. Suppose the disturbance is a lag-1 moving average of the random shocks, i.e., y(t) depends on the current shock and one past shock. Autocorrelations: »lag 0: equal to 1 - the current output is always perfectly correlated with itself »lag 1: non-zero, since y(t) and y(t+1) share a common shock »lag >1: zero.

17 Disturbance Example #1. Time response [plot not reproduced] - note the local trends.

18 Autocorrelation Plot - Disturbance #1. A plot of the autocorrelation values vs. lag shows non-zero values only out to lag 1 »a lag-1 moving average disturbance »in shorthand, an MA(1) disturbance.
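A minimal Python sketch of the MA(1) behaviour described above; the moving-average coefficient (0.7), the series length and the random seed are illustrative assumptions, not values from the original slides:

    import numpy as np

    rng = np.random.default_rng(0)
    n, theta = 300, 0.7
    a = rng.normal(size=n + 1)            # white-noise random shocks a(t)
    y = a[1:] + theta * a[:-1]            # MA(1): y(t) = a(t) + theta*a(t-1)

    def sample_acf(x, max_lag):
        # sample autocorrelation: autocovariance at lag k divided by the lag-0 value
        x = x - x.mean()
        c0 = np.dot(x, x) / len(x)
        return np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) / c0
                         for k in range(max_lag + 1)])

    print(np.round(sample_acf(y, 5), 3))  # large value at lag 1, near zero beyond

The printed values show the sharp cut-off after lag 1 that identifies an MA(1) disturbance.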

19 Explanation for Autocorrelation Behaviour. Autocorrelation arises from systematic relationships - "having something in common" - and y(t+1), y(t) share only one common random shock, a(t) »the remaining shocks are statistically independent »since y(t) only depends on one past shock, there are no common shocks for lags > 1. [diagram: shocks a(t+1), a(t), a(t-1), a(t-2) feeding into the outputs y(t+1), y(t), y(t-1)]

20 Disturbance Example #1 - Background. This disturbance is weakly stationary » "second-order stationary", or wide sense stationary »a sum of stationary stochastic processes »mean is zero »variance is constant. These properties are used in the following mathematical reasoning for the autocorrelation function…

21 Disturbance Example #1. Mathematical reasoning [derivation not reproduced]: several of the expectation terms are zero because of the independence of the random shocks.

22 Disturbance Example #1. Mathematical reasoning, continued [derivation not reproduced]: again, terms vanish because of the independence of the random shocks.

23 Disturbance Example #2. Suppose the disturbance depends on its own previous value plus the current shock - note the dependence on the past output. Autocorrelations: »lag 0: 1 »lag 1: equal to the AR coefficient »lag 2: the AR coefficient squared »lag k: the AR coefficient raised to the power k - a geometric decay.

24 Disturbance Example #2. Time response [plot not reproduced] - note the local trends.

25 Autocorrelation Plot - Disturbance #2. A gradual decline to zero indicates an autoregressive disturbance »the current value of the output depends on the previous value of the output plus the current shock »an AR(1) disturbance.
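A companion sketch for the AR(1) case, using the AR coefficient of 0.6 quoted on the Background slide; the series length and seed are again illustrative assumptions:

    import numpy as np

    def sample_acf(x, max_lag):
        x = x - x.mean()
        c0 = np.dot(x, x) / len(x)
        return np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) / c0
                         for k in range(max_lag + 1)])

    rng = np.random.default_rng(1)
    n, phi = 300, 0.6
    a = rng.normal(size=n)                # white-noise random shocks
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + a[t]      # AR(1): daisy-chained dependence on the past output

    print(np.round(sample_acf(y, 5), 3))  # roughly 1, 0.6, 0.36, ... - gradual geometric decay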

26 Explanation for Autocorrelation Behaviour. Daisy-chained dependence »y(t+1) depends on y(t), which depends on y(t-1), which depends on… [diagram: each of the outputs y(t+1), y(t), y(t-1) built from the chain of shocks …, a(t-2), a(t-1), a(t), a(t+1)]

27 Disturbance Example #2 - Background. This disturbance is weakly stationary » "second-order stationary", or wide sense stationary »a sum of stationary stochastic processes - here, an infinite sum of the random shocks a(t) - think of the impulse response representation »the AR coefficient is 0.6 --> a convergent impulse response »mean is zero »variance is constant. These properties are used in the following mathematical reasoning for the autocorrelation function…

28 Disturbance Example #2. Mathematical reasoning [derivation not reproduced]: terms are zero because of the independence of the random shocks.

29 Disturbance Example #2. Mathematical reasoning, continued [derivation not reproduced]: a term is zero because of the independence between the random shock at time k+1 and the output at time k.

30 Disturbance Example #2. Mathematical reasoning, continued [derivation not reproduced]: again, the term is zero because of the independence between the random shock at time k+1 and the output at time k.

31 Time Series Models. Recall that we can have moving average and autoregressive components. In transfer function form, the numerator is the moving average component and the denominator is the autoregressive component.
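The transfer-function equation itself is not in the transcript; a standard backshift-operator form consistent with the slide's description (numerator = moving average component, denominator = autoregressive component, a(t) = random shock, q^{-1} = backward shift operator), in LaTeX notation, would be:

    y(t) \;=\; \frac{C(q^{-1})}{D(q^{-1})}\,a(t),
    \qquad
    C(q^{-1}) = 1 + c_1 q^{-1} + \cdots + c_r q^{-r},
    \quad
    D(q^{-1}) = 1 + d_1 q^{-1} + \cdots + d_p q^{-p}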

32 Disturbance Examples. Disturbance #1 is a moving average disturbance; Disturbance #2 is an autoregressive disturbance.

33 Detecting Model Structure From Data … for time series disturbance models. Examine the autocorrelation plot ("correlogram") - »if a sharp cut-off at lag k is detected, then the disturbance is a moving average disturbance of order k »if a gradual decline is observed, then the disturbance contains an autoregressive component - long tails indicate either a higher-order autoregressive component or a pole near 1 »if the autocorrelations alternate between positive and negative values, one or more of the roots is negative.

34 Estimating Autocovariances from Data. Use the sample autocovariance function r_y(k) = (1/N) Σ (y(t) - ȳ)(y(t+k) - ȳ), summed over the available pairs, where »N is the number of data points »r_y(0) is the sample variance of y(t) »r_y(k) is computed from the data and is a statistic with a sampling distribution - confidence limits need to be considered when assessing the significance of the values.

35 Estimating Autocorrelations from Data. Scale the covariance by the estimated variance - the sample autocorrelation is r_y(k)/r_y(0). Notes: »this is a statistic, with a sampling distribution - assess it using confidence limits »range of values: between -1 and +1.

36 Confidence Intervals and Autocorrelations. Confidence limits for the autocorrelation are derived by examining how variability propagates through the calculations »confidence limits are typically generated automatically by identification or time series analysis programs »under the assumption of white noise, the standard error is approximately 1/√n »95% confidence limits can be approximated by +/- 2/√n.
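A minimal sketch of the estimation described on slides 34-36: sample autocovariance, sample autocorrelation, and the approximate +/- 2/√N white-noise limits. The white-noise test series and seed are illustrative assumptions:

    import numpy as np

    def sample_autocovariance(y, max_lag):
        y = np.asarray(y, dtype=float)
        N = len(y)
        yc = y - y.mean()
        # r_y(k) = (1/N) * sum over t of (y(t) - ybar)(y(t+k) - ybar)
        return np.array([np.dot(yc[:N - k], yc[k:]) / N for k in range(max_lag + 1)])

    def sample_autocorrelation(y, max_lag):
        r = sample_autocovariance(y, max_lag)
        return r / r[0]                       # r_y(0) is the sample variance

    rng = np.random.default_rng(2)
    y = rng.normal(size=300)                  # white noise, N = 300 as on the slides
    limit = 2.0 / np.sqrt(len(y))             # approx 0.12 for N = 300, matching the slides
    print(np.round(sample_autocorrelation(y, 6), 3), "limits: +/-", round(limit, 3))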

37 Estimated Autocorrelation Plot for Example #1. [plot: autocorrelation vs. lag] Note the sharp cut-off after lag 1: a sharp cut-off indicates a moving average disturbance. The confidence limits for testing for white noise are +/- 0.12 (N = 300 data points).

38 Estimated Autocorrelation Plot for Example #2. [plot: autocorrelation vs. lag] Gradual decay due to the autoregressive component: gradual decay indicates an autoregressive disturbance. The confidence limits for testing for white noise are +/- 0.12 (N = 300 data points).

39 An Additional Tool - the Partial Autocorrelation. Goal - identify the explicit dependence on a lagged output in an autoregressive relationship »what is the order of the AR component? Approach - compute the correlation between Y(t) and Y(t+k) after taking into account the dependence on the values at times t+1, t+2, …, t+k-1.

40 Computing the Partial Autocorrelation. Approach 1 - using a property of autoregressive processes. For an AR process of order p, the autocorrelations satisfy the Yule-Walker relationships ρ(j) = φ_1 ρ(j-1) + … + φ_p ρ(j-p), for j = 1, …, p.

41 Approach 1 continued… In the partial autocorrelation, we are considering regressing Y(t+k) on Y(t+k-1), …, Y(t+1) and examining the correlation with Y(t); in other words, we are considering an AR(k) model fitted to the data.

42 Approach 1 continued… For such an AR(k) process, the lag 0, 1, …, k autocorrelations are calculated from the data. We then have a set of k equations in k unknowns -- solve for the φ parameters, and in particular for the lag-k coefficient, which is the partial autocorrelation at lag k.

43 Computing the Partial Autocorrelation. Approach 2 - »regress Y(t+k) on Y(t+k-1), …, Y(t+1), Y(t) »the parameter of Y(t) is the partial autocorrelation coefficient. This is a more reliable method for computing the partial autocorrelations - Approach 1 is susceptible to numerical problems if the AR poles are close to the unit circle.

44 Constructing the Partial Autocorrelation Plot. Compute the partial autocorrelations, using Approach 1 or 2, for each lag 1, 2, … Interpretation: »for an autoregressive process of order p, a sharp cut-off will be observed after lag p, in which the partial autocorrelations go to zero -> no more explicit dependence beyond lag p »the PACF plot for moving average processes will exhibit a decay. The autocorrelation and partial autocorrelation behaviours are dual for autoregressive and moving average processes.
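A minimal sketch of Approach 2 (regression); the AR(1) test series with coefficient 0.6 is an illustrative assumption used to show the cut-off after lag 1:

    import numpy as np

    def pacf_by_regression(y, max_lag):
        # lag-k partial autocorrelation = coefficient of y(t) when y(t+k) is
        # regressed on y(t+k-1), ..., y(t+1), y(t)
        y = np.asarray(y, dtype=float)
        y = y - y.mean()
        pacf = []
        for k in range(1, max_lag + 1):
            X = np.column_stack([y[k - j - 1: len(y) - j - 1] for j in range(k)])
            coeffs, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
            pacf.append(coeffs[-1])           # the last regressor is y(t)
        return np.array(pacf)

    rng = np.random.default_rng(3)
    y = np.zeros(300)
    for t in range(1, 300):
        y[t] = 0.6 * y[t - 1] + rng.normal()
    print(np.round(pacf_by_regression(y, 5), 3))   # sharp cut-off after lag 1 for an AR(1)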

45 Determining Time Series Model Structure … is an iterative procedure: 1) Determine an initial disturbance ARMA structure from the auto- and partial autocorrelations. 2) Estimate parameters, and assess diagnostics: »residuals - including the autocorrelation structure of the residuals - should be white noise (no systematic trends) »statistical significance of the estimated parameters »percent variation explained, … 3) Adjust the model and re-estimate, diagnose, … until an adequate model is obtained.
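A sketch of this fit-and-diagnose loop using the statsmodels package (an assumption - the original course does not specify a software tool); ARIMA with order (p, 0, q) fits an ARMA(p, q) model, and the Ljung-Box test checks whether the residuals look like white noise:

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA
    from statsmodels.stats.diagnostic import acorr_ljungbox

    rng = np.random.default_rng(4)
    y = np.zeros(300)
    for t in range(1, 300):
        y[t] = 0.6 * y[t - 1] + rng.normal()        # illustrative AR(1) data

    result = ARIMA(y, order=(1, 0, 0)).fit()        # step 2: estimate a candidate AR(1)
    print(result.summary())                         # parameter estimates and significance
    print(acorr_ljungbox(result.resid, lags=[10]))  # residuals should show no autocorrelation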

46 Autoregressive Component: the memory or inertia of the process »reflects the poles of the process »dictates the dynamic character of the process or disturbance. Appears as a gradual decay in the autocorrelation function of the disturbance model. Autocorrelation values can alternate in sign »indicating a negative dependence on prior values.

47 Moving Average Component: direct dependence on combinations of past inputs »without "integrating" through the process. MA terms appear as a sharp cut-off in the auto-correlation function. Autoregressive components can be represented as infinite moving averages of inputs »analogy to the impulse response.

48 Selecting Autoregressive (AR) and Moving Average (MA) Orders. Moving average terms: –examine the autocorrelation plot –sharp cutoff at lag k? --> MA(k) –gradual decay? »an autoregressive component –alternating signs? »a negative coefficient.

49 Selecting AR and MA Orders. Autoregressive terms: –examine the partial autocorrelation plot –sharp cutoff at lag k? --> AR(k) –decay --> a moving average component. Decay in both the autocorrelation and partial autocorrelation plots --> the model contains both AR and MA terms.

50 Making the link with the real process... [process schematic: a unit with an analyzer (AI), an unmeasured feed composition disturbance entering with the feed, and a disturbance in the coolant supply temperature] Feed disturbance - an AR disturbance? - a coefficient near 1 indicating a slow disturbance. Coolant disturbance - maybe ARMA? - additional rapid fluctuations from the condenser?

51 Cross-Covariance: a measure of the systematic linear relationship between an output value and an input value k time steps away. The mathematical definition is the covariance between values k time steps apart; note the assumption of second-order stationarity in this definition.

52 Calculating the Cross-Covariance from process input-output data. The sample cross-covariance is computed from the N data points (N = number of data points); it is a covariance and is therefore SCALE DEPENDENT. Since the sample cross-covariance is estimated from data, it has variability -> confidence limits are needed.

53 Removing the Scale Dependence … in the cross-correlation function: normalize by the standard deviations of the two signals.

54 Computing Cross-Correlations: divide by the calculated standard deviations.
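A minimal sketch of the cross-covariance / cross-correlation calculation; the delayed-gain test process and seed are illustrative assumptions:

    import numpy as np

    def cross_correlation(u, y, max_lag):
        u = np.asarray(u, dtype=float) - np.mean(u)
        y = np.asarray(y, dtype=float) - np.mean(y)
        N = len(u)
        # sample cross-covariance r_uy(k), then divide by the calculated standard deviations
        r_uy = np.array([np.dot(u[:N - k], y[k:]) / N for k in range(max_lag + 1)])
        return r_uy / (u.std() * y.std())

    rng = np.random.default_rng(5)
    u = rng.normal(size=300)
    y = np.concatenate(([0.0], 2.0 * u[:-1])) + 0.1 * rng.normal(size=300)  # y(t) = 2*u(t-1) + noise
    print(np.round(cross_correlation(u, y, 4), 2))   # dominant value at lag 1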

55 Process Example #1. Suppose the process behaves as a gain plus deadtime (the output depends only on the input one time step earlier), and assume the input is a white noise sequence. Cross-correlations: »lag 0: zero »lag 1: the single non-zero value »lag >1: zero.

56 Process Example #1. Step response [plot not reproduced].

57 Cross-Correlation Plot: a single spike at lag 1 indicates a deadtime + gain process »no inertia - no dependence on past y's.

58 Process Example #2. Suppose that the process has inertia (a first-order denominator term in addition to the delay). Cross-correlations: »lag 0: zero »lag 1: non-zero »lag 2 and beyond: non-zero values that decay gradually.

59 Process Example #2. Time response [plot not reproduced]: note that the process disturbance passes through the inertia of the process as well.

60 Cross-Correlation Plot: indicates inertia in the process - a first-order process with a delay of 1 lag. The gradual decay occurs because of the inertia term (the denominator of the transfer function).

61 Cross-Covariances and Impulse Response. Given that »the manipulated variable input is white noise with unit variance »the process is operating in open loop, then the cross-covariance values represent the impulse response weights. Another perspective: convert the process transfer function to an impulse response by long division.

62 Cross-Covariances and Impulse Response. In impulse response form, the process output is a weighted sum of past inputs (plus the disturbance). When u(t) is white noise with unit variance, the cross-covariance at lag k picks out the lag-k impulse response weight, since shocks at different times are uncorrelated.
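A numerical sketch of this result; the first-order-plus-delay process and its coefficients are illustrative assumptions, not the slide's example:

    import numpy as np

    rng = np.random.default_rng(6)
    N = 5000
    u = rng.normal(size=N)                  # open-loop white-noise input, unit variance
    y = np.zeros(N)
    for t in range(1, N):
        # y(t) = 0.7*y(t-1) + 0.5*u(t-1)  ->  impulse weights h_k = 0.5 * 0.7**(k-1), k >= 1
        y[t] = 0.7 * y[t - 1] + 0.5 * u[t - 1]

    uc, yc = u - u.mean(), y - y.mean()
    r_uy = np.array([np.dot(uc[:N - k], yc[k:]) / N for k in range(6)])
    h = np.array([0.0] + [0.5 * 0.7 ** (k - 1) for k in range(1, 6)])
    print(np.round(r_uy, 3))                # close to the true impulse weights below
    print(np.round(h, 3))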

63 Confidence Intervals and Cross-Correlations. Sample cross-correlations are computed from data »they have statistical variation arising from variation in the data. Decision-making: –need to account for uncertainty by using confidence limits –confidence limits are derived by examining how variability propagates through the calculations »generated automatically by identification programs.

64 Confidence Intervals for Cross-Correlation. An approximate set of 95% confidence limits for the cross-correlation can be set at +/- 2/√(n-k), where »n is the number of data points »k is the lag for the cross-correlation. Note that these limits increase as the lag becomes larger.

65 Estimated Cross-Correlation Plots for Examples. Process example #1 - no inertia: the only significant value is at lag 1. [plot: cross-correlation vs. lag for process example #1]

66 Estimated Cross-Correlation Plots for Examples. Process example #2 - with inertia: the gradual decay indicates a denominator term. [plot: cross-correlation vs. lag for process example #2]

67 More on Cross-Correlation Plots. Process data can contain cross-correlations at negative lags –depends on which signal leads the other –an indication of "causality" –can be caused by closed-loop operation (u depends on past y because of control action based on the error y_sp - y).

68 What is Non-Stationary Data? Non-stationary disturbances –exhibit meandering or wandering behaviour –the mean may appear to be non-zero for periods of time –the stochastic analogue of an integrating disturbance. Non-stationarity is associated with poles on the unit circle in the disturbance transfer function –the AR component has one or more roots at 1.

69 Non-Stationary Data. [four simulated series plotted against time: AR parameter of 0.3, AR parameter of 0.6, AR parameter of 0.9, and a non-stationary series] Note the differences in the vertical scales.

70 How can you detect non-stationary data? Visual –meandering behaviour. Quantitative –slowly decaying autocorrelation behaviour –difference the data –examine the autocorrelation and partial autocorrelation functions of the differenced data –evidence of MA or AR structure in the differenced data indicates a non-stationary, or integrated, MA or AR disturbance.

71 Differencing Data … is the procedure of putting the data in "delta form". Start with y(t) and convert to the differenced series ∇y(t) = y(t) - y(t-1), explicitly accounting for the pole on the unit circle.
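A short sketch of differencing and its effect on the autocorrelations, using a simulated random walk (an illustrative choice of non-stationary disturbance):

    import numpy as np

    def sample_acf(x, max_lag):
        x = np.asarray(x, dtype=float) - np.mean(x)
        c0 = np.dot(x, x) / len(x)
        return np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) / c0
                         for k in range(max_lag + 1)])

    rng = np.random.default_rng(7)
    y = np.cumsum(rng.normal(size=300))     # random walk: AR root at 1 (pole on the unit circle)
    dy = np.diff(y)                         # delta form: dy(t) = y(t) - y(t-1)

    print(np.round(sample_acf(y, 5), 2))    # slowly decaying autocorrelations
    print(np.round(sample_acf(dy, 5), 2))   # the differenced data look like white noise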

72 Detecting Non-Stationarity. [two autocorrelation plots: one for the non-stationary disturbance and one for the differenced disturbance]

73 Estimating Models for Non-Stationary Data. Approaches: »estimate the model using the differenced data »explicitly incorporate the pole on the unit circle in the disturbance transfer function.

74 Impact of Over-Differencing. Over-differencing can introduce extra meandering and local trends into the data. Differencing "cancels" a pole on the unit circle; over-differencing introduces an artificial unit pole into the data.

75 Recognizing Over-Differencing. Visual –more local trends and meandering in the data. Quantitative –the autocorrelation behaviour decays more slowly than for the initial undifferenced data.

76 Summarizing Frequency Behaviour: The Spectrum

77 Frequency Response Analysis in Process Control. One method of analyzing process dynamics is to study the behaviour of the process in response to sinusoidal inputs of different frequencies –the "frequency response", summarized using Bode plots –fast processes allow high frequencies to pass –slow processes absorb high frequencies »e.g., use of a surge drum to damp out feed fluctuations.

78 Periodic Components. Variability may have periodic components »fluctuations from rotating equipment »night/day fluctuations »over-tuned controllers. The frequencies at which fluctuations occur indicate the type of process »e.g., variability at low frequencies indicates a slow-moving, or static, process. Frequency information can be investigated by decomposing the covariance with respect to frequency; the frequency framework is a way of summarizing whether a signal has rapid or slow components.

79 The Spectrum: defined by decomposing the covariance function by frequency –autocovariance --> (auto) spectrum –cross-covariance --> cross-spectrum. The decomposition is accomplished by taking the discrete Fourier transform of the covariance function; the energy distribution is summarized graphically.

80 The Spectrum: example - an AR disturbance. [plot: spectral density vs. frequency (rad/s)] More energy at lower frequencies.

81 The Spectrum: comparison of MA and AR disturbance spectra. [plot: spectral density vs. frequency (rad/s) for both disturbances] The MA disturbance contains higher frequencies; the AR disturbance contains lower frequencies - inertia damps out the higher frequencies.

82 Interpreting the Spectrum. Look for: peaks –resonance in control systems –rotating equipment –"seasonal" fluctuations (shift changes, or ambient temperature); and the energy distribution –low-frequency energy indicates autoregressive components with poles near the unit circle - slow-moving processes.

83 The Periodogram: also represents the frequency content in the data, as a direct decomposition of the data into frequencies »determined as the Discrete Fourier Transform of the data, without calculating the covariance function first. It provides similar information »peaks »energy distribution.

84 How is variance related to the spectrum? The spectrum indicates how variability is distributed by frequency; the variance is a measure of the total variability over all frequencies. THE INTEGRAL OF THE SPECTRUM OVER ALL FREQUENCIES EQUALS THE PROCESS VARIANCE.
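A numerical check of this statement using scipy's periodogram (the AR(1) test series is an illustrative assumption); the sum of the one-sided spectral density times the frequency spacing should approximately equal the sample variance:

    import numpy as np
    from scipy.signal import periodogram

    rng = np.random.default_rng(8)
    y = np.zeros(4096)
    for t in range(1, len(y)):
        y[t] = 0.6 * y[t - 1] + rng.normal()       # AR(1): more energy at low frequencies

    f, Pxx = periodogram(y, fs=1.0, scaling='density')   # one-sided spectral density
    print("integral of spectrum:", round(float(np.sum(Pxx) * (f[1] - f[0])), 3))
    print("sample variance:     ", round(float(y.var()), 3))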

85 How is variance related to the spectrum? [two plots of spectral density vs. frequency (rad/s): the height at a specific frequency shows the variability at that frequency, and the area under the curve is the total variance]

86 Making Predictions Using Time Series and Process Models “Forecasting”

87 Example. Suppose we have an ARX(1) process and disturbance model, i.e., the output depends on one lagged output, the input, and a random shock (note - e(t) denotes the random shock sequence here). We can use the model to predict one step ahead given present and past values {y(t), u(t)}. What is the optimal predictor? »optimal in a least squares sense - minimize the prediction error variance »the unknown quantity in this instance is e(t+1), which occurs in the future »on average, this shock will not contribute anything to the prediction since it has zero mean --> the optimal predictor is the conditional expectation of y(t+1) given information up to and including time t.

88 Conditional Expectation. Recall conditional probability: »the probability of X given Y, P(X|Y) = P(X and Y)/P(Y), where X and Y are events. For continuous random variables, we have a conditional probability density function expressed in terms of the joint and marginal density functions, f(x|y) = f_XY(x,y)/f_Y(y). Using this, we can define the conditional expectation of X given Y: E{X|Y=y} = ∫ x f(x|y) dx.

89 Conditional Expectation of the 1-step ahead prediction. We can think of our prediction model as expressing y(t+1) in terms of quantities known at time t plus the future shock e(t+1). The conditional expectation is then for e(t+1) given {e(t), e(t-1), e(t-2), …} - u(t) is assumed not to be random here - and this expectation is zero, using the independence of the random shocks.

90 Conditional Expectation of the 1-step ahead prediction. Thus, the 1-step ahead predictor is obtained by replacing the future shock with its mean of zero »in other words, the future shock makes no contribution on average to the 1-step ahead prediction because it has zero mean.

91 Conditional Expectations - Manipulations. In general, it isn't necessary to apply the detailed definition of the conditional expectation. Instead, it suffices to recognize which quantities are fixed relative to the expected value and which aren't: for the conditional expectation given information up to and including time t, quantities such as y(t) and u(t) are KNOWN, i.e., non-random, since they are given.

92 k-step ahead predictions - general procedure. To obtain a k-step ahead prediction from a process + disturbance model, –use the difference equation model to define y(t+k) in terms of information at t+k, t+k-1, t+k-2, …, t, t-1, … –take the conditional expectation given information up to time t: E{ · | t}. Example - the 2-step ahead forecast for the ARX(1) model. We will use this procedure as a basis for the prediction error estimation methods for determining parameter values.
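A minimal sketch of the 1- and 2-step ahead predictors, assuming the ARX(1) form y(t+1) = a*y(t) + b*u(t) + e(t+1); the model form, the parameter values and the future input are illustrative assumptions, since the slide's equations are not transcribed:

    def predict_1_step(y_t, u_t, a, b):
        # E{y(t+1) | info to t} = a*y(t) + b*u(t), since E{e(t+1) | t} = 0
        return a * y_t + b * u_t

    def predict_2_step(y_t, u_t, u_t1, a, b):
        # E{y(t+2) | info to t} = a*E{y(t+1) | t} + b*u(t+1), again dropping future shocks
        return a * predict_1_step(y_t, u_t, a, b) + b * u_t1

    a, b = 0.8, 0.5                                  # illustrative parameter values
    print(predict_1_step(y_t=1.0, u_t=0.2, a=a, b=b))
    print(predict_2_step(y_t=1.0, u_t=0.2, u_t1=0.0, a=a, b=b))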

93 Forecast error for k-step ahead predictions. The forecast error is the difference between the actual value and the forecast, y(t+k) - ŷ(t+k|t). Given the way the conditional estimate is obtained, the error is a moving average in the future shocks, i.e., the shock terms that were dropped because they have zero mean. Example - for the 1-step ahead prediction from the ARX(1) model, the forecast error is just the single future shock e(t+1).

94 Forecast error for k-step ahead predictions. Example - for the 2-step ahead prediction from the ARX(1) model, the forecast error is a moving average of the two future shocks: e(t+2) plus the AR coefficient times e(t+1).

95 Final Comments. Forecasting and forecast errors are important for »forecasting! »setting up prediction error parameter estimation methods »controller performance assessment - minimum variance control eliminates the forecasted disturbance, leaving only the prediction error - the minimum achievable variance is the variance of the moving average in shocks over the deadtime horizon.

