Time Series Analysis

1

2 Time Series Analysis
- What is Time Series Analysis? The analysis of data organized across units of time.
- Time series is a basic research design.
  - Data for one or more variables are collected for many observations at different time periods.
  - Observations are usually regularly spaced.
  - The design may be either:
    - univariate: one-variable description
    - multivariate: causal explanation

3 Time Series vs. Cross Sectional Designs
- A time series design is usually contrasted with cross-sectional designs, where the data are organized across a number of similar units.
  - The data are collected at the same time for every observation.
- Thus:
  - A data set consisting of 50 states for the year 1998 is a cross-sectional design.
  - A data set consisting of data for Alabama for 1948 – 1998 is a time series design.

4 Why time series or cross sections?
- Depends on your question.
- If you wish to explain why one state is different from another, use a cross-sectional design.
- If you wish to explain why a particular state has changed over time, use a time series design.

5 Time-Series Cross-Sectional Designs
- There are techniques for combining the two designs.
- Because of concerns about autocorrelation and estimation, we will examine this design later in the course.

6 Conceptual reasons to consider time series models  Classic regression models assume that all causation is instantaneous. This is clearly suspect.  In addition, behaviors are dynamic - they evolve over time.

7 What is Time anyway?
- Time may be a surrogate measure for other processes, e.g. maturation, aging, growth, inflation, etc.
- Many of the processes we are interested in are described in terms of their temporal behavior:
  - Policy impact
  - Arms races, growth and decay models, compound interest or inflation
  - Learning

8 Why time series?
- My personal view is that time series models are theoretically more fundamental than cross-sectional models.
- The models that we are really interested in are those that help us model how systems change across time, as opposed to what they look like at any given snapshot in time.
- Statistical tools may often improve their degrees of freedom by using time series methods. (Sometimes this means a larger n.)

9 The Nature of Time Series Problems
- Please note: time series problems are theoretical ones; they are not simply statistical artifacts.
  - When you have a time series problem, it means some non-random process out there has not been accounted for.
- And since there is usually something left out or not measurable, you usually have a time series problem!

10 A Basic Vocabulary for Time Series
- Period
- Cycle
- Season
- Stationarity
- Trend
- Drift

11 Periodicity
- The Period
  - A time series design is simple to distinguish because of its period.
  - The data set comprises measures taken at differing points in time.
  - The unit of analysis is the period (e.g. daily, weekly, monthly, quarterly, annual).
  - Note that the period defines the discrete time interval over which the data measurements are taken.

12 Cycle
- Uses classic trigonometric functions such as the sine and cosine to examine periodicity in the data.
- This is the basis for Fourier series and spectral analysis.
- Used primarily in economics, where there are data series measured over a long period of time with multiple regularly occurring and overlapping cycles.
- (Rarely used in Political Science, but try the commodity markets, with hog/beef/chicken cycles.)

13 Cycles (cont.)  A simple cyclic or trigonometric function might look like this:

14 Cycles (cont.)
- You could estimate a model like [equation not recovered]. But why would you?
- What theory do you have that suggests that political data follow such trigonometric periodicity?
  - Are wars cyclic?
  - Sunspots?
  - Would elections be cyclic?
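
The equation on this slide did not survive in the transcript. As an assumption only, one common single-cycle specification regresses the series on a sine and a cosine of known period P. The symbols a, b_1, b_2, P and e_t are illustrative and are not taken from the slide.

```latex
Y_t = a + b_1 \sin\!\left(\frac{2\pi t}{P}\right) + b_2 \cos\!\left(\frac{2\pi t}{P}\right) + e_t
```

Because the sine and cosine terms enter linearly, a model of this form could be estimated by ordinary regression once P is fixed.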

15 Seasonality
- Season
  - When a relationship or a data series has variation related to the unit of time, we refer to this as seasonality (e.g. Christmas sales, January tax revenues).
  - This most often occurs when we have discrete data.
  - Seasonality is thus the discrete-data equivalent of the continuous data assumed by spectral analysis.

16 Example of Seasonality

17 A closer look

18 Regression with Seasonal Effects
- Estimating a regression model with seasonal behavior in the dependent variable is relatively easy: [equation not recovered]
- where S_1 is a seasonal dummy. (S_1 is coded 1 when the observation occurs during that season, and 0 otherwise.)

19 Estimating Seasonality
- Like all dummy variable models, at least one season (category) must be excluded from the estimation.
- The intercept represents the mean of the excluded season(s).
  - Failure to exclude one of the seasonal dummies will result in a seasonal variable being dropped or biased estimation at best, and in all likelihood error messages about singular matrices or extreme multicollinearity.
- The slope coefficients represent the change from the intercept.
- t-tests are tests of whether the seasons are different from the intercept, not just different from 0.
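
The points above can be sketched in R with simulated quarterly data. The seasonal means, sample size and variable names are made-up assumptions for illustration, not values from the slides. `lm()` drops the first factor level automatically, which plays the role of the excluded season.

```r
# Simulated quarterly data: the seasonal means below are made-up values.
set.seed(1)
n_years <- 10
quarter <- factor(rep(1:4, times = n_years))   # seasons 1..4, repeating each year
season_means <- c(10, 12, 15, 20)
y <- season_means[as.integer(quarter)] + rnorm(4 * n_years, sd = 0.5)

# lm() expands the factor into dummies and excludes the first level (quarter 1),
# so the intercept is the mean of the excluded season.
fit <- lm(y ~ quarter)
intercept <- unname(coef(fit)["(Intercept)"])
q1_mean <- mean(y[quarter == "1"])

# Each slope is that season's difference from the excluded season.
print(round(coef(fit), 2))
print(c(intercept = intercept, q1_mean = q1_mean))
```

With no other regressors in the model, the intercept and the quarter 1 sample mean coincide, and `summary(fit)` would report t-tests of each season against quarter 1.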

20 Estimating Seasonality
- Estimating regression models with seasonality is a popular and valuable method in many circumstances (e.g. estimating tax revenues).

21 For Example

22 Stationarity  If a time series is stationary it means that the data fluctuates about a mean or constant level.  Many time series models assume equilibrium processes.

23 Non-Stationarity  Non-stationary data does not fluctuate about a mean. It may trend away or drift away from the mean

24 Example of Non-Stationarity

25 Trend  Trend indicates that the data increases or decreases regularly.

26 Drift  Drift means that the series ‘drifts’ away from its mean, but then drifts back at some later point.

27 Variance Stationarity  Variance Stationarity means that the variation in the data about the mean (or trend line) is constant across time.  Non-stationary variance would have higher variation at one end of the series or the other.

28 For instance Oxygen 18 isotope levels in Benthic Foraminifera

29 A Basic Vocabulary for Time Series  Random Process/ Stochastic Process The data is completely random. It has no temporal regularity at all.

30 A Basic Vocabulary for Time Series
- Trend
  - Means that the data increases or decreases over time.
  - Simplest form of time series analysis.
  - Uses a variable as a counter {X_i = 1, 2, 3, ..., n} and regresses the variable of interest on the counter.
  - This gives an estimate of the periodic increase/decrease in the variable (e.g. the monthly increase in GDP).
  - Problems occur for several reasons:
    - The first and last observations are the most statistically influential.
    - Very susceptible to problems of autocorrelation.
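
A minimal R sketch of the counter regression described above, using simulated data. The series, its true slope of 0.5 per period, and the variable names are illustrative assumptions rather than anything from the slides.

```r
# Simulated monthly series with a known upward trend (made-up numbers).
set.seed(2)
n <- 60
counter <- 1:n                                   # the time counter X = 1, 2, ..., n
y <- 100 + 0.5 * counter + rnorm(n, sd = 2)

# Regress the variable of interest on the counter.
trend_fit <- lm(y ~ counter)
slope <- coef(trend_fit)[["counter"]]            # estimated change per period
print(slope)
```

The errors here are simulated as independent, so the usual standard errors apply. With real series the residuals should be checked for autocorrelation before trusting them.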

31 Random walk
- If the data is generated by [equation not recovered], we call it a random walk.
- If B is equal to 0.0, the data is a pure random walk.
- If B is non-zero, then the series drifts away from the mean for periods of time, but may return (hence often called drift, or drift non-stationarity).
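
The generating equation on this slide was lost. As an assumption, the sketch below uses one standard textbook form, Y_t = B + Y_{t-1} + e_t, where B is a constant and e_t is white noise. The value B = 0.2 and the variable names are hypothetical, and the slide's own equation may have differed.

```r
# Assumed form: Y_t = B + Y_{t-1} + e_t, starting from Y_0 = 0.
set.seed(3)
n <- 500
e <- rnorm(n)                      # white-noise shocks
B <- 0.2                           # hypothetical constant term

pure_walk  <- cumsum(e)            # B = 0: each value is the previous value plus a shock
drift_walk <- cumsum(B + e)        # B != 0: the constant accumulates every period

# Both series share the same shocks, so they differ only by the accumulated constant.
gap <- drift_walk - pure_walk
print(head(gap))
```

Plotting both with `ts.plot(cbind(pure_walk, drift_walk))` shows how the two paths separate over time under this assumed form.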

32 Implications of a random walk
- Random walks imply that memory is infinite.
- Stocks are often said to follow random walks.
- And if so, they are largely unpredictable!

33 Unit Root tests
- A number of tests have emerged to test whether a data series is a Trend Stationary Process (TSP) or a Difference Stationary Process (DSP). Among them is the Dickey-Fuller test. (More on this in a few weeks.)
- The current literature seems to suggest that regression on differences is safer than regression on levels, due to the implications of TSP and DSP. We will return to this later.

34 More Unit Roots
- Defining unit roots as [equation not recovered]
- We can see that unit roots, a random walk, nonstationarity, and a stochastic trend can all be treated as the same thing.
- We can also see that if we difference a random walk, the resulting data is stationary.
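
The last claim can be checked numerically. This is a sketch on a simulated random walk; the series length and variable names are assumptions for illustration.

```r
# A random walk is the running sum of white-noise shocks.
set.seed(4)
walk <- cumsum(rnorm(1000))

# Lag-1 autocorrelation of the level series and of its first difference.
# acf()$acf[1] is lag 0 (always 1), so element 2 is lag 1.
acf_level <- acf(walk, lag.max = 10, plot = FALSE)$acf[2]
acf_diff  <- acf(diff(walk), lag.max = 10, plot = FALSE)$acf[2]
print(c(level = acf_level, differenced = acf_diff))
```

The level series shows a lag-1 autocorrelation close to 1, while the differenced series behaves like white noise.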

35 Autocorrelated error
- Also known as serial correlation
- Detected via:
  - The Durbin-Watson statistic
  - The Ljung-Box Q statistic (a χ² statistic)
    - Note that Maddala suggests that this statistic is inappropriate.
    - Probably not too bad in small-sample, low-order processes.
    - Q does not have as much power as the LM test.
  - The Portmanteau test
  - Lagrange multiplier (LM) test
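
As an illustration of the first detection tool, the Durbin-Watson statistic can be computed directly from regression residuals in base R. The simulated regression below, with AR(1) errors at 0.8, is an assumption made for the example.

```r
# Regression with positively autocorrelated errors (simulated).
set.seed(5)
n <- 200
x <- rnorm(n)
u <- as.numeric(arima.sim(list(ar = 0.8), n = n))   # AR(1) disturbances
y <- 1 + 2 * x + u

res <- residuals(lm(y ~ x))

# Durbin-Watson: sum of squared successive differences over sum of squares.
dw <- sum(diff(res)^2) / sum(res^2)
rho_hat <- 1 - dw / 2        # rough implied first-order autocorrelation
print(c(dw = dw, rho_hat = rho_hat))
```

Values near 2 indicate no first-order autocorrelation; values well below 2, as here, indicate positive autocorrelation.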

36 Autocorrelated error (cont.)
- If autocorrelation is present, then the standard errors are underestimated, often by quite a bit, especially if there is a trend present.
- Test for AC, and if present, use:
  - the Cochrane-Orcutt method
  - the Hildreth-Lu method
  - Durbin's method
  - Method of first differences
  - Feasible generalized least squares
  - Prais-Winsten estimator
  - Others!

37 Certain models are quite prone to autocorrelation problems
- Distributed lags
  - The effect of X on Y occurs over a longer period.
  - There are a number of distributed lag models:
    - Finite distributed lags
    - Polynomial lags
    - Geometric lags
    - Almon lag
    - Infinite distributed lags
    - Koyck scheme

38 Lagged Endogenous variables  In addition, there are models which describe behavior as a function of both independent influences as well as the previous level of Y.  These models are often quite difficult to deal with.  The Durbin-Watson D is ineffective - use Durbin’s h

39 Some common models with lagged endogenous variables  Naive expectations  The Adaptive Expectations model  The Partial Adjustment model  Rational Expectations

40 Remedies for autocorrelation with lagged endogenous variables
- The 2SLS-IV solution:
  - Regress Y_t on all X_t's and X_{t-1}'s.
  - Then take the Y-hats and use them as an instrument for the lagged Y's in the original model.
  - The Y-hats are guaranteed to have the autocorrelated component theoretically purged from the data series.
- Have fun!

41 Non-linear estimation
- Not all models are linear.
- Models such as exponential growth are relatively tractable.
- They can be estimated with OLS with the appropriate transformation.
- But a model like [equation not recovered] is somewhat more difficult to deal with.

42 Nonlinear Estimation (cont.)  The (1-c) parameter may be estimated as a B, but the t-test will not tell us if c is different from 0.0, but rather whether 1-c is different from 0.0. Thus the greater the rate of decay, the worse the test. Hence we wish to estimate the equation in its intractable form. There may be analytic solutions or derivatives that may be employed, but conceptually the grid search will suffice for us to see how non-linear estimation works.

43 Intractable Non-linearity  Occasionally we have models that we cannot transform to linear ones.  For instance a logit model Or an equilibrium system model

44 Intractable Non-linearity
- Models such as these must be estimated by other means.
- We do, however, keep the criterion of minimizing the squared error as our means of determining the best model.

45 Estimating Non-linear models  All methods of non-linear estimation require an iterative search for the best fitting parameter values.  They differ in how they modify and search for those values that minimize the SSE.

46 Methods of Non-linear Estimation
- There are several methods of selecting parameters:
  - Grid search
  - Steepest descent
  - Marquardt's algorithm

47 Grid search estimation
- In a grid search estimation, we simply try out a set of parameters across a set of ranges and calculate the SSE.
- We then ascertain where in the range (or at which end) the SSE was at a minimum.
- We then repeat, either extending the range or reducing the range and searching with a smaller grid around the estimated minimum SSE.
- Try the spreadsheet.
- Try this for homework!
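
The procedure above can be sketched in R for a one-parameter model. The exponential decay model y = exp(-decay * t), the true value 0.12, and the grid ranges are all assumptions chosen for illustration; they are not the model from the lecture.

```r
# Simulated data from a hypothetical decay model.
set.seed(7)
time_idx <- 1:50
true_decay <- 0.12
y <- exp(-true_decay * time_idx) + rnorm(length(time_idx), sd = 0.01)

# Sum of squared errors for a candidate parameter value.
sse <- function(decay) sum((y - exp(-decay * time_idx))^2)

# Pass 1: coarse grid over a wide range.
coarse_grid <- seq(0, 1, by = 0.1)
coarse_best <- coarse_grid[which.min(sapply(coarse_grid, sse))]

# Pass 2: finer grid around the coarse minimum.
fine_grid <- seq(max(0, coarse_best - 0.1), coarse_best + 0.1, by = 0.001)
fine_best <- fine_grid[which.min(sapply(fine_grid, sse))]
print(c(coarse = coarse_best, fine = fine_best))
```

If the minimum had landed on an end of the coarse grid, the next step would be to extend the range rather than narrow it.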

48 Mathematical Operators
- Today is a special day.
- You have very few of them in life like today. (Although two weeks ago was special in the same way, and in 5 or 6 weeks there will be another like it.)
- You get to learn a new mathematical operator!

49 A List of Common Operators
- These are the ones you know:
  +         Addition
  -         Subtraction
  x or *    Multiplication
  / or ÷    Division
  X^n       Exponentiation
  √         Root
  !         Factorial
  Σ         Summation
  | x |     Absolute value

50 And some you may or may not know!
- Δ
- ∂
- ∫
- Plus a number of relational operators and symbols: ±, ≤, ≥, =, ≠, ≅, ∞, e, π
- So it's time for a new one!

51 The Backshift Operator
- The backshift operator B refers to the previous value of a data series.
- Thus [equation not recovered]
- Note that this can extend over longer lags.
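
The slide's equation is missing from the transcript. The standard definition of the backshift operator, written here for a series called Y_t (the series name is an assumption), is:

```latex
B\,Y_t = Y_{t-1}, \qquad
B^2 Y_t = B\,(B\,Y_t) = Y_{t-2}, \qquad
B^k Y_t = Y_{t-k}
```

In this notation a first difference is written (1 - B) Y_t = Y_t - Y_{t-1}.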

52 UNIVARIATE TIME SERIES
- Autoregressive Processes
- A simple autoregressive model: [equation not recovered]
  - This is an AR(1) process.
  - The level of a variable at time t is some proportion of its previous level at t-1.
  - This is called exponential decay (if ϕ is less than unity, 1.0).

53 Autoregressive Processes  An autoregressive process is one in which the current value is a function of its previous value, plus some additional random error.  1st order autocorrelation in the residuals in regression analysis is the most frequently discussed example. Keep in mind that with serial correlation in regression analysis we are talking about the residuals, not a variable. An autoregressive process may be observed in the X’s, the Y or the residuals

54 Higher Order AR Processes
- Autoregressive processes of higher order do exist: AR(2), ..., AR(p).
- In general, be suspicious of anything higher than a 3rd order process: why should life be so abstractly complex?
- The general form of the AR(p) process using backshift notation is: [equation not recovered]
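
The AR equations were images on the slides and are not in the transcript. In the usual textbook notation, using ϕ for the autoregressive parameters as the later slides do and assuming a mean-zero series Y_t with white-noise error e_t, they can be written as:

```latex
\begin{aligned}
\text{AR}(1):\quad & Y_t = \phi_1 Y_{t-1} + e_t \\
\text{AR}(p):\quad & Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + e_t \\
\text{backshift form:}\quad & \left(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p\right) Y_t = e_t
\end{aligned}
```

The lecture's own symbols and any intercept term may have differed from this sketch.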

55 Moving average processes
- Moving averages depend not on the level of the last time point, but rather on the last time point's error.
- Thus an MA(1) is represented by [equation not recovered]

56 The General MA(q) Process
- The general MA(q) model is: [equation not recovered]
- Again, higher order processes do exist: MA(2), ..., MA(q).
- As with AR(p) processes, be suspicious of anything higher than a 3rd order process. Again, why should life be so abstractly complex?
- The general form using backshift notation is: [equation not recovered]
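
The MA equations are also missing from the transcript. One standard way to write them, assuming a mean-zero series and the plus-sign convention that R's `arima()` and `arima.sim()` use, is shown below. Box and Jenkins write the same model with minus signs, so the lecture's sign convention may have differed.

```latex
\begin{aligned}
\text{MA}(1):\quad & Y_t = e_t + \theta_1 e_{t-1} \\
\text{MA}(q):\quad & Y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q} \\
\text{backshift form:}\quad & Y_t = \left(1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q\right) e_t
\end{aligned}
```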

58 Mixed Processes  You can have both going on at the same time.  Again, question the use of increasing statistical model complexity without some theoretical appeal.

59 ARIMA Models
- Hence we have the following basic or frequently encountered models:
  ARIMA(0,0,0)   ARIMA(0,1,0)
  ARIMA(1,0,0)   ARIMA(1,1,0)
  ARIMA(2,0,0)   ARIMA(2,1,0)
  ARIMA(0,0,1)   ARIMA(0,1,1)
  ARIMA(0,0,2)   ARIMA(0,1,2)
  ARIMA(1,0,1)   ARIMA(1,1,1)
  ARIMA(2,0,2)   ARIMA(2,1,2)
  ARIMA(p,d,q)

60 Seasonality
- In some types of data there is a seasonal regularity.
  - In regression we used seasonal dummies.
  - In ARIMA, we use seasonal differencing.
- Hence a series of monthly observations might require seasonal differencing to become stationary.
- Thus the Mauna Loa CO2 data might be an ARIMA(p,d,q)(P,D,Q).
  - I would guess a (1,1,0)(0,1,0) model with a seasonal period of 12.
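
A seasonal model of this kind can be tried in R on the `co2` series that ships with base R (monthly Mauna Loa CO2 concentrations, 1959 to 1997). This is a sketch of how such a guess could be checked, assuming a regular order of (1,1,0) and one seasonal difference at period 12; it is not a claim that this is the best-fitting model.

```r
# co2 is a monthly ts object in the base R datasets package.
fit <- arima(co2,
             order = c(1, 1, 0),                                   # AR(1) on first differences
             seasonal = list(order = c(0, 1, 0), period = 12))     # one seasonal difference

res <- residuals(fit)
print(fit)

# Ljung-Box test on the residuals; fitdf = 1 for the single estimated AR parameter.
print(Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 1))
```

If the residual test rejects white noise, the trial model would be re-specified, for example by adding a seasonal MA term.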

61 Fitting Box-Jenkins Models
- There is a three step process to fitting a Box-Jenkins ARIMA model:
  - Identification
  - Estimation
  - Diagnosis
- Here is a flowchart.

62

63 The Autocorrelation function  We identify the nature of these processes by looking at the Autocorrelation Function (ACF), and the Partial autocorrelation function (PACF).  These are essentially graphs of simple Pearson’s r’s calculated by correlating the variable with its lag at varying intervals.  Plots of these ACFs and PACFs reveal certain characteristic patterns for certain processes.
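
To make the "Pearson's r with its lag" description concrete, the sketch below compares a plain correlation between a simulated series and its own first lag with the lag-1 value reported by `acf()`. The AR(1) series with coefficient 0.7 is an assumption for the example.

```r
set.seed(10)
n <- 500
x <- as.numeric(arima.sim(list(ar = 0.7), n = n))

# Pearson correlation between x_t and x_{t-1}.
pearson_lag1 <- cor(x[-1], x[-n])

# acf() uses the overall mean and divides by n, so it differs slightly.
acf_lag1 <- acf(x, lag.max = 5, plot = FALSE)$acf[2]
print(c(pearson = pearson_lag1, acf = acf_lag1))
```

The two numbers agree closely but not exactly, which is why the slide says "essentially".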

64 Identification
- Visually inspect for stationarity.
  - Difference the data if trend or drift is present.
  - Take logs if the differenced data appears to have variance non-stationarity.
- Examine autocorrelations and partial autocorrelations.
- Select a trial noise model.

65 R Code for ARIMA
ts.sim <- arima.sim(list(ar = 0.7), n = 2000)    # simulate an ARIMA(1,0,0)
# ts.sim <- arima.sim(list(ma = 0.7), n = 200)   # alternative: simulate an ARIMA(0,0,1)
ts.plot(ts.sim)                                  # plot the simulated data
acf(ts.sim)                                      # plot the autocorrelations
pacf(ts.sim)                                     # plot the partial autocorrelations
arima(ts.sim, order = c(1, 0, 0))                # estimate an ARIMA(1,0,0)
ts.def <- arima(ts.sim, order = c(1, 0, 0))      # save the model estimates
r0 <- residuals(ts.def)                          # extract the residuals
acf(r0)                                          # examine the residual ACF
pacf(r0)                                         # examine the residual PACF
Box.test(r0, lag = 12, type = "Ljung-Box", fitdf = 1)   # test the residuals; fitdf = number of estimated ARMA parameters
install.packages("TSA")                          # optional: get the advanced time series package "TSA"
library(TSA)                                     # load the TSA package

66 Autocorrelation patterns AR(1) ϕ =.7

67 Autocorrelation patterns MA(1) θ=-.7

68 Autocorrelation patterns MA(1) θ = .7 (note the alternating pattern in the PACF)
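
The alternating pattern can be reproduced without simulation noise using `ARMAacf()` from base R, which returns theoretical autocorrelations. This sketch assumes R's plus-sign convention for the MA parameter, Y_t = e_t + θ e_{t-1}.

```r
theta <- 0.7

# Theoretical ACF for lags 0..4: non-zero only at lag 1 for an MA(1).
theo_acf <- ARMAacf(ma = theta, lag.max = 4)

# Theoretical PACF for lags 1..4: decays while alternating in sign.
theo_pacf <- ARMAacf(ma = theta, lag.max = 4, pacf = TRUE)

print(round(theo_acf, 3))
print(round(theo_pacf, 3))
```

The ACF cuts off after lag 1 while the PACF tails off with alternating signs, which is the signature used to identify an MA(1) with positive θ under this convention.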

69 Autocorrelation patterns AR(2) ϕ1 = .7, ϕ2 = -.3

70 Autocorrelation patterns MA(2) θ1 = -.7, θ2 = -.3

71 Estimation  Fit the trial noise model with the estimation routine.  Ensure that the parameters are significant.

72 Diagnosis
- Ensure that the residuals are a white noise process via the χ² test. (Note Maddala's objection to this test, but a significant χ² test can still be accepted as evidence of non-random residuals.)
- Where two models appear comparable, choose the one with the lower RMSE (root mean squared error).
- If the noise model is not random, re-specify and estimate again.

73 The Full ARIMA Model Specification
- The full model appears complex…
- …And it is!

74 Intervention analysis
- In many types of models we are interested in the impact of a policy upon some dependent variable.
- The policy might be any number of things, many of which do not lend themselves to easy measurement:
  - The Clean Air Act
  - The wage and price controls of the Nixon administration
  - The Arab Oil Embargo
  - The "3 strikes and you're out" law
  - The 55 MPH speed limit
  - Moratorium on the death penalty
  - Roe v. Wade

75 Time Series Design
- All of these are examples of policy impact assessment.
  - They are simple interrupted time series designs.
- In Campbell and Stanley, this is: O O O O O O X O O O O O O
- Note that this design is quite subject to the accident of history.

76 How to Measure a Policy
- There are two crucial measurement issues here:
  (1) When did the policy change?
  (2) Was the change permanent (step) or temporary (pulse)?

77 General Impact Assessment Models
- These models use ARIMA models as a noise component.
- The noise component is simply the temporal regularity remaining in the output series Y_t after the impact of the intervention (I_t) has been captured.
- There are two basic types:
  - Pulse
  - Step
- The general form of the model is: [equation not recovered]
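
A minimal R sketch of a step intervention with an ARIMA noise component, using simulated data. The intervention date (period 101), the true step effect of 3, the AR(1) noise and all variable names are assumptions for illustration. `arima()` accepts the intervention variable through its `xreg` argument.

```r
set.seed(8)
n <- 200
step  <- as.numeric(seq_len(n) > 100)    # 0 before the policy, 1 from period 101 onward
pulse <- as.numeric(seq_len(n) == 101)   # 1 only in the period the policy begins (shown for coding only)

# Output series: level 10, a permanent shift of 3, plus AR(1) noise.
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- 10 + 3 * step + noise

# AR(1) noise model with the step dummy as an external regressor.
fit <- arima(y, order = c(1, 0, 0), xreg = cbind(step = step))
step_effect <- coef(fit)[["step"]]
print(coef(fit))
```

Replacing `step` with `pulse` in `xreg` would instead test for a one-period, temporary effect.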

78 Step Function
- A simple step function represents a change in equilibrium.
- Sometimes referred to as a mean shift model.

79 Asymptotic Change Model  Not all impacts are instantaneous  Some events take time to run their full course  Thus we would model such an event as:

80 Ramp Model

81 The Pulse  The Simple Pulse model describes temporary change.

82 Impulse Decay

83 Equilibrium Shift Model

84 For example
- I have used intervention models to estimate:
  - The impact of Advanced Waste Treatment on water quality
  - The impact of the Arab Oil Embargo on US foreign policy towards Arab nations and Israel
  - The impact of oil shocks on low-sulfur residual fuel oil spot market prices
- From the literature:
  - Rick Waterman
  - B. Dan Wood
  - Chubb & Moe

85 Transfer functions
- Full multivariate time series analysis in the Box-Jenkins tradition is called transfer function analysis.
- The temporal dynamics of one process are transferred to another.

86 Modeling the impact  If X does indeed cause Y, then the ARMA process inherent in X will also be reflected in Y.  In order to see the impact, we must remove (pre-whiten) the ARMA process in X from Y.  Then we need to model the remaining impact and noise model left in Y

87 Cross-correlations
- To do this, we look at the cross-correlations between X and Y.
- This also lets us assess the direction of causation.
- See the Peace Project in the Middle East for a really strange example of this.
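
A small R sketch of reading lead and lag structure from cross-correlations. The data are simulated so that Y responds to X two periods later; the lag of 2, the effect size and the variable names are assumptions. X is white noise here, so no pre-whitening is needed in this toy case.

```r
set.seed(9)
n <- 500
x <- rnorm(n)

# y at time t depends on x at time t - 2, plus independent noise.
y <- c(0, 0, 0.8 * x[1:(n - 2)]) + rnorm(n, sd = 0.5)

# In R, the lag k value of ccf(x, y) estimates the correlation of x[t + k] with y[t].
cc <- ccf(x, y, lag.max = 10, plot = FALSE)
peak_lag <- cc$lag[which.max(abs(cc$acf))]
print(peak_lag)
```

A peak at a negative lag in `ccf(x, y)` means X leads Y, which is the pattern expected if X drives Y rather than the reverse.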

88 Class Exercise
- Using the US Budget Data set (http://www.polsci.wvu.edu/duval/ps791c/Notes/Stata/outlays-2002.dta):
  1. Examine the spending data.
  2. Select a sector of the budget and identify its ARIMA process.
  3. Calculate a ratio to the deficit and estimate its ARIMA process.
  4. Lastly, specify an intervention (e.g. Presidential administration) and add that to the model to test for a step function.

