John G. Zhang, Ph.D. Harper College


1 Looking Ahead of the Curve: An ARIMA Modeling Approach to Enrollment Forecasting
John G. Zhang, Ph.D.
Harper College
jzhang@harpercollege.edu

2 Topics
Why forecast
How to forecast
Why ARIMA
What is ARIMA
How to ARIMA
How ARIMA did
Discussion
47th AIR Annual Forum

3 Why Forecast
Queries and reports: what was
Dashboard: what is
Forecasts: what will be
Forecast for enrollment: more valuable for resource planning

4 How to Forecast
Naïve forecast: random walk, moving average
Exponential smoothing
Markov chain
Regression
ARIMA
Others
Combining methods
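The simplest methods on this list can be sketched in a few lines. The following is a minimal illustration, not the author's code: the function names and the enrollment numbers are made up, and the smoothing level is assumed to start at the first observation.

```python
from statistics import mean

def random_walk_forecast(series):
    """Naive forecast: the next value equals the last observed value."""
    return series[-1]

def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` values."""
    return mean(series[-window:])

def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; returns the one-step-ahead forecast.

    The level is initialised at the first observation (an assumption).
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical enrollment counts, made up for illustration only.
enrollment = [100, 104, 103, 108, 110]
print(random_walk_forecast(enrollment))
print(moving_average_forecast(enrollment, window=3))
print(exponential_smoothing_forecast(enrollment, alpha=0.5))
```

All three produce a single next-period value, which is one reason they are usually treated as baselines rather than multi-period forecasting tools.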

5 Why ARIMA
Naïve forecast: best guess if no patterns
Exponential smoothing: usually designed for one-step-ahead forecasts
Markov chain: see reference
Regression: frequently violates the assumption of uncorrelated errors
ARIMA: worked well, more later
Others: see reference
Combining methods: non-directional
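One common way to see the regression problem noted above is the Durbin-Watson statistic on the residuals of a trend regression. The sketch below is an illustration only: the slides do not say which diagnostic was used, and the series is hypothetical. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
def linear_trend_residuals(y):
    """Residuals from an ordinary least squares fit of y on t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [v - (intercept + slope * t) for t, v in enumerate(y)]

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical series with a trend and a repeating within-year pattern.
series = [10, 12, 15, 13, 11, 13, 16, 14, 12, 14, 17, 15]
dw = durbin_watson(linear_trend_residuals(series))
print(round(dw, 3))
```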

6 What is ARIMA
AutoRegressive Integrated Moving Average
Generally, the model is given by
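The equation on this slide did not survive in the transcript. One general form that agrees with the symbols defined on slide 7 and reduces to the special cases on slide 8 when d = 0 is given below; it is a reconstruction under those assumptions, not necessarily the author's exact notation.

```latex
\[
\Bigl(1 - \sum_{i=1}^{p} \phi_i B^{i}\Bigr)\,(1 - B)^{d}\,(X_t - \theta_0)
  \;=\;
\Bigl(1 - \sum_{i=1}^{q} \theta_i B^{i}\Bigr)\,\varepsilon_t ,
\qquad B^{i} X_t = X_{t-i}.
\]
```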

7 where
Xt is the time series value at time t,
θ0 is a constant,
B is the backshift (lag) operator,
i is the number of lags or spans,
εt is the error term at time t,
φ and θ are the AR and MA parameters, and
p, d, and q are the orders of AR, I, and MA

8 If p = 1, d = 0, q = 1, ARMA(1, 1): (1 - φ1B)(Xt - θ0) = (1 - θ1B)εt
If p = 1, d = 0, θ1 = 0, AR(1) model: (1 - φ1B)(Xt - θ0) = εt
If p = 1, φ1 = 1, d = 0, θ1 = 0, random walk: (1 - B)(Xt - θ0) = εt
If φ1 = 0, d = 0, θ1 = 0, constant: (Xt - θ0) = εt
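The special cases on this slide can be checked by simulating the ARMA(1, 1) recursion and switching parameters off. This is a minimal sketch: the function name, parameter values, seed, and the zero start-up values are all assumptions for illustration.

```python
import random

def simulate_arma11(phi1, theta1, theta0, n, seed=0):
    """Simulate (1 - phi1*B)(X_t - theta0) = (1 - theta1*B) eps_t.

    Returns (series, shocks). The deviation and shock before the first
    step are taken to be 0 (an assumed start-up convention).
    """
    rng = random.Random(seed)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = []
    prev_dev, prev_eps = 0.0, 0.0
    for e in eps:
        dev = phi1 * prev_dev + e - theta1 * prev_eps
        x.append(theta0 + dev)
        prev_dev, prev_eps = dev, e
    return x, eps

arma, _ = simulate_arma11(0.6, 0.3, 50.0, 200)         # ARMA(1, 1)
ar1, _ = simulate_arma11(0.6, 0.0, 50.0, 200)          # AR(1): theta1 = 0
walk, walk_eps = simulate_arma11(1.0, 0.0, 50.0, 200)  # random walk: phi1 = 1
const, const_eps = simulate_arma11(0.0, 0.0, 50.0, 200)  # constant plus noise
```

In the random walk case each step of the series equals the current shock, and in the constant case the series is just θ0 plus the shock, which matches the last two lines of the slide.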

9 How to ARIMA
Box and Jenkins (1976) notation: (p, d, q)(P, D, Q)s, with the nonseasonal orders first, the seasonal orders second, and s the length of the seasonal period
Four stages:
  Identification
  Estimation
  Validation
  Forecasting

10 How to ARIMA
SPSS Trends module:
  version 12 worked well
  versions 13 and 14: algorithms changed; same data, same program, different forecast
SAS ETS module: ARIMA procedure
  more flexible
  forecast consistent
  automation possible thanks to macros

11 Identification
Series plot
Autocorrelation plot
Dickey-Fuller test of the unit root hypothesis
AR models to compare the log likelihood values for a series and its transformed series
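The autocorrelation plot is built from the sample autocorrelations, which can be computed directly. The sketch below uses the usual estimator (lagged cross-products of deviations from the mean, divided by the lag-0 sum of squares); the function name and data are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a series."""
    n = len(series)
    m = sum(series) / n
    dev = [x - m for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]

# A short trending series, made up for illustration.
acf = sample_acf([1, 2, 3, 4, 5], max_lag=2)
print(acf)
```

An ACF that dies out slowly across many lags is the usual visual hint that a series is non-stationary and may need differencing.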

12 Identification
Degree of differencing
Order of AR
Order of MA
Seasonality, if any

13 Estimation
Q statistics
Goodness-of-fit criteria:
  variance estimate
  Akaike information criterion
  Schwarz Bayesian criterion
Significance of parameters
Residual analysis
Mean Absolute Percent Error
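The two information criteria listed above are simple functions of the maximized log likelihood. The sketch below uses the standard definitions; the function names and the numbers in the example are hypothetical. Lower values indicate a better trade-off between fit and the number of parameters.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 ln(L) + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def sbc(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian criterion: -2 ln(L) + k ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical fit: log likelihood -100 with 3 parameters on 100 observations.
print(aic(-100.0, 3))
print(sbc(-100.0, 3, 100))
```

Because ln(n) exceeds 2 once n is 8 or more, the Schwarz criterion penalizes extra parameters more heavily than the Akaike criterion on all but very short series.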

14 Data
Time series data
Date variable: year, quarter, month, week, day, hour, minute, second
Enrollment data: FTE, headcount, seatcount
Data points
Nature of the series determines the forecast

15 Patterns of Data
Trend: steady increase or decrease in the values of a time series
Cycle: long-term patterns of rising and falling data
Seasonality: regular change in the data values that occurs at the same time in a given period

16 FTE

17 FTE Pattern
Trend: FTE increased from 1998 to 2006, suggesting the series is non-stationary and differencing is necessary
Seasonality: FTE is higher in the Fall and Spring and lower in the Summer every year, implying a seasonal factor should be part of the model-building process
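Differencing is the operation that removes both patterns. The sketch below applies it to a made-up series; the three terms per year, the trend slope, and the seasonal offsets are assumptions for illustration and are not Harper College data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t >= lag, i.e. (1 - B^lag) once."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical series: linear trend plus a pattern repeating every 3 terms.
season = [5.0, 3.0, -8.0]
fte = [100.0 + 2.0 * t + season[t % 3] for t in range(18)]

# First difference removes the linear trend; a repeating pattern remains.
first_diff = difference(fte, lag=1)

# Seasonal difference (lag 3) of that result removes the repeating pattern.
both = difference(first_diff, lag=3)
print(first_diff[:6])
print(both[:6])
```

With real data the twice-differenced series would not be exactly zero; what is left over is the part the AR and MA terms are meant to model.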

18 Autocorrelations and Partial Autocorrelations (ACF and PACF)
[Figure: text bar plots of the ACF and the PACF against lag; the lag numbers and correlation values did not survive extraction]
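The partial autocorrelations can be obtained from the autocorrelations with the Durbin-Levinson recursion. The sketch below is an illustration, not the procedure's internal code: the function name is hypothetical, and the input is the theoretical ACF of an AR(1) process with parameter 0.5, chosen because its PACF is known to cut off after lag 1.

```python
def pacf_from_acf(rho):
    """Durbin-Levinson recursion.

    rho holds autocorrelations for lags 0..K (rho[0] == 1); the result
    holds partial autocorrelations for lags 1..K.
    """
    pacf = []
    phi_prev = []  # coefficients of the previous order, phi_{k-1,1..k-1}
    for k in range(1, len(rho)):
        if k == 1:
            phi_kk = rho[1]
            phi = [phi_kk]
        else:
            num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                   for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

# Theoretical ACF of an AR(1) process with phi1 = 0.5: rho_k = 0.5 ** k.
rho = [0.5 ** k for k in range(6)]
pacf = pacf_from_acf(rho)
print(pacf)
```

Reading the two plots together is the basis of identification: a PACF that cuts off after lag p points toward an AR(p) term, while an ACF that cuts off after lag q points toward an MA(q) term.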

19 Q Statistics
[Table: Autocorrelation Check of Residuals, with columns To Lag, Chi-Square, DF, Pr > ChiSq, and Autocorrelations; the values did not survive extraction]
Q statistics show that the autocorrelations at various lags are highly statistically significant
Autocorrelations were very high
Further action needed
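A Q statistic of this kind can be computed from the residual autocorrelations. The sketch below assumes the Ljung-Box form, Q = n(n + 2) Σ r_k² / (n − k); the slides do not state which variant was used, and the function name and data are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [e - m for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

# Made-up residuals, for illustration only.
q = ljung_box_q([1, 2, 3, 4, 5], max_lag=1)
print(q)
```

Q is compared with a chi-square distribution whose degrees of freedom are the number of lags minus the number of estimated AR and MA parameters. A small p-value means the residuals are still autocorrelated, so the model needs further work.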

20 FTE Forecast

21 How ARIMA Did
Accuracy: what matters most
Horizon            FTE      HC
2-period ahead     0.74%    0.50%
6-period ahead     1.43%    1.65%
10-period ahead    1.40%    2.52%
Forecast error grows further into the future
Eleanor S. Fox (2005): 1.2% (4), 4.1% (8)
NCES (2003): 1.9% (2), 3.6% (6)
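The slide does not say which error measure lies behind these percentages. Slide 13 lists the Mean Absolute Percent Error, so the sketch below assumes that measure; the function name and the numbers in the example are made up and are not the presentation's data.

```python
def mape(actual, forecast):
    """Mean Absolute Percent Error, in percent. Actual values must be nonzero."""
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical actual and forecast values for a 2-period-ahead check.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mape(actual, forecast))
```

Computing the measure separately for each forecast horizon, on periods held out of the fit, gives a table of the same shape as the one above.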

22 Discussion
In theory, other factors can be included along with the time series itself, as in regression:
  Unemployment rate
  Consumer Price Index (CPI)
  High school student population
  District population
  Tuition
Forecasts used for forecasting? (future values of these factors would themselves have to be forecast)

23 Discussion
Stationarity and homogeneity
Scarcity and spuriousness
Seasonality and outliers
Raw or cooked data
Data mining and stepwise
Fit and accuracy
Additive or multiplicative (subset/factored)
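The last item can be made concrete with a small example. It assumes one nonseasonal AR term at lag 1 and one seasonal AR term at lag s; both orders are chosen for illustration and are not taken from the slides. The additive (subset) form simply includes the two lags, while the multiplicative (factored) form multiplies two polynomials and so implies an extra cross term at lag s + 1.

```latex
\begin{align*}
\text{Subset (additive):}\quad
  & (1 - \phi_1 B - \phi_s B^{s})\,(X_t - \theta_0) = \varepsilon_t \\
\text{Factored (multiplicative):}\quad
  & (1 - \phi_1 B)(1 - \Phi_s B^{s})\,(X_t - \theta_0) = \varepsilon_t \\
  & \text{where } (1 - \phi_1 B)(1 - \Phi_s B^{s})
    = 1 - \phi_1 B - \Phi_s B^{s} + \phi_1 \Phi_s B^{s+1}
\end{align*}
```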

24 Discussion
Science and art
Objective and subjective
Quantitative and qualitative
Over-differencing and over-fitting
Parsimony and uncertainty
Simple or complex

