Thursday, March 31, 2011

CB features: ARIMA Modeling in CB Predictor

As mentioned in the announcement post for the new release, it is now possible to model time series using univariate ARIMA (AutoRegressive Integrated Moving Average) models. Both seasonal and non-seasonal data can be modeled this way. ARIMA modeling is an advanced, powerful and flexible technique for time series with complex patterns. Popularized by Box and Jenkins in the 1970s, ARIMA models are now used extensively to model many different types of time series.

Automatic and Custom ARIMA Modeling
The classic ARIMA modeling technique is a subjective, iterative process consisting of the following steps:
  • Model identification and selection
  • Estimation of autoregressive (AR), integration or differencing (I), and moving average (MA) parameters
  • Model checking
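A common diagnostic in the model-checking step is a portmanteau test on the fitted model's residuals. The post does not describe CB Predictor's diagnostics, so the Python sketch below is only an illustration of the Ljung-Box statistic. The function names `acf` and `ljung_box_q` and the choice of lag `h` are hypothetical.

```python
def acf(x, k):
    """Sample autocorrelation of x at lag k (biased 1/n autocovariance estimator)."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x) / n
    ck = sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n
    return ck / c0


def ljung_box_q(residuals, h):
    """Ljung-Box statistic Q = n(n + 2) * sum_{k=1..h} r_k**2 / (n - k).

    If the residuals of an ARMA(p, q) fit are white noise, Q is approximately
    chi-square distributed with h - p - q degrees of freedom.
    """
    n = len(residuals)
    return n * (n + 2) * sum(acf(residuals, k) ** 2 / (n - k) for k in range(1, h + 1))


# Residuals that alternate in sign are strongly autocorrelated at lag 1,
# so Q is large and the model would fail the check.
bad_residuals = [(-1.0) ** t for t in range(100)]
print(round(ljung_box_q(bad_residuals, h=1), 2))  # prints 100.98
```

A large Q relative to the chi-square reference distribution suggests that the model has left structure in the residuals. In that case the analyst returns to the identification step.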
One can perform all of these tasks using the Crystal Ball Predictor (henceforth, CB Predictor) implementation of ARIMA. However, these steps are time-consuming and require in-depth knowledge of the technique, and two experts performing the same steps may still arrive at different models. For this reason, CB Predictor can also automatically estimate the best ARIMA model for a given series. With the default settings, CB Predictor automatically calculates an ARIMA model (seasonal or non-seasonal, depending on the seasonality of the series) and adds it to the list of results in the result window. There, the ARIMA results can be compared with the results from the other classic methods.

If, on the other hand, a user wants to explore ARIMA modeling herself, CB Predictor can also fit custom ARIMA models to the time series. In this mode, multiple ARIMA models, both non-seasonal and seasonal, can be fitted to the data, and the results of each model are added to the result window for comparison. Note: seasonal models cannot be fitted to non-seasonal data.

User Interface
Figure 1: ARIMA UI
In keeping with our tradition of developing user-friendly interfaces for quick analysis, the new ARIMA modeling interface is both easy to use and feature-rich, with advanced settings just a click away. The interface is shown in Figure 1.

Figure 2: Advanced ARIMA Options
As mentioned above, one can choose either automatic ARIMA modeling or custom models. In automatic mode, we evaluate a number of candidate models for each series, based on the characteristics of the data, and choose the one that minimizes the selected information criterion. More on this later. The advanced options are available by clicking the ‘ARIMA Options...’ button. The dialog is shown in Figure 2.
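The Python sketch below illustrates information-criterion selection in a deliberately simplified form. All of the following are assumptions made for illustration and do not describe CB Predictor's search:

* The candidates are pure AR(p) models.
* The models are fitted with the Yule-Walker equations (via the Levinson-Durbin recursion) rather than by maximum likelihood.
* The criterion is AIC, computed as n ln(sigma2) + 2(p + 1).

The names `yule_walker_variances` and `select_ar_order` are hypothetical.

```python
import math


def autocov(x, k):
    """Biased (1/n) sample autocovariance of x at lag k."""
    n = len(x)
    m = sum(x) / n
    return sum((x[t] - m) * (x[t - k] - m) for t in range(k, n)) / n


def yule_walker_variances(x, max_p):
    """Levinson-Durbin recursion: innovation variance of the Yule-Walker
    AR(p) fit for every order p = 0..max_p."""
    r = [autocov(x, k) for k in range(max_p + 1)]
    phi = []          # AR coefficients of the current order
    sigma2 = [r[0]]   # sigma2[p] = innovation variance of the AR(p) fit
    for k in range(1, max_p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / sigma2[-1]  # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        sigma2.append(sigma2[-1] * (1.0 - kappa * kappa))
    return sigma2


def select_ar_order(x, max_p):
    """Return (best_p, aic_values) with AIC = n*ln(sigma2) + 2*(p + 1)."""
    n = len(x)
    aic = [n * math.log(s2) + 2 * (p + 1)
           for p, s2 in enumerate(yule_walker_variances(x, max_p))]
    best_p = min(range(len(aic)), key=aic.__getitem__)
    return best_p, aic


x = [(-1.0) ** t for t in range(100)]  # strongly negative lag-1 dependence
best_p, aic = select_ar_order(x, max_p=3)
print(best_p)  # prints 1
```

A full automatic ARIMA search would also vary d, q and the seasonal orders. The final step stays the same: compute the criterion for every candidate and keep the minimum.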

Algorithmic Details
We drew on quite a few recent references in the literature to implement the ARIMA modeling technique. Besides the classic text by Box and Jenkins [1], these include the authoritative texts by Hamilton [2] and Wei [3], among others. For a full list of references, please see the user manual.

In univariate ARIMA modeling, current values of a data series are related to past values of the same series. This is the AR component, whose order is denoted p. Current values of the series are also related to past values of a random error term. This is the MA component, whose order is denoted q. The mean and variance of the series are assumed to be stationary, that is, unchanged over time. If necessary, an I component (of order d) is added to correct for a lack of mean stationarity through differencing. Variance non-stationarity can be addressed by applying a Box-Cox transformation before fitting the ARIMA model. In automatic mode, we detect mean non-stationarity using the KPSS test for non-seasonal data and the Canova-Hansen test for seasonal data. If a series is detected as non-stationary, it is differenced until the stationarity tests pass. We can also automatically detect variance non-stationarity and apply a Box-Cox transformation with a suitable lambda value to stabilize the variance.
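The following Python sketch illustrates two of the steps described above:

* A Box-Cox transform with a user-supplied lambda.
* Repeated first differencing until a KPSS test for level stationarity stops rejecting at the 5% level.

The KPSS null hypothesis is stationarity, so a statistic above the critical value is the signal to difference again. The following are assumptions for this example: the lag rule for the long-run variance, the cap `max_d`, and the helper names `box_cox`, `diff`, `kpss_level_stat` and `ndiffs`. The sketch does not cover the Canova-Hansen seasonal test or automatic lambda selection, and it is not CB Predictor's implementation.

```python
import math

KPSS_5PCT = 0.463  # asymptotic 5% critical value of the KPSS level-stationarity test


def box_cox(y, lam):
    """Box-Cox transform with a given lambda; requires strictly positive data."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def diff(y, lag=1):
    """Difference y[t] - y[t - lag]; lag = season length gives a seasonal difference."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def kpss_level_stat(y, lags=None):
    """KPSS statistic for level stationarity, Bartlett-weighted long-run variance."""
    n = len(y)
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)  # a commonly used lag rule (assumption)
    m = sum(y) / n
    e = [v - m for v in y]
    s, partial = 0.0, []
    for v in e:                      # partial sums of the demeaned series
        s += v
        partial.append(s)
    s2 = sum(v * v for v in e) / n
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1)     # Bartlett kernel weight
        s2 += 2.0 * w * sum(e[t] * e[t - j] for t in range(j, n)) / n
    if s2 <= 0.0:                    # constant series: nothing to reject
        return 0.0
    return sum(p * p for p in partial) / (n * n * s2)


def ndiffs(y, max_d=2):
    """Number of first differences needed before KPSS stops rejecting."""
    d = 0
    while d < max_d and kpss_level_stat(y) > KPSS_5PCT:
        y = diff(y)
        d += 1
    return d


trend = [0.5 * t for t in range(100)]  # mean grows linearly: not mean-stationary
print(ndiffs(trend))  # prints 1
```

For seasonal data, `diff(y, lag=s)` with the season length `s` produces a seasonal difference. A seasonal test such as Canova-Hansen, which this sketch does not implement, would decide whether that difference is needed.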

To estimate the parameters of a given ARIMA model, we use a combination of conditional and exact log-likelihood maximization. Truth be told, the computations involved are rather hairy, and not for the faint of heart.
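The Python sketch below gives a rough idea of what the conditional part involves. It first computes one-step residuals of an ARMA(p, q) model by conditioning on the first p observations and setting pre-sample errors to zero. It then evaluates the Gaussian conditional log-likelihood. The following are assumptions made for this example:

* The sign convention for the MA terms.
* The helper names `arma_residuals`, `conditional_loglik` and `fit_ar1_css`.
* The grid-search AR(1) fit.

The sketch does not show the exact log-likelihood, which also accounts for the first observations (for instance, via a Kalman filter).

```python
import math


def arma_residuals(y, phi, theta, mu=0.0):
    """One-step residuals of an ARMA(p, q) model, conditioning on the first p
    observations and setting pre-sample errors to zero.

    Assumed convention: x_t = sum_i phi_i x_{t-i} + e_t + sum_j theta_j e_{t-j},
    with x_t = y_t - mu (Box and Jenkins write the MA terms with a minus sign).
    """
    p, q = len(phi), len(theta)
    x = [v - mu for v in y]
    e = [0.0] * len(x)  # e[t] for t < p stays 0: those observations are conditioned on
    for t in range(p, len(x)):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p))
        ma = sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        e[t] = x[t] - ar - ma
    return e[p:]


def conditional_loglik(y, phi, theta, sigma2, mu=0.0):
    """Gaussian log-likelihood of observations p..n-1 given the first p."""
    e = arma_residuals(y, phi, theta, mu)
    m = len(e)
    sse = sum(v * v for v in e)
    return -0.5 * m * math.log(2.0 * math.pi * sigma2) - sse / (2.0 * sigma2)


def fit_ar1_css(y):
    """Crude AR(1) fit: with sigma2 concentrated out, maximizing the conditional
    log-likelihood means minimizing the conditional sum of squares."""
    grid = [i / 100 for i in range(-99, 100)]
    return min(grid, key=lambda ph: sum(v * v for v in arma_residuals(y, [ph], [])))


y = [1.0, 0.5, 0.25, 0.125]  # exactly x_t = 0.5 * x_{t-1}, no noise
print(fit_ar1_css(y))  # prints 0.5
```

The grid search keeps the example short. A real implementation would maximize the likelihood with a numerical optimizer over all AR and MA coefficients at once.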

Check out the new ARIMA modeling capabilities in CB Predictor and let us know what you think.

[1] Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: John Wiley & Sons, 2008.
[2] Hamilton, J. D. Time Series Analysis. 1st ed. Princeton, NJ: Princeton University Press, 1994.
[3] Wei, W. W. S. Time Series Analysis: Univariate and Multivariate Methods. 2nd ed. New York: Pearson, 2006.

Update (4/5/2011): Formatting.