Thursday, October 27, 2011

Event modeling in depth in CB Predictor (Part 1)

Event modeling is a new feature in the most recent release of Crystal Ball (11.1.2.1). In the last post, we talked about the basics of event modeling using CB Predictor and introduced the UI for this feature. In this and the following post, we will dive deeper and discuss certain aspects of event modeling in CB Predictor in greater detail. In this post, we will touch upon a few limitations of the CB Predictor event modeling feature, and conclude with a discussion of the relationship between data screening and event modeling.

Limitations of CB Predictor event modeling
A few limitations of CB Predictor event modeling that you may or may not notice while modeling events are:
  • As mentioned in the last post, an event must occur at least once in the historical period before it can be used in future periods.
    • Workaround: If you already know, or have already calculated, the effect of an event on the time series, you can add that effect directly to the future periods. Unfortunately, this has to be done manually; CB Predictor cannot be used for it.
  • CB Predictor does not allow defining overlapping events in the historical period or in the future. That is, there cannot be any common day between two events. It is difficult to calculate the effect of each event when two events overlap, hence the limitation.
    • Workaround: As with the previous point, you have to estimate and remove the effect of one of the events from the time series. For example, if events A and B both happen on a specific day, and you can estimate the effect of either A or B, then you can let Predictor handle only A (or B) by subtracting the effect of B (or A) from the series, and then adding back the effect of B (or A) after forecasting is done.
  • CB Predictor issues a warning if more than 10% of the historical values are defined as events. Since CB Predictor calculates the effect of an event using the same algorithms it uses to impute missing values, defining too many events could reduce predictive accuracy.
  • If you are using multiple linear regression in CB Predictor and an event is defined on all the series, then, as you would expect, the effect of the event is not explicitly added to the dependent variable(s). Rather, the effect is expected to be carried through the regression equation. On the other hand, when an event is defined on only a few of the series designated as independent variables, we should explicitly account for events defined only on the dependent variable. Currently that is not done. Please let us know if you face this situation often, and what you think about the suggested approach.
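The overlapping-events workaround above can be sketched in a few lines of Python. This is an illustration under assumptions, not CB Predictor's code: events are represented as sets of period indices, `forecast_fn` is a hypothetical stand-in for whatever forecasting step you run, and the effect of event B is assumed to have been estimated separately. The two helper checks mirror the no-overlap rule and the 10% warning described above.

```python
from typing import Callable, Dict, List, Set

# Hypothetical signature for a forecasting step: (series, horizon) -> forecasts.
Forecaster = Callable[[List[float], int], List[float]]


def overlapping_periods(event_a: Set[int], event_b: Set[int]) -> Set[int]:
    """Periods shared by two events; CB Predictor does not allow any."""
    return event_a & event_b


def event_fraction(events: Dict[str, Set[int]], n_history: int) -> float:
    """Fraction of historical periods covered by at least one event."""
    covered = set().union(*events.values())
    return len({t for t in covered if 0 <= t < n_history}) / n_history


def forecast_with_known_effect(
    history: List[float],
    effect_b_history: List[float],  # estimated effect of B per historical period (0 if absent)
    effect_b_future: List[float],   # assumed effect of B per forecast period (0 if absent)
    forecast_fn: Forecaster,        # hypothetical stand-in for the forecasting step
) -> List[float]:
    # 1. Subtract B's estimated effect so only event A remains in the history.
    adjusted = [y - e for y, e in zip(history, effect_b_history)]
    # 2. Forecast the adjusted series (this is where Predictor would model event A).
    base = forecast_fn(adjusted, len(effect_b_future))
    # 3. Add B's effect back to the forecast periods.
    return [f + e for f, e in zip(base, effect_b_future)]


if __name__ == "__main__":
    events = {"A": {2}, "B": {2, 3}}
    print(overlapping_periods(events["A"], events["B"]))  # shared period 2: not allowed as-is
    if event_fraction(events, 10) > 0.10:
        print("warning: more than 10% of the history is flagged as events")

    naive: Forecaster = lambda series, h: [series[-1]] * h  # repeat the last value
    print(forecast_with_known_effect([10, 12, 20, 11], [0, 0, 5, 0], [0, 3], naive))
```

The naive "repeat the last value" forecaster is only there to make the example runnable; any forecasting method can be plugged in as `forecast_fn`.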
Event modeling and data screening
Event modeling and data screening are both part of the ‘Data preparation’ phase in CB Predictor, and both are available in the ‘Data Attributes’ panel. We use a similar approach for both: once an outlier has been identified, it is treated as a missing value, and an approximate value is then imputed using one of the available algorithms. Here are some notes that may help if you are planning to use both features together:
  • Both features serve the same purpose: to find and explain outliers.
  • One would use data screening to detect outliers, which are unusual values without a known cause. On the other hand, one would use the events feature of Predictor to define identifiable occurrences that have affected historical data and could affect predicted data.
  • CB Predictor accounts for events before screening data for outliers. That is, if you define an event, the effect of that event is factored out first; the time-series models are then run, and outliers are detected according to the user-set rules.
  • In general, this means that if an event has been modeled correctly, no outlier should appear on the same date(s) as the event. If you notice otherwise, please let us know.
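The order of operations described above (events first, then screening) can be illustrated with a small Python sketch. It is not CB Predictor's implementation: linear interpolation stands in for whichever imputation algorithm is selected, and a simple "more than k standard deviations from the mean" rule stands in for Predictor's model-based outlier screening. Both choices, and all function names, are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Set, Tuple


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """Fill None entries by linear interpolation; edges copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one known value")
    filled: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            filled.append(v)
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(values[right])
        elif right is None:
            filled.append(values[left])
        else:
            w = (i - left) / (right - left)
            filled.append(values[left] + w * (values[right] - values[left]))
    return filled


def remove_events(series: List[float], event_periods: Set[int]) -> Tuple[List[float], Dict[int, float]]:
    """Step 1: treat event periods as missing, impute them, and record each effect."""
    masked = [None if t in event_periods else y for t, y in enumerate(series)]
    cleaned = interpolate_missing(masked)
    effects = {t: series[t] - cleaned[t] for t in sorted(event_periods)}
    return cleaned, effects


def screen_outliers(series: List[float], k: float = 2.0) -> Set[int]:
    """Step 2 (stand-in rule): flag values more than k population std devs from the mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return set()
    return {t for t, y in enumerate(series) if abs(y - mu) > k * sigma}


def prepare(series: List[float], event_periods: Set[int], k: float = 2.0):
    """Factor out events first, then screen the cleaned series for outliers."""
    cleaned, effects = remove_events(series, event_periods)
    return cleaned, effects, screen_outliers(cleaned, k)


if __name__ == "__main__":
    sales = [10, 11, 30, 12, 13, 12, 11, 10, 12, 11]  # period 2 is a known promotion
    print(screen_outliers(sales))          # screening alone flags period 2
    cleaned, effects, outliers = prepare(sales, {2})
    print(effects, outliers)               # effect recorded at period 2; no outliers remain
```

Because the promotion is defined as an event, its spike is explained and removed before screening, so the screening step no longer reports it as an outlier.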

Summary
Event modeling is a powerful new tool in CB Predictor, but using it well requires some extra care.

Update (10/31/2011): Miscellaneous corrections.

Saturday, October 1, 2011

Presentation on introductory OR Problems in Analytics

I was recently invited by Prof. Sriram Sankaranarayanan of the Computer Science department at the University of Colorado at Boulder to give a guest lecture in his linear programming class. Sriram is my old friend from our undergraduate years at IIT Kharagpur, and he has a stellar academic record from there (as well as from his graduate years).

Since this was going to be a one-off presentation, I decided to touch on a few different topics. I started off with a few examples of optimization problems that we face in daily life without often realizing it, such as route planning, some versions of the traveling salesman problem, and revenue optimization. I followed these up with a simple example of modeling an optimization problem where the constraint needs to be figured out from given, somewhat unstructured data: an introduction to a so-called Analytics problem. After a few follow-up slides on the different types of computational problems in Analytics, I wrapped up with career options in OR/Analytics and a few online resources for optimization.

If you are interested in the presentation material, check it out.

Update (10/3/2011): Updated title.

CB 11.1.2.1 features: event modeling in CB Predictor

As mentioned in the announcement post for the new release, another new feature in Predictor in this release is the ability to model events in your historical data. Event modeling is an important aspect of time-series modeling and analysis since, if events are not accounted for, the resulting forecasts can be way off. Events can be either planned or unplanned, and can happen irregularly or at regular intervals. Examples include machine breakdowns in a production environment, planned or unplanned maintenance, workers' strikes, etc.

This post is in two parts. In this first part, we will discuss our implementation of events in the Crystal Ball PS1 release. In the second part, we will analyze some results regarding best practices to adopt when modeling time series with embedded events.

Why perform event modeling?
As mentioned above, if we do not account for events, usual or unusual, in the historical data, we face a few problems:
  • Events tend to distort the actual pattern in the data, either emphasizing or attenuating signals.
  • The model gets fitted to the distorted data. In the case of CB Predictor, this may result in the wrong model being selected as the best model (since we only look at the user-selected error measure when deciding on the best model), or in incorrectly calculated parameter values for the (eventually) right model.
  • The above two issues lead to noisy forecasts.
This is precisely why event modeling is often considered one of the “data cleansing” steps. We definitely regard it as an important step in preparing data for time-series forecasting, which is why we have included it in the “Data Attributes” section, along with other data cleansing steps like filling in missing values and outlier detection.

User Interface
As mentioned above, the entry point to event modeling is in the “Data Attributes” panel, as shown in figure 1.

Figure 1: Access to events
Once the ‘View Events ...’ button is clicked, we get a dedicated window for events modeling, shown in figure 2. From this window, we can add or update an event, using the dialog shown in figure 3, or we can delete an event.

Figure 2: Events GUI

Figure 3: Add/Edit an event
Event Types
CB Predictor supports several types of events in this framework. An event could have happened once or multiple times in the historical date range. The same event can be predicted to happen in future forecast periods as well, resulting in an uplift (or downgrade) of the original forecast value. Common-sense restrictions apply; for example, an event must have at least one occurrence within the historical date range in order to be used in future forecasts (otherwise, its effect cannot be calculated). An event that occurs multiple times, either in the past or in the future, can be defined at regular or custom intervals. An event can also span multiple consecutive time periods in the historical data or in the future forecasts. For more information about defining different types of events, please refer to the Predictor User’s Guide (Setting Up Predictor Forecasts > Selecting Data Attributes - Seasonality, Events, Screening > Viewing and Managing Events).
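The event variants described above (single or repeated occurrences, regular or custom intervals, spans of consecutive periods) fit in a small data structure. The sketch below is a hypothetical Python representation, not Predictor's internal one; periods are zero-based indices, and the historical range is assumed to be periods 0 to n_history - 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    name: str
    starts: List[int]   # first period of each occurrence (custom interval)
    duration: int = 1   # number of consecutive periods each occurrence spans

    @classmethod
    def regular(cls, name: str, first: int, interval: int, count: int, duration: int = 1) -> "Event":
        """Occurrences repeating every `interval` periods, starting at `first`."""
        return cls(name, [first + i * interval for i in range(count)], duration)

    def periods(self) -> List[int]:
        """All periods covered by any occurrence of the event."""
        return sorted({s + d for s in self.starts for d in range(self.duration)})

    def historical_occurrences(self, n_history: int) -> List[int]:
        """Starts of occurrences that lie entirely inside the historical range."""
        return [s for s in self.starts if s + self.duration <= n_history]

    def usable_in_forecast(self, n_history: int) -> bool:
        """An effect can only be estimated if the event occurred at least once in history."""
        return len(self.historical_occurrences(n_history)) > 0


if __name__ == "__main__":
    promo = Event.regular("promotion", first=3, interval=12, count=3, duration=2)
    print(promo.periods())               # [3, 4, 15, 16, 27, 28]
    print(promo.usable_in_forecast(24))  # True: two occurrences in periods 0..23
    strike = Event("strike", starts=[30])
    print(strike.usable_in_forecast(24)) # False: only occurs in the forecast range
```

Treating an occurrence as historical only when its whole span falls inside the history is a choice made for this sketch; the guide referenced above describes how Predictor itself handles such cases.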

Algorithmic Details
We use a simple algorithm for analyzing events in the historical date range. For each occurrence of an event, we calculate the effect of the event using one of the algorithms for imputing missing historical data values. If there is a single occurrence of an event in the past, the same effect (either uplift or downgrade) is used for future occurrences of the event. If there are multiple occurrences of an event, we fit a line to their effects and extrapolate it to calculate the effects of future occurrences of the same event.
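The algorithm described above can be sketched in Python for single-period events. Several details are assumptions rather than documented behavior: the imputed "no-event" value is the average of the neighbouring periods (a simple linear interpolation standing in for whichever imputation method is chosen), and the line is fit by ordinary least squares of effect against period index.

```python
from typing import List, Tuple


def estimate_effect(series: List[float], t: int) -> float:
    """Effect at period t = actual value minus an imputed no-event value (neighbour average)."""
    neighbours = [series[i] for i in (t - 1, t + 1) if 0 <= i < len(series)]
    return series[t] - sum(neighbours) / len(neighbours)


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def future_effects(series: List[float], past: List[int], future: List[int]) -> List[float]:
    """Effects to add to the baseline forecast at each future occurrence."""
    effects = [estimate_effect(series, t) for t in past]
    if not effects:
        raise ValueError("the event must occur at least once in the historical range")
    if len(effects) == 1:
        # Single occurrence: reuse the same uplift (or downgrade) for every future occurrence.
        return [effects[0]] * len(future)
    # Multiple occurrences: fit a line to effect vs. period and extrapolate it.
    intercept, slope = fit_line(past, effects)
    return [intercept + slope * t for t in future]


if __name__ == "__main__":
    history = [100.0] * 14
    history[2], history[6], history[10] = 105.0, 107.0, 109.0  # growing event uplift
    print(future_effects(history, [2, 6, 10], [14, 18]))  # [11.0, 13.0]
    print(future_effects(history, [2], [14, 18]))         # [5.0, 5.0]
```

Each returned effect would then be added to the baseline forecast for the corresponding future period. Events spanning several consecutive periods would need per-period handling that this sketch leaves out.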

Conclusion
Check out the new event modeling capabilities in CB Predictor and let us know how it works for you.