Thursday, October 27, 2011

Event modeling in depth in CB Predictor (Part 1)

Event modeling is a new feature in the most recent release of Crystal Ball. In the last post, we covered the basics of event modeling using CB Predictor and introduced the UI for this feature. In this post and the next, we will dive deeper and discuss certain aspects of event modeling in CB Predictor in greater detail. This post touches on a few limitations of the CB event modeling feature and concludes with a discussion of the relationship between data screening and event modeling.

Limitations of CB Predictor event modeling
A few limitations of CB Predictor that you may not be aware of while modeling events are:
  • This is mentioned in the last post: we need to have at least one instance of the event happening in the historical period in order to use that event in the future.
    • Workaround: If you have an event whose effect on the time series you already know, or have already calculated, you can add the effect of that event directly to the future periods. Unfortunately, this has to be a manual process; CB Predictor cannot do it for you.
  • CB Predictor does not allow defining overlapping events in the historical period or in the future. That is, there cannot be any common day between two events. It is difficult to calculate the effect of each event when two events overlap, hence the limitation.
    • Workaround: As with the previous point, you have to estimate and remove the effect of one of the events from the time series. For example, if events A and B both happen on a specific day, and you can estimate the effect of either A or B, then you can let Predictor handle only A (or B) by subtracting the effect of B (or A) from the series, and then adding the effect of B (or A) back after the forecasting is done.
  • CB Predictor issues a warning if more than 10% of the historical values are defined as events. Because CB Predictor calculates the effect of an event using the same algorithms it uses to impute missing values, defining too many events could reduce predictive accuracy.
  • If you are using multiple linear regression in CB Predictor and an event is defined on all the series, then, as you would expect, the effects of these events are not explicitly added to the dependent variable(s). Rather, the effects of the events are expected to be carried through the regression equation. On the other hand, when an event is defined for only a few of the series designated as independent variables, we should explicitly account for events defined only on the dependent variable. Currently that is not done. Please let us know if you face this situation often, and what you think about the suggested approach.
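
The overlapping-events workaround above can be sketched outside CB Predictor roughly as follows. This is a minimal illustration in Python, not CB Predictor's API: the helper names (remove_known_effect, add_back_effect, placeholder_forecast) and the sample numbers are hypothetical, the effect of event B is assumed to be additive and already known, and the mean-based forecast is only a stand-in for whatever forecast Predictor would produce once it has modeled event A.

```python
from typing import Dict, List


def remove_known_effect(series: List[float], effect: Dict[int, float]) -> List[float]:
    """Subtract a known additive event effect from the affected periods."""
    return [value - effect.get(i, 0.0) for i, value in enumerate(series)]


def add_back_effect(forecast: List[float], future_effect: Dict[int, float]) -> List[float]:
    """Add the known event effect back onto the affected forecast periods."""
    return [value + future_effect.get(i, 0.0) for i, value in enumerate(forecast)]


def placeholder_forecast(history: List[float], horizon: int) -> List[float]:
    """Stand-in for the forecasting tool: repeats the historical mean.

    In practice this step is where Predictor would forecast the series
    and model the remaining event (A) on its own.
    """
    level = sum(history) / len(history)
    return [level] * horizon


# Hypothetical data: events A and B overlap at index 2 of the history.
history = [100.0, 102.0, 130.0, 101.0, 99.0]
effect_b_history = {2: 20.0}  # assumed known effect of B in the history
effect_b_future = {1: 20.0}   # B is expected again in forecast period 1

# 1. Remove B by hand so that only A remains in the series.
cleaned = remove_known_effect(history, effect_b_history)

# 2. Forecast the cleaned series (Predictor's job in the real workflow).
base_forecast = placeholder_forecast(cleaned, horizon=3)

# 3. Add the effect of B back after forecasting is done.
final_forecast = add_back_effect(base_forecast, effect_b_future)
```

The same add_back_effect step also covers the first workaround: if an event has never occurred in the history but its effect is known, it can be added to the forecast periods by hand.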
Event modeling and data screening
Event modeling and data screening are both part of the ‘Data preparation’ phase in CB Predictor, and both are available in the ‘Data Attributes’ panel. We use a similar approach for both: once an outlier has been identified, it is treated as a missing value, and an approximate value is then imputed using one of the available algorithms. Here are some notes that should help if you plan to use both features together:
  • The purpose of both features is the same: to find and explain outliers.
  • One would use data screening to detect outliers which are unusual values without a known cause. On the other hand, one would use the events feature of Predictor to define identifiable occurrences that have affected historical data and could affect predicted data.
  • CB Predictor accounts for events before screening data for outliers. That is, if you define an event, the effect of that event is factored out first; the time-series models are then run, and outliers are detected according to the user-set rules.
  • In general, this means that if an event has been modeled correctly, an outlier should not appear on the same date(s) as the event. If you notice otherwise, please let us know.
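
To make the ordering concrete, here is a rough Python sketch of "events first, then screening". It is an illustration under stated assumptions, not CB Predictor's implementation: linear interpolation stands in for "one of the available algorithms", a two-standard-deviation rule stands in for the user-set screening rules, the rule is applied directly to the event-adjusted series (skipping the time-series model fit), and all function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Set


def impute_events(series: List[float], event_idx: Set[int]) -> List[float]:
    """Treat event periods as missing and fill them by linear interpolation
    between the nearest non-event neighbors (assumed imputation method)."""
    normal = [i for i in range(len(series)) if i not in event_idx]
    if not normal:
        raise ValueError("at least one non-event period is required")
    out = list(series)
    for i in sorted(event_idx):
        left = max((j for j in normal if j < i), default=None)
        right = min((j for j in normal if j > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = series[left] + weight * (series[right] - series[left])
    return out


def screen_outliers(series: List[float], k: float = 2.0) -> List[int]:
    """Flag indices more than k sample standard deviations from the mean
    (an assumed screening rule)."""
    center, spread = mean(series), stdev(series)
    return [i for i, v in enumerate(series) if spread > 0 and abs(v - center) > k * spread]


history = [100.0, 102.0, 101.0, 150.0, 99.0, 103.0, 100.0, 98.0, 101.0, 102.0]
events = {3}  # a known event (say, a promotion) in period 3

# The warning described earlier triggers when more than 10% of values are events.
too_many_events = len(events) / len(history) > 0.10

# Step 1: factor out the event; its effect is actual minus imputed baseline.
baseline = impute_events(history, events)
event_effect = {i: history[i] - baseline[i] for i in events}

# Step 2: screen the event-adjusted series for outliers.
outliers_after = screen_outliers(baseline)

# For comparison: screening the raw series flags the event period itself.
outliers_before = screen_outliers(history)
```

With this made-up data, screening the raw series flags the event period, while screening after the event is factored out flags nothing, which is the behavior the last note describes.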

Event modeling is a powerful new tool in CB Predictor, and some extra care is needed when using this feature.

Update (10/31/2011): Miscellaneous corrections.
