Wednesday, June 30, 2010

A different run (Part 2): Excel RAND() function in CB models - handle with care

This is the second part of a series of posts detailing the reasons of having different simulation output when rerunning Monte Carlo simulation models using CB. The other parts can be found here:
Part 1: CB Excel functions are not tied to seed
Part 3: Running OptQuest with low-confidence testing at Extreme Speed
Part 4: Multi-threading and seed

I hinted at this in the first post: using Excel RAND() function in a CB model would result in having different result for each simulation run. The RAND() function generates an uniformly distributed random number between 0 and 1 in the cell. This is a volatile function, each instance of the function in the spreadsheet model returns new value each time the worksheet is recalculated. There is no control over how these random numbers are generated: you cannot specify a seed for this function. So essentially, every time you run the simulation model, and the sheet gets recalculated for each trial, you get different values from the RAND() function, and if you have forecasts linked to these values, they will be different in each run. They may have similar look, and might fit to the same probability distribution though, but that is not guaranteed.

Workaround: Use CB uniform distribution assumption in [0,1] to generate similar random numbers, but now you can control the stream of numbers by using seed. But do not use the CB.Uniform(..) Excel function - as mentioned in the first post, that would lead to the same problem. Use other CB assumptions to generate other random numbers to suit your need.

One common use of the RAND() function is to generate integers between two integer limits, and use these integers as index in a lookup function to return a different value from a range, essentially for picking a value from a range with replacement. For example, you might want to pick a person from a list of 20 people, and it is fine to pick the same person more than once. In this case, assuming that each person is equally likely to get picked at any pick, you would want to use a discrete uniform distribution. Rather than using a formula  like "=INT(RAND() * 20)+1" in Excel to generate a discrete uniform random number between 1 and 20, use the corresponding assumption from CB. Of course, it is also easy to tweak this scheme for picking a value from a range without replacement.

Thursday, June 24, 2010

ISF 2010 Update: Nice experience

I am into the last day of ISF2010 conference. Overall it was a nice conference. In the past, I have attended conferences like the INFORMS conferences (the annual conference, the practice conference), and the IIE conference, all of which have forecasting tracks. This is the first time I have attended a conference exclusively covering forecasting, so I was not sure what to expect. But I was not disappointed - this conference had lots of nice talks to keep me interested and busy.

Some of the talks I attended focused on ARCH and GARCH models, ARIMA models, ways to improve forecasting by aggregating different forecast streams, neural network (NN) based forecasting, and ways to aggregate forecasts across different levels. There were also talks featuring classic methods like exponential smoothing and moving average. Learned a few new things, and got a few ideas that I need to try out with CB Predictor.

I also presented on the forecasting capabilities of CB Predictor. The other speakers in the session were forecasting heavyweights AutoBox and SAS. Of course, we do not compare anywhere in the feature list of what these software offers, but CB Predictor does offer the user-friendly, non-intimidating environment, familiar Excel spreadsheet goodness, and most importantly, the capability of mixing risk analysis using Monte Carlo Simulation. I think that is a strong value proposition, considering the fact that lot of small and medium sized businesses do not have team of dedicated forecasters who work only on forecasting. These companies typically have a few analytic folks, who take care of various analytic duties, including risk analysis and forecasting. CB Predictor suits well for this kind of scenarios.

My ISF2010 presentation can be found here.

Update (8/26/2011): Language.
Update (12/24/2011): Labels.

Saturday, June 19, 2010

A different run (Part 1): CB Excel functions are not tied to seed

We often get asked about CB Monte Carlo simulation models or OptQuest optimization models which produce different result each time a complete simulation or optimization is run. There are a few situations which can result in this. In a series of posts, I will try to explain each reason and possible workaround.

The first one is actually an interesting one. Most of our power users are aware that we publish a few functions in Excel from CB, which can be used as functions in Excel spreadsheet models. These functions show up when you select 'Crystal Ball' in the 'Insert Function' dialog in Excel. There are a bunch of distribution assumption specific functions there, like CB.Beta(...), CB.Normal(...) etc., which returns random numbers from these distributions. The caveat to be aware of when using these functions is that, they are not tied to the seed value which you can set in CB 'Run Preferences'. That is, irrespective of whether you select to use a specific seed value or not in this preference, these functions will always use new seed value for each evaluation of the function. So, each time you run a simulation, reset it and rerun the simulation, you are going to get new set of values, and hence, the result will be different.

This type of function result is termed as 'volatile' in Excel, an example of which is the RAND() function. In fact, heads up, using RAND() function in your model also causes the same problem: you get different result every time you run the simulation (I will do a separate short post on that later). These functions return new values each time the worksheet is recalculated.

Workaround: Use the CB assumptions. Either define them using the GUI or using the Developer Kit. You will get the usual color-coded cells, and these assumptions will listen to the preferences you set in 'Run Preferences' dialog.

Links to other posts in this series:
Part 2: Excel RAND() function in CB models - handle with care
Part 3: Running OptQuest with low-confidence testing at Extreme Speed
Part 4: Multi-threading and seed

Wednesday, June 16, 2010

Session on CB Predictor in ISF 2010

I will be presenting in International Symposium on Forecasting, 2010, being held at San Diego, CA. The presentation is titled "Introducing Oracle Crystal Ball Predictor: a new approach to forecasting in Excel". The abstract of the talk is as follows:

In this presentation we will have an overview of the functionality provided by Oracle Crystal Ball's (CB) Predictor engine. In brief, CB Predictor has an intuitive interface for selecting, managing and cleaning data, running multiple forecasting algorithms (including seasonal and nonseasonal models) and regression on large datasets, and a coherent way of presenting and extracting results or generating reports. Some of the innovative features include outlier detection and filling in missing values. We will also have a sneak preview of forthcoming features.

The talk is scheduled for Monday, June 21st, at the 2:00PM to 3:30PM session. I will post the slides after the presentation. I am also planning to blog on interesting presentations while at the conference.

Update: Here is a link to the presentation.
Update (12/24/2011): Labels.