Wednesday, June 30, 2010

A different run (Part 2): Excel RAND() function in CB models - handle with care

This is the second part of a series of posts detailing the reasons of having different simulation output when rerunning Monte Carlo simulation models using CB. The other parts can be found here:
Part 1: CB Excel functions are not tied to seed
Part 3: Running OptQuest with low-confidence testing at Extreme Speed
Part 4: Multi-threading and seed

I hinted at this in the first post: using Excel RAND() function in a CB model would result in having different result for each simulation run. The RAND() function generates an uniformly distributed random number between 0 and 1 in the cell. This is a volatile function, each instance of the function in the spreadsheet model returns new value each time the worksheet is recalculated. There is no control over how these random numbers are generated: you cannot specify a seed for this function. So essentially, every time you run the simulation model, and the sheet gets recalculated for each trial, you get different values from the RAND() function, and if you have forecasts linked to these values, they will be different in each run. They may have similar look, and might fit to the same probability distribution though, but that is not guaranteed.

Workaround: Use CB uniform distribution assumption in [0,1] to generate similar random numbers, but now you can control the stream of numbers by using seed. But do not use the CB.Uniform(..) Excel function - as mentioned in the first post, that would lead to the same problem. Use other CB assumptions to generate other random numbers to suit your need.

One common use of the RAND() function is to generate integers between two integer limits, and use these integers as index in a lookup function to return a different value from a range, essentially for picking a value from a range with replacement. For example, you might want to pick a person from a list of 20 people, and it is fine to pick the same person more than once. In this case, assuming that each person is equally likely to get picked at any pick, you would want to use a discrete uniform distribution. Rather than using a formula  like "=INT(RAND() * 20)+1" in Excel to generate a discrete uniform random number between 1 and 20, use the corresponding assumption from CB. Of course, it is also easy to tweak this scheme for picking a value from a range without replacement.

No comments:

Post a Comment