Thursday, October 14, 2010

The extended family of CB Assumptions (Part 2)

This is the second part of a series of posts on how to model new (and possibly exotic) distributions in Crystal Ball. The other parts can be found here:
Part 1: The extended family of CB Assumptions

In the last post, we discussed how to model two distributions: Erlang and two-parameter lognormal. In this post, we continue with a few more distributions that can be simulated in CB by fixing one or more parameters of an existing distribution, in some cases combined with a simple transformation. Data can also be fitted to these distributions by locking one or more parameters to specific values. For a detailed discussion of locking parameters while fitting, check out our coverage of the two-parameter lognormal distribution in the previous post, and our help documents.

Example 3: The Maxwell distribution or Maxwell-Boltzmann distribution
This distribution is used in statistical physics, specifically as the distribution of molecular speeds in thermal equilibrium. Mathematically, it is closely related to the gamma distribution: the square of a Maxwell-Boltzmann variable follows a gamma distribution.
  • Notes and formulas: Check out the Wikipedia entry and the MathWorld website
  • Quick summary: If X follows a Maxwell-Boltzmann distribution with parameter 'a', then Y = X^2 follows a gamma distribution with location = 0, scale = 2a^2 and shape = 3/2, so it follows that X = sqrt(Y).
  • Generate random numbers: Define a gamma distribution assumption with location = 0, scale = 2a^2 and shape = 3/2, where 'a' is the parameter of the distribution. Next, set up a forecast holding the square root of the gamma assumption. As you run the simulation, the forecast values represent random numbers from the Maxwell-Boltzmann distribution.
  • Fit to this distribution: To fit a dataset to this distribution, square the values and fit the squared values to a gamma distribution with the location and shape locked to 0 and 3/2 respectively. The parameter 'a' can be recovered from the fitted scale. If the fitted scale is 's', then: a = sqrt(s/2).
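Outside of CB, the same recipe can be checked with a few lines of Python. This is only a minimal sketch using the standard library, assuming the standard relationship that the square of a Maxwell-Boltzmann variable is gamma distributed with shape 3/2 and scale 2a^2. The function names `sample_maxwell` and `estimate_a` are made up for this illustration, and the closed-form scale estimate stands in for CB's fitting routine.

```python
import math
import random

def sample_maxwell(a, n, seed=0):
    """Draw n Maxwell-Boltzmann values with parameter a via a gamma draw."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    # The gamma draw plays the role of the assumption; sqrt() is the forecast.
    return [math.sqrt(rng.gammavariate(1.5, 2.0 * a * a)) for _ in range(n)]

def estimate_a(values):
    """Recover a from data with gamma location = 0 and shape = 3/2 held fixed."""
    squared = [x * x for x in values]
    # With the shape fixed, the maximum-likelihood gamma scale is mean / shape.
    scale = (sum(squared) / len(squared)) / 1.5
    return math.sqrt(scale / 2.0)

a_true = 2.0
speeds = sample_maxwell(a_true, 50_000)
sample_mean = sum(speeds) / len(speeds)
# The mean of a Maxwell-Boltzmann distribution is 2a * sqrt(2 / pi).
theory_mean = 2.0 * a_true * math.sqrt(2.0 / math.pi)
print(sample_mean, theory_mean, estimate_a(speeds))
```

The sample mean should land close to the theoretical mean, and the estimate of 'a' close to the value used for sampling.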

Example 4: The Chi-square distribution
This distribution is commonly used in statistical inference. One of the common Goodness-of-fit (GOF) statistics used in distribution fitting is the Chi-squared statistic, which follows the Chi-square distribution (approximately, for large samples). To model this distribution in CB, we will use the same technique that we used to model the two-parameter lognormal distribution: setting parameters of an existing distribution to specific values.
  • Notes and formulas: Check out the Wikipedia entry and the NIST website
  • Quick summary: This distribution is a special form of the gamma distribution. A Chi-square distribution with 'd' degrees of freedom can be modeled by a gamma distribution with location = 0, scale = 2 and shape = d/2. We use this method in CB to directly construct the specific Chi-squared distribution and calculate the p-values of the Chi-squared GOF test in distribution fitting, which are reported in the distribution fitting results window.
  • Generate random numbers: Define a gamma distribution assumption with location = 0, scale = 2 and shape = d/2, where 'd' is the degrees of freedom of the Chi-square distribution.
  • Fit to this distribution: Fitting to a Chi-square distribution is slightly tricky, since we do not have the ability to lock the scale of a gamma distribution in distribution fitting. We will come back to this later in a future post in this series.
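As a quick sanity check of the gamma construction, here is a small Python sketch (standard library only, not CB code). It draws from a gamma distribution with shape d/2 and scale 2, and compares the result with the textbook definition of a Chi-square variable as a sum of d squared standard normals. The function names are hypothetical.

```python
import random
import statistics

def sample_chi_square(d, n, seed=0):
    """Chi-square(d) draws built from gamma(shape = d/2, scale = 2)."""
    rng = random.Random(seed)
    # gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    return [rng.gammavariate(d / 2.0, 2.0) for _ in range(n)]

def sample_sum_of_squares(d, n, seed=1):
    """Reference draws: sum of d squared standard normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)) for _ in range(n)]

d = 4
draws = sample_chi_square(d, 50_000)
reference = sample_sum_of_squares(d, 50_000)
# A Chi-square distribution with d degrees of freedom has mean d and variance 2d.
print(statistics.fmean(draws), statistics.variance(draws))
print(statistics.fmean(reference), statistics.variance(reference))
```

Both samples should show a mean near d and a variance near 2d.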
Example 5: The Rayleigh distribution
Often used in the physical sciences, this distribution is a special case of the Weibull distribution.
  • Notes and formulas: Check out the Wikipedia entry
  • Quick summary: This distribution is a special form of the Weibull distribution with location = 0 and shape = 2. A Rayleigh distribution with parameter 'sigma' corresponds to a Weibull scale of sigma * sqrt(2).
  • Generate random numbers: Define a Weibull distribution assumption with location = 0 and shape = 2.
  • Fit to this distribution: Fitting to a Rayleigh distribution is also easy: just lock the location of the Weibull distribution to 0 and the shape to 2 while fitting.
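The Weibull route can be sketched in Python as follows (standard library only). The conversion between the Rayleigh parameter sigma and the Weibull scale, scale = sigma * sqrt(2), comes from comparing the two cumulative distribution functions. The function names are invented for this example, and the closed-form estimate below stands in for CB's fitting with the shape locked.

```python
import math
import random

def sample_rayleigh(sigma, n, seed=0):
    """Rayleigh(sigma) draws from a Weibull with shape = 2."""
    rng = random.Random(seed)
    weibull_scale = sigma * math.sqrt(2.0)
    # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
    return [rng.weibullvariate(weibull_scale, 2.0) for _ in range(n)]

def fit_rayleigh_sigma(data):
    """Estimate sigma with the Weibull shape locked at 2 and location at 0."""
    # With shape fixed at 2, the maximum-likelihood Weibull scale is sqrt(mean(x^2)).
    weibull_scale = math.sqrt(sum(x * x for x in data) / len(data))
    return weibull_scale / math.sqrt(2.0)

sigma_true = 1.5
data = sample_rayleigh(sigma_true, 50_000)
rayleigh_mean = sum(data) / len(data)
# The mean of a Rayleigh distribution is sigma * sqrt(pi / 2).
print(rayleigh_mean, sigma_true * math.sqrt(math.pi / 2.0), fit_rayleigh_sigma(data))
```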
Example 6: The Pearson type V distribution or Inverse gamma distribution
Mostly used to model the time taken to perform a task.
  • Notes and formulas: Check out the Wikipedia entry
  • Quick summary: This distribution can be modeled using the gamma distribution. If X ~ PearsonV(scale = a, shape = b), then Y = 1/X ~ gamma(location = 0, scale = 1/a, shape = b), so it follows that X = 1/Y.
  • Generate random numbers: Given the parameters of the PearsonV distribution (scale = a, shape = b), set up a gamma distribution assumption with parameters: location = 0, scale = 1/a, shape = b. Next, set up a forecast holding the inverse of the gamma assumption. As you run the simulation, the forecast values represent random numbers from the Pearson-V distribution.
Figure 1: Excel worksheet for simulating Pearson-V distribution in CB
Figure 2: CB chart showing the Pearson-V distribution with statistics
In the above figures we show the details of simulating a PearsonV(scale=3, shape=5) distribution. Notice how close the mean and variance values in the spreadsheet (Figure 1) are to those in the statistics window of the forecast (Figure 2).
  • Fit to this distribution: Fitting to a Pearson-V distribution is also easy. Given a dataset, follow the steps below:
    • Calculate the inverse of the values
    • Fit these values to a gamma distribution with location locked to 0.
    • The scale of the PearsonV is the inverse of the fitted gamma scale. The shape is the same for both distributions.
Figure 3: Fitting data to a Pearson V distribution
In the figure above (Figure 3), we started with a set of 1000 data points from a Pearson-V(scale=3, shape=5) distribution, generated using the technique mentioned above. The best-fit gamma distribution has the following parameters: scale = 0.31, shape = 5.42 (to two decimal places). Transforming back to the Pearson-V distribution, we get a scale of 3.23 (=1/0.31) and a shape of 5.42. The fit is reasonably close for a dataset of 1000 points.
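The generate-then-fit round trip described above can be sketched outside CB in Python (standard library only). Note one deliberate substitution: CB's fitting method is not reproduced here, so the gamma parameters are estimated by the method of moments (shape = mean^2 / variance, scale = variance / mean), which is an assumption of this sketch. The function names are hypothetical.

```python
import random
import statistics

def sample_pearson_v(scale, shape, n, seed=0):
    """PearsonV(scale = a, shape = b) draws as the inverse of a gamma draw."""
    rng = random.Random(seed)
    # Gamma "assumption": location 0, shape b, scale 1/a. The "forecast" is 1 / gamma.
    return [1.0 / rng.gammavariate(shape, 1.0 / scale) for _ in range(n)]

def fit_pearson_v(data):
    """Invert the data, estimate a gamma (location 0), and transform back."""
    inverted = [1.0 / x for x in data]
    mean = statistics.fmean(inverted)
    var = statistics.variance(inverted)
    gamma_shape = mean * mean / var  # method-of-moments estimate (not CB's method)
    gamma_scale = var / mean
    # PearsonV scale is the inverse of the gamma scale; the shape carries over.
    return 1.0 / gamma_scale, gamma_shape

values = sample_pearson_v(scale=3.0, shape=5.0, n=50_000)
fitted_scale, fitted_shape = fit_pearson_v(values)
# For shape b > 1, the PearsonV mean is a / (b - 1), which is 0.75 here.
print(statistics.fmean(values), fitted_scale, fitted_shape)
```

With a larger sample than the 1000 points used in Figure 3, the recovered scale and shape should sit closer to 3 and 5.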

    Monday, October 4, 2010

    The extended family of CB Assumptions (Part 1)

    Over the years, we have received a lot of requests for adding new distributions to our distribution gallery. We have added new distributions in the recent past (like betaPERT), but not at the rate at which we get the requests. One of the primary reasons for not implementing some of these requests is that many of the requested distributions are special forms of distributions we already support (no, that is not the only reason, but it is often the reason). Here we will discuss some of the distributions that fall under this category, and are highly unlikely to be supported natively for this reason.

    Before starting though, let's make sure we understand what is meant by supporting a distribution in Crystal Ball. If we support a distribution, we have to be able to do the following two things:
    • Generate random numbers from this distribution
    • Fit data to this distribution and generate Goodness-of-fit (GOF) statistics
    The above two tasks are, not surprisingly, listed in ascending order of difficulty. If you want to use a new distribution in your simulation model, you might want to do one of the above two tasks, and in some cases, both. So, let's get started with two easy examples in this post.

    Example 1: The Erlang distribution
    This one is really straightforward. The Erlang distribution is a special case of the gamma distribution where the shape parameter is an integer. Since we accept both integer and non-integer values for the shape parameter of our gamma distribution, simulating an Erlang distribution with the gamma distribution is easy.
    • Notes and formulas: The Wikipedia entry linked above is sufficient.
    • Generate random numbers: Define a gamma distribution assumption as usual. Just use an integer shape value.
    • Fit to this distribution: Fitting to an Erlang distribution is slightly tricky. We will come back to this later in a future post in this series.
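To see why an integer-shape gamma is enough, here is a small Python sketch (standard library only, not CB code). It compares gamma draws with an integer shape k against the classic construction of an Erlang variable as the sum of k independent exponential values. The function names are made up for this example.

```python
import random
import statistics

def sample_erlang_via_gamma(k, scale, n, seed=0):
    """Erlang(k, scale) draws from a gamma with integer shape k."""
    if k < 1 or k != int(k):
        raise ValueError("Erlang shape must be a positive integer")
    rng = random.Random(seed)
    # gammavariate(alpha, beta): alpha is the shape, beta is the scale.
    return [rng.gammavariate(k, scale) for _ in range(n)]

def sample_erlang_via_exponentials(k, scale, n, seed=1):
    """Reference draws: sum of k exponential values, each with mean = scale."""
    rng = random.Random(seed)
    # expovariate takes the rate, which is 1 / scale.
    return [sum(rng.expovariate(1.0 / scale) for _ in range(k)) for _ in range(n)]

k, scale = 3, 2.0
gamma_draws = sample_erlang_via_gamma(k, scale, 50_000)
exp_draws = sample_erlang_via_exponentials(k, scale, 50_000)
# Both should have mean k * scale and variance k * scale^2.
print(statistics.fmean(gamma_draws), statistics.variance(gamma_draws))
print(statistics.fmean(exp_draws), statistics.variance(exp_draws))
```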

    Example 2: The two-parameter lognormal distribution
    Some time back, we introduced the 3-parameter lognormal distribution, the extra parameter being location. The 3-parameter lognormal distribution is somewhat unusual, since it is not covered in any textbook and the use of the distribution in this form is not well documented. Our main reason for introducing it was to offer more flexibility in the distribution (like having values < 0). Nevertheless, although we made sure at that time that this new distribution is completely backward-compatible with the classic two-parameter distribution (all models having the two-parameter distribution will run the same way without any change), I am sure this change might have taken a few souls by surprise (yes, that might be putting it mildly in some cases). So, let's go back in time and simulate the classic 2-parameter lognormal in the current version of Crystal Ball.
    • Notes and formulas: The Wikipedia entry will serve us well here.
    • Generate random numbers: That's easy - define a 3-parameter lognormal distribution with the location set to 0.
    • Fit to this distribution: This is also easy with a few new features which were introduced in the same version of Crystal Ball. When you get to the 'Fit Distribution' dialog from the 'Distribution Gallery', switch on 'Lock parameters', available at the bottom left of the screen. That brings up the 'Lock Parameters' dialog. Select to lock the location of the lognormal distribution to 0, as shown in the screenshot below. This results in fitting to the classic lognormal distribution, and you get all the GOF statistics for the fit.
    Fitting data to a 2-parameter lognormal distribution
    The same technique can be used to model the 2-parameter versions of the gamma and Weibull distributions. Textbook descriptions of these distributions typically have only the scale and shape parameters. The CB implementations contain the location parameter for added flexibility, but one can model the two-parameter versions of these distributions easily as discussed above.
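The idea of locking the location can be illustrated with a short Python sketch (standard library only). It assumes the 3-parameter lognormal is simply a classic lognormal shifted by the location. The log-space parameters mu and sigma are used for convenience and may not match how CB labels its lognormal parameters. The function names are hypothetical.

```python
import math
import random
import statistics

def sample_lognormal3(location, mu, sigma, n, seed=0):
    """Shifted lognormal draws; location = 0 gives the classic two-parameter form."""
    rng = random.Random(seed)
    # lognormvariate(mu, sigma): mu and sigma describe the underlying normal.
    return [location + rng.lognormvariate(mu, sigma) for _ in range(n)]

def fit_lognormal_locked_location(data, location=0.0):
    """Estimate mu and sigma with the location held at a fixed value."""
    logs = [math.log(x - location) for x in data]
    # Maximum-likelihood estimates: mean and population std dev of the logs.
    return statistics.fmean(logs), statistics.pstdev(logs)

sample = sample_lognormal3(location=0.0, mu=1.0, sigma=0.5, n=50_000)
mu_hat, sigma_hat = fit_lognormal_locked_location(sample, location=0.0)
print(mu_hat, sigma_hat)
```

With the location fixed at 0, only the two classic parameters are left to estimate, which mirrors what locking the location does in the fitting dialog.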

    Part 2:  The extended family of CB Assumptions