Thursday, October 14, 2010

The extended family of CB Assumptions (Part 2)

This is the second part of a series of posts on how to model new (and possibly exotic) distributions in Crystal Ball. The other parts can be found here:
Part 1: The extended family of CB Assumptions

In the last post, we discussed how to model two distributions: Erlang and two-parameter lognormal. In this post, we continue with a few more distributions that can be simulated in CB by modifying one or more parameters of one of the existing distributions. Data can also be fitted to these distributions by locking one or more parameters to specific values. For a detailed discussion of locking parameters while fitting to distributions, check out our coverage of the two-parameter lognormal distribution in the previous post, and our help documents.

Example 3: The Maxwell distribution or Maxwell-Boltzmann distribution
This distribution is used in statistical physics, specifically as the distribution of molecular speeds in thermal equilibrium. Mathematically, it is closely tied to the gamma distribution: the square of a Maxwell variate follows a gamma distribution.
  • Notes and formulas: Check out the Wikipedia entry and the MathWorld website
  • Quick summary: This distribution can be modeled using the gamma distribution. If X follows a Maxwell-Boltzmann distribution with parameter 'a', then Y = X^2 ~ gamma(location = 0, scale = 2a^2, shape = 3/2), so it follows that X = sqrt(Y).
  • Generate random numbers: Define a gamma distribution assumption with location = 0, scale = 2a^2 and shape = 3/2, where 'a' is the parameter of the distribution. Next, set up a forecast containing the square root of the gamma assumption. As you run the simulation, the forecast values represent random numbers from the Maxwell-Boltzmann distribution.
  • Fit to this distribution: To fit a dataset to this distribution, first square the data values, then fit the squared values to a gamma distribution with the location and shape locked to 0 and 3/2 respectively. The parameter 'a' can be recovered from the fitted scale. If the fitted scale is 's', then: a = sqrt(s/2).
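The Maxwell recipe can be checked outside CB with a short Python sketch. It is only an illustration under stated assumptions: the standard library's random.gammavariate(shape, scale) stands in for a CB gamma assumption with location = 0, the helper names sample_maxwell and fit_maxwell are hypothetical, and the seed and sample size are arbitrary. Note that the gamma variate with scale = 2a^2 and shape = 3/2 corresponds to the square of the Maxwell variate, so the sketch takes a square root when generating and squares the data when fitting.

```python
import math
import random

def sample_maxwell(a, n, rng):
    """Draw n Maxwell(a) variates as square roots of gamma variates."""
    shape, scale = 1.5, 2.0 * a * a  # gamma parameters: shape = 3/2, scale = 2a^2
    return [math.sqrt(rng.gammavariate(shape, scale)) for _ in range(n)]

def fit_maxwell(data):
    """Recover 'a' from data by fitting the squared values to a gamma
    distribution with location = 0 and shape locked at 3/2.

    With the shape fixed, the maximum-likelihood gamma scale is
    mean(y) / shape; the Maxwell parameter is then a = sqrt(scale / 2).
    """
    mean_of_squares = sum(x * x for x in data) / len(data)
    scale = mean_of_squares / 1.5
    return math.sqrt(scale / 2.0)

rng = random.Random(42)  # arbitrary fixed seed for repeatability
a = 2.0
data = sample_maxwell(a, 50_000, rng)

sample_mean = sum(data) / len(data)
theoretical_mean = 2.0 * a * math.sqrt(2.0 / math.pi)  # Maxwell mean
a_hat = fit_maxwell(data)

print(f"sample mean {sample_mean:.3f} vs theoretical {theoretical_mean:.3f}")
print(f"recovered a = {a_hat:.3f} (true value {a})")
```

The sample mean should land close to 2a*sqrt(2/pi), and the recovered 'a' close to the value used for generation.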

Example 4: The Chi-square distribution
This distribution is commonly used in statistical inference. One of the common goodness-of-fit (GOF) statistics used in distribution fitting is the Chi-squared statistic, which, of course, follows the Chi-square distribution. To model this distribution in CB, we will use the same technique that we used to model the two-parameter lognormal distribution.
  • Notes and formulas: Check out the Wikipedia entry and the NIST website
  • Quick summary: This distribution is a special form of the gamma distribution. A Chi-square distribution with 'd' degrees of freedom can be modeled by a gamma distribution with location = 0, scale = 2 and shape = d/2. We use this method in CB to directly construct the specific Chi-squared distribution and calculate the critical values (p-values) of the Chi-squared GOF statistic in distribution fitting, which are reported in the distribution fitting results window.
  • Generate random numbers: Define a gamma distribution assumption with location = 0, scale = 2 and shape = d/2, where 'd' is the degrees of freedom of the Chi-square distribution.
  • Fit to this distribution: Fitting to a Chi-square distribution is slightly tricky, since we do not have the ability to lock the scale of a gamma distribution in distribution fitting. We will come back to this in a future post in this series.
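As a quick sanity check of the gamma construction, here is a minimal Python sketch. It assumes the standard library's random.gammavariate(shape, scale) as a stand-in for a CB gamma assumption with location = 0; the function name sample_chi_square, the seed, and the sample size are our own choices. A Chi-square distribution with 'd' degrees of freedom has mean d and variance 2d, so the sample statistics should come out near those values.

```python
import random
import statistics

def sample_chi_square(d, n, rng):
    """Draw n Chi-square(d) variates as gamma(shape = d/2, scale = 2)."""
    return [rng.gammavariate(d / 2.0, 2.0) for _ in range(n)]

rng = random.Random(7)  # arbitrary fixed seed for repeatability
d = 4
draws = sample_chi_square(d, 50_000, rng)

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print(f"mean     {sample_mean:.3f} (theory: {d})")
print(f"variance {sample_var:.3f} (theory: {2 * d})")
```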
Example 5: The Rayleigh distribution
Often used in the physical sciences, this distribution is a special case of the Weibull distribution.
  • Notes and formulas: Check out the Wikipedia entry
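  • Quick summary: This distribution is a special form of the Weibull distribution with shape = 2. In the standard Weibull parameterization, a Rayleigh distribution with parameter 'sigma' corresponds to location = 0, scale = sigma*sqrt(2) and shape = 2.
  • Generate random numbers: Define a Weibull distribution assumption with location = 0, scale = sigma*sqrt(2) and shape = 2, where 'sigma' is the parameter of the Rayleigh distribution.
  • Fit to this distribution: Fitting to a Rayleigh distribution is also easy: lock the location of the Weibull distribution to 0 and the shape to 2 while fitting. If the fitted scale is 's', then: sigma = s/sqrt(2).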
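The Weibull route to the Rayleigh distribution can be sketched in Python as follows. The assumptions: random.weibullvariate(scale, shape) from the standard library stands in for a CB Weibull assumption with location = 0, the Rayleigh parameter is called sigma here, and the "fit" is the maximum-likelihood scale for a Weibull whose shape is locked at 2 (scale = sqrt(mean(x^2))), which may not match CB's fitting procedure in detail.

```python
import math
import random
import statistics

rng = random.Random(11)  # arbitrary fixed seed for repeatability
sigma = 1.5              # Rayleigh parameter (our notation)

# Rayleigh(sigma) is Weibull with shape = 2 and scale = sigma * sqrt(2).
weibull_scale = sigma * math.sqrt(2.0)
draws = [rng.weibullvariate(weibull_scale, 2.0) for _ in range(50_000)]

sample_mean = statistics.fmean(draws)
theoretical_mean = sigma * math.sqrt(math.pi / 2.0)  # Rayleigh mean

# Fit with the shape locked at 2: the ML estimate of the Weibull scale
# is sqrt(mean(x^2)); convert back with sigma = scale / sqrt(2).
fitted_scale = math.sqrt(statistics.fmean(x * x for x in draws))
sigma_hat = fitted_scale / math.sqrt(2.0)

print(f"sample mean {sample_mean:.3f} vs theoretical {theoretical_mean:.3f}")
print(f"recovered sigma = {sigma_hat:.3f} (true value {sigma})")
```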
Example 6: The Pearson type V distribution or Inverse gamma distribution
Mostly used to model the time taken to perform a task.
  • Notes and formulas: Check out the Wikipedia entry
  • Quick summary: This distribution can be modeled using the gamma distribution. If X ~ PearsonV(scale = a, shape = b), then Y = 1/X ~ gamma(location = 0, scale = 1/a, shape = b), so it follows that X = 1/Y.
  • Generate random numbers: Given the parameters of the Pearson-V distribution (scale = a, shape = b), set up a gamma distribution assumption with parameters: location = 0, scale = 1/a, shape = b. Next, set up a forecast containing the reciprocal of the gamma assumption. As you run the simulation, the forecast values represent random numbers from the Pearson-V distribution.
Figure 1: Excel worksheet for simulating Pearson-V distribution in CB
Figure 2: CB chart showing the Pearson-V distribution with statistics
In the above figures, we show the details of simulating a PearsonV(scale=3, shape=5) distribution. Notice how close the mean and variance values in the spreadsheet (Figure 1) are to those in the statistics window of the forecast (Figure 2).
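The same comparison can be reproduced outside the spreadsheet with a small Python sketch. It assumes random.gammavariate(shape, scale) from the standard library in place of the CB gamma assumption, and it uses the standard inverse gamma formulas for the theoretical values: mean = a/(b-1) for b > 1 and variance = a^2/((b-1)^2 (b-2)) for b > 2. The seed and sample size are arbitrary.

```python
import random
import statistics

rng = random.Random(2010)  # arbitrary fixed seed for repeatability
a, b = 3.0, 5.0            # Pearson-V scale and shape from the example

# Y ~ gamma(location = 0, scale = 1/a, shape = b); X = 1/Y is Pearson-V.
draws = [1.0 / rng.gammavariate(b, 1.0 / a) for _ in range(100_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

theoretical_mean = a / (b - 1.0)                          # 0.75
theoretical_var = a ** 2 / ((b - 1.0) ** 2 * (b - 2.0))   # 0.1875

print(f"mean     {sample_mean:.4f} (theory: {theoretical_mean:.4f})")
print(f"variance {sample_var:.4f} (theory: {theoretical_var:.4f})")
```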
  • Fit to this distribution: Fitting to a Pearson-V distribution is also easy. Given a dataset, follow the steps below:
    • Calculate the reciprocal (1/x) of each data value.
    • Fit these values to a gamma distribution with location locked to 0.
    • The scale of the Pearson-V distribution is the reciprocal of the fitted gamma scale. The shape is the same for both distributions.
Figure 3: Fitting data to a Pearson-V distribution
In the figure above (Figure 3), we started with a set of 1000 data points from a PearsonV(scale=3, shape=5) distribution, generated using the technique mentioned above. The best-fit gamma distribution has the following parameters: scale = 0.31, shape = 5.42 (to two decimal places). Transforming back to the Pearson-V distribution, we get a scale of 3.23 (=1/0.31) and a shape of 5.42. The fit is reasonably close for a dataset of 1000 points.
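The fitting steps can also be sketched in Python. One caveat up front: this sketch does not reproduce CB's fitting algorithm. As a simple stand-in, it estimates the gamma parameters by moment matching with the location fixed at 0 (shape = mean^2/variance, scale = variance/mean), so the numbers will differ from those in Figure 3. The helper name fit_pearson_v is hypothetical, and we use 20,000 points rather than 1000 to keep the estimates stable.

```python
import random
import statistics

def fit_pearson_v(data):
    """Estimate Pearson-V (scale, shape) from data.

    Steps: take reciprocals, fit a gamma with location = 0 by moment
    matching (a stand-in for CB's fit), then invert the gamma scale.
    """
    reciprocals = [1.0 / x for x in data]
    mean = statistics.fmean(reciprocals)
    var = statistics.variance(reciprocals)
    gamma_shape = mean * mean / var
    gamma_scale = var / mean
    return 1.0 / gamma_scale, gamma_shape  # Pearson-V scale, shape

rng = random.Random(3)  # arbitrary fixed seed for repeatability
a, b = 3.0, 5.0         # true Pearson-V scale and shape

# Generate Pearson-V data as reciprocals of gamma(scale = 1/a, shape = b).
data = [1.0 / rng.gammavariate(b, 1.0 / a) for _ in range(20_000)]

scale_hat, shape_hat = fit_pearson_v(data)
print(f"fitted Pearson-V scale = {scale_hat:.2f}, shape = {shape_hat:.2f}")
```

With more data the estimates tighten around the true values, which is consistent with the roughly 8% deviation seen for the 1000-point dataset above.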
