Monday, June 4, 2012

Embedding a Google Drive form in your website

If you do not want to embed the complete Google form iframe in your webpage, you can bypass that and use some simple code instead. The following lines show an approach where I create the input field myself and set everything up, while the response still gets stored in a Google Drive spreadsheet.
<form method="post" action="">
<span style="font: 16px calibri">Enter email: </span>
<input type="text" name="entry.0.single" value="" style="font: 16px calibri" />
<input type="submit" value="Submit Email" style="font: 16px calibri" />
</form>
Check it out - it can be useful in certain circumstances.

Here is a slightly longer version, with a form having four fields, where I take care of the form submission myself using jQuery and jQuery Mobile (slightly dated versions of both - for reference only). The jQuery .ajax(..) call is used here and works great. Note: do not use dataType: "jsonp" while POST-ing, since the call returns a complete HTML page which we have no use for and can't process directly. While developing, use a tool like Firebug so you are sure what is going in and what is coming out.

    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <meta charset="utf-8" />

    <!-- Load all required scripts. -->
    <link rel="stylesheet" href="" />
    <script type="text/javascript" charset="utf-8" src=""></script>
    <script type="text/javascript" charset="utf-8" src=""></script>
<div id="app-request" data-role="page" class="backdrop">
    <script type="text/javascript" charset="utf-8">
        function formSubmit() {
            console.log("Submitting form");
            // Collect form data.
            var formData = {
                "entry.0.single" : $("#entry_0", $.mobile.activePage).val(),
                "" : $("#req_form", $.mobile.activePage).find("input[type='radio'][name='']:checked").val(),
                "pageNumber" : 0,
                "backupCache" : ""
            };
            // Show the loading image.
            $("#req_form", $.mobile.activePage).html("<img src='../common/images/progressbar.gif' />");
            $.ajax({
                type: "POST",
                url: "",
                data: formData,
                success: function (data) {
                    $("#req_form", $.mobile.activePage).html(
                        "Thanks for your interest!! We will be in touch shortly.<br />" +
                        "<br />" +
                        "Regards,<br />" +
                        "ConferenceToGo Team");
                },
                // Note: jQuery's $.ajax takes an "error" callback, not "failure".
                error: function (data) {
                    $("#req_form", $.mobile.activePage).html(
                        "Sorry !! Couldn't send your request.<br />" +
                        "Can you please try again?<br />" +
                        "<br />" +
                        "Regards,<br />" +
                        "ConferenceToGo Team");
                }
            });
        }
    </script>
    <div data-role="header" style="height: 45px">
        <a href="#" class="ui-btn-left top-btn" data-icon="back" data-rel="back">Back</a>
    </div><!-- /header -->

    <div data-role="content" style="padding-top:5px;">
        <h2>ConferenceToGo for your Conference</h2>
        <div id="req_form">
            Thanks for your interest in the ConferenceToGo app. Please fill out this short
            form and one of our team members will get back to you shortly.<br />
            <span style="color:Red">* Required</span>
            <form data-ajax="false" onsubmit="formSubmit(); return false;">
                <input type="text" name="entry.0.single" placeholder="Your Name (*)" id="entry_0" value="" required />
                <fieldset data-role="controlgroup">
                    <legend>Typical number of attendees:</legend>
                    <label for="group_4_1">Less than 500</label>
                    <input type="radio" name="" value="Less than 500" id="group_4_1" />
                    <label for="group_4_2">500 to 1000</label>
                    <input type="radio" name="" value="500 to 1000" id="group_4_2" />
                    <label for="group_4_3">1000 to 5000</label>
                    <input type="radio" name="" value="1000 to 5000" id="group_4_3" />
                    <label for="group_4_4">More than 5000</label>
                    <input type="radio" name="" value="More than 5000" id="group_4_4" />
                </fieldset>
                <input type="submit" name="submit" value="Submit" data-inline="true" />
            </form>
        </div>
        <br />
        <br />
        <span style="font-size:small">Powered by <a href="" target="_new">Google Docs</a></span>
    </div><!-- /content -->
</div><!-- /page -->

 Update (3/15/2013): Added a longer example.

Thursday, May 24, 2012

New version of Oracle Crystal Ball is out

Here is an excerpt from the official text:
Crystal Ball Users,

Crystal Ball is released! The following is a summary of what’s new in Crystal Ball:

  • Grouped Assumptions in Sensitivity Charts
  • Data Filtering When Fitting Distributions
  • Parameter Edits When Fitting Distributions
  • Expanded Distribution Parameters
  • Predictor Parallelization
  • Localization into Additional Languages (Japanese, Spanish, French, German, Portuguese)

More information can be found in the online New Features Guide:

A trial version is downloadable from the Oracle Technology Network:

If you have not already done so, I also recommend visiting the Crystal Ball Solution Factory. Here you will find useful information about new releases in addition to presentations, discussions, and resources that cover different applications of Crystal Ball.

Crystal Ball Solution Factory (Access URL/Page Token: cb4me)

We appreciate your engagement and use of Crystal Ball in your business.

I will have a series of posts on some of the features in the days ahead. If there are questions about the availability of the software or any other aspect, please get in touch.

Sunday, April 29, 2012

Restoring between Installatron and Softaculous

Recently I changed my webhosts. On the positive side, both hosts use a DirectAdmin-based control panel, and both run the same version of the DirectAdmin software. But apparently, that was not enough. On my old hosting account, I had an installation of Mantis (a bug-tracking software), which was installed through an automated installation system called Installatron. The new host doesn't support Installatron; it only supports another software installation system called Softaculous.

Softaculous has a page that talks about restoring an installation from a downloaded backup file. Unfortunately, these instructions do not work when the backup was made through the Installatron installation system. But irrespective of which installation system is used to install Mantis, the database structure of Mantis remains the same (provided both installation systems install the same version of Mantis). And since Mantis is completely database driven, it is possible to restore a Mantis installation by restoring the database. This is what I resorted to in the end.

Using the DirectAdmin control panel of the old hosting account, I downloaded the database backup corresponding to the Mantis installation. The database backup download link can be found at DirectAdmin control panel -> Your Account -> MySQL management. The screenshot below shows the links for downloading database backups.
Screenshot for downloading and uploading database backups
After I saved the downloaded backup on my computer, I used the DirectAdmin control panel of the new hosting account to upload the backup to the database corresponding to the Mantis installation there. Note: you have to have a fresh installation of Mantis in the new hosting account for this to work.

Once the database backup was uploaded and the data was restored, I got back all my data with the new hosting provider. Hope this helps someone.

Friday, March 23, 2012

Text mining with perl: Resources

One of the exciting areas of the analytics movement is text mining, where you mine relevant information from unstructured text. It is, hence, different from data mining, where you mine data for patterns. It is generally accepted that text mining is considerably harder than data mining, mostly because the source is unstructured. But the benefits are often substantial - for example, there is an investment fund which uses analysis of Twitter feeds to predict stock movement. This question on a quant Q&A site discusses a few similar applications. This is just one example application. Another example is IBM's Watson, which famously won Jeopardy! in February 2011.

Text mining and natural language processing
Text mining is closely associated with natural language processing (NLP). Since you have to search for information in unstructured text, which is mostly text in natural languages, you end up using quite a few of the algorithms used in NLP. Recently, I have been moving a bit deeper into text mining and NLP for a side project of mine (maybe a later blog post, but let me tell you, it is not for investment purposes :-).  Text mining and NLP are a good fit for me (I am a developer of a Monte Carlo simulation software called Oracle Crystal Ball in my day job) - they use a lot of probability and modeling, similar to what I do for Monte Carlo simulation. There isn't much optimization and O.R. yet, but it is analytics all the way.

To that end, I am learning the tricks of the game, and a suitable language to program.

Perl for data extraction
Although I mostly work in the .NET world, oftentimes, for some processing jobs, I revert to perl. I have used perl for many years and love the language. Initially, I learned perl to do some simple web-CGI programming when I was in IIT Kharagpur, in my undergraduate years. Then I used perl for heavy database scripting at a job I took immediately after my undergraduate degree. I am by no means an expert in perl, but I know a decent amount to get by. Since then, I have used perl in a variety of capacities: in my current work, in hobby web-CGI programming, and so on. Although a lot of people have moved on to php for web-CGI and python for non-CGI scripting, I never could do so. There is nothing wrong with perl anyway, so why bother?

I also think perl is a great language for people who deal with data. As a software developer for an analytics tool, I deal with data all the time, and lots of it. The strength of perl comes from its rich feature set suitable for data extraction (indeed, perl is, unofficially, an acronym for Practical Extraction and Reporting Language). I have used perl (along with unix shell tools, e.g., bash, sed, and awk) scripts for extracting data and formatting it the way I need, with great success.

Perl for text mining
When you think about the strengths of perl, it naturally becomes a contender for the language of choice for NLP. Perl has powerful but easy-to-use regular expressions, and various modules for extracting anything from anywhere. Given that, and my familiarity with perl, I decided to take a look at the resources available for this purpose. Not so surprisingly, there are quite a few.

Resources for text mining
Here is my compiled list of resources, with an obvious bias towards using perl.
  • Books: Let's start with books. There is no dearth of books on text mining and NLP, but I will list a few I thought were good.
    • Manning and Schutze. Foundations of Statistical Natural Language Processing. The MIT Press. 1st Ed. 1999. This book is a gem, sorta the bible for modern NLP. It has got nothing to do with perl, but since text mining involves a lot of NLP, this book is a great reference.
    • Manu Konchady. Text Mining Application Programming. Charles River Media. 1st Ed. 2006. This is probably the best book out there for learning text mining using perl. Note that the book is not that heavy on perl code, though; it stresses developing the concepts and the models that are used in text mining. The book covers text mining by walking through various problems that appear in (and can be solved using) text mining. I liked that approach. It also helps that the author maintains an open source library for text mining written in perl.
    • Roger Bilisoly. Practical Text Mining with Perl. Wiley. 1st Ed. 2008. This is probably the only book that directly deals with both text mining and perl. It is also probably a pretty good book, but unfortunately it did not meet my exact needs. I found the topics covered somewhat lacking when compared to Konchady's book.
    • Weiss et al. Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer. 1st Ed. 2005 (Softcover - 2010). The contents of this book seemed interesting, similar to Konchady's book but with more rigorous treatment. I have put this on my "to check out" list.
  • Software: As with books, there is no dearth of software for text mining either. Since I was looking for perl-related solutions, those are the only ones I list here.
    • Text mine: The open source perl library I mentioned earlier.
    • Text::Mining module on CPAN: This module seems comprehensive, but is rather poorly documented. It is not very clear from the documentation available in the module how to use the various sub-modules effectively. I hope the authors work on that.
  • Video learning: There are quite a few lectures available on Youtube. But the best one has to be an online course from Stanford, which is going on right now. The course is from Prof. Manning, the author of the first book in my list. The online course asks for assignments to be submitted in java or python, and I understand the logic - those two being at the top of the heap of popular languages - but I have decided to use perl for doing the assignments. I won't be graded then, but hey, I did not want to be graded in the first place ;-). The reasons for choosing perl: my familiarity with and knowledge of the strengths of perl in this area (although I am familiar with java too, at almost the same level), the lure of being able to use the same code in my side project, and possibly reusing some of that stuff in a web-CGI context at some point.
I will update this post if I come across more relevant resources.

Monday, March 5, 2012

Confidence Intervals (Prediction Intervals) in CB Predictor (Part 1): The Calculations

In the CB Predictor result window, we show the confidence intervals (CI) [also known in the literature as prediction intervals (* see below)] of the forecasts. Prof. Chatfield has a great paper about prediction intervals, their importance, and how to compute them - interested readers, please have a look.

The default CI’s in the CB Predictor result window are at 2.5% and 97.5%, but this can be easily changed using the dropdown at the bottom-right corner of the result window (see figure 1). One can even use custom percentiles for the CI’s shown in the result window (and also in the Predictor reports). Although this part is quite straightforward, there are quite a few steps in calculating the standard error that could use some further explanation. In this and subsequent posts, we will discuss some of the interesting details around the CI’s in Predictor.

Figure 1: Result window

Confidence intervals for classic methods
The way we calculate the confidence intervals for classic forecasting methods (four non-seasonal and four seasonal methods) is not very obvious from our charts and reports. We use an empirical formula to calculate the confidence intervals for the forecast of each period.

The formula is from the following reference:
Bowerman, B.L., and R.T. O'Connell (contributor). Forecasting and Time Series: An Applied Approach (The Duxbury Advanced Series in Statistics and Decision Sciences). Belmont, CA: Duxbury Press. 1993.
See Section 8.6, Prediction Intervals (page 427).

The Google Books project seems to have only snippets available from the book due to licensing issues, so if anyone is interested in the actual text and does not have access to the book, please feel free to contact me (or Oracle support, for that matter) to receive a scan of the relevant pages.

The method makes a few assumptions:
  • Historical data amount is sufficiently large
  • The forecast errors are normally distributed
Method summary
Without reproducing the formula for this calculation, which is already available in our reference manual and in the reference above, let us describe the process here. Note that the symbols used in the formula in our manual (or in the book) are not the same as the symbols used in the description below, but the idea is the same.

Let's say we have data points for periods 1 to t: Y(1) to Y(t). We calculate the forecast for period (t+1) using whichever forecasting method is selected. Let's say the forecast is F(t+1). From the fitted data (using the forecasting model) we can also calculate the RMSE; let's assume that is RMSE(t+1). Now, the normality assumption gives us the distribution N(F(t+1), RMSE(t+1)). Of course, this normal distribution is then used to calculate the CI’s at different percentile levels. Note here that the fitted values we get from the model, which are used to calculate both F(t+1) and RMSE(t+1), are nothing but 1-period-ahead forecasts at each period, starting from period, say, k (*), and ending at period t, using the equations of the forecasting model. It is as if we were forecasting 1 period ahead at every period in (k,...,t).

(*) Important note: When I say the phrase “starting at period k”, k is the starting period where we can begin to calculate the 1-period ahead forecasts. For some models (like Double Moving Average, or the seasonal models), one can’t really start calculating 1-period ahead forecasts at period 2.

This is the insight that is used in the heuristic. For period (t+2), the forecast F(t+2) is obtained from the method equations. For the standard error, we calculate 2-period ahead forecasts at each period, starting from period, say, k+1, and ending at period t, and then calculate the RMSE of these values w.r.t. the dataset, calling it RMSE(t+2). The normal distribution assumption then, of course, is N(F(t+2), RMSE(t+2)), which is used in calculating CI’s. For period (t+3), we use the same procedure, except now we look at 3-period ahead forecasts, and so on.

The procedure is simple and intuitive, and works pretty well in most situations. Using the above description, one should be able to easily validate the numbers seen in Predictor by implementing a forecasting method and the standard error for that method in Excel.
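Such a validation can also be sketched in Python instead of Excel. The snippet below is my own illustration of the h = 1 case, using single exponential smoothing with a naive initialization and a made-up series - it mimics the procedure described above, not Predictor's actual code:

```python
import math
from statistics import NormalDist

def ses_one_step_fits(y, alpha):
    """1-period-ahead fitted values from single exponential smoothing.
    fits[i] is the forecast of y[i] made at the previous period;
    the initial level is naively set to y[0]."""
    level = y[0]
    fits = []
    for obs in y:
        fits.append(level)                          # forecast this observation...
        level = alpha * obs + (1 - alpha) * level   # ...then update the level
    return fits, level                              # 'level' is now the forecast F(t+1)

def forecast_with_ci(y, alpha, lo=0.025, hi=0.975):
    """F(t+1) plus the (lo, hi) percentiles of N(F(t+1), RMSE(t+1))."""
    fits, f_next = ses_one_step_fits(y, alpha)
    rmse = math.sqrt(sum((f - o) ** 2 for f, o in zip(fits, y)) / len(y))
    dist = NormalDist(mu=f_next, sigma=rmse)
    return f_next, dist.inv_cdf(lo), dist.inv_cdf(hi)

y = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119]
f, ci_lo, ci_hi = forecast_with_ci(y, alpha=0.3)
```

Swapping in another classic method only changes how the fitted values and F(t+1) are produced; the RMSE-then-normal-percentiles step stays the same.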

Confidence Intervals for ARIMA methods
For Box-Jenkins ARIMA methods, we do not use the heuristic mentioned above. Rather, we use the theoretical formula to calculate the standard errors.

A reference for this procedure is below. 
Box, G. E. P., Jenkins, G. M., and Reinsel, G. C. Time Series Analysis: Forecasting and Control. 4th ed. Hoboken, NJ: John Wiley & Sons. 2008. Chapter 5, Section 5.1.1 and 5.2.4.
We follow the procedure exactly (yes, including the calculation of $\psi$ functions), so please refer to the text if you have questions regarding the calculations.
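For a feel of the theoretical route, here is a small Python sketch (mine, and only for the AR(1) special case) of the standard Box-Jenkins result that the h-step-ahead forecast standard error is sigma times the square root of the sum of the squared $\psi$ weights $\psi_0, \ldots, \psi_{h-1}$:

```python
import math

def ar1_psi_weights(phi, h):
    """Psi (MA-infinity) weights of an AR(1) process: psi_j = phi**j."""
    return [phi ** j for j in range(h)]

def ar1_forecast_se(phi, sigma, h):
    """Standard error of the h-step-ahead forecast:
    sigma * sqrt(psi_0**2 + ... + psi_(h-1)**2)."""
    return sigma * math.sqrt(sum(w * w for w in ar1_psi_weights(phi, h)))

# The 1-step-ahead standard error is just sigma itself, and for AR(1)
# the sum of squared psi weights telescopes to (1 - phi**(2h)) / (1 - phi**2).
se1 = ar1_forecast_se(0.8, 1.0, 1)
se5 = ar1_forecast_se(0.8, 1.0, 5)
```

For general ARMA models the psi weights come from a recursion on the AR and MA coefficients rather than a closed form, but the standard-error formula is the same.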

(*) Prof. Hyndman has a recent post where he says that the terms "prediction interval" and "confidence interval" mean different things and should not be used interchangeably. I still personally think that the difference is too technical!!

Update (03/15/2013): Link on prediction interval vs. confidence interval.

Wednesday, February 29, 2012

Using Oracle Crystal Ball API in your windows applications

[We are moving away from the API solution mentioned in this blog post because of various reasons. This blog post is available for historical reasons, but the API solution mentioned in this post is not available for either sales or support. We might still be able to help you in your desktop application by using Excel as the intermediary. If you want a discussion on that front, please feel free to post your questions in our Support channels (LinkedIn or CBUG).]

In this post, I wanted to mention an often overlooked gem in the Oracle Crystal Ball suite: the Open Crystal Ball API, or OCB API in short. If you are a Crystal Ball customer (i.e., running Crystal Ball on your desktop with Excel), then you can use the OCB API in your own applications.

OCB API is at the core of the magic that happens when you run Crystal Ball to simulate, optimize or forecast. Excel is but one of the front ends for this API (albeit, a very powerful one), but you can go far beyond what you can do within the Excel environment. Using the OCB API, you can directly access all the mathematics and analytics in your own application, in the way you want it.

OCB API is developed in C#, which is a Microsoft .NET language. That means - you are limited to using C# or other .NET languages for developing with the OCB API. Depending on your perspective, that may or may not be limiting. For example, you can use newer .NET languages with OCB API, e.g., F# or IronPython. Crystal Ball with Python - how cool is that?

Development platform and requirements
Although using OCB API does not require Excel, the OS requirements are the same as desktop Crystal Ball. Also, OCB assemblies (DLL’s) are built as platform agnostic, meaning that they can execute as part of either a 32-bit or 64-bit application.

One would presumably use Visual Studio for developing applications using the OCB API, but one of the free/open source editors, like SharpDevelop or MonoDevelop, can also be used. Currently, the minimum requirement for using our API is .NET 2.0 - which means you have to use at least Visual Studio 2008 to develop apps using the OCB API. Other, later versions of .NET and Visual Studio are all supported.

We have never tried compiling the OCB API with the Mono library, but it may be possible to develop applications with Mono and port them to Unix variants or the Mac. If anyone wants to test this further, let me know.

If you have an Oracle Crystal Ball desktop license, you are also licensed to use the OCB API, there is no extra charge. You can use the same username and serial number in your code for licensing purposes. Note that, if you want to run a simulation using OCB API, you will have to provide a valid licensing code.

Availability and miscellaneous items
As mentioned, the API is available to use with a valid Crystal Ball desktop license. You have to reference the DLL’s in the installation folder in order to develop using OCB API. You can also contact sales and directly purchase a license for OCB API only.

The OCB API does not include charting. The amazing charts in Crystal Ball are generated using ChartFX library, available from SoftwareFX. They have a free lite version, which you can use for your charting needs. .NET4.0 includes charting APIs, which can also be used for generating charts.

Additionally, the OCB API does not include access to the OptQuest libraries for optimization, due to licensing restrictions. The time-series forecasting part, however, is included in another DLL.

The core OCB API DLL’s are obfuscated to protect our intellectual property.

Our OCB API is well documented, and comes with a large set of sample code to get you started. We have examples for both simulation and time series forecasting. The documentation does not get installed automatically, though. Please contact an Oracle Crystal Ball sales representative to get access to the documentation.

The support process is the same as for desktop Crystal Ball, i.e., through Oracle Tech Support; however, this is for issues found in the OCB API, not for assistance in developing your application. We work with integration partners who can help develop and deploy custom applications using the OCB API. Other community support channels like CBUG or LinkedIn can also be used to ask questions.

Use Cases
Why might you want to use the OCB API? Here are a few reasons.
  • If you have existing custom applications, you can use OCB API to enhance the analytics in the application, without re-building the model in Excel.
  • You can overcome some of the limitations of Excel by writing custom applications directly using OCB API:
    • Data volume/transfer
    • Modeling complex business rules or workflow
    • Model security/distribution
    • Automation
  • If you want to generate charts specific to your industry, that are not provided by Crystal Ball, or cannot be generated easily in Excel, then you can use the API along with other charting libraries.
  • Although our Excel add-in code has really good performance, if you want to squeeze out even more performance for large datasets, you can skip the Excel overhead and use the OCB API directly on your data. Writing a .NET app with the OCB API will also enable you to take advantage of the Task Parallel Library (TPL) to utilize the multiple cores in modern machines. TPL is available as an extension for .NET 3.5, and as part of .NET 4.0 and above.
  • If your data is not in Excel and resides in some other system (like an external database), then directly connecting to the data source using .NET and using the OCB API to run simulations or time-series forecasting would be easier. The alternative is to get the data into Excel using ODBC or other importing techniques.
  • Using OCB API, you can also expose simulation methods to a web service. Please contact an Oracle Crystal Ball sales representative before taking this route, since there might be additional license implications here.
Update (06/19/2013): Announcing the unavailability of API solution.

Friday, February 24, 2012

Monte Carlo simulation examples in healthcare

I have written about examples of Monte Carlo simulation in the past. For a recent presentation on Monte Carlo simulation and stochastic optimization using Oracle Crystal Ball, I did some research on using Monte Carlo (MC) simulation in health-care. This blog post covers a few such references and applications I have seen so far.

From my limited research, there seem to be three major types of uses of MC simulation in the health-care industry.
  • It seems that large (or small) health-care organizations are using MC simulation and stochastic optimization to better understand and manage their budgets in various departments. That makes sense, since cost control is one of the main thrusts of health-care systems nationwide.
    • I have seen quite a few articles on using MC simulation in the context of managing preventive care in mental health situations. The article from SAMHSA (The Substance Abuse and Mental Health Services Administration) I mention later is a great example.
  • A few companies that manufacture health-care equipment are using MC simulation in the manufacturing context, as in DFSS (Design for Six Sigma), process capability optimization, etc.
  • Finally, a few other companies are using MC simulation for controlling the parameters of certain health-care processes, e.g., simulating to find the optimal level of staffing that achieves a certain quality of service.
Note that there are other uses of generating random numbers from distributions, e.g., simulating the inter-arrival times of patients in a hospital emergency room. This type of problem is better handled using the discrete event simulation paradigm rather than the MC simulation paradigm, so I will leave them out.
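To make the flavor of the budget-type applications concrete, here is a toy Monte Carlo sketch in Python. All cost categories and numbers below are made up by me purely for illustration - they come from none of the articles mentioned in this post:

```python
import random
import statistics

random.seed(42)  # reproducible runs

def simulate_budget(n_trials=10_000):
    """Toy annual department budget: three uncertain cost components,
    sampled from assumed distributions and summed for each trial."""
    totals = []
    for _ in range(n_trials):
        staffing  = random.gauss(2_000_000, 150_000)              # salaries + benefits
        supplies  = random.triangular(300_000, 700_000, 450_000)  # (low, high, mode)
        equipment = random.uniform(100_000, 250_000)              # replacement costs
        totals.append(staffing + supplies + equipment)
    return sorted(totals)

totals = simulate_budget()
mean_cost = statistics.fmean(totals)
p95_cost = totals[int(0.95 * len(totals))]  # exceeded in only ~5% of trials
```

Budgeting against the 95th percentile rather than the mean is the kind of question these models answer: it quantifies how much headroom is needed to absorb the uncertainty.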

I found quite a few research papers and article references in the literature which talk about applications of MC simulation in the areas I mentioned above. I have to confess, though, that these papers are somewhat hard to penetrate in limited time for an outsider (to health-care) like me, since they are pretty heavy with terms used frequently in the health-care world and rarely outside it. One such example of a term is FTE, or full-time equivalent, used in the context of the work hours of health-care staff.

Here are a few articles which talk about MC simulation in healthcare. None of the links are available anymore; however, feel free to search for the articles on one of the search engines.
  • Speaking the Language of Finance - nursing leadership - Statistical Data Included by Pamela S. Hunt
  • Estimating the cost of preventive services in mental health and substance abuse under managed care by Anthony Broskowski and Shelagh Smith (SAMHSA Pub ID: SMA02-3617R)
  • Making sure your operating asset allocation is on target by Thomas H. Todd. Strategic Financial Planning. Winter 2009.
  • Application of Decision Sciences to Mental Health Policy by the Center for Health Decision Science at the Harvard School of Public Health. The page contains links to a few relevant papers.
Finally, here is a link to my slides, which talk about the resources mentioned above and describe some of the problems mentioned in these articles. For a link to my complete presentation on the introduction to Monte Carlo simulation, see here.

Update (03/15/2012): Links.
Update (04/01/2019): Removed dead links.

Saturday, February 18, 2012

New and improved online resource portal for Crystal Ball

We have a new and improved Online Resource Portal, which is an awesome resource for all things Crystal Ball. If you are a member of our LinkedIn users group, you might have noticed this message from my colleague Hilary. Otherwise, here is an excerpt of the message:
We wanted to let everyone know about a big change/improvement we’ve made to our online Crystal Ball resources. We have a brand new Solution Factory (that’s Oracle-speak for online resource portal) that has a ton of information and resources. We’ve grouped them into collections so (we hope!) it’s easier for you to find what you’re looking for – example models, white papers, recorded demos, and all that. We’d also like to invite you to sign up to get more regular updates on Crystal Ball. There’s a big “Sign Up” button on the Solution Factory. We hope you’ll sign up; and more importantly, we hope you find the revamped Solution Factory useful.
The URL for the portal is:
If you have comments or questions about the portal, please comment on this post (or send comments to the LinkedIn post), and I will follow up.