Sunday, November 20, 2011

Optimization problems in cloud computing

At the just-concluded INFORMS Annual Meeting 2011 in Charlotte, NC, I attended an interesting session surveying optimization problems in cloud computing. There were quite a few talks on cloud computing, definitely more than I noticed at the 2010 annual meeting. That, of course, closely tracks the growing importance and popularity of cloud computing, and the number of vendors entering this space.

The survey was very well presented. One point I raised with the speaker was the near absence of stochastic optimization problems in the survey. I think quite a few of the problems mentioned in the talk are actually stochastic in nature, but were presented as deterministic, mostly for simplicity.

I encourage anyone interested in this area of research to get in touch with the presenter for the slides.

Friday, November 18, 2011

Accuracy of solutions in OR solvers

At the just-concluded INFORMS Annual Meeting 2011 (I had a great time, by the way), I attended a talk on improving the accuracy of optimization solvers. The talk was well attended, and the method, presented in the context of LPs and MIPs, was interesting. The speaker suggested that, once the initial solution has been found, the problem be reformulated as follows (if I have got this right; a rough sketch follows the list):
  • Scale the problem so that the residual discrepancy moves up toward the decimal point. E.g., if the discrepancy shows up at the 6th decimal place, scale the problem so that it appears at the 1st or 2nd decimal place.
  • Consider only the active or near-active constraints (the constraint set can be filtered using the shadow price of each constraint).
  • Re-solve the problem, warm-started from the previous solution, and iterate this scheme a few times.
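
For concreteness, here is a rough sketch of how I read this rescale-and-filter-and-resolve loop, on a made-up toy LP using scipy's linprog. This is my own illustration, not the speaker's code: the data, tolerances, and scaling factor are invented, and since scipy's HiGHS interface offers no warm start, the reduced problem is simply re-solved from scratch.

    import numpy as np
    from scipy.optimize import linprog

    # Toy LP, purely for illustration: minimize c @ x  s.t.  A @ x <= b, x >= 0.
    c = np.array([-1.0, -2.0])
    A = np.array([[1.0, 1.0],
                  [1.0, 3.0],
                  [4.0, 1.0]])
    b = np.array([4.0, 6.0, 12.0])

    res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2, method="highs")
    x = res.x
    duals = res.ineqlin.marginals      # shadow prices (HiGHS interface, recent scipy)

    scale = 1e4                        # pushes tiny residuals toward the first decimals
    for _ in range(3):                 # iterate the scheme a few times
        slack = b - A @ x
        # Keep only active or near-active rows, filtered via slacks and shadow prices.
        keep = (np.abs(slack) < 1e-4) | (np.abs(duals) > 1e-9)

        # Correction problem around the incumbent: substitute x = x_old + d / scale,
        # so a 6th-decimal discrepancy in x shows up at the 1st or 2nd decimal of d.
        corr = linprog(c, A_ub=A[keep], b_ub=scale * slack[keep],
                       bounds=[(-scale * xi, None) for xi in x], method="highs")
        if not corr.success:
            break
        x = x + corr.x / scale
        duals = np.zeros_like(b)
        duals[keep] = corr.ineqlin.marginals

    print("refined solution:", x)
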
I guess this scheme could be extended to constrained nonlinear programming problems as well (and, I think, the author mentioned as much), but my first instinctive question was completely different: why would someone need that much accuracy in the first place, other than for bragging rights when competing with other solvers? Unless one is solving textbook problems with given data, typical data accuracy does not go beyond the decimal point, and even when it does, it seldom goes beyond 1-2 decimal places. Why solve a problem accurately to 10 decimal places when the input data is not that accurate?

Now, I am aware of industries where the data are often between 0 and 1, for example chemical processes, but shouldn't we be rescaling those problems before solving, rather than applying the above procedure after solving? Maybe I am missing something here!!

Thursday, November 10, 2011

Development burden on lone consultant

In a recent session of the INFORMS Rocky Mountain chapter, an OR/MS consultant presented an implementation of a vehicle routing problem using Microsoft Excel, Microsoft MapPoint, and Microsoft Solver Foundation. He also had a slide on the key takeaways from the project. I found the slide pretty interesting, and here are my comments on a few of the points mentioned there.

The Jack-of-all-trades aspect
I am not sure whether this consultant works completely alone or has a small team (like 2-3 people), but the diverse range of things he had to do for the project displays the versatility of this guy (or his small team): data cleaning (with VBA macros written for it), OR modeling, OR software development (he solved part of the problem using Excel and Microsoft Solver Foundation Express Edition, and part of it using a custom Tabu search heuristic written in C#.NET), report generation (VBA macros), and so on. A large part of OR consultancy requires proficiency in various non-OR programming languages.

The OR/MS aspect of the project
The first few points on the slide were about comparing the time required to develop the whole solution (let's call this X - we love algebraic modeling, don't we?) with the amount of time actually spent on OR work (let's call this Y). I was not surprised when he noted that:
X - Y ≥ 3*Y => X ≥ 4*Y
I have heard this premise before. At the INFORMS Annual Meeting 2010, I attended a panel discussion on "OR/MS Consulting - The Challenges". The panelists were all consultants, and a few ran 2-3 person shops. Interestingly, the presenter of the talk I mentioned at the beginning of this post was among the panelists. That panel discussion, by the way, is one of my favorite sessions I have ever attended at an INFORMS annual conference; I wish somebody had recorded a video of it. The consensus seemed to be that if one is the OR modeler in a small shop, the OR work is often less than 15% of the total development work. If one is part of a slightly larger organization, one would be lucky to have it near the 25% mark.

Thoughts for would-be OR/MS consultants.

Update [12/22/2016]: Label consolidation.

Ateji in deadpool: interesting entrepreneurial feedback

I recently came to know from an entry in Erwin's blog that Ateji has ceased to exist. I was aware of the company and its technology, but I would not say I was a close follower, mainly because I work in the .NET domain. I also liked the OptimJ GUI. As for the technology, as I understand it, Ateji introduced a language extension to Java for expressing optimization problems (objective functions, constraints, etc.) more succinctly. They also had language extensions (and library support, again in Java) for implementing parallelism easily, which can perhaps be compared to the Task Parallel Library (TPL) available in .NET starting with 4.0. Please correct me if my understanding here is flawed.

One topic of shared relevance was (and is) the multi-core execution of optimization solvers, or more generally, injecting parallelism easily into mathematical optimization problems. From that perspective, the closing post from Ateji is interesting. Here is a verbatim excerpt from that post on the adoption issues the company (and the technology) faced:

Adoption issues
We did anticipate that introducing a new language-based technology and getting it adopted would take a long time. The truth is, it takes much longer than anticipated. Many “new” languages that are coming into fashion now were first introduced 10 or 20 years ago.
There are however examples of companies that successfully built a business on a language. Think of Oracle and SQL. They had a strong business case: suddenly, all software developers became able to access databases in a simple and intuitive way. We thought we had a similar strong business case with Ateji PX and parallel programming: suddenly, all Java developers became able to write parallel code for multicore processors in a simple and intuitive way.
Our largest mistake has probably been to surf on the popularity of the Java language and on the hype of the “multicore crisis” (all software has to be adapted for the coming generation of new hardware) without developing a fine enough understanding of the market.  We discovered the hard way that Java developers just don’t want to hear about parallel programming, regardless of how easy we made it. They’d rather use only one core out of their expensive 16-core server than write parallel code (we’ve seen the case). They long for solutions that hide parallelism. This may change when 1000′s of cores become the norm, future will tell.
At the other end of the spectrum, developers that do parallel programming just don’t want to hear about Java, because 10 years ago it used to be slower than C++.

I guess the key takeaway here is that although people (more specifically, developers who consume APIs) understand and appreciate the value of multi-core execution, it is not mainstream yet, and there is still considerable inhibition about actually using it. I am reasonably sure that will change, but it will take time. We are already seeing a proliferation of languages supporting parallelism constructs out of the box, but that still puts the onus on developers to use them. In the meantime, the best way to go, if one is in the business of developing computing libraries, is to incorporate multi-core execution in the library itself and make it completely transparent to users. In other words, the only way to achieve developer adoption is to have a no-brainer, drop-in type of solution. As Erwin puts it in his post, the target audience for Ateji's technology was too niche to support the company.
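
As a toy illustration of that 'transparent' approach (hypothetical names, Python rather than Java or .NET, and nothing to do with Ateji's actual API), the fan-out across cores can live entirely inside the library call, so the caller writes an ordinary sequential-looking call:

    from concurrent.futures import ProcessPoolExecutor

    def evaluate_scenarios(model, scenarios):
        """Hypothetical library routine: the caller passes a model function and a
        list of scenarios; the spread across all available cores happens inside."""
        with ProcessPoolExecutor() as pool:
            return list(pool.map(model, scenarios))

    # Caller's view: no parallel constructs anywhere.
    #   results = evaluate_scenarios(my_model, my_scenarios)
    # (With ProcessPoolExecutor, the model function must be picklable, i.e.,
    # defined at module level.)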

Best wishes to the talent pool for their future endeavors.

Saturday, November 5, 2011

Sensitivity chart does not converge in Crystal Ball

Recently I received a support request about a sensitivity chart issue: the top few assumptions contributing to the variance of the target forecast seemed to change every time the simulation was reset and re-run (with no seed set). In general, even if the seed is not set, one should expect the list of top contributing assumptions to converge once enough trials are run, and even with fewer trials the list should not vary much from reset to reset.

The model was simple. There were around ten assumptions, linked to one forecast through a series of formulas. The flowchart below (Figure 1) shows the type of formulas that were being used.
Figure 1: Model Flowchart

While running the model, two things were noticed:
  1. As mentioned before, the sensitivity chart does not stabilize even with a large number of trials (either the contribution-to-variance values change, or the positions of variables on the chart, or both).
  2. Switching to the sensitivity data view showed that the rank correlation values are at or near zero (to 2 decimal places). That would normally indicate spurious correlations, but in this model that should not be the case.
These are signs that the sensitivity chart may not be working as intended, and they prompted me to check its documented limitations (also available in the User's Guide installed with Crystal Ball; from Excel 2007: Crystal Ball tab > Resources > Crystal Ball Documentation, then Analyzing Other Charts > Understanding and Using Sensitivity Charts > Limitations of Sensitivity Charts). This turned out to be a classic case of one of the limitations of our sensitivity analysis: it does not support non-monotonic relationships between the assumptions and the target forecast. Looking at the formulas in the model, the forecast is an aggregate of functions that apply absolute values to the assumptions. That makes the relationship between the assumptions and the forecast non-monotonic, and hence the problem.
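
A quick way to see why a rank-correlation-based sensitivity measure falls apart here (a standalone illustration, not Crystal Ball's internals): take a symmetric assumption and a forecast built from its absolute value, and the Spearman rank correlation comes out near zero even though the forecast is completely determined by the assumption.

    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(42)
    x = rng.normal(0.0, 1.0, 100_000)   # a symmetric "assumption"
    y = np.abs(x)                       # a "forecast" that applies an absolute value

    rho, _ = spearmanr(x, y)
    print(f"Spearman rank correlation: {rho:.3f}")   # prints roughly 0.00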

To confirm that this is indeed the problem, i.e., that the relationships between the assumptions and the target forecast are non-monotonic, we can run the Tornado chart tool on the model and look at the spider chart, which helps identify which variables have a non-monotonic relationship with the target forecast. Looking at the spider chart (Figure 2), we see that this is indeed the case.
Figure 2: Spider Chart
The chart shows that each line is broken in the middle, which signifies non-monotonicity. If the relationship were monotonic, the curve for each variable would have been a straight line from the left end to the right end.

Update (11/7/2011): Language.
Update (12/8/2011): Language.

Wednesday, November 2, 2011

Saving partially completed OptQuest model runs in Crystal Ball

There are times when, working with a large OptQuest (a.k.a. Decision Optimizer) model that has been running for hours with no end in sight, you want to take a break and do something else with the computer. But you do not want to lose the progress the solver has made so far, so you need a way to save the partially completed run. Although there is no single button click that achieves this, it can be done. Here we discuss two ways to do it, depending on how long a break you (and your computer) need.

Short break
It is possible to informally save a partially completed OptQuest run. Hit the red 'Stop' button on the Crystal Ball control panel, and that essentially pauses the solver. When you are ready, hit the green 'Continue Optimization' button, and the optimization continues from where it stopped, with all the settings and the progress intact.
Notes:
1. This technique would work on general optimization problems, but would not work if you are trying to evaluate the efficient frontier.

Long break
It is also possible to shut down Excel and Crystal Ball completely with a partially completed model. In this case, after you hit the red 'Stop' button on the Crystal Ball control panel, you then export the intermediate solution to the spreadsheet via OptQuest Results window -> Edit -> Copy Best solution to Spreadsheet. Then save and close the file. When you open the file again and run OptQuest, it will use the saved solution as the starting solution (the base values of the decision variables), improve on it, and should still end up with the same solution.
Notes:
1. Notice the word should in the last line. In this approach, OptQuest loses its history of previously tried solutions and might end up re-examining some of the solution space.
2. Hence, the solution path and the time taken might differ from what they would have been had you not interrupted the solver. The resulting solutions in the two cases (undisturbed vs. restarted) should not be considerably different, though.

Enjoy your break !!

Note: This post is based on a recent CBUG discussion on the same topic.