## Sunday, March 29, 2015

### Optimization Pro and Con

A tweet by Nate Brixius (@natebrix) led me to read the article "The Natural Order and Divine Order of Optimization" published by the Wisconsin Institute for Discovery, a rebuttal/counterpoint to a New York Times Magazine article titled "A Sucker is Optimized Every Minute". The former sings the praises of optimization (somewhat) and the latter vilifies it (somewhat).

As someone whose research focuses on optimization, I'm generally in the pro camp, but I'll concede that it can be overdone. One of my favorite quotes is of rather fuzzy provenance. I've seen variations of it from multiple authors, phrased as a monologue by, variously, a Zen master, a playboy or a dying man (but not, so far, a dying playboy Zen master). The version I recall:
> I had my chances to marry, but I was looking for the perfect woman. Then one day I found her; but she was looking for the perfect man.
Optimization is fine for some things, inappropriate for others, and (like everything else in life) not a good idea if taken to an extreme.*

* I'm rather proud of myself that I resisted the nearly-inevitable "extreme point" pun there.

## Saturday, March 28, 2015

### CPLEX Performance Variability

It may come as a surprise to some users of CPLEX that its performance is, somewhat intentionally, random. (This may be true of some other optimization software as well. I'll limit the assertion to CPLEX because CPLEX is the only solver for which I have first-hand knowledge that it's true.)

A certain amount of uncertainty about performance is probably inevitable. Operating systems swap programs in and out of memory and processes in and out of CPU cores in ways uncontrollable by the CPLEX user, so "random" fluctuations in timing are to be expected even when running code with identical parameter settings repeatedly on a single problem on a single machine. When the code is itself multi-threaded, I suspect even more "random" timing issues occur. (CPLEX by default uses multiple threads, but can be restricted to a single thread via a parameter setting.)

Less well known is that CPLEX uses a pseudorandom number stream to control some of its internal decision-making. Marc-André Carle (@ORNinja) has an excellent series of posts on his blog about performance variability, so I'll save myself a ton of typing and refer you there.

I thought I'd share some recent results that demonstrate how one particular source of variability, the random seed, can make a large difference in performance. The name for the seed in the Java API is IloCplex.Param.RandomSeed. According to the CPLEX 12.6.1 parameters manual, its default value changes with each new release of the program, so if you run two different versions of CPLEX on the same problem, you can expect different performance even if none of the changes from one version to the next directly impact your problem. Put another way, your mileage WILL vary.

Here's an example. I took a moderately chewy MIP model that arises in a current research project. I'm not at liberty (nor inclined) to give a full description, but the first few lines of the CPLEX log show that it has what are, by contemporary standards, modest dimensions.

```
Reduced MIP has 20434 rows, 10239 columns, and 78243 nonzeros.
Reduced MIP has 215 binaries, 0 generals, 0 SOSs, and 0 indicators.
Presolve time = 0.08 sec. (37.97 ticks)
Found incumbent of value 216.000000 after 0.23 sec. (89.01 ticks)
```

This is a cost minimization problem with an integer-valued objective function. The trivial feasible solution with cost 216 is always found immediately; it's from there that the adventure begins. This version of the problem is not too difficult. Although I have not solved it to optimality yet, I'm fairly confident that the optimal solution has cost 11, and I suspect that a few hours of crunching on my quad-core PC would be enough to prove that solution is optimal.

I did a little test, using CPLEX 12.6.1, in which I did 85 independent solves, using different values of the seed parameter. Runs were limited to 60 seconds each and three CPU threads, and the Emphasis.MIP parameter was set to 4 (emphasizing a search for hidden feasible solutions -- improving incumbents -- over moving the lower bound or proving optimality). In other words, I told CPLEX to find good solutions fast, and to worry about proving optimality later. All other CPLEX parameters were left at default.
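In Java terms, the per-run settings amounted to something like the following (a configuration sketch, not my actual test harness; it assumes an existing IloCplex instance named `cplex`, with `seed` being whatever seed value that particular run is testing):

```java
// Sketch of the per-run CPLEX settings (assumes `cplex` is an IloCplex instance
// and the CPLEX jar is on the classpath; `seed` varies from run to run).
cplex.setParam(IloCplex.Param.RandomSeed, seed);   // the seed under test
cplex.setParam(IloCplex.Param.TimeLimit, 60.0);    // 60-second limit per run
cplex.setParam(IloCplex.Param.Threads, 3);         // three CPU threads
cplex.setParam(IloCplex.Param.Emphasis.MIP, 4);    // hunt for hidden feasible solutions
```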

Here is what a histogram of the lower bounds from those 85 runs shows (figure omitted): the bounds ranged from 9.38 to 9.60, and the vast majority were at the low (inferior) end of the range. Now for a tabulation of the incumbents found within one minute:

| Incumbent | 11 | 12 | 144 |
|-----------|---:|---:|----:|
| Frequency | 1  | 3  | 81  |

Recall that I am fairly certain the optimal value is 11. Out of 85 attempts, I landed on the optimal solution once and a near-optimal solution three times; on the other 81 tries, I got a fairly terrible solution (costing more than 13 times the optimum). There are some intermediate solutions with costs in the 20s and 30s that show up in the node logs for the runs where I got lucky; CPLEX just never happened to be sitting on one of those when the time limit expired.

The point here is that, in practice, users don't run the same problem multiple times with different seeds (if they even know the seed parameter exists, which I suspect most do not). This caught my eye because I'd done a test run on this example, gotten the incumbent down from 216 to 144 (with a lower bound under 10) after a minute or so, and thought "hmm, this is going to take a while". I might add that 144 was found fairly quickly, so there was a lengthy stretch with no progress in the incumbent before time ran out. Had I been lucky on that pilot run (where, if the table is to be believed, being lucky has probability a bit under 1 in 20), I would have gotten a really good solution (11 or 12) in the first minute, with a lower bound of 10 (rounding up, since the objective is integer-valued), and thought "hmm, this problem is pretty easy".
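Just to check the arithmetic behind "a bit under 1 in 20" (my own quick sanity check, in plain Java, using only the numbers from the table above):

```java
public class SeedLuck {
    public static void main(String[] args) {
        // Frequencies from the incumbent table: cost 11 once, 12 three times,
        // 144 in the other 81 runs.
        int runs = 85;
        int lucky = 1 + 3; // runs that found a solution of cost 11 or 12
        double pLucky = (double) lucky / runs;
        System.out.printf("P(lucky) = %.3f%n", pLucky); // 0.047, a bit under 1/20 = 0.05

        // With an integer-valued objective, a fractional lower bound rounds up.
        double bestBound = 9.60;
        long usableBound = (long) Math.ceil(bestBound);
        System.out.println("usable lower bound = " + usableBound); // 10
    }
}
```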

So, is the problem easy or not? Beats me.

I'll finish this with a pointer to a slide in a recent presentation on advances in CPLEX. The slide mentions a "racing ramp-up phase" that can be used when running CPLEX on multiple processors in parallel. In essence, CPLEX tackles the same problem independently on multiple processors, using different parameter settings for each. After a set amount of time (the end of the "ramp-up" phase), the results are compared, a winner is declared, and the processors then all proceed to work on finishing the winner's search tree, using the winner's parameter settings.

With or without parallel processors, a user can replicate this to some extent. In particular, you can do a set of short runs with different seeds, find the best incumbent solution found by any of them, restart with that seed and do a full production run. I don't think there's a way to recycle the winner's search tree (unless you get lucky and the last run was the winner, in which case you can just continue it), but if the ramp-up phase is kept short, this could still help.
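A rough sketch of that do-it-yourself ramp-up follows. The `solve` function here is a hypothetical stand-in for an actual CPLEX run; the real thing would set IloCplex.Param.RandomSeed and IloCplex.Param.TimeLimit on an IloCplex instance and return the incumbent value.

```java
import java.util.function.BiFunction;

public class RampUp {
    /**
     * Race several seeds in short pilot runs, then do the production run
     * with whichever seed produced the best (lowest) incumbent.
     *
     * @param solve      hypothetical stand-in for a CPLEX run:
     *                   (seed, time limit in seconds) -> incumbent value
     * @param seeds      candidate values for the random seed parameter
     * @param pilotLimit time limit (seconds) for each pilot run
     * @param fullLimit  time limit (seconds) for the production run
     * @return incumbent value from the production run
     */
    public static double raceThenSolve(BiFunction<Integer, Integer, Double> solve,
                                       int[] seeds, int pilotLimit, int fullLimit) {
        int bestSeed = seeds[0];
        double bestValue = Double.POSITIVE_INFINITY;
        for (int seed : seeds) {
            double value = solve.apply(seed, pilotLimit); // short "ramp-up" run
            if (value < bestValue) {
                bestValue = value;
                bestSeed = seed;
            }
        }
        // Restart from scratch with the winning seed; the pilot's tree is lost.
        return solve.apply(bestSeed, fullLimit);
    }

    public static void main(String[] args) {
        // Toy stand-in: seed 7 happens to be "lucky" and finds the good solution.
        BiFunction<Integer, Integer, Double> toySolver =
                (seed, limit) -> (seed == 7) ? 11.0 : 144.0;
        System.out.println(raceThenSolve(toySolver, new int[] {1, 4, 7}, 60, 3600)); // 11.0
    }
}
```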

Bottom line: a "hard" problem might be truly hard (CPLEX will take a long time no matter what seed you use), or it might be hard in the sense that there are (many) more bad seeds than good. Then again, maybe you just need better karma.

## Tuesday, March 24, 2015

### Updated Java Utilities for CPLEX and CP Optimizer

I just finished adding a feature to a utility library I use in Java projects that employ either CPLEX or CP Optimizer. In addition, I moved the files to a new home. The library is free to use under the Eclipse Public License 1.0. The code is mentioned in previous posts, so I'll just quickly summarize the content here and refer interested parties to the earlier posts.
The latest source code is now available from the Michigan State University Gitlab server. You do not need to log in, and you do not need to be a Git user. You can just click the download button near the upper right to get a ZIP archive of the source code. If you run into any bugs, there is an issue tracker on the MSU site where you can report them (please!). If you just want a binary (.jar) file and the Javadoc documentation, you can download a .zip file from my MSU web space.

In the following description, please assume that cplex and cp are instances of IloCplex and IloCP respectively. The main reason I developed the library is that I like to experiment with different parameter settings for CPLEX and CP Optimizer. Rather than hard coding a parameter (e.g., cplex.setParam(IloCplex.Param.Emphasis.MIP, 3)) and then having to edit and recompile the code to try a different value, I prefer to pass CPLEX and CP Optimizer parameters to the program through the command line (or, if my program has a graphical interface, through a user dialog). The equivalent code to the previous example might look like cplex.setParameter(pname, pval), where pname is a string with value "Emphasis.MIP" (the minimum portion of the full parameter name necessary to uniquely identify the parameter) and pval is a string with the value "3".

With that as background, here is a summary of the contents of the library:
• CplexParameterSetter: Use an instance of this class to apply parameter settings (each specified by two strings, parameter name and new value) to an instance of IloCplex. Example code appears in the May 2014 post (look for the version 2.0 sample).
• CPOptimizerParameterSetter: Use an instance of this class to apply parameter settings to an instance of IloCP. Again, sample code is in the May 2014 post.
• CPDisplay: The static method CPDisplay.asString(cp) will produce a summary of the model in cp, suitable for printing or displaying. For whatever reason, IloCplex.toString() produces a human-readable version of a CPLEX model, but IloCP.toString() just prints the object's hash code. This class provides a workaround. For the output to really be readable, you need to be assiduous about assigning meaningful names to the variables and constraints in the model. I originally released this (in the March 2014 post) as an override to IloCP.toString(), but I decided a static method was easier to use (does not require extending IloCP).
• Main program: The project comes with a main program. If you run it, you will get an alphabetized list of all parameters known to the versions of CPLEX and CP Optimizer that you are using. For instance, run against CPLEX Studio 12.6.1, the output I get looks like this:
```
CPLEX:
AbsGap double  IloCplex.Param.MIP.Pool.AbsGap
AbsMIPGap double  IloCplex.Param.MIP.Tolerances.AbsMIPGap
AbsMIPGap double  IloCplex.Param.MIP.PolishAfter.AbsMIPGap
[...]
WriteLevel int  IloCplex.Param.Output.WriteLevel
ZeroHalfCut int  IloCplex.Param.MIP.Cuts.ZeroHalfCut

CPOptimizer:
AllDiffInferenceLevel int  IloCP.IntParam.AllDiffInferenceLevel
AllMinDistanceInferenceLevel int  IloCP.IntParam.AllMinDistanceInferenceLevel
[...]
WarningLevel int  IloCP.IntParam.WarningLevel
Workers int  IloCP.IntParam.Workers
```

At least for me, the main use of this is to figure out which parameter names are unambiguous and which are repeated. For instance, the CPLEX parameter IloCplex.Param.MIP.Pool.AbsGap can be abbreviated "AbsGap"; but the minimum amount necessary to specify the absolute MIP gap for convergence is "Tolerances.AbsMIPGap", since "AbsMIPGap" could also refer to IloCplex.Param.MIP.PolishAfter.AbsMIPGap.
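The disambiguation rule is just suffix matching against the full parameter names. Here is a stand-alone sketch of the idea (the parameter names in the example are real CPLEX names, but the resolver itself is my illustration, not the library's actual code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ParamResolver {
    /**
     * Return the full parameter names whose dot-delimited tail matches the
     * given (possibly abbreviated) name. Exactly one match means the
     * abbreviation is unambiguous.
     */
    public static List<String> resolve(String abbrev, List<String> fullNames) {
        List<String> matches = new ArrayList<>();
        for (String full : fullNames) {
            if (full.equals(abbrev) || full.endsWith("." + abbrev)) {
                matches.add(full);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "IloCplex.Param.MIP.Pool.AbsGap",
                "IloCplex.Param.MIP.Tolerances.AbsMIPGap",
                "IloCplex.Param.MIP.PolishAfter.AbsMIPGap");
        System.out.println(resolve("AbsGap", names));               // one match: unambiguous
        System.out.println(resolve("AbsMIPGap", names));            // two matches: ambiguous
        System.out.println(resolve("Tolerances.AbsMIPGap", names)); // one match: unambiguous
    }
}
```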
If you have questions about the proper use of the code, please feel free to post them as comments here. If you run into bugs (or missing features), please use the issue tracker. That will help me keep tabs on what needs to be done.

## Thursday, March 19, 2015

### An SSH Glitch

Something weird happened with SSH today, and I'm documenting it here in case it happens again.

I was minding my own business, doing some coding, on a project that is under version control using Git. After committing some changes, I was ready to push them up to the remote (a GitLab server here at Michigan State University). I had already set up the GitLab account with my RSA public key, and I had done previous pushes successfully. Today, suddenly, the push failed with a message that boiled down to the server not recognizing my key.

After the usual amount of fruitless messing around, I ran the command `ssh-add -l` in a terminal window. This is supposed to list the keys that the ssh-agent program recognizes. Oops! The response was "The agent has no identities." In other words, my public/private key pair, which I've been using for ages, was not being seen by ssh-agent.

I don't know if this is the result of a software update or a boot anomaly. The keys were right where they were supposed to be, in ~/.ssh, unchanged. Restarting ssh-agent did not help. What fixed it was to run `ssh-add` in a terminal. I did that when I first set up the keys, and I don't know why I had to do it again. We'll see if I'm forced to do it a third time.

After that last command, ssh-add -l showed the "fingerprint" of my key. I still could not push to the GitLab server, though. So, on the GitLab server, I tried to install my public key (with a new name) and was told the fingerprint matched my existing key. In desperation, I deleted the old key and installed the "new" public key (the one with the same fingerprint as the key just deleted). Bear in mind that at no point in this mess did I generate new keys; the key I installed was the same one previously installed (and just deleted). For whatever reason, that worked; the server accepted the commit.

<sigh>Now I have to make sure that every other server using my public key also recognizes my digital signature.</sigh>

Update: The business about using ssh-add may be a red herring. I just checked my laptop (which uses the same public/private key pair and has the same Git projects on it). I ran ssh-add -l and again got the "no identities" message. Then I ran a Git "pull" command (from the NetBeans IDE), using my key, and (after giving a password to unlock my keyring) it worked! I then tried ssh-add -l again, just to make sure unlocking the keyring had not caused ssh-agent to sober up, and the response was still "no identities". So having no listed identities apparently does not prevent applications from using the key pair.

On the home PC, I frequently (but not always) get the prompt to unlock my keyring when banging on a Git server. It didn't happen today, and I didn't think anything of that at the time. I'm now left to wonder whether the whole problem with logging into the server had something to do with (a) the keyring being locked and (b) the system, for whatever reason, never asking me to unlock it.

Or maybe it's just gremlins. The little buggers tend to be a bit unpredictable.

## Constraints: What restricts your decisions?

At this juncture, the ideal set of decisions is likely to be obvious. Make an infinite number of anvils and sell them all for a profit (at the risk of causing road runner extinction). Eat as much as you want of all the foods you like. Sadly, various factors will constrain your free will here. There's only so much iron available for anvils, your factories have limited capacity, and demand for anvils is not infinite. Your wallet will not support that all-steak diet, your wardrobe will not support the change in dimensions resulting from all the ice cream you can eat, and your doctor (or mother) (or both) absolutely insists that something green is mandatory (and mint ice cream does not qualify).

Your task now is to identify the aspects of the situation that will limit your decisions and express them as equations or inequalities involving your decision variables. Again, you may identify auxiliary variables during this process. If you do not have a specific accounting cost for holding inventory, you might not have identified it in the previous section. When it comes time to connect anvil production with anvil shipments, though, you may find it convenient to create inventory variables.
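For example (my illustration, not a constraint from any actual anvil model): if $I_t$ denotes the number of anvils in inventory at the end of period $t$, $x_t$ the number produced in period $t$, and $s_t$ the number shipped in period $t$, the constraint linking production to shipments is the standard inventory balance

```latex
I_t = I_{t-1} + x_t - s_t, \qquad I_t \ge 0 \qquad \text{for all } t,
```

with $I_0$ set to whatever starting inventory you have.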

For strictly mathematical reasons, you will need to avoid strict inequalities. Where quantities are discrete, this is easy. "I need to make more than 50 blue anvils" (strict inequality, verboten) is equivalent to "I need to make at least 51 blue anvils" (weak inequality, perfectly legal). In other cases, it may not be so easy. "We need to make some blue paint" sounds a lot like "blue paint production $> 0$", but that's a strict inequality. Your choices are either "blue paint production $\ge 0$" (allowing the possibility that you produce no blue paint) or "blue paint production $\ge \epsilon$", where $\epsilon$ is a strictly positive parameter (a lower production limit that is the minimum amount you can live with).

I hope someone finds that helpful.