Saturday, November 9, 2013

LP Cutting Planes in CPLEX

Cut generation, as used in what follows, refers to generating constraints for a mathematical program "on the fly" (based on intermediate solutions), rather than adding all relevant constraints at the outset of the problem. It is typically used when there is an astronomical number of possible constraints, most of which will turn out not to be very useful.

Someone recently asked me which is faster in CPLEX when solving a linear program with cut generation:
  • solving the LP, adding cuts, solving the LP again ...; or
  • using a callback to add the cuts on the fly.
My initial guess (which I think proved correct) was that the first method is likely to be faster, or at least not notably slower, than the second method, but I suppose it might be possible for the second method to win if problems are large. (I'm not at all confident about that last point.)

What makes the "solve - cut - repeat" loop competitive is that CPLEX can, and with default settings does, "warm start" the solution of a linear program based on the most recent previous solution of the model. The model can be changed between calls to the solver, so long as CPLEX recognizes that it is the same problem. Generally, warm starting after adding (or removing) constraints expedites the solution process (relative to starting from scratch) if the number of changes is fairly modest.
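To make the comparison concrete, here is a minimal sketch of the first method in the Java API. It is not the code I actually ran; in particular, the `CutGenerator` interface is something I'm introducing here to stand in for whatever routine produces the cuts violated by a candidate solution.

```java
import ilog.concert.IloException;
import ilog.concert.IloNumVar;
import ilog.concert.IloRange;
import ilog.cplex.IloCplex;
import java.util.List;

public final class CutLoop {

  /** Supplies the cuts violated by a candidate solution (an empty list means "accept"). */
  public interface CutGenerator {
    List<IloRange> cutsFor(double[] xhat) throws IloException;
  }

  /** Method 1: solve the LP, add violated cuts, solve again, until no cuts are needed. */
  public static double[] solve(IloCplex cplex, IloNumVar[] x, CutGenerator gen)
      throws IloException {
    while (true) {
      if (!cplex.solve()) {
        throw new IllegalStateException("LP is infeasible or unbounded");
      }
      double[] xhat = cplex.getValues(x);
      List<IloRange> cuts = gen.cutsFor(xhat);
      if (cuts.isEmpty()) {
        return xhat;       // candidate satisfies all the omitted constraints: done
      }
      for (IloRange cut : cuts) {
        cplex.add(cut);    // model is modified in place; CPLEX warm starts the next solve
      }
    }
  }
}
```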

The second method involves attaching a lazy constraint callback to the model. Each time a potential new optimum for the LP is found, the solution will be passed to the callback, which can either reject the solution by adding one or more constraints (cuts), at least one of which renders the candidate solution infeasible, or accept the solution by refraining from adding any new cuts. The original LP now becomes the root node "relaxation" of the problem. CPLEX will never attempt to branch, since there are no integer variables in the LP (and thus none that will take on fractional values), so once the root node is optimized, we're done.

A bit of trickery is required for the second method, since the lazy constraint callback is available only for integer and mixed integer programs. We can spoof CPLEX by adding a single integer or binary variable to the model, but not including it in any constraints or in the objective function. This works in the various programming APIs, where there is a method to add an object (in this case our dummy variable) directly to the model. In the interactive optimizer, it may be necessary to add a vacuous constraint as well. For example, if we add a dummy binary variable $z$, we can use the constraint $z \le 1$ to force $z$ into the model (and convince CPLEX the model is an integer program) without affecting the optimal solution.
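Here is a corresponding sketch of the second method, reusing the hypothetical `CutGenerator` interface from the previous sketch. Again, this is illustrative rather than my actual test code, and it relies on the (legacy) callback API behaving as described in the comments.

```java
import ilog.concert.IloException;
import ilog.concert.IloIntVar;
import ilog.concert.IloNumVar;
import ilog.concert.IloRange;
import ilog.cplex.IloCplex;

public final class LazyCutSolver {

  /** Method 2: add a dummy binary variable, then supply cuts via a lazy constraint callback. */
  public static double[] solve(IloCplex cplex, IloNumVar[] x, CutLoop.CutGenerator gen)
      throws IloException {
    // Dummy binary variable: added directly to the model, appearing in no constraint
    // and not in the objective, purely so that CPLEX treats the model as a MIP.
    IloIntVar z = cplex.boolVar("z");
    cplex.add(z);
    // Lazy constraint callback: reject a candidate by adding cuts, accept it by adding none.
    cplex.use(new IloCplex.LazyConstraintCallback() {
      @Override
      protected void main() throws IloException {
        double[] xhat = getValues(x);
        for (IloRange cut : gen.cutsFor(xhat)) {
          add(cut);
        }
      }
    });
    if (!cplex.solve()) {
      throw new IllegalStateException("model is infeasible or unbounded");
    }
    return cplex.getValues(x);
  }
}
```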

I tested this with a couple of small convex optimization problems, using a rather old-school cutting plane method. The problems take the form\[ \begin{array}{cccc} \mathrm{opt} & c'x\\ \mathrm{s.t.} & Ax & \le & b\\ & g(x) & \le & h\\ & x & \in & \Re^n_+ \end{array} \]where $g:\Re^n_+\rightarrow\Re^m$ is convex and differentiable and "opt" denotes either maximization or minimization. The initial linear program omits the constraint(s) $g(x)\le h$. When a candidate solution $\hat{x}$ is obtained, we plug it into each $g_i()$. If $g_i(\hat{x})\gt h_i$ by more than some rounding tolerance, we add the constraint\[\nabla g_i(\hat{x})'x \le h_i-g_i(\hat{x}) + \nabla g_i(\hat{x})'\hat{x}.\]If multiple constraints are violated, we can add just one cut or (as I did in my tests) simultaneously add a cut for each violated constraint.
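For example, for the constraint $x_{1}^{2}+x_{2}\le 1$ appearing in the first test problem below, $g(x)=x_{1}^{2}+x_{2}$ and $\nabla g(\hat{x})=(2\hat{x}_{1},1)$, so the cut generated at a violating candidate $\hat{x}$ simplifies to\[2\hat{x}_{1}x_{1}+x_{2}\le 1-(\hat{x}_{1}^{2}+\hat{x}_{2})+2\hat{x}_{1}^{2}+\hat{x}_{2}=1+\hat{x}_{1}^{2}.\]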

I wrote a test program in Java (for which the source code is available), using two test problems:\[ \begin{array}{cccc} \mathrm{max} & 3x_{1}+x_{2}\\ \mathrm{s.t.} & x_{1}+x_{2} & \le & 3\\ & x_{1}^{2}+x_{2} & \le & 1\\ & x & \ge & 0 \end{array} \]and\[ \begin{array}{cccc} \mathrm{max} & 3x_{1}+x_{2}+4x_{3}\\ \mathrm{s.t.} & x_{1} & \le & 5\\ & x_{2} & \le & 5\\ & x_{3} & \le & 5\\ & x_{1}^{2}-x_{2} & \le & 3\\ & -2x_{2}+x_{3}^{4} & \le & 0\\ & x & \ge & 0. \end{array} \]My code recorded execution times for the CPLEX 12.5.1 solver using both methods, excluding time spent setting up the models, time spent reporting the results, and time spent generating the cuts. (Time spent adding the cuts to the models was included.)

The problems are both very small, so their execution times (which varied significantly during repeated runs of the same models) may not be indicative of what happens on more difficult problems. On the first problem, the first method usually beat the second method by a small margin (typically about 6 milliseconds v. 7 milliseconds), but I did see an occasional win for the second method (by about 1 ms.). On the second problem, the first method consistently beat the second method, usually by one or two milliseconds, occasionally by a bit more. The number of cuts generated (five in the first problem, eleven in the second) did not vary between methods, nor did the optimal solution reported.

Monday, November 4, 2013

The Triangle Inequality in Transportation Networks

I just noticed that I managed to go the entire month of October without a post. This was a combination of several factors: it's conference season for me (INFORMS in October, Decision Sciences Institute coming up soon); I was a guest blogger at the INFORMS conference (sample here); deciduous trees + autumn + Michigan = yard work; college and pro football games to watch; laziness; and a somewhat silent muse. Today's entry was prompted partly by a vague sense of guilt and partly by a bug I encountered.

The bug occurred in the context of some ongoing research, involving optimization of vehicle assignments and routing. Computational results on artificial data sets occasionally produced "optimal" solutions that were demonstrably suboptimal. The source of the problem turned out to be instances where the data did not satisfy the triangle inequality, one of the assumptions upon which our model is based. Fixing the data cured the problem.

This in turn reminded me of a dissertation proposal defense I attended a couple of years ago. The candidate (who, in fairness, had not received any training in operations research) was proposing a set of research questions in logistics that only made sense in situations where the underlying transportation network failed to satisfy the triangle inequality, either when the metric was distance traveled or when it was travel time. This prompts some observations about the triangle inequality in the context of transport networks. To save myself some typing, I'll henceforth use the phrase "literal network" to mean a network in which nodes are locations on a map and arcs or edges are literally segments of a road network. Some but not all of what follows may also apply to rail, sea or air transport.
  1. When the metric is travel time, literal networks will frequently not satisfy the triangle inequality. The most direct route from my former employer's campus to my house involved a fairly linear stretch on streets with comparatively low speed limits and frequent traffic lights. By going a little out of my way, I could turn the bulk of that drive into a sprint up an interstate highway. So A (campus exit) to B (home) had a substantially longer driving time, particularly at rush hour, than A to C (highway entrance) to D (highway exit) to B.
  2. When the metric is distance, literal networks may still not satisfy the triangle inequality. The "direct" route from A to B may loop around some obstacle (hill, swamp, US Capitol ... but I repeat myself) or have a significant vertical component. If A to C and C to B are fairly flat and linear segments, their combined length may be less than the looping A to B.
  3. If your key metric is distance, an undirected graph may be sufficient, unless roads are one-way. If you are concerned with driving time, you probably need a directed network, even if there are no one-way roads. Anyone who has driven during rush hour in the opposite direction of the heavy traffic has experienced (and enjoyed) the asymmetry.
  4. Practically speaking, the network actually traveled by (knowledgeable) drivers will satisfy the triangle inequality in either time or distance, whichever is more important, regardless of whether the literal network does. If I can get from A to B faster by way of C than going directly, I will, and so my A->B "arc" is actually A->C->B.
Points 2 and 4 underlie the bug in that research project I mentioned. The arcs were generated without the assumption that distances would satisfy the triangle inequality. The non-triangularity (to coin a term) is plausible given that a rural road network, where road segments are not particularly straight, was being simulated. Travel times were then made proportional to distances, which is reasonable from a literal perspective but neglects the fact that drivers will use an indirect route if it is faster.

Point 4 also relates to what was tripping up the doctoral student I mentioned. His research required a scenario where commercial delivery vehicles would be routed using traveling salesman problems (TSPs), with travel time as the metric, on a network where travel times did not satisfy the triangle inequality. When I pointed out the gist of the fourth point (that drivers will take an indirect route if faster, assuming that their mileage is not being tightly monitored), he was properly scandalized, because it is well known that TSP solutions are Hamiltonian cycles that do not revisit intermediate nodes. The indirect routes would in some cases violate that restriction.

My response was that one has to apply the TSP model to a network in which an arc A->B represents not a literal segment from A directly to B, passing through no other nodes, but a "virtual" arc originating at A, terminating at B and stopping at no intermediate nodes (but possibly passing through them, if that is more efficient). He was quite skeptical, until I pointed out that if your warehouse is at the end of a blind alley and you are not allowed to repeat any streets, you can never return. That convinced him. There are a few situations in which the no repetition rule needs to be enforced rigorously, typically when you are leaving burned bridges, land mines or irate traffic cops in your wake. In most other cases, I think you can be flexible.

Replacing the arcs in my joint research problem with "virtual arcs" amounted to computing shortest (time) paths between all pairs of nodes, and replacing the original travel time on each arc with the time of the shortest path. I used something equivalent to the Floyd-Warshall algorithm for that. In the other incident, once I had convinced the doctoral candidate that travel times typically satisfy the triangle inequality (and that the no repetition rule of the Hamiltonian cycle is to be taken a bit loosely), he contacted two vendors of commercial routing software. As he reported the conversations to me, neither had taken this into account in their software, which I find a bit surprising.
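For the record, here is a minimal sketch of that virtualization step: a generic Floyd-Warshall pass over a dense travel time matrix, not the actual project code.

```java
/**
 * Replace each arc's travel time with the shortest-path ("virtual arc") time
 * between its endpoints, Floyd-Warshall style. The input is a dense matrix of
 * direct travel times, with Double.POSITIVE_INFINITY where no direct arc exists
 * and 0 on the diagonal; the matrix is updated in place.
 */
public static void virtualize(double[][] time) {
  int n = time.length;
  for (int k = 0; k < n; k++) {
    for (int i = 0; i < n; i++) {
      for (int j = 0; j < n; j++) {
        double viaK = time[i][k] + time[k][j];
        if (viaK < time[i][j]) {
          time[i][j] = viaK;   // detouring through k beats the direct arc
        }
      }
    }
  }
}
```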

One final note: if your literal network satisfies the triangle inequality with respect to distances (road segments are fairly flat and linear) but not time (due to traffic signals, asymmetric traffic volumes, etc.), and you "virtualize" the network as I described to get times that do satisfy the triangle inequality, your distances may no longer satisfy it. You can't have everything.

-----

Muses: Greek mythology seems to be a bit vague about the number of Muses -- nine seems to be a common estimate -- and most sources ignore my personal muse: Erratic, the unreliable twin of Erato.