Sunday, July 29, 2012

Is Mathematics Only for the Mathematically Inclined?

A colleague just sent me a link to an opinion piece in the New York Times Sunday Review with the somewhat provocative title "Is Algebra Necessary?" The author (Andrew Hacker, an emeritus professor of political science), while acknowledging the profound role of mathematics in society, questions whether we should require algebra (or, as I read the piece, anything beyond basic arithmetic and some basic level of statistical numeracy) of all school children. He quotes a professor of educational psychology (John P. Smith III) from Michigan State University asserting that “mathematical reasoning in workplaces differs markedly from the algorithms taught in school.” (For what it's worth, MSU has a nationally renowned College of Education ... but I have a hard time trusting anyone who looks that much like a young Geraldo Rivera.)

I don't want to get bogged down in the details of mathematics education in the U.S., and I certainly agree that the curriculum content (as well as the delivery) needs to be updated periodically. The mathematics of computing was not on anyone's radar when I was an urchin, and it's extremely important today. I'll also stipulate that not every adult in the U.S. (or elsewhere for that matter) needs to be proficient in algebra or trigonometry, let alone calculus. That said, my immediate reaction was that I thought I saw two flaws, one perhaps minor but the other crucial, in the author's argument.

I'll address the minor flaw first. Professor Hacker says (near the middle of the second page):
Of course, people should learn basic numerical skills: decimals, ratios and estimating, sharpened by a good grounding in arithmetic.
By "estimating" I assume he means computing (or intuiting) an approximate value (order of magnitude, interval, ...) for some uncertain quantity (variable). That implies the ability to determine a formula for the variable in terms of other parameters of the problem. For example, if I have a recipe (for a meal, or for a medication I must administer) that calls for a water-based solution with 40% active ingredient by volume, and I need five fluid ounces of the solution, about how much active ingredient do I need? That sounds to me like solving a simple linear equation, which I believe falls into the category of "algebra". (As a sidebar, in the event that this is a medication to be administered to me, and a nurse or pharmacist is doing the calculation, I would really appreciate it if they could do better than just an "estimate".) What if I know that I have five fluid ounces of active ingredient (which must all be used) and require a 40% solution? Can I estimate the amount of water required without knowing some basic algebra?

The major flaw with Professor Hacker's argument is that if we reduce K-12 mathematics education to the bare minimum, as he suggests, it will shut the doors to many college majors before students reach college. When I was in school, separate math classes did not begin until junior high school, so through grade 6 I received the same mathematics instruction everyone else did ... which in Professor Hacker's universe would be thin gruel indeed. I assume the system still operates approximately that way.  The mathematically inclined public school student (which described me at that age) will hopefully be able to go beyond the basic curriculum ... somehow ... at least if his or her parents can afford private tutoring. The generic college-bound student, however, will matriculate with the ability to figure out the tip on a restaurant check, and not much else.

I'm reminded of an incident from graduate school. Back then, registration was done manually in a large gym called "The Pit", and our math department used graduate students to relieve the secretaries of some of the duty (doling out punch cards to students seeking classes). One summer, I was working The Pit with a fellow doctoral student, "Mean Dave" Green. (Don't ask about the nickname, just trust me: he earned it.) An undergrad came up and requested Algebra I. Dave looked at him and asked "Didn't you take that from me fall quarter?" The kid indeed had (and failed the class). Dave asked why he hadn't repeated it immediately, in winter quarter, while the memory was fresh.  The kid had - and failed again. As he did spring quarter. So here he was asking for it for the fourth time. I innocently inquired what his major was. His answer: electrical engineering. The requirements for EE then were: algebra I and II; calculus I through IV; linear algebra; and ordinary differential equations.

Now picture that kid's children (or, um, grandchildren? sigh) reaching college today with Professor Hacker's suggested mathematics curriculum under their belts. Electrical engineering? Not going to happen; the math prerequisites alone would add two years to their degree, charitably assuming that a university was willing and able to staff that many remedial math classes, and charitably assuming that someone 18 years old with a carefully honed disdain or fear (or both) of mathematics is still able to learn math. Other engineering majors? Not likely. (If, somehow, Professor Hacker's position prevails and universities continue to pump out civil engineers nonetheless, remind me to stay off bridges and out of tunnels.)

Computer science uses enough mathematics that I suspect it would be tough to handle. Some business majors might still be open (personnel management?), but the more lucrative ones (finance and accounting in particular) might be off the table for those who had not gotten private tutoring. Physical sciences? So long.

Before I added "emeritus" to my title, I sometimes found myself answering questions from students (or parents) about how much math to take during freshman and sophomore years of college. My response was typically some variation of "as much as you can stand". Most college students do not commit to a major until their junior year. At that point, if you have racked up more math than your major requires, you can usually count the rest as elective credits; but if you are short of math, it can freeze you out of some majors. I've seen a similar phenomenon with graduate school: students who did a non-technical undergraduate major, taking the minimum amount of mathematics they could (sometimes none), and then decided they wanted to go to graduate school in a program with stiff mathematics requirements. Extrapolating that to cover students whose public school mathematics training stopped at counting and basic calculator punching frightens me.

We can legitimately debate what the basic K-12 mathematics curriculum -- the foundation for both attending college and going into a trade -- should be; but if we are going to reduce it as much as Professor Hacker suggests, we'd better come up with a genetic test that can identify a fetus's future trade or college major by birth.

Update: To my utter lack of surprise, other people are weighing in on this. I'm not actively looking for all the commentary (and I'm sure it will be voluminous), but as I come across posts I'll link them here.

Updating Java on Mint (and Ubuntu)

For various reasons, including compatibility issues with some third-party programs and a need to generate cross-platform code on occasions, I routinely use Sun (now Oracle) Java on Linux systems, rather than GCJ (which is frequently included in Linux distributions). Following a suggestion in a recent email from the folks at Mozilla, I checked my Firefox plugins and discovered a security warning for the Java 6 plugin. I'm not sure if I could upgrade just the plugin, using the Java 7 browser plugin while still using the Java 6 JRE (runtime environment) and the Java 6 JDK (development kit); so I decided to upgrade everything to Java 7. Small problem: the Ubuntu repositories still contain version 6 and not version 7.

A quick search turned up a blog post at Web Upd8 which explains why the official repositories lag behind:
Oracle JDK7 itself is not hosted in the PPA because that's not allowed by the new Java license (which is also the reason why it has been removed from the official Ubuntu repositories); the package in the PPA automatically downloads (and installs) Oracle Java JDK 7 from its official website and installs it on your computer, just like the flashplugin-installer package does.
The PPA (personal package archive) repository to which they refer contains a package that automatically downloads and installs the current version of the Oracle Java 7 JDK. (Note that there is no comparable package to install just the JRE; you have to install the full JDK, which contains the JRE as a subset. That's fine with me, since I need the JDK for program development.)

The instructions are straightforward and worked fine for me. Once I had confirmed that Java 7 was installed and was the default choice, there was one more step. I use the Netbeans IDE for writing Java code. I could have installed the latest version with Java 7 bundled, but that's a hefty download, and my current version of Netbeans is recent enough. In Netbeans, I clicked Help > About and discovered that Netbeans was still launching with Java 6. The solution is to edit the configuration file <netbeans>/etc/netbeans.conf (where <netbeans> is the top directory of the Netbeans installation) and change the setting for netbeans_jdkhome to point to the Java 7 directory. When I restarted Netbeans, it launched with Java 7 and also changed the default Java platform for compiled code to Java 7.

Big thanks to the PPA maintainers for simplifying the process.

Friday, July 27, 2012

Benders Decomposition in CPLEX

Author's note: This post is getting a bit long in the tooth. For users with CPLEX 12.7 or later, you might want to read this more recent post (after reading this one).

I finally got around to constructing a reasonably small example of Benders decomposition for solving a mixed-integer linear program (MIP) in CPLEX. What I am about to describe works in CPLEX 12.4 (possibly 12.2 or 12.3) and hopefully will work in the future (since I have no ambition to revise the post). The significance of the CPLEX version is twofold: IBM refactored the API for cuts, emphasizing the distinction between "user cuts" (cuts that tighten bounds without eliminating integer-feasible solutions) and "lazy constraints" (cuts that eliminate feasible solutions); and they decided that lazy constraint callbacks would be called whenever CPLEX found an incumbent, regardless of when and how it found the incumbent (node LP solution, application of heuristics, lucky guess ...), which eliminated the need for incumbent and branch callbacks. If you are still using an earlier version of CPLEX, I strongly recommend you upgrade. Among various reasons, you will need a bunch more code to pull off Benders correctly with an old version.

My test problem is simple: a transportation problem with fixed charges for the use of supply sources. (In the code, I refer to sources as "warehouses" and sinks as "customers"). The fixed-charge transportation problem is well know and rather easy to solve -- too easy to warrant Benders decomposition, in fact. I chose it because it is sufficiently straightforward that it should not interfere with understanding how the code works. (Subsequent to writing the first draft of this post, I discovered a nice document by Erwin Kalvelagen that demonstrates Benders decomposition in GAMS. He too used a fixed-charge transportation problem, but in his version there is a fixed charge for each arc used, rather than for each source.)

The mathematical model is as follows: \begin{gather*} \textrm{minimize }c'x+f'y\\ \textrm{s.t. }\sum_{i}x_{ij}\ge d_{j}\quad\forall j\\ \sum_{j}x_{ij}\le s_{i}y_{i}\quad\forall i\\ x\ge0\\ y_{i}\in\{0,1\}\quad\forall i. \end{gather*} Continuous variable $x_{ij}$ represents the flow from source $i$ to sink $j$; binary variable $y_{i}$ signals whether source $i$ is used (1) or not (0). Parameters $c\ge0$ and $f\ge0$ are unit flow costs and facility use charges respectively, while $s$ and $d$ are supplies (source capacities) and demands.

My Benders decomposition puts the facility decisions in a master problem (MIP) and the flow decisions in a subproblem (linear program). The subproblem is just \begin{gather*} \textrm{minimize }z=c'x\\ \textrm{s.t. }\sum_{i}x_{ij}\ge d_{j}\quad\forall j\\ \sum_{j}x_{ij}\le s_{i}y_{i}\quad\forall i\\ x\ge0 \end{gather*} in which the $y_{i}$ are treated as constants. The master problem is \begin{gather*} \textrm{minimize }f'y+\hat{z}\\ \textrm{s.t. }a^{(k)\prime}y-\hat{z}\le b^{(k)}\quad\forall k\in\mathcal{O}\\ a^{(k)\prime}y\le b^{(k)}\quad\forall k\in\mathcal{F}\\ y_{i}\in\{0,1\}\quad\forall i\\ \hat{z}\ge0 \end{gather*} where $\hat{z}$ is a surrogate for the flow cost component ($z$) of the original objective function (i.e., the value of the subproblem objective), $a^{(k)}$ and $b^{(k)}$ are coefficient vector and constant term for a cut generated during the solution of the problem, $\mathcal{O}$ is the index set of "optimality" cuts (cuts that attempt to correct the master underestimating the true flow costs for a given set of decisions $y$, i.e., $\hat{z}<z$) and $\mathcal{F}$ is the index set of "feasibility" cuts (cuts that eliminate use decisions $y$ that provide too little capacity to satisfy demand). Initially $\mathcal{O}=\emptyset=\mathcal{F}$, and the first master problem solution is both trivial ($y=0$, $\hat{z}=0$) and infeasible in the original model.

To generate cuts, we will need to work indirectly with the dual of the subproblem: \begin{gather*} \textrm{maximize }z=d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}\\ \textrm{s.t. }\lambda_{j}+\mu_{i}\le c_{ij}\quad\forall i,j\\ \lambda\ge0\\ \mu\le0 \end{gather*} where $\lambda$ is the vector of shadow prices of the demand constraints and $\mu$ the vector of shadow prices of the supply constraints. Given the assumption that all flow costs are nonnegative ($c\ge0$), the dual is obviously feasible ($\lambda=0=\mu$ works). If, for a given choice of $y$, the LP subproblem has an optimal solution, then so will the dual (with equal objective value $z$). Should the master problem generate a solution $(y,\hat{z})$ for which the subproblem is feasible with optimal value $z=\hat{z}$, we accept that solution as the new incumbent. If, instead, the subproblem is feasible but $\hat{z}<z$, then we add a new optimality cut $\hat{z}\ge d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}$ where $(\lambda,\mu)$ is the dual solution to the subproblem (so that $a_{i}^{(k)}=s_{i}\mu_{i},b^{(k)}=-d'\lambda$).

Should the master problem generate a solution $(y,\hat{z})$ for which the subproblem is infeasible, then the dual will be unbounded, and there will be a dual solution $(\lambda,\mu)$ such that $t(\lambda,\mu)$ is feasible in the dual for all $t>0$ and $d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}>0$. Note that the master problem variables ($y$) have no role in defining the dual problem's feasible region; in the dual, they occur only in the objective. So the dual ray we just obtained will remain a recession direction for the dual regardless of $y$, and to get a feasible subproblem we need to modify $y$ so that $d'\lambda+\sum_{i}(s_{i}\mu_{i})y_{i}\le0$ (i.e., so that the ray now points in an undesirable direction). This gives us our feasibility cut ($a_{i}^{(k)}=s_{i}\mu_{i},b^{(k)}=-d'\lambda$). Observe that optimality and feasibility cuts have the same constant term and coefficient vector; the only difference is the inclusion of $\hat{z}$ in optimality cuts and its exclusion in feasibility cuts.

On to the code. Let me start with my usual disclaimer: I am not a professional coder, and I make no claim that my code is particularly efficient, nor that I took the fastest/most straightforward route to the solution. As usual, the code is in Java. For the most part, conversion to other APIs that expose the callback functionality should be straightforward, but there are a few wrinkles to note.
  • If you code the comparisons $\hat{z}<z$ and $z=\hat{z}$ literally, the gods of computing will wait until your back is turned and then bite you in your derrière. There will be rounding errors in both $z$ and $\hat{z}$, so asking for exact equality is inviting disaster. Insert some rounding tolerance into your comparisons. In my code, look for a constant named FUZZ.
  • When the LP subproblem is feasible, we can get the dual values directly with a function named getDuals. When the LP is infeasible, it is tempting to use the getRay function to get the dual ray, but to do that you would have to formulate the dual itself, solve it directly, and then invoke getRay on the dual. It is easier to ask for a Farkas certificate, which provides the ray without further ado.
  • I found it necessary to turn off presolving of the subproblem. If the presolver detects that the subproblem is infeasible, it reports that fact with no option to get a Farkas certificate. So I sacrifice whatever speed advantage the presolver confers in order to have access to the Farkas certificate when the subproblem is infeasible.
  • Update 11/30/16: I recently tripped over another thing that I should have realized but somehow overlooked. In order to access the Farkas certificate, the LP subproblem must be solved by one of the simplex solvers (primal, dual or network). I'd been leaving the choice up to CPLEX, and on one instance it used the barrier method, decided the LP was infeasible, and then threw an exception when the Farkas certificate was requested. There's also a glitch in the CPLEX documentation for the dualFarkas method. It turns out that dualFarkas only works if the LP was solved by the dual simplex algorithm. So the key is to set the RootAlg parameter for the LP to 2 (dual simplex).
  • Forming a cut (either optimality or feasibility) involves multiplying the right-hand side of a subproblem constraint, expressed as a linear function of the master variables, by the corresponding dual value. In this particular example, the right-hand side always has a single term, either a constant ($d_{j}$) in a demand constraint or a multiple of a single master variable ($s_{i}y_{i}$) in a supply constraint.
    • More generally, though, each right-hand side could contain multiple terms. I deliberately over-engineered my code a bit so that it will generalize to other problems. What I do in the code is form and store an instance of IloNumExpr for each right-hand side, at the time that I am creating the subproblem. Inside the lazy constraint callback, I multiply those expressions by the corresponding dual values and sum to get the necessary expression ($a^{(k)\prime}y-b^{(k)}$) for the new cut. Somewhat counterintuitively, and for rather arcane reasons, this cannot be done with IloLinearNumExpr, even though the right-hand sides are in fact linear.
    • Since any master variable can occur in multiple subproblem right-hand sides, the method I just described for assembling the cut expression will typically result in multiple terms involving the same variable ($\dots+\alpha_{3}y_{7}+\dots+\alpha_{12}y_{7}+\dots+\alpha_{23}y_{7}+\dots$). Not to worry: when you actually add the cut, CPLEX will combine like terms ($\dots+(\alpha_{3}+\alpha_{12}+\alpha_{23})y_{7}+\dots$) for you.
  • Interpreting the Farkas certificate is a bit tricky, since CPLEX returns the dual values in an arbitrary order (see my earlier post "Infeasible LPs and Farkas Certificates"). To deal with this, I store the subproblem constraints in two arrays (one for supply constraints, one for demand constraints), and then use a HashMap to map each constraint to the corresponding right-hand side expression.
    • The C++ Standard Template Library has some analogous container class you can use (don't ask me which; I'm blissfully C++-free and mean to stay that way). With Python, you would create a dict. With other APIs, you are on your own.
    • The reason I use two arrays, rather than storing all constraints in a single array, is that it facilitates updating right-hand sides in the subproblem each time the callback is invoked. I need to recompute the RHS of each supply constraint ($s_{i}y_{i}$, using the new value of $y_{i}$), but not the RHS of each demand constraint ($d_{j}$). Having the supply and demand constraints separate makes this a bit easier, but in a more general application (where most or all constraints require an update), you may want to use more than two arrays, or just a single array.
  • Once the master problem has been solved, you have the optimal set of warehouse use decisions ($y$). There's a fairly good chance you will also want to know the optimal flows ($x$). The subproblem may well have been solved more recently than when the callback accepted what proved to be the optimal $y$; after that node, CPLEX may have generated some incumbents that your callback tested and rejected. So the most recent subproblem solution may not correspond to the optimal $y$. One resolution is to solve the subproblem one last time, using the optimal $y$. The other (which I use in the code) is to store the subproblem solution any time an incumbent is accepted, overwriting any previously stored solution. When the master is solved, the stored subproblem solution will be the correct flows.
That's it. To avoid inducing comas among readers, I will not go through the code line by line here. Instead, I will just provide this link for anyone who wants to download it. [UPDATE: A newer version of CPLEX broke the code. I have created a repository for the updated code; please see this more recent post for details.]

Saturday, July 14, 2012

Piecewise Linear Functions in CPLEX

Patricia Randall just published a blog post, "Piecewise Linear Objective Functions in Mathematical Models", in which she mentions using (or trying to use) the support for piecewise linear functions built into CPLEX APIs. (There's also a rather groan-worthy mathematical pun at the end of the post, in case you want to abuse yourself.)  I've talked about piecewise linear functions before (links below), and I'll try not to repeat myself too much. I'm aware of the function Patricia refers to (in the Java API, it is IloMPModeler.piecewiseLinear(), and is overloaded several ways). I've never used it, though.

From a modeling perspective, not all piecewise linear functions are created equal. In particular, those that induce an economy of scale require the use of binary variables (or modified branching logic in a MIP, or some other trick) to implement, whereas those that induce a diseconomy of scale can be converted into sets of linear constraints. Cutting to the chase scene, if your model would otherwise be a linear program, sometimes piecewise linear functions turn it into a MILP and sometimes they don't.

As I read Patricia's piece, I got curious whether the implementation of  IloMPModeler.piecewiseLinear() included a check for whether binary variables were needed (which is easy in some specific contexts but could be hard in the general case), or whether it always used binary variables. So I created a little test case, and based on that test case, it appears the latter is true.

The test case is a continuous knapsack problem (maximization) where the value (objective contribution) of each item is a concave piecewise linear function with three segments (two breakpoints). The value of a typical item might look like
concave piecewise linear function with three segments

where $f_1,f_2,f_3$ are linear functions that extrapolate the three segments of the value function $f$.

The base model is \begin{gather*}\textrm{maximize }\sum_{i=1}^n f_i(x_i)\\\textrm{s..t. }\sum_{i=1}^n w_i x_i \le C\\0 \le x_i \le U_i \quad\forall i\in \{1,\dots,n\}\end{gather*} where $x_i$ is the amount of item $i$ packed in the knapsack, $U_i$ is the amount of that item available, $f_i(x_i)\ge 0$ is the (piecewise-linear) value of packing $x_i$, $w_i>0$ is the unit weight of item $i$, and $C$ is the capacity of the knapsack. I randomly generated parameter values, set this model up in CPLEX using IloMPModeler.piecewiseLinear() to express each $f_i()$, and solved it. CPLEX turned the model into a MILP.  Just to be sure I had not erred, I created a second CPLEX model using the following formulation:\begin{gather*}\textrm{maximize }\sum_{i=1}^n z_i\\\textrm{s..t. }\sum_{i=1}^n w_i x_i \le C\\z_i \le m_{i1} x_i\quad\forall i\in \{1,\dots,n\}\\ z_i \le (m_{i1}-m_{i2})b_{i1} + m_{i2}x_i \quad\forall i\in \{1,\dots,n\}\\ z_i \le (m_{i1}-m_{i2})b_{i1} + (m_{i2}-m_{i3})b_{i2} + m_{i3}x_i \quad\forall i\in \{1,\dots,n\}\\
0 \le x_i \le U_i \quad\forall i\in \{1,\dots,n\}\\0\le z_i\quad\forall i\in \{1,\dots,n\}\end{gather*}where $z_i$ is the objective contribution of item $i$, $b_{i1}$ and $b_{i2}$ are the breakpoints for $f_i$ (with $0<b_{i1}<b_{i2}<U_i$) , and $m_{i1}>m_{i2}>m_{i3}>0$ are the slopes of $f_i$ ordered by increasing $x_i$. CPLEX solved this model as an LP and returned the same solution that it found for the MILP.

Is this a matter of concern? I'm not positive, but I'm inclined not to think so, or at least not too much. CPLEX solved the MILP model ($n=50$ items) at the root node, and I suspect this is not a coincidence -- I'm pretty sure the LP relaxation of the MILP will push the binary variables in the desired direction. There is a size difference, though, as shown in the following table:


As a sidebar, with three segments (two breakpoints) in each function $f_i$, CPLEX introduces no binary variables but instead introduces one Special Ordered Set (I assume type 2) per item. With two segments (one breakpoint), though, it introduces one binary variable per item and no SOSes.

Related posts:

Friday, July 6, 2012

Modeling Absolute Values

There's a category of frequently asked questions that, at some point, boils down to how to model an absolute value in a linear (LP) or mixed-integer linear program (MILP). There is typically more than one answer, and I became curious how different methods perform. Having idle time on my hands, I decided to try a little experiment. Being lazy (and not having to worry about pacifying reviewers and editors -- hooray for blogs!), I decided to do just one experiment. Since this involves MILPs, let me start with a disclaimer.

There are no safe generalizations about MILPs other than this one.
Your mileage will vary.
Past performance is not a predictor of anything.
Stuff happens.

(Hopefully that was sufficient. I should probably park that disclaimer in the margin of the blog so that it appears on all posts.)

Let's say that we have a variable (or variable expression) $x$ whose absolute value we need. Any trick for modeling absolute values will require $x$ to be bounded, say $L\le x\le U$.  If $L\ge 0$ then $|x|=x$, and if $U\le 0$ then $|x|=-x$, so we will dispense with the boring cases and assume that $L<0<U$.

The most common way to model $|x|$ starts with writing $x$ as the difference of two nonnegative variables: \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le U\\ 0\le x^{-}\le|L|. \end{gather*}It is then tempting to write $|x|=x^+ + x^-$, and in point of fact that comes close. What we can say is that\[|x|\le x^+ + x^-.\]If the net effect of inflating the claimed value of $|x|$ is to hurt the objective value, then the solver will force either $x^+$ or $x^-$ to be zero and the other one to be $|x|$. So this is all you need to do in some cases.

The problem is that if there is a possibility that inflating the estimate may benefit the objective, such inflation may happen. For instance, if we are maximizing $|x|$ and the correct answer is $x=a>0$ (so $|x|=a$), what we will actually get is $x^+=a+\delta$, $x^-=\delta$ with $\delta=\min\{|L|, U-a\}$ and $x^+ + x^- = a+2\delta$ rather than $a$.

Binary method

This brings me to the first method I know for modeling absolute values (outside of the case mentioned above where you get lucky): split the variable or expression into positive and negative parts as above, and add a binary variable to force one or the other to zero. Mathematically, this looks like \begin{gather*} x=x^{+}-x^{-}\\ 0\le x^{+}\le Uy\\ 0\le x^{-}\le|L|(1-y)\\ y\in \{0,1\}.\end{gather*}Besides the fact that it is the best known approach, a potential advantage of the binary method is that it contributes to bounding at nodes in the search tree. The smaller $U$ and $|L|$ are, the greater the value in bounding. Disadvantages include (a) adding another binary variable to the problem (MILP solution times tend to be sensitive to the number of integer variables) and (b) possibly causing numerical stability issues (discussed here, so I won't repeat the details) if $U$ and/or $|L|$ is large, since they are now constraint coefficients.

SOS1 method

In the commentary following the Big-M post, Ed Klotz points out that an alternative to using a binary variable to zero out another variable (or not) is to use SOS1 constraints.  Applied here, we skip the binary variable $y$ and just declare $\{x^+,x^-\}$ to be an ("an"? "a"? I'm so confused!) SOS1. This of course assumes the solver knows what to do with an SOS1. The upsides of this are that it avoids an additional binary variable, and avoids any numerical stability problems. The downside is that the fact that one of $x^+$ and $x^-$ must be zero does not contribute directly to bounding. In effect, it becomes a branching rule. The solver knows that at some point it needs to branch and create two children with $x^+=0$ in one and $x^-=0$ in the other. So an SOS1 formulation may be "weaker" than a binary formulation (unless $U$ and $|L|$ are large).

SOS2 method

Thinking about the SOS1 approach made me consider using an SOS2. An SOS2 involves three or more variables in a specified order; the solver interprets the SOS2 constraint to mean that at most two of the variables can be nonnegative, and those two must be consecutive in the given ordering. SOS2 is designed to model piecewise-linear functions, of which the absolute value function is one.

In this approach, we do not use $x^+$ and $x^-$. Instead, we introduce three weight variables $w^+,w^-,w^0$ that form the weights of a convex combination of $U$, $L$ and $0$. So the constraints are \begin{gather*}0\le w^+,w^-,w^0\le 1\\w^++w^-+w^0=1\\x=Lw^-+0w^0+Uw^+=Lw^-+Uw^+\end{gather*}with $\{w^-,w^0,w^+\}$ (in that order) an SOS2. With this approach, \[|x|=|L|w^-+Uw^+.\]The SOS2 method shares one of the virtues of the SOS1 method (no added integer variables) but not the other ($L$ and $U$ creep into constraint coefficients). I'm uncertain what its effect on bounding would be, but probably nothing useful. Consider, for instance, $U=100$ and $L=-100$. Any legal value of $x$ can be written as a convex combination of those two limits (i.e., with $w^0=0$), which would produce an "absolute value" of $100w^-+100w^+=100$ regardless of the actual value of $x$. This would violate the SOS2 constraint, but that constraint will be relaxed in node LPs until the solver decides to branch on it. (After branching, node LPs will contain the correct absolute value.)

ABS method

As noted by Irv Lustig in the comments section, CPLEX contains an absolute value function. Irv believes that CPLEX converts it into an indicator constraint, which then (as I understand things) can either take the binary approach or be handled by a branching rule, at CPLEX's discretion. This would seem to be the most intuitive approach from a user's perspective, but (a) there's a small programming quirk that I'll mention below, (b) the user can't be entirely sure whether it translates into the equivalent of the binary method or the equivalent of the SOS1 method (which, as we'll see, seem to perform differently) and (c) if it's the equivalent of the binary method, the question then becomes whether the "big M" value that CPLEX deduces is tighter or looser than the one the user would employ when applying the binary approach directly.

Test problem

So I devised a little experiment. Let's start with a transportation problem involving $M$ sources and $N$ sinks:\begin{gather*}\textrm{minimize }c'x\\\textrm{s.t. }\sum_{i=1}^Mx_{ij}\ge d_j\quad\forall j\in \{1,\dots,N\}\\\sum_{j=1}^Nx_{ij}\le s_i\quad\forall i\in \{1,\dots,M\}\\0\le x_{ij}\le \min(s_i,d_j)\quad\forall i,j.\end{gather*}The $s_i$, $d_j$ and $c_{ij}$ are supplies, demands and unit flow costs respectively, while $x_{ij}$ is flow from source $i$ to sink $j$. Disdaining the convention of equation constraints, I prefer inequalities (essentially implying that disuse of excess capacity is free) and assume that total supply is at least rather than exactly total demand ($\sum_i s_i\ge \sum_j d_j$). This reduces my fear of rounding errors. Also note that the flow variables have natural lower ($0$) and upper ($\min(s_i,d_j)$) bounds.

Now suppose that we solve the model, get an optimal solution with objective value $C$, and decide that we want some variety or flexibility in our decisions. (The virtues of this in real life have been discussed on other blogs and forums; since I'm just cooking up an example, I won't try to justify myself here.) So we'll choose some fraction $\delta>0$ such that we are interested in exploring "near-optimal" solutions having cost $\le (1+\delta)C$, and look for some number $K$ of such solutions. To maximize the diversity of the solutions we get, we will maximize the minimum L1 distance between any two solutions. The L1 norm\[ \left\Vert x\right\Vert _{1}=\sum_{i}|x_{i}| \]is one of two norms (along with\[ \left\Vert x\right\Vert _{0}=\max_{i}|x_{i}|, \]the L0 norm) that linearizes, and so is a nice fit with an LP or MILP.

Adding a superscript $k$ to $x$ to indicate which candidate solution it is, our model becomes
\textrm{maximize } z\\
\textrm{s.t. }\sum_{i=1}^M x_{ij}^k\ge d_j\quad\forall j\in \{1,\dots,N\},k\in\{1,\dots,K\}\\
\sum_{j=1}^N x_{ij}^k\le s_i\quad\forall i\in \{1,\dots,M\},k\in\{1,\dots,K\}\\
0\le x_{ij}^k\le \min(s_i,d_j)\quad\forall i,j,k\\
\sum_{ij}c_{ij}x_{ij}^k\le (1+\delta)C\quad\forall k\\
0\le z\le z_{hk}\quad\forall h,k\in\{1,\dots,K\},h<k\\
z_{hk}=\sum_{i,j}|x_{ij}^k-x_{ij}^h|\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}Here $z_{hk}$ is the L1 distance between solutions $h$ and $k$, and $z$ is the minimum distance between any pair of solutions. (As an aside, this formulation is not guaranteed to produce the optimal solution; but you already know that one, since it was necessary to solve the original problem to find $C$.) The trick now is to linearize that last constraint.

Here's where the "small programming  quirk" I mentioned above comes into play. Although you can apply IloCplexModeler.abs() to each difference $|x_{ij}^k-x_{ij}^h|$, you cannot directly add those expressions. [Oops! That's not correct. See update #3 below.] That's because IloCplexModeler.abs() returns an object of type IloNumExpr (numeric expression) rather than type IloLinearNumExpr (linear numeric expression), and only linear expressions may be combined into a larger linear expression (which is what you need for the right side of the last constraint). You cannot cast IloNumExpr to IloLinearNumExpr, which is plausible since casting an expression that is actually nonlinear to a linear expression could cause the sun to go nova. The workaround is easy but a bit tedious: assign each term of the last constraint to a variable. (Maybe Irv, or someone more familiar with CPLEX than I, can come up with a better approach.) [No sooner said than done: update #3 below.] Mathematically, the last constraint becomes\begin{gather*}z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\quad\forall h<k,\forall i,j\\ z_{hk}=\sum_{i,j}z^{hk}_{ij}\quad\forall h,k\in\{1,\dots,K\},h<k.\end{gather*}The extra variables may not be a bad thing, though, since we get to bound them. If we write $U_{ij}$ for the upper bound of $x^h_{ij}$ for every $h,i,j$, then we can bound the extra variables with \[z^{hk}_{ij}=|x_{ij}^k-x_{ij}^h|\le \max(U^h_{ij},U^k_{ij}).\]Bounding individual terms in the sum can't hurt and possibly could help.


I ran all four methods against my test problem using CPLEX 12.4 (Java API) with all parameters except the time limit at default values. I cut the runs off after 15 minutes (on a quad-core PC). The source code is available on request, but I don't know that it is very informative.

Problem dimensions were $M=5$ (sources), $N=10$ (sinks), $K=5$ (solutions), $\delta=0.05$ (accept a 5% increase in cost over the optimum). Coefficients $s$, $d$ and $c$ were generated randomly. The results are tabulated below. The value in the "Mean" column is the mean distance between any pair of solutions. (As a sidebar, CPLEX was apparently so disgusted with the SOS1 results that it refused to print the gap, substituting "---" ... or maybe it just didn't fit.) Remember that the objective is minimum distance between solutions, and we are maximizing it.


My example has reasonably tight bounds ($L$ and $U$ above): \[-\min(s_i,d_j)\le x^h_{ij}-x^k_{ij}\le \min(s_i,d_j).\]I thought I might relax the bound a bit by changing $\min(s_i,d_j)$ to $\sum_i s_i$ (i.e., bounding every flow by the total supply in the system). The results after that modification were as follows:


Generally speaking, looser bounds on the variables hurt all four methods, although the incumbents for the binary and ABS methods were actually better with the loose bounds even though the average distance between solutions, best bound and gap were consistently worse.  (Have I mentioned that MILPs are spiteful little beasts?)

Conclusion (?)

Reread the disclaimer above before interpreting these results. Still, based on very limited testing (I did run the same problem with smaller dimensions, with consistent outcomes), it looks as if the improved bounding in the binary formulation outweighs the cost of the binary variables -- provided that $L$ and $U$ are reasonable.  Using CPLEX's abs() method did better than either of the SOS approaches but not as well as providing your own binary variables. What I did not test here is a case where the bounds are large enough to cause the solver numerical indigestion.  Should that happen, perhaps one of the SOS methods might be worth using (but which one?).

Update #1

Bo Jensen pointed me to the slides from a presentation he did on exploiting absolute values in LP solvers. (Bo's presentation is well worth viewing, although he is dealing with the case where absolute values are being minimized, and so no binary variables are needed.) It reminded me of another variation on the binary approach. Rather than splitting $x$, we add a new variable $z=|x|$ and reformulate as follows:\begin{gather*}x\le z\\-x\le z\\z\le x + 2|L|(1-y)\\z\le -x+2Uy\\0\le z\\y\in \{0,1\}.\end{gather*}It uses the same number of variables (since, in my original binary formulation, $x$ can be eliminated) and more constraints, and the constraint coefficients containing the bounds are doubled, so I suspect it is inferior to the formulation above.

Update #2

Per Irv Lustig's suggestion in the comment section, I've added a fourth option, using CPLEX's built in absolute value function (IloCplexModeler.abs() in the Java API).

Update #3

Roland Wunderling at IBM kindly (and gently) pointed out that you can add absolute values in the CPLEX APIs. In the Java API, you can put the running total in an instance of IloNumExpr (let's call it total). If cplex is the instance of IloCplex we're working with, then total = cplex.sum(total, cplex.abs(...)) can be used in a loop to assemble a sum of absolute values. Once the sum is complete, total can be used as one side of a constraint. This avoids the use of the intermediate variables for each absolute difference, but also removes the opportunity to assign each term a bound. I've added results for this variant to the table (labeled "ABS2"). In my very limited testing (have I mentioned the disclaimer above?), it had somewhat looser bounds/larger gap than the ABS approach, but neither method dominated the other when it came to incumbents.

Update #4

IBM's Jean-Francois Puget posted an alternative approach for a special case ($x\in\{0,1,2\}$, $z=\left|x-1\right|\in\{0,1\}$) that readers might find interesting.

Related post: Absolute Value Inequalities