Friday, June 23, 2017

Premature Obituaries

[T]he report of my death was an exaggeration. (Mark Twain, 1897)
In a recent blog post, "Data Science Is Not Dead", Jean-Francois Puget discussed and dissented with a post by Jeroen ter Heerdt titled "Data Science is dead." Barring the possibility that Schroedinger shoved data science into a box and sealed it, both assertions cannot simultaneously be true. The central thesis of ter Heerdt's post is that data scientists have developed powerful and easy to use tools, now deployed on cloud services, that let lay people do the analyses that previously required said data scientists, in effect putting themselves out of business. Puget responds that "there is a very long tail of use cases where developing a packaged service isn't worth it", and draws parallels to operations research. Common and important problems such as routing, machine scheduling, crew scheduling and so on have led to the development of problem-specific commercial software, but "many companies still have an OR department (under that name, or as part of their data science department) because they have operations problems that cannot be optimized with off the shelf software of services".

I'm siding with Puget on this, having experienced the "demise" of management science (if not all of OR) some decades ago. When I started on the faculty at Michigan State University, management science (whether under that name, "operations research" or something else) was a common and important element of business school programs. We had mandatory core courses in M.S. at both the undergraduate and masters levels, as well as higher level "elective" courses that were de facto requirements for some doctoral concentrations. We also participated in an interdisciplinary OR program at the masters level.

Gradually (but not gradually enough for me), MS evaporated at MSU (a bit ironic given the respective acronyms). Some of the more applied subject matter was moved into "functional area" courses (production planning, marketing, finance); most of the more conceptual subject matter just went away. As previously noted, canned software began to be available to solve many of the problems. The perceived need shifted from someone who understood algorithms to someone who could model the problem well enough to generate the inputs for the software.

As Puget notes, there is still demand for OR/MS professionals because there are new problems to be recognized and modeled, and models to be solved that do not fit neatly into the canned software. I believe there is also another reason not to start shoveling dirt on the grave of OR/MS. Users who learned a few basic incantations in a functional area class, without learning how the magic works (or does not work), may misapply techniques or produce incorrect models. Those who learn OR/MS (or analytics) as incantations may also tend to be a bit too literal-minded.

A possible analogy is the difference between a chef and someone like me who equates "cook" with "microwave on high". A chef understands a recipe as a framework, to be adjusted as needed. You couldn't get the cut of meat called for? Switch to this meat, then adjust this spice and cook a bit longer. On the (exceedingly rare) occasions I actually prepare a dish, I follow the instructions slavishly and try not to freelance at all.

Along those lines, I attended a thesis proposal defense (for a student not trained in OR/MS) where the work involved delivery routing and included a variant of a traveling salesman model. Both the candidate and his committee took it as axiomatic that a vehicle could not pass through the same node in the routing graph twice because, well, that's part of the definition of the TSP. So I posed the following simple question. You have a warehouse W and two customers A and B, all on the same street, with W between A and B. A is in a cul de sac, so the network diagram looks like

A -- W -- B -- whatever

with any number of links to B but only the A-W edge incident A (and only A-W and B-W incident on W). Trivial exercise: prove that, under the strict definition of a TSP, A and B cannot be on the same route, no matter how close they are to W (and each other).

My point here was not to bust the student's chops. An OR "chef" building a network model for truck deliveries would (hopefully) recognize that the arcs should represent the best (shortest, fastest, cheapest ...) way to travel between any pair of nodes, and not just physical roads. So, in the above example, there should be arcs between A and B that represent going "by way of" W. It's fine to say that you will stop at any destination exactly once, but I know of only two reasons why one would route a vehicle with a requirement that it pass through any location at most once: it's either laying land mines or dribbling radioactive waste behind it. Hopefully neither applied in the student's case.

The student, on the other hand, could not see past "arc = physical roadway" ... nor was he the only one. After the defense, the student contacted two companies that produce routing software for the trucking industry. According to him (and I'll have to take his word for this), neither of them had given any thought to the possibility that passing through the same node twice might be optimal, or to creating "meta-arcs" that represent best available routes rather than just direct physical links. If true, it serves as a reminder that canned software is not always correct software. Caveat emptor.

The flip side of being too "cookie cutter" in modeling is being too unstructured. As an instructor, I cringed at students (and authors) who thought a linear regression model could be applied anywhere, or that it was appropriate to pick any variable, even a categorical variable, as the dependent variable for a linear regression. To me, at least, neural networks and software services sold as "machine learning" are more opaque than regression models. Having someone with modest if any data science training cramming whatever data is at hand into one end of a statistical black box and accepting whatever comes out the other end does not bode well.

So, in addition to Puget's "tail problems", I think there will always be value in having some people trained at the "chef" level, whether it be in OR/MS or data science, working alongside the "line cooks".

Monday, May 29, 2017

If This and That Then Whatever

I was asked a question that reduced to the following: if $x$, $y$ and $z$ are all binary variables, how do we handle (in an integer programming model) the requirement "if $x=1$ and $y=1$ then $z=1$"? In the absence of any constraints on $z$ when the antecedent is not true, this is very easy: add the constraint $$z \ge x + y - 1.$$Verification (by substituting all possible combinations of 0 and 1 for the variables) is left to the reader as an exercise.

I thought I had covered this in a previous post, but looking back it appears that it never came up (at least in this form). This might be my shortest post ever. :-)

Sunday, May 28, 2017

Update Error: Wrong Architecture

Yesterday I ran into one of those mystifying glitches that, will infrequent, serve as a reminder that Linux is not for the faint of heart. When I booted my desktop system (Linux Mint 18.1 Serena), the system tray icon for the software updater was displaying a red "X" that indicates it tried and failed to update its cache. This is not uncommon, and the solution is usually simple: open the application and click the update button.

This time, however, the update produced a slew of error messages. For what appeared to be each repository it checked, it printed an error message saying
Skipping acquire of configured file <something with "i686" in it> as repository <whatever> doesn't support architecture 'i686'
After all the architecture error messages, I also got one about a problem with a GPG encryption key on a CRAN mirror. That message had been popping up, harmlessly, on each update for quite a while, so I did not worry about it.

The PC is a 64-bit machine, with an AMD processor, and I'm used to architecture references that contain either "amd64" or "x86_64" rather than "i686". So my suspicious was that something had messed up a setting somewhere on the machine that identified the architecture to the updater. Running uname -m in a terminal got me "x86_64", confirming that the machine itself was not undergoing any sort of identity crisis.

After a fruitless search for a malformed configuration file, I went to a Mint help forum, where I received a few answers. One said it was a server error, and I should just wait for them to fix it. That didn't hold water for a couple of reasons. First, it would require simultaneous problems on a whole bunch of unrelated repository servers, which would be a bit too coincidental. Second, my laptop (also using an AMD 64-bit chip, but running Mint 18.0 Sarah and a slightly older version of the updater) had no problem updating. Another suggestion, reinstalling the updater, also bore no fruit.

Along the way, I discovered that in fact two repositories were being updated correctly. Those two, not coincidentally, specified "[arch=amd64]" in their source listings. Adding that string to all the other PPAs and extra repositories worked, albeit at some cost (the updater started to label the repositories as "arch=amd64" rather than by their correct names). Unfortunately, the updater would not let me use that trick on the two main repositories (one for Mint, one for Ubuntu), which left me still looking for a fix.

Someone on the help forum suggested disabling all the PPAs on my list of sources and trying again. I did that, and (crucially) also disabled sources in the "Additional repositories" list, leaving just the Mint and Ubuntu repositories. Lo and behold, the update went through. So I started restoring the other repositories in a binary search, and found the culprit. Remember the CRAN mirror that had a problem with its encryption key? If I disabled just that source, which was in the "Additional repositories" list, things worked. So I switched mirrors, reloaded the public key, and things are back to normal.

I have no idea why the encryption key error, which I'd been getting for weeks if not months, suddenly caused things to go splat across the board -- and even less intuition as to why a bad encryption key would cause the updater to start using the wrong architecture code. For now, though, I'll settle for having updates working again.

Update 06/28/17:  It happened again, this time with the Revolution Analytics CRAN mirror yielding the message "server certificate verification failed" and something about a missing file (which I assume was their certificate). So I switched to yet another mirror, and we're back in business ... for the moment. I don't like the idea of downloading software over insecure/unencrypted connections (for reasons that I would hope are obvious), but this is getting tedious. Note to server admins: please keep on top of your certificate deployment.

Thursday, April 20, 2017

Sharing "Implicit" Test Problems

The topic of reproducible research is garnering a lot of attention these days. I'm not sure there is a 100% agreed upon, specific, detailed definition of it, and I do think it's likely to be somewhat dependent on the type of research, but for purposes of this post the Wikipedia definition (previous link) works for me. In particular, I want to call out one part of the Wikipedia entry:
The term reproducible research refers to the idea that the ultimate product of academic research is the paper along with the laboratory notebooks and full computational environment used to produce the results in the paper such as the code, data, etc. that can be used to reproduce the results and create new work based on the research.
I've seen a lot of press related to reproducibility in the biological sciences, although I assume it's an issue in other sciences as well. (Good luck with that, physicists!) It's also increasingly an issue in the social sciences, where one study asserted that over half of the published results they tested failed to replicate. (That study itself was criticized as allegedly failing to replicate.) All of this has been termed by some a "replication crisis". In short, reproducibility is a big deal. There's at least one blog devoted to it, and a considerable amount of work in the statistics community is going into tools to facilitate reproducibility (such as R and Python "notebooks"). R users in particular should have a look at the rOpenSci Reproducibility Guide.

My research tends almost exclusively to fall into the category of "stupid math programming tricks". Either I'm trying to find some clever formulation of a (deterministic) problem, I'm trying to find an efficient way to generate an exact solution, I'm trying to find a decent heuristic for getting "good" solutions "quickly" (preferably without stretching analogies to nature too far: there's no way I'm beating slime mold optimization in the naming contest), or some combination of the above. Since I mostly avoid statistics (other than the occasional comparison of run times with some selected benchmark alternative), I've been largely unaffected by the debate on reproducibility ... until now.

Recently, the journals published by INFORMS have moved in the direction of reproducible research, and I suspect others are doing so (or will do so in the near future) as well. As an example relevant to my interests, the INFORMS Journal on Computing (JOC) has introduced policies on the sharing of both data and code. Personally, I think this is a good idea. I've shared data and code for one particular paper (machine scheduling) on my web site since the paper came out (20+ years ago), and I share data for a more recent paper as well (having released the code as an open source project).

At the same time, I recognize that there are various difficulties (licensing/copyright, for instance) in doing so, and also there are costs for the researcher. I'd be willing to share the code for one or two other projects, but it would take a major effort on my part to untangle the spaghetti and figure out which libraries are/are not required, and which code should be included or should be excluded as irrelevant to the ultimate experiments. I'm reluctant to commit that time until someone actually requests it.

There's another twist that I would not have anticipated until quite recently. I'm working on a project that involves "implicit hitting set" (IHS) problems. In an IHS problem, you have a master problem that formulates as a pure integer (in fact, 0-1) program. Candidate solutions to that model are fed to one or more "oracles", which either bless the solutions as feasible or generate additional constraints for the master problem that the candidates violate. Note that I have not said anything about the nature of the oracles. Oracles might be solving linear or integer programming models, which would be relatively easy to share as "data", but they might also be doing something decidedly funky that is encapsulated in their coding. In the latter case, the test "data" is actually code: I would have to provided the code for the oracles in order for someone to reproduce my results.

Well, okay, if sharing code is on the table now, isn't that just more code? Not quite. Let's say that some unfortunate doctoral student has been tasked by her advisor to code their shiny new IHS algorithm and test it against my published problems. The doctoral student used Python or Julia (or some other trendy language), whereas I used stodgy old Java, so there's a good chance the luckless doctoral student will have to recode my stuff (which, among other things, requires making sense of it). Moreover, I will have created an API to my oracles that may or may not fit with what they are doing ... and that's if I used an API at all. If I directly integrated program structures external to the oracle functions into the oracle (passed in variables, constraints or other elements of my master problems, for instance), our doctoral student will need to figure out how to isolate those elements and replace them with corresponding elements from her code ... if there is even a correspondence.

There's at least one implication in this for me. If I actually subscribe to the push for reproducibility (or if I take pity on other people's doctoral students), I need to code my oracles not as an integral part of my master code but as a separate software module or library with a well-documented and reasonably flexible interface to the outside world (API). <sigh>More work for me.</sigh>

Tuesday, March 28, 2017

Adventures with TeX Live

Since I use the Linux Mint operating system, the obvious (if not only) choice for a LaTeX distribution is TeX Live. (If you are not familiar with, or are not interested in, the LaTeX typesetting system, you have already read too far in this post.) On Mint, Ubuntu and other Debian-type operating systems, you typically install TeX Live by downloading and installing a number of fairly large files from a repository (in my case, mainly repositories for Ubuntu). Sometimes, though, you just want to install one or two small LaTeX packages without bothering with one of the large TeX Live files (which will contain a zillion other LaTeX packages you do not want or need). For this purpose, TeX Live includes a package manager (tlmgr).

Today, I found myself trying unsuccessfully to install something using tlmgr. The first problem was a missing compression/decompression program that was easy to install. After that, I continued to get cryptic error messages. Eventually, I decided to update tlmgr to see if that would help.

The Canonical (Ubuntu) repositories are fairly far behind the curve when it comes to TeX Live; they still provide the 2015 version, which is incompatible with newer versions (including the newer version of tlmgr). So updating tlmgr alone was not an option. (I had what appeared to be the most recent, and presumably final, incarnation of the 2015 version of tlmgr.) Following instructions I found on the Tips on Ubuntu site, I added a PPA containing the latest (I think) version of TeX Live, running the following in a shell:

sudo add-apt-repository ppa:jonathonf/texlive
sudo apt update 

After that, I used the software updater to update my TeX Live packages. (The "Tips on Ubuntu" instructions have you install TeX Live from the PPA, which presumes you do not already have a version installed, as I did.)

Before using tlmgr to do much of anything, you need to initialize the user tree by running

tlmgr init-usertree

in a shell. I had already done that, but I repeated it, confirming that the user tree was still intact. I then ran

tlmgr --gui

to open a graphical user interface, told it to load the default repository (there's a button in the "Repository" panel of the display), and tried -- unsuccessfully -- to install a LaTeX package.

So what was wrong now?  The error message contained the following snippet:

Tk::Error: /usr/bin/tlmgr: open(/usr/share/texlive/tlpkg/texlive.tlpdb)
failed: No such file or directory ...

I checked /usr/share/texlive/tlpkg, and sure enough there was no file texlive.tlpdb ... but there was a file named texlive.tlpdb.9ed73e8174b03a21aa0d40ebbcaac97f. It's a text file (around 12 MB) with configuration information and what looks like a directory structure. Why it had the rather funky hexadecimal extension, I have no idea. I symlinked it by running

sudo ln -s texlive.tlpdb.9ed73e8174b03a21aa0d40ebbcaac97f texlive.tlpdb

and tlmgr started behaving itself.

I later deleted the file with the scary looking hex extension, reran tlmgr and had it load the default repository again. It reproduced the file, but with a different (equally scary) hex extension. I again symlinked the file to texlive.tlpdb.

I have no idea why loading a repository produces what appears to be the correct file with the wrong extension and does not symlink it, but at least this fix seems to work.

Monday, February 27, 2017

MSU Students Do Microfinance

Several years ago, I found out about, an online "microfinance" site where individuals can make small loans ($25 is the standard increment at Kiva) to entrepreneurs in low income settings. The entrepreneurs actually apply for larger amounts, which they typically receive from third-party "field partners". The entrepreneurs repay the loans with interest to the field partners, who in turn repay the principal (no interest) to the original donors. A modest deposit by a donor can be turned over repeatedly into loans, enjoying a version of what in my long-ago undergrad econ class was termed the "multiplier effect". I've made a fair number of loans over the years, nearly all of which have been fully repaid. It's a bit of fun as well as personally satisfying.

So I was geeked to learn from one of my colleagues at the Broad College of Business, Professor Paulette Stenzel, that we have a student organization on campus dedicated to microfinance. The Spartan Global Development Fund is a 501(c)(3) nonprofit organization, created and run by students, that makes microloans both through their own field partners and through Kiva. It's a registered student organization, meaning that any Michigan State University student can join. Students and nonstudents alike can donate through PayPal.

Next time I'm on the Kiva site, I plan to join their Kiva "team", and I encourage anyone interested in microfinance to check out their site.

Thursday, February 23, 2017

Another Absolute Value Question

Someone asked whether it is possible to eliminate the absolute value from the constraint$$Lx\le |y| \le Ux$$(where $L\ge 0$ and $U>0$ are constants, $x$ is a binary variable, and $y$ is a continuous variable). The answer is yes, but what it takes depends on whether $L=0$ or $L>0$.

The easy case is when $L=0$. There are two possibilities for the domain of $y$: either $y\in [0,0]$ (if $x=0$) or $y\in [-U,U]$ (if $x=1$). One binary variable is enough to capture two choices, so we don't need any new variables. We just need to rewrite the constraint as$$-Ux\le y \le Ux.$$If $L>0$, then there are three possibilities for the domain of $y$: $[-U, -L] \cup [0,0] \cup [L, U]$. That means we'll need at least two binary variables to cover all choices. Rather than change the interpretation of $x$ (which may be used elsewhere in the questioner's model), I'll introduce two new binary variables ($z_1$ for the left interval and $z_2$ for the right interval) and link them to $x$ via $x=z_1+z_2$. That leads to the following constraints:$$-Uz_1 \le y \le -Lz_1 + U(1-z_1)$$and$$Lz_2 - U(1-z_2) \le y \le Uz_2.$$ If $x=0$, both $z_1$ and $z_2$ must be 0, and the new constraints force $y=0$. If $x=1$, then either $z_1=1$ or $z_2=1$ (but not both). Left to the reader as an exercise: verify that $z_1=1\implies -U\le y \le -L$ while $z_2=1 \implies L\le y\le U$.

Saturday, January 21, 2017

Fischetti on Benders Decomposition

I just came across slides for a presentation that Matteo Fischetti (University of Padova) gave at the Lunteren Conference on the Mathematics of Operations Research a few days ago. Matteo is both expert at and dare I say an advocate of Benders decomposition. I use Benders decomposition (or variants of it) rather extensively in my research, so it ends up being a frequent theme in my posts. Those posts tend to generate more comments than posts on other topics. Apparently Matteo and I are not the only BD users out there.

I don't know that I would recommend Matteo's presentation as the starting point for someone who has heard of BD but never used it, but I certainly recommend having a look at the slides for anyone who has any familiarity with BD. Matteo provides several interesting perspectives as well as a tip or two for potentially improving performance. I learned a few new things.

In a sad coincidence, Professor Jacques Benders, the originator of Benders decomposition, passed away at age 92 just eight days before Matteo's presentation.

Tuesday, January 10, 2017

Pro Bono Analytics Is Growing Social

Pro Bono Analytics is a program by INFORMS (the Institute for Operations Research and the Management Sciences, for the acronym-averse), "the largest society in the world for professionals in the field of operations research (O.R.), management science, and analytics". PBA "connects our members and other analytics professionals with nonprofit organizations working in underserved and developing communities". In other words, we hook up charitable organizations needing analytics or OR help with volunteers willing to provide it without compensation. Our volunteers are a mix of industry practitioners, academics, students and the occasional geezer retiree. I think the majority of the volunteer pool comes from the U. S., and a majority are INFORMS members, but we have volunteers from as far away as Australia, and a significant portion are non-members.

I'd encourage anyone with OR/MS/analytics skills to consider volunteering, and anyone who knows a charitable organization (particularly those of limited financial means) to let them know we're out there willing to extend a helping hand. Both potential clients and potential volunteers can find out more, and signify their interest, at the PBA home page (repeating the link above).

Our new (and apparently energetic) staff liaison, Tia Carrai, has expanded PBA's social media footprint. You can find us now on the following:
I'd also like to give a shout-out to Pro Bono O.R., a like-minded initiative by our sister institution in the U. K., the Operational Research Society.

Friday, January 6, 2017

Mapping Trackball Buttons

For years, I've used a Logitech M570 wireless trackball with my Linux Mint PC. I generally prefer trackballs to mice -- no need to lift and reposition after a bunch of movement -- and I find that using my thumb, rather than my index finger (or, if I'm in a bad mood, my middle finger) to move the cursor is less fatiguing for my hand and wrist. The M570 works fine with Linux (at least Mint and Ubuntu) with no need for additional drivers.

In addition to the usual (at least for non-Mac users) three main buttons (left, write and combination scroll wheel/third button), the M570 has a couple of secondary buttons, which Logitech describes as "large, easy-to-reach Back/Forward buttons". I'm not sure I'd agree about "large", but I agree that they are easy to reach, and up until my most recent operating system upgrade I would have agreed they acted as back and forward buttons. Prior to the upgrade (and with no special configuration on my part, at least that I can remember), the extra buttons acted like page up and page down in every web browser or document reader that I used. After the upgrade (and switch from the Cinnamon desktop to the Mate desktop), their behavior changed. In the Firefox web browser, they would switch among tabs rather than vertically scrolling the current tab. In Xreader and Acrobat Reader, they also did not move forward backward among pages. (I don't recall trying to read multiple documents at once to see if they would switch among open documents.)

I find the forward/backward action rather handy with multipage documents and long web pages, so I wanted the previous behavior back. It turned out (after some research) not to be hard to do.

The first step is to install the xbindkeys and xautomation packages, both available from the Canonical repositories. This can be done via Synaptic or by running
     sudo apt-get install xbindkeys
     sudo apt-get install xautomation
in a terminal. (I also installed xbindkeys-config, which xbindkeys "suggests", but I don't think it's really necessary.) The xautomation package provides a command xte that can fake a key press.

The second step is to determine what button numbers are assigned to the forward and backward buttons. Run
     xev | grep -i button
in a terminal. This will open a small window with a target. Position the cursor over that window and click each of the buttons. You will see two messages in the terminal for each button, one for the press and one for the release. Look for the button numbers. For me, they were button 8 for forward and button 9 for backward, but your mileage may vary.

The third step is to configure xbindkeys to translate the extra trackball buttons appropriately. The configuration for xbindkeys is kept in a plain text file named .xbindkeysrc in your home directory (~/.xbindkeysrc). You can either create a new one (if you don't have one yet, or if there is nothing in it you want to keep) containing the lines below, or append those lines to the existing file, using your choice of text editor.
# Key bindings for Logitech M570 trackball
"xte 'key Page_Up'"
"xte 'key Page_Down'"
If your button numbers differ from mine (8 and 9), edit the lines accordingly. Note carefully the single and double quotation marks; xte seems a bit finicky about them.

Finally, you'll want to test this. You need to restart xbindkeys to get it to read the modified configuration file. Theoretically you can do this by running
     killall -HUP xbindkeys
in a terminal, but I found it necessary to restart the X server (after closing any applications using it, such as my web browser and email client). Typically Control+Alt+Backspace will do the trick. After that, try the trackball keys and hopefully they'll behave as expected.