## Monday, June 23, 2014

### NetBeans 8 Update Bugs

An update to the NetBeans IDE (8.0 patch 2) this morning seems to have introduced some problems that I was fortunately able to work around. I'll document them here in case anyone else runs into them.

For some context, when I opened NetBeans, it correctly showed three projects in the project navigator: the main project on which I am currently working (call it C) and two related library projects (A and B). My task for the morning was to make a minor tweak to B. Happily (for my sanity), I use Git for version control and keep everything backed up to a remote repository (Bitbucket).

The IDE informed me that there were three updates available, which I downloaded and installed, forcing a restart of the IDE. Unfortunately, I took no notes on which updates they were, so I can't point to a particular culprit. Following the restart, I made B my main project, made the necessary code changes, tested them, committed them and pushed the commit up to the repository. I'll note here that I changed the method I used to access the repository from HTTPS to SSH, which may be significant for the second and third bugs. Then I did a clean and build to get a new jar file. So far, so good.

### Bug #1: Can't generate Javadoc

The first problem came when I went to regenerate the Javadoc files for B. The Generate Javadoc menu option was disabled (grayed out), and nothing I could do enabled it. I wasted a fair bit of time reading some very old bug reports that, as it turns out, were not germane, and generally futzing around before I finally did what I should have tried immediately: I shut down and reopened the IDE. Just like that, Generate Javadoc was enabled again, and I was able to generate the updated Javadoc files. I have no idea why the additional restart was required.

### Bug #2: Project won't open

Okay, whatever, time to make C my main project again. Small problem: after the latest restart, C is no longer listed in the project navigator (although several of its source files, which had been open the entire time, are again open). Fine, I'll open it from the recent projects list. Oops, it's not listed there either (?!). So I navigate to it with File > Open Project, try to open it and discover the following stuffed into the project name field:
Error in project.xml: The content of elements must consist of well-formed character data or markup.
Looking at project.xml in a text editor, I find what appear to be some Git-related comments stuffed into it (as if Git were trying to reconcile two conflicting versions of the file). Note that (a) this corruption must have occurred some time after I first opened the IDE, since C was listed in the project navigator then (and still was after the first post-update restart), and (b) up to that point I had not touched C, and certainly had not done any commits etc. relating to C.

Anyway, the fix was to open a terminal window in the project main folder and run git reset --hard to revert the working files to the most recent commit (the same version held in the remote repository) ... or so I thought.

### Bug #3: Projects merged

After the reset, I was able to open project C (without having to restart the IDE) and make it my main project. I was surprised to discover that, on top of C's own source files, C now contained all the packages and files from B (?!). Again reverting to a terminal (after closing the IDE), I ran git remote -v and found that, on top of listing the various known URLs for pushing and fetching C, it listed the SSH URL for project B as being one of C's URLs. How it got there I have no idea. When I added the SSH URL for B, both B and C were open, but B was the main project, and I added the URL while pushing B, not C.

Lacking an elegant way to fix this, I opened Git's configuration file (.git/config) in a text editor, found the line for B's repository and deleted it, saved that, deleted all the source files for C from C's project folder (while holding my breath), and ran git reset --hard again. That got C back to a state where it contained only its own files.

So everything is back to normal, I've only wasted one entire morning (sheesh!), and I've had the wisdom of remote backups reinforced.

## Friday, June 20, 2014

### Is the Gender Gap in STEM Degrees Closing?

Two recent posts by Randal S. Olson relating to STEM degrees and gender caught my eye, and I thought I'd share them. The posts are:
I won't attempt to recapitulate their content here; if you're interested, please have a look at them. With Randal's permission, though, I will reproduce his charts. The second chart is mathematically redundant but useful for visualization.
[First chart] (Source: Randal S. Olson)
[Second chart] (Source: Randal S. Olson)

As Randal points out, the proportion of science and math baccalaureate degrees awarded to women has of late been around 40%. There's still a gap, especially since (if I recall correctly) women now comprise more than half the undergraduate population, but we have definitely made significant progress. I do have a wee bit of concern that the trend in math/statistics degrees was negative in the first decade of this millennium. Nonetheless, the gap in "S/M" degrees does not seem as bad as has sometimes been suggested. (Note that I was careful to use "/" and not "&" there.)

The "T/E" portion of STEM remains problematic, as signaled by the curves for Computer Science and Engineering, and there seems to be a fairly lengthy downward trend in the proportion of CS degrees awarded to women. (This is probably a more pressing concern for unwed heterosexual male CS majors than it is for me.) Various efforts to bolster female enrollments in "T/E", particularly engineering, are underway.

I'll end here with some related links.

#### Posts by Laura McLay (Punk Rock Operations Research blog):

• Too numerous to list. Just visit the blog, click the word "women" in the tag cloud ... and brace yourself.

## Wednesday, June 18, 2014

### Turning Bounds into Constraints in CPLEX

I had to delve into the CPLEX documentation today, and found something I had not seen before. As part of a (Java) program I'm writing, I need to use the conflict refiner to track down which upper and lower bounds on variables play a role in making a linear program infeasible. Of course, I could change the bounds to constraints (e.g., add a constraint $x\le 5$ rather than declaring $x$ as a variable with domain $[0, 5]$), but the model is more compact if the bounds are specified as bounds. The solution turns out to involve an interface in the Java API named IloNumVarBound. The documentation on it is not entirely self-explanatory, so I thought I'd post a small example.

Let me start by posing a small and obviously infeasible linear program:

$\begin{array}[t]{cccc} \mathrm{minimize} & y\\ \mathrm{s.t.} & x-y & \le & 10\\ & x & \ge & 20\\ & y & \le & 5\\ & y & \ge & 0 \end{array}$

where only the first constraint is entered as a constraint (the rest being entered as bounds). Here's my Java source code (minus imports, exception handling and other cruft):

    IloCplex cp = new IloCplex();
    // the bounds are specified when the variables are created
    IloNumVar x = cp.numVar(20, Double.POSITIVE_INFINITY);
    IloNumVar y = cp.numVar(0.0, 5.0);
    cp.addMinimize(y);
    // the one functional constraint: x - y <= 10
    IloRange c1 = cp.addLe(cp.diff(x, y), 10.0);
    cp.solve();
    if (cp.getStatus() == IloCplex.Status.Infeasible) {
        // candidates for the conflict: c1 plus all four bounds, wrapped as constraints
        IloConstraint[] constraint =
            new IloConstraint[] {c1,
                                 cp.bound(x, IloNumVarBoundType.Lower),
                                 cp.bound(x, IloNumVarBoundType.Upper),
                                 cp.bound(y, IloNumVarBoundType.Lower),
                                 cp.bound(y, IloNumVarBoundType.Upper)
                                };
        // equal preference for every candidate
        double[] prefs = new double[] {1, 1, 1, 1, 1};
        if (cp.refineConflict(constraint, prefs)) {
            ConflictStatus[] status = cp.getConflict(constraint);
            for (int i = 0; i < constraint.length; i++) {
                System.out.println("Constraint " + constraint[i]
                                   + " has status " + status[i]);
            }
        } else {
            System.out.println("No conflict found??");
        }
    } else {
        System.out.println("Unexpected status = " + cp.getStatus());
    }

If you are not familiar with the use of the conflict refiner, you can find it documented in the CPLEX API manuals. The refineConflict() method takes two arguments: a vector of constraints to consider in locating a conflict; and a numerical preference vector indicating which constraints should be ignored, which should automatically be considered part of any conflict, and what your priority is for including any of the remaining constraints.

The key issue here is that the first argument consists only of constraints, not bounds. The IloNumVarBound interface provides a way of "casting" a bound into something that CPLEX will accept as a constraint. In the Java API, IloCplex.bound() generates an instance of IloNumVarBound from a variable. It takes as arguments the variable and which bound (lower or upper) you want to treat as a constraint. It's important to note that you would not use bound() or IloNumVarBound to create the bound in the model; you still do that by specifying the bound as an argument to the method that creates the variable.

A couple of things in the code bear mentioning:
• I included all four bounds in the call to the conflict refiner because I was pretending not to know which ones were involved, not because you have to include every constraint and every bound when you call the refiner.
• In order to pass the one functional constraint to the conflict refiner, I had to assign (a pointer to) it to a variable (c1).
Here is the output from the program:

    Constraint IloRange  : -infinity <= (-1.0*[0.0..5.0] + 1.0*[20.0..infinity]) <= 10.0 has status Member
    Constraint Lower bound of [20.0..infinity] has status Member
    Constraint Upper bound of [20.0..infinity] has status Excluded
    Constraint Lower bound of [0.0..5.0] has status Excluded
    Constraint Upper bound of [0.0..5.0] has status Member


You can pretty it up by assigning names to the variables and constraints; I just didn't bother.
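
As a quick sanity check on that output (my own arithmetic, not something CPLEX reports), the three items with status Member are enough to produce the infeasibility by themselves. Combining the lower bound on $x$ with the upper bound on $y$ gives

$x \ge 20,\quad y \le 5 \quad\Longrightarrow\quad x - y \ge 20 - 5 = 15 > 10,$

which contradicts the constraint $x - y \le 10$. The upper bound on $x$ (infinity) and the lower bound on $y$ (zero) are not needed for the contradiction, which is consistent with their Excluded status.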

## Monday, June 9, 2014

### A Side-Scrolling JList

I just spent a less-than-enjoyable chunk of time trying to get a JList to scroll (horizontally) in a Java application with a Swing GUI. Everything I found in an online search either made it seem easier than it actually turned out to be (by omitting the key ingredient in the recipe) or sent me off in unproductive directions. So I'm recording what worked because I will, with probability 1.0, forget it soon enough.

The structure of the program, omitting most details, looked like this:

    JList wideList = functionThatSpitsUpWideList();
    JScrollPane pane = new JScrollPane(wideList);
    JOptionPane.showMessageDialog(parent, pane, title, JOptionPane.PLAIN_MESSAGE);


It produced a dialog with a mile-wide list and no scroll bars. I tried what I thought was the obvious remedy, invoking setMaximumSize() (with modest dimensions) first on wideList and then on pane. Neither helped. The answer turned out to be invoking setPreferredSize() on pane:

    JList wideList = functionThatSpitsUpWideList();
    JScrollPane pane = new JScrollPane(wideList);
    pane.setPreferredSize(new Dimension(400, 200));
    JOptionPane.showMessageDialog(parent, pane, title, JOptionPane.PLAIN_MESSAGE);


I (naively?) thought that the maximum size trumps the preferred size. Oops.

## Friday, June 6, 2014

### Reproducibility and Java Collections

I've been immersed in Java coding for a research project, and I keep tripping over unintentional randomness in the execution of my code. Coincidentally, I happened to read a blog post today titled "Some myths of reproducible computational research", by C. Titus Brown, a faculty member (one of those hybrid species: Computer Science and Biology) at Michigan State University, my former employer. The post is worth reading. I've seen quite an up-tick in online discussions of the importance of reproducible research, sharing code, sharing data etc., to which I will add one more virtue (specific to computational research): it's a darn sight easier to debug a program whose behavior is deterministic than to debug an inherently stochastic program.

That brings me to a couple of tripwires that have been snagging me on the recent project (and some other projects, for that matter). First, any code that uses parallel threads is capable of injecting some unwanted randomness. A given thread will consume different amounts of time in different runs, due to uncontrollable (and hence, for our purposes, "random") interference by external processes that from time to time are swapped into the CPU core running the given thread. If your program has multiple interdependent threads, the timing of when one thread gets a message from another thread will be different on each run unless you do some serious (and IMHO seriously tedious) magic to synchronize them. I generally shy away from coding multiple threads myself, the exception being programs with a graphical user interface, where the GUI has its own set of scheduling/event threads, and really lengthy computations need to be done in "worker" threads to avoid paralyzing the GUI. Even then, when I'm using a multithreaded library such as CPLEX, execution times vary in unpredictable ways.

The second tripwire has to do with Java collections. As a mathematician, I tend to think in terms of matrices, vectors and sets, and not so much in terms of lists and maps. I'll often code a collection of objects as a HashSet, both because I think of them as a (mathematical) set rather than a list, and because the Set interface in Java has one key property not shared by the List interface: adding an object to a Set that already contains it does not create a second instance of the object. Unfortunately, a key shortcoming (IMHO) of HashSet is clearly spelled out in its documentation:
"It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time."
To give one example of what this implies for reproducibility (and debugging), I coded a heuristic for constructing a spanning tree that takes each node from a set (as in HashSet) of unattached nodes and links it to a randomly selected node from the tree under construction. The selection of the node already in the tree to which the new node will be linked (by an edge) is done using a pseudorandom number stream with an explicit, user-specified seed. By repeating the seed, I could reproduce the choices of those nodes, if only the rest of the heuristic were deterministic. Unfortunately, the iteration order of a HashSet depends on the elements' hash codes, which are not guaranteed to be the same from one run to the next, so I get a different tree each time I run the code, even with the same seed value. So I need to change that HashSet to a list of some sort ... and I need to remember this lesson the next time I'm tempted to use a HashSet.
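
For what it's worth, here is a minimal sketch of one way the fix might look. It is not my actual heuristic: the class ReproducibleTreeDemo, the Node class and the buildTree() method are hypothetical names made up for illustration, and I've swapped in a LinkedHashSet (which keeps the no-duplicates behavior of a Set but iterates in insertion order) rather than the list mentioned above. The assumption is that nodes are inserted in the same order on every run.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical stand-in for the heuristic described above; not the original code. */
class ReproducibleTreeDemo {

    /** Node that inherits Object.hashCode(), so its hash code may differ between runs. */
    static final class Node {
        final int id;
        Node(int id) { this.id = id; }
    }

    /**
     * Links each unattached node to a randomly chosen node already in the tree.
     * Returns the edges as "child-parent" strings, in the order they were created.
     */
    static List<String> buildTree(int nodeCount, long seed) {
        Random rng = new Random(seed);          // explicit, repeatable seed
        List<Node> tree = new ArrayList<>();
        tree.add(new Node(0));                  // node 0 serves as the root
        // LinkedHashSet keeps Set semantics but iterates in insertion order.
        Set<Node> unattached = new LinkedHashSet<>();
        for (int i = 1; i < nodeCount; i++) {
            unattached.add(new Node(i));
        }
        List<String> edges = new ArrayList<>();
        for (Node child : unattached) {
            // pick the attachment point from the nodes already in the tree
            Node parent = tree.get(rng.nextInt(tree.size()));
            edges.add(child.id + "-" + parent.id);
            tree.add(child);
        }
        return edges;
    }

    public static void main(String[] args) {
        // Same seed twice: the two edge lists should match.
        System.out.println(buildTree(8, 42L));
        System.out.println(buildTree(8, 42L));
    }
}
```

With the iteration order pinned down, the only remaining source of variation is the seed, so repeating the seed should repeat the tree. Sorting the nodes into a list before looping would serve the same purpose.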