Wednesday, November 28, 2012

Personal Finance Software for Linux

Upfront disclaimer #1: Anything that looks like a trademark probably is a trademark. That should spare me having to insert trademark symbols every line or so.

Upfront disclaimer #2: This post is about personal accounting software. If you find accounting terminally boring, feel free to stop reading now (and, trust me, I understand).

I keep tabs on my personal finances using a copy of Quicken 2004 running on Windows. At this point, it's pretty much the only reason I ever have to boot into Windows, and it's a bit of a PITA to have to exit Linux Mint and reboot into Windows to do bills, then reverse the process to get back to work (or play). Unfortunately, most versions of Quicken seem not to work very well under Wine: ratings skew heavily toward silver (mostly works), bronze (some features work, some don't) or the self-explanatory "garbage", with a smaller number of platinum (problem-free) and gold ratings. For what it's worth, Quicken 2004 Deluxe R3 is rated "bronze", but on my system the installer crashes.

So I've been tentatively planning for a while to migrate to a Linux personal finance package. Features and stability matter, but the #1 concern for me was the ability to migrate my Quicken data to the new application, and that's the focus of this post. The best (only?) way to get data out of Quicken is by exporting to a QIF (Quicken Interchange Format) file. QIF seems to be one of those industry "standards" that isn't quite standard, so you can't depend on an application that has a QIF import function to correctly import any old QIF file. I think my QIF file contained between 11,000 and 12,000 transactions. The number is probably not important, other than to explain why I had absolutely no desire to reenter data from scratch with a new application.
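
For the curious, here is roughly what a QIF bank transaction looks like -- a made-up example, not taken from my actual file. D is the date, T the amount, P the payee, L the category, S/$ pairs describe the legs of a split, and ^ ends the record:

    !Type:Bank
    D11/05/2012
    T-123.45
    PSample Payee
    LUtilities
    SUtilities:Electric
    $-100.00
    SUtilities:Water
    $-23.45
    ^

The trouble is that details such as the date format and the way splits are encoded vary across programs and Quicken versions, which presumably goes a long way toward explaining the import failures described below.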

Without further ado, here are the Linux apps I tested and how each one fared. I'll list them in alphabetical order to avoid offending anyone more than necessary.
  • GnuCash (open source, cross-platform): This is generally the most widely cited and best rated of the Linux personal accounting programs. It has a QIF import option. Unfortunately, when it imported my Quicken QIF file, several (most?) of the account balances were off. More critically, they were off in ways I could not repair. There were old, unreconciled "transactions" that were simply wrong. (I put "transactions" in quotes because in at least one case a split within a transaction was processed incorrectly.) GnuCash does not allow you to mark just any transaction as reconciled; you can only mark it by actually reconciling the account. There is seldom a good reason to retroactively mark a transaction as reconciled, but correcting a software error is one of them. The only way I could see to mark the problem transactions as reconciled (after fixing them) was to reconcile as of the transaction's date; but that was a dismal failure, because GnuCash takes the balance as of the most recent reconciliation as the opening balance and will not let you specify a different one. Since I couldn't find a way to fix the errors, I gave up on GnuCash.
  • Grisbi (open source, cross-platform): Grisbi also seems to be well regarded among reviewers. Sadly, it repeatedly hung up (with no error messages) trying to import my QIF file, and had to be killed from a terminal. (Killing it left an idle process that had to be tracked down and killed separately.) So Grisbi was a non-starter for me.
  • Homebank (open source, cross-platform): Homebank seems to be an attractive program, perhaps with fewer features (and an easier learning curve) than GnuCash has. When I imported my QIF file, however, quite a few of the account balances were off, in some cases way off. (Trust me: my credit union would not allow me to overdraw my draft account by more than half a million US dollars.)
  • jGnash (open source, Java-based): jGnash offers one feature not found in any of the other programs (that I'm aware of): the ability to handle multiple currencies. It's not a feature I need, but it certainly didn't turn me off. The (Swing-based?) interface is not the most aesthetically pleasing of the lot, but it is certainly good enough. Overall, I was quite impressed with jGnash, except for the all-important QIF import feature. When I imported my QIF file, some bank transactions were missed. Unlike GnuCash, jGnash lets you retroactively add a missing transaction and mark it reconciled, so I seriously considered using it and just adding the missing transactions ... until I discovered that none of my securities accounts had been imported. A major reason for my using Quicken was to track security purchases (including dividend reinvestments), since Uncle Sugar has funky rules for capital gains and such come tax time. So jGnash bit the dust.
  • Moneydance (commercial, Java-based): Moneydance may be the most Quicken-like of the programs I tested (give or take GnuCash). It has all the features I would want and one or two (such as grouping transactions using ad hoc "tags") that I didn't know I wanted. Most importantly, it imported my QIF file nearly correctly. All but one of my bank accounts had the correct balance. In the one account that was off, there were two or three phantom transactions scattered through its history, possibly the result of incorrectly read splits. They were easy to identify, as they were the only old transactions that were not reconciled, and deleting them fixed that account. All my securities accounts seem to have been imported correctly. Moneydance is a bit on the pricey side -- it actually costs more than the basic versions of Quicken, and you can probably find a "deluxe" version of Quicken at a lower price -- but it gets the job done.
  • Money Manager Ex (open source, cross-platform): Money Manager Ex looked like an attractive option, but the QIF import function seems to be limited to one account at a time (and the account must exist in MMEX before you import the QIF file). That's fine for grabbing data from a financial institution that offers QIF downloads, but I had no desire to do a separate QIF export of each account from Quicken, and I'm not sure what that would have done to transactions that moved money from one account to another. So after discovering that import of the all-in-one QIF file failed, I punted.
  • PocketMoney desktop (commercial, cross-platform): PocketMoney is actually a mobile application (Android and iWhatever), but Catamount Software does sell a desktop version. There is a QIF import function that, from what I read on the Web, works for the mobile versions but seems not to be implemented in the Linux desktop version. I did find a menu entry to import a QIF file. It popped up a file chooser, but the only file type it allowed was "myQIFType", and would not list my QIF file even if I changed the file extension to .myQIFType. So PocketMoney was a non-starter.
  • Skrooge (open source, cross-platform): Skrooge is a KDE application but runs well under Gnome. It has an attractive interface and likely would have been my first choice had it imported my QIF file properly. Sadly, it refused to import the QIF file at all. Unlike Grisbi, it did not hang: it immediately gave me an error message, and the GUI continued to work properly after the message. The message, however, gave me no idea as to what in the QIF file had offended it.
So I ended up purchasing Moneydance. As a postscript, to test whether the Quicken QIF file might contain something funky, I exported my accounts from Moneydance to a QIF file and tried importing that into the other programs. In all cases, the results were identical (Grisbi hung, Skrooge failed gracefully, jGnash skipped the securities, GnuCash and Homebank muffed a bunch of balances).


Saturday, November 17, 2012

NP-dopey

I have to make an up-front disclaimer: I have a hard time stifling yawns when discussions of which problems are in NP, NP-complete, NP-hard or NP-anything come up. In fact, I have a tempered enthusiasm for computational complexity in general. The mathematics is perfectly solid and of theoretical interest to some people, but I find it hard to connect much of it to the practice of mathematical optimization.

A couple of things prompt this post. First, Jean Francois Puget wrote a very nice blog post about what it means for a problem to be in class NP versus class P (and what the distinction is between an optimization problem and a decision problem, as it relates to complexity classification). More importantly, from my perspective, he tied the discussion to customer concerns ("Why does it take so long to solve my problem?" "Can I solve larger instances in reasonable time?"). Part of what I'm going to say here was originally contained in a comment I posted to his blog, but apparently commenting on that blog is NP-difficult; my comment shows up blank. So I'll express myself here, where I control what actually gets approved.

The second trigger was an incident at the recent INFORMS 2012 conference. During one of the coffee breaks, I had a discussion with a few colleagues about publishing optimization papers, during which at least a couple of us agreed that authors have an unfortunate tendency to say "Our problem is NP-something, so we have to use a heuristic [and here's the wonderful heuristic we concocted]." Well, the Traveling Salesman Problem is NP-hard. Heck, I think it was voted the poster boy for NP-hard problems. Nonetheless, give me a network with three nodes, and I will give you an exact solution to it. Catch me after a cup of coffee and before too many beers, and I can even do a five node problem without resorting to a heuristic. Sure enough, not long after the coffee break I was in a session where a presenter said that his problem was NP-hard, and therefore he and his coauthor were forced to use a heuristic. (The presenter was a student, so it's not too late to redeem his soul.)
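
To make the point concrete, here is a minimal Java sketch of solving a tiny traveling salesman instance exactly, by plain enumeration with a little pruning. The five-node distance matrix is made up, and the whole thing works only because there are a mere $(5-1)! = 24$ orderings to examine; the factorial is what kills you as $n$ grows.

    /** Exact TSP for tiny instances: enumerate all tours starting from city 0. */
    public class TinyTsp {
        static double best = Double.POSITIVE_INFINITY;

        static void search(double[][] dist, boolean[] used,
                           int city, int count, double len) {
            if (len >= best) return;                    // prune hopeless partial tours
            int n = dist.length;
            if (count == n) {                           // all cities visited: close the tour
                best = Math.min(best, len + dist[city][0]);
                return;
            }
            for (int next = 1; next < n; next++)
                if (!used[next]) {
                    used[next] = true;
                    search(dist, used, next, count + 1, len + dist[city][next]);
                    used[next] = false;
                }
        }

        public static void main(String[] args) {
            double[][] dist = {                         // hypothetical symmetric distances
                { 0,  2,  9, 10,  7},
                { 2,  0,  6,  4,  3},
                { 9,  6,  0,  8,  5},
                {10,  4,  8,  0,  6},
                { 7,  3,  5,  6,  0}
            };
            boolean[] used = new boolean[dist.length];
            used[0] = true;                             // fix city 0 as the start
            search(dist, used, 0, 1, 0.0);
            System.out.println("optimal tour length = " + best);
        }
    }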

The following are my personal opinions and, based on conversations I've had, not widely endorsed within the academic community:
  1. Small instances of NP-hard problems can be solved exactly. Large instances are intractable. The same is true for polynomial problems (class P). The definition of "small" is "I can solve this easily". The definition of "large" is "this bugger takes too long to solve".
  2. There was excitement in the academic community when it was discovered that linear programs (or at least those satisfying some conditions I never bothered to memorize) can be solved in polynomial time. As George Dantzig pointed out (and if anyone has the reference for this, please post it in a comment!), a polynomial can be a big number. If I tell you that a particular problem class has an algorithm that is polynomial with complexity $O(n^{2593})$, where $n$ is, say, the number of nodes in a network, does that make it sound easy to you? It sure doesn't to me. (For scale: even at $n=10$, that bound is $10^{2593}$ operations, which dwarfs the roughly $10^{80}$ atoms in the observable universe.)
  3. Complexity results are asymptotic. To me, saying that problem A is in class P and problem B is in class NP means that as instances of the two problems grow larger and larger, I'll probably hit the barrier for what I can solve (with a contemporary computing platform, in acceptable time) sooner with problem B than with problem A. The thing is, I don't usually have problems where the instances grow and grow without bound. I have small instances that I use for testing and debugging, and I have the actual instances that need to be solved. If I can solve the instances I need to solve, I really don't care if the algorithm is polynomial time or not.
  4. Implicit in my previous statements were some things that I think get lost sometimes when people are throwing NP-ness around. A problem class belongs to P or NP. An individual instance of that problem does not. Also, computational complexity needs to be viewed at the algorithm level. Strictly speaking, NP is the class of decision problems for which a "yes" answer can be verified in polynomial time; P, which sits inside NP, is the class for which we actually know polynomial-time algorithms; and the NP-complete problems are the ones for which nobody knows one. Even when a polynomial-time algorithm is known, that doesn't mean you're necessarily using it. The simplex algorithm has nonpolynomial (worst case) complexity, but it is applied to linear programs -- which, as mentioned above, are class P -- quite commonly.
  5. Computational complexity of an algorithm is a worst-case, asymptotic bound. I've already taken exception with the asymptotic part, since I don't typically worry about how computation time changes as my problems morph from huge to disgustingly huge. As far as the worst-case aspect, I don't typically expect to be solving the worst case. (Some days it seems like I am, but I'm probably just being grumpy.) I'd be more interested in knowing what complexity is on "average" problems, but even that might be deceptive - there might be something specific to my versions of the problem that would favor an algorithm that doesn't have the best "average" run times.
  6. In complexity measurement, there are constants that typically get overlooked. Suppose we have a class of problems where we can represent the size of an instance by a single dimension $n$. Suppose further that we have two algorithms, one (A) with complexity $O(n^3)$ and the other (B) with complexity $O(n)$. It's pretty clear that we want to use the linear time algorithm (B) rather than the cubic time algorithm (A), correct? Well, what the complexity stuff means is that (worst case) the time/work for A is bounded by $M_A n^3$ while the time/work for B is bounded by $M_B n$, where $M_A$ and $M_B$ are constants (that we typically do not know). Regardless of the values of those constants, \[\lim_{n\rightarrow \infty} \frac{M_B n}{M_A n^3}=0,\]so for large enough $n$, B is the winner. If $M_A \ll M_B$, though, then for small $n$, A is the faster algorithm. This isn't just a hypothetical. Take sorting a list. Tree sort ($O(n\log(n))$) is asymptotically faster than bubble sort ($O(n^2)$), but tree sorts involve a bunch of overhead, so bubble sorts really are faster on short lists. If you are going to sort a large number of short lists, and you implement tree sort (because, hey, it's got better complexity), you'll pay for it. (A rough sketch of this experiment appears right after this list.)
  7. The question of whether P = NP remains undecided, as far as I know. (There have been a few allegations of solutions in the last few years, but I don't think any have survived close scrutiny.) Like pretty much everyone, I believe they're not equal ... but if it turns out that P really does equal NP, won't all the people wringing their hands over this or that being NP-hard feel silly?
  8. For authors of academic papers, it's fine to include a statement (and, if it's not obvious, a proof) that your problem (class) is NP-whatever. That's another tidbit added to the body of human knowledge. Just don't think it's license to launch into building some funky new heuristic without first testing standard or obvious solution approaches (assuming such approaches exist).
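
Since item 6 leans on the sorting example, here is a rough Java sketch of the experiment. It is deliberately unscientific (no JIT warm-up, one run, one machine), and everything in it -- the list length, the number of lists, the class name -- is made up for illustration. Don't be shocked if the library sort wins anyway: good library sorts already switch to a simple quadratic sort below a small size threshold, which is exactly the constants-matter point.

    import java.util.Arrays;
    import java.util.Random;

    /** Crude timing: bubble sort vs. the library sort on many short lists. */
    public class SmallSortBench {
        static void bubbleSort(int[] a) {
            for (int i = a.length - 1; i > 0; i--)
                for (int j = 0; j < i; j++)
                    if (a[j] > a[j + 1]) {
                        int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t;
                    }
        }

        public static void main(String[] args) {
            final int LISTS = 200000, LEN = 8;          // many short lists
            Random rng = new Random(42);
            int[][] data = new int[LISTS][LEN];
            for (int[] row : data)
                for (int j = 0; j < LEN; j++) row[j] = rng.nextInt();

            long t0 = System.nanoTime();                // time the quadratic sort
            for (int[] row : data) bubbleSort(row.clone());
            long bubble = System.nanoTime() - t0;

            t0 = System.nanoTime();                     // time the library sort
            for (int[] row : data) Arrays.sort(row.clone());
            long library = System.nanoTime() - t0;

            System.out.printf("bubble: %d ms, library: %d ms%n",
                    bubble / 1000000, library / 1000000);
        }
    }
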
So, am I out to thoroughly diss computational complexity? No. First, it gives us an indication of whether a problem class is likely to be difficult to solve. I don't know a better way to classify problems into scary-looking versus not scary-looking. I just wish people would not read more into the classification than it warrants.

Second, implementing algorithms usually involves more algorithms. Solving a MIP by branch and bound? How are you going to solve the node problems (meaning what algorithm will you use)? Computing shortest paths through a network as a step in some larger algorithm? Will you use Dijkstra, Floyd-Warshall or something else? How will you sort lists of things? What data structures (which contain algorithms for insertion and retrieval) will you use? When I'm coding an algorithm, I don't have time to test all possible combinations of all possible component algorithms; so I frequently grab the option that I think has best complexity because, frankly, I don't have anything better to go on.
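
As one concrete illustration of those component choices, here is a minimal Java sketch of Floyd-Warshall (the adjacency-matrix convention, with Double.POSITIVE_INFINITY marking missing arcs, is my assumption for the sketch). It is $O(n^3)$ but has essentially no overhead; running Dijkstra from every node with a heap has better asymptotic complexity on sparse graphs, yet on small, dense networks the triple loop below is hard to beat -- the same constants-versus-asymptotics trade-off as in the sorting example.

    /**
     * All-pairs shortest paths by Floyd-Warshall. dist[i][j] is the arc
     * length from i to j, or Double.POSITIVE_INFINITY if there is no arc.
     */
    static double[][] allPairsShortestPaths(double[][] dist) {
        int n = dist.length;
        double[][] d = new double[n][];
        for (int i = 0; i < n; i++) d[i] = dist[i].clone(); // keep the input intact
        for (int k = 0; k < n; k++)          // allow node k as an intermediate stop
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];
        return d;
    }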

So, bottom line, many instances of problems in class NP can be solved, and large enough instances of class P problems cannot be solved. To paraphrase a former boss (discussing hard work), I'm not afraid of NP-hard problems. I can lie down right next to them and sleep peacefully.


Tuesday, November 6, 2012

An OM Blog Worth Reading

Since many of the three readers of my blog come from the operations research community, perhaps I should start with a quick bit of terminology. Experts in the various areas will no doubt disagree. Fortunately, none of them read this blog.

"Operations management" (OM, also known in some quarters as either "production management" or "service management") generally refers to planning and managing the process of producing goods and services. "Logistics" generally refers to the process of transporting and distributing products (and services), either on the inbound side (from suppliers) or the outbound side (to customers). "Sourcing", "purchasing" and "procurement" are synonyms for the process of obtaining materials, components and services to support production and distribution. (Never trust an academic discipline that changes its name every few years.) Integrate these three and you have "supply chain management" (SCM).

I've had a somewhat uneasy coexistence with SCM in its various incarnations and manifestations over my academic career. On the one hand, I've taught a few courses in or related to SCM, and I've coauthored (with actual SCM faculty) a few papers in the area. On the other hand, the SCM program at Michigan State University basically crowded management science out of the curriculum, and one of the professional societies to which I belong has migrated from a focus on the decision sciences to dominance by the SCM crowd (with a commensurate shift in its annual conference). Fundamentally, my reaction when I see anything SCM-oriented is almost indistinguishable from my reaction to campaign ads.

Given that, when I recommend reading an operations management blog, you can insert the tag line "man bites dog". Barry Render and Jay Heizer have written a bunch of operations management textbooks that have been well received by the OM/SCM community. They also have a blog (the subject of this post) that ties business news to OM concepts, frequently with teaching tips to help integrate the news with content from their latest book. The blog features occasional guest posts related to teaching; but even if you have no interest in teaching OM (and, trust me, I have no interest in teaching OM ever again), the blog is worth reading for the insights it provides into issues of business and global competitiveness. The writing is concise and crystal clear.

As with all blogs (movies, novels, ice cream flavors ...), individual tastes will vary. If you have any interest in manufacturing, logistics or business in general, I think their blog is at least worth a look.

Friday, November 2, 2012

Producing UML Diagrams in Netbeans 7

I need to revise some Java code I wrote several years ago, and I thought it might be helpful to have some visual aids as a reminder of how it works. Sadly, when you get older, memory is the second thing to go. (The first thing is ... er ... um ...) I use the Netbeans 7.2 IDE, and I knew that Netbeans had a plugin to produce UML diagrams for Java code, so I figured it was finally time to bite the bullet and learn UML.

Good news first: The book "UML 2.0 in a Nutshell" is available as a free e-book. The authors assume that you know the key concepts of object-oriented programming (abstraction, classes, fields, methods ...) but not much else. I've had good luck with O'Reilly's "nutshell" series before, and this book is proving no exception. (Caveat: I'm only in chapter 2. There's still time for my head to explode.) [Update: I just found out, via the comments, that the copy I downloaded is probably pirated. So I need to reevaluate and either buy the book or switch to a truly free resource. Mea culpa: I should have checked the source more carefully. Usually I can spot pirate sites.] There is also a rather nice online resource by Scott W. Ambler [that really is free].

Now the bad news (you knew this was coming, right?): Apparently the UML plugin for Netbeans was discontinued when Netbeans moved up to version 7. What to do, what to do?

Back to good news: I found this blog post by Matthew Johnson (grad student at UNC-Charlotte), in which he identifies an alternative solution that works in Netbeans 7.x. It uses the yWorks UML Doclet, which is available in both a free "community edition" and a paid "professional edition". (I'm using the free edition, at least until I figure out whether I'll do enough UML diagrams to warrant investigating the paid version.) The yWorks doclet embeds UML diagrams in documentation generated by Javadoc for existing Java code (when you generate the javadocs), but does not work in the other direction (producing code stubs from UML diagrams). That's fine with me; I never put enough forethought into a program to produce any sort of diagram in advance of the onset of coding.

Installing the doclet is trivial: just download it, unpack the zip archive, and park the directory somewhere you can find it. Using it proved a bit problematic at first, but a combination of a tip by commenter Kai Zander on Johnson's blog and a little trial-and-error by me got it working. Here are the key steps (once the software is installed). For any Netbeans Java project:
  1. Per Kai's tip, go to the resources folder of the UML Doclet installation directory, edit ydoc.cfg, find the tag <group name="tiling"> and change the enabled property from true to false. Otherwise, you tend to get UML diagrams with an empty swath down the middle.
  2. Clean and build the project. (The user guide page for UML Doclet says that you need to add paths to any external libraries used by the project to the docletpath string, but I found that if you build the project first -- which copies any external libraries into the dist/lib folder -- UML Doclet will find them without needing to add them to docletpath.)
  3. Right-click the project name, click Properties, then drill down to Build>Documenting. Add the following to the Additional Javadocs Options entry: -docletpath "<ypath>/lib/ydoc.jar" -doclet ydoc.doclets.YStandard -resourcepath "<ypath>/resources" -umlautogen, where <ypath> is the absolute path to the UML Doclet installation folder. The last switch (-umlautogen) tells UML Doclet to generate overview, package and class diagrams for all documented files. There are alternative switches to generate some but not all three types of diagrams, and there are switches that will restrict diagrams to documented files containing an @y.uml tag. If you omit all the -uml... switches, or use -umlgen but don't have @y.uml tags in any of your files, you won't get any UML diagrams, because the doclet will think it has no work to do. (A command-line equivalent of these options is sketched after this list.)
  4. Now right-click the project, click Generate Javadoc, and discover that your javadocs have embedded diagrams.
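
If you prefer to see what Netbeans is doing on your behalf (or want to run the doclet outside the IDE), the standalone javadoc invocation looks roughly like this. This is a sketch: the installation path /opt/ydoc and the package name com.example.myutil are placeholders, so substitute your own.

    javadoc -d dist/javadoc \
        -docletpath "/opt/ydoc/lib/ydoc.jar" \
        -doclet ydoc.doclets.YStandard \
        -resourcepath "/opt/ydoc/resources" \
        -umlautogen \
        -sourcepath src -subpackages com.example.myutil
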
Here are samples of package and class diagrams (from my command line utility):

[Image: UML package diagram]

[Image: UML class diagram]