Friday, November 4, 2016

Surviving an nVidia Driver Update

Scenario: I'm running Linux Mint 17.3 Rebecca (based on Ubuntu 14.04) on a PC with a GeForce 6150SE nForce 430 graphics card. My desktop environment is Cinnamon. The graphics card is a bit long in the tooth, but it's been running fine with the supported nVidia proprietary driver for quite some time. Unfortunately, having no reason to do so, I had not noted down what version of the driver I had.
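Noting the version while things still work takes only a moment. Here is a minimal sketch, assuming the proprietary kernel module is loaded (so that /proc/driver/nvidia/version exists) and that the first line of that file carries the version number after the words "Kernel Module". The function name and the log file name are made up for illustration.

```shell
# Hypothetical helper: read text on stdin and print the first dotted
# number found (e.g. 304.131). The first line of
# /proc/driver/nvidia/version normally holds the driver version; later
# lines (such as the GCC version) are ignored because only the first
# match is kept.
extract_driver_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Usage, while the driver still works (log file name is an assumption):
#   extract_driver_version < /proc/driver/nvidia/version >> ~/nvidia-driver-version.log
```

If the driver came from a package, dpkg -l 'nvidia*' lists the installed package versions as well.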

Yesterday I installed a couple of "recommended" updates, one of which bumped the nVidia driver to version 304.132. Apparently this driver is "recommended" for people who don't want to see their desktops. On the next boot after the upgrade, I got a black screen. To be clear, there's no problem before the desktop loads -- I can see the BIOS messages at the start of the boot and the grub menu where I get to choose which version of the operating system to boot. It's only when we get to the desktop that the display fails.

A bit of searching showed me that I'm far from the only person experiencing this. What's lacking (at least as of this writing) is a definitive fix. I'll skip the gory details of a day and a half of fruitless hacking and cut to the chase scene.

Getting to a terminal


The first step was to escape the black hole of my desktop. That was easier said than done. You can't right click on an invisible desktop (or at least if you do it's unproductive). Ditto trying to get to the application menu. Fortunately, control+alt+F2 did manage to switch me away from the (worthless) X session and onto a text console, where I could log in. (The display worked fine in terminal mode.)

Getting to a desktop


It's a bit like cutting off your leg to get rid of a broken toe, but one way to get out of nVidia Hell is to get nVidia off your system. So in the terminal I ran

sudo apt-get purge 'nvidia*'

(which deleted all nVidia packages) followed by

sudo reboot now

(which did exactly what you would think). With nVidia gone, the system was forced to use the open source "nouveau" driver. Unfortunately, the nouveau driver seemed to be hopelessly confused about what sort of display I had (it called it a "laptop display") and what resolution to use. The result was a largely unusable (but at least mostly visible) desktop.
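To confirm which graphics driver the kernel actually loaded after a reboot, the output of lsmod can be filtered. This is a rough sketch. The function name is hypothetical, and matching on the prefixes "nvidia" and "nouveau" is an assumption about how the modules are named on a given system.

```shell
# Hypothetical helper: read `lsmod` output on stdin and print the name of
# every loaded module whose name starts with "nvidia" or "nouveau".
loaded_gpu_driver() {
    awk '$1 ~ /^(nvidia|nouveau)/ { print $1 }'
}

# Usage:
#   lsmod | loaded_gpu_driver
```

An empty result would suggest that neither driver is loaded.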

Rolling way, way back


My hope was to roll back to the previous nVidia driver, but that hope was quickly dashed. I was able to run the device manager. (You have two ways to do this, depending on how good or bad the nouveau display is. One is to use the Mint menu to run "Device Manager", if you can. The other is to open a terminal and run "gksudo device-manager".) The device manager listed three possible drivers for me. The first was the multiple-expletives-deleted 304.132 nVidia driver, the second was the nouveau driver, and the third was the version 173.14.39 nVidia driver. So I picked the third, applied the changes and restarted.

This got me a fully functional desktop (at the correct resolution), but performance was less than stellar, as one might expect from that old a driver. There were noticeable lags between some operations (such as clicking the close control on a window) and their results (window actually closing). More importantly, if I suspended the machine, when I tried to resume I could not get the desktop back. So version 173 was not the permanent solution.

Rolling back just a little


I've mentioned the sgfxi script before. I tried running it, but it wanted to install the latest supported version, which was that nasty 304.132 thing. After screwing around for way too long, I discovered I could in fact roll back with the script.

The first step was to kill the X server, since sgfxi won't run while X is running. So I used control+alt+F2 to get to a terminal, logged in, and ran

sudo service mdm stop

to get rid of the display manager. That forced me to log in again, after which I ran

sudo su -
sgfxi --advanced-options | less

for two reasons. One was to find the correct option for specifying a particular driver version (it's -o). The other was to get a list of available versions, which appears near the top of the output.

I tried a few of the more recent ones listed, but was told either that they weren't supported (despite appearing in the list) or that some other package was the wrong version to mesh with them. Fortunately, 304.131 could be installed. I assume that was released immediately before the ill-fated 304.132. So once more unto the breach: running (as root)

sgfxi -o 304.131

worked. I was prompted to make a few decisions (one or two of which I simply guessed about), and I got one error message during a cleanup phase, but the script did install the driver and terminated. I rebooted and the system seems to be working normally. It doesn't feel sluggish any more, and returning from a nap is no problem.

The earlier purge had removed the nvidia-settings package, so I used the Synaptic package manager to reinstall version 304 of that. It provides a graphical interface to adjust display settings, although so far the defaults seem to work just fine for me.

Now I just need to be sure never, ever, ever again to update that driver.
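For any nVidia pieces that were installed as packages (rather than by sgfxi, which works outside the package manager), one possible safeguard is to put them on hold with apt-mark. The sketch below is untested on this particular setup. The function name is hypothetical, and the version prefix 304 is just the one relevant to my card.

```shell
# Hypothetical helper: read "package version" pairs on stdin and print the
# names of packages that start with "nvidia" and whose version string
# begins with the prefix given as the first argument.
nvidia_packages_at() {
    awk -v prefix="$1" '$1 ~ /^nvidia/ && index($2, prefix) == 1 { print $1 }'
}

# Usage (holds the matching packages at their current version):
#   dpkg-query -W -f='${Package} ${Version}\n' 'nvidia*' \
#       | nvidia_packages_at 304 | xargs -r sudo apt-mark hold
```

Running apt-mark showhold afterward lists what is held, and apt-mark unhold reverses the decision.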
