OK, so the server went down in flames and smoke just as I had left for a presentation in Moscow, Russia. When I came back home and started troubleshooting, I realized things were worse than I thought.
It looked like yet another hard drive problem – I was seeing superblock I/O errors all over the logs, and the server had remounted the file system read-only. After trying a new SSD, and then another (which I had ordered to my home while in Russia), both of which showed the same symptoms, I realized that the fault was not in the drives: the SATA circuits on the motherboard were fried and giving me disturbing intermittent failures.
So, rummaging through the spares, I found a PCI SATA adapter (an Adaptec 1210 card) which would bypass the motherboard’s SATA circuits. No luck. The server would install fine, but the BIOS would not boot from the drive attached to that card.
Two decent options remained. The first was to get a PCIe SATA adapter, different from the PCI one, and hope that the BIOS would be able to transfer control to a drive attached to it. The second was to get an IDE-to-SATA adapter and feed the SSD off the ATA interface on the motherboard, which hopefully had a different circuit path.
Any third or fourth option was increasingly arcane (such as running the system SSD over USB2) and aimed just at getting something running, rather than getting it running in a close to decent way.
Anyway, the shops opened today at 10:00 Stockholm time, and the PCIe SATA adapter that I got worked after some trial, error, and reconfiguration. The server is now up and running, and I hope this was the root cause of the errors that have been affecting the site off and on since February.