Thursday, August 28, 2014

Time Machine Corruption...Such Fun

I make backups with an external terabyte hard drive. It is configured for backups through Time Machine, so it automatically makes incremental backups for me every hour.

For the most part, it's an automated function that I don't have to think about. But having worked in technology for a few years, I know better than to fully trust it, so I check my backup status periodically. Of course, while reading through the logs, I noticed that the backups had failed last Tuesday.

Well, that, and there was a popup saying there had been an error in my Time Machine volume.

It started off with a simple error about being unable to write a file. At least the drive wasn't clicking, and it was visible to the system...it just didn't work properly.

I opened Disk Utility and told it to repair the disk. After it ran for half an hour, a series of errors about orphaned links and incorrect link counts scrolled by, and a final error message popped up telling me that Disk Utility could not fix the drive. It actually told me that I needed to make a backup and reformat the drive.

The odd thing was that Disk Utility then refused to format the drive. The partition couldn't be removed, and any attempt to erase the drive gave me an error.

I tried several variations of the diskutil, fdisk, and gpt commands to wipe part of the drive. Nope, didn't work. (fdisk did change part of the primary partition on the drive, though. Apparently, if you encrypt the Time Machine disk, not only is the disk very hard to repartition, but either the encryption process or the original formatting of the external drive created a "hidden" partition that I couldn't format or remove.)

I stopped short of just running cat /dev/random > /dev/(diskdevice), which might have worked, or...maybe not. Since I never actually tried it, I can't say which.

I was trying to just get the disk to a state where I could get Disk Utility to reformat it. But the drive wasn't having it.

I even fired up my Windows VM in VirtualBox and used a USB passthrough filter to try to convince Windows to format the drive. Each attempt just gave me another USB driver error in the VM. Eventually I decided that it was getting late. I was getting tired. And I was getting irritated that everything was failing on me.

The next day I connected the drive to a physical Windows system and opened Disk Management. It showed the drive with two partitions. It would let me format the large partition. But the small partition, marked as an EFI partition?

Nope.

Fortunately, there is a way to obliterate the disk's partition layout outright.

Basically:

  1. Open a command prompt (diskpart needs administrator rights).
  2. Type "diskpart".
  3. Type "list disk" and figure out which disk is the one you're trying to nuke.
  4. Type "select disk #", where # is the number of the disk you're trying to nuke.
  5. Type "list disk" again; you should see an asterisk next to the disk you're trying to nuke. Make sure it's the right one, because the next step wipes whichever disk is selected.
  6. Type "clean", hit Enter, and hold your breath. It should take only a few moments. If you pass out, something is probably wrong, either with you or the disk nuke process.
  7. Type "exit".
At that point, the drive can be repartitioned and formatted and, in my case, turned into an encrypted Time Machine drive again.
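For the curious: as I understand Microsoft's diskpart documentation, "clean" (without the "all" option) doesn't touch every sector; it zeroes the first and last megabyte of the disk. That matters on a GPT disk like this one, because the partition table lives at the start of the disk and a backup copy sits at the very end, so wiping only one end can leave the other for tools to find. Below is a rough Python sketch of that idea, not a replacement for diskpart. The helper name clean_like_diskpart is made up for illustration, it needs root/administrator rights on a real device, and it assumes that seeking to the end of the target reports its size (true for regular files and, as far as I know, Linux block devices; elsewhere, pass size yourself). Point it at the wrong path and it will happily trash that disk.

```python
import os

MIB = 1024 * 1024


def _write_all(fd, data):
    # os.write may write fewer bytes than asked; loop until everything is out.
    view = memoryview(data)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def clean_like_diskpart(path, size=None, span=MIB):
    """Hypothetical helper: zero the first and last `span` bytes of `path`.

    Roughly mimics diskpart's `clean` (not `clean all`). `size` defaults to
    whatever seeking to the end reports; pass it explicitly when that is 0
    or wrong on your platform.
    """
    # O_BINARY only exists on Windows; it prevents text-mode translation there.
    fd = os.open(path, os.O_WRONLY | getattr(os, "O_BINARY", 0))
    try:
        if size is None:
            size = os.lseek(fd, 0, os.SEEK_END)
        if size <= 0:
            raise ValueError("could not determine size; pass size= explicitly")
        span = min(span, size)
        zeros = bytes(span)
        # Start of disk: protective MBR, primary GPT header and partition entries.
        os.lseek(fd, 0, os.SEEK_SET)
        _write_all(fd, zeros)
        # End of disk: backup GPT partition entries and header.
        os.lseek(fd, size - span, os.SEEK_SET)
        _write_all(fd, zeros)
        os.fsync(fd)
    finally:
        os.close(fd)
```

On Linux this would be something like clean_like_diskpart("/dev/sdX") run as root, with X replaced by the right letter after checking it as carefully as in step 5.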

The takeaways from this:
  1. Time Machine drives, when encrypted, are more susceptible to unrecoverable problems when the filesystem gets corrupted.
  2. EFI partitions and/or the hoops that CoreStorage goes through to encrypt the drive will make it harder to reformat/repartition should you need to do so. It doesn't make it impossible, but it can make a "newbie" user think the drive is possibly completely broken when in fact it's just being a pain in the ass.
  3. Windows does have some handy utilities hidden at the command line, as much as <favorite OS>-snobs will bitch about Windows for the sake of hating it.
  4. Virtualization is handy as hell. However, when it comes to directly hitting hardware, you're probably going to be at the mercy of the intermediate drivers. If the hardware is oddball or having problems, virtualization probably won't help. Stick to virtualization for software purposes.
  5. I decided to call it quits and go to bed before trying to nuke the drive with a simple "let's cat garbage to the raw device," which might or might not have worked...but it's handy to have another computer running Linux or Windows around for low-level drive nuking, too.
  6. Using cat to redirect bleh directly to the /dev/diskx device (or the /dev/rdiskx device, since the raw interface should be faster) might have worked, but on PC hardware something like DBAN (Darik's Boot and Nuke) or the Ultimate Boot CD's utilities could probably have wiped the disk as well, as long as you don't accidentally wipe other drives you meant to keep in working order. Disconnect those if you can. Otherwise you'll feel stupid.
  7. Encryption is a good thing. It means the contents of a stolen or lost drive can't be easily read. It also means that if something gets corrupted...perhaps through an unplanned mid-read disconnect (didn't mean to lean on that cable...), the odds of catastrophic filesystem damage skyrocket.
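The "cat garbage to the raw device" idea from takeaways 5 and 6 boils down to a single overwrite pass over the whole device. Here's a minimal Python sketch of that, with the same caveats as before: overwrite_with_random is a hypothetical name, it needs root, and it assumes the size can be found by seeking to the end unless you pass it in (on macOS I'd take the byte count reported by diskutil info). It writes in 4 MiB chunks, which keeps every write sector-aligned for a raw rdisk device as long as the disk size is a multiple of the sector size.

```python
import os

CHUNK = 4 * 1024 * 1024  # 4 MiB: a multiple of both 512- and 4096-byte sectors


def overwrite_with_random(path, size=None, chunk=CHUNK):
    """Hypothetical helper: one pass of random bytes over all of `path`.

    Roughly what `cat /dev/random > /dev/rdiskN` would do, except it stops
    at `size` bytes instead of running until the device is full.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | getattr(os, "O_BINARY", 0))
    written = 0
    try:
        if size is None:
            size = os.lseek(fd, 0, os.SEEK_END)
        if size <= 0:
            raise ValueError("could not determine size; pass size= explicitly")
        os.lseek(fd, 0, os.SEEK_SET)
        while written < size:
            view = memoryview(os.urandom(min(chunk, size - written)))
            # os.write may return a short count, so keep going until the chunk is out.
            while view:
                n = os.write(fd, view)
                view = view[n:]
                written += n
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Swapping os.urandom for zero-filled buffers would be faster and is plenty for the goal here, which was just getting the disk back to a state where it could be reformatted.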
So that was part of my fun day, and it marks the end of my blog hiatus!