Wednesday, December 9, 2015

Bootcamp + Windows 10...No Boot Device

Backstory:
My son has a not-entirely-new MacBook and, being relatively young, he likes video games. He wanted to play games through Steam and hated the limited selection of Steam games on OS X.

To solve this he used Bootcamp to install a retail copy of Windows 8.x. Everything seemed fine for some time. We've upgraded OS X, and Windows has been upgraded to Windows 10 Home. All still seemed fine.

At some point he wanted to reboot to Windows and test some screen recording software. "Dad, what does it mean when the screen is black and says there's no boot device?"

Dammit.

Symptoms:
OS X booted fine. Bootcamp still appeared as a volume that was readable under OS X. I could select it as a bootable volume when OS X was running. Upon startup with Bootcamp as the startup disk, the Mac booted to a black console with a Windows error saying there wasn't a usable boot disk and asking you to insert a bootable CD or disk and restart.

When I booted the Mac holding the Option key, the only bootable volumes that appeared were OS X and the recovery volume.

Therefore, the data and the partition were intact, but something related to the Master Boot Record (or rather the hybrid MBR used on the Mac so that Windows can cope with the Mac's disk partition scheme and its lack of a BIOS) was damaged.

I tried booting the commercial Windows 8.x disc and running a repair; Windows would say it repaired things, but upon restart it still threw the same error.

I was going to reformat the partition and reinstall but then I realized we had upgraded to Windows 10 and the whole "install 8.x and run an upgrade to Windows 10" would be beyond painful with our crap Internet connection.

I really wanted to double down on my effort to save the partition.

I had run Disk Utility to check the drive. The drive came up fine, the OS X partition was healthy, and Disk Utility wouldn't even try to play with an NTFS partition hosting Windows. So the drive seemed fine. It was most likely a data-level problem rather than a hardware problem.

(Note - yes, I know, backups. There weren't any. That is something I tell the family to do, and if they lose the data because of <reasons> then it's not my direct fault...they can ask about it and I can help set it up, but they have to be responsible for actually connecting a drive and running backups periodically, which with laptops means an actual effort on the user's part to run backups. Hybrid installations of OS X and Windows make it three times harder. In the end, there are no backups, it's their responsibility to do it even though I'd help if they ask, and I'm not spending significant portions of time chasing them down to do it...if they lose data after a problem, it's on them. Sorry.)

The fix:
He had no idea how this happened or if something had happened the last time he used Windows, but people who deal with troubleshooting other people's systems know that this isn't uncommon.

To make it worse, I had decided that since I'd be reformatting the partition and starting over, it was a good time to finally do the OS X upgrade from 10.10.5 to 10.11. The realization that he had Windows 10 installed (by the way, I checked the version from OS X by opening the Bootcamp partition, then drilling to C:\Windows\System32\) happened as I wrapped up the 10.11.1 install.

Why is that a big deal? Because changing things that affect the system is now a no-no. OS X 10.11 uses SIP (System Integrity Protection) to protect users from themselves; that can also block programs that manipulate the boot sector. I followed the steps I outline here once from a regular boot, and gdisk said the write failed. So to save rehashing that attempt, I'll start with what I did to run it again and have it work.

DISCLAIMER - This is playing with your hard drive. Like, you could accidentally kill your system. What I'm saying is that if you don't mind reformatting your drive and reinstalling everything and potentially losing data, go ahead and do what I'm outlining here. But I strongly advise backups first. Not my fault if this bricks your system.

  1. Download and install some support software
    1. Download gptfdisk (my repair used 1.0.1)
    2. Run the pkg installer
    3. If you get the unidentified developer error, click OK, then open the Security & Privacy system preference and tell it to run the installer from there
  2. Disable SIP
    1. Boot to recovery mode (on startup, hold Command-R)
    2. Open a terminal (using the Utilities menu)
    3. Run the command csrutil disable in the terminal. Note: you can get the status of SIP using the command csrutil status
    4. Reboot, but boot back into recovery mode
  3. Let's Run a Repair
    1. In recovery mode, open two terminals
    2. In one terminal, run diskutil list
    3. There are instructions on the link above saying to first run gpt -r -vv show disk0 and fdisk /dev/disk0 for information about the drive...
      1. These give you more information, but the part I needed for later instructions was from diskutil list
      2. The instructions in the link also use "sudo", but recovery mode's Terminal runs as root (# in the prompt), so sudo isn't required. Recovery mode also lists more mounted volumes in diskutil list than regular running mode does, but all I needed was disk0 with the Bootcamp partition in the numbered list.
    4. diskutil list will show the #, Type, Name, Size and Identifier of your partitions on each disk. I needed the partition (slice) marked Microsoft Basic Data BOOTCAMP, which was #4 in the list. Remember that far left number for the line marked BOOTCAMP.
    5. Because I'm in recovery mode, I have to run the gdisk executable from where it's installed. I did this in the second terminal window so I could keep the first window's diskutil output for reference:
      1. mount
      2. look for the hard drive volume location (/dev/disk0). cd /Volumes/<hard drive>
      3. cd usr/local/bin
      4. ./gdisk /dev/disk0
    6. r <enter> for the recovery and transformation menu
    7. h <enter> to create a new hybrid MBR
    8. 4 <enter> to add partition 4 to the list (ENTER THE NUMBER FOR YOUR BOOTCAMP PARTITION...from diskutil list...HERE. For me it was 4.)
    9. y <enter> to place the EFI GPT (0xEE) partition first in the MBR...good for GRUB...yolo...
    10. <enter> to accept default MBR hex code (07)
    11. y <enter> to set the bootable flag
    12. n <enter> because I have no more partitions to protect
    13. w <enter> to write partition table to the disk
    14. y <enter> to proceed
At this point I rebooted. I selected the Bootcamp volume as the startup disk within OS X, and rebooted again.
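
If you'd rather not eyeball diskutil list for the partition number, here's a minimal Python sketch of the lookup from step 3.4. Big assumptions: the SAMPLE text is made up to resemble the diskutil list layout (it is not output from my son's Mac), and find_partition_number is a hypothetical helper name, not part of any tool. It only reads text; it doesn't touch the disk.

```python
import re

# Made-up sample resembling `diskutil list` output; sizes and names are illustrative only.
SAMPLE = """\
/dev/disk0
   #:                       TYPE NAME                    SIZE       IDENTIFIER
   0:      GUID_partition_scheme                        *500.3 GB   disk0
   1:                        EFI EFI                     209.7 MB   disk0s1
   2:                  Apple_HFS Macintosh HD            399.0 GB   disk0s2
   3:                 Apple_Boot Recovery HD             650.0 MB   disk0s3
   4:       Microsoft Basic Data BOOTCAMP                100.0 GB   disk0s4
"""


def find_partition_number(listing, name="BOOTCAMP"):
    """Return (number, identifier) for the first partition row mentioning `name`."""
    # A partition row looks like "<number>: <type/name/size text> <diskNsM>".
    row = re.compile(r"^\s*(\d+):\s+(.*?)\s+(disk\d+s\d+)\s*$")
    for line in listing.splitlines():
        match = row.match(line)
        if match and name in match.group(2):
            return int(match.group(1)), match.group(3)
    return None  # no row with that name


print(find_partition_number(SAMPLE))  # prints (4, 'disk0s4')
```

The first number is what gets typed at gdisk's partition prompt in step 3.8. Still compare it against the real diskutil list output by eye before writing anything.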

Windows booted and complained about an issue with startup (but I got farther along in the boot process this time!); at that point, I ran a startup repair with Windows 10's built-in auto-repair process, after which it automatically restarted. Windows 10 then booted fine!

Once everything was booting fine, I booted back into recovery mode, opened a terminal, and ran csrutil enable along with a quick check using csrutil status to verify that SIP is running again. DON'T RUN WITHOUT SIP UNLESS YOU REALLY REALLY KNOW WHAT YOU'RE DOING. See the link above in my instructions for SIP to see the section on verifying that SIP is enabled.
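
For anyone who wants to script that last check, here's a rough Python sketch. It assumes csrutil status prints a line along the lines of "System Integrity Protection status: enabled." (that wording is my assumption and may differ between OS X releases), and the function names are hypothetical helpers I made up for illustration.

```python
import subprocess


def sip_enabled(status_text):
    """Return True if the given `csrutil status` text reports SIP as enabled."""
    # Assumed wording: "System Integrity Protection status: enabled."
    return "status: enabled" in status_text.lower()


def read_sip_status():
    """Run `csrutil status` (OS X 10.11 or later only) and return what it printed."""
    result = subprocess.run(["csrutil", "status"], capture_output=True, text=True)
    return result.stdout
```

On the Mac itself, sip_enabled(read_sip_status()) should come back True once csrutil enable has been run from recovery mode and the machine has rebooted. If it doesn't, read the raw output rather than trusting the string match.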

Conclusion:
Something...a utility, an update, a chkdsk...munged the boot sector. Windows isn't expecting the hybrid nature of Bootcamp's boot sector, which is there to accommodate the EFI firmware instead of a BIOS, so it killed the Windows boot sector while leaving the partition table information and data intact.
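
To make "hybrid boot sector" a little less hand-wavy, here's a small Python sketch of what lives in an MBR: four 16-byte partition entries starting at byte 446 of the first 512-byte sector, followed by the 0x55 0xAA signature. The demo sector is synthetic, with made-up start and size values; it just mimics the kind of table the gdisk steps ask for (a 0xEE protective entry first, then a bootable type 07 entry). parse_mbr and make_entry are hypothetical names of mine.

```python
import struct

ENTRY = "<B3xB3xII"  # status, (CHS skipped), type, (CHS skipped), first LBA, sector count


def parse_mbr(sector):
    """Decode the non-empty primary partition entries of a 512-byte MBR sector."""
    if len(sector) != 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    entries = []
    for i in range(4):
        raw = sector[446 + 16 * i: 446 + 16 * (i + 1)]
        status, ptype, first_lba, count = struct.unpack(ENTRY, raw)
        if ptype != 0:  # type 0x00 marks an unused slot
            entries.append({
                "slot": i + 1,
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sectors": count,
            })
    return entries


def make_entry(status, ptype, first_lba, count):
    """Build one 16-byte partition entry (CHS fields left as zeros)."""
    return struct.pack(ENTRY, status, ptype, first_lba, count)


# Synthetic sector with made-up numbers, not read from any real disk.
sector = bytearray(512)
sector[446:462] = make_entry(0x00, 0xEE, 1, 409639)        # protective GPT entry
sector[462:478] = make_entry(0x80, 0x07, 409640, 2048000)  # bootable NTFS entry
sector[510:512] = b"\x55\xaa"

demo = parse_mbr(bytes(sector))
for entry in demo:
    print(entry["slot"], hex(entry["type"]), entry["bootable"])
```

If the type 07 entry or its bootable flag goes missing from that table, a BIOS-style Windows boot has nothing to start from even though the GPT and the NTFS data are untouched, which matches what I saw. Reading the real first sector of /dev/disk0 needs root, so I'd stick to gdisk for looking at an actual disk.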

I linked to the articles I used for guidance above in the instructions. I included a casual warning about not being responsible for destroying your data if you follow these instructions, and the linked set of instructions also warns of possible dire consequences from following the (similar) instructions. Make a backup. You could destroy your data, both Windows and OS X, when screwing with the boot sector on your hard disk.
