Ever have one of those days where you are happily using your Linux laptop, shut it down for transport, then fire it up and it just doesn’t boot? I’ll admit that I do all sorts of screwy things on my laptop, so this is probably my own fault, but I figured I’d write it up in the hope that it helps someone else, and to give a couple of shout-outs to helpful resources.
I am running a Lenovo ThinkPad T430s with Red Hat Enterprise Linux 6.4 Server. I have a 256GB SSD, which has a 512MB boot partition, and the rest is a LUKS-encrypted PV with several LVs for my filesystems.
I booted the laptop and received a black screen with only two words in white on it:
Error 17
Yep, that’s it, nothing more. Guessing that “Error 17” was a GRUB error (confirmed by Google), I grabbed my trusty RHEL 6.4 USB installer (created with livecd-tools) and booted into rescue mode. A quick fdisk -l showed me a perfectly valid partition layout with a single ~5.5GB partition ... wait, that doesn’t seem right.
So, let me go check my partition table backup that I make periodically. Heh, yeah right.
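For next time, a backup really is a one-liner. Here is a minimal sketch, assuming an MBR (DOS) partition table, where the primary partition table lives in the first 512-byte sector of the disk. The function name backup_mbr and the file names are placeholders of mine, and logical partitions inside an extended partition are not covered by this copy.

```shell
# backup_mbr SOURCE DEST
# Copy the first 512-byte sector (boot code plus primary partition table)
# of SOURCE into DEST. SOURCE may be a disk such as /dev/sda (needs root)
# or a plain disk image file.
backup_mbr() {
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# Hypothetical usage, run as root:
#   backup_mbr /dev/sda /root/sda-mbr.bin
# Restoring writes the sector back (this also overwrites the boot code):
#   dd if=/root/sda-mbr.bin of=/dev/sda bs=512 count=1
```

Keep the copy somewhere other than the disk it describes. As an alternative, sfdisk -d /dev/sda writes a text dump of the table that sfdisk can read back in.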
I vaguely recalled that my drive layout was as I described above in the second paragraph, but how the heck was I supposed to confirm that my data was still in place (please let this be just a corrupt partition table!!!), and then figure out the partition boundaries and reconstruct the table?
Googling brought me to a posting on Dedoimedo, which pointed me to the awesome TestDisk utility by CGSecurity. I grabbed the Linux x86_64 binary package provided by the site, and followed the CGSecurity TestDisk instructions to analyze my disk. Sure enough, it found the first 512MB partition, and then a second 2MB partition. I read a little about how TestDisk works: it looks for signatures of known filesystems (ext2, ext3, ext4, FAT, btrfs, etc.), so I understood that it had found my first partition, which was ext4, but didn’t know what to do with my LUKS partition. I was pretty sure that the first one was correct, though, so I went ahead and let TestDisk write a new partition table.
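The signature idea is simple enough to try by hand. A LUKS1 header begins with the six bytes "LUKS" 0xBA 0xBE at the very start of the encrypted partition, so a candidate start sector can be checked for that magic. This is only a sketch of signature matching, not how TestDisk is implemented; the function name has_luks_magic and the 512-byte sector size are assumptions of mine.

```shell
# has_luks_magic FILE SECTOR
# Succeeds (exit status 0) if the 512-byte sector number SECTOR of FILE
# begins with the LUKS1 magic bytes: 'L' 'U' 'K' 'S' 0xBA 0xBE.
# FILE may be a disk such as /dev/sda (needs root) or a disk image.
has_luks_magic() {
    magic=$(dd if="$1" bs=512 skip="$2" count=1 2>/dev/null \
        | head -c 6 | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4c554b53babe" ]
}

# Hypothetical usage (the sector number is made up for illustration):
#   has_luks_magic /dev/sda 1050624 && echo "LUKS header found"
```

A hit at a given sector is a strong hint that the encrypted partition starts there, which is useful when the partition table itself can no longer be trusted.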
Then I fired up fdisk -cu, deleted the second partition, and recreated it starting at the sector following the first partition and extending to the last sector on the disk. I was pretty certain that I had used the rest of the disk ... but still added a healthy dose of finger-crossing.
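With fdisk -cu the units are sectors, so "the sector following the first partition" is plain arithmetic. Here is a small sketch of that calculation. The numbers in the usage comments (a first partition starting at sector 2048, 512-byte sectors) are illustrative assumptions, not values read from my disk, and the function names are my own.

```shell
# Assumed sector size in bytes; check what fdisk -l reports for the disk.
SECTOR_BYTES=512

# mib_to_sectors MIB: number of sectors occupied by MIB mebibytes.
mib_to_sectors() {
    echo $(( $1 * 1024 * 1024 / $SECTOR_BYTES ))
}

# last_sector START SIZE_MIB: last sector of a partition that begins at
# sector START and is SIZE_MIB mebibytes long.
last_sector() {
    echo $(( $1 + $(mib_to_sectors "$2") - 1 ))
}

# next_start START SIZE_MIB: first sector after that partition, which is
# where the following partition would begin.
next_start() {
    echo $(( $(last_sector "$1" "$2") + 1 ))
}

# Hypothetical usage for a 512 MiB partition starting at sector 2048:
#   last_sector 2048 512    # prints 1050623
#   next_start 2048 512     # prints 1050624
```

In practice fdisk offers the first free sector as the default start and the last sector of the disk as the default end, so the arithmetic mainly serves as a sanity check on what it suggests.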
I wrote the disk partition out, rebooted into rescue mode again, found my way to a shell, and attempted to mount the first partition – success!
Next, I proceeded stepwise to access the second partition:
#> cryptsetup isLuks /dev/sda2         # First, test to make sure this is recognized as a LUKS device -- it was!
#> cryptsetup luksOpen /dev/sda2 foo   # Open the LUKS encrypted device as new device "foo"
#> pvs                                 # Scan for LVM PVs, which foo should be -- and it was found!
#> vgchange -ay                        # Activate all LVM VGs on discovered PVs -- and my VG was found!
#> lvs                                 # Scan for LVM LVs -- and mine were found
At this point, I mounted my home LV, and was able to take a backup. Feeling brave, I gave the laptop a reboot, and everything is running as before.
So, thank you to Dedoimedo for pointing me towards TestDisk, and thank you to CGSecurity for the great tool!