Rescuing hardware RAID (again)
I was asked to perform hardware RAID heroics again tonight. In fact, I’ve been on site about eight hours now (it’s 0630) and I’m still working at it, but I’ve basically got the problem licked.
I put a broken SCSI RAID 5 (six disks, one failed) back together. The general process is:
- Get twice as much storage as the size of the array you want to restore.
- Boot your friendly Knoppix disc, or other Linux distribution as you see fit. Make sure you can see the array you’re restoring (in JBOD, of course) and the additional storage you’re going to use for recovery.
- `dd` each disk from the broken RAID array to a file on the additional storage. Now you’ve got a backup to work from.
- Disconnect the original devices. You can work with the “images” you made of them. You’d hate to screw up the original devices.
- Pick apart the disks with something like BIEW. I couldn’t get mine to work with large files, so I had to, uh, “bind” the data files to loop devices. You’re looking for some data that spans a block boundary so you can tell what parity algorithm the array is using. The choices are left symmetric, left asymmetric, right symmetric, and right asymmetric. You’ll also need to discover the block size; both arrays I’ve done this with used a 64 KiB block size. Hopefully you can make a good guess at the order the controller put the drives in; for my SCSI RAID this was in order of their IDs, just like I guessed.
- Now you’ve likely got a problem. Your logical RAID volume has a partition table and (duh) partitions. We’re using Linux software RAID (md) to stitch your disks back together, and md only semi-recently (kernel 2.6) gained the ability to have partitions on a software RAID device. So you’re going to need either a version of mdadm that lets you use `--build` with `--level=raid5` (which, to my knowledge, doesn’t exist), or else you’ll apparently need to patch raidtools. I don’t have a patch for you; just go into the source and comment out any section that bitches `not an MD device!` if `major(stat_buf.st_rdev) != MD_MAJOR`. Miraculously, once you stop `mkraid` from looking at the device major, it works just fine with partitioned MD devices.
- `cat /proc/devices` and look for the major number assigned to `mdp`. If you don’t find it, I gather your kernel lacks partitions-on-md-devices support. You might need to load some module; this was already done for me on Knoppix, and the number assigned was 254.
- `mknod /dev/md0 b 254 0`, replacing `254` with whatever number you got in the previous step.
- Make the individual partition devices by counting up minors from there. I.e., `md0p1` is `mknod /dev/md0p1 b 254 1`. I think you can go up to 63 partitions.
- Now you can make an /etc/raidtab. Mine was pretty standard, and a lot like the one in the original awesome post of the guy that gave me this idea in the first place, except this particular night I had a `failed-disk`. (Don’t worry about what device to supply for the `failed-disk`; I gave it `/dev/missing`. `mkraid` bitches, but it still starts the array.)
- Now you can try something like `mkraid --force --dangerous-no-resync /dev/md0`. (Since I had a failed disk, I don’t think I actually needed `--dangerous-no-resync`.)
- Now take the other half of your additional storage (wondering why I said you need twice the size of the RAID array you’re restoring, weren’t you?), format it, mount `md0p1` or whatever, and copy the data over. In my case it was a Windows server, so I made a FAT32 partition (`mkdosfs -F 32 /dev/hdf2`) and then copied all the data over to that.
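The device-node steps in the list above are easy to fat-finger at 0630, so here is a rough Python sketch that finds the `mdp` major in the text of /proc/devices and spells out the matching `mknod` commands. The function names `find_block_major` and `mknod_commands` are made up for illustration, and the sample text is invented; the sketch only prints commands and never creates a device itself.

```python
def find_block_major(proc_devices_text, driver="mdp"):
    """Return the block major registered for `driver`, or None if absent."""
    in_block_section = False
    for line in proc_devices_text.splitlines():
        stripped = line.strip()
        # /proc/devices has two sections: "Character devices:" and "Block devices:"
        if stripped.endswith("devices:"):
            in_block_section = stripped.startswith("Block")
            continue
        fields = stripped.split()
        if in_block_section and len(fields) == 2 and fields[1] == driver:
            return int(fields[0])
    return None


def mknod_commands(major, device="/dev/md0", partitions=4):
    """Build mknod command lines: minor 0 is the whole array, minor N is partition N."""
    if not 0 <= partitions <= 63:
        raise ValueError("partitions must be between 0 and 63")
    commands = [f"mknod {device} b {major} 0"]
    for number in range(1, partitions + 1):
        commands.append(f"mknod {device}p{number} b {major} {number}")
    return commands


# Invented sample in the shape of /proc/devices; on a live system you would
# pass the contents of the real file instead.
sample = """Character devices:
  1 mem

Block devices:
  9 md
254 mdp
"""

major = find_block_major(sample)
if major is None:
    print("no mdp major found; the kernel may lack partitioned md support")
else:
    print("\n".join(mknod_commands(major, partitions=2)))
```

On a real rescue system you would feed it `open("/proc/devices").read()` and run the printed commands as root.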
When I was done, I had an IDE drive I could jam in a Windows computer to read the data back onto the newly rebuilt server. The trickiest parts are reading the drives by hand to determine how they need to be reassembled, and then making raidtools work with partitioned md devices.
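To make the "how they need to be reassembled" part concrete, here is a small Python sketch of where each data chunk and each parity chunk lands under the four parity algorithms, following the way Linux md defines them. It assumes the hardware controller used one of those four layouts, which is the same bet the raidtab approach makes; `locate_chunk` and `layout_rows` are illustrative names, not part of any tool.

```python
LAYOUTS = ("left-asymmetric", "left-symmetric",
           "right-asymmetric", "right-symmetric")


def locate_chunk(chunk_number, n_disks, layout):
    """Map a logical chunk number to (stripe, data_disk, parity_disk)."""
    side, kind = layout.split("-")
    data_disks = n_disks - 1
    stripe, index = divmod(chunk_number, data_disks)
    # "left" starts parity on the last disk and walks toward disk 0;
    # "right" starts on disk 0 and walks toward the last disk.
    if side == "left":
        parity_disk = data_disks - stripe % n_disks
    else:
        parity_disk = stripe % n_disks
    if kind == "asymmetric":
        # Data fills the disks in order, skipping over the parity disk.
        data_disk = index + 1 if index >= parity_disk else index
    else:
        # Data starts on the disk after parity and wraps around.
        data_disk = (parity_disk + 1 + index) % n_disks
    return stripe, data_disk, parity_disk


def layout_rows(n_disks, layout, stripes):
    """Return one row per stripe, one cell per disk: 'D<n>' or 'P'."""
    rows = []
    for stripe in range(stripes):
        row = [None] * n_disks
        for index in range(n_disks - 1):
            chunk = stripe * (n_disks - 1) + index
            _, data_disk, parity_disk = locate_chunk(chunk, n_disks, layout)
            row[data_disk] = f"D{chunk}"
        row[parity_disk] = "P"
        rows.append(row)
    return rows


# Print a four-disk picture of each layout; columns are disks, rows are stripes.
for layout in LAYOUTS:
    print(layout)
    for row in layout_rows(4, layout, 4):
        print("  " + " ".join(f"{cell:>3}" for cell in row))
```

When you find data in the hex viewer that runs off the end of one chunk, the tables tell you which disk the continuation should be on under each layout, and only one of them will match what you actually see.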
As for “reading the drives by hand,” that could be automated if I ever took the time to learn more about NTFS. Reading it tonight, and working with a little Python I wrote, I noticed things about it, like two bytes that seem to hang out around the end of every 4KiB (I think) block, and the fact that the second byte before the start of a long file name seems to indicate that file name’s length. Still, since there are only four parity types, and probably only about 5-10 block sizes I can even remotely call “sane,” someone with a little knowledge of the filesystem could automate the process of finding out the parameters without a tremendous amount of effort.
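A minimal sketch of what that automation might look like, in Python: try every combination of parity algorithm and chunk size, reassemble a bit of the logical volume for each, and keep the combinations that pass a check. The filesystem knowledge lives entirely in the `looks_right` callback, which is hypothetical and left to the caller. The sketch assumes the member images are already in the right order, are bytes-like objects such as `mmap` objects, and that at most one is missing (`None`), in which case its contents are rebuilt by XORing the others. The list of chunk sizes is a guess at "sane".

```python
import itertools

LAYOUTS = ("left-asymmetric", "left-symmetric",
           "right-asymmetric", "right-symmetric")
CHUNK_SIZES = tuple(1024 * k for k in (4, 8, 16, 32, 64, 128, 256))  # guesses


def locate(chunk_number, n_disks, layout):
    """Map a logical chunk to (stripe, disk), following md's layout rules."""
    side, kind = layout.split("-")
    stripe, index = divmod(chunk_number, n_disks - 1)
    rotation = stripe % n_disks
    parity = (n_disks - 1 - rotation) if side == "left" else rotation
    if kind == "asymmetric":
        return stripe, index + 1 if index >= parity else index
    return stripe, (parity + 1 + index) % n_disks


def read_member(images, disk, start, length):
    """Read from one member; a missing member (None) is the XOR of the rest."""
    if images[disk] is not None:
        return bytes(images[disk][start:start + length])
    rebuilt = 0
    for image in images:
        if image is not None:
            rebuilt ^= int.from_bytes(image[start:start + length], "big")
    return rebuilt.to_bytes(length, "big")


def read_logical(images, layout, chunk_size, offset, length):
    """Reassemble `length` bytes of the logical volume under one guess."""
    out = bytearray()
    while length > 0:
        chunk_number, within = divmod(offset, chunk_size)
        stripe, disk = locate(chunk_number, len(images), layout)
        take = min(length, chunk_size - within)
        out += read_member(images, disk, stripe * chunk_size + within, take)
        offset += take
        length -= take
    return bytes(out)


def search(images, looks_right, chunk_sizes=CHUNK_SIZES):
    """Return every (layout, chunk_size) whose reassembly passes the check."""
    matches = []
    for layout, chunk_size in itertools.product(LAYOUTS, chunk_sizes):
        def reader(offset, length, layout=layout, chunk_size=chunk_size):
            return read_logical(images, layout, chunk_size, offset, length)
        if looks_right(reader):
            matches.append((layout, chunk_size))
    return matches
```

The check only helps if it looks at data that crosses chunk boundaries. Anything that fits inside the first chunk, like the partition table, reads the same under every chunk size and can at best tell left layouts from right ones.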