July 31, 2003

Fixing hardware RAID, ripping my CD collection

We had a client’s hardware RAID controller decide to lose its configuration, thus making their array unusable. It has been our experience with RAID controllers that you can’t say “here’s how the array is configured, don’t blow away the data on the drives, please.” This particular controller was an AMI MegaRAID, and after taking a lengthy route around it, I now believe there was at least a 50% chance that this controller would have let us configure an array without initializing it. However, we didn’t want to gamble our client’s data, of which they had no backup, on the other 50%. So my boss called me in to see what I could do.

I read up on partition tables, RAID 5 and how parity is calculated, and NTFS, and I even found someone who had done what I was thinking of: use Linux software RAID to access the array and at least get the data off. Oh, did I mention that no one really had any idea how the array was configured? Six SCSI drives, no idea how they were allocated. Later my boss told me that, before the breakage, the controller had been alerting him that it was constructing two logical drives. So I was kind of hoping it was three drives to each logical device, in RAID 5, and as it turned out later, I was right.

First I had to activate the drives in the controller, as they were “ready” but not “online”; as it turns out, I had to make a logical disk out of each separate disk. This makes sense, of course, though not all controllers do it this way; several that I’m used to working with definitely don’t. Then we bought a couple of 120GB IDE drives. I used one to load RH9 onto and the other to back up each SCSI drive to: sda got backed up to hdc1, sdb to hdc2, and so on, using dd. This took a while, naturally (no ATA/66, even).
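The backup step looks roughly like the sketch below. Image files stand in here for the real devices (on the actual machine it was dd if=/dev/sda of=/dev/hdc1 and so on), and the 64k block size is just a reasonable guess, not what I necessarily used:

```shell
# Stand-ins for the real hardware: member0.img plays /dev/sda,
# backup0.img plays the backup partition on the IDE drive (/dev/hdc1).
dd if=/dev/urandom of=member0.img bs=64k count=4 2>/dev/null

# The actual backup step: a raw, byte-for-byte copy of the whole device.
dd if=member0.img of=backup0.img bs=64k 2>/dev/null

# Paranoia check: the copy should be identical to the source.
cmp member0.img backup0.img && echo "backup verified"
```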

After the backup was complete I disconnected the SCSI drives to keep myself from munging them in my sleep-deprived state. Then I built Linux Disk Editor (LDE), which is OK. I used it with the switches --paranoid -t no: the first tells it to open the device read-only, the second tells it not to try to detect the filesystem type, since it doesn’t know about NTFS. It worked well as, basically, a hex editor. Let me take this moment to say: Linux seems to be lacking a really good hex editor that can open devices. Something like LE looks like it has a nice interface, but it won’t open devices. Oh, and the graphical ones, like KHexEdit, were no help either. Of course, I just found a page with a bunch of Linux hex editors and I haven’t tried a few of them, so maybe I just didn’t look hard enough. Get with it, Freshmeat developers, please.
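For what it’s worth, in a pinch you can get a read-only peek at arbitrary offsets with nothing but dd and od, which is how I double-checked things outside of LDE. A toy version, with a file standing in for the device:

```shell
# Fake "device": 16 sectors of zeros with a recognizable string at sector 1.
dd if=/dev/zero of=disk.img bs=512 count=16 2>/dev/null
printf 'NTFS    ' | dd of=disk.img bs=512 seek=1 conv=notrunc 2>/dev/null

# Seek to sector 1 and dump just that sector; dd never opens for writing here.
dd if=disk.img bs=512 skip=1 count=1 2>/dev/null | od -c | head -2
```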

One thing that puzzled me initially was that I could see the partition table on four of the six drives, and the pairs sda/sdc and sdd/sdf each showed the same partition table. I started to think maybe they had two RAID 1 arrays with a hot spare for each, but after trying the arrays out, no luck. BTW, I “tried them out” using the Linux-NTFS Red Hat packages, which work most excellently and have been kept up well for RH9. I’d use this driver later to test the drives once I figured out how to reconstruct them. Save yourself some worrying and remember to give -r to mount to make sure the mount is read-only. I believe the Linux NTFS drivers provided are read-only anyway, but I’m always nervous when I’m dicking around with data I have no backup of.

I browsed around on the disks a bit and found that the “parity algorithm” was left-symmetric. It also looked like the block size was 8K. I could usually tell this by seeking somewhere into the disk, finding some identifiable data, and watching where it abruptly stopped on one disk and started on another. Identifying the parity in this case was easy as well: with three xterms open, one running LDE for each of the three drives in (what I theorized was) the array, I’d find a stripe of readable data and look for two readable blocks and one that isn’t; the one that isn’t is the parity block. For a reference on how the various “parity algorithms” (as Linux calls them, at least) work, try this page on Linux RAID Information.
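The reason the spot-the-parity trick works (and the reason a degraded array can still be read later) is that on RAID 5 the parity chunk is just the bytewise XOR of the data chunks in its stripe, which also means any one missing chunk can be rebuilt from the others. A toy demonstration, with python standing in purely as a bytewise-XOR tool:

```shell
# Two pretend 4-byte data chunks from one stripe.
printf 'AAAA' > d0
printf 'BBBB' > d1

# Parity chunk = d0 XOR d1, byte for byte.
python3 -c "
a = open('d0','rb').read(); b = open('d1','rb').read()
open('p','wb').write(bytes(x ^ y for x, y in zip(a, b)))
"

# Now 'lose' d1 and rebuild it from the surviving chunk and the parity.
python3 -c "
a = open('d0','rb').read(); p = open('p','rb').read()
print(bytes(x ^ y for x, y in zip(a, p)).decode())
"
# prints BBBB: the lost chunk, reconstructed
```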

Next I tried a method for getting Linux software RAID to make an md device out of my former hardware RAID members, as detailed in that post. Of course, once I did this I had an image of an entire disk, partition table and all, on /dev/md0. There is a patch for using partitions on a software RAID device, but I didn’t have it and I didn’t feel like rebuilding the kernel. So I dd’ed just the one partition off of the software RAID device onto the 120GB boot drive, which had plenty of space. Once that was done I was able to successfully mount the partition image on a loop device. That is to say, this whole mess of stuff worked.
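In raidtab terms, the method amounts to something like the sketch below. The device names and their order are purely illustrative (figuring out the real member order was half the battle); the important parts are persistent-superblock 0, since hardware RAID members carry no md superblock and you don’t want one written, plus the chunk size and parity algorithm worked out by poking at the disks:

```
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           3
    nr-spare-disks          0
    persistent-superblock   0
    chunk-size              8
    parity-algorithm        left-symmetric
    device                  /dev/sda
    raid-disk               0
    device                  /dev/sdc
    raid-disk               1
    device                  /dev/sde
    raid-disk               2
```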

I found the other array had a failed SCSI disk. Perhaps not so surprisingly after all this, using failed-disk in raidtab also worked. After I had pulled the two partitions off the second array, I fdisked the drive holding the images of all the array members and wrote the partition images back to it with dd. Voilà: someone else could pop the drive into an NT machine and pull the data off of it. In fact, I probably could have taken the old SCSI array, reconfigured it, replaced the broken drive, and dd’ed the drive images back onto it, except I think we were still leery of erasing the original data, or what was left of it.
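The failed-disk trick is just a matter of listing the broken member in raidtab with failed-disk instead of raid-disk; the md driver then brings the array up degraded and reconstructs that member’s data from parity on the fly. Something like this for the bad drive’s entry (device name illustrative):

```
    device                  /dev/sdf
    failed-disk             2
```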

In other news, I’ve created two scripts that use abcde to rip all my audio CDs into FLAC files. I think I had to patch abcde a bit, too. I’ll try and get that all posted at some point.
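For anyone wanting to try the same before I get the scripts posted, the relevant knob lives in ~/.abcde.conf. A guess at a minimal version (variable names may differ by abcde version, which is part of why I had to patch it):

```
# ~/.abcde.conf, minimal sketch
OUTPUTTYPE=flac         # hand tracks to flac instead of the default encoder
FLACOPTS='--best'       # maximum flac compression
```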

Comments (1)

  1. March 11, 2006

    [...] I was asked to perform hardware RAID heroics again tonight. In fact, I’ve been on site about eight hours now (it’s 0630) and I’m still working at it, but I’ve basically got the problem licked. [...]