Wednesday, January 30, 2008

Recovering failed RAID disks on Linux

Hey there once more,

As I mentioned in yesterday's post regarding creating RAID disk sets, we'll be looking at how to deal with RAID disk failures on Linux in today's post. Once we're done here, I think we can call it a wrap on this subject for a while (Are those sighs of relief I'm hearing? ;) At least until we come up with some good shell scripts to expedite some of this stuff.

Hardware disk failure is too broad a topic to cover in any great detail here, but the basic steps below should apply in most cases (of course, as noted, your setup may require otherwise). The scenario here is that one of your disks has just gone "bad." It's beyond recovery.
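If you're not sure which disk went south, the kernel will usually tell you. A quick look at /proc/mdstat and the system log should point the finger (the device name and log file here are just examples from our setup, so adjust to taste):

host # cat /proc/mdstat
(the failed member will show up marked with an (F), or will be missing from the md line entirely)
host # grep -i hda /var/log/messages
(look for I/O errors logged against the suspect disk)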

1. If the disk is hot-swappable, simply remove it. If it isn't, you'll need to schedule downtime and remove the disk then. If the failed disk is your boot disk, you'll already have your downtime ;)
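If the kernel hasn't already kicked the bad disk out of the array, it's polite to mark it failed and hot-remove it with raidtools before you pull it. This is just a sketch using our example devices (/dev/md0 and /dev/hda1); substitute your own:

host # raidsetfaulty /dev/md0 /dev/hda1
host # raidhotremove /dev/md0 /dev/hda1
(repeat for each md device the dead disk was a member of)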

2. If your failed disk isn't the boot disk, just replace it, restart your machine, and skip ahead to step 6.

3. If your failed disk is the boot disk, you'll next want to boot off of CD (we're using RedHat Linux AS, so the install CD booted into rescue mode will do), mount the root filesystem under a temporary mountpoint, and do the following:

host # mkdir /tmp/recovery
host # mount /dev/md0 /tmp/recovery
(the degraded mirror will still assemble and mount with only one half present)
host # chroot /tmp/recovery
host # grub --batch
(This may take a while as grub probes and tries to guess where all of your drives are)

4. Once grub is finished probing, do the following at the "grub>" prompt:

grub> root (hd0,0)
root (hd0,0)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
...
Running "install /grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/grub/stage2
/grub/grub.conf"... succeeded
grub> quit
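If grub complains during setup that it can't find its stage files, it's worth double-checking from inside the chroot that they're actually where it expects before trying again; on our RedHat box they live under /boot/grub:

host # ls /boot/grub
(you should see stage1, stage2 and grub.conf in the listing)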


5. Now take a second and verify that all is well while still running off of the CD, like so:

host # cat /boot/grub/device.map
(fd0) /dev/fd0
(hd0) /dev/hda

host # df -k (should show the /dev/md0 mount)
host # exit
(to drop back out of the chroot)
host # umount /tmp/recovery
host # reboot
(Be sure to set the grub device map for hd0 to /dev/hdc if /dev/hda has gone bye-bye)

6. Now that you have the disk physically replaced and you've booted back up, check the content of /proc/mdstat, like so:

host # cat /proc/mdstat

Personalities : [raid1]
read_ahead 1024 sectors
...
md0 : active raid1 hda1[0]
307328 blocks [2/1] [U_]
...
unused devices: <none>


Again, the [2/1] listing indicates that one part of the mirror is not active in this two-way mirror set!
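If you'd like a slightly friendlier view of which member went missing, lsraid (it ships with the raidtools package, assuming you have that installed) can query the array for you:

host # lsraid -A -a /dev/md0
(this lists the array and its member disks; the dead half of the mirror will show up as failed or absent)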

7. If this has been driving you crazy since the first post, we're now going to recreate the mirror so it's no longer incomplete (apologies for any stress-related reading injuries; sometimes I go a long way to make a point ;). The first thing we'll do is repartition the disk, again with fdisk (instructions in our previous post regarding creating RAID disk sets, and there's also an sfdisk shortcut sketched just after the listing below), and we should end up with our partition table looking exactly the same:

host # fdisk -l /dev/hda (The partition table should look almost identical for /dev/hdc)

Disk /dev/hda: 16 heads, 63 sectors, 77520 cylinders
Units = cylinders of 1008 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1      6095   3071848+  fd  Linux raid autodetect
/dev/hda2          6096     67047  30719808   fd  Linux raid autodetect
/dev/hda3         67048     73142   3071880   fd  Linux raid autodetect
/dev/hda4         75175     77206   1024096+  82  Linux swap
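As a shortcut, instead of retyping the partition table by hand in fdisk, you can copy it straight over from the surviving disk with sfdisk. This is a sketch assuming /dev/hdc is the good disk and /dev/hda is the new, blank one (the dump file name is arbitrary); triple-check the direction before you hit enter, because getting it backwards will flatten your good disk:

host # sfdisk -d /dev/hdc > /tmp/hdc.out
(dump the surviving disk's partition table to a file)
host # sfdisk /dev/hda < /tmp/hdc.out
(write that same layout onto the replacement disk)
host # fdisk -l /dev/hda
(confirm the two tables match before moving on)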


8. As a matter of course, just to be sure, check /etc/raidtab to see what devices need to be mirrored with what devices:

host # cat /etc/raidtab

raiddev /dev/md0
        raid-level              1
        nr-raid-disks           2
        chunk-size              64k
        persistent-superblock   1
        nr-spare-disks          0
        device                  /dev/hda1
        raid-disk               0
        device                  /dev/hdc1
        raid-disk               1
...


9. Now, we just need to add back all the partitions we lost:

host # raidhotadd /dev/md0 /dev/hda1
...


And keep tabs on the process by checking on /proc/mdstat:

host # cat /proc/mdstat

Personalities : [raid1]
read_ahead 1024 sectors
...
md0 : active raid1 hda1[2] hdc1[1]
30716160 blocks [2/1] [_U]
[===========>..........] recovery = 45.9% (34790000/61432320) finish=98.7min speed=8452K/sec
unused devices: <none>
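Rather than re-running cat every thirty seconds like I do, you can let watch do the nagging for you:

host # watch -n 10 cat /proc/mdstat
(refreshes the status every 10 seconds; Ctrl-C to get out once recovery hits 100%)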


Once the RAID sync is done, you should be good to go! I'd reboot one more time, just to be sure. Especially if you were forced to take downtime in the first place :P
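Before (or after) that last reboot, it doesn't hurt to confirm the mirror really is whole again. Once the rebuild finishes, the md0 lines in /proc/mdstat should read [2/2] and [UU] instead of [2/1] and [_U]:

host # grep -A 1 md0 /proc/mdstat
(both halves of the mirror are active again if you see [2/2] [UU])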

Take care,

Mike