Changing RAID Drives Without Losing Data

Update September 2006

There are a number of HOWTO documents showing how to add software RAID drives without losing any data. They use a missing drive method, where you build a RAID device with one drive missing. The missing drive is where you keep your data while you create the RAID. After you copy the data to the RAID device, you add the missing drive to it.

This document shows how to change a 4x40GB RAID5 to a 4x80GB RAID5 without losing the data. It uses the missing drive method to keep the data while you partition and format the other drives. I used mdadm as the RAID management tool.

What I had was 2x40GB and 2x80GB drives in the following configuration:

 
Drive            /dev/hde   /dev/hdg   /dev/hdi   /dev/hdl
Size             80GB       80GB       40GB       40GB
SWAP             256MB      256MB      256MB      256MB
RAID5 (120GB)    40GB       40GB       40GB       40GB
RAID1 (40GB)     40GB       40GB

The RAID1 was assembled as /dev/md1 and was the root for my system. The RAID5 was assembled as /dev/md0 and was the /home device. I wanted to add two new 120GB drives, which meant removing the two 40GB drives and deleting the existing partitions on the 80GB drives. The configuration I wanted was:

 
Drive            /dev/hde   /dev/hdg   /dev/hdi   /dev/hdl
Size             120GB      120GB      80GB       80GB
SWAP             256MB      256MB      256MB      256MB
RAID5 (240GB)    80GB       80GB       80GB       80GB
RAID1 (40GB)     40GB       40GB

Just as a side note, if you are going to use multiple drives in your system, you may want to create a small swap partition on each drive. Linux stripes swap across partitions of equal priority for speed gains.

[mm@wysenburg mm]$ /sbin/swapon -s
Filename        Type         Size      Used     Priority
/dev/hde1       partition    257000    1228     1
/dev/hdg1       partition    257000    1212     1
/dev/hdi1       partition    258008    1232     1
/dev/hdl1       partition    258008    1232     1

Notice that the swap partitions are used equally. Reading from four drives in parallel is faster than reading from only one. RAID5 uses the same technique, which is one reason it is better than just mirroring. You need a minimum of three drives for RAID5.
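As a rough sketch of how equal-priority swap could be set up, the function below prints /etc/fstab entries that give every swap partition the same pri= value, which is what lets the kernel rotate between them. The function name is hypothetical and the priority of 1 simply mirrors the swapon output above; nothing is written to disk.

```shell
#!/bin/sh
# Print /etc/fstab swap entries that all share one priority.
# Usage: swap_fstab_lines PRIORITY DEVICE...
# (hypothetical helper; it only prints text and changes nothing)
swap_fstab_lines() {
    pri=$1
    shift
    for dev in "$@"; do
        printf '%s swap swap defaults,pri=%s 0 0\n' "$dev" "$pri"
    done
}

swap_fstab_lines 1 /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdl1
```

The printed lines can be reviewed and then merged into /etc/fstab by hand.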

I would have used RAID5 for the entire file system, but Linux can't boot from a software RAID5 array. It will boot from a software RAID1 (mirrored drive) so I mounted my root on a RAID1 array and /home on a larger RAID5 array.
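The array sizes in the two tables follow from simple arithmetic: a RAID1 array holds as much as one member, while a RAID5 array gives up one member's worth of space to parity. A minimal sketch, with a hypothetical function name and sizes in whole gigabytes:

```shell
#!/bin/sh
# Usable capacity of an array built from equal-sized members.
# Usage: raid_capacity LEVEL MEMBERS MEMBER_SIZE
# (hypothetical helper covering only the two levels used on this page)
raid_capacity() {
    level=$1
    members=$2
    size=$3
    case $level in
        1)  # Mirroring: every member holds the same data.
            echo "$size" ;;
        5)  # One member's worth of space goes to parity.
            if [ "$members" -lt 3 ]; then
                echo "RAID5 needs at least 3 members" >&2
                return 1
            fi
            echo $(( ($members - 1) * $size )) ;;
        *)  echo "unsupported level: $level" >&2
            return 1 ;;
    esac
}

raid_capacity 5 4 40   # old /home array: prints 120
raid_capacity 5 4 80   # new /home array: prints 240
raid_capacity 1 2 40   # root mirror:     prints 40
```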

Strategy

I used a four-step method to upgrade:

  1. Copy data to a new 120GB drive.
  2. Replace the boot drive with the new 120GB drive.
  3. Replace the other drives and create new RAID arrays with one missing drive.
  4. Copy data to the RAID arrays and add the missing drive into them.

System Configuration

First a bit of information about the configuration. The system board is an ABIT BP6 dual Celeron with 2 ATA33 IDE channels and an onboard HighPoint ATA66 controller with 2 more IDE channels. So the system could have eight drives (master and slave for each channel). The drives I was going to use were ATA133 and I had some trouble with the onboard HighPoint controller, so I added two Promise TX2 ATA100 cards, each with two channels. While the cards can have two drives per channel, I only placed one drive per cable or channel. I didn't use the onboard IDE channels.

Linux recognized the drives as shown above (hde, hdg, hdi and hdl). A faulty cable on the last channel burnt out the master position for that channel on the Promise board. The drive itself was OK, and when I switched it to the slave position it worked fine. That is the reason for hdl rather than hdk.

In the BIOS I could boot from the onboard IDE (C, D, E, F or CDROM among other devices) or an external device. The onboard HighPoint controller was considered an external device. When the system was set to boot from an external device, you could choose the onboard HighPoint controller or SCSI. When you set it to SCSI, it booted from the first Promise IDE card; otherwise it booted from the HighPoint controller.

Step 1:     Copy data to new 120GB drive.

I used the HighPoint controller to attach one of the new 120GB drives. If you don't have a spare connection for the new drive, you can remove one of the existing drives. Both arrays will survive without one drive.

Connected to the HighPoint controller, the drive was recognized as /dev/hdm. While the BIOS called it a 48GB drive, Linux recognized the entire drive. I partitioned it with a small swap, a 77GB Linux RAID partition to match the 80GB drives and the remainder as a 35GB Linux RAID partition. Funny how a 120GB drive only has 112GB of space and the 80GB drives only have 77GB of room. I formatted the partitions as ext3, mounted them and copied my root and /home using the commands:

[root@wysenburg /]# mkdir /mnt/newroot
[root@wysenburg /]# mount /dev/hdm3 /mnt/newroot
[root@wysenburg /]# cp -ax / /mnt/newroot
[root@wysenburg /]# mkdir /mnt/newhome
[root@wysenburg /]# mount /dev/hdm2 /mnt/newhome
[root@wysenburg /]# cp -ax /home/. /mnt/newhome
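Part of the apparent shrinkage mentioned above is a matter of units: drive makers count decimal gigabytes (10^9 bytes), while partitioning tools usually report binary gigabytes (2^30 bytes). The helper below is only an illustration of the conversion; its name is made up for this example.

```shell
#!/bin/sh
# Convert decimal gigabytes (10^9 bytes) to binary gigabytes (2^30 bytes).
# Usage: gb_to_gib GIGABYTES   (hypothetical helper)
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1000000000 / 1073741824 }'
}

gb_to_gib 120     # prints 111.76
gb_to_gib 81.70   # prints 76.09
```

The second call reproduces the pair of figures that mdadm prints for the device size in the --detail output near the end of this page.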

I didn't quite have enough room for the entire /home device so I used the newroot partition to hold some of the data. When that operation finished I checked the sizes with df to see if everything had copied. There were no errors, so I was sure they had copied fine. You could also use du -sc to compare the totals to make sure that everything copied.
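Besides comparing totals, the set of path names in the original and the copy can be compared directly. The sketch below is one way to do that; both function names are hypothetical, and it checks names only, not file contents or ownership.

```shell
#!/bin/sh
# List every path under a directory, relative to that directory, without
# descending into other mounted file systems (the same limit cp -x applies).
tree_listing() {
    ( cd "$1" && find . -xdev | sort )
}

# Succeed only if both trees contain exactly the same set of path names.
# Returns 2 if either directory cannot be read.
same_paths() {
    a=$(tree_listing "$1") || return 2
    b=$(tree_listing "$2") || return 2
    [ "$a" = "$b" ]
}
```

Called as same_paths ORIGINAL COPY, it exits with status 0 when the two lists agree. On a live root file system a few names may legitimately differ, because files keep being created while the copy runs.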

One of the problems with copying the root drive is that stuff is always happening. I have ntp running and it keeps stats, so its files are appended to a few times a minute.

Step 2:     Replace boot drive with new 120GB drive

I had a problem with the GRUB boot loader. I couldn't get it to recognize /dev/hdm, so I couldn't install the boot loader while the drive was connected to the HighPoint controller. It was the fifth drive so GRUB should have recognized it as (hd4). GRUB complained that no BIOS device existed for that drive. I suspect that was a problem with my (old) motherboard and not a problem with GRUB. It could also have been a typo, but I had another way to accomplish the same thing and didn't pursue it.

I got around the problem by replacing /dev/hde with the new 120GB drive that now had the data on it and used the Fedora CD to boot into rescue mode. I then installed GRUB to the new drive in its new location. There are a couple of ways to install GRUB. You can use the script grub-install /dev/hde or you can do it manually.

If you use grub-install you should chroot to the new root so that grub-install knows where to install. From the rescue mode command line:

[root@wysenburg /]# mkdir /mnt/newroot
[root@wysenburg /]# mount /dev/hde3 /mnt/newroot
[root@wysenburg /]# chroot /mnt/newroot
[root@wysenburg /]# grub-install /dev/hde

/dev/hde3 is the partition I copied my old root to.

You can also do this manually with GRUB. From the command line:

[root@wysenburg /]# grub
grub> root (hd0,2)
grub> setup (hd0)
grub> quit

This sets the root to the third partition on the first drive (grub starts counting with zero) and installs GRUB to the MBR of the first drive.
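Because the off-by-one between Linux and GRUB names is easy to trip over, here is a small sketch of the translation. The function name is hypothetical, and it assumes the BIOS presents the drives to GRUB in the order given.

```shell
#!/bin/sh
# Translate a drive position (1 = first BIOS drive) and an optional Linux
# partition number (1 = first partition) into GRUB's zero-based notation.
# Usage: grub_device DRIVE [PARTITION]   (hypothetical helper)
grub_device() {
    drive=$1
    part=$2
    if [ -n "$part" ]; then
        echo "(hd$(( $drive - 1 )),$(( $part - 1 )))"
    else
        echo "(hd$(( $drive - 1 )))"
    fi
}

grub_device 1 3   # third partition on the first drive: prints (hd0,2)
grub_device 1     # the first drive itself:             prints (hd0)
grub_device 5     # a fifth drive:                      prints (hd4)
```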

You will also have to change /etc/fstab and /boot/grub/menu.lst. In my /etc/fstab I had /dev/md1 mounted as the root. I changed that to /dev/hde3. Because my home was mounted to another RAID device, I also changed that to /dev/hde2 in /etc/fstab. This is where I copied /home.

/dev/md1 / ext3 defaults 1 1
/dev/md0 /home ext3 defaults 1 2

changed to:

/dev/hde3 / ext3 defaults 1 1
/dev/hde2 /home ext3 defaults 1 2

Finally, I modified the kernel line in /boot/grub/menu.lst. It was:

kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/md1

I changed it to:

kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/hde3

With those changes I was ready to boot to the new drive.
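The same substitutions can be previewed with sed before touching the real files. This is only a sketch: the function name is hypothetical, it works on standard input rather than editing in place, and it takes care not to rewrite a longer name such as /dev/md10 when asked to replace /dev/md1.

```shell
#!/bin/sh
# Replace one device name with another in text read from standard input.
# Usage: swap_device OLD NEW   (hypothetical helper; edits nothing in place)
swap_device() {
    old=$1
    new=$2
    # First expression: the name followed by a non-digit character.
    # Second expression: the name at the very end of a line.
    sed -e "s|$old\([^0-9]\)|$new\1|g" -e "s|$old\$|$new|"
}

# Preview the /etc/fstab change.
printf '%s\n' '/dev/md1 / ext3 defaults 1 1' \
              '/dev/md0 /home ext3 defaults 1 2' |
    swap_device /dev/md1 /dev/hde3 | swap_device /dev/md0 /dev/hde2

# Preview the kernel line change.
echo 'kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/md1' |
    swap_device /dev/md1 /dev/hde3
```

Once the preview looks right, the same function can be fed a copy of the real file and the result moved into place.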

Sidetrip and testing:

Before I went any further, I wanted to test the system to make sure everything was copied successfully and the system would still work. I had placed the replaced 80GB drive on the HighPoint controller as /dev/hdm, so all four original drives were still in the system, but the original boot drive (/dev/hde) was now /dev/hdm.

When I rebooted, the system came up fine. Well it actually didn't, but that was just a typo problem. The previous partition used for root on the boot drive was /dev/hde3. It was now /dev/hde2. A subtle difference, but I had to change /boot/grub/menu.lst from root (hd0,2) to root (hd0,1) to get the system to boot. After that it booted just fine.

Because the RAID devices had persistent superblocks (the new style, instead of /etc/raidtab), the RAID drives were assembled into arrays, except that the system couldn't find /dev/hde, which was now /dev/hdm. I had thought that wouldn't make a difference as long as all the drives were available. Obviously it does, as the RAID devices came up in degraded mode. It didn't matter much, as the data had been copied onto the new drive and was also still available in the degraded array.
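A degraded array shows up in /proc/mdstat as an underscore in the bracketed status string, for example [_U] instead of [UU]. The sketch below picks those arrays out of mdstat-style text; the function name is hypothetical and it assumes the usual layout, where the status string follows the line that names the array.

```shell
#!/bin/sh
# Read /proc/mdstat-style text on standard input and print the name of
# every array whose status string contains "_" (a missing member).
# (hypothetical helper; it only reads text)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/        { name = $1 }
        /\[[U_]*_[U_]*\]/    { if (name != "") print name }
    '
}
```

On a running system it would be called as degraded_arrays < /proc/mdstat, printing nothing when every array has all of its members.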

Since I had all the data on the new drive /dev/hde, I decided to experiment with the RAID drives. I tried to reassemble the RAID1 array to see if mdadm would put the drives back together. I first stopped the array (mdadm --stop /dev/md1), then assembled it (mdadm --assemble /dev/md1). This didn't work, because mdadm was reading the configuration from the mdadm.conf file, and that file was out of date: I had moved one drive since I last updated it. The proper method would be to specify the devices to assemble:

[root@wysenburg /]# /sbin/mdadm --assemble /dev/md1 /dev/hdg3 /dev/hdm3

or have the system search for the proper drives by specifying the super-minor number:

[root@wysenburg /]# /sbin/mdadm --assemble -m 1 /dev/md1

This would search for all the devices with a minor number of 1 and try to assemble them together.

I could add the drive (mdadm /dev/md1 -a /dev/hdm2), but then it treated the drive as a new drive and started the process of synchronizing the array. The data on /dev/hdm2 would be updated with the first drive in the array.

Step 3:     Replace other drives and create new RAID arrays with one missing drive

When I decided that the system was back up and running and all the data was intact, I shut down and installed the other drives. I moved the two 80GB drives to slots three and four and installed the other new 120GB drive in slot two. I then partitioned the drives as shown in the table at the top of this page.

I created the two new RAID arrays in degraded mode. They were missing the first drive. That drive had root and /home partitions so they couldn't be added at this time. I used the commands:

mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdg2
mdadm --create /dev/md5 --level=5 --raid-devices=4 missing /dev/hdg3 /dev/hdi2 /dev/hdl2

The super-minor number of the RAID device is arbitrary. In my case I just used a number to reflect the type of RAID.
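One detail that is easy to get wrong in these commands is the --raid-devices count, which has to include the slot held by the word missing. The sketch below builds the command line from the member list so the count always matches. The function name is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Print (not run) an mdadm --create command for the given members.
# The word "missing" is counted like any other member.
# Usage: create_cmd ARRAY LEVEL MEMBER...   (hypothetical helper)
create_cmd() {
    array=$1
    level=$2
    shift 2
    echo "mdadm --create $array --level=$level --raid-devices=$# $*"
}

create_cmd /dev/md1 1 missing /dev/hdg2
create_cmd /dev/md5 5 missing /dev/hdg3 /dev/hdi2 /dev/hdl2
```

The two calls print the same commands shown above, which makes it a convenient dry run before creating anything.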

Step 4:     Copy data to RAID and add /dev/hde back into RAID arrays 

I formatted each array with an ext3 file system (for example, mke2fs -j /dev/md1) and copied the data over using the same method as before, only the other way. For /home:

[root@wysenburg /]# mkdir /mnt/newhome
[root@wysenburg /]# mount /dev/md5 /mnt/newhome
[root@wysenburg /]# cp -ax /home/. /mnt/newhome

I did the same for the root partition. I then had to change /etc/fstab back to mounting the RAID arrays for root and /home.

/dev/hde3 / ext3 defaults 1 1
/dev/hde2 /home ext3 defaults 1 2

changed back to:

/dev/md1 / ext3 defaults 1 1
/dev/md5 /home ext3 defaults 1 2

The final change was to edit /boot/grub/menu.lst to tell the kernel that root=/dev/md1:

kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/md1

I rebooted and when the system came back up, I used df to check that the RAID devices had mounted properly:

[root@wysenburg /]# df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1              36108480  18364300  15909964  54% /
/dev/md5             235598184  90994224 132636188  41% /home 

I then added the last drive into the arrays:

[root@wysenburg /]# mdadm /dev/md1 -a /dev/hde2
[root@wysenburg /]# mdadm /dev/md5 -a /dev/hde3

The RAID arrays took a couple of hours to resynchronize and the conversion was complete. However, mdadm reports that there is a failed device. 

[root@wysenburg /]# /sbin/mdadm --detail /dev/md5
/dev/md5:
        Version : 00.90.00
  Creation Time : Sun Oct 16 16:03:20 2004
     Raid Level : raid5
     Array Size : 239355456 (228.27 GiB 245.10 GB)
    Device Size : 79785152 (76.09 GiB 81.70 GB)
   Raid Devices : 4
  Total Devices : 5
Preferred Minor : 5
    Persistence : Superblock is persistent

    Update Time : Sun Oct 16 22:05:10 2004
          State : dirty, no-errors
 Active Devices : 4
Working Devices : 4
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K
    Number   Major   Minor   RaidDevice State
       0      33        3        0      active sync   /dev/hde3
       1      34        3        1      active sync   /dev/hdg3
       2      56        2        2      active sync   /dev/hdi2
       3      57       66        3      active sync   /dev/hdl2
           UUID : ec7a419c:2bae6206:de50f4bc:d5c1e1d5
         Events : 0.6

The failed device was the one I specified as missing when I created the array. I had the same problem with previous arrays. I don't think there is a method to correct the report other than deleting the array and creating a new one. The array works fine the way it is, although the report is a little disconcerting.
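Since the failed count can be misleading in this situation, a more useful check may be whether the number of active devices equals the number of RAID devices the array is defined with. The sketch below reads mdadm --detail output and compares those two lines. The function name is hypothetical, and treating this comparison as the health test is my assumption, not something mdadm itself defines.

```shell
#!/bin/sh
# Read "mdadm --detail" output on standard input and succeed only if the
# "Active Devices" count equals the "Raid Devices" count.
# (hypothetical helper; it only reads text)
array_in_sync() {
    awk -F' *: *' '
        /^ *Raid Devices/   { want = $2 }
        /^ *Active Devices/ { have = $2 }
        END { exit !(want != "" && want + 0 == have + 0) }
    '
}
```

It could be used as /sbin/mdadm --detail /dev/md5 | array_in_sync && echo "all members active". For the report above it succeeds, because four of four RAID devices are active even though one failed device is listed.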

 

©1998-2004 Digital Mapping Systems
Maintained by: WebMaster@DigitalMapping