Changing RAID Drives Without Losing Data

There are a number of HOWTO documents showing how to add software RAID drives without losing any data. They use a missing-drive method, where you build a RAID device with one drive declared missing. The missing drive is where you keep your data while you create the RAID. After you copy the data to the RAID device, you can add the missing drive to the RAID device.

This document shows how to change a 4x40GB RAID5 to a 4x80GB RAID5 without losing the data. It uses the missing-drive method to keep the data while you partition and format the other drives. I used mdadm as the RAID management tool.

What I had was 2x40GB and 2x80GB drives in the following configuration:
The RAID1 was assembled as /dev/md1 and was the root for my system. The RAID5 was assembled as /dev/md0 and was the /home device.

After I added the two new 120GB drives I wanted the following configuration. The conversion required removing the two 40GB drives and deleting the existing partitions on the 80GB drives.
Just as a side note, if you are going to use multiple drives in your system, you may want to create a small swap partition on each drive. Linux stripes across swap partitions of equal priority for speed gains.

    [mm@wysenburg mm]$ /sbin/swapon -s
    Filename     Type        Size     Used   Priority
    /dev/hde1    partition   257000   1228   1
    /dev/hdg1    partition   257000   1212   1
    /dev/hdi1    partition   258008   1232   1
    /dev/hdl1    partition   258008   1232   1

Notice that the swap partitions are used equally. Reading from four drives in parallel is faster than reading from only one. RAID5 uses the same technique, and that is the reason it is better than just mirroring. You need a minimum of three drives for RAID5.

I would have used RAID5 for the entire file system, but Linux can't boot from a software RAID5 array. It will boot from a software RAID1 (mirrored drive), so I mounted my root on a RAID1 array and /home on a larger RAID5 array.

Strategy

I used a four-step method to upgrade:
System Configuration

First a bit of information about the configuration. The system board is an ABIT BP6 dual Celeron with 2 ATA33 IDE channels and an onboard HighPoint ATA66 controller with 2 more IDE channels, so the system could have eight drives (master and slave for each channel). The drives I was going to use were ATA133 and I had some trouble with the onboard HighPoint controller, so I added two Promise TX2 ATA100 cards, each with two channels. While the cards can have two drives per channel, I only placed one drive per cable or channel. I didn't use the onboard IDE channels.

Linux recognized the drives as shown above (hde, hdg, hdi and hdl). I had a faulty cable on the last channel and it burnt out the master position for that channel. I switched the drive to the slave position and it worked fine. That is the reason for hdl rather than hdk. The drive was OK; it was the Promise board that got burnt.

In the BIOS I could boot from the onboard IDE (C, D, E, F or CDROM, among other devices) or an external device. The onboard HighPoint controller was considered an external device. When the system was set to boot from an external device, you could choose the onboard HighPoint controller or SCSI. When set to SCSI, it booted from the first Promise IDE card; otherwise it booted from the HighPoint controller.

Step 1: Copy data to new 120GB drive

I used the HighPoint controller to attach one of the new 120GB drives. If you don't have a spare connection for the new drive, you can remove one of the existing drives. Both arrays will survive without one drive. Connected to the HighPoint controller, the drive was recognized as /dev/hdm. While the BIOS called it a 48GB drive, Linux recognized the entire drive.

I partitioned it with a small swap partition, a 77GB Linux RAID partition to match the 80GB drives, and the remainder as a 35GB Linux RAID partition. Funny how a 120GB drive only has 112GB of space and the 80GB drives only have 77GB of room.
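The shrinking capacity is mostly a matter of units: drive makers count a gigabyte as 10^9 bytes, while most tools report sizes in units of 2^30 bytes. Here is a minimal sketch of the conversion, assuming exact nominal capacities (real drives are often a little larger than the label says). The helper name gb_to_gib is made up for this illustration.

```shell
#!/bin/sh
# Convert a nominal (decimal) capacity in GB to binary gigabytes (GiB).
# gb_to_gib is an illustrative helper, not part of any tool used here.
gb_to_gib() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 1000000000 / 1073741824 }'
}

gb_to_gib 120   # prints 111.76
gb_to_gib 80    # prints 74.51
```

So a nominal 120GB drive shows up as roughly 112GB once the tools count in binary units.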
I formatted the partitions as ext3, mounted them and copied my root and /home using the commands:

    [root@wysenburg /]# mkdir /mnt/newroot
    [root@wysenburg /]# mount /dev/hdm3 /mnt/newroot
    [root@wysenburg /]# cp -ax / /mnt/newroot
    [root@wysenburg /]# mkdir /mnt/newhome
    [root@wysenburg /]# mount /dev/hdm2 /mnt/newhome
    [root@wysenburg /]# cp -ax /home/. /mnt/newhome

I didn't quite have enough room for the entire /home device, so I used the newroot partition to hold some of the data. When that operation finished I checked the sizes with df to see if everything copied. There were no errors, so I was sure they copied fine. You could also use du -sc to compare the totals to make sure that everything copied. One of the problems with copying the root drive is that stuff is always happening. I have ntp running and it keeps stats. Its files are appended to a few times a minute, so the totals may not match exactly.

Step 2: Replace boot drive with new 120GB drive

I had a problem with the GRUB boot loader. I couldn't get it to recognize /dev/hdm, so I couldn't install the boot loader while the drive was connected to the HighPoint controller. It was the fifth drive, so GRUB should have recognized it as (hd4). GRUB complained that no BIOS device existed for that drive. I suspect that was a problem with my (old) motherboard and not a problem with GRUB. It could also have been a typo, but I had another way to accomplish the same thing and didn't pursue it.

I got around the problem by replacing /dev/hde with the new 120GB drive that now had the data on it, and used the Fedora CD to boot into rescue mode. I then installed GRUB to the new drive in its new location.

There are a couple of ways to install GRUB. You can use the script grub-install /dev/hde or you can do it manually. If you use grub-install you should chroot to the new root so that grub-install knows where to install.
From the rescue mode command line:

    [root@wysenburg /]# mkdir /mnt/newroot
    [root@wysenburg /]# mount /dev/hde3 /mnt/newroot
    [root@wysenburg /]# chroot /mnt/newroot
    [root@wysenburg /]# grub-install /dev/hde

/dev/hde3 is the partition I copied my old root to. You can also do this manually with GRUB. From the command line:

    [root@wysenburg /]# grub
    grub> root (hd0,2)
    grub> setup (hd0)
    grub> quit

This sets the root to the third partition on the first drive (GRUB starts counting with zero) and installs GRUB to the MBR of the first drive.

You will also have to change /etc/fstab and /boot/grub/menu.lst. In my /etc/fstab I had /dev/md1 mounted as the root. I changed that to /dev/hde3. Because my home was mounted on another RAID device, I also changed that to /dev/hde2 in /etc/fstab. This is where I copied /home.

    /dev/md1     /        ext3    defaults    1 1
    /dev/md0     /home    ext3    defaults    1 2

changed to:

    /dev/hde3    /        ext3    defaults    1 1
    /dev/hde2    /home    ext3    defaults    1 2

Finally, I modified the kernel line in /boot/grub/menu.lst. It was:

    kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/md1

I changed it to:

    kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/hde3

With those changes I was ready to boot to the new drive.

Sidetrip and testing

Before I went any further, I wanted to test the system to make sure everything was copied successfully and the system would still work. I had placed the displaced 80GB drive on the HighPoint controller as /dev/hdm, so all four original drives were still in the system, but the original boot drive (/dev/hde) was now /dev/hdm.

When I rebooted, the system came up fine. Well, it actually didn't, but that was just a typo problem. The previous partition used for root on the boot drive was /dev/hde3. It was now /dev/hde2. A subtle difference, but I had to change /boot/grub/menu.lst from root (hd0,2) to root (hd0,1) to get the system to boot. After that it booted just fine.
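GRUB's zero-based numbering is easy to get wrong, as the typo above shows. Here is a small sketch of the mapping from a drive position and Linux partition number to GRUB legacy notation. The helper grub_name is hypothetical, and it assumes you already know the order in which the BIOS presents the drives to GRUB.

```shell
#!/bin/sh
# grub_name DRIVE_POSITION PARTITION_NUMBER
# Both arguments are 1-based (first drive = 1, /dev/hde3 = partition 3);
# GRUB legacy counts both drives and partitions from zero.
grub_name() {
    echo "(hd$(( $1 - 1 )),$(( $2 - 1 )))"
}

grub_name 1 3   # prints (hd0,2)
grub_name 1 2   # prints (hd0,1)
grub_name 5 3   # prints (hd4,2)
```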
Because the RAID devices had persistent superblocks (the new style, instead of /etc/raidtab), the RAID drives were assembled into arrays, except that the system couldn't find /dev/hde, which was now /dev/hdm. I thought that wouldn't make a difference as long as all the drives were available. Obviously it does, as the RAID devices came up in degraded mode. It didn't make a lot of difference, as the data was copied onto the new drive and was still available in the degraded array.

Since I had all the data on the new drive /dev/hde, I decided to experiment with the RAID drives. I tried to assemble the RAID1 array to see if it would put them back together. I first stopped the array (mdadm --stop /dev/md1), then assembled it (mdadm --assemble /dev/md1). This didn't work, as mdadm was reading the configuration from the mdadm.conf file. The mdadm.conf file was now out of date because I had moved one drive since I last updated the file. The proper method would be to specify the devices to assemble:

    [root@wysenburg /]# /sbin/mdadm --assemble /dev/md1 /dev/hdg3 /dev/hdm3

or have the system search for the proper drives by specifying the super-minor number:

    [root@wysenburg /]# /sbin/mdadm --assemble -m 1 /dev/md1

This would search for all the devices with a super-minor number of 1 and try to assemble them together.

I could add the drive (mdadm /dev/md1 -a /dev/hdm2), but then mdadm treated the drive as a new drive and started the process of synchronizing the array. The data on /dev/hdm2 would be overwritten from the first drive in the array.

Step 3: Copy data to RAID and add missing drive into RAID arrays

When I decided that the system was back up and running and all the data was intact, I shut down and installed the other drives. I moved the two 80GB drives to slots three and four and installed the other new 120GB drive in slot two. I then partitioned the drives as shown in the table at the top of this page.

I created the two new RAID arrays in degraded mode. They were missing the first drive.
That drive had the root and /home partitions, so it couldn't be added at this time. I used the commands:

    mdadm --create /dev/md1 --level=1 --raid-devices=2 missing /dev/hdg2
    mdadm --create /dev/md5 --level=5 --raid-devices=4 missing /dev/hdg3 /dev/hdi2 /dev/hdl2

The super-minor number of the RAID device is arbitrary. In my case I just used a number to reflect the type of RAID.

Step 4: Copy data to RAID and add /dev/hde back into RAID arrays

I formatted the arrays with the ext3 file system (for example, mke2fs -j /dev/md1) and copied the data over using the same method as before, only in the other direction:

    [root@wysenburg /]# mkdir /mnt/newhome
    [root@wysenburg /]# mount /dev/md5 /mnt/newhome
    [root@wysenburg /]# cp -ax /home/. /mnt/newhome

I did the same for the root partition. I then had to change /etc/fstab back to mounting the RAID arrays for root and /home.

    /dev/hde3    /        ext3    defaults    1 1
    /dev/hde2    /home    ext3    defaults    1 2

changed back to:

    /dev/md1     /        ext3    defaults    1 1
    /dev/md5     /home    ext3    defaults    1 2

The final change was to edit /boot/grub/menu.lst to tell the kernel that root=/dev/md1:

    kernel /boot/vmlinuz-2.4.22-1.2115.nptlsmp ro root=/dev/md1

I rebooted, and when the system came back up I used df to check that the RAID devices had mounted properly:

    [root@wysenburg /]# df
    Filesystem    1K-blocks    Used        Available    Use%   Mounted on
    /dev/md1      36108480     18364300    15909964     54%    /
    /dev/md5      235598184    90994224    132636188    41%    /home

I then added the last drive into the arrays:

    [root@wysenburg /]# mdadm /dev/md1 -a /dev/hde2
    [root@wysenburg /]# mdadm /dev/md5 -a /dev/hde3

The RAID arrays took a couple of hours to resynchronize and the conversion was complete. However, mdadm reports that there is a failed device.

    [root@wysenburg /]# /sbin/mdadm --detail /dev/md5
/dev/md5:
Version : 00.90.00
Creation Time : Sun Oct 16 16:03:20 2004
Raid Level : raid5
Array Size : 239355456 (228.27 GiB 245.10 GB)
Device Size : 79785152 (76.09 GiB 81.70 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Sun Oct 16 22:05:10 2004
State : dirty, no-errors
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
0 33 3 0 active sync /dev/hde3
1 34 3 1 active sync /dev/hdg3
2 56 2 2 active sync /dev/hdi2
3 57 66 3 active sync /dev/hdl2
UUID : ec7a419c:2bae6206:de50f4bc:d5c1e1d5
Events : 0.6
The failed device was the one I specified as missing when I created the array. I had the same problem with previous arrays. I don't think there is a method to correct the report other than deleting the array and creating a new one. The array works fine the way it is, although the report is a little disconcerting.
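As a sanity check on the rest of the report, the usable size of a RAID5 array should be the size of one member times one less than the number of members, since one member's worth of space holds parity. This sketch uses the figures from the mdadm output above. The helper name raid5_size is made up for this illustration.

```shell
#!/bin/sh
# raid5_size DEVICES DEVICE_SIZE_KB
# Usable RAID5 capacity: (number of devices - 1) * size of one member.
raid5_size() {
    echo $(( ($1 - 1) * $2 ))
}

raid5_size 4 79785152   # prints 239355456, the reported Array Size
```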