Adventures in BTRFS: Replacing an array

Posted on Wed 20 December 2023 in Technical

Preamble ramble

There aren't many real-world examples of using BTRFS online. Some of the most useful advice I found was in r/BTRFS on Reddit, but it's hard to find user experiences that go beyond the theoretical.
This is sort of one. The tl;dr is: have good backups and learn from your experiences. The practical stuff starts at 'The Practical Stuff'.

I've had a home disk array for a long time, for various reasons. As much as anything, it serves a home music streaming system: I prefer to buy and own music, even if only as a download, and over the years I have digitised my CDs, originally into iTunes and currently into Funkwhale.
The current disk array is a Terramaster D5-300, a five-disk JBOD enclosure that comes with Windows-based software RAID support. Being Just a Bunch Of Disks, it can also be used with filesystem-level RAID, so, as a Linux user, I had a couple of obvious choices: ZFS or BTRFS.
ZFS is an enterprise-level filesystem originally developed by Sun (RIP) for its Unix, Solaris. When Solaris was open sourced before the Oracle takeover, ZFS came with it, was ported to Linux, and continued as the OpenZFS fork. ZFS itself is a great filesystem, but at the time (2016) OpenZFS development had slowed, and it's not really a good fit for the kind of low-powered machine that goes with a home setup (I think I first set this up on an Acer Revo).
BTRFS, on the other hand, is in the mainline Linux kernel, so anything running the kernel should be able to mount it, and it only needs the userspace tools (btrfs-progs, packaged as btrfs-tools on older Debian and Ubuntu releases) to manage it. That makes it easy to configure and to move between machines if necessary.
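One quick way to check whether a given kernel can handle BTRFS is to look for it in /proc/filesystems. The Python sketch below is an illustration, not part of the original setup, and the function names are hypothetical. A filesystem built as a kernel module only appears in that list once the module has been loaded (for example by modprobe btrfs or by the first mount), so a missing entry is not conclusive.

```python
from __future__ import annotations

from pathlib import Path


def kernel_lists_btrfs(proc_filesystems_text: str) -> bool:
    """Return True if 'btrfs' appears in /proc/filesystems-style text.

    Each line is either "nodev<TAB><name>" or "<TAB><name>", so the
    filesystem name is always the last whitespace-separated field.
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "btrfs":
            return True
    return False


def read_proc_filesystems(path: str = "/proc/filesystems") -> str:
    """Read the kernel's list of registered filesystems (Linux only)."""
    return Path(path).read_text()
```

On a Linux host, kernel_lists_btrfs(read_proc_filesystems()) gives the answer for the running kernel.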

At the beginning of this month I had ethernet run around the house by Tom at WestworldIT. A full fibre upgrade and a declining powerline network meant my home office wasn't enjoying the kind of connectivity that was unimaginable when I first plugged in a modem. The work required the study to look less of a tip, which prompted a much-needed clean but also meant shutting the server down for a while. As is often the case in these situations, checking the disks with smartctl showed that a couple of the aging second-hand 3TB disks failed their tests. I already had a pair of 4TB disks, so I went looking for some more, slightly less aging, second-hand ones and found some on eBay from the Edinburgh Remakery.
While the array's disks were still holding up, I backed up the data to a spare 4TB disk and started the replacement.
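The disk checks above can be partly scripted. smartctl's exit status is a bitmask documented in smartctl(8): bit 3 means the SMART health check reported DISK FAILING, and bit 7 means the self-test log (filled by running smartctl -t long /dev/sdX and waiting) contains errors. The Python sketch below decodes those bits; it assumes smartmontools is installed and the script runs as root, and check_disk and the device path you pass it are hypothetical, not something from the original setup.

```python
from __future__ import annotations

import subprocess

# Meaning of each bit in smartctl's exit status, per smartctl(8).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or ATA command failed, or a SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains errors",
    7: "self-test log contains errors",
}


def decode_smartctl_status(code: int) -> list[str]:
    """Translate a smartctl exit status into the conditions whose bits are set."""
    return [msg for bit, msg in SMARTCTL_EXIT_BITS.items() if code & (1 << bit)]


def check_disk(device: str) -> list[str]:
    """Run 'smartctl -a <device>' and return the problems its exit status flags.

    The text output is captured and ignored; only the exit status is decoded.
    """
    result = subprocess.run(
        ["smartctl", "-a", device], capture_output=True, text=True, check=False
    )
    return decode_smartctl_status(result.returncode)
```

An empty list from check_disk("/dev/sdb") means none of the bits were set; anything mentioning DISK FAILING or the self-test log is a good reason to swap that disk out.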

The Practical Stuff

Note: there are mistakes here. If you know anything about BTRFS, you will probably be shouting at me.
The array was originally set up with data as RAID10 and metadata as RAID1c3, which keeps three copies of every metadata block, each on a different disk.
I replaced one of the failing disks with the spare and ran btrfs balance /mount. I possibly should have used btrfs replace here instead, but both seemed valid at this point. However, balance is a rich command and works best with filters, which are well described in the btrfs-balance manual page; run without any filters, it rewrites every chunk on the array, so it takes a long time.
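To make the redundancy of this layout concrete: RAID1c3 keeps three copies of each metadata block, so metadata survives the loss of any two devices, while RAID10 keeps two copies of data and is only guaranteed to survive one. The Python sketch below encodes that and builds (but does not run) an mkfs.btrfs argument list for a similar layout. The device paths are hypothetical, and mkfs.btrfs wipes whatever it is pointed at, so treat this as illustration only.

```python
from __future__ import annotations

# Copies kept per block group for the mirrored btrfs profiles discussed here.
PROFILE_COPIES = {"raid1": 2, "raid1c3": 3, "raid1c4": 4, "raid10": 2}


def tolerated_device_failures(profile: str) -> int:
    """How many devices can go missing while every block still has a copy."""
    return PROFILE_COPIES[profile.lower()] - 1


def mkfs_args(devices: list[str], data: str = "raid10",
              metadata: str = "raid1c3") -> list[str]:
    """Argument list for creating a filesystem with this layout (not executed)."""
    return ["mkfs.btrfs", "-d", data, "-m", metadata, *devices]
```

Every disk pulled from a live array eats into that margin, which is worth keeping in mind for what follows.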
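As a sketch of what a filtered balance looks like, the Python below builds a btrfs balance start command with the usage filters: -dusage=N and -musage=N only relocate data or metadata chunks that are at most N percent full, which is far quicker than an unfiltered run. The helper name and the mount point are hypothetical; the filters themselves are the ones described in the btrfs-balance manual page.

```python
from __future__ import annotations


def balance_args(mount: str, data_usage: int | None = None,
                 metadata_usage: int | None = None) -> list[str]:
    """Build a 'btrfs balance start' command with optional usage filters.

    Leaving both filters as None gives an unfiltered (full) balance.
    """
    args = ["btrfs", "balance", "start"]
    for flag, pct in (("-dusage", data_usage), ("-musage", metadata_usage)):
        if pct is not None:
            if not 0 <= pct <= 100:
                raise ValueError(f"usage filter must be 0-100, got {pct}")
            args.append(f"{flag}={pct}")
    args.append(mount)
    return args
```

Running subprocess.run(balance_args("/mount", data_usage=50), check=True) would only touch data chunks that are half full or less, and btrfs balance status /mount reports progress while it runs.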
When the new (second-hand) 4TB disks arrived the next day, I left them to warm up and let the balance complete.
The disks were a mix of 'desktop' (does anyone have a 4TB drive in a desktop these days? I suppose they do) and NAS disks, all around six years old. They all checked out with smartctl, so I started to replace the older 3TB disks using btrfs replace start <id> /dev/sdx /mount.
Each disk took about 24 hours, for around 2.5TB of data. The first two worked perfectly. Progress can be monitored with btrfs replace status /mount.
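Those figures imply a fairly modest rate: 2.5TB in 24 hours is roughly 29 MB/s, assuming 1TB = 10^12 bytes. The Python sketch below does that arithmetic and builds the replace command from the syntax above; the helper names are hypothetical, and the source can be a device path or the numeric devid shown by btrfs filesystem show.

```python
from __future__ import annotations


def replace_args(source: str, target: str, mount: str) -> list[str]:
    """'btrfs replace start <srcdev|devid> <targetdev> <mount>' as an argument list."""
    return ["btrfs", "replace", "start", source, target, mount]


def throughput_mb_per_s(bytes_copied: float, hours: float) -> float:
    """Average rate in MB/s (10**6 bytes per second) for a copy that took `hours`."""
    return bytes_copied / (hours * 3600) / 1e6


def estimated_hours(bytes_to_copy: float, mb_per_s: float) -> float:
    """Hours a copy of `bytes_to_copy` needs at an average of `mb_per_s`."""
    return bytes_to_copy / (mb_per_s * 1e6) / 3600
```

throughput_mb_per_s(2.5e12, 24) comes out at about 28.9, which is handy for guessing how long the next disk will take.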
I replaced two disks and resized them, but the third seemed to cause problems when mounting the volume. It looked like it might be the disk itself, although smartctl didn't show anything. I swapped it for the other spare 4TB disk, the one I had used for the backup (which meant copying the backup onto one of the old 3TB disks and a bit of juggling).
However, here was my mistake: I picked the last disk I had replaced to hold the backup, and that must have been one with metadata on it. That left me with only two metadata disks, the replace got stuck, and the volume would only mount read-only, a sign that it was time to get the data off and start again. You are welcome to shout now, as I did.
As I have backups, I decided not to fight it: I recreated the volume and started restoring, and it looks like it's going to be a day or so of rsyncs.
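The resize step matters because, after btrfs replace moves a 3TB device onto a 4TB disk, the filesystem keeps using only 3TB of the new disk until that device is grown with btrfs filesystem resize <devid>:max /mount. The Python sketch below builds those commands; the devids are hypothetical and would come from btrfs filesystem show /mount.

```python
from __future__ import annotations


def resize_to_max_args(devid: int, mount: str) -> list[str]:
    """'btrfs filesystem resize <devid>:max <mount>': grow one device to its full size."""
    return ["btrfs", "filesystem", "resize", f"{devid}:max", mount]


def resize_commands(devids: list[int], mount: str) -> list[list[str]]:
    """One resize command per replaced device."""
    return [resize_to_max_args(devid, mount) for devid in devids]
```

Each resize is quick compared with the replace itself, since it only changes how much of the device the filesystem may use.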

Lessons learned

  1. Maintain backups
  2. Watch what state your metadata is in between disk swaps. I'm not 100% sure this was the issue, but it's what I suspect, and balancing the metadata would probably have avoided it.
  3. Have patience: it's a slow process, and rightly so. Cloning disks appears to be an alternative, but I'm not sure how that would work with a RAID profile like RAID1c3.
  4. Maintain backups.

As a note, the current storage host is an 8GB Raspberry Pi 4, which is excellent as a server but less so for maintenance work, so that gets handed off to a laptop.