| title | description | published | date | tags | editor | dateCreated |
|---|---|---|---|---|---|---|
| Replacing Failed Disk | Guide on removing an old yeller from a BtrFS RAID 1 array (for a new yeller) | true | 2022-04-30T20:10:40.002Z | btrfs, storage, nas, filesystem | markdown | 2022-04-04T16:25:48.663Z |
One of the old 3TB yellers has started playing dirty. We do not negotiate with terrorists - a pair of 8TB disks was called in as reinforcements that very same day.
I am writing this page as I replace first the failing disk, then the non-failing one, in the BtrFS RAID1 array on Takahe.
If all goes well, this will be a nice, cozy page. If I cause catastrophic data loss (again), this shall be a monument of my failure.
Do NOT use this method on a filesystem with errors! It will copy them over and they will be unrecoverable! {.is-danger}
Crossing Disk Serial with Device Name
Ever so pretentious, smartd will name a disk by its serial - see the example below:
➜ ~ systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2022-04-04 08:01:55 IDT; 11h ago
Docs: man:smartd(8)
man:smartd.conf(5)
Main PID: 1014 (smartd)
Status: "Next check of 2 devices will start at 19:31:55"
Tasks: 1 (limit: 4915)
CPU: 85ms
CGroup: /system.slice/smartd.service
└─1014 /usr/sbin/smartd -n
Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
That's wonderful, honey.
But who is /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY?
btrfs sure as hell doesn't know:
➜ ~ btrfs filesystem show /Red-Vol
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 2.73TiB used 2.21TiB path /dev/sdc
devid 2 size 2.73TiB used 2.21TiB path /dev/sdb
udevadm to the rescue! I even looped it nicely for ya :)
➜ ~ for disk in $(btrfs filesystem show /Red-Vol/ | awk '{print $NF}' | grep "/dev"); do echo $disk && udevadm info --query=all --name=$disk | grep ID_SERIAL; done
/dev/sdc
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N3YN0903
E: ID_SERIAL_SHORT=WD-WCC4N3YN0903
/dev/sdb
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY
E: ID_SERIAL_SHORT=WD-WCC4N7UEPSDY
A-ha! /dev/sdb, you bastard!
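As a cross-check, the by-id path that smartd reports is itself a symlink to the kernel device node, so resolving it should agree with the udevadm result. A minimal sketch, assuming a readlink that supports -f; the function name resolve_disk is made up for illustration:

```shell
# Resolve a /dev/disk/by-id/... symlink to the device node it points at.
# resolve_disk is a hypothetical helper name, not part of any tool.
resolve_disk() {
    readlink -f -- "$1"
}

# Usage (path taken from the smartd log above):
# resolve_disk /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY
```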
Crossing Device Name With devid (pointless)
But wait, there's more!
The btrfs replace command expects the devid (or the device name which we already know, making this section utterly insignificant, but what the heck).
To find it, check btrfs filesystem show [mountpoint]:
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 3 FS bytes used 2.21TiB
devid 1 size 2.73TiB used 2.21TiB path /dev/sdc
devid 2 size 2.73TiB used 2.21TiB path /dev/sdb
A-ha! devid 2, you bastard!
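If squinting at the table is too much work, the lookup can be scripted. A small sketch, assuming the output format shown above (a devid row whose last field is the device path); devid_for is a hypothetical helper name:

```shell
# Read `btrfs filesystem show` output on stdin and print the devid
# of the device whose path matches the first argument.
devid_for() {
    awk -v dev="$1" '$1 == "devid" && $NF == dev { print $2 }'
}

# Usage (run as root):
# btrfs filesystem show /Red-Vol/ | devid_for /dev/sdb
```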
Replacing The Bastard
Now, run btrfs replace:
➜ ~ btrfs replace start 2 /dev/sda /Red-Vol/ -f
The -f was thrown in because I had formatted the new disk with BtrFS beforehand. I had formatted the new disk with BtrFS beforehand because I am very stupid. {.info}
Now, all that is left is watching in panic:
➜ ~ btrfs replace status /Red-Vol
1.4% done, 0 write errs, 0 uncorr. read errs
Will it work? Will it destroy ALL my data?
We shall see.
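For those who would rather not stare at the counters, the status line can be checked by a script. A rough sketch that reads a status line and complains if either error counter is non-zero; replace_errors is a hypothetical helper, and the line format is assumed to match the output shown above:

```shell
# Read `btrfs replace status` output on stdin and return non-zero
# if it reports any write errors or uncorrectable read errors.
replace_errors() {
    awk -F', *' '
        {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /write errs/)         w += $i + 0
                if ($i ~ /uncorr\. read errs/) r += $i + 0
            }
        }
        END { exit (w > 0 || r > 0) }
    '
}

# Usage (run as root; -1 prints the status once instead of continuously):
# btrfs replace status -1 /Red-Vol | replace_errors || echo "errors reported"
```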
Resizing The Bastards
Success! Now, assuming we are replacing with larger disks (go big or go home, shmub), you will have to resize each device so the filesystem can use the extra space.
First, see your devids with btrfs filesystem show:
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 2.73TiB used 2.21TiB path /dev/sda
Now, run btrfs filesystem resize [devid]:max [mountpoint]:
➜ ~ btrfs filesystem resize 1:max /Red-Vol
Resize device id 1 (/dev/sdb) from 7.28TiB to max
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 2.73TiB used 2.21TiB path /dev/sda
➜ ~ btrfs filesystem resize 2:max /Red-Vol
Resize device id 2 (/dev/sda) from 2.73TiB to max
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 7.28TiB used 2.21TiB path /dev/sda
Finally, to see your changes, remount the filesystem:
➜ ~ mount -o remount,rw /Red-Vol
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 7.28TiB used 2.21TiB path /dev/sda
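With more than a couple of disks, typing one resize per devid gets old. The steps above can be sketched as a loop; list_devids is a hypothetical helper that assumes the btrfs filesystem show format seen on this page:

```shell
# Print the devid of every device listed by `btrfs filesystem show`
# (read on stdin), one per line.
list_devids() {
    awk '$1 == "devid" { print $2 }'
}

# Usage (run as root, mountpoint from this page):
# btrfs filesystem show /Red-Vol | list_devids | while read -r id; do
#     btrfs filesystem resize "$id:max" /Red-Vol
# done
```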
Hurrah!
Mounting The Bastards
Do not go there. You know what you did. {.is-warning}
The best method to mount your new pool is by the filesystem UUID, which is shared by every disk in the pool (the per-disk identifier is UUID_SUB).
Finding the UUID is easy with blkid:
➜ ~ blkid | grep /dev/sda
/dev/sda: UUID="c2d98db0-b903-4cc2-947c-4c4c944da026" UUID_SUB="19f4df76-f50b-48c2-ad4b-1f71936440cd" BLOCK_SIZE="4096" TYPE="btrfs"
Now, go fish:
➜ ~ cat /etc/fstab
...
...
...
UUID=c2d98db0-b903-4cc2-947c-4c4c944da026 /Red-Vol/ btrfs defaults,compress=zstd:11 0 0
# ^ This friendo right here from blkid
...
...
...
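If copying the UUID by hand feels error-prone, blkid can print just the value, and the fstab line can be assembled from it. A small sketch; fstab_line is a hypothetical helper, and the mount options are simply the ones used on this page:

```shell
# Print an fstab entry for a btrfs filesystem given its UUID and mountpoint.
fstab_line() {
    printf 'UUID=%s %s btrfs defaults,compress=zstd:11 0 0\n' "$1" "$2"
}

# Usage (run as root; `blkid -s UUID -o value` prints only the UUID):
# fstab_line "$(blkid -s UUID -o value /dev/sda)" /Red-Vol/
```

Review the printed line before pasting it into /etc/fstab; the sketch deliberately does not write to the file itself.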
Or you can go by the /dev/disk/by-id path of one of the disks, which is how openSUSE did it. I do not know why, but I know they know better, you know?
...
...
...
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H /Red-Vol/ btrfs defaults,compress=zstd:11 0 0
...
...
...
Now, reboot and hope for the best.
Keep An Eye On The Bastards
Now, we add the replacement disk(s) to smartd. Edit /etc/smartd.conf and add each disk:
#DEVICESCAN
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H -a
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0SHFR0H -a
Uncommenting DEVICESCAN also works, but we do not trust it.
Balance The Bastards & Scrub The Bastards
You're not assuming nothing went wrong, are you?
Anyway, if you got this far, run btrfs balance start [mountpoint]. If that checks out, run btrfs scrub start [mountpoint]. Each of these will take many, many hours.
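The two steps can be chained so the scrub only starts if the balance came back clean. A cautious sketch: run and post_replace_checks are hypothetical helper names, and by default nothing is executed, the commands are only printed:

```shell
# Print the commands by default; set DRY_RUN=0 to actually run them (as root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Balance first, then scrub in the foreground (-B) only if the balance succeeded.
post_replace_checks() {
    run btrfs balance start "$1" && run btrfs scrub start -B "$1"
}

# Usage:
# DRY_RUN=0 post_replace_checks /Red-Vol
```

Keeping the dry run as the default is a deliberate choice here: both commands take hours, so it seems safer to read what is about to run first.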
Enjoy the rest of your day.