---
title: Replacing Failed Disk
description: Guide on removing an old yeller from a BtrFS RAID 1 array (for a new yeller)
published: true
date: 2022-04-30T20:10:40.002Z
tags: btrfs, storage, nas, filesystem
editor: markdown
dateCreated: 2022-04-04T16:25:48.663Z
---

One of the old 3TB yellers has started playing dirty. We do not negotiate with terrorists - a pair of 8TB's were called for reinforcement on that very same day.

Below, I will write this page as I replace the failing disk, followed by the non-failing one, in the BtrFS RAID1 array on Takahe. If all goes well, this will be a nice, cozy page. If I cause catastrophic data loss (again), this shall be a monument of my failure.

> Do **NOT** use this method to replace a filesystem with errors! It ***will*** copy them over and they ***will*** be unrecoverable!
{.is-danger}

# Crossing Disk Serial with Device Name

Ever so pretentious, `smartd` will name a disk by its serial - see this example below:

```zsh
➜ ~ systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-04-04 08:01:55 IDT; 11h ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 1014 (smartd)
     Status: "Next check of 2 devices will start at 19:31:55"
      Tasks: 1 (limit: 4915)
        CPU: 85ms
     CGroup: /system.slice/smartd.service
             └─1014 /usr/sbin/smartd -n

Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
```

That's wonderful, honey. But who is `/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY`? `btrfs` sure as hell doesn't know:

```zsh
➜ ~ btrfs filesystem show /Red-Vol
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 2 FS bytes used 2.21TiB
	devid    1 size 2.73TiB used 2.21TiB path /dev/sdc
	devid    2 size 2.73TiB used 2.21TiB path /dev/sdb
```

`udevadm` to the rescue! I even looped it nicely for ya :)

```zsh
➜ ~ for disk in $(btrfs filesystem show /Red-Vol/ | awk '{print $NF}' | grep "/dev"); do echo "$disk" && udevadm info --query=all --name="$disk" | grep ID_SERIAL; done
/dev/sdc
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N3YN0903
E: ID_SERIAL_SHORT=WD-WCC4N3YN0903
/dev/sdb
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY
E: ID_SERIAL_SHORT=WD-WCC4N7UEPSDY
```

A-ha! `/dev/sdb`, you bastard!

# Crossing Device Name With devid (pointless)

But wait, there's more! The `btrfs replace` command expects the `devid` (or the device name, which we already know, making this section utterly insignificant, but what the heck).
To find it, check `btrfs filesystem show [mountpoint]`:

```zsh
➜ ~ btrfs filesystem show /Red-Vol/
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 3 FS bytes used 2.21TiB
	devid    1 size 2.73TiB used 2.21TiB path /dev/sdc
	devid    2 size 2.73TiB used 2.21TiB path /dev/sdb
```

A-ha! `devid 2`, you bastard!

# Replacing The Bastard

Now, run `btrfs replace`, giving it the `devid` of the failing disk, the new disk, and the mountpoint:

`➜ ~ btrfs replace start -f 2 /dev/sda /Red-Vol/`

> The `-f` forces the new disk to be overwritten even though it already holds a filesystem. It was thrown in because I have chosen to format the new disk with BtrFS beforehand. I have chosen to format the new disk with BtrFS beforehand because I am very stupid.
{.is-info}

Now, all that is left is watching in panic:

```zsh
➜ ~ btrfs replace status /Red-Vol
1.4% done, 0 write errs, 0 uncorr. read errs
```

Will it work? Will it destroy ALL my data? We shall see.

# Resizing The Bastards

Success! Now, assuming we are replacing with larger disks (go big or go home, shmub), you will have to resize each device so the filesystem actually uses the extra space. First, see your `devid`'s with `btrfs filesystem show`:

```
➜ ~ btrfs filesystem show /Red-Vol/
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 2 FS bytes used 2.21TiB
	devid    1 size 7.28TiB used 2.21TiB path /dev/sdb
	devid    2 size 2.73TiB used 2.21TiB path /dev/sda
```

Now, run `btrfs filesystem resize [devid]:max [mountpoint]` once per device:

```
➜ ~ btrfs filesystem resize 1:max /Red-Vol
Resize device id 1 (/dev/sdb) from 7.28TiB to max

➜ ~ btrfs filesystem show /Red-Vol/
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 2 FS bytes used 2.21TiB
	devid    1 size 7.28TiB used 2.21TiB path /dev/sdb
	devid    2 size 2.73TiB used 2.21TiB path /dev/sda

➜ ~ btrfs filesystem resize 2:max /Red-Vol
Resize device id 2 (/dev/sda) from 2.73TiB to max

➜ ~ btrfs filesystem show /Red-Vol/
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 2 FS bytes used 2.21TiB
	devid    1 size 7.28TiB used 2.21TiB path /dev/sdb
	devid    2 size 7.28TiB used 2.21TiB path /dev/sda
```

Finally, to see your changes, remount the filesystem:

```
➜ ~ mount -o remount,rw /Red-Vol

➜ ~ btrfs filesystem show /Red-Vol/
Label: none  uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
	Total devices 2 FS bytes used 2.21TiB
	devid    1 size 7.28TiB used 2.21TiB path /dev/sdb
	devid    2 size 7.28TiB used 2.21TiB path /dev/sda
```

Hurrah!

# Mounting The Bastards

> Do not go there. You know what you did.
{.is-warning}

The best method to mount your new pool is by the filesystem `UUID`. Every disk in the array reports the same `UUID` (the per-disk identifier is `UUID_SUB`), so it does not matter which disk you ask. Finding the `UUID` is easy with `blkid`:

```zsh
➜ ~ blkid | grep /dev/sda
/dev/sda: UUID="c2d98db0-b903-4cc2-947c-4c4c944da026" UUID_SUB="19f4df76-f50b-48c2-ad4b-1f71936440cd" BLOCK_SIZE="4096" TYPE="btrfs"
```

Now, go fish:

```
➜ ~ cat /etc/fstab
...
...
...
UUID=c2d98db0-b903-4cc2-947c-4c4c944da026  /Red-Vol/  btrfs  defaults,compress=zstd:11  0  0
#    ^ This friendo right here from blkid
...
...
...
```

Or you can go by the `/dev/disk/by-id` path of one of the disks, which is how openSUSE did it. I do not know why but I know they know better, you know?

```
...
...
...
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H  /Red-Vol/  btrfs  defaults,compress=zstd:11  0  0
...
...
...
```

Now, reboot and hope for the best.

# Keep An Eye On The Bastards

Now, we add the disk(s) we replaced to `smartd`. Edit `/etc/smartd.conf` and add one line per disk:

```conf
#DEVICESCAN
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H -a
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0SHFR0H -a
```

Uncommenting `DEVICESCAN` also works, but we do not trust it.

# Balance The Bastards & Scrub The Bastards

You're not assuming nothing went wrong, are you? Anyway, if you got this far, run `btrfs balance start [mountpoint]`. If that checks out, run `btrfs scrub start [mountpoint]`. Each of these will take many, many hours.

Enjoy the rest of your day.
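
If you would rather not babysit both steps, the balance-then-scrub sequence can be sketched as a small script. This is only a sketch under assumptions: `run`, `post_replace_checks`, `DRY_RUN` and `MOUNTPOINT` are names made up for this page, not anything `btrfs` ships. By default it only prints what it would do; set `DRY_RUN=0` to actually run the commands. `-B` keeps the scrub in the foreground so the script waits for it, and `btrfs device stats` prints the per-device error counters at the end.

```shell
#!/bin/sh
# Hypothetical helper, not part of btrfs-progs.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Balance first; scrub only if the balance succeeded; then show error counters.
post_replace_checks() {
    mountpoint="$1"
    run btrfs balance start "$mountpoint" || return 1
    # -B: stay in the foreground until the scrub finishes
    run btrfs scrub start -B "$mountpoint" || return 1
    run btrfs device stats "$mountpoint"
}

# MOUNTPOINT is an assumed override; it falls back to the pool from this page.
post_replace_checks "${MOUNTPOINT:-/Red-Vol}"
```

Run it once as-is to check the commands look right, then again with `DRY_RUN=0` as root. It will still take many, many hours.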