---
title: Replacing Failed Disk
description: Guide on removing an old yeller from a BtrFS RAID 1 array (for a new yeller)
published: true
date: 2022-04-30T20:10:40.002Z
tags: btrfs, storage, nas, filesystem
editor: markdown
dateCreated: 2022-04-04T16:25:48.663Z
---

One of the old 3TB yellers has started playing dirty.
We do not negotiate with terrorists - a pair of 8TB disks was called in for reinforcement that very same day.

I am writing this page as I replace the failing disk, and then the non-failing one, in the BtrFS RAID1 array on Takahe.

If all goes well, this will be a nice, cozy page. If I cause catastrophic data loss (again), this shall be a monument to my failure.

> Do **NOT** use this method to replace a filesystem with errors! It ***will*** copy them over and they ***will*** be unrecoverable!
{.is-danger}

# Crossing Disk Serial with Device Name

Ever so pretentious, `smartd` will name a disk by its serial, as in this example:
```zsh
➜ ~ systemctl status smartd
● smartd.service - Self Monitoring and Reporting Technology (SMART) Daemon
     Loaded: loaded (/usr/lib/systemd/system/smartd.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-04-04 08:01:55 IDT; 11h ago
       Docs: man:smartd(8)
             man:smartd.conf(5)
   Main PID: 1014 (smartd)
     Status: "Next check of 2 devices will start at 19:31:55"
      Tasks: 1 (limit: 4915)
        CPU: 85ms
     CGroup: /system.slice/smartd.service
             └─1014 /usr/sbin/smartd -n

Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 17:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:01:56 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 18:31:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 2 Currently unreadable (pending) sectors
Apr 04 19:01:55 Takahe smartd[1014]: Device: /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY [SAT], 4 Offline uncorrectable sectors
```

That's wonderful, honey.
But who is `/dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY`?

`btrfs` sure as hell doesn't know:
```zsh
➜ ~ btrfs filesystem show /Red-Vol
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 2.73TiB used 2.21TiB path /dev/sdc
devid 2 size 2.73TiB used 2.21TiB path /dev/sdb
```
`udevadm` to the rescue! I even looped it nicely for ya :)

```zsh
➜ ~ for disk in $(btrfs filesystem show /Red-Vol/ | awk '{print $NF}' | grep "/dev"); do echo "$disk" && udevadm info --query=all --name="$disk" | grep ID_SERIAL; done
/dev/sdc
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N3YN0903
E: ID_SERIAL_SHORT=WD-WCC4N3YN0903
/dev/sdb
E: ID_SERIAL=WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY
E: ID_SERIAL_SHORT=WD-WCC4N7UEPSDY
```

A-ha! `/dev/sdb`, you bastard!

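Going the other way is shorter. The entries under `/dev/disk/by-id/` are symlinks to the kernel device nodes, so resolving the link answers the question directly. A minimal sketch, assuming a `readlink` that supports `-f` (GNU coreutils does); the helper name `whois_disk` is made up for this page:

```sh
# Resolve a /dev/disk/by-id/ entry (a symlink) to the device node it points to.
# whois_disk is a hypothetical helper name, not part of any tool.
whois_disk() {
    readlink -f "$1"
}

# Usage (path taken from the smartd log on this page):
#   whois_disk /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N7UEPSDY
```

The loop is still the better tool when you start from the `btrfs` side and want every member disk's serial in one go.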
# Crossing Device Name With devid (pointless)
But wait, there's more!
The `btrfs replace` command accepts either the `devid` or the device name as its source. We already know the device name, which makes this section utterly insignificant, but what the heck.

To find the `devid`, check `btrfs filesystem show [mountpoint]`:
```zsh
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 3 FS bytes used 2.21TiB
devid 1 size 2.73TiB used 2.21TiB path /dev/sdc
devid 2 size 2.73TiB used 2.21TiB path /dev/sdb
```

A-ha! `devid 2`, you bastard!

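If you would rather not eyeball it, the lookup can be scripted. A minimal sketch, assuming the output keeps the `devid N ... path /dev/...` layout shown on this page; `devid_of` is a hypothetical helper name, and it reads the `btrfs filesystem show` output on stdin:

```sh
# Print the devid that `btrfs filesystem show` lists for a given device path.
# Reads the command's output on stdin; devid_of is a hypothetical helper name.
devid_of() {
    awk -v dev="$1" '$1 == "devid" && $NF == dev { print $2 }'
}

# Usage:
#   btrfs filesystem show /Red-Vol | devid_of /dev/sdb
```

It prints nothing if the device is not part of the filesystem, which is worth checking for before feeding the result to anything destructive.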
# Replacing The Bastard

Now, run `btrfs replace`. Here `2` is the `devid` of the failing disk and `/dev/sda` is the new one:
`➜ ~ btrfs replace start 2 /dev/sda /Red-Vol/ -f`
> The `-f` forces `btrfs replace` to overwrite a target that already holds a filesystem. It was thrown in because I had chosen to format the new disk with BtrFS beforehand. I had chosen to format the new disk with BtrFS beforehand because I am very stupid.
{.is-info}

Now, all that is left is watching in panic:
```zsh
➜ ~ btrfs replace status /Red-Vol
1.4% done, 0 write errs, 0 uncorr. read errs
```

Will it work? Will it destroy ALL my data?

We shall see.

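While you wait, the two error counters on that status line are the part worth watching. A minimal sketch, assuming the line keeps the `N write errs, N uncorr. read errs` wording shown above; `replace_has_errors` is a hypothetical helper name:

```sh
# Succeed (exit 0) if a `btrfs replace status` line reports a non-zero count
# of write errors or uncorrectable read errors. Reads the status text on stdin.
# replace_has_errors is a hypothetical helper name.
replace_has_errors() {
    grep -Eq '[1-9][0-9]* (write|uncorr\. read) errs'
}

# Usage (-1 asks `btrfs replace status` to print once instead of continuously):
#   btrfs replace status -1 /Red-Vol | replace_has_errors && echo "errors reported"
```

The pattern only matches counts that start with a non-zero digit, so `0 write errs` stays quiet and `10 write errs` does not.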
# Resizing The Bastards

Success! Now, assuming you are replacing with larger disks (go big or go home, shmub), you will have to grow each device to fill its new disk.
First, see your `devid`s with `btrfs filesystem show`:
```
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 2.73TiB used 2.21TiB path /dev/sda
```
Now, run `btrfs filesystem resize [devid]:max [mountpoint]`:
```
➜ ~ btrfs filesystem resize 1:max /Red-Vol
Resize device id 1 (/dev/sdb) from 7.28TiB to max
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 2.73TiB used 2.21TiB path /dev/sda

➜ ~ btrfs filesystem resize 2:max /Red-Vol
Resize device id 2 (/dev/sda) from 2.73TiB to max
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 7.28TiB used 2.21TiB path /dev/sda
```
Finally, to see your changes, remount the filesystem:
```
➜ ~ mount -o remount,rw /Red-Vol
➜ ~ btrfs filesystem show /Red-Vol/
Label: none uuid: c2d98db0-b903-4cc2-947c-4c4c944da026
Total devices 2 FS bytes used 2.21TiB
devid 1 size 7.28TiB used 2.21TiB path /dev/sdb
devid 2 size 7.28TiB used 2.21TiB path /dev/sda
```
Hurrah!

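With more than two disks, typing one resize per `devid` gets old. A minimal sketch that only prints the commands, assuming the `devid N ...` layout shown on this page and a mountpoint without spaces; `resize_cmds` is a hypothetical helper name:

```sh
# Print one `btrfs filesystem resize <devid>:max <mountpoint>` command per
# device listed by `btrfs filesystem show` (read on stdin). Nothing is executed.
# resize_cmds is a hypothetical helper name.
resize_cmds() {
    awk -v mnt="$1" '$1 == "devid" { printf "btrfs filesystem resize %s:max %s\n", $2, mnt }'
}

# Usage: review the output first, then pipe it to sh to run it.
#   btrfs filesystem show /Red-Vol | resize_cmds /Red-Vol
#   btrfs filesystem show /Red-Vol | resize_cmds /Red-Vol | sh
```

Printing instead of executing is deliberate: on a page with this track record, a dry run costs nothing.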
# Mounting The Bastards
> Do not go there. You know what you did.
{.is-warning}

The best way to mount your new pool is by the filesystem `UUID`. Every disk in the array reports the same one (the per-disk identifier is `UUID_SUB`), so any member disk will tell you what it is.

Finding the `UUID` is easy with `blkid`:
```zsh
➜ ~ blkid | grep /dev/sda
/dev/sda: UUID="c2d98db0-b903-4cc2-947c-4c4c944da026" UUID_SUB="19f4df76-f50b-48c2-ad4b-1f71936440cd" BLOCK_SIZE="4096" TYPE="btrfs"
```
Now, go fish:
```
➜ ~ cat /etc/fstab
...
...
...
UUID=c2d98db0-b903-4cc2-947c-4c4c944da026 /Red-Vol/ btrfs defaults,compress=zstd:11 0 0
# ^ This friendo right here from blkid
...
...
...
```
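Copy-pasting a UUID by hand is a fine way to end up on the wrong side of the warning above. A minimal sketch that builds the line instead; `fstab_line` is a hypothetical helper name, and the mount options are simply the ones used on this page:

```sh
# Build an fstab line for a btrfs filesystem from its UUID and mountpoint.
# fstab_line is a hypothetical helper; the options are the ones from this page.
fstab_line() {
    printf 'UUID=%s %s btrfs defaults,compress=zstd:11 0 0\n' "$1" "$2"
}

# Usage: `blkid -s UUID -o value <device>` prints only that device's UUID.
#   fstab_line "$(blkid -s UUID -o value /dev/sda)" /Red-Vol/
```

If your util-linux ships it, `findmnt --verify` is worth running after editing `/etc/fstab` and before any reboot.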
Or you can go by the disk's `by-id` path, which is how openSUSE did it. I do not know why, but I know they know better, you know?
```
...
...
...
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H /Red-Vol/ btrfs defaults,compress=zstd:11 0 0
...
...
...
```

Now, reboot and hope for the best.

# Keep An Eye On The Bastards
Now, we add the new disk(s) to `smartd`. Edit `/etc/smartd.conf` and add them:
```conf
#DEVICESCAN
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0PDFR0H -a
/dev/disk/by-id/ata-TOSHIBA_HDWG480_71Q0A0SHFR0H -a
```
Uncommenting `DEVICESCAN` also works, but we do not trust it.

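Those lines can be generated rather than typed. A minimal sketch, assuming SATA disks (hence the `ata-` prefix) and that partition entries end in `-partN`; `smartd_lines` is a hypothetical helper name, and the directory argument exists mostly so it can be tried against a scratch folder:

```sh
# Print a smartd.conf line (-a: monitor everything) for each whole disk found
# under a by-id directory, skipping partition entries.
# smartd_lines is a hypothetical helper; the ata-* pattern assumes SATA disks.
smartd_lines() {
    dir=${1:-/dev/disk/by-id}
    ls "$dir" | while read -r name; do
        case $name in
            ata-*-part[0-9]*) ;;                          # partition entry: skip
            ata-*) printf '%s/%s -a\n' "$dir" "$name" ;;  # whole disk: emit a line
        esac
    done
}

# Usage: review the output, then paste the lines you want into /etc/smartd.conf.
#   smartd_lines
```

It lists every `ata-` disk on the box, not only the ones in the array, so prune before pasting.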
# Balance The Bastards & Scrub The Bastards
You're not assuming nothing went wrong, are you?

Anyway, if you got this far, run `btrfs balance start [mountpoint]`. If that checks out, run `btrfs scrub start [mountpoint]`. Each of these will take many, many hours.

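Since "if that checks out" is doing a lot of work in that sentence, the two steps can be chained so the scrub only starts when the balance exits cleanly. A minimal sketch that prints the chained command rather than running it; `post_replace_cmd` is a hypothetical helper name. As I read the man pages, `--full-balance` skips the warning and countdown that `btrfs balance start` shows for an unfiltered balance, and `-B` keeps `btrfs scrub start` in the foreground until it finishes:

```sh
# Print a single command line that runs a full balance and, only if it
# succeeds, a foreground scrub of the same mountpoint. Nothing is executed.
# post_replace_cmd is a hypothetical helper name.
post_replace_cmd() {
    printf 'btrfs balance start --full-balance %s && btrfs scrub start -B %s\n' "$1" "$1"
}

# Usage: review the output, then pipe it to sh (inside tmux or screen, ideally).
#   post_replace_cmd /Red-Vol
#   post_replace_cmd /Red-Vol | sh
```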
Enjoy the rest of your day.