I may have found the solution: I reset the vdev_id.conf file to a /dev/disk/by-path configuration, then created a raidz1 pool that included a drive I expected to fault. I then sent a 6 TB write to the pool, and, as expected, tray11 faulted in the middle of the write.
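For anyone reproducing the test, any large sequential write will do. A minimal sketch, assuming the pool is mounted at /datapool (the file name is just an example, not what I actually ran):
dd if=/dev/urandom of=/datapool/testfile bs=1M count=6000000 status=progress    # roughly 6 TB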
So I’ll attach a copy of the vdev_id.conf file I used.
#_______Start of File_______________________________________________________________________
# vdev_id.conf file using disk by-path aliases
# for disk uuid:    ls -lh /dev/disk/by-uuid | grep sd       (recommended method)
# for partitions:   ls -lh /dev/disk/by-partuuid | grep sd   (must be partitioned and formatted)
# for wwn:          ls -lh /dev/disk/by-id | grep sd
# for by-path:      ls -lh /dev/disk/by-path | grep sd
#---------------------------------------------------------------------------------------------------------------
alias tray10 /dev/disk/by-path/pci-0000:05:00.0-sas-phy2-lun-0
#SN Z1ZAVMM6
alias tray11 pci-0000:05:00.0-sas-phy3-lun-0
#SN ZC116KNN
alias tray12 pci-0000:05:00.0-sas-phy1-lun-0
#SN Z1ZAVMFC
alias tray13 pci-0000:05:00.0-sas-phy0-lun-0
#SN Z1ZAYE7F
#--------------------------------------------------------------------------------------------
# once fully edited save then issue> sudo udevadm trigger
# validate new disk aliases are working issue> cd /dev/disk/by-vdev && ll
# note that the alias name may not be visible to ls / lsblk or blkid, but
# once the zpool is created it will be identified by alias name in zpool status and list reports
# if the pool was created without the aliases, export the pool then import with
# > sudo zpool import -d /dev/disk/by-vdev [poolname]
#_____________End Of File_______________________________________________________
Now for the details. Because the alias is tied to the PCI path, the tray11 slot keeps the name “pci-0000:05:00.0-sas-phy3-lun-0” regardless of any labels or other identifiers on the drive itself.
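A quick way to confirm which device an alias currently resolves to (the sdX target will vary):
ls -l /dev/disk/by-vdev/tray11    # should point at whatever sdX sits behind pci-0000:05:00.0-sas-phy3-lun-0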
sudo zpool status
  pool: datapool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 00:00:48 with 0 errors on Mon Dec 16 13:19:51 2024
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            tray10  ONLINE       0     0     0
            tray11  FAULTED      0    10     0  too many errors
            tray12  ONLINE       0     0     0
            tray13  ONLINE       0     0     0

errors: No known data errors
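For reference, the faulted member can also be taken offline explicitly before the physical pull (I simply pulled it here):
sudo zpool offline datapool tray11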
I then pulled the faulted drive and inserted the new replacement drive.
sudo zpool status datapool
  pool: datapool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Tue Dec 17 02:45:46 2024
        4.47T / 4.47T scanned, 179G / 4.47T issued at 203M/s
        0B repaired, 3.91% done, 06:08:43 to go
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            tray10  ONLINE       0     0     0
            tray11  REMOVED      0     0     0
            tray12  ONLINE       0     0     0
            tray13  ONLINE       0     0     0

errors: No known data errors
So I then issued sudo zpool replace datapool tray11, with the replacement drive sitting in the same slot as the faulted drive. According to the documentation, with autoreplace=on, simply inserting the new drive should have been enough. Before I loaded the new vdev_id.conf file, attempting to reuse the same slot on the enclosure would error out and refuse to replace the drive.
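For reference, the single-argument form was all that was needed here, since the replacement disk inherits the tray11 alias from the slot's by-path link; I believe the explicit two-argument form would also work:
sudo zpool replace datapool tray11
sudo zpool replace datapool tray11 /dev/disk/by-vdev/tray11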
But now the drive is resilvering:
$ sudo zpool status datapool
[sudo] password for mike:
  pool: datapool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 17 03:02:34 2024
        4.47T / 4.47T scanned, 590G / 4.47T issued at 261M/s
        148G resilvered, 12.90% done, 04:20:30 to go
config:

        NAME                    STATE     READ WRITE CKSUM
        datapool                DEGRADED     0     0     0
          raidz1-0              DEGRADED     0     0     0
            tray10              ONLINE       0     0     0
            replacing-1         DEGRADED     0     0     0
              tray11-part1/old  FAULTED      0     0     0  too many errors
              tray11            ONLINE       0     0     0  (resilvering)
            tray12              ONLINE       0     0     0
            tray13              ONLINE       0     0     0

errors: No known data errors
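If you'd rather block until the resilver finishes instead of polling zpool status, newer OpenZFS releases also have:
sudo zpool wait -t resilver datapool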
Although it's not exactly what the man pages describe, in my opinion it's acceptable, since I don't have to attach the replacement drive to a different slot or connection. I did not try a reboot to see if it would resilver on its own; that could well work, since even though ZED is already running, a reboot forces ZED to re-examine the drives.
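For anyone checking the same prerequisites, the two pieces autoreplace depends on can be verified with the following (the ZED service name may differ by distro):
zpool get autoreplace datapool
systemctl status zfs-zed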
I'll also add my earlier attempts to get autoreplace working. I had read a post elsewhere claiming the pool needed a spare, so I created a pool with a spare using the earlier conf files, which led to a failure.
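That spare pool was created roughly like this; a sketch reconstructed from the status output below, not the exact command from zpool history:
sudo zpool create -f -o ashift=12 -o autoexpand=on -o autoreplace=on datapool raidz1 tray10 tray11 tray12 spare tray13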
sudo zpool status
[sudo] password for mike:
  pool: datapool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME                      STATE     READ WRITE CKSUM
        datapool                  DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            tray10                ONLINE       0     0     0
            15221825480377120240  UNAVAIL      0     0     0  was /dev/disk/by-vdev/tray11-part1
            tray12                ONLINE       0     0     0
        spares
          tray13                  AVAIL

errors: No known data errors
As you can see, the spare didn't kick in, even though I had placed a drive into the tray and issued the replace command. When I actually read the output, "One or more devices could not be used because the label is missing or invalid," I went to the OpenZFS manual and checked it, and noticed a brief mention of using /dev/disk/by-path in the conf file. It didn't say explicitly that by-path should be used, but as I read it, I realized it would clear the label fault, so I figured it was worth a shot.
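For completeness, if a pool was originally created without the aliases, re-importing it against the by-vdev directory is what the conf file comments above refer to:
sudo zpool export datapool
sudo zpool import -d /dev/disk/by-vdev datapool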
Here is the pool creation history, to show that I used the aliases to create the pool, along with the settings, on the "successful" run with the NEW by-path .conf file.
2024-12-16.12:20:34 zpool create -f -o ashift=12 -o autoexpand=on -o autoreplace=on datapool raidz1 tray10 tray11 tray12 tray13
2024-12-16.12:20:45 zfs set compression=lz4 recordsize=1M xattr=sa atime=off datapool
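And to confirm those settings took effect after creation:
zpool get autoreplace,autoexpand datapool
zfs get compression,recordsize,xattr,atime datapool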