Vdev_id.conf Question

Currently I’m using device link aliases in a vdev_id.conf (by-vdev) to identify the drives in my pool. Everything is working fine, and I don’t currently have plans to change it. This is just a question for my own and others’ education.
BUT…
I did some more research into the ZFS autoreplace feature, which doesn’t work with a device link alias. So I dug a bit deeper and found out why: to use autoreplace, the replacement drive has to come up under the same name as the drive it replaces. When using vdev_id.conf the way I am, the naming is tied to the disk itself (WWN or UUID), so a new disk gets a new name.
The man pages address non-multipath and multipath SAS configurations with examples that give the drives slot-based names (the same name no matter which disk is in the slot). So, if I understand the man pages, this “should” be correct???
First, lspci:

lspci -knn | grep 'LSI'
03:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
        Subsystem: Broadcom / LSI SAS 9300-16i [1000:3130]
05:00.0 Serial Attached SCSI controller [0107]: Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 [1000:0097] (rev 02)
        Subsystem: Broadcom / LSI SAS 9300-16i [1000:3130]

Now, a non-multipath vdev_id.conf that I “think will work”… at least a 90% chance of being correct, maybe???

# A non-multipath configuration with direct-attached SAS enclosures and an arbitrary slot re-mapping:

# by-vdev using SAS topology on the HBA
multipath     no
topology      sas_direct
phys_per_port 4
slot          bay

#       PCI_SLOT HBA PORT  CHANNEL NAME
channel 03:00.0  0         A
channel 03:00.0  0         B
channel 05:00.0  1         C
channel 05:00.0  1         D


# Custom mapping

#    Linux      Mapped
#    Slot       Slot      Channel
slot 1          0         A
#SN
slot 2          1         A
#SN
slot 3          2         A
slot 4          3         A
slot 5          4         B
slot 6          5         B
slot 7          6         B
slot 8          7         B
slot 9          0         C
slot 10         1         C
slot 11         2         C
slot 12         3         C
slot 13         4         D
slot 14         5         D
slot 15         6         D
slot 16         7         D

Looking it over and thinking about the configuration, I’m concerned, confused, or some combination of the two about something that could cause an error.
The LSI card has 4 ports, so 4 drives per port x 4 = 16.
This is the part that gets me: when I go into the HBA configuration, it combines the two ports on 03:00.0 into one.

phys_per_port 4
slot          bay

#       PCI_SLOT HBA PORT  CHANNEL NAME
channel 03:00.0  0         A
channel 03:00.0  0         B

I’m wondering if it shouldn’t read

phys_per_port 8
slot          bay

#       PCI_SLOT HBA PORT  CHANNEL NAME
channel 03:00.0  0         A
channel 05:00.0  0         B
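
One way to sanity-check the phy/port layout without pulling hardware (hedged; the sas_phy entries only exist on SAS setups, and lsscsi may need to be installed first):

# list the SAS phys the kernel sees, named phy-<host>:<phy number>
ls /sys/class/sas_phy
# show each disk with the SAS transport address it sits behind
lsscsi -t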

This isn’t a rush or urgent question, so no issues if there are no quick replies.
(If it were a rush, LOL, I’d pull the spare motherboard and one of the two spare HBAs I have on the shelf and try it.)
(P.S. I’m probably overthinking it.)


I finally broke down, edited the vdev_id.conf, and after a few attempts got it to show the drives. I was then able to create a zpool with the generated names.
I wound up with the following as a working vdev_id.conf by going through the controller (as it was the only one to pick up all the drives):

# A non-multipath configuration with direct-attached SAS enclosures and an arbitrary slot re-mapping:
# validate pci address > lspci -knn | grep 'LSI'  or use lspci -knn | grep 'controller'
# by-vdev using SAS topology on the HBA

multipath     no
topology      sas_direct
phys_per_port 4
slot          bay

#       PCI_SLOT HBA PORT  CHANNEL NAME
channel 03:00.0  0         Drive
channel 03:00.0  1         Drive
channel 05:00.0  0         Tray
channel 05:00.0  1         Tray


# Custom mapping

#    Linux      Mapped
#    Slot       Slot      Channel
slot 1          0         A
#SN
slot 2          1         A
#SN
slot 3          2         A
#SN
slot 4          3         A
#SN
slot 5          4         B
#SN
slot 6          5         B
#SN
slot 7          6         B
#SN
slot 8          7         B
#SN
slot 9          0         C
#SN
slot 10         1         C
#SN
slot 11         2         C
#SN
slot 12         3         C
#SN
slot 13         4         D
#SN
slot 14         5         D
#SN
slot 15         6         D
#SN
slot 16         7         D
#SN

#--------------------------
# once fully edited save then issue> sudo udevadm trigger
# validate new disk aliases are working issue> cd /dev/disk/by-vdev && ll
# note that the alias names may not be visible to ls / lsblk or blkid, but
# once the zpool is created it will identify the drives by slot number (akin
# to a drive number) in status and informational reports
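
A possibly useful trick while iterating on the file: the udev helper that does the mapping can be run by hand against a single disk. I believe it lives at /lib/udev/vdev_id on Ubuntu (or /usr/lib/udev/vdev_id on some distros), and it should print an ID_VDEV= line when the config matches:

# dry-run the mapping for one disk; prints ID_VDEV=<alias> if the config applies
sudo /lib/udev/vdev_id -d sda
# then reload udev and confirm the by-vdev links appear
sudo udevadm trigger
ls -l /dev/disk/by-vdev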

This is the command I used, which generated this output:

sudo zpool create -f -o ashift=12 -o autoexpand=on -o autoreplace=on datapool raidz1 Tray0 Tray1 Tray2 Tray3
mike@Beastie:/dev/disk/by-vdev$ cd ~
mike@Beastie:~$ sudo zpool status
  pool: datapool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            Tray0   ONLINE       0     0     0
            Tray1   ONLINE       0     0     0
            Tray2   ONLINE       0     0     0
            Tray3   ONLINE       0     0     0
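
To double-check that the pool properties actually took, and to see which underlying device paths the aliases resolved to:

# confirm the properties set at creation
sudo zpool get autoreplace,autoexpand,ashift datapool
# show the full device paths behind each pool member
sudo zpool status -P datapool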

So this led me to the actual reason for this pain in the 5th point of contact… autoreplace=on. I faulted a drive, and per the Ubuntu man page I should only have to remove the faulted drive, insert a new drive, and boom, resilver… Nope… No can do, G.I.
I read more on other sites (Ask Ubuntu, Stack Exchange, pretty much you name it, I went there): the advice being that the drives can’t be partitioned when the vdevs are created in the pool. OK, I tried that and about 6 other ways mentioned; I lost count. Now, would the pool advise me to use the manual replace command? Yes.
If there is a way, please post it. But for now I’ve returned the NFS server to the device link alias .conf file.
If autoreplace isn’t going to work and I still have to manually issue a command to replace a drive, I’d prefer the device link method; it’s easier to read and set up. (Just my honest opinion; heck, personally I don’t really mind issuing the replace command.)
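
For reference, the manual replace I keep falling back to is just the standard form (same bracket style as my conf comments; the new device can be omitted when the replacement comes up on the exact same path/alias as the old one):

# old device by its pool name/alias, new device by path or alias
sudo zpool replace [poolname] [old-device] [new-device]
# if the new drive shows up under the same alias, this shorter form should do
sudo zpool replace [poolname] [old-device]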
Here is a sample of my current vdev_id.conf file after all the testing/attempts to get autoreplace to perform as stated in the man pages, just so others can see.

#    by-vdev using device link aliases
#for disk uuid ls -lh /dev/disk/by-uuid |grep sd*# <...(recommended method)
#for partitions ls -lh /dev/disk/by-partuuid |grep sd*# < must be partitioned and formatted
#for wwn# ls -lh /dev/disk/by-id |grep sd*#
#------------------------------------------------------------------------------
# setup for mediapool1 raidz2 9 drives wide external bays
# For an additional pools use internal drive bays within beastie's case and other bays
# of existing systems
#------------------------------------------------------------------------------
#     name         fully qualified or base name of device link
alias beastdrive1      /dev/disk/by-id/wwn-0x5000cca2430f81c8-part1
#SN NHG8JBPN
alias beastdrive2      wwn-0x5000cca2430eb1c4-part1
#SN NHG82J7N
alias beastdrive3      wwn-0x5000cca2430e6bd4-part1
#SN NHG7XVVN
alias beastdrive4      wwn-0x5000cca2430f88b0-part1
#SN NHG8JUYN
alias beastdrive5      wwn-0x5000cca2430efdac-part1
#SN NHG87KYN
alias beastdrive6      wwn-0x5000cca2430f90fc-part1
#SN NHG8KD2N
alias beastdrive7      wwn-0x5000cca2430f7bd0-part1
#SN NHG8HZBN
alias beastdrive8      wwn-0x5000cca2430f8154-part1
#SN NHG8JASN
alias beastdrive9      wwn-0x5000cca2430f779c-part1
#SN NHG8HPPN
# data pool start
alias tray10          wwn-0x5000c500855fc8c7
#SN ZC116KNN
alias tray11          wwn-0x5000c50085716b6b
#SN ZC11WJ9N
alias tray12          wwn-0x5000c500579d0a2b
#SN ZC16AC1P
alias tray13          wwn-0x5000c500579c809b
#SN Z1ZAYE7F
#--------------------------------------------------------
# once fully edited save then issue> sudo udevadm trigger
# validate new disk aliases are working issue> cd /dev/disk/by-vdev && ll

But even with this, lsblk etc. will not list the drive aliases.
When checking the pool via status or list, though, the aliases are present.
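
If you want to see which kernel device an alias currently points at, the symlinks under /dev/disk/by-vdev can be followed directly (a quick check, nothing ZFS-specific):

# show the symlink targets for all aliases
ls -l /dev/disk/by-vdev
# or resolve a single alias to its /dev/sdX node
readlink -f /dev/disk/by-vdev/tray10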

sudo zpool status
  pool: datapool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            tray10  ONLINE       0     0     0
            tray11  ONLINE       0     0     0
            tray12  ONLINE       0     0     0
            tray13  ONLINE       0     0     0

errors: No known data errors

  pool: mediapool1
 state: ONLINE
  scan: scrub repaired 0B in 01:45:33 with 0 errors on Sun Dec 15 09:50:09 2024
config:

        NAME                   STATE     READ WRITE CKSUM
        mediapool1             ONLINE       0     0     0
          raidz2-0             ONLINE       0     0     0
            beastdrive1-part1  ONLINE       0     0     0
            beastdrive2-part1  ONLINE       0     0     0
            beastdrive3-part1  ONLINE       0     0     0
            beastdrive4-part1  ONLINE       0     0     0
            beastdrive5-part1  ONLINE       0     0     0
            beastdrive6-part1  ONLINE       0     0     0
            beastdrive7-part1  ONLINE       0     0     0
            beastdrive8-part1  ONLINE       0     0     0
            beastdrive9-part1  ONLINE       0     0     0

In the mediapool1 pool, where it lists e.g. beastdrive#-part1, that is because I actually partitioned the drives prior to creating the pool, whereas in datapool I used the whole disk, which netted a much cleaner output. Although, when I run lsblk -o type,name,label,partlabel,size,model,serial,wwn
I get this

TYPE NAME     LABEL      PARTLABEL              SIZE MODEL                          SERIAL       WWN
disk sda                                        3.6T MB4000JEQNL                    NHG8JBPN     0x5000cca2430f81c8
part └─sda1   mediapool1 primary                3.6T                                             0x5000cca2430f81c8
disk sdb                                        3.6T MB4000JEQNL                    NHG8JUYN     0x5000cca2430f88b0
part └─sdb1   mediapool1 primary                3.6T                                             0x5000cca2430f88b0
disk sdc                                        3.6T MB4000JEQNL                    NHG7XVVN     0x5000cca2430e6bd4
part └─sdc1   mediapool1 primary                3.6T                                             0x5000cca2430e6bd4
disk sdd                                        3.6T MB4000JEQNL                    NHG82J7N     0x5000cca2430eb1c4
part └─sdd1   mediapool1 primary                3.6T                                             0x5000cca2430eb1c4
disk sde                                        3.6T MB4000JEQNL                    NHG8JASN     0x5000cca2430f8154
part └─sde1   mediapool1 primary                3.6T                                             0x5000cca2430f8154
disk sdf                                        3.6T MB4000JEQNL                    NHG8HZBN     0x5000cca2430f7bd0
part └─sdf1   mediapool1 primary                3.6T                                             0x5000cca2430f7bd0
disk sdg                                        3.6T MB4000JEQNL                    NHG87KYN     0x5000cca2430efdac
part └─sdg1   mediapool1 primary                3.6T                                             0x5000cca2430efdac
disk sdh                                        3.6T MB4000JEQNL                    NHG8KD2N     0x5000cca2430f90fc
part └─sdh1   mediapool1 primary                3.6T                                             0x5000cca2430f90fc
disk sdi                                        3.6T MB4000JEQNL                    NHG8HPPN     0x5000cca2430f779c
part └─sdi1   mediapool1 primary                3.6T                                             0x5000cca2430f779c
disk sdj                                        3.6T ST4000NM0023                   Z1ZAVMM6     0x5000c500855fc8c7
part ├─sdj1   datapool   zfs-93d9439c160d5a98   3.6T                                             0x5000c500855fc8c7
part └─sdj9                                       8M                                             0x5000c500855fc8c7
disk sdk                                        3.6T ST4000NM0023                   Z1ZAYE7F     0x5000c50085716b6b
part ├─sdk1   datapool   zfs-44ef2f25f8135cac   3.6T                                             0x5000c50085716b6b
part └─sdk9                                       8M                                             0x5000c50085716b6b
disk sdl                                        3.6T ST4000NM0023                   Z1Z2DRWH     0x5000c500579d0a2b
part ├─sdl1   datapool   zfs-feb8c165f4f1939d   3.6T                                             0x5000c500579d0a2b
part └─sdl9                                       8M                                             0x5000c500579d0a2b
disk sdm                                        3.6T ST4000NM0023                   Z1Z2DBZC     0x5000c500579c809b
part ├─sdm1   datapool   zfs-15097ad7ae2c0ecf   3.6T                                             0x5000c500579c809b
part └─sdm9                                       8M                                             0x5000c500579c809b

Which shows datapool’s disks with two partitions (yes, I know ZFS is supposed to do that), but when viewing the pool via list/status etc., it doesn’t show partitions 1 and 9.
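
For anyone reusing a previously partitioned drive as a whole-disk vdev: something along these lines should clear the old labels first (destructive, and /dev/sdX is only a placeholder, so triple-check the device):

# remove any old ZFS label on the disk
sudo zpool labelclear -f /dev/sdX
# wipe remaining filesystem / partition-table signatures
sudo wipefs -a /dev/sdX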
Oh, and if somebody brings up ZED: yes, ZED was running and active during this, which, if my understanding is correct, should have triggered the replacement.
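
For completeness, this is roughly how to confirm ZED is alive; and if my reading of zed.rc is right, the hot-spare behaviour is governed by the ZED_SPARE_ON_* settings, which I believe ship commented out:

# confirm the ZFS event daemon is running
systemctl status zfs-zed
# check the hot-spare settings in the ZED config
grep -E 'ZED_SPARE_ON_(IO|CHECKSUM)_ERRORS' /etc/zfs/zed.d/zed.rc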

I might have found the solution. I reset the vdev_id.conf file to a /dev/disk/by-path configuration, then established a raidz1 pool using a drive I figured would fault. I then sent a 6 TB write to the pool, and that is exactly what happened: in the middle of the write, tray11 went into a faulted state.
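
(If you don’t have a drive that will conveniently die on cue, I believe a forced fault can simulate the same condition for testing, though I used a genuinely failing drive instead:)

# force a member into a FAULTED state for testing
sudo zpool offline -f datapool tray11
# bring it back afterwards
sudo zpool online datapool tray11
sudo zpool clear datapool tray11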
So I’ll attach a copy of the vdev_id.conf file I used.

#_______Start of File_______________________________________________________________________
#    vdev_id.conf file using disk by-path aliases
#for disk uuid ls -lh /dev/disk/by-uuid |grep sd*# <...(recommended method)
#for partitions ls -lh /dev/disk/by-partuuid |grep sd*# < must be partitioned and formatted
#for wwn# ls -lh /dev/disk/by-id |grep sd*#
# for by-path ls -lh /dev/disk/by-path |grep sd*
#---------------------------------------------------------------------------------------------------------------
alias tray10          /dev/disk/by-path/pci-0000:05:00.0-sas-phy2-lun-0   
#SN Z1ZAVMM6
alias tray11          pci-0000:05:00.0-sas-phy3-lun-0
#SN ZC116KNN
alias tray12          pci-0000:05:00.0-sas-phy1-lun-0
#SN Z1ZAVMFC
alias tray13          pci-0000:05:00.0-sas-phy0-lun-0
#SN Z1ZAYE7F

#--------------------------------------------------------------------------------------------
# once fully edited save then issue> sudo udevadm trigger
# validate new disk aliases are working issue> cd /dev/disk/by-vdev && ll
# note that the alias name may not be visible to ls / lsblk or blkid but
# once the zpool is created the zpool will ID by alias name in zpool status and list reports
# if pool is created without alias export the pool then import with 
# >  sudo zpool import -d /dev/disk/by-vdev [poolname]
#_____________End Of File_______________________________________________________

Now, for the details: because the alias is tied to the PCI path, whatever drive sits in that slot assumes that name (“pci-0000:05:00.0-sas-phy3-lun-0”) in lieu of any labels or other names on the drive.

sudo zpool status
  pool: datapool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0B in 00:00:48 with 0 errors on Mon Dec 16 13:19:51 2024
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            tray10  ONLINE       0     0     0
            tray11  FAULTED      0    10     0  too many errors
            tray12  ONLINE       0     0     0
            tray13  ONLINE       0     0     0

errors: No known data errors

I then pulled the faulted drive and inserted the new replacement drive.

sudo zpool status datapool
  pool: datapool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub in progress since Tue Dec 17 02:45:46 2024
        4.47T / 4.47T scanned, 179G / 4.47T issued at 203M/s
        0B repaired, 3.91% done, 06:08:43 to go
config:

        NAME        STATE     READ WRITE CKSUM
        datapool    DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            tray10  ONLINE       0     0     0
            tray11  REMOVED      0     0     0
            tray12  ONLINE       0     0     0
            tray13  ONLINE       0     0     0

errors: No known data errors

So I then issued sudo zpool replace datapool tray11 with the replacement drive in the same slot as the faulted drive. According to the documentation, it should have been just a matter of inserting the new drive.
Before I loaded the new vdev_id.conf file, when I attempted to use the same slot on the enclosure, it would error out and refuse to replace the drive.
But now the drive is resilvering:

$ sudo zpool status datapool
[sudo] password for mike:
  pool: datapool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Dec 17 03:02:34 2024
        4.47T / 4.47T scanned, 590G / 4.47T issued at 261M/s
        148G resilvered, 12.90% done, 04:20:30 to go
config:

        NAME                    STATE     READ WRITE CKSUM
        datapool                DEGRADED     0     0     0
          raidz1-0              DEGRADED     0     0     0
            tray10              ONLINE       0     0     0
            replacing-1         DEGRADED     0     0     0
              tray11-part1/old  FAULTED      0     0     0  too many errors
              tray11            ONLINE       0     0     0  (resilvering)
            tray12              ONLINE       0     0     0
            tray13              ONLINE       0     0     0

errors: No known data errors

Although it’s not exactly as the man pages said, it is, in my opinion, acceptable, as I don’t have to attach the replacement drive to a different slot or connection. I did not try a reboot to see if it would resilver on its own, which could be the case, since even though ZED is already running, a reboot forces ZED to re-examine the drives.
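
(Untested on my part, but re-triggering udev and restarting the event daemon might accomplish the same re-examination without a full reboot:)

# re-run the udev rules so the vdev aliases get re-evaluated
sudo udevadm trigger
# restart the ZFS event daemon
sudo systemctl restart zfs-zed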

I’ll add one of the earlier attempts to get autoreplace working: I read a post elsewhere saying the pool needed a spare, so I established a pool with a spare using the earlier conf files, which led to a fail.

sudo zpool status
[sudo] password for mike:
  pool: datapool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
config:

        NAME                      STATE     READ WRITE CKSUM
        datapool                  DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            tray10                ONLINE       0     0     0
            15221825480377120240  UNAVAIL      0     0     0  was /dev/disk/by-vdev/tray11-part1
            tray12                ONLINE       0     0     0
        spares
          tray13                  AVAIL

errors: No known data errors

As you can see, the spare didn’t kick in, and I had placed a drive into the tray and issued the replace command. When I actually read the output, “One or more devices could not be used because the label is missing or invalid,” I went to the actual OpenZFS manual and checked it, and saw a vague burp about using /dev/disk/by-path in the conf file. While it didn’t explicitly say it should be used, as I read it I realized it would clear the label fault, so I figured it was worth a shot.
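
For the record, I believe the way to press the spare into service manually would have been a replace against the missing device’s GUID (the long number zpool status shows), roughly:

# replace the UNAVAIL member, identified by its GUID, with the configured spare
sudo zpool replace datapool 15221825480377120240 tray13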
Here’s the pool creation history, to show that I used the aliases to create the pool (as well as the settings) on the “successful” attempt with the new by-path .conf file.

2024-12-16.12:20:34 zpool create -f -o ashift=12 -o autoexpand=on -o autoreplace=on datapool raidz1 tray10 tray11 tray12 tray13
2024-12-16.12:20:45 zfs set compression=lz4 recordsize=1M xattr=sa atime=off datapool

I’ll mark this solved for right now as I think this is as close as I can get to my objective.