ZFS drive unavailable

Hi, it appears that one of my drives has gone bad.
From what I can see, it is K8K6ZT5N (/dev/sdh).

Some information:

zpool status -x Mypool
  pool: Mypool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
	invalid.  Sufficient replicas exist for the pool to continue
	functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: scrub repaired 0B in 09:28:48 with 0 errors on Sun Feb  9 09:52:50 2025
config:

	NAME                                      STATE     READ WRITE CKSUM
	Mypool                                    DEGRADED     0     0     0
	  raidz2-0                                DEGRADED     0     0     0
	    c6febe17-7fd8-43ec-9bdb-1891a17eeac6  ONLINE       0     0     0
	    e3fdfc6d-41ad-489d-be3f-15e16c9220d9  ONLINE       0     0     0
	    9142784188148639209                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/0a447919-6db7-4182-8aee-15ee39a90ce4
	    59364659-836d-46c3-a12a-948fc26f2fc3  ONLINE       0     0     0

errors: No known data errors




K8K7P8PN  OK
K8K7NZTN  OK
K8K7PD9N  OK
K8K7T51N  OK, but possibly not used
K8K6ZT5N  failure, bad drive


disk  sdg                       Mypool                    5.5T HUS726060AL4210            K8K7T51N             0x5000cca271b73f40 12447916164639100982
disk  sdh                                                 5.5T HUS726060AL4210            K8K6ZT5N             0x5000cca271b5d120 


sudo smartctl -a /dev/sdh
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.0-19-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS726060AL4210
Revision:             AAG0
Compliance:           SPC-4
User Capacity:        6,001,175,126,016 bytes [6.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca271b5d120
Serial number:        K8K6ZT5N
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Sun Mar 16 10:47:15 2025 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 25652:52
Manufactured in week 14 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  24
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  6849
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 23457345250000896

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0      288         0       288    1319011     380526.948           0
write:         0       11         0        11    1086528      40832.172           0
verify:        0        0         0         0     335732          0.000           0

Non-medium error count:        0

  Pending defect count:0 Pending Defects
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   24685                 - [-   -    -]

Long (extended) Self-test duration: 53828 seconds [15.0 hours]


sudo smartctl -a /dev/sdg
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.0-19-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS726060AL4210
Revision:             AAG0
Compliance:           SPC-4
User Capacity:        6,001,175,126,016 bytes [6.00 TB]
Logical block size:   4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca271b73f40
Serial number:        K8K7T51N
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Sun Mar 16 10:48:12 2025 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     29 C
Drive Trip Temperature:        85 C

Accumulated power on time, hours:minutes 25438:59
Manufactured in week 14 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  25
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1272
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1544130903670784

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0      76018      25582.011           0
write:         0        0         0         0      28745      12518.477           0
verify:        0        0         0         0      35386          0.000           0

Non-medium error count:        0

  Pending defect count:0 Pending Defects
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     491                 - [-   -    -]
# 2  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self-test duration: 58876 seconds [16.4 hours]

I think it is possible to replace the bad or unavailable one with /dev/sdg, but I'm not sure how to do that.

Please give this a try. If the device is still present but its label is corrupted, you can take the device offline and then bring it back online to see whether that resolves the issue:

sudo zpool offline poolname deviceid

Now bring the device back online:

sudo zpool online poolname deviceid

If the device/disk/pool does not come back online, you will need to replace it with a new device:

sudo zpool replace poolname deviceid /dev/newdevice

The replace starts a resilver on its own. Once that has finished, you can run a scrub to double-check data integrity:

sudo zpool scrub poolname
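
To avoid mistyping the device ID, the failed entry can be pulled straight out of the `zpool status` output. This is only a minimal sketch, assuming the status layout posted earlier in this thread; `faulted_vdevs` is a hypothetical helper name, not a ZFS command.

```shell
#!/bin/sh
# faulted_vdevs is a hypothetical helper, not part of ZFS.
# It reads `zpool status` text on stdin and prints the first column (vdev
# name or GUID) of every line whose STATE column is UNAVAIL, FAULTED,
# OFFLINE or REMOVED. DEGRADED pool and raidz lines are skipped on purpose.
faulted_vdevs() {
    awk '$2 ~ /^(UNAVAIL|FAULTED|OFFLINE|REMOVED)$/ { print $1 }'
}

# Intended use (needs a real pool, so it is left commented out here):
#   zpool status poolname | faulted_vdevs
```

For the status output posted above, `zpool status Mypool | faulted_vdevs` would print `9142784188148639209`, which is the value to use as `deviceid`.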

That device has died (K8K6ZT5N).

This replaced it:

sudo zpool offline Mypool 9142784188148639209 
sudo zpool online Mypool 9142784188148639209 
warning: device '9142784188148639209' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
sudo zpool replace Mypool 9142784188148639209  /dev/sdh
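
Kernel names such as /dev/sdh can change between boots, so a common suggestion is to give `zpool replace` a persistent path from /dev/disk/by-id instead. Here is a rough sketch for finding that path, assuming a Linux system where `readlink -f` is available; `by_id_links` is a made-up helper name, and the exact link names depend on the udev setup.

```shell
#!/bin/sh
# by_id_links is a hypothetical helper, not a standard tool.
# Usage: by_id_links DEVICE [DIRECTORY]
# Prints every symlink in DIRECTORY (default: /dev/disk/by-id) that
# resolves to the same node as DEVICE.
by_id_links() {
    dev=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}

# Intended use (left commented out, since it depends on the local system):
#   by_id_links /dev/sdh
```

One of the printed paths could then be used in place of /dev/sdh as the last argument to `zpool replace`.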



zpool status
  pool: Mypool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Mar 17 09:36:38 2025
	1.79T / 17.5T scanned at 29.6G/s, 0B / 17.5T issued
	0B resilvered, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	Mypool                                    DEGRADED     0     0     0
	  raidz2-0                                DEGRADED     0     0     0
	    c6febe17-7fd8-43ec-9bdb-1891a17eeac6  ONLINE       0     0     0
	    e3fdfc6d-41ad-489d-be3f-15e16c9220d9  ONLINE       0     0     0
	    replacing-2                           DEGRADED     0     0     0
	      9142784188148639209                 OFFLINE      0     0     0  was /dev/disk/by-partuuid/0a447919-6db7-4182-8aee-15ee39a90ce4
	      sdh                                 ONLINE       0     0     0
	    59364659-836d-46c3-a12a-948fc26f2fc3  ONLINE       0     0     0

errors: No known data errors

  pool: media
 state: ONLINE
  scan: scrub repaired 0B in 02:32:13 with 0 errors on Sun Mar  9 02:56:14 2025
config:

	NAME                                    STATE     READ WRITE CKSUM
	media                                   ONLINE       0     0     0
	  7e685125-5818-4e5c-9f20-b6a3a87dcacc  ONLINE       0     0     0

errors: No known data errors
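
The last scrub of Mypool took about nine and a half hours, so the resilver will likely run for a while. This is a small sketch for pulling the progress figure out of `zpool status`, assuming the "resilvered, N% done" wording shown above; `resilver_percent` is a hypothetical helper name.

```shell
#!/bin/sh
# resilver_percent is a hypothetical helper, not part of ZFS.
# It reads `zpool status` text on stdin and prints the number in front of
# "% done" on the "... resilvered, N% done ..." line. It prints nothing
# when no resilver line is present.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Intended use (needs a real pool, so it is left commented out here):
#   zpool status Mypool | resilver_percent
```

On OpenZFS 2.0 or later, `zpool wait -t resilver Mypool` should also block until the resilver has finished, which may be simpler than polling.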

Good to see a solution to this, as I was engaged for a week doing some pentesting on my servers.
