Identifying and replacing a disk in a zfs pool

sudo sg_format /dev/sdf
    HGST      HUS726060AL4210   AAG0   peripheral_type: disk [0x0]
      << supports protection information>>
      Unit serial number:         K8K6ZT5N
      LU name: 5000cca271b5d120
Mode Sense (block descriptor) data, prior to changes:
  Number of blocks=1465130646 [0x57541e96]
  Block size=4096 [0x1000]
Read Capacity (10) results:
   Number of logical blocks=1465130646
   Logical block size=4096 bytes
No changes made. To format use '--format'. To resize use '--resize'

BINGO!!! Just what I suspected: a 4Kn drive…
Give me a minute and let me review some of my notes to MAKE SURE of the next command.

OK? What’s up with that?

It changes the size passed to the sg_format command… A 4Kn drive is compatible with 512-byte-sector systems, but it takes a terrible performance hit when formatted to 512.

OK, this will take a long time… maybe two days. Do you want to proceed?
Because we are gonna wipe everything on that drive, and I do mean everything, and it will actually rebuild the grown defect list and the factory grown defect list.

It’s fine by me. I can do that.

OK…
here we go…

sg_format --format --size=4096 -v /dev/sdf

The screen will ask whether you want to proceed, and it will delay the format to give you time to reconsider. This is normal: it is written into the command as a failsafe in case you entered the command in error… all normal.
While it is formatting, do nothing else to that drive. Nothing, no hitting Ctrl-C… nothing. Let it run on its own in a terminal window; naturally you will watch it… If it errors, don’t shut down the system, ping me.
Of course, do let me know the estimated time of completion so I can be sure to be online, as we will run two more commands to get it ready for future use.

Can I safely disconnect, i.e. exit the SSH connection?

Yes… but you will lose the ability to see when it completes.
You can minimize the terminal window instead.

That’s OK. It is probably listed in running processes anyway…

How long till it’s done? As your drives are larger than mine, I really don’t know how long it will take. I know my 4TB takes a little under 24 hours.

It’s a 6TB drive. So a bit larger…

THIS PART IS AFTER EVERYTHING WITH FORMAT IS COMPLETED

The next thing I was going to have you do is run fdisk to set the partition table to GPT,
so don’t reboot when the format completes…

sudo fdisk /dev/sdf

Then when it comes up, enter g and press Enter.
Then enter w and press Enter,
which will write the table and exit.
Next, check with smartctl:

smartctl -H /dev/sdf

You should see the errors gone…

At this point the drive, while not formatted with a filesystem, is ready to be used as a hot or cold spare… or for other uses. It is basically the same as a new drive when shipped from the manufacturer.

This is in a 10 minute timeframe:

Format unit has started
Format unit in progress, 0.17% done
Format unit in progress, 0.41% done
Format unit in progress, 0.64% done
Format unit in progress, 0.86% done
Format unit in progress, 1.09% done
Format unit in progress, 1.32% done
Format unit in progress, 1.54% done
Format unit in progress, 1.76% done
Format unit in progress, 1.99% done

So around 1-2% per 10 minutes.
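
For anyone following along, that rate can be turned into a rough total run time. This is only a sketch: it assumes the format progresses at a constant rate, which the drive does not promise, and the function name format_total_hours is made up for illustration.

```shell
# Estimate the TOTAL duration of a format, in hours, by linear extrapolation.
# Assumption: progress is linear over the whole run (it may not be).
#   $1 = percent done so far
#   $2 = minutes elapsed so far
format_total_hours() {
    awk -v pct="$1" -v mins="$2" 'BEGIN {
        if (pct <= 0) { print "unknown"; exit }
        # minutes for 100% = mins * 100 / pct, then convert to hours
        printf "%.1f\n", (mins * 100 / pct) / 60
    }'
}

format_total_hours 1.99 10   # at ~2% per 10 minutes -> 8.4
format_total_hours 1.00 10   # at ~1% per 10 minutes -> 16.7
```

So the "1-2% per 10 minutes" range corresponds to somewhere between roughly 8 and 17 hours in total.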

About right…
Just as a sidebar question, what is your SAS controller?
I’m using an LSI 9300-16i, which is really overkill but will handle anything I throw at it.

OHHH, I almost forgot: when you run the smartctl command, look at the Grown defect line and see whether it went up to 1 or remained at 0. If it goes to 1, really no biggie as long as you monitor it… I have drives with 3 that don’t give issues yet, but I do watch them. If the count goes up by 2 defects within a 2-day window,
that drive is promptly replaced…
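
A small sketch of how that monitoring could be scripted. It assumes smartctl's full output for a SAS drive (e.g. smartctl -a) contains a line of the form "Elements in grown defect list: N"; check what your smartctl version actually prints. The function names and the sample line are illustrative only, and the two readings are assumed to be taken about two days apart by whoever calls the script.

```shell
# Print the grown defect count from smartctl output read on stdin.
# Assumption: the output contains "Elements in grown defect list: N".
grown_defects() {
    sed -n 's/^Elements in grown defect list:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Apply the "up by 2" rule of thumb from the post above.
#   $1 = count from the earlier reading, $2 = count from the later reading
defects_verdict() {
    if [ $(( $2 - $1 )) -ge 2 ]; then
        echo "replace"
    else
        echo "watch"
    fi
}

# Illustrative sample line, not real output from the drive in this thread.
printf 'Elements in grown defect list: 3\n' | grown_defects
defects_verdict 1 3
```

On a live system the first function would be fed from something like sudo smartctl -a /dev/sdf | grown_defects.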

I do have another thought on your pool, which is the ashift value.
Because your drives are 4Kn, the ashift should be 12 (4Kn, 4096-byte sectors), but the pool can run on 9 (512-byte sectors).
To check the ashift:

zdb -C | grep ashift

If it comes back 0, that means autodetect.
If it comes back 9, that is a 512-byte sector size.
If it comes back 12, that is a 4096-byte (4Kn) sector size.
For your drives 12 is the prime setting… 0 is OK too; 9 will work and function, it just won’t run as fast or as well as 12…
This value can only be changed if the pool is destroyed and recreated… Would I change it? Probably not… it will run fine on 9.
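
The numbers are not arbitrary: ashift is the power-of-two exponent of the sector size, so 2^9 = 512 and 2^12 = 4096. A minimal sketch of that mapping follows; the function name ashift_to_bytes is made up, and treating 0 as "autodetect" follows the explanation above.

```shell
# Convert an ashift value to the sector size in bytes (2 to the power of ashift).
# 0 is reported as "autodetect", as described above.
ashift_to_bytes() {
    case "$1" in
        0) echo "autodetect" ;;
        *) echo $(( 1 << $1 )) ;;
    esac
}

ashift_to_bytes 9    # prints 512
ashift_to_bytes 12   # prints 4096
```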

I don’t remember, but it will take around 17 hours.

See above, I was editing a post.

zdb -C | grep ashift
ashift: 12
ashift: 12
haaken@media:~$

17 hours for a 6TB drive, yep, that’s pretty good speed.

My 4Kn drives outrun the daylights out of my 512-sector drives…
There are two ways to know if your drive is 4Kn: one I showed you with sg_format… the other is to look at the label. “Usually” you will see AF or a reference to 4Kn.
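
The sg_format check can be scripted as well. This sketch pulls the "Logical block size=… bytes" line out of sg_format's report, in the format shown in the output at the top of the thread. The function names are hypothetical, and a 512 result only tells you the logical sector size, nothing more.

```shell
# Print the logical block size (in bytes) from sg_format output read on stdin.
logical_block_size() {
    sed -n 's/^[[:space:]]*Logical block size=\([0-9][0-9]*\) bytes.*/\1/p'
}

# Give the size a short label.
sector_kind() {
    case "$1" in
        4096) echo "4Kn" ;;
        512)  echo "512-byte logical sectors" ;;
        *)    echo "other" ;;
    esac
}

# Sample lines copied from the sg_format report earlier in this thread.
size=$(logical_block_size <<'EOF'
Read Capacity (10) results:
   Number of logical blocks=1465130646
   Logical block size=4096 bytes
EOF
)
echo "$size"
sector_kind "$size"
```

On a live system the input would come from sudo sg_format /dev/sdf, which makes no changes as long as --format and --resize are left off.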

HEY great ashift=12

Now, in the future, if you need to do another low-level format of that drive (or the others), you don’t have to use the --size=4096 flag with sg_format; it can be omitted. (One can use --size=4k as well; it is basically the same command.)

I had you add that flag for one reason only: just in case someone had formatted it with --size=512, which the drive will accept.

(Only for other onlookers: if YOU’RE using a SATA drive, DON’T DO THIS SG_FORMAT. This process only works with SAS/SCSI… it will destroy a SATA drive a majority of the time. For SATA you can use it to see the drive geometry, but don’t use the --format flag.)
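
One way to put a guard in front of that warning, as a sketch only: check the transport before running anything with --format. It assumes lsblk reports the transport in its TRAN column (for example "sas" or "sata"), and it is deliberately conservative in accepting only "sas". The function name safe_to_sg_format is made up.

```shell
# Succeed only when the transport string is "sas".
#   $1 = transport name, e.g. obtained with: lsblk -dn -o TRAN /dev/sdf
# Assumption: anything other than "sas" is treated as unsafe to format.
safe_to_sg_format() {
    case "$1" in
        sas) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example with a hard-coded value standing in for the lsblk result.
if safe_to_sg_format "sata"; then
    echo "ok to format"
else
    echo "do NOT use --format"
fi
```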

My controller is a Broadcom / LSI SAS2308 PCI-Express Fusion-MPT SAS-2.
It’s an old Dell R720 server.