Problem Description:
I had a 4TB WD Red (WDC WD40EFRX-68N) used for Plex media that started playing up, but the drive was around 8 years old. The drive would disappear but reappear after a reboot, and eventually it stopped working completely.
So six months ago I replaced it with a 6TB drive (WDC WD60EFPX-68C). However, this drive is starting to behave the same way: it disappears but then reappears after a reboot. That was until the other day, when even a reboot would not fix the problem.
So I removed the drive from the server and put it in a USB enclosure, and it worked correctly. I then put the drive back in the server but moved its SATA connection to a port on the motherboard (it was originally connected to a StarTech PEXSAT32 SATA card).
After a few days it disappeared again, but reappeared after a reboot.
I have another 5 drives in the server and have not had a problem with them, and they are around 8 years old too. It just seems to be this drive I use for Plex (I have a total of two drives for Plex) that is jinxed.
How can I diagnose and fix the problem?
If the drive is faulty, I would like to return it before the warranty expires.
Relevant System Information:
The server is a Dell PowerEdge T30.
Open the Disks utility, select the drive, click the hamburger icon (three horizontal lines), and see if it has an option for SMART data and self-tests.
As a test to pinpoint the issue, would you be willing to swap the “problematic” drive with one of the “good” drives, to see whether the problem follows the drive? Or is the problem one having to do with the motherboard?
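The same SMART attribute table is available on the command line with `smartctl -A /dev/sdX` from the smartmontools package (`/dev/sdX` is a placeholder for the Plex drive). Below is a minimal Python sketch that picks out a few attributes commonly used to tell a failing drive from a bad connection, assuming the usual ten-column ATA attribute layout. The sample text is made up for illustration and is not from the drive in this thread; the "media" vs "link" labels are a rule of thumb, not a specification.

```python
import re
import sys

# Raw values worth a closer look (a common rule of thumb, not a spec):
# 5, 197 and 198 relate to the drive's own media; 199 (CRC errors) usually
# points at the SATA cable, connector or port rather than the drive.
WATCH = {5: "media", 197: "media", 198: "media", 199: "link"}

# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(text):
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs

def flag_suspicious(attrs):
    """Watched attributes whose raw value is non-zero."""
    return [(i, attrs[i][0], attrs[i][1], WATCH[i])
            for i in sorted(WATCH) if i in attrs and attrs[i][1] > 0]

# Made-up sample in the usual `smartctl -A` layout (not from this drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       12
"""

if __name__ == "__main__":
    # Usage: smartctl -A /dev/sdX > attrs.txt; python3 smart_check.py attrs.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            text = f.read()
    else:
        text = SAMPLE
    for row in flag_suspicious(parse_attributes(text)):
        print(row)
```

With the made-up sample, only attribute 199 is flagged, which would hint at the cable or port rather than the drive; real output should be read alongside the self-test results. The script name `smart_check.py` is hypothetical.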
I have performed a “Check Filesystem” using Disks and the result was that the filesystem was intact.
When the drive disappears, it usually does not show up in Disks at all. Only once did it disappear and still show up in Disks, unmounted; mounting it fixed that.
I'm currently running an extended self-test, as you suggested.
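Once the extended test finishes (it can take many hours on a 6TB drive), the result shows up in Disks, or on the command line via `smartctl -l selftest /dev/sdX`; `smartctl -t long /dev/sdX` starts an extended test from the command line. A minimal Python sketch for reading that log, assuming its usual space-padded column layout; the sample log is made up for illustration. If the drive drops off mid-test, the log may show the test as interrupted rather than passed or failed.

```python
import re

def parse_selftest_log(text):
    """Rows of `smartctl -l selftest` output as dicts; # 1 is the newest."""
    entries = []
    for line in text.splitlines():
        if not re.match(r"^#\s*\d+", line):
            continue  # skip headers and blank lines
        # Columns are padded, so split on runs of two or more spaces.
        cols = re.split(r"\s{2,}", line.strip())
        if len(cols) < 6:
            continue
        num, test, status, remaining, hours, lba = cols[:6]
        entries.append({"num": int(num.lstrip("#")), "test": test,
                        "status": status, "remaining": remaining,
                        "hours": int(hours), "first_error_lba": lba})
    return entries

def latest_extended_passed(entries):
    """True/False for the newest extended test, None if none is logged."""
    for e in sorted(entries, key=lambda e: e["num"]):
        if e["test"].startswith("Extended"):
            return e["status"] == "Completed without error"
    return None

# Made-up sample log (not from the drive in this thread).
SAMPLE_LOG = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      4410         123456789
# 2  Short offline       Completed without error       00%      4402         -
"""

if __name__ == "__main__":
    print(latest_extended_passed(parse_selftest_log(SAMPLE_LOG)))
```

A logged LBA of first error on an extended test is the kind of concrete evidence that tends to be useful when arguing a warranty claim.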
The suspect drive, Plex 2, was originally connected to SATA 4 (on the PEXSAT32 card), but I swapped it with the drive “General” so that it now uses SATA 0 on the motherboard.
There are several “failed command” errors, which means the controller on the drive itself is failing to carry out seeks and other commands. In short, the drive's controller is bad.
My suggestion: return the HDD, since it is under warranty.
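Assuming the “failed command” errors are the ones the Linux kernel's libata driver writes to the kernel log (readable with `journalctl -k` or `dmesg`), here is a minimal Python sketch that tallies them per kernel ATA port, together with link resets and link-down messages. Comparing the tallies from before and after the move from SATA 4 to SATA 0 shows whether the errors follow the drive. Note that the kernel's `ataN` numbering is its own and need not match the motherboard labels. The sample lines are made up.

```python
import re
import sys
from collections import Counter

# Typical libata messages, e.g. "ata5.00: failed command: READ FPDMA QUEUED"
FAILED = re.compile(r"\b(ata\d+)(?:\.\d+)?: failed command: (.+)$")
RESET = re.compile(r"\b(ata\d+): (hard resetting link|SATA link down)")

def summarize(lines):
    """Count failed commands and link resets per kernel ATA port."""
    failed, resets = Counter(), Counter()
    for line in lines:
        m = FAILED.search(line)
        if m:
            failed[(m.group(1), m.group(2).strip())] += 1
            continue
        m = RESET.search(line)
        if m:
            resets[(m.group(1), m.group(2))] += 1
    return failed, resets

# Made-up sample lines (not from this server).
SAMPLE = [
    "[ 8123.41] ata5.00: failed command: READ FPDMA QUEUED",
    "[ 8123.41] ata5.00: failed command: READ FPDMA QUEUED",
    "[ 8124.02] ata5: hard resetting link",
    "[ 8129.90] ata5: SATA link down (SStatus 0 SControl 300)",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            lines = f.readlines()
    else:
        lines = SAMPLE
    failed, resets = summarize(lines)
    for (port, cmd), n in failed.most_common():
        print(f"{port}: {n} x failed command {cmd}")
    for (port, kind), n in resets.most_common():
        print(f"{port}: {n} x {kind}")
```

After a drop-off and reboot, `journalctl -k -b -1 > kernel.txt` saves the previous boot's kernel log (if the journal is persistent), which can then be passed to the script (the filenames are hypothetical).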
PSUs degrade over time, so it's plausible. And 290W is on the very low end to start with. Imagine it was delivering the bare minimum for all the cards and drives you had initially; replacing a drive with a slightly more demanding one could be enough to trigger the problem.
A Thought From the Bath: One way of testing this is to disconnect some of the drives completely, or to have a script at startup that delays the spin-up of some of the drives so that there is less demand on your power supply at initial boot.
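A rough back-of-the-envelope check of the power theory, as a Python sketch. It assumes the six drives described in the question, and every current in it is a placeholder, not a datasheet or measured value: look up each drive's 12 V spin-up (peak) and idle current in its datasheet and substitute those. It ignores the CPU, the PEXSAT32 card and the 5 V rail, and it does not know the T30 PSU's actual 12 V rating.

```python
# Rough 12 V spin-up estimate. Every current below is a placeholder
# assumption, not a measured or datasheet value for these drives.
SPINUP_A = [2.0] * 6  # hypothetical peak 12 V current per drive at spin-up
IDLE_A = [0.5] * 6    # hypothetical 12 V current per drive once spinning

def simultaneous_peak_w(spinup_a, rail_v=12.0):
    """All drives spin up at the same moment, e.g. at power-on."""
    return sum(spinup_a) * rail_v

def staggered_peak_w(spinup_a, idle_a, rail_v=12.0):
    """Drives spin up one at a time; earlier ones are already idling."""
    return max(spinup_a[k] + sum(idle_a[:k])
               for k in range(len(spinup_a))) * rail_v

if __name__ == "__main__":
    print(simultaneous_peak_w(SPINUP_A))       # 144.0 with the placeholders
    print(staggered_peak_w(SPINUP_A, IDLE_A))  # 54.0 with the placeholders
```

The point is the shape of the numbers rather than their values: with simultaneous spin-up the peak scales with the number of drives, whereas spinning them up one after another caps it at one drive's surge plus the others' idle draw.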
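A minimal Python sketch of the staggered idea, with a big caveat: drives normally spin up as soon as they receive power, and a script that runs after boot cannot change that. What it can do is test the power theory once the system is up. Put the drives into standby (for example with `hdparm -y /dev/sdX`), then wake them one at a time with a pause in between and see whether drop-offs still occur, and compare that with waking them all at once. The drive feature aimed at boot-time spin-up is Power-Up In Standby (hdparm's `-s` option, which its manual flags as dangerous); whether these drives and controllers cope with it is not something this sketch checks. The 15-second default delay and any device paths are assumptions.

```python
import os
import sys
import time

def wake(path, nbytes=4096):
    """Read from the start of a device; a drive in standby must spin up."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):
            # Best effort: drop cached pages so the read reaches the disk.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return len(os.pread(fd, nbytes, 0))
    finally:
        os.close(fd)

def wake_sequentially(paths, delay_s=15.0, log=print):
    """Wake drives one at a time so their spin-up surges do not overlap."""
    results = []
    for i, path in enumerate(paths):
        if i:
            time.sleep(delay_s)
        start = time.monotonic()
        n = wake(path)
        # A multi-second wait here suggests the drive really was spun down.
        log(f"{path}: read {n} bytes in {time.monotonic() - start:.1f}s")
        results.append((path, n))
    return results

if __name__ == "__main__" and len(sys.argv) > 1:
    # Run as root with real device paths, e.g. /dev/disk/by-id/ata-...
    wake_sequentially(sys.argv[1:])
```

The read goes to sector 0 with cached pages dropped first, so it should reach the drive rather than being served from memory; nothing is written.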
The OP does say they come online before dropping off.
Maybe monitoring before and after a “drop-off” event would reveal specific voltage “brown-outs” triggering a built-in failsafe that “peels off” drives one by one until the power supply can cope again?
Out of curiosity, would there be any way of specifying, via systemd, that the drives should power up sequentially and in a specified order, to avoid the power spike that is being encountered?
Or is that a capability (currently lacking?) that would need to be built into the BIOS/UEFI configuration?
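Whether the T30 exposes its voltage rails to Linux (for example through `sensors` from lm-sensors) is not something I can confirm. What can be logged either way is the exact time each drop-off happens, which can then be lined up with the kernel log and with any voltage readings you manage to collect. A minimal Python sketch that polls `/dev/disk/by-id` and logs when a matching disk vanishes or returns; the 5-second interval and the example prefix are assumptions.

```python
import os
import sys
import time
from datetime import datetime

def snapshot(by_id_dir, prefix):
    """Whole-disk entries (no -partN links) whose name starts with prefix."""
    return {name for name in os.listdir(by_id_dir)
            if name.startswith(prefix) and "-part" not in name}

def changes(before, after):
    """(disappeared, reappeared) names between two snapshots."""
    return sorted(before - after), sorted(after - before)

def monitor(prefix, by_id_dir="/dev/disk/by-id", interval_s=5.0,
            log=print, rounds=None):
    """Poll by_id_dir and log a timestamp whenever a matching disk
    vanishes or comes back. rounds=None polls forever."""
    seen = snapshot(by_id_dir, prefix)
    log(f"{datetime.now():%Y-%m-%d %H:%M:%S} watching {sorted(seen)}")
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval_s)
        now = snapshot(by_id_dir, prefix)
        gone, back = changes(seen, now)
        stamp = f"{datetime.now():%Y-%m-%d %H:%M:%S}"
        for name in gone:
            log(f"{stamp} GONE {name}")
        for name in back:
            log(f"{stamp} BACK {name}")
        seen = now
        done += 1

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 watch_drives.py ata-WDC_WD60EFPX  (prefix is an example)
    monitor(sys.argv[1])
```

Polling keeps the sketch dependency-free; the trade-off is that the timestamp is only accurate to the polling interval, so the kernel log remains the more precise record of when the link actually dropped.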