HDD keeps disappearing and requires reboot

Ubuntu Version:
24.04

Desktop Environment (if applicable):
GNOME

Problem Description:
I had a 4TB WD Red (WDC WD40EFRX-68N), used for Plex media, which started playing up, though the drive was around 8 years old. It would disappear and then reappear after a reboot, but eventually it stopped working completely.

So I replaced it with a 6TB drive (WDC WD60EFPX-68C) 6 months ago. However, this drive is starting to behave in the same way: it would disappear and then reappear after a reboot, until the other day, when even a reboot would not fix the problem.

So I removed the drive from the server and put it in a USB enclosure, where it worked correctly. I then put the drive back in the server but moved the SATA connection to a port on the motherboard (it was originally connected to a StarTech PEXSAT32 SATA card).

After a few days it disappeared again, but reappeared after a reboot.

I have another 5 drives in the server and have not had a problem with them, and they are around 8 years old too. It just seems to be this drive I use for Plex (I have a total of two drives for Plex) that is jinxed.

How can I diagnose and fix the problem?

If the drive is faulty I would like to return it before the warranty expires.

Relevant System Information:
The server is a Dell PowerEdge T30

You are using the GNOME desktop environment.

Open the Disks utility. Select a drive and click the hamburger icon (three horizontal lines) and see if it has an option for SMART data and self tests.

Regards

As a test, to try to pinpoint the issue, would you be willing to swap the “problematic” drive with one of the “good” drives, to see if the problem follows the drive? … or is the problem one having to do with motherboard

  • connectivity?
  • on-board processing logic?
  • component failure?

As @graymech mentioned, the Disks utility will show how old the drive is (third line, "Powered on").

You can run dmesg | grep ata to check whether the link is running at 6.0 Gbps.
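If the grep output is too noisy to read by eye, a small Python sketch like the one below can pull out just the negotiated speed per port. It assumes the kernel reports link-up events in the usual "ataN: SATA link up X Gbps" form; the sample text in the demo is illustrative only and is not taken from the OP's machine.

```python
import re

# Assumed kernel message format, e.g.:
#   ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
LINK_RE = re.compile(r"(ata\d+): SATA link up (\d+(?:\.\d+)?) Gbps")


def link_speeds(dmesg_text):
    """Return {port: most recently negotiated speed in Gbps}."""
    speeds = {}
    for line in dmesg_text.splitlines():
        match = LINK_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so a renegotiated
            # (possibly lower) speed replaces the boot-time value.
            speeds[match.group(1)] = float(match.group(2))
    return speeds


# Illustrative input only; paste real `sudo dmesg` output here instead.
sample = (
    "[    1.2] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)\n"
    "[    1.3] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)\n"
)
for port, gbps in sorted(link_speeds(sample).items()):
    print(f"{port}: {gbps} Gbps")
```

A port that negotiates a lower speed than the drive and controller both support may be worth a closer look at the cable or connector, though on its own it proves nothing.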

Some Dell machines have both a blue and a black SATA connector on the motherboard.
Try both to see if it makes a difference.

Possibly a bad connector to the motherboard or a bad cable. Have you tried a new cable (data and power) to the disk?

I have performed a “Check Filesystem” using Disks and the result was that the filesystem was intact.

When the drive disappears, it does not show up in Disks at all. There was only one time when it disappeared but still showed up in Disks, unmounted, and that was fixed by mounting it.

But usually when the drive disappears it does not even show up in Disks.

I'm currently running a self-test (Extended) as you suggested.

Yes I have already tried this and ruled it out.

My current setup is shown in the diagram below.

The suspect drive, Plex 2, was originally connected to SATA 4 (on the PEXSAT32 card), but I swapped it (with drive “General”) to use SATA 0 on the motherboard.

So this rules out faulty connections and logic.

Below is the result of dmesg | grep ata (means nothing to me!):

sudo dmesg | grep ata
[46226.184909] ata4.00: exception Emask 0x0 SAct 0x200 SErr 0x40000 action 0x0
[46226.184929] ata4.00: irq_stat 0x40000008
[46226.184936] ata4: SError: { CommWake }
[46226.184947] ata4.00: failed command: READ FPDMA QUEUED
[46226.184954] ata4.00: cmd 60/08:48:f8:b7:99/00:00:bc:00:00/40 tag 9 ncq dma 4096 in
[46226.184975] ata4.00: status: { DRDY ERR }
[46226.184982] ata4.00: error: { UNC }
[46226.195563] ata4.00: supports DRM functions and may not be fully accessible
[46226.200296] ata4.00: supports DRM functions and may not be fully accessible
[46226.204433] ata4.00: configured for UDMA/133
[46226.214875] ata4: EH complete
[46226.215262] ata4.00: Enabling discard_zeroes_data
[46226.369730] ata4.00: exception Emask 0x0 SAct 0x1000 SErr 0x0 action 0x0
[46226.369738] ata4.00: irq_stat 0x40000008
[46226.369741] ata4.00: failed command: READ FPDMA QUEUED
[46226.369743] ata4.00: cmd 60/08:60:f8:b7:99/00:00:bc:00:00/40 tag 12 ncq dma 4096 in
[46226.369749] ata4.00: status: { DRDY ERR }
[46226.369751] ata4.00: error: { UNC }
[46226.380066] ata4.00: supports DRM functions and may not be fully accessible
[46226.383758] ata4.00: supports DRM functions and may not be fully accessible
[46226.387187] ata4.00: configured for UDMA/133
[46226.397373] ata4: EH complete
[46226.397552] ata4.00: Enabling discard_zeroes_data
[46226.556732] ata4.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x0
[46226.556740] ata4.00: irq_stat 0x40000008
[46226.556743] ata4.00: failed command: READ FPDMA QUEUED
[46226.556744] ata4.00: cmd 60/08:80:f8:b7:99/00:00:bc:00:00/40 tag 16 ncq dma 4096 in
[46226.556750] ata4.00: status: { DRDY ERR }
[46226.556752] ata4.00: error: { UNC }
[46226.567075] ata4.00: supports DRM functions and may not be fully accessible
[46226.570763] ata4.00: supports DRM functions and may not be fully accessible
[46226.574229] ata4.00: configured for UDMA/133
[46226.584421] ata4: EH complete
[46226.584681] ata4.00: Enabling discard_zeroes_data
[57026.056919] ata4.00: exception Emask 0x0 SAct 0x10000 SErr 0x40000 action 0x0
[57026.056938] ata4.00: irq_stat 0x40000008
[57026.056946] ata4: SError: { CommWake }
[57026.056957] ata4.00: failed command: READ FPDMA QUEUED
[57026.056963] ata4.00: cmd 60/08:80:f8:b7:99/00:00:bc:00:00/40 tag 16 ncq dma 4096 in
[57026.056986] ata4.00: status: { DRDY ERR }
[57026.056993] ata4.00: error: { UNC }
[57026.067455] ata4.00: supports DRM functions and may not be fully accessible
[57026.072097] ata4.00: supports DRM functions and may not be fully accessible
[57026.076330] ata4.00: configured for UDMA/133
[57026.086802] ata4: EH complete
[57026.087092] ata4.00: Enabling discard_zeroes_data
[57026.252059] ata4.00: exception Emask 0x0 SAct 0x20000 SErr 0x0 action 0x0
[57026.252077] ata4.00: irq_stat 0x40000008
[57026.252086] ata4.00: failed command: READ FPDMA QUEUED
[57026.252093] ata4.00: cmd 60/08:88:f8:b7:99/00:00:bc:00:00/40 tag 17 ncq dma 4096 in
[57026.252115] ata4.00: status: { DRDY ERR }
[57026.252122] ata4.00: error: { UNC }
[57026.262650] ata4.00: supports DRM functions and may not be fully accessible
[57026.267221] ata4.00: supports DRM functions and may not be fully accessible
[57026.271391] ata4.00: configured for UDMA/133
[57026.281828] ata4: EH complete
[57026.282248] ata4.00: Enabling discard_zeroes_data
[57026.444039] ata4.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
[57026.444051] ata4.00: irq_stat 0x40000008
[57026.444054] ata4.00: failed command: READ FPDMA QUEUED
[57026.444056] ata4.00: cmd 60/08:90:f8:b7:99/00:00:bc:00:00/40 tag 18 ncq dma 4096 in
[57026.444063] ata4.00: status: { DRDY ERR }
[57026.444065] ata4.00: error: { UNC }
[57026.454549] ata4.00: supports DRM functions and may not be fully accessible
[57026.458930] ata4.00: supports DRM functions and may not be fully accessible
[57026.462992] ata4.00: configured for UDMA/133
[57026.473255] ata4: EH complete
[57026.473669] ata4.00: Enabling discard_zeroes_data
[57026.634022] ata4.00: exception Emask 0x0 SAct 0x200000 SErr 0x0 action 0x0
[57026.634044] ata4.00: irq_stat 0x40000008
[57026.634054] ata4.00: failed command: READ FPDMA QUEUED
[57026.634061] ata4.00: cmd 60/08:a8:f8:b7:99/00:00:bc:00:00/40 tag 21 ncq dma 4096 in
[57026.634083] ata4.00: status: { DRDY ERR }
[57026.634090] ata4.00: error: { UNC }
[57026.644650] ata4.00: supports DRM functions and may not be fully accessible
[57026.649283] ata4.00: supports DRM functions and may not be fully accessible
[57026.653425] ata4.00: configured for UDMA/133
[57026.663864] ata4: EH complete
[57026.664272] ata4.00: Enabling discard_zeroes_data
[57026.829067] ata4.00: exception Emask 0x0 SAct 0x80000 SErr 0x0 action 0x0
[57026.829090] ata4.00: irq_stat 0x40000008
[57026.829100] ata4.00: failed command: READ FPDMA QUEUED
[57026.829107] ata4.00: cmd 60/08:98:f8:b7:99/00:00:bc:00:00/40 tag 19 ncq dma 4096 in
[57026.829128] ata4.00: status: { DRDY ERR }
[57026.829136] ata4.00: error: { UNC }
[57026.839672] ata4.00: supports DRM functions and may not be fully accessible
[57026.844303] ata4.00: supports DRM functions and may not be fully accessible
[57026.848594] ata4.00: configured for UDMA/133
[57026.859030] ata4: EH complete
[57026.859539] ata4.00: Enabling discard_zeroes_data
[57027.016061] ata4.00: exception Emask 0x0 SAct 0x400 SErr 0x0 action 0x0
[57027.016078] ata4.00: irq_stat 0x40000008
[57027.016087] ata4.00: failed command: READ FPDMA QUEUED
[57027.016094] ata4.00: cmd 60/08:50:f8:b7:99/00:00:bc:00:00/40 tag 10 ncq dma 4096 in
[57027.016117] ata4.00: status: { DRDY ERR }
[57027.016124] ata4.00: error: { UNC }
[57027.026723] ata4.00: supports DRM functions and may not be fully accessible
[57027.031301] ata4.00: supports DRM functions and may not be fully accessible
[57027.035625] ata4.00: configured for UDMA/133
[57027.046021] ata4: EH complete
[57027.046596] ata4.00: Enabling discard_zeroes_data
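For anyone trying to read the "cmd" lines above, here is a hedged Python sketch that decodes the target LBA of each failed command. It assumes the field order libata normally prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); that layout is my assumption, not something stated in this thread. The two sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed libata layout:
# cmd CC/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/DD tag N
CMD_RE = re.compile(
    r"(ata\d+\.\d+): cmd ([0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"([0-9a-f]{2}) tag (\d+)"
)


def decode_cmd(line):
    """Decode one 'cmd' line; return None if the line is not a cmd line."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(3).split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(4).split(":")
    )
    # 48-bit LBA assembled from the six LBA bytes, high bytes first.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "device": m.group(1),
        "command": int(m.group(2), 16),
        # For NCQ reads and writes the sector count travels in the feature field.
        "sectors": (hob_feature << 8) | feature,
        "lba": lba,
        "tag": int(m.group(6)),
    }


def failing_lbas(dmesg_text):
    """Count how often each LBA appears in failed-command lines."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        decoded = decode_cmd(line)
        if decoded is not None:
            counts[decoded["lba"]] += 1
    return counts


sample = """\
[46226.184954] ata4.00: cmd 60/08:48:f8:b7:99/00:00:bc:00:00/40 tag 9 ncq dma 4096 in
[57026.634061] ata4.00: cmd 60/08:a8:f8:b7:99/00:00:bc:00:00/40 tag 21 ncq dma 4096 in
"""
for lba, n in failing_lbas(sample).items():
    print(f"LBA {lba:#x} failed {n} time(s)")
```

Under that assumed layout, every failed read in the log carries the same LBA bytes (f8:b7:99 and bc), so the kernel appears to be retrying one 8-sector read at a single location rather than hitting errors all over the disk.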

There are several failed READ FPDMA QUEUED commands, each ending in error: { UNC }. UNC means the drive reported an uncorrectable read error, i.e. it could not read the requested sectors. In short, the log points to a fault in the drive on ata4.

My suggestion: return the HDD, since it is under warranty.

Is your power supply capable enough to drive all the peripherals you have attached? Your symptoms sound like a power issue to me…

Was about to shoot this. Not enough juice

Interesting theory.

The PSU is rated at 290W. Is that enough? How do I work out how much power the system needs?

Strange how it worked for years with all these drives; the only difference is that the 6TB drive replaced a 4TB one.

HDDs draw little power, usually about 6W each.

PSUs degrade over time. It’s plausible. And 290W is on the very low end to start with. Imagine it was giving out the bare minimum for all the cards and drives you had initially. Just replacing a drive with a slightly more demanding one can trigger the problem.

Is there a way to verify whether the PSU is the cause, and can anyone recommend a PSU upgrade for the Dell PowerEdge T30?

According to ChatGPT…

Your actual power profile (realistic)

CPU (Xeon E3-1225 v5)

  • TDP: 80W max
  • Typical usage (home server): 30–60W

Your 6 HDDs (key issue)

Assuming standard 3.5" NAS drives:

  • Spin-up spike: ~25W × 6 = 150W
  • Normal running: ~7W × 6 = 42W

Rest of system

  • Motherboard + RAM: ~30–50W
  • Fans, USB, etc: ~10–20W

Total load (this is the important bit)

  • Startup peak: ~260–300W
  • Normal running: ~110–160W

Conclusion: You ARE right

Your system is:

  • Fine at idle
  • Borderline / unsafe at startup

That’s exactly where Dell’s 290W PSU struggles; it’s not designed for:

  • 6 simultaneous drive spin-ups
  • Sustained high 12V draw
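To sanity-check the arithmetic in that estimate, here is a rough Python sketch that simply adds up the quoted figures. Every number in it is copied from the estimate above and is unmeasured; the constant and function names are my own.

```python
# All wattages are the quoted estimates from this thread, not measurements.
HDD_COUNT = 6
HDD_SPINUP_W = 25          # per drive, during spin-up
HDD_RUNNING_W = 7          # per drive, normal running
CPU_TYPICAL_W = (30, 60)   # (low, high)
BOARD_RAM_W = (30, 50)
FANS_USB_W = (10, 20)
PSU_RATED_W = 290


def total(hdd_watts_each, cpu=CPU_TYPICAL_W):
    """Return the (low, high) system draw for a given per-drive wattage."""
    drives = HDD_COUNT * hdd_watts_each
    low = drives + cpu[0] + BOARD_RAM_W[0] + FANS_USB_W[0]
    high = drives + cpu[1] + BOARD_RAM_W[1] + FANS_USB_W[1]
    return low, high


startup = total(HDD_SPINUP_W)
running = total(HDD_RUNNING_W)
print(f"Startup peak:   {startup[0]}-{startup[1]} W")
print(f"Normal running: {running[0]}-{running[1]} W")
print(f"Headroom at worst-case startup: {PSU_RATED_W - startup[1]} W")
```

With the typical CPU range this gives 220 to 280 W at startup and 112 to 172 W when running, so the quoted totals do not follow exactly from the quoted components; the 260 to 300 W startup figure is closer to what you get if the CPU is assumed to draw its full 80W. Either way the startup number sits close to the 290W rating, but only a measurement would settle it.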

A thought from the bath: one way of testing this is to disconnect some of the drives completely, or to have a script at startup that delays the spin-up of some of the drives, so that there is less demand on your power supply at initial boot.

Or borrow a larger PSU from another system (if you are able) and test with that.

The OP does say they come online before dropping off.

Maybe monitoring before and after a drop-off event would reveal specific voltage brown-outs triggering a built-in failsafe that sheds drives one by one until power is sufficient again?

Could try using

watch -n1 sensors
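To line up sensor readings with the moment the drive actually vanishes, it may help to log a timestamp when the device node disappears. Below is a minimal Python sketch of that idea; the function name, its parameters and the example path are all hypothetical, and it only watches for the path, it does not read any sensors itself.

```python
import os
import time
from datetime import datetime


def watch_device(path, interval=1.0, max_polls=None,
                 exists=os.path.exists, sleep=time.sleep, log=print):
    """Poll `path` and log a timestamped line when it appears or disappears.

    Returns the list of transitions seen. With max_polls=None it runs until
    interrupted (Ctrl+C).
    """
    present = exists(path)
    stamp = datetime.now().isoformat(timespec="seconds")
    log(f"{stamp} start: {'present' if present else 'missing'}")
    transitions = []
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        polls += 1
        now_present = exists(path)
        if now_present != present:
            state = "appeared" if now_present else "disappeared"
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {path} {state}")
            transitions.append(state)
            present = now_present
    return transitions


# Example call (hypothetical path; use the real entry from /dev/disk/by-id/):
# watch_device("/dev/disk/by-id/ata-EXAMPLE_DRIVE")
```

The logged time can then be compared against the sensors output and against the kernel log for the same moment.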

Just curious … did you try the drive swap that I suggested in my earlier post, to pinpoint if any drive in that internal slot is impacted?

… or just that same drive in a different slot?

Out of curiosity, would there be any way of specifying, via systemd, that the drives should power up sequentially, and in a specified order, in order to avoid the power spike that is being encountered?

Or is that a capability (currently lacking?) which would need to be built into BIOS/UEFI configuration code?