Ubuntu Server NVMe errors in syslog

Ubuntu Version:
Ubuntu Server 24.04.1 LTS

Desktop Environment (if applicable):
none

Problem Description:
Errors in syslog like these:

cat /var/log/syslog | grep 2025-02 | grep abort
2025-02-02T05:59:44.862527+01:00 filesrv kernel: nvme nvme1: I/O tag 198 (50c6) opcode 0x0 (I/O Cmd) QID 4 timeout, aborting req_op:FLUSH(2) size:0
2025-02-02T10:38:12.958510+01:00 filesrv kernel: nvme nvme1: I/O tag 3 (f003) opcode 0x0 (I/O Cmd) QID 1 timeout, aborting req_op:FLUSH(2) size:0
2025-02-02T10:38:45.726477+01:00 filesrv kernel: nvme nvme1: I/O tag 106 (906a) opcode 0x1 (I/O Cmd) QID 3 timeout, aborting req_op:WRITE(1) size:4096
2025-02-03T03:25:49.406470+01:00 filesrv kernel: nvme nvme1: I/O tag 15 (a00f) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:8192
2025-02-03T03:25:49.406488+01:00 filesrv kernel: nvme nvme1: I/O tag 130 (e082) opcode 0x1 (I/O Cmd) QID 3 timeout, aborting req_op:WRITE(1) size:4096
2025-02-03T06:38:39.583476+01:00 filesrv kernel: nvme nvme1: I/O tag 42 (b02a) opcode 0x1 (I/O Cmd) QID 2 timeout, aborting req_op:WRITE(1) size:8192

Relevant System Information:
Supermicro X11SCL-F motherboard; nvme0 is in the motherboard's M.2 slot, nvme1 is on an M.2-to-PCIe adapter. Even though the log example above shows only nvme1, I sometimes see the same type of error messages for nvme0.

Hello all,
I hope someone can help me figure out what is happening (if anything) with my NVMe disks. I recently noticed the error messages posted above, but they might have been there for years; I just wasn't checking for this very often.
I have a snapraid array made from 2x NVMe disks and 2x HDDs for data, plus 2x more HDDs for parity. It has worked fine for a long time, and I don't even use the server much. But recently, when trying a scrub and also a sync, I noticed the process takes very, very long. Checking the syslog, I saw these messages, and I am not sure whether they mean the disks are bad, the board is bad, or nothing is wrong at all.
A Google search didn't help very much, so I decided to post here. I am a long-time Ubuntu user for my home setup and was very active on ubuntuforums helping people out. But now I need some help with this. :slight_smile:
If you need more info, let me know.
Thanks in advance.
Darko.

Just saw my question already answered…
Looking at what you listed:

An SSD, or an NVMe drive, has firmware that can rebuild the cells if only power is applied.

I see you have a PCIe-to-NVMe adapter…
Do you have a system where a PCIe slot can be cut off from data and only supply power?
I have an Asus board where, when an NVMe drive is installed in a certain slot, the data lines are shut off and only power is supplied. When I needed a rebuild, I would leave one NVMe in the motherboard slot, take the NVMe from the adapter and move it to that power-only slot for an hour or so while the system is running so it can rebuild, then shut down and swap the NVMe drives back.
Now, the downside of doing this is that you could possibly have data loss.
The other is that an SSD/NVMe is only good for so many writes, then kaput…
Right now the commands to check write cycles escape me, but I do know that vendors publish what the drive is rated for in write cycles, so it may not be your problem.
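
That said, a rough way to check it (a sketch, assuming smartmontools and nvme-cli are installed) is to read the drive's own write counter and compare it with the TBW figure in the vendor's datasheet:

sudo smartctl -a /dev/nvme0n1 | grep -i 'units written'
sudo nvme smart-log /dev/nvme0        # look at the "Data Units Written" line
# each data unit is 1000 x 512-byte blocks, i.e. multiply by 512,000 for bytes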

HERE IS A POWER-ONLY PCIE ADAPTER
Not pushing that vendor, just showing another option: rebuild without a data connection, providing power only. There are many who sell such products.

Query the drive(s) about their (self-reported) health status directly: smartctl --all /dev/nvme0. But if you say things were not this slow before, it could indicate drive failure :frowning:

Sorry, I didn't understand what testing is actually suggested. Taking nvme1 from the adapter and connecting it to the board directly, replacing the current nvme0? I could do that, although technically I would be without one "snapraid member" that way. But snapraid doesn't mind much if it is only for a short time.
Regarding SSD/NVMe TBW (total bytes written), I am very far from the spec. The data is mainly static and the disks are Kingston NV2 2TB. Right now, according to the SMART info, I have written a little more than 2 TB on them, and the spec for TBW is 640 TB. So you could overwrite the disk roughly 300 times, let's say (I wonder if that is really true).
I still have 1 year of warranty left, but I wanted to gather some better evidence myself before taking a shot with Kingston support.

Do what @nasubiq has posted first.

sudo smartctl --all /dev/nvme0n1
[sudo] password for darko: 
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-51-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SNV2S2000G
Serial Number:                      50026B76862D9B8C
Firmware Version:                   SBK00104
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 6862d9b8c5
Local Time is:                      Mon Feb  3 19:41:14 2025 CET
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x009f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Verify
Log Page Attributes (0x12):         Cmd_Eff_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0     200
 2 +     2.50W       -        -    2  2  2  2        0    1000
 3 -     1.50W       -        -    3  3  3  3     5000    5000
 4 -     1.50W       -        -    4  4  4  4    20000   70000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    13,849,819 [7.09 TB]
Data Units Written:                 4,565,752 [2.33 TB]
Host Read Commands:                 37,101,167
Host Write Commands:                21,202,977
Controller Busy Time:               4,731
Power Cycles:                       106
Power On Hours:                     6,615
Unsafe Shutdowns:                   45
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)
sudo smartctl --all /dev/nvme1n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-51-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       KINGSTON SNV2S2000G
Serial Number:                      50026B76862D96E0
Firmware Version:                   SBK00104
PCI Vendor/Subsystem ID:            0x2646
IEEE OUI Identifier:                0x0026b7
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            0026b7 6862d96e05
Local Time is:                      Mon Feb  3 19:42:31 2025 CET
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0016):   Format Frmw_DL Self_Test
Optional NVM Commands (0x009f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Verify
Log Page Attributes (0x12):         Cmd_Eff_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     83 Celsius
Critical Comp. Temp. Threshold:     90 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.00W       -        -    0  0  0  0        0       0
 1 +     3.50W       -        -    1  1  1  1        0     200
 2 +     2.50W       -        -    2  2  2  2        0    1000
 3 -     1.50W       -        -    3  3  3  3     5000    5000
 4 -     1.50W       -        -    4  4  4  4    20000   70000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        41 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    13,107,703 [6.71 TB]
Data Units Written:                 4,538,497 [2.32 TB]
Host Read Commands:                 30,811,091
Host Write Commands:                21,089,657
Controller Busy Time:               5,039
Power Cycles:                       106
Power On Hours:                     6,541
Unsafe Shutdowns:                   46
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

I don't see anything critical there, to my eyes. Towards the end, both disks show "No Errors Logged".

I looked here…
on both, and that is well under the 640 TB threshold, as they are both close to the same amount of data written.

Correct, I have barely written anything; the data is fairly static.
If I decide to RMA, that is good, as it is below their limit. But the SMART info shows nothing critical or wrong that I can see, so I am not sure an RMA would be accepted at all. :slight_smile:

Don't get me wrong, I know how to make the firmware recover cells.
But I'm not sure that is the issue here.
The trick is to just provide power to the NVMe/SSD, with no data calls or access, for 30 minutes to 2 hours. The board I linked to would do that with your existing adapter. The downside is that some who have used that method have lost data, so if you attempt it I would back up the data first. I've done it with several SSDs/NVMe drives and never had an issue…
Hmm, I'll go back and re-read in case I missed something.

I just saw your edit above. I think there is a misunderstanding about the PCIe adapter. I have one of these (no intention to market it, just showing the picture):
https://www.amazon.es/gp/product/B07FN3YZ8P/ref=ppx_yo_dt_b_search_asin_title?ie=UTF8&psc=1
So I can use the M.2 disk in the main PCIe x16 slot (electrically x8, which should be fine because the card is x4). There are no power connectors on it, and no additional power is needed.

And NO data calls, as it's not connected to the bus.

I think I got your suggestion now: basically, provide only power to the M.2 disk so that it does its thing isolated from the board, sort of.

Exactly, for 30 minutes to 2 hours with power only and no ability to make data calls. Usually I'll plug it in for at least an hour or two, come back, disconnect it, re-install it, and life is good… but back up your data like I said; I haven't lost any, but that doesn't mean it won't happen…
And it is completely harmless to the drive.

I have used an adapter like this HERE and just plugged in the SATA power side; works well too.

OK, I will think about it. Meanwhile, does anyone know of any testing I could do, something more specific to NVMe? Not just a basic data write with dd or similar.
I think those timeout errors show up more when the disks are under heavier use. Recently I haven't used the server much, which is why I only saw them when trying a snapraid scrub and started to investigate why it is so slow.
And are they really errors, or is it just that the NVMe, being the fastest component, simply times out at times?
Also, a few weeks ago I noticed that nvme0 dropped out of my mdadm array (where the OS is), but a simple mdadm --add fixed it. So I can't quite figure out how all of this is related (or not).
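
Beyond smartctl, the only NVMe-specific checks I can think of are the nvme-cli ones below (just a sketch of what I might run; the self-test may not even be supported, given the "Read Self-test Log failed" line in the output above):

sudo nvme error-log /dev/nvme1              # controller error log entries
sudo nvme smart-log /dev/nvme1              # the same health counters smartctl reads
sudo nvme device-self-test /dev/nvme1 -s 1  # start a short self-test, if supported
sudo nvme self-test-log /dev/nvme1          # check the result afterwards
sudo dmesg -wT | grep -i nvme               # watch the kernel log live during a scrub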

Most of my knowledge is ZFS; back when I used mdadm, I used only spinning rust.

And like you, I'm not 100% sure what I proposed is actually needed… it might be, and then again it might not.
I only shared what I know about getting the firmware to rebuild the cells, which may not be the problem, but ruling it out is one less possibility.
I did a bit of googling; this might help, might not.
BTW, nice hardware on your system.
I'll bow out and hopefully someone else with a better grasp steps in… I do wish you the best with this.
P.S. I did a bit of Google searching, as I recalled someone in the past mentioning trimming NVMe drives;
maybe the links will help:
trim operation
enable trim in linux

Thanks anyway. I will continue googling too. I love ZFS, but in my home setup it is definitely overkill, plus I can't afford to replace everything with identical drives, so it was a no-go. My current setup is still very strange because I am still in transition, using the old HDDs, which still work, plus a few new NVMes. I don't have the heart to throw away drives that still work. :slight_smile:
Hence I went with snapraid, because it allows mixing technologies and, more importantly, drive sizes.

After the google-fu session I did a bit of research on running a trim on NVMe drives…
so I did a dry run first:

mike@bastion:~$ sudo /usr/sbin/fstrim --fstab --verbose --dry-run
/boot/efi: 0 B (dry run) trimmed on /dev/disk/by-uuid/8FF2-678F
/: 0 B (dry run) trimmed on /dev/disk/by-uuid/eaec50bd-1b94-447e-9894-8c2c07edd587


Looks successful; now to try a live run:

mike@bastion:~$ sudo /usr/sbin/fstrim --fstab --verbose
/boot/efi: 1 GiB (1118564352 bytes) trimmed on /dev/disk/by-uuid/8FF2-678F
/: 932.6 MiB (977952768 bytes) trimmed on /dev/disk/by-uuid/eaec50bd-1b94-447e-9894-8c2c07edd587

Then I tried all the NVMe partitions, etc.:

sudo /usr/sbin/fstrim --all --verbose
/boot/efi: 1 GiB (1118564352 bytes) trimmed on /dev/nvme0n1p1
/: 726.4 MiB (761634816 bytes) trimmed on /dev/nvme0n1p2

After running the above on my desktop, it actually made a difference in loading times after I rebooted.

source article
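
By the way, on Ubuntu the periodic version of this normally already runs via a systemd timer, so it may be worth confirming it is active on the server as well (a quick check, nothing exotic):

systemctl status fstrim.timer
sudo systemctl enable --now fstrim.timer    # only needed if it isn't already enabled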

This looks like a similar situation: no errors in SMART and the same kernel messages for NVMe. But the "solution" doesn't seem convincing. Is the power state the culprit…?
https://community.frame.work/t/nvme-timeout-woes/54999

I will continue to read up on it. I also found many references to these kernel messages in connection with Proxmox, but I am running native Ubuntu Server, so I don't think Proxmox is very relevant.
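
One thing I can check without changing anything is what power-state behaviour the drives are actually using (a sketch, assuming nvme-cli is installed; feature 0x0c is Autonomous Power State Transition, 0x02 is Power Management):

sudo nvme get-feature /dev/nvme1 -f 0x0c -H   # APST configuration the drive is using
sudo nvme get-feature /dev/nvme1 -f 0x02 -H   # current power management setting
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us   # kernel-side APST latency limit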

I tried the trim command on my Ubuntu Desktop (an NVMe drive); see above.
After posting that, I looked into the power setting that was mentioned for RAID setups, as well as the write-through setting:
" Reasons to avoid write caching with NVMe RAID:

  • Redundancy overhead: Since RAID already provides data redundancy, the additional write caching layer can introduce unnecessary complexity and potential performance bottlenecks.

  • Reduced write throughput: With high-performance NVMe SSDs, the write cache may not be able to keep up with the write requests, leading to slower overall write speeds."

" To optimize NVMe power state settings for write-intensive RAID on Linux, focus on configuring the drive to prioritize performance over power saving by setting the “active power state” to a higher level, potentially disabling autonomous power state transitions, and ensuring your RAID software is optimized for NVMe performance; consult your specific NVMe drive documentation for the best settings and adjust based on your workload demands.

Key points to consider:

  • Active Power State: This setting directly controls the power consumption of the NVMe drive, with higher levels allowing for greater sustained write performance.

  • Autonomous Power State Transition (APST): While useful for power saving in idle situations, disabling APST can ensure the drive stays in a high-performance state during heavy write operations.

  • NVMe-CLI tool: Use the NVMe command line interface to access and modify power state settings on your NVMe drives.

How to adjust power settings:

  • Check Drive Specifications: Consult your NVMe drive manufacturer's documentation to understand the available power states and their performance implications.

  • Use NVMe-CLI to set power state:
    • Command: nvme <device_path> set_features <feature_set>
    • Example: nvme /dev/nvme0n1 set_features 0x01 (sets the active power state to a higher level)

  • Disabling APST (if necessary):
    • Kernel parameter: Add nvme_core.default_ps_max_latency_us=0 to your kernel boot parameters

Important factors to consider:

  • RAID Level: Different RAID levels (e.g., RAID 0, RAID 1, RAID 5) have varying performance characteristics, so optimize power settings accordingly.

  • Workload: Adjust power settings based on your specific write-intensive workload, balancing performance needs with power consumption.

  • Thermal Management: Monitor drive temperatures and adjust power settings if necessary to prevent overheating."
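
If you try any of that, note that the actual nvme-cli syntax differs a bit from the snippet quoted above; roughly it would look like this (a sketch only, and I would back up the data first):

# disable APST globally via the kernel parameter mentioned above: add
# nvme_core.default_ps_max_latency_us=0 to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
sudo update-grub && sudo reboot

# or force a drive into power state 0 (highest performance); feature 0x02 is Power Management:
sudo nvme set-feature /dev/nvme1 -f 0x02 -v 0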