UPDATE:
After reading various recommendations for these kernel messages, and also some threads on the Linux Mint forum and wiki, I added the following parameters to GRUB:
“nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off”
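In case it helps anyone: on Mint/Ubuntu these go into the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by update-grub and a reboot. Roughly like this (the “quiet splash” part is just whatever was already on that line):

# edit /etc/default/grub and append the parameters to the existing line, e.g.:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off"

# regenerate the grub config and reboot:
sudo update-grub
sudo reboot

# afterwards, confirm the running kernel picked them up:
cat /proc/cmdline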
That seemed to help at first: the kernel messages were gone from syslog, but only until I put more load on the disks. Then the messages showed up again, many of them.
I am still confused about where exactly the problem is, or whether it is a problem at all (it seems to be, since any significant load on the disks slows them down, and they should be the fastest components in the system). But smartctl shows no errors, and ‘nvme smart-log’ shows no errors either. So if I really needed to prove to a Kingston technician that these disks are broken, I don’t know what argument I would have.
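For anyone who wants to run the same checks, these are the commands I mean (device names like /dev/nvme0n1 will differ from system to system; smartctl comes from smartmontools and nvme from nvme-cli):

sudo smartctl -a /dev/nvme0n1      # full SMART/health report for the first NVMe namespace
sudo nvme smart-log /dev/nvme0     # controller health: temperature, media errors, wear, etc.
sudo nvme error-log /dev/nvme0     # controller error-log entries, if any
sudo dmesg | grep -i nvme          # the kernel messages I was seeing in syslog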
In any case, since I had bought them on Amazon DE, I decided to try there first. To my very pleasant surprise, and even though they were bought 2 years ago, Amazon DE offered to take them back for a full refund. So that is what I will do. That will be the first phase.
As it turns out, with the help of a smaller spare disk I have around, I can just about fit all the data onto the remaining disks, so I can take these two out and send them back without buying anything new first.
The thing that bugs me when deciding what to buy is the following. From a price perspective, getting more for your money, the choice would be NVMe, since for the same TB they cost basically the same as SATA SSDs, and their much higher performance would be good to have for the future.
But from a practical perspective, even a SATA SSD is much faster than what my home server can use. No client will ever pull anywhere near the SATA SSD speed of 500-550 MB/s, especially over a 1 Gb network, which tops out at roughly 110-120 MB/s. So NVMe is extreme overkill with that in mind. SATA is a proven technology that has been around a long time, and I expect much better compatibility of such SSDs with my server. Plus I have far more 2.5" and SATA slots available on the board than M.2 slots, and SATA is much easier to expand. NVMe is very limited because each disk needs x4 PCIe lanes.
Was I just unlucky with these Kingston NVMe drives? Would another brand have behaved differently? Maybe…
I will have to think a lot about the next buy. If anyone has any ideas, I would be happy to hear them.
UPDATE:
I consider my issue resolved, but I didn’t understand most of it, so I am putting my conclusions here in case someone arrives at this thread in the future.
I noticed the kernel error messages in syslog during copy operations from NVMe to SATA HDD. I was mostly focusing on those errors and trying to figure out whether some of the NVMe disks had failed or were failing. What I didn’t notice until some days later is that the CPU load was very high during those copy operations. I know that a copy from very fast NVMe to a much slower rotational HDD will be limited by the HDD speed and therefore not very fast. But it seems it affects not only the speed, but also the load.
In my case the home server is not heavily used, so during my last HW upgrade I decided to go with a fairly low-end CPU, the Intel i3-9100F. It has only 4 cores and no HT, so 4 cores/threads in total. If I understand correctly how Linux reports load, a load of 4.0 would mean 4 cores working at 100%. Well, during the data copy I was seeing load as high as 20.0. That would mean 20 cores (which I don’t have, not even close) working at 100%. No wonder the server and the copy operation were coming to a standstill. Maybe this is a result of the NVMe using the PCIe bus directly. I don’t actually know how that technology works or why it would create such a high load.
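One thing worth knowing for anyone who sees the same numbers: the Linux load average counts not only tasks actually running on the CPU but also tasks stuck in uninterruptible (usually disk I/O) sleep, so heavy I/O against a misbehaving device can push the load far above the core count even while the CPUs are mostly idle. A few standard commands help tell the two apart (iostat needs the sysstat package installed):

uptime          # the 1/5/15-minute load averages
vmstat 2        # 'wa' column = percentage of CPU time spent waiting on I/O
iostat -x 2     # per-disk utilisation, queue sizes and wait times
top             # compare %wa against %us/%sy in the header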
THE FINAL OUTCOME:
After Amazon agreed to refund the NVMe purchases, I started copying the data off the NVMe disks so that I could return them. Little by little I managed to do that, so no data was lost as far as I know. Now I have only a few smaller spare SSDs of 250GB and 500GB, plus the four HDDs (3x 3TB, 1x 4TB). Disk operations are actually somewhat better now without the NVMe.
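In case someone has to do the same kind of migration, a typical way to move the data and then double-check it is rsync; the paths below are just placeholders for wherever your disks are mounted:

sudo rsync -aHAX --info=progress2 /mnt/nvme1/ /mnt/hdd1/nvme1-data/
# -a preserves permissions, times and ownership; -H keeps hard links; -A/-X copy ACLs and xattrs
# the trailing slash on the source copies the directory contents, not the directory itself

sudo rsync -aHAXcn /mnt/nvme1/ /mnt/hdd1/nvme1-data/
# dry run (-n) with checksums (-c): anything it lists did not copy over identically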
I think my assumption that the NVMe drives themselves are not broken is correct. I could never find any errors on them, and the SMART info was OK. But it seems the combination with my system was somehow incompatible, so the fastest disk component was actually causing issues when it shouldn’t have.
For now I have just about managed to distribute the data across the remaining disks, and I will think about the next purchases in the future, without rushing.
Just a thought for you on the i3-9100F’s lack of cores and threads, since I just updated the board and processor on a backup server.
Maybe a Supermicro (perhaps the X10 generation using LGA 2011 v3, aka LGA 2011 R3) or even another vendor with a newer Xeon. I went single-processor because of the server’s role, on an X9-generation Supermicro board using the LGA 2011 socket, but if I were using it for NFS or as a NAS I think I would go dual Xeon.
Going with Xeon also gives you the ability to use ECC RDIMMs, and honestly I didn’t realize how much that helped until I used a board with RDIMMs. Being able to correct errors before a write is a definite plus.
In my NFS server I went with an X99 board and a single i7-5930K (6 cores / 12 threads) in LGA 2011 v3, and the system sees all 12 threads and uses them
(and I can easily upgrade to a Xeon E5-2699 v3 with 18 cores / 36 threads, the v4 with 22 cores / 44 threads, or even the Intel Extreme Edition i7-6950X, which would give me 10 cores / 20 threads).
That being said, if I needed more processing power, a dual Xeon would be my preferred route.
The thing with my server is that I have never really needed much CPU power, and I prefer a CPU with a lower TDP (less heat), although CPUs from the last several years do scale their clocks and power down when not under heavy use. Also, I don’t have access to an extensive refurbished-HW market here in Spain, which limits the deals you can get.
In this particular case it seems the NVMe was loading the CPU, and I am not sure whether it always does that or it only happens with my system specs. From a quick read on Google, using NVMe should actually lower CPU load, not the other way around.
In any case, I probably jumped the gun with NVMe, and by sticking to “good old” SATA SSDs things should be much better.
PS. The i3-9100F does support ECC and I have 2x 8GB ECC modules.
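If it helps: a quick way to check whether ECC is actually active (and not just that ECC modules are installed) is to look at what the firmware and the kernel’s EDAC layer report. The exact wording varies by board and BIOS, but something like:

sudo dmidecode --type memory | grep -i "error correction"
# should report e.g. "Single-bit ECC" rather than "None" if ECC is enabled

ls /sys/devices/system/edac/mc/
# memory-controller entries (mc0, ...) appear here when ECC error reporting is active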