Fine-Tuning the Ubuntu 24.04 Kernel for low latency, throughput, and power efficiency

Overview

One of the most significant changes in the new Ubuntu 24.04 generic kernel is the availability of additional low-latency tunable settings (see Launchpad bug #2051342, “Enable lowlatency settings in the generic kernel”).

These options enable dynamic adjustment of the generic Ubuntu kernel to suit different performance profiles, using specific boot-time or run-time parameters, eliminating the need for kernel recompilation.

NOTE: as mentioned in a previous post (Enable low latency features in the generic Ubuntu kernel for 24.04), we are still going to maintain the low-latency kernel in 24.04. The long-term plan, however, is to deprecate this kernel flavor (providing the appropriate transitional packages for a smooth migration), since the generic kernel can be configured to behave exactly like a low-latency kernel.

Kernel options

The relevant new kernel options can be classified into the following categories:

  • kernel preemption model

  • kernel noise

  • CPU wake-up events

Kernel preemption model

The kernel provides the following preemption models, which can be selected at boot time with the options below (or even changed at run time via /sys/kernel/debug/sched/preempt):

  • preempt=full: fully preemptible kernel

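  • preempt=voluntary: kernel code can be preempted only at explicit preemption points

  • preempt=none: no forced preemption of kernel code; a task running in the kernel is rescheduled only when it blocks, yields, or returns to user space (hardware interrupts are still serviced)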

A fully-preemptible kernel is most suitable for low-latency workloads - such as gaming, live-streaming, multimedia, etc. - since high-priority tasks have more chances to interrupt low-priority tasks and acquire a CPU. However, there is a cost to pay in terms of throughput with the additional preemption.

Typical server or high-performance computing workloads may prefer a less responsive kernel with fewer interruptions, yet capable of sustaining higher computational performance. In such a scenario, voluntary preemption may be preferred.

The last option (non-preemptive kernel) rarely provides benefits. Only very specific CPU-intensive workloads with minimal I/O, such as complex image processing or machine learning algorithms, are likely to see improvements.
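As a minimal sketch of the run-time interface mentioned above, the helper below reports which model is currently active. It assumes that the debugfs file lists all models on one line and wraps the active one in parentheses (for example "none voluntary (full)"); the function name and the PREEMPT_FILE variable are illustrative, not part of any existing tool.

```shell
# Sketch: report the active preemption model.
# Assumption: the file lists every model on one line and marks the
# active one with parentheses, e.g. "none voluntary (full)".
PREEMPT_FILE="${PREEMPT_FILE:-/sys/kernel/debug/sched/preempt}"

# current_preempt_mode [FILE]: print the word found inside parentheses.
current_preempt_mode() {
    sed -n 's/.*(\([a-z]*\)).*/\1/p' "${1:-$PREEMPT_FILE}"
}
```

On a live system the file is normally readable only by root and requires debugfs to be mounted, so the check would look like sudo cat /sys/kernel/debug/sched/preempt, and switching models would be something like echo voluntary | sudo tee /sys/kernel/debug/sched/preempt.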

Kernel noise

Interrupts can introduce noise in certain latency-critical applications that can affect the expected level of performance and response predictability.

This includes hardware interrupts but also the tick interrupt: a periodic source of interruption used for timekeeping purposes and task scheduling.

The boot parameter nohz_full=<CPU_LIST> can be used to disable the periodic tick interrupt on a subset of CPUs whenever they run at most one task. The tick is not needed in that case: no context switching is required if only one task (or no task at all) is using the CPU.

This, together with proper IRQ affinity settings, can help to isolate certain CPUs from any source of interruption, providing a more predictable level of performance to the user-space applications running on such CPUs.
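To check that the isolation described above is in place, it can help to compare the tickless CPUs with the CPUs each IRQ is allowed to run on. The sketch below only provides the list handling; the function names are hypothetical, and it assumes the kernel-style list format such as "2-5,8" (non-numeric entries are ignored).

```shell
# expand_cpu_list LIST: turn a kernel-style CPU list such as "2-5,8"
# into one CPU number per line. Non-numeric entries are skipped.
expand_cpu_list() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        case "$first$last" in
            ''|*[!0-9]*) continue ;;
        esac
        seq "$first" "${last:-$first}"
    done
}

# cpu_in_list CPU LIST: succeed if CPU is part of LIST.
cpu_in_list() {
    expand_cpu_list "$2" | grep -qx "$1"
}
```

A possible use is cpu_in_list 3 "$(cat /sys/devices/system/cpu/nohz_full)" to confirm that CPU 3 is tickless, followed by the same test against /proc/irq/<N>/smp_affinity_list for the IRQs that should stay away from that CPU.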

CPU wake-up events

In modern CPUs, power consumption is mostly determined by how often they need to leave their idle state.

The longer CPUs can stay idle, the more beneficial it is for your battery life.

Interrupts and asynchronous events that bring back CPUs to an operative state can be a significant source of power consumption.

In the kernel, a major source of wake-up events is RCU (read-copy-update, see https://lwn.net/Articles/652156/), due to the asynchronous execution of RCU callbacks.

RCU callbacks are used for deferred cleanup or destruction of data structures once they are no longer in use by readers. If a system makes frequent changes to RCU-protected structures, the number of callback executions per second can become significant and cause a lot of wake-up events.

To reduce the number of CPU wake-ups, a new combination of options can be used in the kernel boot parameters: rcu_nocbs and rcutree.enable_rcu_lazy=1.

The former moves the execution of RCU callbacks from softirq context to kthread context; the latter allows RCU callbacks to be batched and flushed after a timed delay, instead of being executed immediately.

Grouping multiple RCU callbacks together and processing them all at once can significantly reduce the rate of wake-up events and provide roughly 5-10% power savings on idle or lightly loaded systems.

Adding rcu_nocbs=<CPU_LIST> rcutree.enable_rcu_lazy=1 to the kernel boot parameters can enhance power efficiency, which makes this setting particularly interesting in a mobile / laptop context.
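One simple way to confirm after a reboot that these parameters were actually passed to the kernel is to look for them on the kernel command line. The helper below is a sketch with a hypothetical name; it only shows that an option was requested, not that the running kernel was built with support for it.

```shell
# has_boot_param NAME [CMDLINE_FILE]: succeed if NAME appears as a word
# on the kernel command line, either alone or as NAME=value.
has_boot_param() {
    tr -s ' ' '\n' < "${2:-/proc/cmdline}" | (
        while read -r word; do
            case "$word" in
                "$1"|"$1"=*) exit 0 ;;
            esac
        done
        exit 1
    )
}
```

For example, has_boot_param rcu_nocbs && has_boot_param rcutree.enable_rcu_lazy && echo "lazy RCU requested" prints the message only when both options are present.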

Performance profile examples

The following list maps example performance profiles to these new kernel settings:

  • server (use default settings)

  • gaming: preempt=full

  • virtualization: preempt=full nohz_full=all

  • audio: preempt=full nohz_full=all threadirqs

  • mobile: preempt=full rcu_nocbs=all rcutree.enable_rcu_lazy=1

Copy the options for the desired performance profile and append them to the GRUB_CMDLINE_LINUX_DEFAULT= line in /etc/default/grub, then run sudo update-grub; the new options take effect at the next reboot.
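The edit described above can also be scripted. The following is a minimal sketch, not an official tool: the function name is hypothetical, it relies on GNU sed, and it assumes that the GRUB_CMDLINE_LINUX_DEFAULT value is double-quoted on a single line and that the options contain no "|" or "&" characters (true for the profiles listed above).

```shell
# append_grub_params FILE PARAMS: append PARAMS inside the quotes of the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE, keeping existing options.
# Not idempotent: calling it twice adds the options twice.
append_grub_params() {
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $2\"|" "$1"
}
```

A cautious workflow is to run it on a copy of /etc/default/grub (for instance append_grub_params ./grub.copy "preempt=full"), review the result, copy it back with sudo, and only then run sudo update-grub.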

Additional low-latency settings

The following additional run-time settings can help to increase system predictability and responsiveness:

  • set the cpufreq governor to performance by writing "performance" to:

    /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

  • disable proactive memory compaction:

    echo 0 | sudo tee /proc/sys/vm/compaction_proactiveness

  • disable kernel samepage merging:

    echo 0 | sudo tee /sys/kernel/mm/ksm/run

  • thrashing mitigation (prevent the working set from being evicted for 1000 ms, which can help to mitigate stuttering under memory pressure):

    echo 1000 | sudo tee /sys/kernel/mm/lru_gen/min_ttl_ms

  • prevent stuttering behavior during intense I/O writes that may involve massive page cache flushing:

    echo 5 | sudo tee /proc/sys/vm/dirty_ratio
    echo 5 | sudo tee /proc/sys/vm/dirty_background_ratio
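Since not every file in the list above exists on every system (some depend on kernel configuration or hardware), a small wrapper can apply a value to several files and report the ones it had to skip. This is a sketch with a hypothetical function name, meant to be run from a root shell.

```shell
# write_tunable VALUE FILE...: write VALUE to each FILE that exists and
# is writable, and report skipped files on stderr. Returns non-zero if
# at least one file was skipped.
write_tunable() {
    value=$1
    shift
    status=0
    for file in "$@"; do
        if [ -w "$file" ]; then
            echo "$value" > "$file"
        else
            echo "skipped: $file" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, write_tunable performance /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor sets the governor on every CPU that exposes cpufreq. Values written this way are lost at reboot, so they need to be applied again at each boot.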

Conclusion

The ongoing focus in Ubuntu on improving performance and observability highlights the importance of having a wider range of tunable options available. This allows users to adjust performance levels according to their specific needs.

In this article we have explored some new kernel options that are available in the new 24.04 generic kernel and how they can be used to dynamically adjust the same kernel to different system profiles.

Keep in mind that these settings are not suitable for every workload. They are provided as generic examples of performance profiles that can be defined by tuning certain kernel parameters.

In certain scenarios, one profile might outperform another that was nominally intended for that workload.

Therefore, it’s advisable to identify a set of metrics for measuring the performance, then experiment with a combination of the above-mentioned settings to determine the optimal configuration for your specific use case.
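As an illustration of the measure-then-compare approach, the sketch below summarizes a file of numeric samples (one per line, for example latencies in microseconds) so that runs under different profiles can be compared. The function name and the input file layout are assumptions made for this example; the samples could come from any benchmark relevant to the workload, such as a latency test like cyclictest.

```shell
# summarize_samples FILE: print "count min avg max" for a file holding
# one numeric sample per line. Blank lines are ignored.
summarize_samples() {
    awk 'NF {
             if (n == 0 || $1 < min) min = $1
             if (n == 0 || $1 > max) max = $1
             sum += $1
             n++
         }
         END { if (n) printf "%d %g %.2f %g\n", n, min, sum / n, max }' "$1"
}
```

Running the same benchmark once per profile, saving each run to its own file, and comparing the summaries gives a first indication of which combination of settings suits a given use case.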

Future plans

In the future we may explore integrating these new kernel settings with automated tools (e.g., tuned, the tuning profile delivery mechanism for Linux from the redhat-performance/tuned project on GitHub) and try to determine the best settings automatically by analyzing the typical system workload.

For now, expanding the range of tunable options and incorporating more observability features appears to be the optimal approach to improve the quality of the system, as opposed to offering a single fixed system solution “to rule them all”.

Therefore, the key objective is to offer more customization and freedom without introducing significant overhead to support such flexibility, and it appears that Ubuntu is progressing in this direction.


Hi, I wanted to ask: what are the best ways to benchmark and test these settings (which actions, tools, or tasks should I use) to get some insight into their impact for a desktop user?