Boosting the Real Time Performance of Gnome Shell 3.34 in Ubuntu 19.10
As you may have read many times, Gnome 3.34 brings much improved desktop performance. In this article we will describe some of the improvements contributed by Canonical, how the problems were surprising, how they were approached and what other performance work is coming in future.
What is Gnome Shell, really?
Let’s get this right because many people don’t and it leads to unfair criticism of Gnome Shell. It is a desktop environment written in C and JavaScript on top of the Mutter compositor/window manager.
The majority of the logic is in the Mutter project, which includes the Clutter and Cogl graphics toolkits. Mutter is comprised of 1389 C source files (at time of writing). While most of the Mutter project is used as libraries by Gnome Shell, a mutter command also exists as a standalone compositor if all you need is a mouse pointer and wallpaper.
The Gnome Shell project is the smaller of the two projects, adding bling like the desktop panel and launcher, similar to how Unity 7 sits on top of Compiz. Gnome Shell is made up of 199 C source files and 157 JavaScript source files (at the time of writing).
The important thing to note here is that most of the source code is in the Mutter project, not Gnome Shell. So overall, once you include Mutter, only around 10% of Gnome Shell is written in JavaScript and around 90% is written in C.
Problem
Gnome Shell 3.32 in Ubuntu 19.04 feels slower than Unity and other desktops. If you have a slow machine then it won’t run smoothly. Perhaps more surprisingly, even if you have a fast machine it still does not run completely smoothly.
Proposed and Obvious Solutions
Since many users knew Gnome Shell contained JavaScript and that it is an interpreted language, it was easy to blame that. The thing is most people wouldn’t know it’s only 10% JavaScript and that much of the time JavaScript isn’t running at all. Most of the time if you’re just interacting with an application then gnome-shell is running native machine code only, from C.
Developers on the other hand tend to focus on CPU usage and GPU usage. There are decades of knowledge on how to do that. Typically you profile your program and look for hot spots using the most CPU or GPU time.
The thing is, in the case of Gnome Shell, its biggest performance problems of late were not hot spots at all. They were better characterised as cold spots where it was idle instead of updating the screen smoothly. Such cold spots are only apparent when you look at the real time usage of a program, and not in the CPU or GPU time consumed.
What is real time usage?
Real time usage is the 123 when you run sleep 123. That’s not using any CPU time or GPU time but you know it’s going to take a while to finish. In the case of a single-threaded event-driven program like Gnome Shell it is the time spent in poll() and other blocking code that makes the system idle most of the time. And that’s mostly on purpose – you do want your system to be idle and consume minimal power when you’re not interacting with it.
Real time is also a superset of CPU time. So even if your program is CPU-bound it’s still a useful measurement to start on. If you measure real time you will catch both real time and CPU problems.
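To make the distinction concrete, here is a minimal Python sketch. It is not part of any Gnome tooling, and the measure helper is a made-up name for illustration. It times the same piece of work with a wall clock and with a CPU clock, so a sleep shows up as real time but almost no CPU time.

```python
import time


def measure(fn):
    """Return (real_seconds, cpu_seconds) spent while running fn()."""
    real_start = time.perf_counter()   # wall clock: keeps counting while we sleep
    cpu_start = time.process_time()    # CPU clock: only counts time spent executing
    fn()
    cpu = time.process_time() - cpu_start
    real = time.perf_counter() - real_start
    return real, cpu


# A short stand-in for "sleep 123": lots of real time, almost no CPU time.
real, cpu = measure(lambda: time.sleep(0.2))
print(f"real={real:.3f}s cpu={cpu:.3f}s")
```

A profiler that only samples CPU time would report next to nothing for this program, even though it took a noticeable amount of real time to finish.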
Gnome Shell and Mutter are single-threaded Glib event loop apps. So any pauses could cause them to miss the next frame and exhibit stuttering. Some common pauses are for disk IO or GPU IO, but just miscalculating when to render the next frame has also been an issue in gnome-shell.
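One simple way to spot such pauses, sketched here in Python as an assumption rather than anything Mutter actually does, is to record a timestamp on every iteration of the event loop and flag any gap longer than one frame. The find_stalls function and the sample timestamps are hypothetical.

```python
FRAME_BUDGET_MS = 1000 / 60  # about 16.7ms per frame on a 60Hz display


def find_stalls(timestamps_ms, budget_ms=FRAME_BUDGET_MS):
    """Return (index, gap_ms) for every gap between consecutive loop
    iterations that is longer than the frame budget."""
    stalls = []
    for i in range(1, len(timestamps_ms)):
        gap = timestamps_ms[i] - timestamps_ms[i - 1]
        if gap > budget_ms:
            stalls.append((i, gap))
    return stalls


# Hypothetical loop timestamps: the third iteration blocked for 48ms,
# for example on disk IO, so at least two frames were missed.
samples = [0, 16, 32, 80, 96]
print(find_stalls(samples))
```

The CPU can be completely idle during a gap like that, which is why it never appears as a hot spot.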
How to measure real time usage
There are multiple ways, not all listed here, but many developers might not be familiar with any of them. For Ubuntu 19.10 we mostly used Google Profiler and Mesa itself.
Google Profiler is a profiler like gprof or callgrind, but the one Google has made freely available is different because it is stochastic and has some useful tricks. One such trick is that you can ask it to measure real time usage instead of the default CPU time usage:
env CPUPROFILE_REALTIME=1
This option means we can see where real time was spent rather than where the most watts were spent.
The other way we know there are real time problems is that Mesa’s Intel graphics driver will report if you’re making real time mistakes by “stalling” the CPU or GPU:
env INTEL_DEBUG=perf myprogram
It won’t tell you where, but at least it will tell you if. Having the OpenGL library tell you that directly is pretty convincing.
Real time bugs found and fixed in Gnome 3.34
1. Accidentally missing the next frame by miscalculating when to start rendering it
This didn’t happen all the time, so people would be understandably skeptical that any such bug existed in 3.32 and earlier. It happened whenever frame scheduling was delayed by a couple of milliseconds, for any reason at all. With that fixed, frame rates are now consistently higher and smoother in 3.34.
2. In Xorg sessions everything was one frame laggier than in Wayland sessions
This was somewhat obvious when dragging windows in 19.04, and was one reason to prefer Wayland over Xorg. It was however not obvious that it was Mutter’s fault. It was easy to assume Xorg was just slower, so it wasn’t obvious there was any Mutter or Gnome Shell bug to fix. After all, Unity and other desktops exhibit the same kind of lag.
The issue was caused by scheduling the next frame too early most of the time (opposite of the previous issue). This increased the time between the frame being rendered and displayed by ~16ms, increasing the visual latency. Now that’s fixed we have one frame lower latency in Xorg sessions all the time, compared to 3.32.
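The trade-off behind the first two bugs can be sketched with a toy model. This is an illustration under simplifying assumptions (a fixed 60Hz refresh, a fixed render time, and a frame that appears at the first refresh after rendering finishes), not Mutter’s actual scheduling code. The function names are made up.

```python
FRAME_US = 16_667  # one refresh interval at roughly 60Hz, in microseconds


def displayed_at(start_us, render_us, interval_us=FRAME_US):
    """Time of the first refresh at or after the moment rendering finishes."""
    done = start_us + render_us
    refreshes = -(-done // interval_us)  # integer ceiling division
    return refreshes * interval_us


def latency(start_us, render_us, interval_us=FRAME_US):
    """Delay between starting to render a frame and it reaching the screen."""
    return displayed_at(start_us, render_us, interval_us) - start_us


RENDER_US = 4_000  # assumed render time for this example
for start in (0, 12_000, 13_000):
    print(start, displayed_at(start, RENDER_US), latency(start, RENDER_US))
```

In this model, starting at 0 is safe but the finished frame then waits a long time for the refresh, which is the extra latency. Starting at 12000 still makes the same refresh with far less latency. Starting just one millisecond later, at 13000, misses that refresh entirely and the frame appears a whole interval late.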
3. Mutter had grown a collection of different frame timing mechanisms
This was to deal with different drivers, and it would try some or all of them on every frame. As a result, cursor movement was artificially limited to 60Hz in Wayland sessions. 60Hz is so last year. Also as a result, some Nvidia systems using Xorg sessions would end up spinning at 100% CPU: Mutter would think the first throttling method had worked when it hadn’t, and so would skip the other throttling methods intended to keep CPU usage under control.
Now that it’s fixed, the cursor movement in Wayland is at full refresh rate (already was in Xorg) and Nvidia systems no longer hog the CPU (as much, if they did at all because only some systems were affected).
4. Mutter queued all input events
And it didn’t reveal them to the shell or apps until after the next frame was rendered. Frames are typically 16ms apart (1000ms per second divided by 60 frames per second) and that’s near the threshold of just long enough for many people to notice a delay. Kind of like pointing at your screen with a slightly wobbly stick.
Partially fixed! Touchpad scrolling (and buttons and keys) has one frame lower latency now than it did before. This is a big deal for those with physically slow touchpads like some current ThinkPads.
Partially not fixed. Other parts of Gnome Shell as well as the Nvidia driver are too problematic to handle a full fix that includes mouse movement at full hardware speed. Yet…
Tip: Use Chromium on Xorg to get high resolution touchpad scrolling. It doesn’t yet work properly in (X)wayland sessions or in Firefox.
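The 16ms figure in item 4 is simple arithmetic, shown here as a small Python sketch. The model is a simplification: it assumes a queued event is held until the next frame boundary, and the function names are made up for illustration.

```python
def frame_interval_ms(refresh_hz):
    """Time between frames, in milliseconds, for a given refresh rate."""
    return 1000.0 / refresh_hz


def queued_delay_ms(event_ms, refresh_hz):
    """Extra wait for an event that is held until the next frame boundary.

    event_ms is the arrival time of the event, measured from a frame boundary.
    """
    interval = frame_interval_ms(refresh_hz)
    return (-event_ms) % interval


print(round(frame_interval_ms(60), 2))     # frame interval at 60Hz
print(round(queued_delay_ms(5.0, 60), 2))  # event arriving 5ms into a frame
```

Under this model an event waits anywhere from nothing up to almost a full frame interval, depending on when it arrives.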
5. The Nvidia driver (in Xorg sessions) was being throttled
Yes, just Nvidia in Xorg sessions. This limited the CPU and GPU’s ability to work in parallel with each other and render the next frame smoothly on time. It was caused by a formerly necessary evil to work around the Nvidia driver being too fast for older versions of Mutter. New Mutter doesn’t need that workaround since the frame throttling mechanisms have been simplified and consolidated this cycle (item #3 above).
As a result, Nvidia desktop rendering is much faster and smoother with Mutter 3.34 than in previous releases.
6. “Picking” was done using OpenGL
“Picking” means figuring out what is under the cursor, whenever the mouse moves or the screen changes.
This required stalling the pipeline, blocking both the CPU and GPU to synchronize them whenever you moved the mouse. It was due to the historical design of Clutter to cover all possible user interface designs. But it used a lot more CPU to call OpenGL than you would expect.
Now that picking is done on the CPU and not on the GPU, cursor movement uses less CPU (and zero GPU). At the same time it no longer stalls rendering of the entire desktop (or app or game) below it. So what started out as an attempt to reduce CPU usage ended up accidentally being a boost for the shell’s real time performance too.
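As a rough idea of what picking on the CPU involves, here is a minimal Python sketch. It assumes plain axis-aligned rectangles in stacking order, which is far simpler than what Clutter really has to handle (transformed and clipped actors, for example). The Actor class and pick function are hypothetical names, not Clutter’s API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Actor:
    """A hypothetical on-screen element with an axis-aligned bounding box."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pick(actors: Sequence[Actor], px: float, py: float) -> Optional[Actor]:
    """Return the topmost actor under the point, or None.

    actors is ordered bottom to top, so we search it in reverse.
    """
    for actor in reversed(actors):
        if actor.contains(px, py):
            return actor
    return None


stage = [
    Actor("wallpaper", 0, 0, 1920, 1080),
    Actor("window", 100, 100, 800, 600),
    Actor("panel", 0, 0, 1920, 32),
]
print(pick(stage, 200, 200).name)
```

Nothing here touches the GPU, so answering "what is under the cursor?" never has to wait for rendering to finish.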
Real time bugs found and not yet fixed in Gnome 3.34
1. Multi-monitor rendering in Wayland sessions spends some random fixed percentage of its time (average 50%) blocked, sleeping and unable to render the screen or respond to the user
This is caused by blocking in the Wayland (which is really the “EGL native”) backend. So presently, to use Gnome with multiple monitors you need to choose between two suboptimal options:
- Wayland: Blocks and stutters too often, but won’t tear.
- Xorg: Screen tearing on all-but-one monitor thanks to DRI2 (apparently fixed in DRI3), but does not block or stutter.
This should be fixed in time for Gnome 3.36 and Ubuntu 20.04. Once it is fixed for good, Wayland sessions will provide the perfect balance of smoothness and not tearing.
2. Mutter is still failing to schedule the next frame on time in some cases
That is, if CPU usage becomes moderate, not even high. The only fix proposed so far has side effects of exposing crashes and deadlocks in the Wayland backend. Those are different problems but will need to be fixed first. Hopefully before Ubuntu 20.04.
Mistakes made along the way (please avoid these)
You live and learn. Please learn from these mistakes that we have made in researching the graphics performance of Gnome Shell…
- Assuming that moderate CPU usage was the reason why graphics weren’t smooth. Yes 50% CPU on an i7 processor is a lot of processing. But no it’s not the reason your desktop isn’t smooth. That would be a number closer to 100% (which in top means saturating a single CPU core). Moderate CPU is not the main problem, yet.
- Skipping straight to GPU profiling when some simple measurement of render times would have told you that GPU utilization is low. At least, it’s a simple number but you have no way of knowing it yet. We have a proposal on the way to fix that.
- Assuming that JavaScript is slower than everything else written in C.
- Assuming that JavaScript is in use at all. Generally if you’re only interacting with an app window it’s not using JavaScript. That’s pure C.
- Assuming everything happens as quickly as your CPU or GPU can do it (no delays in starting). Programming mistakes or just intentional design decisions will sometimes mean that’s not true.
- Not testing high refresh rate monitors sooner. A machine with a 60Hz display might only achieve 30Hz for some things. The same machine with a 120Hz display seems to achieve 60Hz for those same things. Same CPU and same GPU. Hence CPU and GPU power are not the issue. The lesson here is that you can conclude there was an algorithmic bug somewhere causing every other frame to be missed and not a hardware limitation.
- Believing increased CPU usage is always a bug. For example, if you were blocked 50% of the time and achieving 30 FPS at 40% CPU then removing that blockage might give you 60 FPS at 80% CPU. The CPU usage is higher but that’s also a desirable step forward to reach full frame rate. Don’t immediately think you’ve made a mistake just because your code change has increased CPU usage.
- Dragging windows and assuming that’s related to performance of the rest of the shell. It’s not always. We found that dragging windows had its own unique reason for being slow on top of everything else.
- Looking at the icon spring animation and assuming that’s related to performance of the rest of the shell. It’s not. Admittedly this is one thing that is very much JavaScript and the reason it was slow in the past was not related to your GPU or graphics driver, just spikey CPU usage.
- Not spending enough time to fully understand profiler results. Profilers will always generate more information than a human brain can cope with. It’s important to filter, analyse and really take some time to understand what it’s telling you.
- Never using other operating systems to compare against. I know we are here because we love Ubuntu, and that’s great. I’m just saying that you won’t know what the competitive difference is if you don’t have a little bit of experience with macOS, ChromeOS, Windows, etc.
- Guessing without measurement.
- Educated guessing based on previous experience, without fresh measurement. Developers think they know what’s fast and what’s slow. But sometimes you need to lose those preconceptions. Tell yourself you really know nothing and allow careful measurement to surprise and enlighten you. You’ll often find those optimizations you do regularly out of habit are actually negligible in the big picture.
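Two of the lessons above come down to arithmetic, sketched here in Python. Both functions are illustrative and rest on stated assumptions: the first assumes a bug that skips a fixed number of refreshes for each frame shown, the second assumes CPU cost per frame stays constant as the frame rate rises.

```python
def effective_fps(refresh_hz, refreshes_skipped_per_frame=1):
    """Frame rate when a fixed number of refreshes is skipped per frame shown."""
    return refresh_hz / (1 + refreshes_skipped_per_frame)


def cpu_percent_at(target_fps, measured_fps, measured_cpu_percent):
    """Scale CPU usage linearly with frame rate (constant cost per frame)."""
    return measured_cpu_percent * target_fps / measured_fps


# Same machine, different monitors: the frame rate tracks the display and
# not the hardware, which points at a scheduling bug.
print(effective_fps(60), effective_fps(120))

# Removing a blockage that held us at 30 FPS and 40% CPU.
print(cpu_percent_at(60, measured_fps=30, measured_cpu_percent=40))
```

If the hardware were the limit, doubling the refresh rate would not double the frame rate, and doubling the frame rate would not be possible at only twice the CPU usage.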
The Grand Plan for Gnome Shell performance
As you’ve seen, there are lots of problems that have been fixed and still some tricky problems yet to be fixed. If you’re interested in project tracking then the links we use in Ubuntu are (stutter | latency | CPU).
But what to work on first? To break up the many issues into something more manageable we have two main goals for the Gnome Shell desktop on Ubuntu:
- Make it fast on newer/fast machines.
- Make it fast on older/slow machines.
“Make it fast” means maintaining the full frame rate of your monitor with no stutters. “Fast machines” means anything that could already run Unity or Gnome desktops usably. But admittedly that’s a little subjective.
Making it fast on newer/fast machines
To do this we first need to fix all real time delay issues. Because those might hurt you the same even with an infinitely fast CPU and GPU. You should be able to upgrade your hardware and actually experience some improvement.
The good news is that this is mostly done in Mutter 3.34 for Ubuntu 19.10. We aim for it to be fully done in Mutter 3.36 in Ubuntu 20.04. What’s still remaining to achieve this goal is mainly:
- Revisit and rewrite mutter!719 to avoid missed frames. This might take multiple steps. A couple are already in progress (1, 2).
- Revisit/rewrite mutter!73 to complete high performance multi-monitor rendering for Wayland. Although it’s been suggested the upstream Gnome developers are already planning on doing this for us. Awesome!
Finally, we need to find and fix any blocking disk IO. The upstream Gnome developers have made a start on this (1, 2). But actually we probably need some discussion and advice on how best to catch and measure these when you don’t know the point in time, or in code, they’re going to happen at.
Making it fast on older/slow machines
This is more difficult because first we must be sure that all the real time blocking issues that hinder even the fastest of machines are fixed.
Next, the plan is to measure, measure, measure, and continue fixing a number of CPU hot spots when those become the main class of problem. Then measure some more and if there are any GPU bottlenecks fix those too.
We hope to make Gnome Shell much faster for older/slow machines in 2020 and the first step is almost completed already.
Final thoughts
So there you have it. Gnome Shell 3.34 in Ubuntu 19.10 is noticeably faster than previous releases, and I hope you have found the steps we took to get there interesting. But in the grand scheme of things we’re only partly done:
17.10: Gnome Shell arrives in Ubuntu
18.04: Minor performance improvements
18.10: Minor performance improvements
19.04: Minor performance improvements
19.10: Major performance improvements (you are here)
20.04: Goal: High performance on fast/modern machines
20.10: Goal: High performance on slow/older machines
The future of Gnome Shell is bright and worth getting excited about.