Boosting the Real Time Performance of Gnome Shell 3.34 in Ubuntu 19.10

As you may have read many times, Gnome 3.34 brings much improved desktop performance. In this article we will describe some of the improvements contributed by Canonical, how the problems were surprising, how they were approached and what other performance work is coming in future.

What is Gnome Shell, really?

Let’s get this right because many people don’t and it leads to unfair criticism of Gnome Shell. It is a desktop environment written in C and JavaScript on top of the Mutter compositor/window manager.

The majority of the logic is in the Mutter project, which includes the Clutter and Cogl graphics toolkits. Mutter comprises 1389 C source files (at the time of writing). While most of the Mutter project is used as libraries by Gnome Shell, a mutter command also exists as a standalone compositor if all you need is a mouse pointer and wallpaper.

The Gnome Shell project is the smaller of the two projects, adding bling like the desktop panel and launcher, similar to how Unity 7 sits on top of Compiz. Gnome Shell is made up of 199 C source files and 157 JavaScript source files (at the time of writing).

The important thing to note here is that most of the source code is in the Mutter project, not Gnome Shell. So overall, once you include Mutter, only around 10% of Gnome Shell is written in JavaScript and around 90% is written in C.
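The 10% figure can be reproduced from the file counts quoted above. This is only a rough sketch: it counts source files rather than lines of code, so it is a proxy for code size, and the variable names are made up for illustration.

```python
# File counts quoted in the article (at the time of writing).
mutter_c_files = 1389
shell_c_files = 199
shell_js_files = 157

total_files = mutter_c_files + shell_c_files + shell_js_files

# Share of source files per language, as a percentage of all files.
js_share = 100 * shell_js_files / total_files
c_share = 100 - js_share

print(f"JavaScript: {js_share:.1f}% of {total_files} source files")
print(f"C:          {c_share:.1f}%")
```

By file count the JavaScript share comes out slightly under the rounded 10% quoted in the text.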

Problem

Gnome Shell 3.32 in Ubuntu 19.04 feels slower than Unity and other desktops. If you have a slow machine then it won’t run smoothly. Perhaps more surprisingly, even if you have a fast machine it still won’t run completely smoothly.

Proposed and Obvious Solutions

Since many users knew Gnome Shell contained JavaScript and that it is an interpreted language, it was easy to blame that. The thing is most people wouldn’t know it’s only 10% JavaScript and that much of the time JavaScript isn’t running at all. Most of the time if you’re just interacting with an application then gnome-shell is running native machine code only, from C.

Developers on the other hand tend to focus on CPU usage and GPU usage. There are decades of knowledge on how to do that. Typically you profile your program and look for hot spots using the most CPU or GPU time.

The thing is, in the case of Gnome Shell its biggest performance problems of late were not hot spots at all. They were better characterised as cold spots where it was idle instead of updating the screen smoothly. Such cold spots are only apparent when you look at the real time usage of a program, and not in the CPU or GPU time consumed.

What is real time usage?

Real time usage is the 123 when you run sleep 123. That’s not using any CPU time or GPU time but you know it’s going to take a while to finish. In the case of a single-threaded event-driven program like Gnome Shell it is the time spent in poll() and other blocking code that makes the system idle most of the time. And that’s mostly on purpose – you do want your system to be idle and consume minimal power when you’re not interacting with it.

Real time is also a superset of CPU time. So even if your program is CPU-bound it’s still a useful measurement to start on. If you measure real time you will catch both real time and CPU problems.
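The difference between real time and CPU time can be seen with a few lines of Python. This is a minimal sketch, not the tooling used for Gnome Shell; the `measure` helper is a made-up name, and a short sleep stands in for any blocking call such as `poll()`.

```python
import time


def measure(work):
    """Run work() and return (real_seconds, cpu_seconds) for this process."""
    real_start = time.perf_counter()   # wall-clock ("real") time
    cpu_start = time.process_time()    # CPU time only, excludes sleeping
    work()
    real = time.perf_counter() - real_start
    cpu = time.process_time() - cpu_start
    return real, cpu


# A blocking call: lots of real time, almost no CPU time.
real, cpu = measure(lambda: time.sleep(0.2))
print(f"real: {real:.3f}s  cpu: {cpu:.3f}s")
```

A CPU profiler would see almost nothing in this program, while a real time measurement shows where the 0.2 seconds went.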

Gnome Shell and Mutter are single-threaded Glib event loop apps. So any pauses could cause them to miss the next frame and exhibit stuttering. Some common pauses are for disk IO or GPU IO, but just miscalculating when to render the next frame has also been an issue in gnome-shell.
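To illustrate why pauses matter more than raw CPU usage in a single-threaded loop, here is a simplified model. It assumes each frame starts exactly at a refresh and needs a fixed 2ms of rendering; the stall durations and the `missed_deadlines` function are hypothetical and are not taken from Mutter.

```python
def missed_deadlines(refresh_hz, frame_count, stalls_ms, render_ms=2.0):
    """Count refresh deadlines missed by a single-threaded render loop.

    stalls_ms maps a frame index to the time (in ms) that frame spent
    blocked, for example waiting on disk IO. A frame misses one refresh
    deadline for every full refresh interval it stays busy.
    """
    interval_ms = 1000.0 / refresh_hz
    missed = 0
    for frame in range(frame_count):
        busy_ms = render_ms + stalls_ms.get(frame, 0.0)
        missed += int(busy_ms // interval_ms)
    return missed


# One second at 60Hz with two blocking pauses (hypothetical values).
stalls = {10: 20.0, 50: 40.0}
missed = missed_deadlines(60, 60, stalls)
render_share = 100 * 2.0 / (1000.0 / 60)

print(f"missed deadlines: {missed}")
print(f"render time per frame: {render_share:.0f}% of the refresh interval")
```

In this model rendering uses only a small fraction of each interval, yet the two pauses still cause visible stutter.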

How to measure real time usage

There are multiple ways, not all listed here, but many developers might not be familiar with any of them. For Ubuntu 19.10 we mostly used Google Profiler and Mesa itself.

Google Profiler is a profiler like gprof or callgrind, but the one Google has made freely available is different because it is stochastic and has some useful tricks. One such trick is that you can ask it to measure real time usage instead of the default CPU time usage:

env CPUPROFILE_REALTIME=1

This option means we can see where real time was spent rather than where the most watts were spent.

The other way we know there are real time problems is that Mesa’s Intel graphics driver will report if you’re making real time mistakes “stalling” the CPU or GPU:

env INTEL_DEBUG=perf myprogram

It won’t tell you where, but at least it will tell you if. Having the OpenGL library tell you that directly is pretty convincing.

Real time bugs found and fixed in Gnome 3.34

1. Accidentally missing the next frame by miscalculating when to start rendering it

This didn’t happen all the time so people would be understandably skeptical any such bug existed in 3.32 and earlier. It happened when frame scheduling was delayed by a couple of milliseconds, for any reason at all. But after fixing that the result is now consistently higher and smoother frame rates in 3.34.

2. In Xorg sessions everything was one frame laggier than in Wayland sessions

This was somewhat obvious when dragging windows in 19.04, and was one reason to prefer Wayland over Xorg. It was however not obvious that it was Mutter’s fault. It was easy to assume Xorg is just slower, so it wasn’t obvious there was any Mutter or Gnome Shell bug to fix. After all, Unity and other desktops exhibit the same kind of lag.

The issue was caused by scheduling the next frame too early most of the time (opposite of the previous issue). This increased the time between the frame being rendered and displayed by ~16ms, increasing the visual latency. Now that’s fixed we have one frame lower latency in Xorg sessions all the time, compared to 3.32.
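The ~16ms figure is simply the length of one frame at 60Hz. A small sketch of that arithmetic follows, with `frame_interval_ms` as an illustrative helper name; the 120Hz and 144Hz rows are added only for comparison.

```python
def frame_interval_ms(refresh_hz):
    """Time between two consecutive refreshes, in milliseconds."""
    return 1000.0 / refresh_hz


# One frame of extra latency costs one refresh interval.
for hz in (60, 120, 144):
    print(f"{hz}Hz: one frame of latency = {frame_interval_ms(hz):.1f}ms")
```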

3. Mutter had grown a collection of different frame timing mechanisms

This was to deal with different drivers, and it would try some or all of them on every frame. As a result, cursor movement was artificially limited to 60Hz in Wayland sessions. 60Hz is so last year. Also as a result, some Nvidia systems using Xorg sessions would end up spinning at 100% CPU: Mutter would think the first throttling method had worked when it hadn’t, and so would skip the other throttling methods intended to keep CPU usage under control.

Now that it’s fixed, the cursor movement in Wayland is at full refresh rate (already was in Xorg) and Nvidia systems no longer hog the CPU (as much, if they did at all because only some systems were affected).

4. Mutter queued all input events

And it didn’t reveal them to the shell or apps until after the next frame was rendered. Frames are typically 16ms apart (1000ms per second divided by 60 frames per second) and that’s near the threshold of just long enough for many people to notice a delay. Kind of like pointing at your screen with a slightly wobbly stick.

Partially fixed! Touchpad scrolling (and buttons and keys) has one frame lower latency now than it did before. This is a big deal for those with physically slow touchpads like some current ThinkPads.

Partially not fixed. Other parts of Gnome Shell as well as the Nvidia driver are too problematic to handle a full fix that includes mouse movement at full hardware speed. Yet…

Tip: Use Chromium on Xorg to get high resolution touchpad scrolling. It doesn’t yet work properly in (X)wayland sessions or in Firefox.

5. The Nvidia driver (in Xorg sessions) was being throttled

Yes, just Nvidia in Xorg sessions. This limited the CPU and GPU’s ability to work in parallel with each other and render the next frame smoothly on time. It was caused by a formerly necessary evil to work around the Nvidia driver being too fast for older versions of Mutter. New Mutter doesn’t need that workaround since the frame throttling mechanisms have been simplified and consolidated this cycle (item #3 above).

As a result, Nvidia desktop rendering is much faster and smoother with Mutter 3.34 than in previous releases.

6. “Picking” was done using OpenGL

“Picking” means figuring out what is under the cursor, whenever the mouse moves or the screen changes.

This required stalling the pipeline, blocking both the CPU and GPU to synchronize them whenever you moved the mouse. It was due to the historical design of Clutter to cover all possible user interface designs. But it used a lot more CPU to call OpenGL than you would expect.

Now that picking is done on the CPU and not on the GPU, cursor movement uses even less CPU (and zero GPU). At the same time it no longer stalls rendering of the entire desktop (or app or game) below it. So what started out as an attempt to reduce CPU usage ended up accidentally being a boost for the shell’s real time performance too.
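As a rough illustration of what picking on the CPU means, the sketch below walks a list of rectangles from front to back and returns the first one containing the cursor. It is a deliberately simplified model, not Clutter’s actual implementation; the `Actor` class and `pick` function are hypothetical and ignore transforms, clipping and non-rectangular shapes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Actor:
    """Hypothetical stand-in for an on-screen element (not a Clutter type)."""
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pick(actors_back_to_front: Sequence[Actor],
         px: float, py: float) -> Optional[Actor]:
    """Return the topmost actor under (px, py), or None if nothing is hit."""
    for actor in reversed(actors_back_to_front):
        if actor.contains(px, py):
            return actor
    return None


# Painted in this order, so later entries are on top.
stage = [
    Actor("wallpaper", 0, 0, 1920, 1080),
    Actor("window", 100, 100, 800, 600),
    Actor("panel", 0, 0, 1920, 32),
]

print(pick(stage, 500, 300).name)
print(pick(stage, 500, 10).name)
```

Nothing in this loop touches the GPU, so it cannot stall rendering in the way a read-back from OpenGL can.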

Real time bugs found and not yet fixed in Gnome 3.34

1. Multi-monitor rendering in Wayland sessions spends some random fixed percentage of its time (average 50%) blocked, sleeping and unable to render the screen or respond to the user

This is caused by blocking in the Wayland (which is really the “EGL native”) backend. So presently, to use Gnome with multiple monitors you need to choose between two suboptimal options:

  • Wayland: Blocks and stutters too often, but won’t tear.
  • Xorg: Screen tearing on all-but-one monitor thanks to DRI2 (apparently fixed in DRI3), but does not block or stutter.

This should be fixed in time for Gnome 3.36 and Ubuntu 20.04. When finally fixed for good, Wayland sessions will provide the perfect balance of smoothness and not tearing.

2. Mutter is still failing to schedule the next frame on time in some cases

That is, if CPU usage becomes moderate, not even high. The only fix proposed so far has side effects of exposing crashes and deadlocks in the Wayland backend. Those are different problems but will need to be fixed first. Hopefully before Ubuntu 20.04.

Mistakes made along the way (please avoid these)

You live and learn. Please learn from these mistakes that we have made in researching the graphics performance of Gnome Shell…

  • Assuming that moderate CPU usage was the reason why graphics weren’t smooth. Yes 50% CPU on an i7 processor is a lot of processing. But no it’s not the reason your desktop isn’t smooth. That would be a number closer to 100% (which in top means saturating a single CPU core). Moderate CPU is not the main problem, yet.

  • Skipping straight to GPU profiling when some simple measurement of render times would have told you that GPU utilization is low. At least, it’s a simple number but you have no way of knowing it yet. We have a proposal on the way to fix that.

  • Assuming that JavaScript is slower than everything else written in C.

  • Assuming that JavaScript is in use at all. Generally if you’re only interacting with an app window it’s not using JavaScript. That’s pure C.

  • Assuming everything happens as quickly as your CPU or GPU can do it (no delays in starting). Programming mistakes or just intentional design decisions will sometimes mean that’s not true.

  • Not testing high refresh rate monitors sooner. A machine with a 60Hz display might only achieve 30Hz for some things. The same machine with a 120Hz display seems to achieve 60Hz for those same things. Same CPU and same GPU. Hence CPU and GPU power are not the issue. The lesson here is that you can conclude there was an algorithmic bug somewhere causing every other frame to be missed and not a hardware limitation.

  • Believing increased CPU usage is always a bug. For example, if you were blocked 50% of the time and achieving 30 FPS at 40% CPU then removing that blockage might give you 60 FPS at 80% CPU. The CPU usage is higher but that’s also a desirable step forward to reach full frame rate. Don’t immediately think you’ve made a mistake just because your code change has increased CPU usage.

  • Dragging windows and assuming that’s related to performance of the rest of the shell. It’s not always. We found that dragging windows had its own unique reason for being slow on top of everything else.

  • Looking at the icon spring animation and assuming that’s related to performance of the rest of the shell. It’s not. Admittedly this is one thing that is very much JavaScript and the reason it was slow in the past was not related to your GPU or graphics driver, just spikey CPU usage.

  • Not spending enough time to fully understand profiler results. Profilers will always generate more information than a human brain can cope with. It’s important to filter, analyse and really take some time to understand what it’s telling you.

  • Never using other operating systems to compare against. I know we are here because we love Ubuntu, and that’s great. I’m just saying that you won’t know what the competitive difference is if you don’t have a little bit of experience with macOS, ChromeOS, Windows, etc.

  • Guessing without measurement.

  • Educated guessing based on previous experience, without fresh measurement. Developers think they know what’s fast and what’s slow. But sometimes you need to lose those preconceptions. Tell yourself you really know nothing and allow careful measurement to surprise and enlighten you. You’ll often find those optimizations you do regularly out of habit are actually negligible in the big picture.
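Two of the points above involve simple arithmetic that is easy to check: the high refresh rate observation and the CPU usage example. The sketch below assumes every other frame is missed and that CPU cost per frame stays constant; both function names are made up for illustration.

```python
def achieved_fps(refresh_hz, present_every_n_refreshes):
    """Frame rate when a new frame is shown only every n-th refresh."""
    return refresh_hz / present_every_n_refreshes


def cpu_after_unblocking(cpu_percent, fps_before, fps_after):
    """Scale CPU usage with frame rate, assuming constant CPU cost per frame."""
    return cpu_percent * fps_after / fps_before


# Missing every other frame halves the frame rate on any display,
# which points at an algorithmic bug rather than a hardware limit.
for hz in (60, 120):
    print(f"{hz}Hz display: {achieved_fps(hz, 2):.0f} FPS")

# Removing the blockage doubles the frame rate and the CPU usage with it.
print(f"CPU at 60 FPS: {cpu_after_unblocking(40, 30, 60):.0f}%")
```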

The Grand Plan for Gnome Shell performance

As you’ve seen, there are lots of problems that have been fixed and still some tricky problems yet to be fixed. If you’re interested in project tracking then the links we use in Ubuntu are (stutter | latency | CPU).

But what to work on first? To break up the many issues into something more manageable we have two main goals for the Gnome Shell desktop on Ubuntu:

  1. Make it fast on newer/fast machines.
  2. Make it fast on older/slow machines.

“Make it fast” means maintaining the full frame rate of your monitor with no stutters. “Fast machines” means anything that could already run Unity or Gnome desktops usably. But admittedly that’s a little subjective.

Making it fast on newer/fast machines

To do this we first need to fix all real time delay issues. Because those might hurt you the same even with an infinitely fast CPU and GPU. You should be able to upgrade your hardware and actually experience some improvement.

The good news is that this is mostly done in Mutter 3.34 for Ubuntu 19.10. We aim for it to be fully done in Mutter 3.36 in Ubuntu 20.04. What’s still remaining to achieve this goal is mainly:

  • Revisit and rewrite mutter!719 to avoid missed frames. This might take multiple steps. A couple are already in progress (1, 2).
  • Revisit/rewrite mutter!73 to complete high performance multi-monitor rendering for Wayland. Although it’s been suggested the upstream Gnome developers are already planning on doing this for us. Awesome!

Finally, we need to find and fix any blocking disk IO. The upstream Gnome developers have made a start on this (1, 2). But actually we probably need some discussion and advice on how best to catch and measure these when you don’t know the point in time, or in code, they’re going to happen at.

Making it fast on older/slow machines

This is more difficult because first we must be sure that all the real time blocking issues that hinder even the fastest of machines are fixed.

Next, the plan is to measure, measure, measure, and continue fixing a number of CPU hot spots when those become the main class of problem. Then measure some more and if there are any GPU bottlenecks fix those too.

We hope to make Gnome Shell much faster for older/slow machines in 2020 and the first step is almost completed already.

Final thoughts

So there you have it. Gnome Shell 3.34 in Ubuntu 19.10 is noticeably faster than previous releases, and I hope you have found the steps we took to get there interesting. But in the grand scheme of things we’re only partly done:

17.10: Gnome Shell arrives in Ubuntu
18.04: Minor performance improvements
18.10: Minor performance improvements
19.04: Minor performance improvements
19.10: Major performance improvements ← You are here
20.04: Goal: High performance on fast/modern machines
20.10: Goal: High performance on slow/older machines

The future of Gnome Shell is bright and worth getting excited about.

55 Likes

Very interesting read! Congratulations to everyone involved!

1 Like

This post shows how deep you guys have gone to bring those improvements. Thanks to everyone on the team involved in fixing these issues.
I also upgraded my machine to 19.10 and am using Gnome Shell as my primary DE. I also noticed a few issues which were not in Unity 7 or other desktops I have used. Please have a read - http://www.ktechpit.com/ubuntu/issues-i-faced-while-using-ubuntu-19-10-gnome-3-34/
Thanks.

5 Likes

Thank you Daniel for the very interesting article and all your hard work. Gnome was next to unusable back in 17.10 before you guys shifted your focus away from Unity. It still has a way to go (my current use case requires multi-monitor Wayland for mixed DPI, and thus I’m using Sway for now) but I’m looking forward to being able to go back to a more full-featured DE.

2 Likes

Couldn’t this be found when unattended-upgrades is running?
I experience stuttering from time to time especially when running a game and always thought it’s because of those, I might be wrong though.

1 Like

I want to know why you think that introducing threading probably won’t be needed. Quoting from [here]:

I do not know of anyone working on introducing threading yet, but the need for it is rapidly shrinking toward zero.

Isn’t threading in fact a way to make things much faster, especially on newer/fast machines? I’m not asking as a user, I’m asking as a developer of extensions. So:

  1. Won’t my extension run much faster in a thread that sits outside all the Mutter issues you mention?
  2. Isn’t it in fact good to isolate problems in different threads?
  3. Isn’t JavaScript in fact a limitation for some specific types of performance problem, like this one for example?
  4. Wouldn’t it be much more secure for extensions to run outside the Mutter thread?

Thanks for the article. It helped me understand a lot of things, but other things remain as questions…

2 Likes

So making it as fast and light as Xfce, LXQt, MATE & Plasma? That should be interesting. I may not use Gnome shell, but this would definitely make the desktop environment arena much more interesting. Good luck!

2 Likes

Will these improvements help people who use Gnome Flashback?

It’s interesting to hear what fixes Canonical has been putting into Gnome for the upcoming distro releases.

It’s nice to finally have some numbers for the percentage of the code base in C vs JS, though it’s hard to make any kind of comparison without including size as a metric as well.

I wanted to say those numbers can’t be right, because if they are correct, that would mean most of the code is undocumented at this point. The most documented parts of Gnome are the API and JS bindings; there’s almost no documentation on the C internals except for the source code, and much of the documentation available from the Gnome site is stale (hasn’t been updated in at least 10 minor releases).

There are a lot of people that would like to contribute, but can’t because the documentation is in such a bad place, and there are at least some that won’t switch to Gnome without having a clear picture of how it works.

Can you comment on any tips or tricks the Canonical team uses internally to get up to speed when working with Gnome given the current state of Gnome’s documentation?

2 Likes

Yes, everything applies to Flashback.

No. GNOME Flashback does not use mutter or gnome-shell. GNOME Classic does use GNOME Shell and mutter. (GNOME Classic is currently available in the gnome-shell-extensions package but maybe eventually it will get moved to a better named package).

2 Likes

First, thank you so much for your work on all of this. I’ve been following all of these issues for a long time and I’m still following more in hopes that I can one day use my Quadro M1200 laptop with my two external monitors. Without you fighting the good fight, I would have no hope of ever making use of my hardware.

Unfortunately, I’ve killed so much time just trying to achieve a stable system using just one monitor with my laptop screen turned off, and every time I make it work I’m terrified to do updates or reboot my system because it’s hard to keep it working.

On the latest updates with the 440 driver I still can’t boot up or login with my external monitor plugged in. Using two screens makes everything super laggy and slow. I had to set “use_root_rights=yes” in Xwrapper.config just to get it to detect my monitors and using the proprietary driver I still can’t get Wayland to even show up as an option logging in with GDM. It shows up with lightdm but that breaks other things. My brother has the same laptop but not the same issues, despite comparing every variable I know of.

I’m not sure all the issues are even tracked but there are so many variables I’m not even sure where to report anything or what exactly to report.

1 Like

Threading is something that I would recommend if starting a new project from scratch. When done well it will definitely improve performance. It’s only risky to introduce to a large project that already relies on being single threaded. That means it has very little awareness/safety of concurrent memory accesses etc.

Also worth noting that threads do not “isolate problems” or increase security at all. Because threads share memory with each other. Zero isolation or security. If you want those things then you need to put each component in a different process, like Chrome does with browser tabs.

7 Likes

I’m not a developer but just an ordinary Ubuntu desktop user but even to someone like me, this article made a very interesting read. Just getting a look into how the developers tackle certain usability problems and improve the performance of an existing system is quite an eye opener. Thank you for all your good work and creating something which benefits a whole ecosystem of users 🙂

3 Likes

Also worth noting that threads do not “isolate problems”…

Well yes… The original idea is to create a different process for all the GUI (the shell part), not just a different thread. But apparently some GNOME developers also think that having a different thread will probably be enough: https://www.youtube.com/watch?v=QgZRU9eQqKc&t=484s

It’s only risky to introduce to a large project that already relies on being single threaded…

Yes, probably, but doesn’t programming always involve taking risks? Doing nothing is the bigger risk, in my opinion. In GNOME you are in fact the proof of that theory, because you keep doing things even when not everyone is convinced about what you are doing.

Thanks for answering me. Now I understand your point; it’s just that I prefer the risk over the alternative of doing nothing and having a single thread forever…

1 Like

The documentation I personally use the most is the Clutter documentation. Clutter is the core of Gnome Shell and Mutter and is primarily maintained inside the Mutter project. So it’s the main thing to learn.

The above web site isn’t completely up to date with the source code that lives in Mutter, but is mostly still accurate. The public API and documentation there change very little.

2 Likes

  1. Make sure you have a Launchpad account.
  2. Open a Terminal window and run:

    ubuntu-bug gnome-shell

That should help you to create a detailed bug report which we can then analyse and answer.

5 Likes

Dumb question : will any of the fixes done here at a drivers / gtk level improve Unity?

1 Like

Unity does not use mutter or gnome shell, so improvements to these will have no effect.

2 Likes