Triple Buffering, a debrief

As you may have read, triple buffering for the GNOME desktop recently landed in Mutter 48. This was a multi-year effort to rethink how frames are scheduled so as to maintain an optimal frame rate under different conditions. But there are still many caveats and tasks yet to be done.

At this stage anyone who knows anything about computer graphics is already balking at the first paragraph. Why did it take years when most OpenGL drivers give you triple buffering for free by default? The reason is best summarized as “Mutter’s architecture”. It squeezes all rendering into the single-threaded event loop. But Mutter is not alone in this.

Noticing there’s a problem

For most of Mutter’s life nobody seemed to realize what its performance problem was, or even that a problem existed, so nobody was searching for a solution.

The smoking gun appeared in 2020 when the Ubuntu Desktop team held a hack day to experiment with wild ideas. While demonstrating some fun eye candy, it quickly became apparent that the frame rate in GNOME Shell was never reaching the monitor’s refresh rate, even though the GPU was never adequately utilized. The hardware was mostly idle and not even trying to reach full frame rate. So clearly there was something wrong with Mutter’s motivation to render frames.

The early years

While it only took a few months in 2020 to get triple buffering done for Xorg sessions, that only covered the frame clock changes. This worked immediately because GLX is also triple buffered and accepted the extra frames without complaint.

For the next 1.5 years from late 2020 to early 2022, the focus was on Mutter’s Wayland backend to make it as flexible as Xorg. It had to be made to cope with multiple frames in flight simultaneously, meaning the CPU is working on the next frame before the GPU has finished following the instructions on how to render the previous one. But the CPU and GPU are separate entities so this works.

In early 2022 all known bugs were resolved and Ubuntu 22.04 shipped triple buffering enabled by default. The three years from 2022 to 2025 were then spent mostly idle waiting for reviews (I do have other jobs), or debating the architecture of the Wayland backend changes which got redesigned a few times in these years. But we got there in the end and it is now merged in GNOME 48 for everyone to enjoy.

Latency

Some graphics programmers and gamers will tell you that triple buffering has a latency penalty. That’s true for dumb implementations where SwapBuffers is your throttling mechanism, but you can be smarter about it and avoid the latency penalty, as Mutter does. Actually Mutter isn’t the first. We implemented the same triple buffering algorithm in an early version of Mir. And other people likely invented the same approach before that.

The way you avoid latency in triple buffering is to keep your rendering only one frame ahead of the display “scanning it out”. That sounds a lot like double buffering, but there’s a twist: You schedule your frames on a clock independently of them being consumed such that you’re always rendering on time for when you should have been rendering in the perfect scenario of maintaining full frame rate. So we ignore whether or not the GPU is keeping up right now and provide it with the workload to show how hard it should be working to reach our desired frame rate. Of course the limit here is the number of buffers you have behind the scenes of SwapBuffers. In Mesa it’s typically 4 (yes it allows quad-buffering), but we limit ourselves to 3.

This algorithm has the effect of sticking to double buffering unless the system is unable to keep up in which case it naturally transitions to triple buffering behind the scenes. There are always 3 or 4 buffers we can render to, but it’s a matter of how many we are willing to utilize.
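To make that behaviour concrete, here is a toy simulation of the scheduling idea. It is only a sketch under simplifying assumptions and is not Mutter’s actual code: the function name simulate, its parameters, and the time units are all invented for illustration, the GPU is modelled as handling one frame at a time, and CPU time is ignored entirely.

```python
from collections import deque

def simulate(render_time, max_in_flight, refresh=16, vblanks=600):
    """Toy model (hypothetical, not Mutter's code).

    render_time   -- time the GPU needs per frame, in the same units as refresh
    max_in_flight -- frames allowed beyond the one on screen:
                     1 models double buffering, 2 models triple buffering
    Returns (frames presented, peak number of frames in flight).
    """
    in_flight = deque()  # completion times of frames submitted but not yet shown
    gpu_free_at = 0      # time at which the GPU finishes everything queued so far
    presented = 0
    peak = 0
    for k in range(vblanks):
        now = k * refresh
        # Scanout: show the oldest submitted frame if the GPU has finished it.
        if in_flight and in_flight[0] <= now:
            in_flight.popleft()
            presented += 1
        # Frame clock: ticks on every refresh regardless of GPU progress,
        # but never uses more buffers than we are willing to utilize.
        if len(in_flight) < max_in_flight:
            start = max(now, gpu_free_at)
            gpu_free_at = start + render_time
            in_flight.append(gpu_free_at)
            peak = max(peak, len(in_flight))
    return presented, peak
```

In this model, when the GPU is fast (render_time below refresh), simulate(8, 2) never has more than one frame in flight, so it behaves exactly like double buffering. When the GPU is slow (render_time of 24 against a refresh of 16), simulate(24, 1) presents a frame only on every second refresh, while simulate(24, 2) keeps the GPU busy and presents two frames in every three refreshes.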

The rest of the software stack

Triple buffering was only implemented for Mutter. So we have optimized the frame rate of shell animations and that provides some encouragement to applications to render more smoothly, but many applications still need more work themselves.

As a first example, Firefox struggles to render inertial scrolling smoothly if you fling it with a touchpad. That is unless you set the machine to Performance mode to keep the clock frequencies high. This shouldn’t happen in 2025. A web page should be able to scroll smoothly on a modern GPU without the machine having to leave Power Saver mode. GTK seems to have similar issues (try a Nautilus/Files window with lots of files to scroll through).

One commonality between these apps/toolkits appears to be they’re always double buffering. You can see this in the number of unique buffers being used on each surface (a window or subwindow):

env WAYLAND_DEBUG=client firefox |& grep attach
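Counting the unique buffers by eye gets tedious, so a small helper can do it. This is a rough sketch: the function name unique_buffers is made up, and the regular expression assumes log lines of the form "wl_surface@23.attach(wl_buffer@45, 0, 0)", with newer libwayland versions printing "#" in place of "@". Check it against the output you actually see.

```python
import re
from collections import defaultdict

# Assumed log format: "... -> wl_surface@23.attach(wl_buffer@45, 0, 0)".
# Both '@' and '#' separators are accepted to cover libwayland versions.
ATTACH = re.compile(r"wl_surface[@#](\d+)\.attach\(wl_buffer[@#](\d+)")

def unique_buffers(lines):
    """Map each surface ID to the set of buffer IDs attached to it."""
    buffers = defaultdict(set)
    for line in lines:
        match = ATTACH.search(line)
        if match:
            buffers[match.group(1)].add(match.group(2))
    return buffers
```

Passing sys.stdin to unique_buffers in a script at the end of the pipeline above (in place of grep) gives a per-surface count. Note that Wayland object IDs can be reused after a buffer is destroyed, so the result is only an estimate over a short capture.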

So I believe the next step in making a fluid Linux desktop is to start fixing the toolkits and apps that touch Wayland directly: GTK, Firefox, etc.

Flutter is in a similar situation since it is rendered on top of GTK, so it needs GTK to be performant first. But Flutter has an extra impediment right now: it uses GTK3 instead of GTK4. In GTK3 the final frame gets software-copied back to the window using the CPU. So every frame is kind of software rendered. Fortunately GTK4 support in Flutter is on the way with no such bottleneck.

A couple of times I’ve read developers of other desktop environments say they don’t need triple buffering because their rendering is so efficient. That might be partially true, but the moment a person tries to open more windows than a low-end integrated GPU can composite quickly, you’re going to need triple buffering. Certainly if your focus is gaming you might not have any shortage of GPU power, and if your focus is full screen gaming then you don’t need to composite at all (direct scanout). But the windowing performance on low-to-medium range hardware is what Ubuntu Desktop cares about the most. And Mutter’s uniquely single-threaded architecture is probably another challenge those other desktop environments don’t have to cope with.

Where triple buffering fails

There were some hardware configurations where triple buffering was found to provide a less than optimal improvement (but still an improvement).

Xilinx Kria development boards were found to be missing a third buffer completely. They only offer double buffering in hardware. The saving grace here was that the driver seemed to be using linear-like buffers and would reallocate buffers already allocated (that’s a bug). So it would just tear on the frames that otherwise would have been triple buffered. It doesn’t tear while double buffering is able to reach full frame rate. That’s accidentally the same as how Windows seems to approach frame rate management and Nvidia has proposed the same in the past.

The Nvidia proprietary driver was found to perform differently to open source graphics drivers in that its gbm_surface_lock_front_buffer implementation is synchronous (LP#2081140). What’s weird here is that “synchronous” is what I expected. I didn’t know Mesa’s open source drivers were completely asynchronous and use “implicit synchronization”. This means the kernel attaches fences to each buffer to flag when its contents are actually complete. So you can pass buffers around even before the GPU has finished rendering them. This further reduces the likelihood that the CPU is ever blocked waiting on the GPU.

Finally, software rendering doesn’t get the same benefit from triple buffering as hardware rendering does. This is because there’s no physically separate GPU to do the work in parallel to the CPU. Software rendering does still get some benefit from triple buffering because the CPU itself is managing multiple threads and we’re encouraging those to work harder and schedule smarter. One can’t forget though, as amazing as Mesa’s software rendering mode is (using LLVM), it only emulates a GPU by using all your CPU cores. That’s not something you ideally want happening in a virtual machine or in a cloud-based GUI where GPUs are uncommon. So if you run virtual machines you may be better off using a desktop environment that doesn’t use OpenGL or Vulkan at all.

You’re up to date

Triple buffering isn’t quite as simple and straightforward as one might expect. I wanted to dispel some myths, share some trivia, and highlight where more work is likely still needed. Hopefully more of the open source community can contribute and learn from the lessons outlined above.
