Performance tracing Mir with LTTNG

raof · April 16, 2019, 6:43am

We’ve long had a set of LTTNG performance tracepoints in in Mir; they’re¹ hooked up to the Reports framework, so you can, for example, pass --compositor-report=lttng on the command line and get access to a bunch of LTTNG tracepoints.

We haven’t been doing very much with these over the last while, and as a result we don’t have any tracepoints in the (no longer very) new Wayland code.

So I’ve been going though, re-familiarising myself with the tracing and starting to hook up some tracepoints in the Wayland code. What I’ve got so far is enough to measure the latency between a client committing a buffer to a surface and that surface being rendered onto the screen.

And then a nice little python script to analyse the trace:

 chris@Behemoth  ~/Canonical/Mir/mir/benchmarks   commit-to-present-latency ●✚  ./commit_to_present.py ~/lttng-traces/mir-eglstream-latency-20190416-114042/ust/uid/1000/64-bit 
Client:  139737634773776
# NumSamples = 115; Min = 1.00; Max = 22.00
# Mean = 3.339130; Variance = 9.441512; SD = 3.072704; Median 3.000000
# each ∎ represents a count of 1
    1.0000 -     3.1000 [   100]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
    3.1000 -     5.2000 [     1]: ∎
    5.2000 -     7.3000 [     1]: ∎
    7.3000 -     9.4000 [     7]: ∎∎∎∎∎∎∎
    9.4000 -    11.5000 [     4]: ∎∎∎∎
   11.5000 -    13.6000 [     0]: 
   13.6000 -    15.7000 [     1]: ∎
   15.7000 -    17.8000 [     0]: 
   17.8000 -    19.9000 [     0]: 
   19.9000 -    22.0000 [     1]: ∎
Client:  139737634804272
# NumSamples = 255; Min = 1.00; Max = 24.00
# Mean = 10.015686; Variance = 18.729166; SD = 4.327721; Median 10.000000
# each ∎ represents a count of 1
    1.0000 -     3.3000 [    14]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎
    3.3000 -     5.6000 [    33]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
    5.6000 -     7.9000 [    16]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
    7.9000 -    10.2000 [    73]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   10.2000 -    12.5000 [    50]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   12.5000 -    14.8000 [    34]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   14.8000 -    17.1000 [    24]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   17.1000 -    19.4000 [     4]: ∎∎∎∎
   19.4000 -    21.7000 [     6]: ∎∎∎∎∎∎
   21.7000 -    24.0000 [     1]: ∎
Client:  139737639132352
# NumSamples = 105; Min = 0.00; Max = 25.00
# Mean = 14.628571; Variance = 21.947755; SD = 4.684843; Median 16.000000
# each ∎ represents a count of 1
    0.0000 -     2.5000 [     6]: ∎∎∎∎∎∎
    2.5000 -     5.0000 [     3]: ∎∎∎
    5.0000 -     7.5000 [     0]: 
    7.5000 -    10.0000 [     2]: ∎∎
   10.0000 -    12.5000 [     4]: ∎∎∎∎
   12.5000 -    15.0000 [    26]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   15.0000 -    17.5000 [    54]: ∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎∎
   17.5000 -    20.0000 [     3]: ∎∎∎
   20.0000 -    22.5000 [     1]: ∎
   22.5000 -    25.0000 [     6]: ∎∎∎∎∎∎

This shows three clients - qterminal, gedit, and gnome-keyring - and the times (in milliseconds) between the client submitting a buffer and it appearing on the screen. We can see that one client regularly commits its buffers just before a repaint, one commits its buffers at almost exactly the wrong time, and so has about a full 16ms frame between commit-and-present, and another client is in between. Unfortunately I haven’t yet done the work that would let us associate a meaningful name with the client, so we don’t know which is which.

This in neither strictly a Mir benchmark, nor a client benchmark. Mir can’t tell clients when to commit their buffers, but Mir should be able to influence when they do by sending frame callbacks at the right time.

The obvious next work is to provide more tracepoints, and more data, so we can better nail down client behaviour and associate clients with the binaries.

¹: It’s somewhat inconvenient that the tracepoints are hooked up to the Reports infrastructure; I might be migrating them to freestanding code.