Exploring O3 Optimization for Ubuntu

Ubuntu’s long-standing reputation for performance is rooted in our commitment to delivering the latest code, kernels and compilers. This commitment ensures that developers have access to the fastest and most secure environment for their applications.

Following our recent work on Ubuntu 24.04 LTS, where we enabled frame pointers by default to improve debugging and profiling, we’re continuing our performance engineering efforts by evaluating the impact of O3 optimization in Ubuntu.

What is O3 Optimization?

O3 is a GCC optimization level that applies more aggressive code transformations than the default O2 level. These include advanced function inlining and the use of sophisticated algorithms aimed at enhancing execution speed. While O3 can increase binary size and compilation time, it has the potential to improve runtime performance.
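
One way to see what the difference between O2 and O3 amounts to on a given machine is to ask the installed compiler itself. The sketch below assumes `gcc` is on the PATH and relies on `gcc -Q --help=optimizers`, which prints each optimizer flag with an `[enabled]` or `[disabled]` marker. The helper names (`enabled_flags`, `optimizer_report`, `flags_added_by_o3`) are made up for illustration, and the resulting list will vary with the GCC version.

```python
import subprocess


def enabled_flags(report: str) -> set[str]:
    """Collect flags marked [enabled] in `gcc -Q --help=optimizers` output."""
    flags = set()
    for line in report.splitlines():
        parts = line.split()
        # Lines of interest look like: "  -ftree-vectorize    [enabled]"
        if len(parts) == 2 and parts[0].startswith("-f") and parts[1] == "[enabled]":
            flags.add(parts[0])
    return flags


def optimizer_report(level: str, gcc: str = "gcc") -> str:
    """Ask the local GCC which optimizer flags it switches on at `level`."""
    result = subprocess.run(
        [gcc, "-Q", "--help=optimizers", level],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def flags_added_by_o3(o2_report: str, o3_report: str) -> list[str]:
    """Flags that are enabled at -O3 but not at -O2, sorted by name."""
    return sorted(enabled_flags(o3_report) - enabled_flags(o2_report))
```

Calling `flags_added_by_o3(optimizer_report("-O2"), optimizer_report("-O3"))` should then give the flags your compiler adds when going from O2 to O3. Flags that take a value rather than an on/off marker are ignored by this sketch.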

Why Experiment with O3?

After integrating frame pointers to improve performance insights, we’re now testing O3 optimizations to see how they affect performance in various workloads. This experiment aims to determine where O3 can provide performance improvements and to gather data for future optimisation decisions.

Try the Experimental Builds

We’ve recently rebuilt the Ubuntu archive with O3 optimisations and produced experimental Server and Desktop images. Over the next few weeks we will be conducting various internal benchmarking and evaluations in order to assess the effectiveness of these optimisations.

Disclaimer: These images are not supported. They will receive no security updates. Do not use them in production!

If you would like to get involved in this evaluation, please download these images, run your typical workloads, and share your feedback with us on this thread! Your participation will help us understand the potential real-world impact of O3 optimizations.

Thanks.

References:

https://ubuntu.com/blog/optimising-ubuntu-performance-on-amd64-architecture

https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Optimize-Options.html

How can we list the programs that were built with O3 (and that may be unable to run)? Setting up APT sources and server disk space for this seems easy enough, but how do these packages fit with O2 builds when it comes to dynamic linking?

There should perhaps also be a preference for showing, searching, listing, or running such programs first, to make this easier.

Compiled program size also grows a lot when GCC insists on modern CPU features. Some benchmarks already exist, and there was even a specialized Intel distribution.
A separate distribution for Intel or other vendors' software seems like an odd approach when adding a channel or option (or a checklist.conf) to the APT sources GUI could solve some of the trouble.

Even the old Debian approach, with its various channels, was a gift to Ubuntu, which could follow Debian in compiling packages from source on the fly, or take the crude route of converting Red Hat RPM packages with alien (RPM being part of the archaic LSB standard). The Steam Deck, though, shows that today's compilation needs are different, and server space is another matter. Corporate policies are not there to allow or disallow any of this, but browsers are beginning to take over that role (if they are allowed to)…

Can an ever-growing GCC solve this without banning programs or refusing to run them? It could instead say, "Hi, this old program should be run in emulation" (though the old glibc API would be missing :smiley: ), if the kernel allows it, or offer a GUI prompt with settings for that app (there are even PHP Laravel text prompts now). Adding a user-space system library that makes these optimizations understandable in terms of human needs, rather than in terms of how programmers compile, could take away much of the trouble with O3.

Today's computers are much faster, with far more GHz, yet today's programs are slow on old PCs. That comes down to GCC and the CPU's instructions. What is missing is user-space support that would let a Hello World program show something clickable on screen without a few hundred MB, even with fancy buttons and a new GTK version with CSS animations that does not respect the underlying architecture, and not only in the PC world. This applies only to a minority of programs, but I don’t care how CPU vendors do their work; my problem is my program and my customer.

Thanks for working on performance! Here is some information I collected using your -O3 ISO.

System:
Tiger Lake Core i7-1165G7
System76 Lemur Pro 10 with 40GB single-channel RAM
Samsung 980 Pro 1TB NVMe drive
All tests performed on ZFS with ZFS-native encryption

Compiling the Chapel 2.1 compiler:

347.42 seconds CachyOS x86-64-v4 and other optimizations
401.36 seconds Noble -O3
420.10 seconds Noble without -O3

Running the Chapel examples:

361.44 seconds CachyOS x86-64-v4 and other optimizations
367.39 seconds Noble -O3
387.37 seconds Noble without -O3

So, for this workload, -O3 improves performance by an average of about 5%. (CachyOS adds other optimizations on top of that, adding an average of 8+% in performance.)
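
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the percentages from the timings above. It assumes "improvement" means the speedup ratio (slower time divided by faster time, minus one) averaged over the two tests; the function and key names are made up for illustration.

```python
def speedup_percent(baseline_s: float, optimized_s: float) -> float:
    """Speedup of `optimized_s` over `baseline_s`, as a percentage."""
    return (baseline_s / optimized_s - 1.0) * 100.0


# Wall-clock seconds copied from the results above.
timings = {
    "compile Chapel 2.1": {"cachyos": 347.42, "noble_o3": 401.36, "noble": 420.10},
    "run Chapel examples": {"cachyos": 361.44, "noble_o3": 367.39, "noble": 387.37},
}

# Gain of Noble -O3 over stock Noble, per test.
o3_gains = [speedup_percent(t["noble"], t["noble_o3"]) for t in timings.values()]
# Further gain of CachyOS over Noble -O3, per test.
cachyos_gains = [speedup_percent(t["noble_o3"], t["cachyos"]) for t in timings.values()]

mean_o3_gain = sum(o3_gains) / len(o3_gains)
mean_cachyos_gain = sum(cachyos_gains) / len(cachyos_gains)

print(f"-O3 over stock Noble: {mean_o3_gain:.1f}% on average")
print(f"CachyOS over Noble -O3: {mean_cachyos_gain:.1f}% on average")
```

Note that the CachyOS average hides a large spread between the two tests, so the per-test values in `cachyos_gains` are worth looking at as well.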

Thank you so much @dmk-dmk for putting this quick comparison together, very interesting.