Open Source Summit from The Linux Foundation June 23-25 (Denver, Colorado USA)


I recently attended the Open Source Summit in Denver (June 23-25th), and here are some of the talks that were presented.

The majority of talks fell into one of two topics: the kernel or AI.
There were over 300 different presentations across the three days.
I have added a link to the slides here.
All the videos were recorded, but I don’t have a link to them yet.
I also attended the Women’s Tuesday Lunch and it was a great opportunity to meet other colleagues from the entire computer science spectrum.

I will go through some of the talks and their abstracts. This is not an exhaustive list; please see the linked Google Drive for all the available slides.

Opening Remarks and Keynote from Jim Zemlin

  • Tazama
    Fraud detection - Tazama is an open-source software solution that provides robust, real-time fraud detection and prevention.
  • Cloudflare now supports C2PA
    C2PA: verifiable provenance metadata for digital media, including AI-generated content.
    The Coalition for Content Provenance and Authenticity (C2PA) addresses the prevalence of misleading information online through the development of technical standards for certifying the source and history (or provenance) of media content. C2PA is a Joint Development Foundation project, formed through an alliance between Adobe, Arm, Intel, Microsoft and Truepic.

A Deep Dive into eBPF Program Loader

As eBPF continues to revolutionize Linux observability and networking, the complexity of its program loading mechanism has evolved significantly.

This technical deep dive unravels the sophisticated machinery behind eBPF program loading, exploring the intricate interplay between user space loader and Linux kernel verifier. We’ll dissect the eBPF program relocation mechanisms, examine the role of BTF (BPF Type Format) in enabling strong typing and verification capabilities, and analyze the complex choreography of bpf() syscalls that bridge user space and kernel operations. Finally, we will also discuss the security implications and program signing challenges in the loading pipeline.

eBPF objects fall into four categories

  1. Programs
  • executable bytecode
  • verified by the kernel
  • attached to kernel hooks
  • type-specific (XDP, kprobe, etc.)
  2. Maps
  • data structures for storage
  • shared between programs and userspace
  • typed (hash, array, etc.)
  • persistent across program runs
  3. BTF (BPF Type Format)
  • metadata
  • enables CO-RE and debugging
  • describes map structures
  • kernel and userspace type matching
  4. Links
  • connection between a program and its hook
  • manage the attachment lifecycle
  • automatic cleanup on process exit
  • reference counting for sharing

File Descriptor Management

  • Each object gets a unique file descriptor
  • From the kernel’s perspective, everything is a file descriptor
  • FDs enable object sharing and persistence
  • Objects are destroyed when all FDs are closed
  • Can be pinned to filesystem for persistence
    Challenge: FDs are unpredictable at compile time
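
As a loose userspace analogy, this Python sketch shows the FD lifetime rule above: a kernel object survives until every descriptor referring to it is closed. It uses an ordinary pipe as a stand-in for a BPF map, since creating real BPF objects requires privileges and the bpf() syscall.

```python
import os

# eBPF objects behave like other FD-backed kernel objects: the object
# lives until every descriptor referring to it is closed. A pipe is
# used here purely as an illustrative stand-in for a BPF map FD.
r, w = os.pipe()
os.write(w, b"value")

r2 = os.dup(r)   # share the object through a second descriptor
os.close(r)      # closing one FD does not destroy the object...

data = os.read(r2, 5)   # ...it is still reachable through r2
print(data)             # b'value'

os.close(r2)
os.close(w)      # now all FDs are closed and the object is freed
```

Pinning to a BPF filesystem path works the same way: the pin acts as one more reference, so the object outlives the process that created it.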

The loading process: libbpf acts as a sophisticated translator between static ELF
representation and dynamic kernel objects

There are four phases to using eBPF

  1. Discovery - Parsed object with identified sections, programs, and maps
  2. Resolution - Fully resolved objects ready for kernel loading
  3. Kernel Interaction - Loaded programs with kernel file descriptors
  4. Attachment - Program runs when kernel events trigger the attached hook

CO-RE (Compile Once, Run Everywhere) addresses several problems:

  • Kernel structures change across versions
  • Field offsets differ between kernels
  • Traditional eBPF programs break on kernel updates
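
The relocation idea behind CO-RE can be sketched as follows. The struct layouts and byte offsets below are invented for illustration, not real kernel BTF: the compiled program records *which field* it accesses, and the loader resolves the concrete offset against the running kernel's type information at load time.

```python
# Toy model of a CO-RE field relocation. The per-version layouts are
# hypothetical; real BTF encodes full type info, not just offsets.
btf_v5_10 = {"task_struct": {"pid": 1256, "comm": 2096}}
btf_v6_1  = {"task_struct": {"pid": 1288, "comm": 2152}}

def relocate(access, kernel_btf):
    """Resolve a (struct, field) access to a concrete byte offset."""
    struct_name, field = access
    return kernel_btf[struct_name][field]

access = ("task_struct", "pid")     # what the compiled program encodes
print(relocate(access, btf_v5_10))  # resolved on the older kernel
print(relocate(access, btf_v6_1))   # a different offset on the newer one
```

Because the offset is resolved at load time rather than baked in at compile time, the same binary keeps working when kernel structures change.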

Key Takeaways

  1. Layered Design: libbpf provides sophisticated abstraction over kernel
    complexity
  2. Type Safety: BTF enables portable, type-safe program development
  3. Smart Relocation: CO-RE technology enables true portability
  4. Robust Verification: Multi-stage validation ensures program safety

Bottom line: eBPF program loading is a carefully orchestrated dance between user space tooling and kernel verification.

Rex: Safe & Usable Kernel Extensions in Rust

We present the Rex project (GitHub: rex-rs/rex), a safe and usable kernel extension framework that allows loading and executing Rust kernel extension programs in place of eBPF. Rex is a Linux kernel extension framework that allows extension programs to be written in safe Rust, and it offers similar safety guarantees to eBPF. Unlike eBPF-based tools such as Aya, Rex extensions are not compiled into eBPF bytecode. Rex eliminates the in-kernel verifier: the safety of Rex extensions is built atop language-based safety plus runtime protection. Specifically, the Rex compiler enforces that Rex extensions are written in a subset of safe Rust, and it emits native code directly. Rex implements its kernel crate with a safe interface that wraps the existing eBPF interface. Rex also employs a lightweight runtime that implements graceful Rust panic handling with resource cleanup, kernel stack checks, and program termination.

Rex provides a more usable and arguably safer alternative to eBPF. The usability advantage comes from the elimination of in-kernel verifiers that are known to reject safe extension programs with cryptic feedback. We also show that Rex’s runtime protection provides stronger safety than eBPF in a few aspects, e.g., protecting kernel stacks from overflowing.

Rust extensions are not compiled into eBPF bytecode.
With eBPF, safety comes at the cost of usability: it is difficult to map context between the compiler and the verifier, because the verifier does not understand the compiler's reasoning.
A common workaround is the volatile keyword.
This gap is known as the language-verifier gap.
Other known tools - Cilium, Aya, Katran
How do we ensure safety?

  • runtime safety checks
  • language based safety
  • extended type safety
  • safe exception handling (resource cleanup)
    Rex stands for safe, usable Rust kernel extensions.

Rex enforces extensions to access kernel memory safely


Reducing the Risk of Source Tampering with SLSA (pronounced “salsa”)

In 2023 Supply-chain Levels for Software Artifacts (SLSA) was released. It provided a framework for protecting software from tampering within the CI/CD workflow from source to publication. Now it’s nearing completion of the SLSA Source Track which brings a similar level of assurance to the management of source code.

The Source Track addresses the threat of tampering with source code within the repository and allows malicious changes to source to be attributed to the actors that introduced those changes. In addition, it provides a framework for recording additional results about source revisions such as if a code review was performed or if the source was analyzed by SAST tools.

We’ll cover how this track can prevent attacks like the 2021 attack against PHP where malicious commits were added to the PHP repository and how it can be used to ensure additional controls (like code review) are implemented to protect against attacks like the recent one against xz. Finally we’ll discuss how the source track can be implemented in existing source control systems by examining a proof-of-concept that enables Source Level 3 without specialized support from the source control platform.

Why do we trust their intent? Why do we trust their process?
SLSA safeguards artifact integrity across any software supply chain.
Uses “provenance” and other attestations to enable verification throughout the SDLC.
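
The digest check at the heart of such verification can be sketched like this. The field names below are simplified stand-ins, not the exact SLSA provenance schema, and the builder URL is invented.

```python
import hashlib

def verify(artifact: bytes, provenance: dict) -> bool:
    """Check that the artifact's digest matches what the attestation claims."""
    digest = hashlib.sha256(artifact).hexdigest()
    return provenance["subject_sha256"] == digest

artifact = b"example release tarball contents"
provenance = {
    "subject_sha256": hashlib.sha256(artifact).hexdigest(),
    "builder": "https://example.com/builder",  # hypothetical builder identity
}

print(verify(artifact, provenance))      # matches: accepted
print(verify(b"tampered!", provenance))  # any modification is rejected
```

A real SLSA verifier additionally checks the signature over the attestation and the identity of the builder that produced it, not just the digest.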
The three types of threat models: Source, Build, and Usage Threats.

  • Is the software what the producer intends to create?
  • What actually gets delivered to the consumer?
  • Could the source, build, delivery, or dependencies have been tampered with?

The Source Track

  • Attacks on source code include adding malicious behavior, merging ‘unreviewable’ changes (binaries)
  • Hiding malicious commits
  • Abusing tags to immediately impact users

How to Resolve?

  • Use a version control system
  • History retention and controls for protected branches and tags
  • Create signed ‘provenance’ for each new revision
  • Require review for each change
  • Code reviews agreed upon by at least two trusted people

gittuf - GitHub - gittuf/gittuf: A security layer for Git repositories
Verifiable source control policies with any git host.


Regression Testing in Boot-Time Performance in the Linux Kernel

There are numerous tools to measure boot-time performance of Linux. However, there is no standard regression test of boot performance for Linux. This is due to a number of factors, including disparities in system performance, different requirements for quickly-needed functionality, and differences in boot-loader, kernel and user-space configuration. In this session Tim will present a boot-time regression test that utilizes a collection of reference value data files for different platforms, kernel versions and configurations. A meta-data matching system is used to select an appropriate reference data file. Boot time data (including initcall durations, and the durations of pre-selected boot operations) is compared with reference values, in order to report regressions in boot-time duration for specific elements of the boot sequence. The upstream status of this effort, along with the test and supporting tools, as well as issues found with this approach, will be discussed.

Why is there no upstream boot-time regression test?
All other boot-time features and tools are instrumentation, inspection or
visualization tools
They are NOT tests that yield a pass/fail result to be acted upon
In other words, boot-time testing is left as an exercise for the user (human)

  • Not aware of any automated performance tests of any kind in the upstream Linux kernel
  • Boot-time code paths and durations vary widely from machine to machine
  • Different items need to come up quickly at different times (camera, control bus, etc.); this is similar to a “workload” test in other benchmarks: it is unique to the thing you want to test, and no single test can capture all use cases
  • Not all boot-time delays are within kernel scope

The kernel boot-time blind spot:

  • a portion of the kernel boot occurs before clock and timer initialization
  • on average, about 60-150 printk messages are emitted with 0.000 timestamps, representing about 100-400 milliseconds of boot time
  • self-instrumentation is not available in time to provide useful data

A Proposed Solution

  • An upstream test
  • Reference values, kept separate from the test and stored somewhere outside of upstream
  • Automatically find a useful reference value file
  • Automatic testing
  • Report to the maintainer and/or the contributor of the change that caused the regression

Personal boot-time regression test

Simple steps:

  • Measure values
  • Compare with reference values
  • Report regressions (not done yet)

Separate programs:

  • grab-boot-data.sh – for data collection
  • boot-time-regression-test.py – for comparing metrics with reference values
  • find-matching-ref-value.py – for automatically detecting a ref-value file
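
The compare step can be sketched as follows. This is an illustrative reimplementation, not Tim's actual script: it flags a regression only when the measured duration exceeds the reference by both a percentage and an absolute floor, one plausible way to weed out noise on tiny durations. All names and numbers are invented.

```python
def find_regressions(measured, reference, pct=0.05, floor_us=30):
    """Compare measured initcall durations (µs) against reference values.

    A regression requires the delta to exceed BOTH a relative threshold
    (pct) and an absolute floor (floor_us), so tiny durations don't
    trip the relative check with meaningless one-microsecond jitter.
    """
    regressions = []
    for name, ref_us in reference.items():
        got_us = measured.get(name)
        if got_us is None:
            continue  # item absent from this boot; skip
        delta = got_us - ref_us
        if delta > ref_us * pct and delta > floor_us:
            regressions.append((name, ref_us, got_us))
    return regressions

# Hypothetical initcall durations in microseconds:
reference = {"usb_init": 1200, "acpi_init": 5000, "spi_probe": 40}
measured  = {"usb_init": 1290, "acpi_init": 5600, "spi_probe": 65}
print(find_regressions(measured, reference))
```

Here spi_probe grew by more than 5% but less than 30 µs, so it is not reported; this is exactly the kind of configurable pass/fail criterion the talk discusses.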

Reporting Regressions

  • deciding what constitutes a regression, a.k.a. pass/fail criteria
  • simple thresholds (e.g. result exceeds the reference value by 5% or by 30 microseconds) produce too many false positives from small changes
  • adding support for configurable pass/fail criteria

Issues found:

  • identifying boot regions with printks is hard: there is no clear start or end to regions
  • more printks means a slower boot
  • the 0.000-timestamp blind spot yields worthless data
  • requires capturing data from a kernel booted with ‘initcall_debug’
  • the test framework might not support modifying kernel command line parameters

Next steps

  • put into a GitHub repo
  • gather boot-time data from more devices and platforms over more kernel versions
  • automate reporting
  • better analysis of deferred probes
  • use the Unified Boot Log

Efficient on-device core dump processing for IoT devices - A Rusty Implementation from Memfault

Embedded Linux devices operate in constrained environments with limited storage, bandwidth, and connectivity. Traditional core dumps can be quite large, making it impractical for some of the more constrained embedded systems. Over the past year, we’ve tackled this challenge head-on—optimizing Linux core dumps directly on the device to reduce size, protect privacy, and enable better debugging for IoT developers.

What We’ll Cover:
Inside ELF Core Dumps – A look at the ELF structure and how it applies to core dumps.

On-Device Optimization – How we reduced core dump size by capturing only the first N bytes of each stack, minimizing storage and bandwidth impact.

Privacy-Preserving Debugging – How our custom built (in Rust!) on-device stack unwinder hooks into the core handler, and reduces a coredump to a set of PCs per frame to save space and prevent potential PII from leaking.

Scaling to Millions of Coredumps – Lessons learned from parsing an unprecedented volume of core dumps with Rust.

A Linux coredump represents a snapshot of the crashing process
memory.

  • Written as an ELF file
  • Can be loaded into programs like GDB
  • Inspects the state of the process at the time of crash

  • Devices have limited storage space
  • Devices may be on a metered connection (LTE)
  • Crashes are collected from millions of devices
  • Connections are inconsistent

Stages of coredump collection

  • Normal core pass through
  • Stack Only
  • On-Device Unwind

Kernel configuration parameters to set:

CONFIG_COREDUMP=y
CONFIG_COREDUMP_DEFAULT_ELF_HEADERS=y

Two main program header types

  • PT_NOTE - metadata about the process
  • PT_LOAD - memory segments (stack, heap, etc.)

Processing Steps:

  • read all program headers into memory
  • save all PT_NOTE segments
  • stream PT_LOAD segments from /proc/<pid>/mem
  • add custom metadata note
  • write modified ELF core dump
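
The header-walking part of those steps can be sketched with plain ELF parsing. The ELF bytes below are synthetic and minimal, built only to exercise the parser; a real core dump would come from the kernel's core handler.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4  # segment types from the ELF specification

def program_headers(elf: bytes):
    """Yield (p_type, p_offset, p_filesz) for each ELF64 program header."""
    e_phoff, = struct.unpack_from("<Q", elf, 32)           # phdr table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, _flags, p_offset = struct.unpack_from("<IIQ", elf, base)
        p_filesz, = struct.unpack_from("<Q", elf, base + 32)
        yield p_type, p_offset, p_filesz

# Build a tiny synthetic core-dump-shaped ELF: header + two phdrs.
ehdr = bytearray(64)
ehdr[:4] = b"\x7fELF"
struct.pack_into("<Q", ehdr, 32, 64)       # e_phoff: phdrs follow the header
struct.pack_into("<HH", ehdr, 54, 56, 2)   # e_phentsize, e_phnum

def phdr(p_type, off, size):
    h = bytearray(56)
    struct.pack_into("<IIQ", h, 0, p_type, 0, off)
    struct.pack_into("<QQ", h, 32, size, size)  # p_filesz, p_memsz
    return bytes(h)

elf = bytes(ehdr) + phdr(PT_NOTE, 176, 32) + phdr(PT_LOAD, 208, 4096)
for p_type, off, size in program_headers(elf):
    kind = "PT_NOTE" if p_type == PT_NOTE else "PT_LOAD"
    print(kind, off, size)  # PT_NOTE 176 32 / PT_LOAD 208 4096
```

In the real pipeline, PT_NOTE segments are copied verbatim (they hold process metadata), while PT_LOAD segments are streamed from /proc/&lt;pid&gt;/mem so the whole dump never has to sit in memory at once.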

Benefits of Memfault’s metadata injection

  • device identification and versioning
  • advanced processing
  • memory efficiency - streaming prevents large allocation blocks
  • compatibility - standard ELF format works with existing tools

Core dumps can be quite large: processes running many threads or making large memory allocations are a problem for embedded devices with limited storage.

Why Rust?

  • memory safety
  • extensive ecosystem
  • ergonomics
  • cause the cool kids are doing it

Solution - Use only Essentials

  • stack memory
  • debugger info (frame data/dynamic info)

Requirements

  • limit each stack to N bytes
  • remove heap completely
  • capture metadata needed for debuggers
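
The stack-only reduction can be sketched like this. The segment representation and the value of N are illustrative; a real implementation would identify stacks from the thread register state in the PT_NOTE data.

```python
STACK_LIMIT = 256  # hypothetical N bytes per stack; tuned per device in practice

def reduce_segments(segments, limit=STACK_LIMIT):
    """segments: list of (kind, data) pairs. Keep truncated stacks, drop heaps."""
    kept = []
    for kind, data in segments:
        if kind == "stack":
            kept.append((kind, data[:limit]))  # first N bytes only
        # heap and other anonymous memory is dropped entirely
    return kept

segments = [("stack", b"\xaa" * 8192), ("heap", b"\xbb" * 65536)]
reduced = reduce_segments(segments)
print([(k, len(d)) for k, d in reduced])  # [('stack', 256)]
```

Truncating from the stack pointer keeps the most recent frames, which is what a backtrace needs; the trade-off is the lost heap values and limited stack depth listed below.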

Lost Capabilities

  • no heap values
  • limited stack depth

Required for each mapped file

  • ELF Header
  • All program headers
  • Build ID note

Results
Traditional core dump (with some functions): 2.6 MB – very large for an embedded device.

Optimized core dump: 75 KB – a 35x size reduction!

Impact

  • can store multiple core dumps in the space of one original
  • significant savings on constrained devices
  • full debug capabilities
  • the savings are consistently dramatic across devices
  • on-device unwinding
    • privacy – no sensitive customer data leaves the device
    • size – even greater size reduction than the previous core dumps

What’s left?
PC (program counter) for each frame
Symbolic information for each binary

  • GNU build ID
  • compile-time vs runtime offset (ASLR)
  • file path
  • PC range for each function

The ASLR Challenge

Address Space Layout Randomization (ASLR) randomizes load addresses:

  • a security feature that prevents exploitation
  • compile-time addresses ≠ runtime addresses
  • a mapping between the two is needed

The unwinder leverages .eh_frame and addr2line for local stack unwinding.

Repeat the process for each address in the stack:

  1. Identify which binary contains the address
  2. Calculate the relative address
  3. Resolve symbols with addr2line
  4. Build the complete call stack
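
Those four steps can be sketched as follows. The mappings and symbol table below are invented for illustration; a real implementation would read /proc/&lt;pid&gt;/maps for the load biases and use addr2line against the binaries' debug info.

```python
# (start, end, load_bias, binary) - hypothetical runtime memory mappings
mappings = [
    (0x7f00_0000, 0x7f10_0000, 0x7f00_0000, "libfoo.so"),
]
# binary -> [(compile-time start, end, function name)] - hypothetical symbols
symbols = {
    "libfoo.so": [(0x1000, 0x1400, "parse_input"),
                  (0x1400, 0x1900, "do_work")],
}

def resolve(pc):
    for start, end, bias, binary in mappings:    # 1. find the containing binary
        if start <= pc < end:
            rel = pc - bias                      # 2. undo the ASLR load bias
            for s, e, name in symbols[binary]:   # 3. resolve the symbol
                if s <= rel < e:
                    return f"{binary}!{name}+0x{rel - s:x}"
    return "???"

stack_pcs = [0x7f00_1520, 0x7f00_1040]           # 4. walk every frame's PC
print([resolve(pc) for pc in stack_pcs])
```

Because only PCs and build IDs leave the device, the server can reconstruct a fully symbolized backtrace without ever receiving raw memory contents.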

Result: Efficient, small size, privacy-preserving crash capture for embedded
Linux IoT devices.

Cheers,
Heather Lemon
