Open Source Summit from The Linux Foundation June 23-25 (Denver, Colorado USA)


I recently attended the Open Source Summit in Denver (June 23-25th), and here are some of the talks that were presented.

The majority of talks fell into one of two topics: the kernel or AI.
There were over 300 different presentations across the three days.
I have added a link to the slides here.
All the videos were recorded, but I don’t have a link to them yet.
I also attended the Women’s Tuesday Lunch and it was a great opportunity to meet other colleagues from the entire computer science spectrum.

I will go through some of the talks and their abstracts. This is not an exhaustive list; please see the linked Google Drive for all the available slides.

Opening Remarks and Keynote from Jim Zemlin

  • Tazama
    Fraud detection - Tazama is an open-source software solution that provides robust, real-time fraud detection and prevention.
  • Cloudflare now supports C2PA
    C2PA: verifiable provenance metadata for digital media, including AI-generated content.
    The Coalition for Content Provenance and Authenticity (C2PA) addresses the prevalence of misleading information online through the development of technical standards for certifying the source and history (or provenance) of media content. C2PA is a Joint Development Foundation project, formed through an alliance between Adobe, Arm, Intel, Microsoft and Truepic.

A Deep Dive into eBPF Program Loader

As eBPF continues to revolutionize Linux observability and networking, the complexity of its program loading mechanism has evolved significantly.

This technical deep dive unravels the sophisticated machinery behind eBPF program loading, exploring the intricate interplay between user space loader and Linux kernel verifier. We’ll dissect the eBPF program relocation mechanisms, examine the role of BTF (BPF Type Format) in enabling strong typing and verification capabilities, and analyze the complex choreography of bpf() syscalls that bridge user space and kernel operations. Finally, we will also discuss the security implications and program signing challenges in the loading pipeline.

eBPF objects fall into four categories

  1. Programs
  • executable bytecode
  • verified by the kernel
  • attached to kernel hooks
  • type-specific (XDP, kprobe, etc.)
  2. Maps
  • data structures for storage
  • shared between programs and userspace
  • typed (hash, array, etc.)
  • persistent across program runs
  3. BTF (BPF Type Format)
  • metadata
  • enables CO-RE and debugging
  • describes map structures
  • kernel and userspace type matching
  4. Links
  • connection between a program and its hook
  • manage the attachment lifecycle
  • automatic cleanup on process exit
  • reference counting for sharing

File Descriptor Management

  • Each object gets a unique file descriptor
  • From the kernel’s perspective, everything is a file descriptor
  • FDs enable object sharing and persistence
  • Objects are destroyed when all FDs are closed
  • Can be pinned to filesystem for persistence
    Challenge: FDs are unpredictable at compile time
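
As a loose userspace analogy, this Python sketch shows the FD lifetime rule above: a kernel object survives until every descriptor referring to it is closed. It uses an ordinary pipe as a stand-in for a BPF map, since creating real BPF objects requires privileges and the bpf() syscall.

```python
import os

# eBPF objects behave like other FD-backed kernel objects: the object
# lives until every descriptor referring to it is closed. A pipe is
# used here purely as an illustrative stand-in for a BPF map FD.
r, w = os.pipe()
os.write(w, b"value")

r2 = os.dup(r)   # share the object through a second descriptor
os.close(r)      # closing one FD does not destroy the object...

data = os.read(r2, 5)   # ...it is still reachable through r2
print(data)             # b'value'

os.close(r2)
os.close(w)      # now all FDs are closed and the object is freed
```

Pinning to a BPF filesystem path works the same way: the pin acts as one more reference, so the object outlives the process that created it.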

The loading process: libbpf acts as a sophisticated translator between static ELF
representation and dynamic kernel objects

There are four phases to using eBPF

  1. Discovery - Parsed object with identified sections, programs, and maps
  2. Resolution - Fully resolved objects ready for kernel loading
  3. Kernel Interaction - Loaded programs with kernel file descriptors
  4. Attachment - Program runs when kernel events trigger the attached hook

CO-RE (Compile Once, Run Everywhere) addresses several problems:

  • Kernel structures change across versions
  • Field offsets differ between kernels
  • Traditional eBPF programs break on kernel updates
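
The relocation idea behind CO-RE can be sketched as follows. The struct layouts and byte offsets below are invented for illustration, not real kernel BTF: the compiled program records *which field* it accesses, and the loader resolves the concrete offset against the running kernel's type information at load time.

```python
# Toy model of a CO-RE field relocation. The per-version layouts are
# hypothetical; real BTF encodes full type info, not just offsets.
btf_v5_10 = {"task_struct": {"pid": 1256, "comm": 2096}}
btf_v6_1  = {"task_struct": {"pid": 1288, "comm": 2152}}

def relocate(access, kernel_btf):
    """Resolve a (struct, field) access to a concrete byte offset."""
    struct_name, field = access
    return kernel_btf[struct_name][field]

access = ("task_struct", "pid")     # what the compiled program encodes
print(relocate(access, btf_v5_10))  # resolved on the older kernel
print(relocate(access, btf_v6_1))   # a different offset on the newer one
```

Because the offset is resolved at load time rather than baked in at compile time, the same binary keeps working when kernel structures change.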

Key Takeaways

  1. Layered Design: libbpf provides sophisticated abstraction over kernel
    complexity
  2. Type Safety: BTF enables portable, type-safe program development
  3. Smart Relocation: CO-RE technology enables true portability
  4. Robust Verification: Multi-stage validation ensures program safety

Bottom line: eBPF program loading is a carefully orchestrated dance between user space tooling and kernel verification.

Rex: Safe & Usable Kernel Extensions in Rust

We present the Rex project (GitHub: rex-rs/rex), a safe and usable kernel extension framework that allows loading and executing Rust kernel extension programs in place of eBPF. Rex is a Linux kernel extension framework that allows extension programs to be written in safe Rust, and it offers similar safety guarantees to eBPF. Unlike eBPF-based tools such as Aya, Rex extensions are not compiled into eBPF bytecode. Rex eliminates the in-kernel verifier: the safety of Rex extensions is built atop language-based safety plus runtime protection. Specifically, the Rex compiler enforces that Rex extensions are written in a subset of safe Rust, and it emits native code directly. Rex implements its kernel crate with a safe interface that wraps the existing eBPF interface. Rex also employs a lightweight runtime that implements graceful Rust panic handling with resource cleanup, kernel stack checks, and program termination.

Rex provides a more usable and arguably safer alternative to eBPF. The usability advantage comes from the elimination of in-kernel verifiers that are known to reject safe extension programs with cryptic feedback. We also show that Rex’s runtime protection provides stronger safety than eBPF in a few aspects, e.g., protecting kernel stacks from overflowing.

Rust extensions are not compiled into eBPF bytecode.
With eBPF, safety comes at the cost of usability: it is difficult to map context between the compiler and the verifier, because the verifier does not understand the compiler's reasoning.
A common workaround is the volatile keyword.
This gap is known as the language-verifier gap.
Other known tools - Cilium, Aya, Katran
How do we ensure safety?

  • runtime safety checks
  • language based safety
  • extended type safety
  • safe exception handling (resource cleanup)
    Rex stands for safe, usable Rust kernel extensions.

Rex enforces extensions to access kernel memory safely


Reducing the Risk of Source Tampering with SLSA (pronounced “salsa”)

In 2023 Supply-chain Levels for Software Artifacts (SLSA) was released. It provided a framework for protecting software from tampering within the CI/CD workflow from source to publication. Now it’s nearing completion of the SLSA Source Track which brings a similar level of assurance to the management of source code.

The Source Track addresses the threat of tampering with source code within the repository and allows malicious changes to source to be attributed to the actors that introduced those changes. In addition, it provides a framework for recording additional results about source revisions such as if a code review was performed or if the source was analyzed by SAST tools.

We’ll cover how this track can prevent attacks like the 2021 attack against PHP where malicious commits were added to the PHP repository and how it can be used to ensure additional controls (like code review) are implemented to protect against attacks like the recent one against xz. Finally we’ll discuss how the source track can be implemented in existing source control systems by examining a proof-of-concept that enables Source Level 3 without specialized support from the source control platform.

Why do we trust their intent? Why do we trust their process?
SLSA safeguards artifact integrity across any software supply chain.
Uses “provenance” and other attestations to enable verification throughout the SDLC.
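
The digest check at the heart of such verification can be sketched like this. The field names below are simplified stand-ins, not the exact SLSA provenance schema, and the builder URL is invented.

```python
import hashlib

def verify(artifact: bytes, provenance: dict) -> bool:
    """Check that the artifact's digest matches what the attestation claims."""
    digest = hashlib.sha256(artifact).hexdigest()
    return provenance["subject_sha256"] == digest

artifact = b"example release tarball contents"
provenance = {
    "subject_sha256": hashlib.sha256(artifact).hexdigest(),
    "builder": "https://example.com/builder",  # hypothetical builder identity
}

print(verify(artifact, provenance))      # matches: accepted
print(verify(b"tampered!", provenance))  # any modification is rejected
```

A real SLSA verifier additionally checks the signature over the attestation and the identity of the builder that produced it, not just the digest.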
The three types of threat models: Source, Build, and Usage Threats.

  • Is the software what the producer intends to create?
  • What actually gets delivered to the consumer?
  • Could the source, build, delivery, or dependencies have been tampered with?

The Source Track

  • Attacks on source code include adding malicious behavior, merging ‘unreviewable’ changes (binaries)
  • Hiding malicious commits
  • Abusing tags to immediately impact users

How to Resolve?

  • Use a version control system
  • History retention and controls for protected branches and tags
  • Create signed ‘provenance’ for each new revision
  • Require review for each change
  • Code reviews agreed upon by at least two trusted people

gittuf - GitHub - gittuf/gittuf: A security layer for Git repositories
Verifiable source control policies with any git host.


Regression Testing in Boot-Time Performance in the Linux Kernel

There are numerous tools to measure boot-time performance of Linux. However, there is no standard regression test of boot performance for Linux. This is due to a number of factors, including disparities in system performance, different requirements for quickly-needed functionality, and differences in boot-loader, kernel and user-space configuration. In this session Tim will present a boot-time regression test that utilizes a collection of reference value data files for different platforms, kernel versions and configurations. A meta-data matching system is used to select an appropriate reference data file. Boot time data (including initcall durations, and the durations of pre-selected boot operations) is compared with reference values, in order to report regressions in boot-time duration for specific elements of the boot sequence. The upstream status of this effort, along with the test and supporting tools, as well as issues found with this approach, will be discussed.

Why is there no upstream boot-time regression test?
All other boot-time features and tools are instrumentation, inspection or
visualization tools
They are NOT tests that yield a pass/fail result to be acted upon
In other words, boot-time testing is left as an exercise for the user (human)

  • Not aware of any automated performance tests of any kind in the upstream Linux kernel
  • Boot-time code paths and durations vary widely from machine to machine
  • Different items need to come up quickly at different times (camera, control bus, etc.); this is similar to a “workload” test in other benchmarks: it is unique to the thing you want to test, and no single test can capture all use cases
  • Not all boot-time delays are within kernel scope

The kernel boot-time blind spot:

  • a portion of the kernel boot occurs before clock and timer initialization
  • on average, about 60-150 printk messages are emitted with 0.000 timestamps, representing about 100-400 milliseconds of boot time
  • self-instrumentation is not available in time to provide useful data

A Proposed Solution

  • An upstream test
  • Reference values, kept separate from the test and stored somewhere outside of upstream
  • Automatically find a useful reference value file
  • Automatic testing
  • Report to the maintainer and/or the contributor of the change that caused the regression

Personal boot-time regression test

Simple steps:

  • Measure values
  • Compare with reference values
  • Report regressions (not done yet)

Separate programs:

  • grab-boot-data.sh – for data collection
  • boot-time-regression-test.py – for comparing metrics with reference values
  • find-matching-ref-value.py – for automatically detecting a ref-value file
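
The compare step can be sketched as follows. This is an illustrative reimplementation, not Tim's actual script: it flags a regression only when the measured duration exceeds the reference by both a percentage and an absolute floor, one plausible way to weed out noise on tiny durations. All names and numbers are invented.

```python
def find_regressions(measured, reference, pct=0.05, floor_us=30):
    """Compare measured initcall durations (µs) against reference values.

    A regression requires the delta to exceed BOTH a relative threshold
    (pct) and an absolute floor (floor_us), so tiny durations don't
    trip the relative check with meaningless one-microsecond jitter.
    """
    regressions = []
    for name, ref_us in reference.items():
        got_us = measured.get(name)
        if got_us is None:
            continue  # item absent from this boot; skip
        delta = got_us - ref_us
        if delta > ref_us * pct and delta > floor_us:
            regressions.append((name, ref_us, got_us))
    return regressions

# Hypothetical initcall durations in microseconds:
reference = {"usb_init": 1200, "acpi_init": 5000, "spi_probe": 40}
measured  = {"usb_init": 1290, "acpi_init": 5600, "spi_probe": 65}
print(find_regressions(measured, reference))
```

Here spi_probe grew by more than 5% but less than 30 µs, so it is not reported; this is exactly the kind of configurable pass/fail criterion the talk discusses.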

Reporting Regressions

  • deciding what constitutes a regression, a.k.a. pass/fail criteria
  • simple thresholds (e.g. result exceeds the reference value by 5% or by 30 microseconds) produce too many false positives from small changes
  • adding support for configurable pass/fail criteria

Issues found:

  • identifying boot regions with printks is hard: there is no clear start or end to regions
  • more printks means a slower boot
  • the 0.000-timestamp blind spot yields worthless data
  • requires capturing data from a kernel booted with ‘initcall_debug’
  • the test framework might not support modifying kernel command line parameters

Next steps

  • put into a GitHub repo
  • gather boot-time data from more devices and platforms over more kernel versions
  • automate reporting
  • better analysis of deferred probes
  • use the Unified Boot Log

Efficient on-device core dump processing for IoT devices - A Rusty Implementation from Memfault

Embedded Linux devices operate in constrained environments with limited storage, bandwidth, and connectivity. Traditional core dumps can be quite large, making it impractical for some of the more constrained embedded systems. Over the past year, we’ve tackled this challenge head-on—optimizing Linux core dumps directly on the device to reduce size, protect privacy, and enable better debugging for IoT developers.

What We’ll Cover:
Inside ELF Core Dumps – A look at the ELF structure and how it applies to core dumps.

On-Device Optimization – How we reduced core dump size by capturing only the first N bytes of each stack, minimizing storage and bandwidth impact.

Privacy-Preserving Debugging – How our custom built (in Rust!) on-device stack unwinder hooks into the core handler, and reduces a coredump to a set of PCs per frame to save space and prevent potential PII from leaking.

Scaling to Millions of Coredumps – Lessons learned from parsing an unprecedented volume of core dumps with Rust.

A Linux coredump represents a snapshot of the crashing process
memory.

  • Written as an ELF file
  • Can be loaded into programs like GDB
  • Inspects the state of the process at the time of crash

  • Devices have limited storage space
  • Devices may be on a metered connection (LTE)
  • Crashes are collected from millions of devices
  • Connections are inconsistent

Stages of coredump collection

  • Normal core pass through
  • Stack Only
  • On-Device Unwind

Kernel configuration parameters to set:

CONFIG_COREDUMP=y
CONFIG_COREDUMP_DEFAULT_ELF_HEADERS=y

Two main program header types

  • PT_NOTE - metadata about the process
  • PT_LOAD - memory segments (stack, heap, etc.)

Processing Steps:

  • read all program headers into memory
  • save all PT_NOTE segments
  • stream PT_LOAD segments from /proc/<pid>/mem
  • add custom metadata note
  • write modified ELF core dump
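
The header-walking part of those steps can be sketched with plain ELF parsing. The ELF bytes below are synthetic and minimal, built only to exercise the parser; a real core dump would come from the kernel's core handler.

```python
import struct

PT_LOAD, PT_NOTE = 1, 4  # segment types from the ELF specification

def program_headers(elf: bytes):
    """Yield (p_type, p_offset, p_filesz) for each ELF64 program header."""
    e_phoff, = struct.unpack_from("<Q", elf, 32)           # phdr table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, _flags, p_offset = struct.unpack_from("<IIQ", elf, base)
        p_filesz, = struct.unpack_from("<Q", elf, base + 32)
        yield p_type, p_offset, p_filesz

# Build a tiny synthetic core-dump-shaped ELF: header + two phdrs.
ehdr = bytearray(64)
ehdr[:4] = b"\x7fELF"
struct.pack_into("<Q", ehdr, 32, 64)       # e_phoff: phdrs follow the header
struct.pack_into("<HH", ehdr, 54, 56, 2)   # e_phentsize, e_phnum

def phdr(p_type, off, size):
    h = bytearray(56)
    struct.pack_into("<IIQ", h, 0, p_type, 0, off)
    struct.pack_into("<QQ", h, 32, size, size)  # p_filesz, p_memsz
    return bytes(h)

elf = bytes(ehdr) + phdr(PT_NOTE, 176, 32) + phdr(PT_LOAD, 208, 4096)
for p_type, off, size in program_headers(elf):
    kind = "PT_NOTE" if p_type == PT_NOTE else "PT_LOAD"
    print(kind, off, size)  # PT_NOTE 176 32 / PT_LOAD 208 4096
```

In the real pipeline, PT_NOTE segments are copied verbatim (they hold process metadata), while PT_LOAD segments are streamed from /proc/&lt;pid&gt;/mem so the whole dump never has to sit in memory at once.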

Benefits of Memfault’s metadata injection

  • device identification and versioning
  • advanced processing
  • memory efficiency - streaming prevents large allocation blocks
  • compatibility - standard ELF format works with existing tools

Core dumps can be quite large: processes running many threads or making large memory allocations are a problem for embedded devices with limited storage.

Why Rust?

  • memory safety
  • extensive ecosystem
  • ergonomics
  • cause the cool kids are doing it

Solution - Use only Essentials

  • stack memory
  • debugger info (frame data/dynamic info)

Requirements

  • limit each stack to N bytes
  • remove heap completely
  • capture metadata needed for debuggers
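
The stack-only reduction can be sketched like this. The segment representation and the value of N are illustrative; a real implementation would identify stacks from the thread register state in the PT_NOTE data.

```python
STACK_LIMIT = 256  # hypothetical N bytes per stack; tuned per device in practice

def reduce_segments(segments, limit=STACK_LIMIT):
    """segments: list of (kind, data) pairs. Keep truncated stacks, drop heaps."""
    kept = []
    for kind, data in segments:
        if kind == "stack":
            kept.append((kind, data[:limit]))  # first N bytes only
        # heap and other anonymous memory is dropped entirely
    return kept

segments = [("stack", b"\xaa" * 8192), ("heap", b"\xbb" * 65536)]
reduced = reduce_segments(segments)
print([(k, len(d)) for k, d in reduced])  # [('stack', 256)]
```

Truncating from the stack pointer keeps the most recent frames, which is what a backtrace needs; the trade-off is the lost heap values and limited stack depth listed below.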

Lost Capabilities

  • no heap values
  • limited stack depth

Required for each mapped file

  • ELF Header
  • All program headers
  • Build ID note

Results
Traditional core dump (with some functions): 2.6 MB – very large for an embedded device.

Optimized core dump: 75 KB – a 35x size reduction!

Impact

  • can store multiple core dumps in the space of one original
  • significant savings on constrained devices
  • full debug capabilities
  • the savings are consistently dramatic across devices
  • on-device unwinding
    • privacy – no sensitive customer data leaves the device
    • size – even greater size reduction than the previous core dumps

What’s left?
PC (program counter) for each frame
Symbolic information for each binary

  • GNU build ID
  • compile-time vs runtime offset (ASLR)
  • file path
  • PC range for each function

The ASLR Challenge

Address Space Layout Randomization (ASLR) randomizes load addresses:

  • a security feature that prevents exploitation
  • compile-time addresses ≠ runtime addresses
  • a mapping between the two is needed

The unwinder leverages .eh_frame and addr2line for local stack unwinding.

Repeat the process for each address in the stack:

  1. Identify which binary contains the address
  2. Calculate the relative address
  3. Resolve symbols with addr2line
  4. Build the complete call stack
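
Those four steps can be sketched as follows. The mappings and symbol table below are invented for illustration; a real implementation would read /proc/&lt;pid&gt;/maps for the load biases and use addr2line against the binaries' debug info.

```python
# (start, end, load_bias, binary) - hypothetical runtime memory mappings
mappings = [
    (0x7f00_0000, 0x7f10_0000, 0x7f00_0000, "libfoo.so"),
]
# binary -> [(compile-time start, end, function name)] - hypothetical symbols
symbols = {
    "libfoo.so": [(0x1000, 0x1400, "parse_input"),
                  (0x1400, 0x1900, "do_work")],
}

def resolve(pc):
    for start, end, bias, binary in mappings:    # 1. find the containing binary
        if start <= pc < end:
            rel = pc - bias                      # 2. undo the ASLR load bias
            for s, e, name in symbols[binary]:   # 3. resolve the symbol
                if s <= rel < e:
                    return f"{binary}!{name}+0x{rel - s:x}"
    return "???"

stack_pcs = [0x7f00_1520, 0x7f00_1040]           # 4. walk every frame's PC
print([resolve(pc) for pc in stack_pcs])
```

Because only PCs and build IDs leave the device, the server can reconstruct a fully symbolized backtrace without ever receiving raw memory contents.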

Result: Efficient, small size, privacy-preserving crash capture for embedded
Linux IoT devices.

Cheers,
Heather Lemon
