LSN-0107-1

bromer · November 7, 2024, 2:17pm

Linux kernel vulnerabilities

A security issue affects these releases of Ubuntu and its derivatives:

Ubuntu 20.04 LTS
Ubuntu 18.04 LTS
Ubuntu 16.04 LTS
Ubuntu 22.04 LTS
Ubuntu 14.04 LTS

Summary

Several security issues were fixed in the kernel.

Software Description

linux - Linux kernel
linux-aws - Linux kernel for Amazon Web Services (AWS) systems
linux-azure - Linux kernel for Microsoft Azure Cloud systems
linux-gcp - Linux kernel for Google Cloud Platform (GCP) systems
linux-gke - Linux kernel for Google Container Engine (GKE) systems
linux-gkeop - Linux kernel for Google Container Engine (GKE) systems
linux-ibm - Linux kernel for IBM cloud systems
linux-oracle - Linux kernel for Oracle Cloud systems

Details

In the Linux kernel, the following vulnerability has been
resolved: inet: inet_defrag: prevent sk release while still in use

ip_local_out() and other functions can pass skb->sk as function argument.
If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released. This affects skb fragments
reassembled via netfilter or similar modules, e.g. openvswitch or ct_act.c,
when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug. Quoting Eric:

  Calling ip_defrag() in output path is also implying skb_orphan(), which
  is buggy because output path relies on sk not disappearing. A relevant
  old patch about the issue was: 8282f27449bf (“inet: frag: Always orphan
  skbs inside ip_defrag()”) […]
  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one. If we orphan the packet in ipvlan,
  then downstream things like FQ packet scheduler will not work properly.
  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this: If skb is refragmented again right
after, ip_do_fragment() will copy head->sk to the new fragments, and sets
up destructor to sock_wfree. IOW, we have no choice but to fix up sk_wmem
accounting to reflect the fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered. This allows to delay
the orphaning long enough to learn if the skb has to be queued or if the
skb is completing the reasm queue.

In the former case, things work as before, skb is orphaned. This is safe
because skb gets queued/stolen and won’t continue past reasm engine. In the
latter case, we will steal the skb->sk reference, reattach it to the head
skb, and fix up wmem accounting when inet_frag inflates truesize.
(CVE-2024-26921)
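
The wmem underflow mentioned above can be illustrated with plain arithmetic.
The Python sketch below is a toy model, not kernel code: the function name,
the fragment sizes, and the assumption that refragmentation yields fragments
with the same truesize values as the originals are all hypothetical. It only
shows why the socket must be charged for the inflated truesize of the
reassembled head skb.

```python
def wmem_after_refragment(truesizes, fix_up_accounting):
    """Toy model of sk_wmem accounting across reassembly and refragmentation.

    truesizes: hypothetical truesize of each fragment, in arrival order.
    Returns the socket's write-memory balance once every skb has been freed.
    """
    sk_wmem = 0

    # Queued fragments are orphaned, so only the fragment that completes the
    # queue is still charged to the socket when reassembly finishes.
    last = truesizes[-1]
    sk_wmem += last

    # Reassembly inflates the head skb's truesize to cover all fragments.
    head_truesize = sum(truesizes)
    if fix_up_accounting:
        # Charge the socket for the extra memory the head skb now represents.
        sk_wmem += head_truesize - last

    # Refragmentation: the new fragments inherit the socket and a sock_wfree
    # destructor, so freeing each one uncharges its truesize.
    # Assumption: the new fragments have the same sizes as the originals.
    for size in truesizes:
        sk_wmem -= size

    return sk_wmem


print(wmem_after_refragment([768, 768, 512], fix_up_accounting=False))
print(wmem_after_refragment([768, 768, 512], fix_up_accounting=True))
```

Without the fix-up the balance ends below zero, which is the underflow the
commit message warns about; with it the charges and uncharges cancel out.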

In the Linux kernel, the following vulnerability has been
resolved: af_unix: Fix garbage collector racing against connect()

Garbage collector does not take into account the risk of embryo getting
enqueued during the garbage collection. If such embryo has a peer that
carries SCM_RIGHTS, two consecutive passes of scan_children() may see a
different set of children. This leads to an incorrectly elevated inflight
count, and then a dangling pointer within the gc_inflight_list.

sockets are AF_UNIX/SOCK_STREAM
S is an unconnected socket
L is a listening in-flight socket bound to addr, not in fdtable
V’s fd will be passed via sendmsg(), gets inflight count bumped

connect(S, addr)           sendmsg(S, [V]); close(V)    __unix_gc()
----------------           -------------------------    -----------
NS = unix_create1()
skb1 = sock_wmalloc(NS)
L = unix_find_other(addr)
unix_state_lock(L)
unix_peer(S) = NS
                           // V count=1 inflight=0
                           NS = unix_peer(S)
                           skb2 = sock_alloc()
                           skb_queue_tail(NS, skb2[V])
                           // V became in-flight
                           // V count=2 inflight=1
                           close(V)
                           // V count=1 inflight=1
                           // GC candidate condition met
                                                        for u in gc_inflight_list:
                                                          if (total_refs == inflight_refs)
                                                            add u to gc_candidates
                                                        // gc_candidates={L, V}
                                                        for u in gc_candidates:
                                                          scan_children(u, dec_inflight)
                                                        // embryo (skb1) was not
                                                        // reachable from L yet, so V’s
                                                        // inflight remains unchanged
__skb_queue_tail(L, skb1)
unix_state_unlock(L)
                                                        for u in gc_candidates:
                                                          if (u.inflight)
                                                            scan_children(u, inc_inflight_move_tail)
                                                        // V count=1 inflight=2 (!)

If there is a GC-candidate listening socket, lock/unlock its state. This
makes GC wait until the end of any ongoing connect() to that socket. After
flipping the lock, a possibly SCM-laden embryo is already enqueued. And if
there is another embryo coming, it can not possibly carry SCM_RIGHTS. At
this point, unix_inflight() can not happen because unix_gc_lock is already
taken. Inflight graph remains unaffected.
(CVE-2024-26923)
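
The interleaving described above can be replayed step by step. The Python
sketch below is a toy model of that trace only, not the kernel's garbage
collector: the function name and the lock_listener_first flag are
hypothetical, and it assumes, as in the trace, that the second
scan_children() pass runs for L. It tracks V’s inflight count and whether the
embryo carrying V is already on L’s receive queue when each pass runs.

```python
def replay_gc(lock_listener_first):
    """Replay the trace and return V's inflight count after both GC passes."""
    v_inflight = 1         # state after sendmsg(S, [V]) and close(V)
    embryo_queued = False  # skb1 is not yet on L's receive queue

    if lock_listener_first:
        # Fix: GC locks and unlocks L's state first, so the ongoing
        # connect() finishes and queues the embryo before any scan.
        embryo_queued = True

    # Pass 1: scan_children(u, dec_inflight) only reaches V via the embryo.
    if embryo_queued:
        v_inflight -= 1

    if not lock_listener_first:
        # Bug: connect() completes between the two passes.
        embryo_queued = True

    # Pass 2: scan_children(u, inc_inflight_move_tail) restores the counts
    # of every child it can reach.
    if embryo_queued:
        v_inflight += 1

    return v_inflight


print("without the fix:", replay_gc(lock_listener_first=False))
print("with the fix:   ", replay_gc(lock_listener_first=True))
```

When both passes see the same set of children, the decrement and increment
cancel out. When only the second pass sees the embryo, V ends with the
elevated inflight=2 shown in the trace.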

In the Linux kernel, the following vulnerability has been
resolved: mm: swap: fix race between free_swap_and_cache() and swapoff()

There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.

This is a theoretical problem and I haven’t been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).

Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that the
swap entry was not free. This isn’t present in get_swap_device() because it
doesn’t make sense in general due to the race between getting the reference
and swapoff. So I’ve added an equivalent check directly in
free_swap_and_cache().

Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):

--8<-----

__swap_entry_free() might be the last user and result in
“count == SWAP_HAS_CACHE”.

swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.

So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().

Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still referenced by swap entries.

Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.

Process 1 quits. Calls free_swap_and_cache().
→ count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]

Process 2 quits. Calls free_swap_and_cache().
→ count == SWAP_HAS_CACHE

Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().

__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
…
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

What stops swapoff to succeed after process 2 reclaimed the swap cache but
before process 1 finished its call to swap_page_trans_huge_swapped()?

--8<-----
(CVE-2024-26960)
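
The get/put pattern that stalls swapoff() can be sketched in user space. The
Python model below is an analogy under stated assumptions, not the kernel
implementation: the SwapDevice class, its user counter, and the simplified
free_swap_and_cache() are hypothetical stand-ins. It shows teardown waiting
until every holder has dropped its reference, plus the extra "entry is not
free" check done by the caller.

```python
import threading


class SwapDevice:
    """Toy stand-in for a swap_info_struct guarded by a user count."""

    def __init__(self, swap_map):
        self._cond = threading.Condition()
        self._users = 0
        self._dying = False
        self.swap_map = list(swap_map)  # per-entry counts, freed by swapoff()

    def get(self):
        """Like get_swap_device(): fail once teardown has started."""
        with self._cond:
            if self._dying:
                return False
            self._users += 1
            return True

    def put(self):
        """Like put_swap_device(): drop the reference and wake swapoff()."""
        with self._cond:
            self._users -= 1
            self._cond.notify_all()

    def swapoff(self):
        """Refuse new users, wait for current ones, then free swap_map."""
        with self._cond:
            self._dying = True
            self._cond.wait_for(lambda: self._users == 0)
            self.swap_map = None


def free_swap_and_cache(dev, entry):
    """Touch swap_map only while holding a reference on the device."""
    if not dev.get():
        return None  # device is going away, nothing to do
    try:
        count = dev.swap_map[entry]
        if count == 0:  # the equivalent "entry is not free" check
            return None
        return count
    finally:
        dev.put()
```

While any caller sits between get() and put(), swapoff() blocks in
wait_for(), so swap_map cannot be freed underneath the reader.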

In the Linux kernel, the following vulnerability has been
resolved: Bluetooth: Fix use-after-free bugs caused by sco_sock_timeout

When an SCO connection is established and the SCO socket is then released,
timeout_work is scheduled to check whether the SCO disconnection has timed
out. The sock is deallocated later, but it is dereferenced again in
sco_sock_timeout, which causes a use-after-free. The root cause is shown
below:

    Cleanup Thread               |      Worker Thread
sco_sock_release                 |
  sco_sock_close                 |
    __sco_sock_close             |
      sco_sock_set_timer         |
        schedule_delayed_work    |
  sco_sock_kill                  |    (wait a time)
    sock_put(sk) //FREE          |  sco_sock_timeout
                                 |    sock_hold(sk) //USE

The KASAN report triggered by POC is shown below:

[ 95.890016] ==================================================================
[ 95.890496] BUG: KASAN: slab-use-after-free in sco_sock_timeout+0x5e/0x1c0
[ 95.890755] Write of size 4 at addr ffff88800c388080 by task kworker/0:0/7
…
[ 95.890755] Workqueue: events sco_sock_timeout
[ 95.890755] Call Trace:
[ 95.890755]
[ 95.890755] dump_stack_lvl+0x45/0x110
[ 95.890755] print_address_description+0x78/0x390
[ 95.890755] print_report+0x11b/0x250
[ 95.890755] ? __virt_addr_valid+0xbe/0xf0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_report+0x139/0x170
[ 95.890755] ? update_load_avg+0xe5/0x9f0
[ 95.890755] ? sco_sock_timeout+0x5e/0x1c0
[ 95.890755] kasan_check_range+0x2c3/0x2e0
[ 95.890755] sco_sock_timeout+0x5e/0x1c0
[ 95.890755] process_one_work+0x561/0xc50
[ 95.890755] worker_thread+0xab2/0x13c0
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] kthread+0x279/0x300
[ 95.890755] ? pr_cont_work+0x490/0x490
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork+0x34/0x60
[ 95.890755] ? kthread_blkcg+0xa0/0xa0
[ 95.890755] ret_from_fork_asm+0x11/0x20
[ 95.890755]
[ 95.890755]
[ 95.890755] Allocated by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] __kasan_kmalloc+0x86/0x90
[ 95.890755] __kmalloc+0x17f/0x360
[ 95.890755] sk_prot_alloc+0xe1/0x1a0
[ 95.890755] sk_alloc+0x31/0x4e0
[ 95.890755] bt_sock_alloc+0x2b/0x2a0
[ 95.890755] sco_sock_create+0xad/0x320
[ 95.890755] bt_sock_create+0x145/0x320
[ 95.890755] __sock_create+0x2e1/0x650
[ 95.890755] __sys_socket+0xd0/0x280
[ 95.890755] __x64_sys_socket+0x75/0x80
[ 95.890755] do_syscall_64+0xc4/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] Freed by task 506:
[ 95.890755] kasan_save_track+0x3f/0x70
[ 95.890755] kasan_save_free_info+0x40/0x50
[ 95.890755] poison_slab_object+0x118/0x180
[ 95.890755] __kasan_slab_free+0x12/0x30
[ 95.890755] kfree+0xb2/0x240
[ 95.890755] __sk_destruct+0x317/0x410
[ 95.890755] sco_sock_release+0x232/0x280
[ 95.890755] sock_close+0xb2/0x210
[ 95.890755] __fput+0x37f/0x770
[ 95.890755] task_work_run+0x1ae/0x210
[ 95.890755] get_signal+0xe17/0xf70
[ 95.890755] arch_do_signal_or_restart+0x3f/0x520
[ 95.890755] syscall_exit_to_user_mode+0x55/0x120
[ 95.890755] do_syscall_64+0xd1/0x1b0
[ 95.890755] entry_SYSCALL_64_after_hwframe+0x67/0x6f
[ 95.890755]
[ 95.890755] The buggy address belongs to the object at ffff88800c388000
[ 95.890755] which belongs to the cache kmalloc-1k of size 1024
[ 95.890755] The buggy address is located 128 bytes inside of
[ 95.890755] freed 1024-byte region [ffff88800c388000, ffff88800c388400)
[ 95.890755]
[ 95.890755] The buggy address belongs to the physical page:
[ 95.890755] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88800c38a800 pfn:0xc388
[ 95.890755] head: order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 95.890755] ano
---truncated--- (CVE-2024-27398)

In the Linux kernel, the following vulnerability has been
resolved: watchdog: cpu5wdt.c: Fix use-after-free bug caused by
cpu5wdt_trigger

When the cpu5wdt module is being removed, the original code uses
del_timer() to deactivate the timer. If the timer handler is already
running, del_timer() cannot stop it and returns immediately. If the port
region is then released by release_region() and the timer handler
cpu5wdt_trigger() calls outb() to write into the released region, a
use-after-free occurs. Change del_timer() to timer_shutdown_sync() so that
the timer handler finishes before the port region is released.
(CVE-2024-38630)
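
The difference between cancelling a timer and waiting for its handler can be
shown with a user-space analogy. The Python sketch below is not kernel code:
the unload() function and the region dictionary are hypothetical, and
threading.Timer only approximates kernel timers. Timer.cancel() plays the
role of del_timer(), since it has no effect on a handler that is already
running, and the added join() stands in for the waiting behaviour of
timer_shutdown_sync() (the kernel function also prevents re-arming, which
this model does not cover).

```python
import threading
import time


def unload(wait_for_handler):
    """Return True if the handler touched the region after it was released."""
    region = {"released": False}
    outcome = {"use_after_release": False}
    handler_started = threading.Event()

    def trigger():
        # Stands in for cpu5wdt_trigger(): a handler that is mid-flight
        # while the module is being removed.
        handler_started.set()
        time.sleep(0.2)
        # Stands in for outb() on the port region.
        outcome["use_after_release"] = region["released"]

    timer = threading.Timer(0.0, trigger)
    timer.start()
    handler_started.wait()  # the handler is now running

    timer.cancel()  # like del_timer(): does not wait for a running handler
    if wait_for_handler:
        timer.join()  # like the sync variant: wait until the handler returns

    region["released"] = True  # stands in for release_region()
    timer.join()  # let the handler finish before reading the outcome
    return outcome["use_after_release"]


print("cancel only:     ", unload(wait_for_handler=False))
print("cancel and wait: ", unload(wait_for_handler=True))
```

With cancel alone, the region is released while the handler is still
sleeping, so the handler observes a released region. Waiting for the handler
first closes that window.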

Update instructions

The problem can be corrected by updating your kernel livepatch to the following
versions:

Ubuntu 20.04 LTS

aws - 107.1
aws - 107.2
azure - 107.1
azure - 107.2
gcp - 107.1
gcp - 107.2
generic - 107.1
generic - 107.2
gke - 107.1
gkeop - 107.1
gkeop - 107.2
ibm - 107.1
ibm - 107.2
lowlatency - 107.1
lowlatency - 107.2
oracle - 107.1
oracle - 107.2

Ubuntu 18.04 LTS

aws - 107.1
aws - 107.2
azure - 107.1
azure - 107.2
gcp - 107.1
gcp - 107.2
generic - 107.1
generic - 107.2
lowlatency - 107.1
lowlatency - 107.2
oracle - 107.1
oracle - 107.2

Ubuntu 16.04 LTS

aws - 107.1
aws - 107.2
azure - 107.1
azure - 107.2
gcp - 107.1
gcp - 107.2
generic - 107.1
generic - 107.2
lowlatency - 107.1
lowlatency - 107.2

Ubuntu 22.04 LTS

aws - 107.1
aws - 107.2
azure - 107.1
azure - 107.2
gcp - 107.1
gcp - 107.2
generic - 107.1
generic - 107.2
gke - 107.1
gke - 107.2
ibm - 107.1
ibm - 107.2
oracle - 107.1

Ubuntu 14.04 LTS

generic - 107.1
lowlatency - 107.1

Support Information

Livepatches for supported LTS kernels will receive upgrades for
a period of up to 13 months after the build date of the kernel.

Livepatches for supported HWE kernels which are not based on
an LTS kernel version will receive upgrades for a period of
up to 9 months after the build date of the kernel, or until the end
of support for that kernel’s non-LTS distro release version,
whichever is sooner.