Raspberry Pi 4 issues - Voluntary context switch within RCU read-side critical section!

I decided to pull a spare Raspberry Pi 4 (a failed desktop experiment) into my LXD cluster. The cluster is made up of 2 Xeon P510 workstations (64GB RAM) and this little Pi (4GB RAM, with an external 500GB SATA HDD connected via USB 3.0).

It was stable for quite a while, but this morning I was having some issues with ZFS and networking while developing, so I thought updating and rebooting all the hosts might help. So I did. All are on some variant of kernel 5.15.0.

Now the Pi 4 doesn’t always reconnect to the cluster, and when it does, any lxc commands on the node tend to hang: they usually print their output but never return to the prompt, so I have to kill the SSH connection with ~.

When I look in dmesg, I see stuff like this:

[  188.892179] audit: type=1400 audit(1732046755.264:99): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxd-p1_</var/snap/lxd/common/lxd>" name="/run/systemd/unit-root/dev/tty" pid=2464 comm="(resolved)" flags="rw, nosuid, remount, bind"
[  191.481830] ------------[ cut here ]------------
[  191.481849] Voluntary context switch within RCU read-side critical section!
[  191.481868] WARNING: CPU: 1 PID: 2464 at kernel/rcu/tree_plugin.h:316 rcu_note_context_switch+0x2ac/0x320
[  191.481901] Modules linked in: veth nft_masq nft_chain_nat vxlan ip6_udp_tunnel udp_tunnel bridge stp llc nvme_tcp nvme_fabrics ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter nf_tables nfnetlink vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb cmac algif_hash algif_skcipher af_alg bnep hci_uart btqca btrtl btbcm btintel binfmt_misc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) btsdio bluetooth ecdh_generic ecc bcm2835_codec(CE) bcm2835_isp(CE) bcm2835_v4l2(CE) brcmfmac bcm2835_mmal_vchiq(CE) v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig brcmutil videobuf2_memops videobuf2_v4l2 videobuf2_common snd_bcm2835(CE) videodev cfg80211 snd_pcm snd_timer snd raspberrypi_hwmon vc_sm_cma(CE) mc bcm2835_gpiomem rpivid_mem uio_pdrv_genirq nvmem_rmem uio sch_fq_codel drm
[  191.482177]  ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear uas usb_storage spidev crct10dif_ce dwc2 roles spi_bcm2835 udc_core i2c_bcm2835 xhci_pci xhci_pci_renesas phy_generic aes_arm64
[  191.482265] CPU: 1 PID: 2464 Comm: systemd-resolve Tainted: P         C OE     5.15.0-1065-raspi #68-Ubuntu
[  191.482273] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
[  191.482277] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  191.482284] pc : rcu_note_context_switch+0x2ac/0x320
[  191.482291] lr : rcu_note_context_switch+0x2ac/0x320
[  191.482296] sp : ffff800009f8baf0
[  191.482300] x29: ffff800009f8baf0 x28: ffff165b8bebf660 x27: 0000000000000030
[  191.482312] x26: ffffd2c9687eb8d8 x25: 0000000000000000 x24: 0000000000000000
[  191.482323] x23: ffffd2c96953a996 x22: 0000000000000000 x21: ffff165b83ac3000
[  191.482333] x20: ffff165c3b7a00c0 x19: ffffd2c968e670c0 x18: 0000000000000000
[  191.482343] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  191.482353] x14: 0000000000000000 x13: 216e6f6974636573 x12: 206c616369746972
[  191.482363] x11: 6320656469732d64 x10: 6165722055435220 x9 : ffffd2c967b22048
[  191.482373] x8 : 6863746977732074 x7 : 0000000000000001 x6 : 0000000000000001
[  191.482382] x5 : ffff165c3b78e9c8 x4 : 0000000000000000 x3 : 0000000000000027
[  191.482392] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff165b83ac3000
[  191.482403] Call trace:
[  191.482406]  rcu_note_context_switch+0x2ac/0x320
[  191.482412]  __schedule+0x140/0x8b0
[  191.482420]  schedule+0x38/0x90
[  191.482426]  schedule_hrtimeout_range_clock+0x1e4/0x200
[  191.482433]  schedule_hrtimeout_range+0x1c/0x30
[  191.482439]  ep_poll+0x320/0x350
[  191.482449]  do_epoll_wait+0xe4/0x130
[  191.482456]  do_compat_epoll_pwait.part.0+0x1c/0xb0
[  191.482463]  __arm64_sys_epoll_pwait+0x78/0x120
[  191.482470]  invoke_syscall+0x50/0x120
[  191.482479]  el0_svc_common.constprop.0+0x180/0x1a0
[  191.482487]  do_el0_svc+0x30/0xb0
[  191.482494]  el0_svc+0x4c/0x170
[  191.482502]  el0t_64_sync_handler+0xa4/0x130
[  191.482507]  el0t_64_sync+0x1a4/0x1a8
[  191.482513] ---[ end trace 3ad09cbed3701a7a ]---
[  224.131121] kauditd_printk_skb: 34 callbacks suppressed
[  224.131138] audit: type=1400 audit(1732046790.504:134): apparmor="DENIED" operation="ptrace" namespace="root//lxd-p1_<var-snap-lxd-common-lxd>" profile="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=2198 comm="systemd-journal" requested_mask="readby" denied_mask="readby" peer="unconfined"
[  251.480654] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  251.486869] rcu: 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P2464/1:b..l
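
One thing I notice in the log: the PID in the WARNING line (2464, systemd-resolve) is the same task that the stall detector later reports as blocked (P2464). Below is a minimal Python sketch for checking that correlation on a saved dmesg capture. It assumes the line formats shown above; the function name correlate and the regexes are my own illustration, not part of any existing tool, and the sample text is a trimmed copy of the log (module list and register dump omitted).

```python
import re

# Trimmed copy of the dmesg lines above; "\t" stands for the tab in the stall line.
DMESG = """\
[  191.481849] Voluntary context switch within RCU read-side critical section!
[  191.481868] WARNING: CPU: 1 PID: 2464 at kernel/rcu/tree_plugin.h:316 rcu_note_context_switch+0x2ac/0x320
[  191.482265] CPU: 1 PID: 2464 Comm: systemd-resolve Tainted: P         C OE     5.15.0-1065-raspi #68-Ubuntu
[  251.480654] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  251.486869] rcu: \tTasks blocked on level-0 rcu_node (CPUs 0-3): P2464/1:b..l
"""

# Patterns are assumptions based only on the lines in this post.
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: (\d+) at kernel/rcu/")
COMM_RE = re.compile(r"PID: (\d+) Comm: (\S+)")
STALL_RE = re.compile(r"Tasks blocked on level-\d+ rcu_node \(CPUs [\d-]+\): (.*)")
TASK_RE = re.compile(r"\bP(\d+)")


def correlate(text):
    """Return (pid, comm) pairs that appear in both an RCU warning and a stall report."""
    warned = set()   # PIDs named in RCU WARNING lines
    comms = {}       # PID -> command name from the "Comm:" line
    stalled = set()  # PIDs listed as blocked by the stall detector

    for line in text.splitlines():
        m = WARN_RE.search(line)
        if m:
            warned.add(int(m.group(1)))
        m = COMM_RE.search(line)
        if m:
            comms[int(m.group(1))] = m.group(2)
        m = STALL_RE.search(line)
        if m:
            stalled.update(int(pid) for pid in TASK_RE.findall(m.group(1)))

    return sorted((pid, comms.get(pid, "?")) for pid in warned & stalled)


if __name__ == "__main__":
    for pid, comm in correlate(DMESG):
        print(f"PID {pid} ({comm}) triggered the RCU warning and is later reported as stalled")
```

To run it against a real capture, save the output of dmesg to a file and pass the file contents to correlate() in place of DMESG.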

In case it was the update that introduced the instability, I tried downgrading from 5.15.0-1065-raspi to the prior kernel, 5.15.0-1061-raspi, but I failed at that.

Wondering if anyone has ideas on this?

Current versions:

All are at the same version of Ubuntu:

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy

$ zfs version
zfs-2.1.5-1ubuntu6~22.04.4
zfs-kmod-2.1.5-1ubuntu6~22.04.4

$ lxc version
Client version: 5.21.2 LTS
Server version: 5.21.2 LTS

(The Pi never returned from the lxc version command.)

Thanks!
-Rob

Update: I did manage to downgrade the kernel to 5.15.0-1061-raspi, and I still see the rcu_preempt stall messages in dmesg.