Suddenly getting 404 on OIDC callbacks?

Out of nowhere, when I try to log in to the LXD web interface using OpenID Connect with Keycloak, the callback from Keycloak to the backend fails hard with a 404 Not Found on the URL path /oidc/login.

I have zero idea why this has started, because the OIDC config hasn’t changed. Did LXD take a dump on me, and if so how do I fix the 404 on the Callback URL?

Using the LXD snap, version 6.7, on an Ubuntu 24.04 host.

Hi @teward,

I’m unable to reproduce this behaviour with LXD 6.7 on Ubuntu 24.04. Can you tell me more about your setup?

  1. Is your LXD standalone or clustered?
  2. Do you have a load balancer or proxy that is forwarding requests to LXD?

Thanks, Mark.

Standalone.

And this is local on the machine itself. Direct to system with no load balancers or anything.

I suspect that the snap has refreshed and restarted LXD, but LXD was unable to reinstantiate the OIDC verifier.

Can you please inspect the logs (either via journalctl -u snap.lxd.daemon or in /var/snap/lxd/common/lxd/logs/lxd.log) and check for a warning that reads “Failed setting up OIDC verifier”?

Thanks.
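
For anyone following along, here is a minimal sketch of that log check. It only uses the unit name and log path given above; the helper name oidc_verifier_warnings is made up for illustration, and you may need sudo to read the journal or the log file.

```shell
#!/bin/sh
# Hypothetical helper: print only the lines on stdin that mention the
# OIDC verifier. Matching case-insensitively on "OIDC verifier" catches
# both wordings of the warning quoted in this thread.
oidc_verifier_warnings() {
    grep -i 'OIDC verifier'
}

# Source 1: the systemd journal for the snap's daemon unit.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u snap.lxd.daemon --no-pager 2>/dev/null | oidc_verifier_warnings || true
fi

# Source 2: the log file inside the snap's common directory.
lxd_log=/var/snap/lxd/common/lxd/logs/lxd.log
if [ -r "$lxd_log" ]; then
    oidc_verifier_warnings < "$lxd_log" || true
fi
```

No output means neither source contained a matching line (or neither was readable by the current user).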

There it is: it says “Failed to setup OIDC verifier” with a note “Failed to ensure verifier’s configuration: Failed to get OIDC relying party: OpenID Provider Configuration Discovery has failed”. I’m wondering if this is because the OIDC provider for the local LXD instance runs inside an LXD container, so it isn’t reachable yet when LXD starts and the lookup fails (with ‘device or resource busy’). I’m also curious why LXD doesn’t retry after this type of failure, or recheck occasionally.

I have a cloud instance of Keycloak I’ll direct authentication to in the interim, and I may redeploy my local Keycloak as a VM instead with virt-manager (I don’t use LXD to manage the VMs on this system) so that it can start up before LXD does.

We recently added a sanity check to perform OIDC discovery when all required configuration keys are set. This is to help surface configuration errors early (at time of configuration, and not time of use - see canonical/lxd pull request #17098, “lxd: Validate OIDC issuer configuration during update” by kwonkwonn).

I think at start up we can ignore a discovery error with the assumption it was previously configured correctly. This would fix your issue. I’ve created an issue for it here: canonical/lxd issue #17907, “Ignore OIDC discovery error on start up”.

Thank you for reporting. Generally we expect the central IdP to be deployed outside of the LXD cluster, so this is something we missed.

Interesting that this only started failing in the last month, not since the December commit.

It does definitely sound like a very specific use-case bug, but if you think about it, if someone uses LXD as their container/VM infrastructure instead of direct QEMU or VMware or such, and the OIDC is in there, you run into this issue. It’s a very narrow case, sure, but it does indeed sound like a bug.

I have a separate issue, I think with Firefox and LXD, where the imported certificate isn’t recognized when OIDC is configured, but I’m going to do some research on that first.

Please let me know if/when that bug (the new one) is addressed/fixed.

The likely cause is that you’re tracking 6/stable (if you’re tracking latest/stable, we recommend changing to 6/stable). The stable channel gets feature releases roughly every quarter, and doesn’t get the latest commits (that’s latest/edge).

I’ll post an update here when the fix lands. It may be quite a while before it appears in a feature release, so in the meantime you may need to refresh the OIDC configuration when LXD restarts.
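
A rough sketch of what that manual refresh could look like, for anyone who wants to script it. It waits until the issuer’s discovery document (the standard /.well-known/openid-configuration path from OpenID Connect Discovery) answers, then re-applies oidc.issuer. The function names are hypothetical, and it is an assumption, not confirmed in this thread, that unsetting and re-setting the key is enough to make LXD rebuild the verifier. It also assumes curl is installed and trusts the IdP’s certificate.

```shell
#!/bin/sh
# Build the discovery URL for an issuer: strip trailing slashes, then
# append the well-known path from OpenID Connect Discovery.
discovery_url() {
    issuer=$1
    while [ "${issuer%/}" != "$issuer" ]; do
        issuer=${issuer%/}
    done
    printf '%s/.well-known/openid-configuration\n' "$issuer"
}

# Poll the discovery document until it answers or attempts run out.
# $1 = issuer URL, $2 = max attempts (default 30), $3 = delay in seconds (default 10)
wait_for_issuer() {
    url=$(discovery_url "$1")
    attempts=${2:-30}
    delay=${3:-10}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Re-apply oidc.issuer once the IdP is reachable.
# ASSUMPTION: unset followed by set triggers a fresh discovery in LXD.
refresh_oidc_issuer() {
    issuer=$(lxc config get oidc.issuer)
    [ -n "$issuer" ] || { echo "oidc.issuer is not set" >&2; return 1; }
    wait_for_issuer "$issuer" || { echo "issuer still unreachable" >&2; return 1; }
    lxc config unset oidc.issuer
    lxc config set oidc.issuer "$issuer" || {
        echo "re-set failed; previous issuer was: $issuer" >&2
        return 1
    }
}
```

After LXD has restarted, source the file and run refresh_oidc_issuer. Waiting first matters because the configuration check described earlier in the thread performs discovery when the key is set, so setting it while the IdP is still down would be rejected.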

Yeah, I’m tracking latest/stable

(system_python) teward@tau-volantis:~$ sudo snap list lxd
Name  Version      Rev    Tracking       Publisher   Notes
lxd   6.7-12e2019  38450  latest/stable  canonical✓  -

Not sure if downgrading will break things, but I’ve just moved the OIDC to external for now.

Now I’m debugging issues with local TLS auth as a UI fallback (Firefox isn’t prompting for the certificate even after importing it and restarting). (The mTLS auth failures were because Firefox at some point was set to NOT serve client certs for local, which is why it was failing. I didn’t set that, so no idea how that happened…)

Note that latest/stable and 6/stable both have the same version, so the issue is ‘greater’ than just latest/stable vs. 6/stable at the moment. If the fix for the OIDC bug here isn’t pushed to 6/stable, then I miss it until June (next quarter).

6/stable is currently equivalent to latest/stable, but please don’t use latest/stable anyway; see “Time to pick a snap channel”.
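
For reference, a small sketch of checking the tracked channel and getting the matching switch command. It parses the Tracking column of snap list lxd (the same output format pasted above); the function names are hypothetical. It only prints the suggested command rather than running it.

```shell
#!/bin/sh
# Read `snap list lxd` output on stdin and print the Tracking column
# (the fourth field) from the row whose first field is "lxd".
lxd_tracking_channel() {
    awk '$1 == "lxd" { print $4 }'
}

# Print the refresh command to run when the tracked channel is latest/stable.
suggest_channel_switch() {
    case "$1" in
        latest/stable) echo "sudo snap refresh lxd --channel=6/stable" ;;
        *)             echo "tracking $1; no change suggested" ;;
    esac
}

if command -v snap >/dev/null 2>&1; then
    channel=$(snap list lxd 2>/dev/null | lxd_tracking_channel)
    if [ -n "$channel" ]; then
        suggest_channel_switch "$channel"
    fi
fi
```

Since both channels carry the same version right now, the switch changes which track future refreshes follow rather than the installed release.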

@markylaing once the fix lands in edge please can you prepare a cherry-pick into 6/candidate and we’ll roll out an interim fix. Ta

What I’ll do is just offload Keycloak to a production system myself. That will help until the fix is cherry-picked to candidate.