Upgrade from 1.21 to 1.24.1 breaks everything

devperson · December 23, 2024, 6:26pm

We tried an upgrade from 1.21 to 1.24.1 and several things have broken. The upgrades off the various charms seemed to work but initially we couldn’t get the dashboard to load. The login prompt was presented but following that the call to /api/info kept timing out. We found that the AMS location was now defaulting to 10.0.3.1. We set the location to the private IP of our AMS unit and now the dashboard presents us with:

Configuration Missing
You need to configure both the AMS REST API and the Anbox Stream Gateway in order to use the dashboard.

It seems several configurations were blown away and AMS is no longer listening on the same ip and port as it used to. What can we do to fix this?

tk-dev · December 23, 2024, 9:10pm

Update to this, it seems the ams config option agent.api.url and location value only effects other things attempting a connection and not the listening address itself.

The following file /var/snap/ams/common/server/settings.yaml on the machine hosting ams begins with the listening address and updating it’s value to the private address of the machine hosting ams allows the cloud-dashboard to load. Streams however will not load when using the web interface, console reports 400 bad request. Is there a more appropriate way to make this change, I suspect there are pieces of the puzzle still missing elsewhere.

Digging through logs it appears ams and likely other units running on that machine were listening on the correct IP address, however after updating an additional lxd interface was created with the ip 10.0.3.1 and services are still listening on that port (turnserver, agent, ams prometheus listener).

gary-wzl77 · December 24, 2024, 1:45pm

Hi,

We’re sorry to hear about the issues encountered after upgrading to 1.24.1 from 1.21. I assume that a regular Anbox Cloud deployment was used rather than Anbox Cloud Appliance. Could you please provide the output of the following commands?

juju status
amc info
juju debug-log --replay

Please ensure all sensitive data is masked before sharing it with us.

Additionally, from the console reporting the 400 Bad Request error, did you notice any other messages from chrome devtools?

(Our response may be delayed due to the holiday season. Thank you for your understanding.)

BR
Gary

gary-wzl77 · December 30, 2024, 3:12am

Thanks for the shared logs via direct message.

It looks like the newly created lxcbr0 interface caused a breakage. It seems that the LXC deb package, which depends on lxc-utils (where the lxc-net service that creates the lxcbr0 interface originates from), was accidentally installed. Could you please check the log files

/var/lo- g/dpkg.log
/var/log/apt/history.log

to see if this lxc or a package that depends on lxc-utils was installed and when, or have you ever tried to install the lxc package by youself?

Due to the newly created lxcbr0 interface, during the charm upgrade, an incorrect address was set in the AMS configuration file (similar to the agent configuration file, as the endpoint of the Anbox stream agent was also affected). From the shared AMS settings.yaml file, it seems there are still something on the plate that the lxcbr0 bridge address remains in use. E.g.

store:

  driver: etcd
  data:
    servers: ['https://10.0.3.1:2379']
    ...
    ...
metrics:

  prometheus:
    listen-address: 10.0.3.1:20002

That could also explain why things started working again after you manually changed the addresses of the AMS and stream agent services.

Meanwhile for the symptom that

It’s worth noting that the value of agent.api.url appears to reset to https://10.0.3.1:8082 when machine 2 (the machine hosting ams) restarts. This occurs when set manually or using the cloud-dashboard.

You can manually run the following command to persist the changed agent URL.

$ amc config set agent.api.url https://10.110.5.152:8082

BR
Gary

tk-dev · December 30, 2024, 9:31pm

Gary,

Thanks for the update, as suspected packages with lxc dependencies were installed. They have since been removed and the interface has been cleaned up. The config files were modified to remove all instances of the 10.0.3.1 interfaces and I’m awaiting the next update to ensure automation deploys the correct address. Otherwise everything is working as intended and the root cause of the issue has been discovered.

Edit
To summarize the following modifications brought us back online:

/var/snap/anbox-stream-agent/common/agent/config.yaml
10.0.3.1:8082 → 10.110.5.152:8082
10.0.3.1:8444 → 10.110.5.152:8444

/var/snap/ams/common/server/settings.yaml
10.0.3.1:8444 → 10.110.5.152:8444
https://10.0.3.1:2379 → https://10.110.5.152:2379
10.0.3.1:20002 → 10.110.5.152:20002

amc config set agent.api.url https://10.110.5.152:8082
juju config ams location="10.110.5.152"
juju config ams public_interface="ens160"

Thanks for the help.

gary-wzl77 · January 3, 2025, 3:01am

The following lines are unnecessary:

juju config ams location="10.110.5.152"
juju config ams public_interface="ens160"

The location config is intended for specifying the load balancer URL to support AMS in high availability, so it is not required in your case.
There is no public_interface config option available in the AMS charm.

BR
Gary