Hey everyone,
Recently, we’ve experienced a few infrastructure reliability issues that impacted various Ubuntu services. We know firsthand how frustrating it is when the tools you need to do distro work are unresponsive. What makes these situations even more annoying is our current lack of clear, public-facing reporting and dashboards. Without a centralized way to surface the state of our infrastructure, it is hard to know what is broken, what is degraded, and what is currently being worked on.
This recent friction highlighted a clear technical debt in our observability story, prompting us to take a step back and look at how we can improve the situation—both in the short term and the long term.
Our First Step: Ubuntu Engineering Upptime
As an immediate, short-term fix to improve transparency, we looked at available options and decided to set up a GitHub “Upptime” project for our services.
You can view the repository and the generated status page here:
canonical/ubuntu-engineering-upptime
What is Upptime?
Upptime is an open-source uptime monitor and status page powered entirely by GitHub. It uses GitHub Actions to run scheduled synthetic checks against our endpoints, GitHub Issues for incident reporting, and GitHub Pages to host a live status dashboard.
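For a sense of how little setup this takes, Upptime is driven by a single `.upptimerc.yml` file in the repository. The fragment below is an illustrative sketch with placeholder values, not our actual configuration; see the repository linked above for the real list of monitored endpoints.

```yaml
# .upptimerc.yml — illustrative sketch, not the real config
owner: canonical                        # GitHub org that owns the repo
repo: ubuntu-engineering-upptime        # repo where issues/history live

# Each entry becomes a scheduled HTTP check run by GitHub Actions
sites:
  - name: Example Service
    url: https://example.ubuntu.com
  - name: Example API
    url: https://api.example.ubuntu.com/health

# Settings for the GitHub Pages status dashboard
status-website:
  name: Ubuntu Engineering Status
```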
Currently, this provides us with a clean, public overview of whether our main services are reachable, alongside historical response times and uptime percentages.
It also creates GitHub issues when services go down (and auto-resolves them once the service is restored).
The Limitations
While Upptime is a great first step, it is strictly a “black-box” monitoring tool. It is excellent at telling us if a web frontend returns an HTTP 200 OK, but it falls short of giving us the complete picture.
For instance, Upptime doesn’t tell us anything about the state of our backends. A service’s frontend might be up and routing traffic, but the backend worker processes could be failing to update reports.
Going forward: Juju Charms and COS
To address the missing backend statuses, we are leaning into the work done in recent Ubuntu cycles to transition our legacy services into Juju charms.
By charming our services, we aren’t just improving deployment; we are creating a standardized platform for managing the entire operational lifecycle of these applications. Currently, our charms are doing a great job at running the services, but they lack built-in monitoring and reporting. Adding better observability into these charms is something we plan to work on during the upcoming cycles.
To achieve this, we plan to integrate our charmed services with the Canonical Observability Stack (COS).
How COS Delivers Value
COS bundles standard monitoring tools (like Prometheus and Grafana) to give us a look inside our applications, rather than just pinging them from the outside.
By integrating our backend infrastructure with COS, we can achieve two major improvements:
- Actionable Alerting: We can set up intelligent, automated alerts for when a backend process silently fails or a job queue gets stuck.
- Surfacing Meaningful Metrics: Instead of relying on a simple “green/red” status, we will be able to surface actual operational data. For example, we can provide dashboards showing exactly when a specific report was last successfully updated, or the current status of the backend.
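To make the second point concrete, here is a minimal, stdlib-only sketch of the kind of metric a backend worker could expose for Prometheus to scrape. The metric name, label, and helper functions are hypothetical, for illustration only; in practice a charmed service would more likely use an existing Prometheus client library and the COS scrape integration rather than hand-rolling the exposition format.

```python
import time

# Hypothetical in-process record of when each report last updated successfully.
last_update: dict[str, float] = {}

def record_report_update(name: str) -> None:
    """Called by the backend worker after a report is successfully regenerated."""
    last_update[name] = time.time()

def render_metrics() -> str:
    """Render the timestamps in the Prometheus text exposition format."""
    lines = [
        "# HELP report_last_update_timestamp_seconds "
        "Unix time of the last successful report update",
        "# TYPE report_last_update_timestamp_seconds gauge",
    ]
    for name, ts in sorted(last_update.items()):
        lines.append(
            f'report_last_update_timestamp_seconds{{report="{name}"}} {ts}'
        )
    return "\n".join(lines) + "\n"
```

Serving this text from a `/metrics` endpoint is all Prometheus needs to scrape it; an alert rule can then fire whenever `time() - report_last_update_timestamp_seconds` exceeds a threshold, which is exactly the "backend silently stopped updating" case that black-box checks miss.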
This transition will take some time, but it will fundamentally shift our infrastructure from being reactive to proactive, providing a much more robust foundation for the distro.
Feedback
Please take a look at the Ubuntu Engineering Upptime page and let us know if any services are currently missing from the list.
We also welcome any questions you might have about the current status or the plan for future cycles.