Anbox Cloud Dashboard: AI-powered testing with GitHub Copilot

Testing is a critical part of building reliable software, but it is also one of the areas where teams tend to accumulate technical debt the fastest. As Anbox has grown in complexity, so have our testing requirements, with more components, more edge cases, and higher demands on quality and reliability. Traditional testing approaches alone were no longer enough to keep up without slowing development.

This is where AI came into the picture. Instead of treating AI as a replacement for existing testing practices, we explored how it could augment them, helping us increase test coverage, improve code quality, and move faster without compromising reliability.

In this post, we’ll walk through how we introduced AI into different layers of testing in Anbox, spanning unit tests, visual tests, and accessibility checks.

Figure 1 - Anbox Cloud instance streaming.

Unit testing for Anbox Cloud Dashboard

Motivation and goal

We recognized that robust unit testing is essential for maintaining high-quality code in our Anbox Cloud Dashboard, so we set an ambitious goal for this cycle: boost test coverage from near-zero to 80% overall, starting with the highest-impact utility functions. But with hundreds of files and a packed roadmap, manual testing alone wouldn’t cut it. We needed a smarter way to increase our speed without sacrificing code quality.

The problem

Our baseline coverage was fairly low: just 0.32% statements, 0.12% branches, 0.5% functions, and 0.33% lines (as shown in the “before” screenshot, Figure 2). With only one test file (partially) covering a single utility, we faced risks like undetected bugs, fragile deployments, and mounting technical debt. Traditional approaches would take months, derailing our development pace for the other features planned for this cycle.

Figure 2 - Istanbul report before AI-assisted unit test creation.

The solution: AI-powered testing with GitHub Copilot

We used GitHub Copilot to create a plan and automate test creation, focusing first on pure utility functions that are easy to test and yield high ROI. Our phased strategy divided the work into sprints:

  • Sprint 1 (foundation): Tackled core utilities like date/time helpers, instance calculations, and file validation. Copilot generated comprehensive test suites, including edge cases for null inputs, malformed URLs, and async operations.
  • Sprint 2 (business logic): Expanded to application validators, image filters, and permission logic, achieving 100% coverage on multiple files.
  • Sprint 3 (complex logic): Moved on to heavy hitters like stats calculations (447 lines) and automotive HAL utilities (556 lines), with Copilot handling intricate mocking for browser APIs and statistical computations.

In just five weeks (November to December 2025), we completed 13 of 20 planned utility test files, generating over 300 tests. Copilot reduced writing time by 70-80%, surfacing edge cases we might have missed and ensuring robust coverage. The “after” screenshot (Figure 3) illustrates this transformation quite well: utilities are now at near-80% coverage, with overall coverage climbing to 9.63% statements.

Figure 3 - Istanbul report after AI-assisted unit test creation.

The result

We went from 2.72% to roughly 80% coverage on critical utilities in just a few weeks, thanks to AI-assisted automation. This improved code reliability while also boosting developer confidence and reducing long-term maintenance costs. Most importantly, Copilot augmented our workflows without replacing them: we kept human reviews throughout the planning and implementation cycle, ensuring the newly added tests aligned with our quality requirements and other domain-specific needs.

Plans for the future

We’re on track to increase our overall coverage further by mid-2026, expanding to hooks, components, and API layers. Future phases will leverage more AI tools for integration tests and CI automation, sustaining our momentum as Anbox Cloud grows.

UI testing with Storybook

Motivation and goal

As with unit testing, the Anbox Cloud Dashboard already had a wide collection of pages and functional components, and the more it grew, the clearer a critical gap in our infrastructure became: visibility and testability.

Historically, our components were built directly into the dashboard logic. Although structurally separate, this created a form of technical debt: UI elements were difficult to inspect or verify in isolation.

We also needed to bridge the gap between development and design by surfacing what we had already built into a centralized component catalog.

To resolve this, we decided to integrate Storybook (with React), transforming our existing components into a high-visibility, fully testable library.

The problem

Before adopting Storybook, our development process faced significant bottlenecks due to the way our UI was coupled with the application:

  • Developers had to deploy the appliance just to verify a small CSS tweak on a specific page.
  • Changes to a shared styling file, or an update to the react-components library, could unintentionally break layouts in distant parts of the Dashboard without being noticed.
  • Testing different states and edge cases required prior knowledge of the product to navigate complex user journeys, which limited the pool of potential code and QA reviewers.
  • Stakeholders and designers couldn’t easily review components or give reproducible, consistent feedback to developers without going through the challenges of setting up the entire environment locally.

The solution: AI-powered retrofitting

To tackle these problems and attack our technical debt systematically and efficiently, we asked Copilot to audit our codebase and design a “test plan”. It came up with… interesting ideas. It proposed compiling an exhaustive list of every UI element available, suggested placing .stories.tsx files all over the place, and made a lot of assumptions, creating stories with scenarios and states that never existed in our logic.

So we had to step in, narrow the scope, and iteratively refine and review the proposed plan until we ended up with a three-phase roadmap that now lives in our repo’s instructions document:

  • Phase 1 (setup and configuration): Integrate Storybook and establish a clean architecture that mirrors the folder structure of our codebase, so stories are faster to locate and easier to maintain. Additionally, we manually implemented the first two stories, AuthNoticeModal and LoginPageLayoutWithLogo, to serve as guide-rail “templates” for best practices and to avoid some of AI’s “creativity” (a.k.a. hallucinations).
  • Phase 2 (component stories): With the infrastructure and templates in place, Copilot was able to generate the remaining 44 of the 46 stories, from simple buttons and labels to modals and tables.
  • Phase 3 (page stories): We have also structured a future plan to handle complex pages with states, contexts, and multiple integrated components. This introduces mocking network requests and response data, allowing us to simulate various user journeys without an actual backend.

The result

What could have been months of manual labor has already taken shape. With Copilot’s assistance, and by leveraging Storybook, we were able to bring visibility to our existing components through stories.

Figure 4 - Storybook showing Anbox Cloud Dashboard login component.

With the current Storybook catalog:

  • We can render 46 components independently of the main app logic.
  • We can verify edge cases by instantly toggling component inputs (props) without needing to trigger complex app states.
  • We can provide designers and developers with a visual directory to verify UI changes rapidly, ensuring consistency and quick feedback across the entire dashboard.

Plans for the future

We envision further upgrades to our UI development and testing workflow by integrating the deployment of the Storybook environment into our pull request pipeline. This will immediately give designers and other stakeholders the ability to interact with updated stories directly from a pull request, eliminating the need for them to set up a local development environment.

We also acknowledge the importance of integration testing, which is why we want to move beyond atomic components and start implementing full-page stories that incorporate multiple stateful components. By using service workers and data mocking, we will be able to simulate network requests to test how our UI handles real-world data and edge cases in total isolation.

Accessibility testing (a11y)

Accessibility testing in Anbox Cloud Dashboard is built on Playwright and axe-core, which together provide reliable, standards-based validation against common accessibility issues such as missing labels, incorrect roles, contrast problems, and focus handling.

Before settling on this stack, we used AI during the research and discovery phase to explore and evaluate available accessibility auditing tools. AI helped us survey the ecosystem, compare different solutions, and understand trade-offs between browser-based audits, CI-friendly libraries, and runtime testing approaches. This allowed us to quickly narrow down options that aligned with our requirements for automation, determinism, and integration into existing test pipelines.

Given the deterministic nature of accessibility rules and the importance of precise, repeatable results, we deliberately did not use AI to perform or judge accessibility checks. Instead, we relied on axe-core’s well-defined rule set and Playwright’s ability to exercise real user interactions.

Where AI proved valuable was in accelerating how accessibility tests are designed and maintained. We used AI to:

  • Identify relevant axe-core rules based on component behavior and interaction patterns
  • Suggest edge cases humans often forget
  • Explain axe-core failures
  • Propose fixes aligned with our UI patterns

As part of our accessibility testing workflow, we run automated audits with Playwright and axe-core in our test suite. The table and code snippet below show a representative example of the output produced by these audits, highlighting detected accessibility issues and the contextual information engineers use to investigate and resolve them.

| Page | Pass rate | Page coverage | Violations |
| --- | --- | --- | --- |
| Instances | 91% | 81% | 3 - critical: 1, minor: 1, moderate: 1 |
| Applications | 91% | 83% | 4 - critical: 1, serious: 1, minor: 1, moderate: 1 |
| Images | 91% | 96% | 3 - critical: 1, minor: 1, moderate: 1 |
| Nodes | 93% | 81% | 2 - critical: 1, moderate: 1 |
| Operations | 98% (1 incomplete test) | 76% | 0 |
| Permissions - Identities | 93% | 87% | 3 - critical: 1, minor: 1, moderate: 1 |
{
  "id": "form-field-multiple-labels",
  "description": "Ensure form field does not have multiple label elements",
  "help": "Form field must not have multiple label elements",
  "helpUrl": "https://dequeuniversity.com/rules/axe/4.11/form-field-multiple-labels?application=playwright",
  "nodes": [
    {
      "none": [
        {
          "id": "multiple-label",
          "data": null,
          "relatedNodes": [
            {
              "html": "<label class=\"u-off-screen\" for=\"search-and-filter-input\">Search and filter</label>",
              "target": [
                "label[for=\"search-and-filter-input\"]"
              ]
            }
          ],
          "impact": "moderate",
          "message": "Form field does not have multiple label elements"
        }
      ],
      "impact": null,
      "html": "<input autocomplete=\"off\" class=\"p-search-and-filter__input\" id=\"search-and-filter-input\" placeholder=\"Search and filter\" type=\"search\" value=\"\" name=\"search\">",
      "target": [
        "#search-and-filter-input"
      ]
    }
  ]
}

Our experience with GitHub Copilot was largely positive, especially for accelerating repetitive testing tasks and uncovering overlooked edge cases. That said, achieving reliable results required clear prompts, defined scope, and consistent human review to avoid incorrect assumptions. For teams exploring similar workflows, we recommend starting with well-isolated, high-impact areas and using templates or examples to guide AI output.

AI has proven to be a valuable addition to our testing toolbox in Anbox. When applied selectively and with clear intent, it allowed us to consistently increase test coverage without introducing excessive manual effort. It helped surface quality issues earlier in the development cycle, reduced the long-term cost of maintaining and evolving our test suites, and ensured that development could continue at a healthy pace, even as system complexity grew. Most importantly, AI didn’t replace our existing workflows; it strengthened them.

To learn more, explore Anbox Cloud in action!
