Ubuntu HPC Meeting Notes: 2024/3/27

Meeting participants

@nuccitheboss, @jedel, @jamesbeedy, @arif-ali, Jaime F. de Souza

Open OnDemand - the saga continues

  • Currently spiking on getting the nginx_stage utility working within the Open OnDemand snap package. nginx_stage is used to run interactive web applications and workloads on the supercomputer.
  • Having challenges setting up the testing environment due to issues with the ood-portal-generator utility.
    • update_ood_portal hardcodes the expected location for Dex, the OpenID Connect provider recommended by upstream. We’re currently using a custom installation location for Dex, so the appropriate configurations are not rendered by the portal generator.
    • Developing a workaround that lets us quickly configure an OpenID Connect provider, so that we can log in to Open OnDemand and start testing the various components of nginx_stage.
    • The test LDAP provider is glauth, which pluto currently uses to bring up HPC clusters.

nfs-client-operator work

  • autofs is now used in the backend by nfs-client-operator to provide persistence of mounts between machine reboots.
    • Originally the mount command was used directly, but this presented several challenges when machines needed to be rebooted to apply updates.
  • nfs-client-operator now supports using IPv6 addresses for mounts.
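
As an illustration of the IPv6 point above: NFS mount sources written with a literal IPv6 address wrap the address in square brackets so that its colons are not confused with the host/path separator. The sketch below is not taken from nfs-client-operator; the function name nfs_source and the sample addresses are hypothetical, and it only shows one way such a source string could be built.

```python
import ipaddress


def nfs_source(host: str, export_path: str) -> str:
    """Build an NFS mount source string of the form host:/path.

    Hypothetical helper for illustration only. Literal IPv6 addresses are
    wrapped in square brackets; IPv4 addresses and hostnames are left as-is.
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address, so treat it as a hostname.
        return f"{host}:{export_path}"
    if addr.version == 6:
        return f"[{host}]:{export_path}"
    return f"{host}:{export_path}"


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(nfs_source("2001:db8::10", "/srv/data"))
    print(nfs_source("10.0.0.5", "/srv/data"))
    print(nfs_source("nfs.example.com", "/srv/data"))
```

The resulting string could then be used wherever a mount source is expected, for example in a mount command or an autofs map entry.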

Observing observability

  • Evaluating the work that needs to be done to enable the Canonical Observability Stack (COS) within Charmed HPC. Doing a deep dive on how each of the services in COS can interface, or already interfaces, with Slurm across the ecosystem.
    • Evaluating current Prometheus exporters for Slurm before deciding which one to bake into the Slurm operators.

Deprecating legacy interfaces in slurmctld-operator

  • Removing legacy interfaces from the slurmctld-operator that are no longer in use by Slurm charm admins. Legacy interfaces selected for removal include the following:
    • interface_grafana_support
    • interface_influxdb
    • interface_elasticsearch

Slurm cluster federation and cloud nodes

  • Discussion on how to support ephemeral cloud nodes on AWS within Charmed HPC.
    • We have a path forward, and are now working on finding capacity in our current roadmap to add support for this feature.

