Ubuntu HPC Meeting Notes: 2024/3/27

Meeting participants

@nuccitheboss, @jedel, @jamesbeedy, @arif-ali, Jaime F. de Souza

Open OnDemand - the saga continues

  • Currently spiking on getting the nginx_stage utility working within the Open OnDemand snap package. nginx_stage is used to run interactive web applications and workloads on the supercomputer.
  • Having challenges setting up the testing environment due to issues with the ood-portal-generator utility.
    • update_ood_portal hardcodes the expected location for Dex, the OpenID Connect provider recommended by upstream. We’re currently using a custom installation location for Dex, so the appropriate configurations are not rendered by the portal generator.
    • Developing a workaround that lets us quickly configure an OpenID Connect provider so that we can log in to Open OnDemand and start testing various components of nginx_stage.
    • The test LDAP provider is glauth, which pluto currently uses to bring up HPC clusters.

nfs-client-operator work

  • autofs is now used in the backend by nfs-client-operator to provide persistence of mounts between machine reboots.
    • Originally the mount command was used directly, but this presented several challenges when machines needed to be rebooted to apply updates.
  • nfs-client-operator now supports using IPv6 addresses for mounts.
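To make the two points above concrete, here is a minimal sketch of how an autofs direct-map entry for an NFS mount could be rendered, including the square brackets that an IPv6 literal needs so its colons are not confused with the host:path separator. This is not the operator's actual code: the function names, the default mount options, and the example addresses and paths are all hypothetical and only meant to illustrate the idea.

```python
import ipaddress


def format_nfs_location(host: str, export_path: str) -> str:
    """Build the "server:/export" location string for an NFS mount.

    IPv6 literals are wrapped in square brackets; IPv4 addresses and
    hostnames are passed through unchanged.
    """
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        is_ipv6 = False
    server = f"[{host}]" if is_ipv6 else host
    return f"{server}:{export_path}"


def render_direct_map_entry(
    mountpoint: str,
    host: str,
    export_path: str,
    options: tuple[str, ...] = ("fstype=nfs", "rw"),  # hypothetical defaults
) -> str:
    """Render one autofs map line: "<key> -<options> <location>"."""
    opts = ",".join(options)
    return f"{mountpoint} -{opts} {format_nfs_location(host, export_path)}"


if __name__ == "__main__":
    # Example values only; 2001:db8::/32 is the IPv6 documentation prefix.
    print(render_direct_map_entry("/data", "2001:db8::10", "/srv/data"))
    print(render_direct_map_entry("/scratch", "nfs.example.internal", "/srv/scratch"))
```

Because autofs reads its maps from disk and mounts on demand, an entry like this survives a reboot without anyone having to re-run mount by hand.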

Observing observability

  • Evaluating the work that needs to be done to enable the Canonical Observability Stack (COS) within Charmed HPC. Doing a deep dive on how each of the services in COS can interface, or already interfaces, with Slurm across the ecosystem.
    • Evaluating current Prometheus exporters for Slurm before deciding which one to bake into the Slurm operators.

Deprecating legacy interfaces in slurmctld-operator

  • Removing legacy interfaces from the slurmctld-operator that are no longer in use by Slurm charm admins. Legacy interfaces selected for removal include the following:
    • interface_grafana_support
    • interface_influxdb
    • interface_elasticsearch

Slurm cluster federation and cloud nodes

  • Discussion on how to support ephemeral cloud nodes on AWS within Charmed HPC.
    • We have a path forward, and are now working on finding capacity in our current roadmap to potentially add support for this feature.

Getting involved

The next Ubuntu HPC community meeting is next Wednesday, April 3rd, at 16:30 UTC over Jitsi. Want to get involved or just generally interested in our community? Join our Matrix server!

Be sure to also check out the events calendar to see other upcoming events the Ubuntu HPC community has planned!