MicroCloud test environment requirements / scenarios

Hello,

I am considering using MicroCloud as the foundation for the SaaS solution we are preparing. First, we would like to create a test/lab environment with three nodes with the following specs:

8-core AMD CPU
32 GB RAM
1 x 128 GB SSD for OS
1 x 1 TB SSD for local storage
1 x 1 TB SSD for remote storage
1 x 1 Gbps Ethernet card for uplink
1 x 1 Gbps Ethernet card for OVN

What we would like to test:

  1. Footprint of the solution
  2. Deployment of Juju Charmed PostgreSQL and our SaaS software (we will very likely create a charm for our software)
  3. Testing scaling, backup and recovery of Charmed PostgreSQL and our SaaS software
  4. Testing adding a machine
  5. Migration of VMs/containers in case of a machine failure
  6. Disaster recovery of the MicroCloud

Questions:

  1. Are the specs for the lab environment OK?
  2. Are there any examples of a production setup, including network device specs (switches, routers)?
  3. In which cases will the 1 Gbps interconnect network limit the performance of the lab environment? Only when migrating VMs from a failed machine, or in other cases as well?
  4. Any practical tips/references for the test scenarios I mentioned?

Many thanks for any help.

Lumir

Hi @jas02,

The specs seem to be fine for a lab environment. Are you aware of the hardware requirements section in the MicroCloud docs?

When it comes to the configuration of the networks, note the following:

  • You have mentioned that each of the nodes will have two physical NICs. If I understood you right, one of them will be used as the OVN uplink and the other one acts as the actual uplink for the node. Note that this link will also be used for the OVN underlay (Geneve tunnels) and Ceph storage traffic (public/internal) between the three nodes. This traffic is referred to as “intra-cluster” on the network requirements doc page.
    This means that if you put lots of pressure onto the storage devices (using the Ceph remote storage pool), this might have side effects on the OVN overlay networks, as they share the same NIC across the cluster. The same happens the other way around.
    Have a look at the Ceph networking doc page for a more detailed view on the different networks and configuration options.
  • The NIC used for intra-cluster traffic needs to have an IP address assigned. The OVN uplink interface has to be unconfigured but UP so MicroCloud can pick it up during initialization of the cluster (see the netplan sketch after this list).
  • In general, we recommend 10G NICs for production setups. For this, please refer to the docs page.
  • As long as you use the remote storage pool, live migration of VMs is fairly “lightweight”, as you don’t have to transfer the VM’s root volume between the two nodes (see the migration example after this list).
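To make the second bullet more concrete, here is a minimal netplan sketch for one node. The interface names (enp5s0 for intra-cluster traffic, enp6s0 for the OVN uplink) and the subnet are assumptions; adjust them to your environment:

```yaml
# /etc/netplan/99-microcloud.yaml (sketch; interface names and subnet are assumptions)
network:
  version: 2
  ethernets:
    enp5s0:
      # Intra-cluster NIC: carries MicroCloud, Ceph and OVN underlay traffic
      # by default, so it needs an IP address.
      addresses:
        - 10.0.0.11/24
    enp6s0:
      # OVN uplink NIC: managed (so the link comes up) but left without any
      # address, which lets MicroCloud pick it up during init.
      dhcp4: false
      dhcp6: false
```

After `netplan apply`, check with `ip a` that the uplink interface is UP and carries no address before you run `microcloud init`.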
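And to illustrate the last bullet: with the root disk on the remote (Ceph) pool, moving an instance to another cluster member is a single LXD command. The instance and member names below are placeholders, and stateful live migration of a running VM additionally requires `migration.stateful` to be enabled on the instance (typically set before starting it):

```bash
# Names are placeholders; adjust to your cluster.
lxc config set my-vm migration.stateful=true   # allow stateful (live) migration of the VM
lxc move my-vm --target micro02                # move the instance to another cluster member

# When taking a whole member down for maintenance, you can also evacuate it:
lxc cluster evacuate micro01
```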

Hello @jpelizaeus,

thanks for your reply, and sorry for taking so long to get back to you.

Of course I read the documentation, and I am aware of the limitations of the solution with just two network interfaces.

That’s why I’m thinking of a different approach. I’d like to design an “ideal” production setup that doesn’t contain any bottlenecks. Based on such a production build, I will then do a cost optimization for the needs of the test setup, albeit with some compromises.

Could you please help me design such a production build? In my opinion the node could look like this:

16-core / 32-thread AMD CPU
192 GB RAM
1 x 256 GB NVMe for OS
1 x 2 TB NVMe for local storage
1 x 4 TB NVMe for shared storage
1 x XXX Gb network interface to connect to the uplink network
1 x XXX Gb network interface for intra-cluster traffic
1 x XXX Gb fully disaggregated networking - public Ceph subnet
1 x XXX Gb fully disaggregated networking - internal Ceph subnet
1 x XXX Gb fully disaggregated networking - MicroCloud internal subnet (is this in fact the intra-cluster network, or does this have to be a separate interface?)
1 x XXX Gb dedicated underlay network for OVN traffic

How many physical interfaces are necessary in the ideal case, and with what throughput (1/10/40/100 Gb)?

Any other thoughts?

Many thanks in advance!

Which CPU(s) and how much memory you require depends heavily on the workload you are planning to run inside the MicroCloud (e.g. the number of machines). Here we can only recommend a baseline.

MicroCloud allows you to configure five different networks. These are:

  • MicroCloud’s internal “intra-cluster” network (mainly used during bootstrap of the cloud and during lifecycle management tasks like adding new members to the MicroCloud)
  • Ceph public network
  • Ceph internal network
  • OVN underlay network (traffic between the VMs and containers)
  • OVN uplink network (egress traffic that leaves the MicroCloud)

For the purpose of installing MicroCloud, your machines also require general connectivity to download the various snaps.
By default, everything except the OVN uplink network uses the “intra-cluster” network (those are the defaults displayed during microcloud init).
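Regarding the snap downloads mentioned above, these are the four components; a minimal install sketch (channel pinning omitted, see the docs for the currently recommended channels, and refresh any snap that is already present):

```bash
# Run on every machine that will join the MicroCloud
sudo snap install lxd microceph microovn microcloud
```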

In your case, you assigned two separate interfaces to the “intra-cluster” traffic and to MicroCloud’s internal subnet, which are, as you already mentioned, effectively the same network.
You can also think about reusing the interface/network that the machines use for general connectivity to the outside.
But if you want to have everything separated, allocate an additional interface/network for the “intra-cluster” network.

To summarize, you would require five interfaces if you want to have everything separated. Whether you want to use 1/10/100 Gb again depends on the workload and traffic you expect those interfaces to handle.
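If you do go for the fully separated layout, the dedicated networks can also be expressed in a preseed file for `microcloud init --preseed`. Treat the sketch below as an approximation only: the exact keys depend on your MicroCloud version, so verify them against the preseed reference in the docs. All subnets, interface names and disk paths are made up.

```yaml
# Preseed sketch (keys approximate; verify against the MicroCloud preseed docs)
lookup_subnet: 10.0.0.0/24            # intra-cluster network used for discovery
systems:
  - name: micro01
    ovn_uplink_interface: enp6s0      # unconfigured NIC used for the OVN uplink
    underlay_ip: 10.0.3.11            # address on the dedicated OVN underlay network
    storage:
      local:
        path: /dev/nvme1n1
        wipe: true
      ceph:
        - path: /dev/nvme2n1
          wipe: true
  # micro02 / micro03 entries look the same
ceph:
  public_network: 10.0.1.0/24         # disaggregated Ceph public network
  internal_network: 10.0.2.0/24       # disaggregated Ceph internal network
ovn:
  ipv4_gateway: 192.0.2.1/24          # uplink network gateway
  ipv4_range: 192.0.2.100-192.0.2.254 # addresses MicroCloud can hand out on the uplink
```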

Hello @jpelizaeus

which interfaces are the most speed/latency-sensitive? I guess mainly the Ceph internal network and the underlay network for OVN traffic. So I guess that the following production setup might be OK:

1 x 10 Gb network interface to connect to the uplink network
1 x 10 Gb network interface for intra-cluster traffic
1 x 10 Gb fully disaggregated networking - public Ceph subnet
1 x 40 Gb fully disaggregated networking - internal Ceph subnet
1 x 40 Gb dedicated underlay network for OVN traffic

We will probably be running a mix of VMs and containers with web servers, databases (PostgreSQL, memcached, Redis…) and Odoo ERP in the middle. There will be some other workloads like SMTP server(s) and so on, but the main load will be created by Odoo and the SQL/NoSQL databases. PostgreSQL/NoSQL will probably be handled by Juju.

I guess that most traffic will be generated by Ceph internal traffic and the OVN underlay; please confirm that I got it right. Or is there speed/latency-sensitive traffic on the intra-cluster interface as well?

Many thanks in advance.

You don’t have to expect much load on the intra-cluster network if it’s solely used for MicroCloud’s internal communication (mainly dqlite/raft-related traffic between the cluster members to keep the Micro* daemons up and running within their cluster).

You can expect the majority of the load to happen on the Ceph and OVN network interfaces, depending on whether the workload requires lots of communication between the instances or reads and writes lots of data to and from the disks. However, if you don’t have much ingress/egress traffic to and from the outside, you might also scale the uplink interface up or down accordingly.