Building a ‘fake’ RISC-V server to enable datacenter software

Recently, we have seen announcements from SpacemiT and others about powerful RVA23-compliant RISC-V chips. These are closer than ever to server-class SoCs, with real server chassis coming soon.
However, most datacenter management software has never seen a RISC-V server, and this needs to be addressed quickly. For us, that software is Canonical MAAS, which currently has no support for RISC-V. This is the story of MAAS work-in-progress enablement.

Datacenter management software and Canonical MAAS

If you have a computer or server at home, you usually keep the default OS or install another using a USB stick. You reboot it and shut it down once in a while by pressing the buttons, maybe after a system update.
For a datacenter with 100+ servers, such manual maintenance is impossible. For this reason, rack servers usually include a BMC (Baseboard Management Controller). It allows administrators to power servers on or off, change BIOS settings, handle inventory, and attach virtual ISO images for OS installation, all over the network.
We then use datacenter management software to talk to all the BMCs and commission all servers at the same time. It acts as a central place for hardware management and maintenance.

Screenshot of Canonical MAAS UI, managing machines

RISC-V enablement

Now, what do we need to do to make sure that such software supports RISC-V?

Since this software talks to BMCs in an architecture-independent way, not much is needed on that side. However, most of these tools rely on network booting protocols, which require architecture-specific handling at the DHCP level and OS images built for the target architecture.
Let’s implement this!

A test system

So, we have a complex piece of software that expects a BMC to control and a network-booting server to talk to. We need to build a test system that plays this role before attempting to modify MAAS code. That is the main topic of this article.

Specifically, we need two ingredients:

  • A RISC-V system with firmware that includes network drivers and can boot over the network
  • A BMC present on that system, or a way to build one

RISC-V board: the VisionFive 2

The VisionFive 2 is a well-supported board that is flexible and has upstream drivers in all relevant projects (Linux, OpenSBI, U-Boot). It also has network support in firmware, so network boot is a real possibility.


The VisionFive 2, a RISC-V Single Board Computer

The only thing it lacks is a BMC. It is also not a very powerful board, definitely not server-class.

Firmware settings

The VisionFive 2 boots using U-Boot. We need to build a modified version of U-Boot with configurations that enable network booting and set it as default.
We need to tweak the default configuration in the following way:

< # CONFIG_EFI_IP4_CONFIG2_PROTOCOL is not set
< # CONFIG_EFI_HTTP_PROTOCOL is not set
---
> CONFIG_EFI_IP4_CONFIG2_PROTOCOL=y
> CONFIG_EFI_HTTP_PROTOCOL=y

< # CONFIG_EFI_HTTP_BOOT is not set
---
> CONFIG_EFI_HTTP_BOOT=y

< CONFIG_BOOTCOMMAND="bootflow scan"
---
> CONFIG_BOOTCOMMAND="dhcp; bootefi $loadaddr; bootefi bootmgr"

> CONFIG_CMD_BOOTP=y
> CONFIG_BOOTP_BOOTPATH=y
> CONFIG_BOOTP_DNS=y
> CONFIG_BOOTP_GATEWAY=y
> CONFIG_BOOTP_HOSTNAME=y
> CONFIG_BOOTP_SUBNETMASK=y
> CONFIG_BOOTP_PXE=y
> CONFIG_DHCP_PXE_CLIENTARCH=0x1B
> CONFIG_BOOTP_PXE_DHCP_OPTION=y
> CONFIG_BOOTP_VCI_STRING="U-Boot"
> CONFIG_NET_TFTP_VARS=y

There are two main modifications. First, we change the default boot command to send a DHCP boot request and try to EFI-boot the image that the request should have loaded into memory. If that fails, we fall back to the EFI boot manager, which will find the next locally available boot source; in our case, the NVMe drive.
The second modification is the activation of network booting protocols, with the right settings for PXE client architecture. We want to load the bootloader image over TFTP, so we set PXE_CLIENTARCH to 0x1B, which means riscv64 TFTP according to the specification.
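
On the wire, this setting should end up in DHCP option 93 (Client System Architecture Type, RFC 4578), which carries 16-bit big-endian values. The sketch below only illustrates that encoding; it is not code from U-Boot or MAAS, and the function names are made up for this example.

```python
import struct

OPTION_CLIENT_ARCH = 93   # DHCP option code defined in RFC 4578
ARCH_RISCV64_UEFI = 0x1B  # value set in CONFIG_DHCP_PXE_CLIENTARCH above


def encode_client_arch(arch: int) -> bytes:
    """Encode option 93 as code, length, then one 16-bit big-endian type."""
    return struct.pack("!BBH", OPTION_CLIENT_ARCH, 2, arch)


def decode_client_arch(option: bytes) -> int:
    """Return the first architecture type carried in an option 93 TLV."""
    code, length = struct.unpack_from("!BB", option)
    if code != OPTION_CLIENT_ARCH or length < 2 or length % 2:
        raise ValueError("not a valid client architecture option")
    (arch,) = struct.unpack_from("!H", option, 2)
    return arch


if __name__ == "__main__":
    wire = encode_client_arch(ARCH_RISCV64_UEFI)
    print(wire.hex())                     # 5d02001b
    print(hex(decode_client_arch(wire)))  # 0x1b
```

A DHCP server that sees these four bytes in a request knows it is talking to a riscv64 UEFI client and can answer with a matching boot file.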

With that set, our VisionFive 2 is ready to network-boot!

The VisionFive 2 starting up, loading Grub EFI image over TFTP and booting into Grub

On the other side of the network is a DHCP server configured to serve grubriscv64.efi when it receives a request with a client architecture of 0x1B. The network boot version of Grub is available in the Ubuntu Grub packages, as the monolithic grubnetriscv64.efi file. Ready-to-use network boot images for RISC-V are also available since the Ubuntu Resolute release.
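
The server-side decision described above amounts to a lookup keyed on the client architecture value. Here is a minimal sketch of that idea, assuming a plain dictionary; it is not how MAAS or any particular DHCP server implements it. Only the riscv64 row comes from this article, and the other rows and the function name are illustrative placeholders.

```python
from typing import Optional

# Client architecture type -> boot file offered to the client.
# Only the riscv64 entry is taken from the article; the file names in the
# x86-64 and arm64 rows are placeholders for illustration.
BOOT_FILES = {
    0x0007: "grubx64.efi",      # x86-64 UEFI (placeholder file name)
    0x000B: "grubaa64.efi",     # arm64 UEFI (placeholder file name)
    0x001B: "grubriscv64.efi",  # riscv64 UEFI, as served in this setup
}


def select_boot_file(client_arch: int) -> Optional[str]:
    """Return the boot file for a client architecture, or None if unknown."""
    return BOOT_FILES.get(client_arch)


if __name__ == "__main__":
    print(select_boot_file(0x1B))  # grubriscv64.efi
```

Datacenter software that has never seen a RISC-V server is effectively missing the last row of such a table, which is why the DHCP side needs architecture-specific work.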

Powering the board

The next step is ensuring the board can be controlled by a BMC. We don’t need anything fancy here: the only hard requirement is being able to turn the board on and off over the network. A nice bonus would be serial access, to debug the boot process if something goes wrong.

The VisionFive 2 can be powered using a USB-C connector or +5V/GND GPIOs. We want the final build to be in the form of a 1U server, as this will be installed in a datacenter rack for continuous testing and development infrastructure for MAAS RISC-V enablement.
The power supply for such servers is similar to a regular ATX PSU, only smaller. This rules out the USB connector option. Fortunately, ATX PSUs have +5V lines that we can use to power the board.

The only thing left is a BMC that can control that ATX PSU to power the board on and off.

BMC

Switching the PSU and basic circuit

Fortunately, it is fairly easy to switch a PSU from a regular GPIO. ATX PSUs have a specific pin dedicated to that, which is traditionally connected to the case power button. We only need to connect it to our BMC GPIO, and voilà!
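
On a standard ATX connector this pin is PS_ON#, and it is active low: pulling it to ground turns the main rails on, releasing it turns them off. The sketch below models that inverted logic, assuming the GPIO drives PS_ON# directly; real hardware would often use an open-drain output or a transistor instead, and the circuit figure later in this article is the reference. `AtxPowerSwitch` and the `write_pin` callback are hypothetical names for this example.

```python
from typing import Callable


class AtxPowerSwitch:
    """Drive an active-low PS_ON# line through a pin-writing callback."""

    def __init__(self, write_pin: Callable[[int], None]) -> None:
        self._write_pin = write_pin
        self.is_on = False
        self.off()  # start with the PSU main rails off

    def on(self) -> None:
        self._write_pin(0)  # low level asserts PS_ON#: main rails turn on
        self.is_on = True

    def off(self) -> None:
        self._write_pin(1)  # high level releases PS_ON#: main rails turn off
        self.is_on = False


if __name__ == "__main__":
    levels = []  # records every level written to the pin
    psu = AtxPowerSwitch(levels.append)
    psu.on()
    psu.off()
    print(levels)  # [1, 0, 1]
```

Keeping the inversion in one place means the rest of the firmware can talk about "on" and "off" without caring about the electrical polarity.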

If our BMC board has UART I/O pins, we can also connect those to the UART pins of the VisionFive 2. This gives us instant serial access on the BMC, which is very convenient.

There is one last problem: how do we power the BMC? If the BMC can switch off the ATX PSU, it cannot be powered by the rails it switches; otherwise it would effectively power itself off, and nothing could bring the system back on.
This issue can be solved using a specific pin on the ATX connector: the +5VSB (standby) pin. ATX power supplies have a dedicated standby rail that stays on whenever the PSU is plugged in, and can provide up to 500 mA of current. It is traditionally used to power standby circuits on the motherboard, such as the USB ports, so they remain on even when the system is off. If we choose a board that can work with 2.5W of power (5V × 0.5A), then we have a BMC!

The following figure shows the final circuit.


Connections between the ATX PSU 24-pin connector, the RISC-V VisionFive 2 board and the BMC

ESP32-S3

I needed a microcontroller board with Ethernet/RJ45 connectivity and low power consumption. I had worked with ESP32 boards before in personal projects, so I chose an ESP32-S3 based board with an integrated Ethernet module.

The WaveShare ESP32-S3-ETH board.

Firmware for the BMC

Hardware is great, but it won’t do anything without software. We need to write a program for that board that connects to the Ethernet network and exposes a web server with an API to control the GPIO and access the serial console. We could also add an HTML/CSS web page to show status and allow easy control of that API.

Wait, what? This is all great, but writing embedded firmware like that seems really hard, takes a lot of time and is not really relevant to what we want to do here.
This is the part where generative AI comes into play.

DISCLAIMER: My views are my own and do not necessarily reflect the views of my employer. I used generative AI for this project as it is an internal experiment that will never be exposed to users other than our internal team, in order to learn about these tools and see what they were capable of. I won’t give my or anyone’s opinion on AI in this article as it is not the point. Please consider this an experiment, and you may find the results useful.

Normally, I would have tried to find an existing project with similar functionality to use as firmware for the BMC. Alternatively, I might have abandoned the ESP32 idea and used something powerful enough to run a full-blown Linux distribution, making it easy to host a web server.
However, recently we developers have seen AI agents targeted at code that can allegedly generate large projects, in complicated environments and in any kind of language, even C. I wanted to test that assumption here.

I used my personal AI stack for this. It started as a learning experience, but as you will see, it became capable enough that I was able to use it fully. I won’t discuss it in detail here, but the basics are: I use the open-source extension “Roo Code” in VS Code, with custom providers that allow me to access open-weights models like GLM-5, my current favorite. This is not an endorsement, just my preference from personal experience. I haven’t tried other models (for example Claude), so take my opinion with a grain of salt.

I started my ‘coding’ session with the following prompt:

I want to start a new quick project, developing ESP32-S3 firmware to act as a BMC for an external board. The firmware must connect using ethernet, expose a web server with endpoints to turn on, off, and see the status of the board. To perform those actions, we will need to read/set GPIO values (I will connect them to the right place). We could use ESP-IDF or Micropython. I also want an endpoint for serial, and a small HTML/CSS web interface.

The agent asked a few questions and we decided on going with ESP-IDF. I also communicated the Ethernet chip I was using, and other hardware details. After a bit of back and forth, it created an extensive specification and got to work. After a while, it proudly told me that everything was ready for me to try out.

I started the build process, and of course it would not even start compiling as there were missing dependencies. After a bit of debugging, I realized that the dependency file was not in the right directory, and moved it to the right place. Then, there were some missing drivers and compilation errors that the agent was able to fix on its own.

After fixing everything and adding a code formatter, I was surprised that it actually worked. I already had a board using DHCP to connect to the Ethernet network, hosting a web server with a nice web page and numerous API calls. In fact, it had done a bit too much, exposing 20 GPIOs and various other features. But it was really great, and it had even created a WebSocket endpoint for the serial console.
After manually tweaking SDK options, fixing bugs and adding features (like OTA updates and authentication) through discussions with the agent, I had a firmware that was stable enough for internal use. Of course, this could never be used in production without a thorough review and many improvements; but that would also have been true if I had implemented scripts to achieve the same job on a Linux box in less than a week.


BMC firmware web page. Not so pretty, but surely better than raw API endpoints, especially for debugging.

Overall, I was very pleased with the experience. I would have thought that an AI agent would fail miserably at generating embedded C, but it seems that I was wrong. This seems to be a really useful technology, at least for such experiments.

For the technical details, you can access the firmware Git repository.


My full discussion with the AI agent in Roo Code. +6925 lines, -527, 26 files generated

A real server?

3D-printing an ITX plastic mount

To fit the VisionFive 2 and ESP32 board inside the 1U server chassis, I needed them in the form of a mini-ITX motherboard. I quickly designed a square with the ITX dimensions, added holes for screws, and created mounts for screw inserts for the two boards. I then used my 3D printer to produce the ITX plastic mount.


My 3D printer software showing the plastic ITX mount 3D model

To have the two RJ45 ports line up at the end of the board, I printed another model that holds two RJ45 keystones, and used tiny Ethernet cables to connect everything.

Two RJ45 keystones are the only available I/O for the ITX board, as the rest is hidden behind


Everything is mounted and connected on the ITX plastic board

Finalizing the build

The whole project took about three to four weeks from start to finish.

The assembly was straightforward. The 1U chassis has built-in fans connected to the PSU, which provide active cooling. The VisionFive 2 and ESP32 boards themselves rely on passive cooling, which has been sufficient for our testing workloads.

Putting it all together inside the server chassis is a really nice feeling.


From the outside, it looks like a real server!

Conclusion

This was a very interesting experiment! I learned a lot, and most of it was really fun.

Of course, most of the work wasn’t in setting up the test system; it was in testing Canonical MAAS in parallel to ensure that, with some experimental changes, the system would boot. That work is now done, and I’m confident the system is good enough for the MAAS team to take over and start developing and testing MAAS for RISC-V.

The custom BMC firmware exposes a simple HTTP API with endpoints like /power/on and /power/off to control the board. This lightweight approach was sufficient for our testing needs.
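
From a test script, such an API could be driven with a few lines of standard-library Python. This is only a sketch under assumptions: the two paths come from this article, but the POST method, the absence of authentication headers, the function names and the `bmc.example` host are guesses made for illustration. The firmware repository is the reference for the real interface.

```python
import urllib.request


def power_request(base_url: str, action: str) -> urllib.request.Request:
    """Build the request for /power/on or /power/off (POST is an assumption)."""
    if action not in ("on", "off"):
        raise ValueError("action must be 'on' or 'off'")
    url = f"{base_url.rstrip('/')}/power/{action}"
    return urllib.request.Request(url, method="POST")


def set_power(base_url: str, action: str, timeout: float = 5.0) -> int:
    """Send the request to the BMC and return the HTTP status code."""
    request = power_request(base_url, action)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling `set_power("http://bmc.example", "on")` would then power the board on, provided the BMC is reachable at that hypothetical address and accepts unauthenticated requests.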

Since such experiments are rare and fun, I thought I would write a blog article to share the most interesting parts. I hope you enjoyed reading this!
