14% performance degradation in container vs host

hey folks,

I am new to containers so tia for your patience.

tldr;
I am seeing a 14% performance hit for my c++ app in the container vs the host, even though I think it is configured for bare-metal performance. Is this just something I have to accept to run a container?

host os: ubuntu 22.04.4
container os: ubuntu 22.04.4
kernel: 5.15.0-134-generic
x86_64 AMD cpu (32 core)
Nvidia GPU
FS: ext4

Context: Only one container will be run per host, ever

I have a c++ robotics app that uses the GPU and CPU intensively. I have set up an LXD container, and it and the host are pretty much identical. My container config is below. I am fairly sure I have things configured to be as close as possible to bare-metal performance.

When I benchmark the GPU, CPU, etc. with sysbench, performance is the same between container and host. But when I run my app, the container is 14% slower in execution.

I have tried everything I can think of: CPU pinning, scheduling priorities, etc.
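
For reference, this is roughly what the CPU pinning looks like in LXD (`limits.cpu` and `limits.cpu.nodes` are standard LXD instance options; the core range here is illustrative, not a recommendation):

```shell
# Example only: pin the container to a fixed set of host cores and a
# single NUMA node. The "robot" instance name matches the config below;
# the 0-31 range is just an example for a 32-core host.
lxc config set robot limits.cpu 0-31
lxc config set robot limits.cpu.nodes 0
```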

I did some profiling and, if I am understanding everything correctly, here is what is significantly slower in the container:

  • protobuf serialization/deserialization (serde)
  • many more calls to, and much more time spent in, the read, close, and wait4 syscalls

I am starting to get the feeling that I am bumping up against the overhead a container carries in general for my type of app. Is that true?

Thanks!

-Shane

=====================

armstrong@benny:~/catkin_ws/src/vision$ lxc config show robot
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 20.04 LTS server (20220610)
  image.os: ubuntu
  image.release: focal
  limits.cpu.nodes: "0"
  limits.kernel.rtprio: "99"
  nvidia.driver.capabilities: compute,video,display,graphics
  nvidia.runtime: "true"
  security.nesting: "true"
  security.syscalls.intercept.sched_setscheduler: "false"
  volatile.base_image: 3a974e258723103dbec442dc1e9fbdda95840f359f9250a6144eb65682974247
  volatile.cloud-init.instance-id: 4a0da7ac-11d7-43d2-ac5d-0fafc41f7ebd
  volatile.eth0.host_name: mac45d2d620
  volatile.eth0.last_state.created: "false"
  volatile.eth0.name: eth0
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: 5561c2ef-c515-4cae-9b81-fa86554b00ab
  volatile.uuid.generation: 5561c2ef-c515-4cae-9b81-fa86554b00ab
devices:
  armstrong_data:
    path: /armstrong_data
    source: /armstrong_data
    type: disk
  eth0:
    hwaddr: 00:16:3e:68:fb:4a
    nictype: macvlan
    parent: enx207bd2b46611
    type: nic
  gpu:
    type: gpu
  nvidia_runtime:
    path: /dev/nvidiactl
    type: unix-char
  nvidia_uvm:
    path: /dev/nvidia-uvm
    type: unix-char
  nvidia_uvm_tools:
    path: /dev/nvidia-uvm-tools
    type: unix-char
  nvidia0:
    mode: "0666"
    path: /dev/nvidia0
    type: unix-char
  root:
    path: /
    pool: robot_disk
    type: disk
ephemeral: false
profiles:
- default
stateful: false

armstrong@benny:~/catkin_ws/src/vision$ lxc storage show robot_disk
name: robot_disk
description: ""
driver: dir
status: Created
config:
  source: /lxd_storage
used_by:
- /1.0/instances/robot
- /1.0/instances/robot/snapshots/202403171330
locations:
- none

Hello @shanehill00,

> I am seeing a 14% performance hit for my c++ app in the container vs the host, even though I think it is configured for bare-metal performance. Is this just something I have to accept to run a container?

No, I would say that these days there is very little difference between applications inside containers and on the host, because even when you run your workload on the host, cgroups are still involved. Namespaces are not, but they are “performance neutral”. Your story sounds like something to investigate; 14% is not a joke.

> When I benchmark the GPU, CPU, etc. with sysbench, performance is the same between container and host. But when I run my app, the container is 14% slower in execution.

Which means we should take a closer look at how you measure your application’s performance. What specifically do you compare, such that you get a 14% performance drop? Requests per second? Something else?

> I did some profiling and, if I am understanding everything correctly, here is what is significantly slower in the container:
> protobuf serialization/deserialization (serde)

What specifically is slower with protobuf?

> many more calls to, and much more time spent in, the read, close, and wait4 syscalls

There is no difference between how read, close, and wait4 work inside and outside of a container. At all.

Kind regards,
Alex

@amikhalitsyn I’ve been meaning to ask you about this one but forgot :confused: Could it be the seccomp policy that harms the syscall performance?

Theoretically it can, but then why does it not affect the sysbench results?

Also, 14% is not a joke, especially for syscalls we don’t intercept like read/close/wait4.

I would start from understanding where this 14% number comes from.
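
For what it’s worth, one quick way to check whether a seccomp filter is attached at all is to look at the Seccomp field in /proc status (per proc(5): 0 = disabled, 1 = strict mode, 2 = filter mode); the instance name “robot” matches the config shown above:

```shell
# Inside the container: does PID 1 run under a seccomp filter?
lxc exec robot -- grep -E '^Seccomp' /proc/1/status
# On the host, for comparison (a plain shell normally shows Seccomp: 0):
grep -E '^Seccomp' /proc/self/status
```

If the container shows filter mode, the next step would be measuring whether that filter accounts for a meaningful share of the per-syscall cost.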