Instances start without error but stop soon after

Hi, this week I found that some instances start without error but stop shortly afterwards. I cannot find any cause in the debug log. Can anyone give me some ideas? Thank you :slightly_smiling_face:

LXD version: 5.16-f2b0200

time="2023-07-28T12:41:16Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/events username=xiyou
time="2023-07-28T12:41:16Z" level=debug msg="Event listener server handler started" id=410e3f87-498a-4e24-950f-7eeee3d72402 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-28T12:41:16Z" level=debug msg="Handling API request" ip=@ method=PUT protocol=unix url=/1.0/instances/shpc-2291-instance-HiQDflgE/state username=xiyou


time="2023-07-28T12:41:16Z" level=debug msg="API Request\n\t{\n\t\t\"action\": \"start\",\n\t\t\"timeout\": 0,\n\t\t\"force\": false,\n\t\t\"stateful\": false\n\t}" ip=@ method=PUT protocol=unix url=/1.0/instances/shpc-2291-instance-HiQDflgE/state username=xiyou
time="2023-07-28T12:41:16Z" level=debug msg="New operation" class=task description="Starting instance" operation=cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc project=default
time="2023-07-28T12:41:16Z" level=debug msg="Started operation" class=task description="Starting instance" operation=cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc project=default
time="2023-07-28T12:41:16Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"async\",\n\t\t\"status\": \"Operation created\",\n\t\t\"status_code\": 100,\n\t\t\"operation\": \"/1.0/operations/cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {\n\t\t\t\"id\": \"cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting instance\",\n\t\t\t\"created_at\": \"2023-07-28T12:41:16.539288364Z\",\n\t\t\t\"updated_at\": \"2023-07-28T12:41:16.539288364Z\",\n\t\t\t\"status\": \"Running\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"instances\": [\n\t\t\t\t\t\"/1.0/instances/shpc-2291-instance-HiQDflgE\"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"lxd8\"\n\t\t}\n\t}" http_code=202
time="2023-07-28T12:41:16Z" level=debug msg="Start started" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default stateful=false
time="2023-07-28T12:41:16Z" level=debug msg="Instance operation lock created" action=start instance=shpc-2291-instance-HiQDflgE project=default reusable=false
time="2023-07-28T12:41:16Z" level=info msg="Starting instance" action=start created="2023-04-21 00:50:36.467426965 +0000 UTC" ephemeral=false instance=shpc-2291-instance-HiQDflgE instanceType=container project=default stateful=false used="2023-07-28 12:39:59.495276907 +0000 UTC"
time="2023-07-28T12:41:16Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url=/1.0/operations/cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc username=xiyou
time="2023-07-28T12:41:16Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {\n\t\t\t\"id\": \"cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc\",\n\t\t\t\"class\": \"task\",\n\t\t\t\"description\": \"Starting instance\",\n\t\t\t\"created_at\": \"2023-07-28T12:41:16.539288364Z\",\n\t\t\t\"updated_at\": \"2023-07-28T12:41:16.539288364Z\",\n\t\t\t\"status\": \"Running\",\n\t\t\t\"status_code\": 103,\n\t\t\t\"resources\": {\n\t\t\t\t\"instances\": [\n\t\t\t\t\t\"/1.0/instances/shpc-2291-instance-HiQDflgE\"\n\t\t\t\t]\n\t\t\t},\n\t\t\t\"metadata\": null,\n\t\t\t\"may_cancel\": false,\n\t\t\t\"err\": \"\",\n\t\t\t\"location\": \"lxd8\"\n\t\t}\n\t}" http_code=200
time="2023-07-28T12:41:16Z" level=debug msg="MountInstance started" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:16Z" level=debug msg="Activated RBD volume" dev=/dev/rbd8 driver=ceph pool=remote volName=container_shpc-2291-instance-HiQDflgE
time="2023-07-28T12:41:16Z" level=debug msg="Mounted RBD volume" dev=/dev/rbd8 driver=ceph options=discard path=/var/snap/lxd/common/lxd/storage-pools/remote/containers/shpc-2291-instance-HiQDflgE pool=remote volName=shpc-2291-instance-HiQDflgE
time="2023-07-28T12:41:16Z" level=debug msg="MountInstance finished" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:17Z" level=debug msg="Starting device" device=eth0 instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=nic
time="2023-07-28T12:41:17Z" level=debug msg="Starting device" device=root instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=disk
time="2023-07-28T12:41:17Z" level=debug msg="Starting device" device=home instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=disk
time="2023-07-28T12:41:17Z" level=debug msg="MountCustomVolume started" driver=ceph pool=remote project=default volName=custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:17Z" level=debug msg="Activated RBD volume" dev=/dev/rbd9 driver=ceph pool=remote volName=custom_default_custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:17Z" level=debug msg="Mounted RBD volume" dev=/dev/rbd9 driver=ceph options=discard path=/var/snap/lxd/common/lxd/storage-pools/remote/custom/default_custom-volume-of-2291-431-MiwzE14g pool=remote volName=default_custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:17Z" level=debug msg="MountCustomVolume finished" driver=ceph pool=remote project=default volName=custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:17Z" level=debug msg="UpdateInstanceBackupFile started" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:17Z" level=debug msg="Skipping unmount as in use" driver=ceph pool=remote refCount=1 volName=shpc-2291-instance-HiQDflgE
time="2023-07-28T12:41:17Z" level=debug msg="UpdateInstanceBackupFile finished" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:17Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url="/internal/containers/shpc-2291-instance-HiQDflgE/onstart?project=default" username=root
time="2023-07-28T12:41:18Z" level=debug msg="Scheduler: container shpc-2291-instance-HiQDflgE started: re-balancing"
time="2023-07-28T12:41:18Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {}\n\t}" http_code=200
time="2023-07-28T12:41:18Z" level=info msg="Started instance" action=start created="2023-04-21 00:50:36.467426965 +0000 UTC" ephemeral=false instance=shpc-2291-instance-HiQDflgE instanceType=container project=default stateful=false used="2023-07-28 12:39:59.495276907 +0000 UTC"

time="2023-07-28T12:41:18Z" level=debug msg="Instance operation lock finished" action=start err="<nil>" instance=shpc-2291-instance-HiQDflgE project=default reusable=false
time="2023-07-28T12:41:18Z" level=debug msg="Start finished" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default stateful=false
time="2023-07-28T12:41:18Z" level=debug msg="Success for operation" class=task description="Starting instance" operation=cecabdbe-43a4-4bb0-8b1c-c4a41a4527bc project=default


time="2023-07-28T12:41:18Z" level=debug msg="Event listener server handler stopped" listener=410e3f87-498a-4e24-950f-7eeee3d72402 local=/var/snap/lxd/common/lxd/unix.socket remote=@
time="2023-07-28T12:41:19Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url="/internal/containers/shpc-2291-instance-HiQDflgE/onstopns?netns=%2Fproc%2F1074792%2Ffd%2F4&project=default&target=stop" username=root
time="2023-07-28T12:41:19Z" level=debug msg="Instance initiated stop" action=stop instance=shpc-2291-instance-HiQDflgE instanceType=container project=default

time="2023-07-28T12:41:19Z" level=debug msg="Instance operation lock created" action=stop instance=shpc-2291-instance-HiQDflgE project=default reusable=false
time="2023-07-28T12:41:19Z" level=debug msg="Stopping device" device=eth0 instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=nic
time="2023-07-28T12:41:19Z" level=debug msg="Clearing instance firewall static filters" IPv4Nets="[172.16.11.135/32]" IPv6Nets="[]" device=eth0 driver=nic host_name=vethfc8d5477 hwaddr="00:16:3e:da:d4:5b" instance=shpc-2291-instance-HiQDflgE parent=bridge0 project=default
time="2023-07-28T12:41:19Z" level=debug msg="Clearing instance total protocol filters" IPv4Nets="[172.16.11.135/32]" IPv6Nets="[]" device=eth0 driver=nic host_name=vethfc8d5477 hwaddr="00:16:3e:da:d4:5b" instance=shpc-2291-instance-HiQDflgE parent=bridge0 project=default
time="2023-07-28T12:41:19Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {}\n\t}" http_code=200
time="2023-07-28T12:41:19Z" level=debug msg="Matched trusted cert" fingerprint=e9047f5581d5445d740979ef5ff2d7f87a37a1a8bbccdf93f89ea3872534c298 subject="CN=root@lxd3,O=linuxcontainers.org"
time="2023-07-28T12:41:19Z" level=debug msg="Replace current raft nodes" raftMembers="[{{11 172.16.4.9:8443 spare} lxd9} {{18 172.16.4.13:8443 spare} lxd13} {{4 172.16.0.20:8443 voter} client} {{10 172.16.4.8:8443 spare} lxd8} {{12 172.16.4.4:8443 spare} lxd4} {{14 172.16.4.11:8443 spare} lxd11} {{15 172.16.4.12:8443 spare} lxd12} {{13 172.16.4.10:8443 spare} lxd10} {{16 172.16.4.14:8443 spare} lxd14} {{2 172.16.4.2:8443 spare} lxd2} {{7 172.16.4.5:8443 voter} lxd5} {{9 172.16.4.7:8443 stand-by} lxd7} {{1 172.16.4.1:8443 spare} lxd1} {{17 172.16.4.15:8443 spare} lxd15} {{3 172.16.4.3:8443 voter} lxd3} {{8 172.16.4.6:8443 stand-by} lxd6}]"
time="2023-07-28T12:41:19Z" level=debug msg="Handling API request" ip=@ method=GET protocol=unix url="/internal/containers/shpc-2291-instance-HiQDflgE/onstop?project=default&target=stop" username=root
time="2023-07-28T12:41:19Z" level=debug msg="Instance operation lock inherited for stop" action=stop instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:19Z" level=debug msg="WriteJSON\n\t{\n\t\t\"type\": \"sync\",\n\t\t\"status\": \"Success\",\n\t\t\"status_code\": 200,\n\t\t\"operation\": \"\",\n\t\t\"error_code\": 0,\n\t\t\"error\": \"\",\n\t\t\"metadata\": {}\n\t}" http_code=200
time="2023-07-28T12:41:19Z" level=debug msg="Instance stopped, cleaning up" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:20Z" level=debug msg="Stopping device" device=home instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=disk
time="2023-07-28T12:41:20Z" level=debug msg="UnmountCustomVolume started" driver=ceph pool=remote project=default volName=custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:20Z" level=debug msg="Unmounted RBD volume" driver=ceph keepBlockDev=false path=/var/snap/lxd/common/lxd/storage-pools/remote/custom/default_custom-volume-of-2291-431-MiwzE14g pool=remote volName=default_custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:21Z" level=debug msg="Deactivated RBD volume" driver=ceph pool=remote volName=custom_default_custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:21Z" level=debug msg="UnmountCustomVolume finished" driver=ceph pool=remote project=default volName=custom-volume-of-2291-431-MiwzE14g
time="2023-07-28T12:41:21Z" level=debug msg="Stopping device" device=root instance=shpc-2291-instance-HiQDflgE instanceType=container project=default type=disk
time="2023-07-28T12:41:21Z" level=debug msg="UnmountInstance started" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:21Z" level=debug msg="Unmounted RBD volume" driver=ceph keepBlockDev=false path=/var/snap/lxd/common/lxd/storage-pools/remote/containers/shpc-2291-instance-HiQDflgE pool=remote volName=shpc-2291-instance-HiQDflgE
time="2023-07-28T12:41:21Z" level=debug msg="Matched trusted cert" fingerprint=d44c9c3b94c28b9489a737f326ea44009be8b7fa6f3ec7579e76d04252d43e3a subject="CN=metrics.local"
time="2023-07-28T12:41:21Z" level=debug msg="Handling API request" ip="172.16.0.21:41750" method=GET protocol=tls url=/1.0/metrics username=d44c9c3b94c28b9489a737f326ea44009be8b7fa6f3ec7579e76d04252d43e3a
time="2023-07-28T12:41:21Z" level=warning msg="Failed to get memory usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:21Z" level=warning msg="Failed to get oom kills" err="Failed getting oom_kill" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:21Z" level=warning msg="Failed to get swap usage" err="Failed parsing \"\": strconv.ParseInt: parsing \"\": invalid syntax" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:21Z" level=warning msg="Failed to get CPUs" err="Failed parsing \"\": strconv.Atoi: parsing \"\": invalid syntax" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:21Z" level=warning msg="Failed to get total number of processes" err="PID of LXC instance could not be initialized" instance=shpc-2291-instance-HiQDflgE instanceType=container project=default
time="2023-07-28T12:41:21Z" level=debug msg="Deactivated RBD volume" driver=ceph pool=remote volName=container_shpc-2291-instance-HiQDflgE
time="2023-07-28T12:41:21Z" level=debug msg="UnmountInstance finished" driver=ceph instance=shpc-2291-instance-HiQDflgE pool=remote project=default
time="2023-07-28T12:41:22Z" level=info msg="Shut down instance" action=stop created="2023-04-21 00:50:36.467426965 +0000 UTC" ephemeral=false instance=shpc-2291-instance-HiQDflgE instanceType=container project=default stateful=false used="2023-07-28 12:41:18.260754049 +0000 UTC"
time="2023-07-28T12:41:22Z" level=debug msg="Instance operation lock finished" action=stop err="<nil>" instance=shpc-2291-instance-HiQDflgE project=default reusable=false
time="2023-07-28T12:41:22Z" level=debug msg="Scheduler: container shpc-2291-instance-HiQDflgE stopped: re-balancing"
time="2023-07-28T12:41:31Z" level=debug msg="Matched trusted cert" fingerprint=e9047f5581d5445d740979ef5ff2d7f87a37a1a8bbccdf93f89ea3872534c298 subject="CN=root@lxd3,O=linuxcontainers.org"
time="2023-07-28T12:41:31Z" level=debug msg="Replace current raft nodes" raftMembers="[{{4 172.16.0.20:8443 voter} client} {{7 172.16.4.5:8443 voter} lxd5} {{13 172.16.4.10:8443 spare} lxd10} {{15 172.16.4.12:8443 spare} lxd12} {{8 172.16.4.6:8443 stand-by} lxd6} {{10 172.16.4.8:8443 spare} lxd8} {{11 172.16.4.9:8443 spare} lxd9} {{14 172.16.4.11:8443 spare} lxd11} {{17 172.16.4.15:8443 spare} lxd15} {{18 172.16.4.13:8443 spare} lxd13} {{2 172.16.4.2:8443 spare} lxd2} {{9 172.16.4.7:8443 stand-by} lxd7} {{1 172.16.4.1:8443 spare} lxd1} {{12 172.16.4.4:8443 spare} lxd4} {{16 172.16.4.14:8443 spare} lxd14} {{3 172.16.4.3:8443 voter} lxd3}]"
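
For anyone reading a daemon log like the one above, it can help to pull out just the lifecycle messages for one instance and look at the gap between them. Here is a minimal Python sketch, assuming the log keeps the key=value layout shown above. The function names and the regular expression are my own and are not part of LXD, and the sample lines are trimmed copies of entries from the log above.

```python
import re
from datetime import datetime

# One key=value pair; the value is either a double-quoted string
# (backslash escapes allowed) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')

# Messages taken from the log above that mark the instance lifecycle.
LIFECYCLE = {"Started instance", "Instance initiated stop", "Shut down instance"}


def parse_line(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]  # drop the surrounding quotes
        fields[key] = value
    return fields


def lifecycle_events(lines, instance):
    """Return (timestamp, message) for lifecycle lines of one instance."""
    events = []
    for line in lines:
        fields = parse_line(line)
        if fields.get("instance") == instance and fields.get("msg") in LIFECYCLE:
            when = datetime.strptime(fields["time"], "%Y-%m-%dT%H:%M:%SZ")
            events.append((when, fields["msg"]))
    return events


# Trimmed copies of lines from the log above (some fields removed).
SAMPLE = [
    'time="2023-07-28T12:41:18Z" level=info msg="Started instance" action=start instance=shpc-2291-instance-HiQDflgE project=default',
    'time="2023-07-28T12:41:19Z" level=debug msg="Instance initiated stop" action=stop instance=shpc-2291-instance-HiQDflgE project=default',
    'time="2023-07-28T12:41:21Z" level=debug msg="Handling API request" method=GET protocol=tls url=/1.0/metrics',
    'time="2023-07-28T12:41:22Z" level=info msg="Shut down instance" action=stop instance=shpc-2291-instance-HiQDflgE project=default',
]

if __name__ == "__main__":
    events = lifecycle_events(SAMPLE, "shpc-2291-instance-HiQDflgE")
    for when, msg in events:
        print(when.isoformat(), msg)
```

In the log above, "Started instance" is stamped 12:41:18 and "Instance initiated stop" 12:41:19, so the stop came from inside the container about a second after it started, with no stop request on the API in between.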

This is the output of lxc info --show-log shpc-2291-instance-HiQDflgE:

lxc shpc-2291-instance-HiQDflgE 20230728154855.659 DEBUG start - …/src/src/lxc/start.c:__lxc_start:2147 - Segmentation fault(11) - Container "shpc-2291-instance-HiQDflgE" init exited
lxc shpc-2291-instance-HiQDflgE 20230728154855.659 TRACE network - …/src/src/lxc/network.c:lxc_restore_phys_nics_to_netns:3759 - Moving physical network devices back to parent network namespace
lxc shpc-2291-instance-HiQDflgE 20230728154855.735 TRACE network - …/src/src/lxc/network.c:lxc_restore_phys_nics_to_netns:3788 - Moved network device "eth0" back to network namespace
lxc shpc-2291-instance-HiQDflgE 20230728154855.735 INFO error - …/src/src/lxc/error.c:lxc_error_set_and_log:34 - Child <1431960> ended on signal Segmentation fault(11)

Yep, that looks like the issue, but it's not possible to say what is causing the init process to suddenly stop.
Do you see anything in the logs inside the container?

I found this error in the output of lxc info --show-log xxx, but I cannot find any further helpful logs.
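
When the container log is long, the line that matters is the one reporting how the init process ended. A small Python sketch for picking it out, assuming the "Child <PID> ended on signal NAME(NUMBER)" wording seen in the lxc info --show-log output earlier in this thread. The function name and pattern are my own, and other LXC versions may word the message differently.

```python
import re

# Matches e.g. "Child <1431960> ended on signal Segmentation fault(11)".
# The wording is taken from the log in this thread, not from a specification.
SIGNAL = re.compile(r"Child <(\d+)> ended on signal (.+?)\((\d+)\)")


def find_fatal_signals(log_text):
    """Return (pid, signal_name, signal_number) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = SIGNAL.search(line)
        if match:
            pid, name, number = match.groups()
            hits.append((int(pid), name.strip(), int(number)))
    return hits


# Two lines copied from the log in this thread.
SAMPLE_LOG = """\
lxc shpc-2291-instance-HiQDflgE 20230728154855.659 TRACE network - …/src/src/lxc/network.c:lxc_restore_phys_nics_to_netns:3759 - Moving physical network devices back to parent network namespace
lxc shpc-2291-instance-HiQDflgE 20230728154855.735 INFO error - …/src/src/lxc/error.c:lxc_error_set_and_log:34 - Child <1431960> ended on signal Segmentation fault(11)
"""

if __name__ == "__main__":
    for pid, name, number in find_fatal_signals(SAMPLE_LOG):
        print(f"init (pid {pid}) was killed by {name}, signal {number}")
```

This only confirms that init died from a signal; it does not say why, which is the part still missing here.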

I published the bad instance as an image and exported it to a file; it's about 4GB.
When I launch a container from that image, I get the same error.

https://cowtransfer.com/s/d64a15d212564d Click the link to get [ shpc_2291_instance_HiQDflgE_image.tar.gz.tar.gz ], or go to cowtransfer.com and enter the extraction code 5kyru3.

How did you create the problem instance? It sounds like it's got some corruption in it.

Hi tomp, I don't know why it happened. In the end I rebuilt the instance.
