Issue report for LXD Loki logging in Grafana dashboard

FYI, I’ve set up Prometheus metrics and Loki logging with the Grafana dashboard, and have found an issue with the dashboard’s handling of Loki.

The Loki log panels at the bottom of the dashboard don’t display anything, because their queries are wrong. The first panel uses {app="lxd",type="lifecycle",instance="$job"}, and the second is the same but with type="logging". However, the “instance” label actually contains the hostname, not the job name or the container name.

I propose that, at its simplest, this needs to change to

{app="lxd",type="lifecycle",name=~"|$name"}

but a more sophisticated version is:

{app="lxd", type="logging", name=~"|$name"}  | logfmt | context_instance=~"|$name"

The empty alternative (the leading vertical bar) inside those regexps is there because the “name” label may be missing. Here are some example logs:

2024-03-15T13:10:46Z {instance="nuc1", location="none", name="packetbeat", project="default", type="lifecycle"} action="instance-restarted" source="/1.0/instances/packetbeat" requester-address="@" requester-protocol="unix" requester-username="root" instance-restarted
2024-03-15T13:10:46Z {instance="nuc1", location="none", type="logging"}                                         context-action="shutdown" context-created="2022-06-24 12:38:48.192135561 +0000 UTC" context-ephemeral="false" context-instance="packetbeat" context-instanceType="container" context-project="default" context-timeout="10m0s" context-used="2023-09-07 08:38:24.929923055 +0000 UTC" level="info" Restarted instance
2024-03-15T13:10:39Z {instance="nuc1", location="none", type="logging"}                                         context-action="shutdown" context-created="2022-06-24 12:38:48.192135561 +0000 UTC" context-ephemeral="false" context-instance="packetbeat" context-instanceType="container" context-project="default" context-timeout="10m0s" context-used="2023-09-07 08:38:24.929923055 +0000 UTC" level="info" Restarting instance
2024-03-15T13:04:29Z {instance="nuc3", type="lifecycle"}                                                        action="config-updated" source="/1.0" requester-address="@" requester-protocol="unix" requester-username="root" config-updated
2024-03-15T13:04:25Z {instance="nuc2", type="lifecycle"}                                                        action="config-updated" source="/1.0" requester-address="@" requester-protocol="unix" requester-username="root" config-updated
2024-03-15T13:04:17Z {instance="nuc1", location="none", type="lifecycle"}                                       requester-protocol="unix" requester-username="root" action="config-updated" source="/1.0" requester-address="@" config-updated
2024/03/15 13:53:19 http://localhost:3100/loki/api/v1/query_range?direction=BACKWARD&end=1710507857767301302&limit=1000&query=%7Bapp%3D%22lxd%22%7D&start=1710507199542190712
2024/03/15 13:53:19 Common labels: {app="lxd", instance="nuc1", location="none", type="lifecycle"}

Notice how the first log has the container name as a label (name="packetbeat"), but some other logs relating to the same container don’t. They may have it buried in the logfmt data though, e.g. context-instance="packetbeat".

Hence the more sophisticated filter adds | logfmt | context_instance=~"|$name" (note that the hyphen is converted to an underscore when Loki parses the logfmt message to create a pseudo-label). If the name label is present, then it must match the selected container instance name; similarly, if the log message contains context_instance, then it must match the selected container instance name.
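To make that concrete, here is a minimal illustrative sketch in Python (not Loki itself, just an approximation of the matcher behaviour) run against simplified versions of the first two example lines above. Label matchers are anchored, so name=~"|packetbeat" accepts either an absent/empty name label or exactly "packetbeat", and the logfmt key context-instance surfaces as the pseudo-label context_instance:

import re

def logfmt_pseudo_labels(line):
    """Rough logfmt parse of key="value" pairs, mapping hyphens in keys
    to underscores the way Loki does for its pseudo-labels."""
    return {k.replace("-", "_"): v
            for k, v in re.findall(r'([\w-]+)="([^"]*)"', line)}

def matches(labels, pseudo, name):
    """Mimic name=~"|<name>" plus | logfmt | context_instance=~"|<name>".
    A missing label behaves like an empty string, which the empty
    alternative in the regexp accepts."""
    pattern = re.compile(f"^(|{re.escape(name)})$")
    return bool(pattern.match(labels.get("name", ""))
                and pattern.match(pseudo.get("context_instance", "")))

# First example line: the container name is present as a stream label.
lifecycle = ({"instance": "nuc1", "name": "packetbeat", "type": "lifecycle"},
             'action="instance-restarted" source="/1.0/instances/packetbeat"')

# Second example line: no "name" label, but context-instance in the body.
logging = ({"instance": "nuc1", "type": "logging"},
           'context-action="shutdown" context-instance="packetbeat" level="info"')

for labels, line in (lifecycle, logging):
    print(matches(labels, logfmt_pseudo_labels(line), "packetbeat"))  # True, True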


Hello, thanks for reporting this. Could you please file a new issue over at https://github.com/canonical/lxd/issues?

Done, https://github.com/canonical/lxd/issues/13165
