Kernel crash dump

A ‘kernel crash dump’ refers to a portion of the contents of volatile memory (RAM) that is copied to disk whenever the execution of the kernel is disrupted. The following events can cause a kernel disruption:

  • Kernel panic

  • Non-maskable interrupts (NMI)

  • Machine check exceptions (MCE)

  • Hardware failure

  • Manual intervention

For some of these events (kernel panic, NMI) the kernel will react automatically and trigger the crash dump mechanism through kexec. In other situations a manual intervention is required in order to capture the memory. Whenever one of the above events occurs, it is important to find out the root cause in order to prevent it from happening again. The cause can be determined by inspecting the copied memory contents.

Kernel crash dump mechanism

When a kernel panic occurs, the kernel relies on the kexec mechanism to quickly reboot a new instance of the kernel in a pre-reserved section of memory that had been allocated when the system booted (see below). This permits the existing memory area to remain untouched in order to safely copy its contents to storage.
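Whether a crash kernel is currently loaded into the reserved area can be read from /sys/kernel/kexec_crash_loaded, which contains 1 when one is loaded. The following is a minimal sketch; the function name crash_kernel_loaded and its optional path argument are illustrative assumptions, not part of kexec-tools or kdump-tools.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the kernel reports a loaded crash kernel.
# The optional argument overrides the sysfs path, which is handy for testing.
crash_kernel_loaded() {
    local flag=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
}
```

For example, crash_kernel_loaded && echo "crash kernel loaded" prints the message only when the capture kernel is in place.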

Installation

The kernel crash dump utility is installed with the following command:

sudo apt install linux-crashdump

Note:
Starting with 16.04, the kernel crash dump mechanism is enabled by default.

During the installation, you will be prompted with the following dialogs.

 |------------------------| Configuring kexec-tools |------------------------|
 |                                                                           |
 |                                                                           |
 | If you choose this option, a system reboot will trigger a restart into a  |
 | kernel loaded by kexec instead of going through the full system boot      |
 | loader process.                                                           |
 |                                                                           |
 | Should kexec-tools handle reboots (sysvinit only)?                        |
 |                                                                           |
 |                    <Yes>                       <No>                       |
 |                                                                           |
 |---------------------------------------------------------------------------|

Select ‘Yes’ to have kexec-tools handle all reboots.

 |------------------------| Configuring kdump-tools |------------------------|
 |                                                                           |
 |                                                                           |
 | If you choose this option, the kdump-tools mechanism will be enabled.  A  |
 | reboot is still required in order to enable the crashkernel kernel        |
 | parameter.                                                                |
 |                                                                           |
 | Should kdump-tools be enabled be default?                                 |
 |                                                                           |
 |                    <Yes>                       <No>                       |
 |                                                                           |
 |---------------------------------------------------------------------------|

‘Yes’ should be selected here as well, to enable kdump-tools.

If you ever need to manually enable the functionality, you can use the dpkg-reconfigure kexec-tools and dpkg-reconfigure kdump-tools commands and answer ‘Yes’ to the questions. You can also edit /etc/default/kexec and set parameters directly:

# Load a kexec kernel (true/false)
LOAD_KEXEC=true

Also edit /etc/default/kdump-tools to enable kdump by including the following line:

USE_KDUMP=1
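To check from a script whether this setting is in effect, one option is to source the file in a subshell and test the variable. This is only a sketch: the function name kdump_enabled is hypothetical, and it assumes the file contains plain shell variable assignments.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when USE_KDUMP=1 is set in the given file.
kdump_enabled() {
    local file=${1:-/etc/default/kdump-tools}
    # Source the file in a subshell so its variables do not leak into the caller.
    ( USE_KDUMP=0; . "$file" 2>/dev/null; [ "$USE_KDUMP" = "1" ] )
}
```

For example, kdump_enabled || echo "kdump is not enabled" reports a disabled or missing configuration.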

If a reboot has not been done since installation of the linux-crashdump package, a reboot will be required in order to activate the crashkernel= boot parameter. Upon reboot, kdump-tools will be enabled and active.

If you enable kdump-tools after a reboot, you will only need to issue the kdump-config load command to activate the kdump mechanism.

You can view the current status of kdump via the command kdump-config show. This will display something like this:

DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 
   /var/lib/kdump/vmlinuz
kdump initrd: 
   /var/lib/kdump/initrd.img
current state:    ready to kdump
kexec command:
  /sbin/kexec -p --command-line="..." --initrd=...

This tells us that we will find core dumps in /var/crash.
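For monitoring purposes it can be useful to extract only the state line from that output. A minimal sketch, assuming the output format shown above; the function name kdump_state is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `kdump-config show` output on stdin and prints
# the text that follows "current state:".
kdump_state() {
    sed -n 's/^current state:[[:space:]]*//p'
}
```

It would be used as kdump-config show | kdump_state, and the result compared against the string "ready to kdump".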

Configuration

In addition to local dump, it is now possible to use the remote dump functionality to send the kernel crash dump to a remote server, using either the SSH or NFS protocols.

Local kernel crash dumps

Local dumps are configured automatically and will remain in use unless a remote protocol is chosen. Many configuration options exist and are thoroughly documented in the /etc/default/kdump-tools file.

Remote kernel crash dumps using the SSH protocol

To enable remote dumps using the SSH protocol, the /etc/default/kdump-tools file must be modified in the following manner:

# ---------------------------------------------------------------------------
# Remote dump facilities:
# SSH - username and hostname of the remote server that will receive the dump
#       and dmesg files.
# SSH_KEY - Full path of the ssh private key to be used to login to the remote
#           server. use kdump-config propagate to send the public key to the
#           remote server
# HOSTTAG - Select if hostname or IP address will be used as a prefix to the
#           timestamped directory when sending files to the remote server.
#           'ip' is the default.
SSH="ubuntu@kdump-netcrash"

The only mandatory variable to define is SSH. It must contain the username and hostname of the remote server using the format {username}@{remote server}.

SSH_KEY may be used to provide an existing private key to be used. Otherwise, the kdump-config propagate command will create a new keypair. The HOSTTAG variable may be used to use the hostname of the system as a prefix to the remote directory to be created instead of the IP address.
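Because a malformed SSH value only shows up when a dump is attempted, a quick format check can help. The sketch below only tests for the {username}@{remote server} shape; the function name valid_ssh_target is hypothetical and it does not verify that the host is reachable.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the argument looks like user@host.
valid_ssh_target() {
    case "$1" in
        # Reject more than one '@', an empty user or host, and any whitespace.
        *@*@*|@*|*@|*[[:space:]]*) return 1 ;;
        *@*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, valid_ssh_target "ubuntu@kdump-netcrash" succeeds, while valid_ssh_target "kdump-netcrash" fails because the username is missing.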

The following example shows how kdump-config propagate is used to create and propagate a new keypair to the remote server:

sudo kdump-config propagate

Which produces an output like this:

Need to generate a new ssh key...
The authenticity of host 'kdump-netcrash (192.168.1.74)' can't be established.
ECDSA key fingerprint is SHA256:iMp+5Y28qhbd+tevFCWrEXykDd4dI3yN4OVlu3CBBQ4.
Are you sure you want to continue connecting (yes/no)? yes
ubuntu@kdump-netcrash's password: 
propagated ssh key /root/.ssh/kdump_id_rsa to server ubuntu@kdump-netcrash

The password of the account used on the remote server will be required in order to successfully send the public key to the server.

The kdump-config show command can be used to confirm that kdump is correctly configured to use the SSH protocol:

kdump-config show

Whose output appears like this:

DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2c000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.4.0-10-generic
kdump initrd: 
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-10-generic
SSH:              ubuntu@kdump-netcrash
SSH_KEY:          /root/.ssh/kdump_id_rsa
HOSTTAG:          ip
current state:    ready to kdump

Remote kernel crash dumps using the NFS protocol

To enable remote dumps using the NFS protocol, the /etc/default/kdump-tools file must be modified in the following manner:

# NFS -     Hostname and mount point of the NFS server configured to receive
#           the crash dump. The syntax must be {HOSTNAME}:{MOUNTPOINT} 
#           (e.g. remote:/var/crash)
#
NFS="kdump-netcrash:/var/crash"

As with the SSH protocol, the HOSTTAG variable can be used to replace the IP address by the hostname as the prefix of the remote directory.

The kdump-config show command can be used to confirm that kdump is correctly configured to use the NFS protocol:

kdump-config show

Which produces an output like this:

DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2c000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.4.0-10-generic
kdump initrd: 
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-10-generic
NFS:              kdump-netcrash:/var/crash
HOSTTAG:          hostname
current state:    ready to kdump

Verification

To confirm that the kernel dump mechanism is enabled, there are a few things to verify. First, confirm that the crashkernel boot parameter is present (note that the following line has been split into two to fit the format of this document):

cat /proc/cmdline
    
BOOT_IMAGE=/vmlinuz-3.2.0-17-server root=/dev/mapper/PreciseS-root ro
     crashkernel=384M-2G:64M,2G-:128M
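The same check can be scripted. The following sketch extracts the crashkernel value from a kernel command line; the function name crashkernel_param is hypothetical, and with no argument it reads /proc/cmdline.

```shell
#!/bin/bash
# Hypothetical helper: prints the value of the first crashkernel= parameter,
# or nothing if the parameter is absent.
crashkernel_param() {
    # With no argument, read the running kernel's command line.
    local cmdline=${1:-$(cat /proc/cmdline)}
    printf '%s\n' "$cmdline" | tr ' ' '\n' | sed -n 's/^crashkernel=//p' | head -n 1
}
```

An empty result means that no memory reservation was requested at boot.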

The crashkernel parameter has the following syntax:

crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
    range=start-[end] 'start' is inclusive and 'end' is exclusive.

So for the crashkernel parameter found in /proc/cmdline we would have:

crashkernel=384M-2G:64M,2G-:128M

The above value means:

  • if the RAM is smaller than 384M, then don’t reserve anything (this is the “rescue” case)

  • if the RAM size is between 384M (inclusive) and 2G (exclusive), then reserve 64M

  • if the RAM size is 2G or larger, then reserve 128M
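The range selection described above can be sketched in shell. This is an illustration of the rule only, not the kernel's parser: the function names to_mb and crashkernel_size are hypothetical, only M and G suffixes are handled, and the optional @offset part is not supported.

```shell
#!/bin/bash
# Hypothetical helper: convert a size such as 384M or 2G to megabytes.
to_mb() {
    local v=$1
    case "$v" in
        *G) echo $(( ${v%G} * 1024 )) ;;
        *M) echo "${v%M}" ;;
        *)  return 1 ;;
    esac
}

# Hypothetical helper: print the reservation selected for a RAM size in MB.
# Usage: crashkernel_size "384M-2G:64M,2G-:128M" 1023
crashkernel_size() {
    local spec=$1 ram_mb=$2 entry range size start end
    local IFS=,
    for entry in $spec; do
        range=${entry%%:*}              # e.g. 384M-2G
        size=${entry#*:}                # e.g. 64M
        start=$(to_mb "${range%%-*}")   # inclusive lower bound
        end=${range#*-}                 # exclusive upper bound, may be empty
        if [ -n "$end" ]; then end=$(to_mb "$end"); fi
        if [ "$ram_mb" -ge "$start" ] && { [ -z "$end" ] || [ "$ram_mb" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    # No range matched, so nothing is reserved.
    echo "0"
}
```

With the value above, a system reporting 1023 MB of RAM falls in the first range and gets 64M, while a system with less than 384 MB gets no reservation.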

Second, verify that the kernel has reserved the requested memory area for the kdump kernel by running:

dmesg | grep -i crash

Which produces the following output in this case:

...
[    0.000000] Reserving 64MB of memory at 800MB for crashkernel (System RAM: 1023MB)

Finally, as seen previously, the kdump-config show command displays the current status of the kdump-tools configuration:

kdump-config show

Which produces:

DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x2c000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.4.0-10-generic
kdump initrd: 
      /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.4.0-10-generic
current state:    ready to kdump

kexec command:
      /sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-4.4.0-10-generic root=/dev/mapper/VividS--vg-root ro debug break=init console=ttyS0,115200 irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz

Testing the crash dump mechanism

Warning:
Testing the crash dump mechanism will cause a system reboot. In certain situations, this can cause data loss if the system is under heavy load. If you want to test the mechanism, make sure that the system is idle or under very light load.

Verify that the SysRQ mechanism is enabled by looking at the value of the /proc/sys/kernel/sysrq kernel parameter:

cat /proc/sys/kernel/sysrq

If a value of 0 is returned, the dump and then reboot feature is disabled. A value greater than 1 indicates that a sub-set of sysrq features is enabled. See /etc/sysctl.d/10-magic-sysrq.conf for a detailed description of the options and their default values. Enable dump then reboot testing with the following command:

sudo sysctl -w kernel.sysrq=1
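The three cases described above (0, 1, and a value greater than 1) can be told apart with a short helper. This is a sketch only; the function name describe_sysrq and its output strings are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: describe a kernel.sysrq value in words.
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "subset enabled (bitmask $1)" ;;
    esac
}
```

It would typically be called as describe_sysrq "$(cat /proc/sys/kernel/sysrq)".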

Once this is done, you must become root, as just prefixing the command with sudo will not be sufficient: in sudo echo c > /proc/sysrq-trigger, the redirection is performed by your own unprivileged shell rather than by the command running under sudo. As the root user, issue the command echo c > /proc/sysrq-trigger. If you are using a network connection, you will lose contact with the system, so it is better to run the test while connected to the system console. This has the added advantage of making the kernel dump process visible.
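A common alternative to opening a root shell is to pipe the character into tee running under sudo, so that the privileged process performs the write. The sketch below wraps this in a hypothetical function with a DRY_RUN guard (both are illustrative assumptions, not part of kdump-tools); by default it only prints the command, because running it for real crashes the system immediately.

```shell
#!/bin/bash
# Hypothetical wrapper. DRY_RUN defaults to 1, so nothing is written to
# /proc/sysrq-trigger unless DRY_RUN=0 is set explicitly.
trigger_crash() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: echo c | sudo tee /proc/sysrq-trigger"
    else
        # tee runs as root and performs the write; this crashes the system.
        echo c | sudo tee /proc/sysrq-trigger > /dev/null
    fi
}
```

Calling DRY_RUN=0 trigger_crash performs the actual test, subject to the warning above.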

A typical test output should look like the following:

sudo -s
[sudo] password for ubuntu: 
# echo c > /proc/sysrq-trigger
[   31.659002] SysRq : Trigger a crash
[   31.659749] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   31.662668] IP: [<ffffffff8139f166>] sysrq_handle_crash+0x16/0x20
[   31.662668] PGD 3bfb9067 PUD 368a7067 PMD 0 
[   31.662668] Oops: 0002 [#1] SMP 
[   31.662668] CPU 1 
....

The rest of the output is truncated, but you should see the system rebooting, and somewhere in the log you will see the following line:

Begin: Saving vmcore from kernel crash ...

Once completed, the system will reboot to its normal operational mode. You will then find the kernel crash dump file and related subdirectories in the /var/crash directory. Running, for example, ls /var/crash produces the following:

201809240744  kexec_cmd  linux-image-4.15.0-34-generic-201809240744.crash
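When several dumps accumulate, the most recent one can be located by name. The sketch below assumes the timestamped directory naming shown above (YYYYMMDDhhmm) and GNU find; the function name latest_dump is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the path of the newest timestamped dump directory.
latest_dump() {
    local dir=${1:-/var/crash}
    # Directory names are timestamps, so a lexical sort is also chronological.
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name '[0-9]*' | sort | tail -n 1
}
```

For example, ls "$(latest_dump)" lists the files captured by the most recent crash.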

If the dump does not work due to an ‘out of memory’ (OOM) error, then try increasing the amount of reserved memory by editing /etc/default/grub.d/kdump-tools.cfg. For example, to reserve 512 megabytes:

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=384M-:512M"

You can then run sudo update-grub, reboot afterwards, and then test again.

Resources

Kernel crash dump is a vast topic that requires good knowledge of the Linux kernel. You can find more information on the topic here:

Hello Powersj,
I followed the instructions on Ubuntu 20.04 LTS running on an EFI-only system with secure boot enabled and I can’t seem to activate kdump. I get the following error:

 #kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x64000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-5.4.0-66-lowlatency
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-5.4.0-66-lowlatency
current state:    Not ready to kdump

kexec command:
  no kexec command recorded


 #kdump-config load
 * Creating symlink /var/lib/kdump/vmlinuz
 * Creating symlink /var/lib/kdump/initrd.img
kexec_file_load failed: Operation not permitted
 * failed to load kdump kernel

#cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.4.0-66-lowlatency root=/dev/mapper/vgdata-lvroot ro console=tty0 console=ttyS6,115200n8 rootdelay=5 net.ifnames=0 ipv6.disable=1 crashkernel=512M-:192M

Any suggestions?

Hello Powersj, I can no longer boot either Linux or Windows. When I turn on the computer, a console appears and I am asked to load the kernel first. However, I don’t know how to proceed from the console,

since the sudo commands cannot be found.

Hi haimiko, sorry to hear you had trouble with kdump and secure boot. Hopefully you’ve long since resolved the issue. If you did, and you have some copyedit suggestions to improve this document, please follow up with the suggested text. Unfortunately this Discourse document is not a great way to do troubleshooting of usage issues, so if you happen to still be looking for help I’d recommend other technical support channels.

The kernel crash dump guide needs updates

Here is a link to each of the events that can cause a kernel disruption
https://wiki.ubuntu.com/Kernel/CrashdumpRecipe

echo 1 > /proc/sys/kernel/hung_task_panic # panic when hung task is detected
echo 1 > /proc/sys/kernel/panic_on_io_nmi # panic on NMIs from I/O
echo 1 > /proc/sys/kernel/panic_on_oops # panic on oops or kernel bug detection
echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi # panic on NMIs from memory or unknown
echo 1 > /proc/sys/kernel/softlockup_panic # panic when soft lockups are detected
echo 1 > /proc/sys/vm/panic_on_oom # panic when out-of-memory happens

Changes to the default prompts shown after installing linux-crashdump.
During the installation, you will be prompted with the following dialogs.
You can select ‘No’ for kexec-tools handling reboots. In fact, it is recommended to select ‘No’, as this is the current default, and this step is unrelated to kernel panics and kernel crash dump collection; it applies to all reboots. ‘Yes’ is not actually required. It changes the reboot behavior so that the system does not reboot in the traditional fashion (soft power cycle), which might cause issues (e.g. some devices might not correctly handle a software-only re-initialization instead of an actual soft power cycle). So ‘No’ is a valid and safe choice, while ‘Yes’ may be seen as a reboot-time ‘optimization’ if the system and its devices handle it well.

‘Yes’ should be selected here as well, to enable kdump-tools.
Remove the ‘as well’, per the above.

You can also edit /etc/default/kexec and set parameters directly:
this needs to be updated to say /etc/default/kdump-tools

If you enable kdump-tools after a reboot, you will only need to issue the kdump-config load command to activate the kdump mechanism.

I don’t believe this is a correct statement. It should read
“If you enable kdump-tools after a reboot with the crashkernel= parameter in place, you will only need…”, since without the memory reservation for the crash kernel to be kexec’ed into, there’s little that kdump-tools itself can do about it.

Begin: Saving vmcore from kernel crash …
This log line is out of date.

It needs to say this:

Started Kernel crash dump capture service

Troubleshooting:
If you are trying to capture a crash dump on a cloud such as AWS, please add the following lines
at the bottom of /etc/sysctl.conf.

kernel.sysrq=1
kernel.unknown_nmi_panic=1

Yeah, I agree with @hypothetical-lemon, this looks quite out of date.

Is there a server technical author who could improve this?

Thanks for this heads up. I took a note internally to discuss with the team how to handle this topic.

Also, capturing a crash dump in a cloud like AWS is slightly different from doing it in, say, Azure.

Could you additionally clarify the statement:
Starting with 16.04, the kernel crash dump mechanism is enabled by default.

This (to me) implies that linux-crashdump is installed in 16.04 and beyond by default, but this isn’t the case on my 22.04 jammy stock image, nor is it the case on public cloud images (I’m not sure it is anywhere). Does this mean something else maybe?