Kernel Crash Trace Log

From Proxmox VE
Revision as of 12:40, 28 November 2016 by Wolfgang Link (talk | contribs)

Introduction

  • Sometimes you cannot get a trace log from a crashed kernel because the trace is never written to a file on any filesystem. This guide shows how to capture a kernel crash log anyway.
  • The easiest way to achieve this is to send the kernel messages over the network to a remote system. This example uses two Proxmox VE 2.X hosts: server1 is the host to debug, and server2 is the host that captures the log.
  • Server1 IP: 10.10.10.1
  • Server2 IP: 10.10.10.2

Server1 configuration

  • Run the following command to load the netconsole module:
modprobe netconsole netconsole=@10.10.10.1/,@10.10.10.2/

or with a MAC address (the MAC address is needed if a switch does not allow broadcasts or if server2 is in another network, i.e. behind a router)

modprobe netconsole netconsole=@10.10.10.1/,@10.10.10.2/xx:xx:xx:xx:xx:xx

where xx:xx:xx:xx:xx:xx is the MAC address of server2, or the MAC address of the router if server2 is behind one
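
The two modprobe variants above differ only in the optional MAC suffix. As a minimal sketch, the parameter string can be assembled by a small helper; the function name netconsole_param is hypothetical and not part of netconsole itself, and the source and target ports are left empty so the kernel defaults apply.

```shell
# Hypothetical helper: build the value for the netconsole= module parameter.
# Usage: netconsole_param SRC_IP TGT_IP [TGT_MAC]
netconsole_param() {
    src_ip=$1
    tgt_ip=$2
    tgt_mac=${3:-}   # optional; with an empty MAC the kernel uses broadcast
    printf '@%s/,@%s/%s\n' "$src_ip" "$tgt_ip" "$tgt_mac"
}

# Example (run as root on server1):
#   modprobe netconsole netconsole="$(netconsole_param 10.10.10.1 10.10.10.2)"
```

Passing a third argument appends the MAC address of server2 or of the router after the final slash.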

Finding the MAC address

  1. ping 10.10.10.2
    or "ping default_gateway_IP" if server2 is behind a router; run "ip route | grep default" to find the gateway IP
  2. arp -n 10.10.10.2
    or "arp -n default_gateway_IP"
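
On systems where arp is not installed, the same lookup can be done with "ip neigh", which reads the same neighbour table. The following is a sketch only: it swaps in "ip neigh" for the "arp -n" step above, and the helper name mac_from_neigh is hypothetical.

```shell
# Hypothetical helper: print the MAC address found in `ip neigh show`
# output read from stdin (the field that follows the word "lladdr").
mac_from_neigh() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "lladdr") { print $(i + 1); exit } }'
}

# Example (run on server1 after pinging the target once):
#   ping -c 1 10.10.10.2 >/dev/null
#   ip neigh show 10.10.10.2 | mac_from_neigh
```

If the neighbour entry has no resolved address yet (for example a FAILED entry), the helper prints nothing.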

Server2 configuration

  1. Create a new file /etc/rsyslog.d/01-netconsole-collector.conf with the following content:
    # Start UDP server on port 6666
    $ModLoad imudp
    $UDPServerRun 6666
    
    # Define templates
    $template NetconsoleFile,"/var/log/netconsole/%fromhost-ip%.log"
    $template NetconsoleFormat,"%rawmsg%"
    
    # Accept end-of-line characters (unfortunately these options are global)
    $EscapeControlCharactersOnReceive off
    $DropTrailingLFOnReception off
    
    # Store logs received from remote hosts using the templates (local messages are skipped)
    :fromhost-ip, !isequal, "127.0.0.1"     ?NetconsoleFile;NetconsoleFormat
    
    # Discard messages that matched the rule above
    & ~
    
  2. Restart rsyslog
    /etc/init.d/rsyslog restart
    
  • Note: it is a good idea to disable this configuration once you have finished debugging, because $EscapeControlCharactersOnReceive and $DropTrailingLFOnReception are global options that change the default behaviour of rsyslog.
mv /etc/rsyslog.d/01-netconsole-collector.conf /etc/rsyslog.d/01-netconsole-collector.conf.disabled
/etc/init.d/rsyslog restart

Examination

  • You can check that everything works by intentionally crashing the kernel (be careful: this is a real crash). Run the following command on server1:
echo c > /proc/sysrq-trigger
  • Server1 will crash, and the crash log should appear in /var/log/netconsole/10.10.10.1.log on server2.
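
Before triggering a real crash, it may be worth checking the path from server1 to server2 with a harmless kernel message. The sketch below assumes /dev/kmsg is writable by root on server1; both helper names are hypothetical. The log path follows the NetconsoleFile template from the server2 configuration.

```shell
# Hypothetical helper: path of the per-host log file on server2,
# following the NetconsoleFile template (/var/log/netconsole/<source IP>.log).
netconsole_log_path() {
    printf '/var/log/netconsole/%s.log\n' "$1"
}

# Hypothetical helper: write a marker line into the kernel log (root required).
# netconsole forwards kernel messages, so the marker should reach server2.
send_kmsg_marker() {
    printf '%s\n' "$1" > /dev/kmsg
}

# Example:
#   on server1:  send_kmsg_marker "netconsole test from server1"
#   on server2:  tail "$(netconsole_log_path 10.10.10.1)"
```

If the marker does not show up, the console log level on server1 may be filtering it out; raising it with "dmesg -n 8" before sending the marker may help.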