2010-03-10 13:23:09
----------------------------
Part 1
----------------------------
Kexec is a fastboot mechanism that allows booting a Linux kernel from the context of an already running kernel without going through the BIOS. BIOS initialization can be very time-consuming, especially on big servers with numerous peripherals, so kexec can save a lot of time for developers who reboot a machine frequently.
Kdump is a new kernel crash dumping mechanism and is very reliable. The crash dump is captured from the context of a freshly booted kernel and not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel whenever the system crashes. This second kernel, often called a capture kernel, boots with very little memory and captures the dump image.
The first kernel reserves a section of memory that the second kernel uses to boot. Be aware that kdump reserves a significant amount of memory at boot time, which changes the actual minimum memory requirements of Red Hat Enterprise Linux 5. To compute the actual minimum for a system, take the listed minimum memory requirements and add the amount of memory reserved by kdump.
Kexec boots the capture kernel without going through the BIOS, so the contents of the first kernel's memory are preserved; this preserved memory is essentially the kernel crash dump.
Verify the kexec-tools package is installed:
# rpm -q kexec-tools
If it is not installed, install it via yum:
# yum install kexec-tools
The location of the Kdump vmcore must be specified in /etc/kdump.conf. Not specifying the vmcore location will result in undefined behavior. You can either dump directly to a device, to a file, or to some location on the network via NFS or SSH.
You can configure Kdump to dump directly to a device by using the raw directive in kdump.conf. The syntax is:
raw devicename
For example:
raw /dev/sda1
Please be aware that this will overwrite any data that was previously on the device.
Kdump can be configured to mount a partition and dump to a file on disk. This is done by specifying the filesystem type, followed by the partition, in kdump.conf. The partition may be specified as a device name, a filesystem label, or a UUID, in the same manner as in /etc/fstab.
The default directory in which the core will be dumped is /var/crash/%DATE/, where %DATE is the current date at the time of the crash dump. For example:
ext3 /dev/sda1
will mount /dev/sda1 as an ext3 device and dump the core to /var/crash/, whereas
ext3 LABEL=/boot
will mount the ext3 device with the label /boot. On most Red Hat Enterprise Linux installations, this will be the /boot directory. The easiest way to find out how to specify the device is to look at what you're currently using in /etc/fstab. The default dump directory can be changed by using the path directive in kdump.conf. For example:
ext3 UUID=f15759be-89d4-46c4-9e1d-1b67e5b5da82
path /usr/local/cores
will dump the vmcore to /usr/local/cores/ instead of the default /var/crash/ location.
To configure kdump to dump to an NFS mount, edit /etc/kdump.conf and add a line with the following format:
net servername:/nfs/mount
For example:
net nfs.example.com:/export/vmcores
This will dump the vmcore to /export/vmcores/var/crash/[hostname]-[date] on the server nfs.example.com. The client system must have write access to this mount point.
SSH has the advantage of encrypting network communications while dumping. For this reason, it is the best solution when you're required to dump a vmcore across a publicly accessible network such as the Internet or a corporate WAN. To dump over SSH, add a line with the following format to /etc/kdump.conf:
net user@servername
For example:
net kdump@crash.example.com
In this case, kdump will use scp to connect to the crash.example.com server using the kdump user. It will copy the vmcore to the /var/crash/[host]-[date]/ directory. The kdump user will need the necessary write permissions on the remote server.
To make this change take effect, run service kdump propagate, which should result in output similar to the following:
Generating new ssh keys... done,
kdump@crash.example.com's password:
/root/.ssh/kdump_id_rsa.pub has been added to
~kdump/.ssh/authorized_keys2 on crash.example.com
On large memory systems, it is advisable both to discard pages that are not needed and to compress the remaining pages. This is done in kdump.conf with the core_collector directive. At this point in time, the only fully supported core collector is makedumpfile. Its options can be viewed with makedumpfile --help. The -d option specifies which types of pages should be left out. The option is a bit mask, with each page type assigned a value as follows:
zero pages = 1
cache pages = 2
cache private = 4
user pages = 8
free pages = 16
In general, these pages don't contain relevant information. To set all these flags and leave out these pages, use a value of -d 31. The -c option tells makedumpfile to compress the remaining data pages.
#core_collector makedumpfile -d 1 # throw out zero pages (containing no data)
#core_collector makedumpfile -d 31 # throw out all trivial pages
#core_collector makedumpfile -c # compress all pages, but leave them all
core_collector makedumpfile -d 31 -c # throw out trivial pages and compress (recommended)
Keep in mind that using the -d and -c options will marginally increase the amount of time required to gather a core.
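The bit mask arithmetic behind the -d value can be sketched in a few lines of Python. This is only an illustration built from the flag values listed above; the names PAGE_FLAGS, dump_level and excluded_pages are hypothetical and are not part of makedumpfile.

```python
# Page-type flags for "makedumpfile -d", taken from the list in the text.
# The dictionary and function names are illustrative, not a makedumpfile API.
PAGE_FLAGS = {
    "zero": 1,
    "cache": 2,
    "cache_private": 4,
    "user": 8,
    "free": 16,
}


def dump_level(*page_types):
    """Return the -d value that leaves out the given page types."""
    level = 0
    for name in page_types:
        level |= PAGE_FLAGS[name]
    return level


def excluded_pages(level):
    """Return the page types whose flag bit is set in a -d value."""
    return [name for name, bit in PAGE_FLAGS.items() if level & bit]


print(dump_level("zero", "cache", "cache_private", "user", "free"))  # 31
print(excluded_pages(17))  # ['zero', 'free']
```

Setting every flag gives 1 + 2 + 4 + 8 + 16 = 31, which is the value used in the recommended core_collector line.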
Modify some boot parameters to reserve a chunk of memory for the capture kernel. For i386 and x86_64 architectures, edit /etc/grub.conf, and append crashkernel=128M@16M to the end of the kernel line.
Note: It may be possible to use less than 128M, but testing with only 64M has proven unreliable. This is an example of /etc/grub.conf with the kdump options added:
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE: You do not have a /boot partition. This means that
# all kernel and initrd paths are relative to /, eg.
# root (hd0,0)
# kernel /boot/vmlinuz-version ro root=/dev/hda1
# initrd /boot/initrd-version.img
#boot=/dev/hda
default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Client (2.6.17-1.2519.4.21.el5)
root (hd0,0)
kernel /boot/vmlinuz-2.6.17-1.2519.4.21.el5 ro root=LABEL=/ rhgb quiet crashkernel=128M@16M
initrd /boot/initrd-2.6.17-1.2519.4.21.el5.img
After making the above changes, reboot the system. The 128M of memory (starting 16M into the memory) is left untouched by the normal system, reserved for the capture kernel. Take note that the output of free -m shows 128M less memory than without this parameter, which is expected.
Now that the reserved memory region is set up, turn on the kdump init script and start the service:
# chkconfig kdump on
# service kdump start
This will create a /boot/initrd-kdump.img, leaving the system ready to capture a vmcore upon crashing. To test this, force-crash the system using sysrq:
Warning: This will panic your kernel, killing all services on the machine
# echo "c" > /proc/sysrq-trigger
This causes the kernel to panic, followed by the system restarting into the kdump kernel. When the boot process gets to the point where it starts the kdump service, the vmcore should be copied out to disk to the location you specified in the /etc/kdump.conf file.
NOTE: Console frame-buffers and X are not properly supported. On a system typically run with something like "vga=791" in the kernel config line or with X running, console video will be garbled when a kernel is booted via kexec. Note that the kdump kernel should still be able to create a dump, and when the system reboots, video should be restored to normal.
NOTE: After any changes to /etc/kdump.conf the kdump service needs to be restarted to load the new kdump settings in the kernel.
# service kdump restart
----------------------------
Part 2
----------------------------
This page describes how to configure open sharedroot clusters with kernel dump functionality enabled.
With the introduction of RHEL5, the way kernel memory is dumped to disk after a kernel crash changed completely. The old kernel modules diskdump and netdump were replaced by a generic mechanism that boots a second kernel into a memory segment allocated in advance (at boot time). Whenever the system crashes, a new kernel is booted into this segment with the kexec utilities. Because a completely new kernel is booted and can load any type of module needed, this is a more reliable and flexible way of writing vmcores to many different storage devices.
First of all, the kernel has to be given a boot parameter in order to allocate the memory segment that holds the rescue kernel. The parameter should be specified with your favourite bootloader. It is called crashkernel, and the advised default is 128M@16M. That means the kernel reserves 128 MByte of RAM, starting at an offset of 16 MByte. A valid open sharedroot cluster kernel command line could look as follows:
[root@test ~]# cat /proc/cmdline
root=/dev/vg_streaming_sr/lv_sharedroot rhgb quiet com-debug crashkernel=128M@16M
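To make the size@offset notation concrete, here is a small Python sketch that extracts the reservation from a kernel command line. It assumes only the simple crashkernel=SIZE@OFFSET form with K, M or G suffixes shown above; the function name parse_crashkernel is hypothetical.

```python
import re

# Multipliers for the size suffixes; only the simple SIZE@OFFSET form is handled.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes) for crashkernel=SIZE@OFFSET, or None."""
    match = re.search(r"\bcrashkernel=(\d+)([KMG])@(\d+)([KMG])\b", cmdline)
    if match is None:
        return None
    size, size_unit, offset, offset_unit = match.groups()
    return int(size) * UNITS[size_unit], int(offset) * UNITS[offset_unit]


# Command line taken from the example above.
cmdline = ("root=/dev/vg_streaming_sr/lv_sharedroot rhgb quiet "
           "com-debug crashkernel=128M@16M")
size, offset = parse_crashkernel(cmdline)
print(size // 1024 ** 2, offset // 1024 ** 2)  # 128 16
```

On a running system the same string could be read from /proc/cmdline instead of being hard-coded.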
As a second prerequisite, it is a good idea to have the debuginfo rpms and the crash utility for the kernels installed. If you are using the core_collector (see below), these rpms are required. To install them on RHEL5, do the following:
yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash
For the time being, it is assumed that there is a disk partition formatted with ext3 that can hold the crash images. This could also be reconfigured to be an NFS share or the like, but that is not yet tested. If the partition is /dev/sdd1, the filesystem would be created with mkfs.ext3 -Lcrash /dev/sdd1.
Two files influence how kdump behaves: /etc/sysconfig/kdump and /etc/kdump.conf.
The only option that should be configured in /etc/sysconfig/kdump is the location of the kdump kernel and initrd. This is because an open sharedroot cluster has no /boot filesystem mounted. We therefore create a directory called /var/lib/kdump and set the option KDUMP_BOOTDIR in /etc/sysconfig/kdump to point to it:
#Where to find the boot image
KDUMP_BOOTDIR="/var/lib/kdump"
/etc/kdump.conf holds the more interesting kdump parameters. The relevant ones are described below.
<fs type> <partition>: Will mount -t <fs type> <partition> /mnt and copy /proc/vmcore to /mnt/var/crash/%DATE/. NOTE: <partition> can be a device node, label or UUID. The modules needed to mount the filesystem should also be specified.
extra_modules <module(s)>: This directive allows you to specify extra kernel modules that you want to be loaded in the kdump initrd, typically used to set up access to non-boot-path dump targets that might otherwise not be accessible in the kdump environment. Multiple modules can be listed, separated by a space, and any dependent modules will automatically be included. NOTE: Even for ext3 you'll need ext3 and jbd.
core_collector makedumpfile <options>: This directive allows you to use the dump filtering program makedumpfile to retrieve your core, which on some arches can drastically reduce core file size. See /sbin/makedumpfile --help for a list of options. NOTE: the -i and -g options are not needed here, as the initrd will automatically be populated with a config file appropriate for the running kernel.
dump level | zero page | cache page | cache private | user data | free page
0          |           |            |               |           |
1          | X         |            |               |           |
2          |           | X          |               |           |
4          |           | X          | X             |           |
8          |           |            |               | X         |
16         |           |            |               |           | X
31         | X         | X          | X             | X         | X
The fence post fail delay should be adapted so that a dump can be written before the node is fenced. The maximum time advised by Red Hat is somewhere around 30 seconds. The maximum time needed to write a dump can be estimated as follows, assuming an average disk write rate of 50 MB/sec. With fully utilized memory, the dump level will not influence the time. Nevertheless, using 31 is on average the fastest.
time[sec] = RAM size in MB / 50 MB/sec
6 GB RAM (6144 MB) would result in a maximum of about 123 sec (2 min 3 sec)
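The estimate above can be reproduced with a short Python calculation. This is a minimal sketch of the stated formula; the function name max_dump_seconds is hypothetical, and the 50 MB/sec write rate is the average assumed in the text, not a measured value.

```python
def max_dump_seconds(ram_mb, write_rate_mb_per_sec=50):
    """Upper bound for the dump time: RAM size divided by the disk write rate."""
    return ram_mb / write_rate_mb_per_sec


# 6 GB of RAM, expressed in MB.
seconds = round(max_dump_seconds(6 * 1024))
minutes, rest = divmod(seconds, 60)
print(seconds)        # 123
print(minutes, rest)  # 2 3
```

For other machines, pass the installed RAM in MB and, if known, a write rate measured on the dump target.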
An example /etc/kdump.conf for this setup:
ext3 LABEL=crash
core_collector makedumpfile -d 31
extra_modules cciss ext3 jbd
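Since forgetting the filesystem modules is an easy mistake, a configuration like this can be checked with a short Python sketch. It is an illustration only: the parser assumes one directive per line with "#" starting a comment, the ext3 requirement comes from the NOTE on extra_modules, and the names parse_kdump_conf and missing_modules are hypothetical.

```python
# Modules needed in extra_modules for a given dump filesystem.
# Only ext3 is listed because it is the only case stated in the text.
FS_MODULES = {"ext3": ["ext3", "jbd"]}


def parse_kdump_conf(text):
    """Return a dict of directive -> value, ignoring comments and blank lines."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        directives[name] = value.strip()
    return directives


def missing_modules(text):
    """Return the filesystem modules that extra_modules does not list."""
    directives = parse_kdump_conf(text)
    loaded = directives.get("extra_modules", "").split()
    missing = []
    for fs_type, needed in FS_MODULES.items():
        if fs_type in directives:
            missing += [module for module in needed if module not in loaded]
    return missing


conf = """ext3 LABEL=crash
core_collector makedumpfile -d 31
extra_modules cciss ext3 jbd
"""
print(missing_modules(conf))  # []
```

An empty list means every module required for the dump filesystem is already listed in extra_modules.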
The corresponding /etc/sysconfig/kdump:
# Kernel Version string for the -kdump kernel, such as 2.6.13-1544.FC5kdump
# If no version is specified, then the init script will try to find a
# kdump kernel with the same version number as the running kernel.
KDUMP_KERNELVER=""
# The kdump commandline is the command line that needs to be passed off to
# the kdump kernel. This will likely match the contents of the grub kernel
# line. For example:
# KDUMP_COMMANDLINE="ro root=LABEL=/"
# If a command line is not specified, the default will be taken from
# /proc/cmdline
KDUMP_COMMANDLINE=""
# This variable lets us append arguments to the current kdump commandline
# As taken from either KDUMP_COMMANDLINE above, or from /proc/cmdline
KDUMP_COMMANDLINE_APPEND="irqpoll maxcpus=1"
# Any additional kexec arguments required. In most situations, this should
# be left empty
#
# Example:
# KEXEC_ARGS="--elf32-core-headers"
KEXEC_ARGS=" --args-linux"
#Where to find the boot image
KDUMP_BOOTDIR="/var/lib/kdump"
#What is the image type used for kdump
KDUMP_IMG="vmlinuz"
#What is the images extension. Relocatable kernels don't have one
KDUMP_IMG_EXT=""
Example console output from the kdump kernel while a dump is being saved:
dm_task_set_name: Device /dev/mapper/Groups-Volume not found
Command failed
connect() failed on local socket: Connection refused
Skipping clustered volume group vg_streaming_data
Skipping clustered volume group vg_streaming_sr
dm_task_set_name: Device /dev/mapper/-LV not found
Command failed
connect() failed on local socket: Connection refused
Skipping clustered volume group vg_streaming_data
Skipping clustered volume group vg_streaming_sr
connect() failed on local socket: Connection refused
Skipping clustered volume group vg_streaming_data
Skipping clustered volume group vg_streaming_sr
dm_task_set_name: Device /dev/mapper/-lv_tmp not found
Command failed
Saving to the local filesystem /dev/cciss/c0d0p2
e2fsck 1.38 (30-Jun-2005)
crash: clean, 13/977280 files, 67781/1954320 blocks
[100 %]
The dumpfile is saved to /mnt//var/crash/127.0.0.1-2008-02-18-17:59:55/vmcore-incomplete.
makedumpfile Completed.
Saving core complete