Linux Device Drivers, 2nd Edition
2nd Edition June 2001
0-59600-008-1, Order Number: 0081
586 pages, $39.95
Chapter 4
Debugging Techniques
One of the most compelling problems for anyone writing kernel code is
how to approach debugging. Kernel code cannot be easily executed under
a debugger, nor can it be easily traced, because it is a set of
functionalities not related to a specific process. Kernel code errors
can also be exceedingly hard to reproduce and can bring down the
entire system with them, thus destroying much of the evidence that
could be used to track them down.
This chapter introduces techniques you can use to monitor kernel code
and trace errors under such trying circumstances.
printk lets you classify messages according to their
severity by associating different loglevels, or priorities, with the
messages; for example:
KERN_ERR
Used to report error conditions; device drivers will often use
KERN_ERR to report hardware difficulties.
KERN_NOTICE
Situations that are normal, but still worthy of note. A number of
security-related conditions are reported at this level.
A printk statement with no specified priority
defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in
kernel/printk.c as an integer. The default
loglevel value has changed several times during Linux development, so
we suggest that you always specify an explicit loglevel.
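For instance, a hedged sketch of how a driver might tag a message
explicitly (the function name, message, and "mydriver" prefix are
hypothetical, not taken from scull):

#include <linux/kernel.h>   /* printk and the KERN_* loglevel strings */

/* Hypothetical helper: the KERN_WARNING macro expands to a string that
 * is concatenated with the format at compile time, so no comma appears
 * between the two. */
static void report_shortage(int bytes_left)
{
    printk(KERN_WARNING "mydriver: buffer nearly full, %d bytes left\n",
           bytes_left);
}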
Based on the loglevel, the kernel may print the message to the current
console, be it a text-mode terminal, a serial line printer, or a
parallel printer. If the priority is less than the integer variable
console_loglevel, the message is displayed. If both
klogd and
syslogd are running on the system, kernel
messages are appended to /var/log/messages (or
otherwise treated depending on your syslogd configuration), independent of console_loglevel. If
klogd is not running, the message won't
reach user space unless you read /proc/kmsg.
The variable console_loglevel is initialized to
DEFAULT_CONSOLE_LOGLEVEL and can be modified
through the sys_syslog system call. One way to
change it is by specifying the -c switch
when invoking klogd, as specified in the
klogd manpage. Note that to change the
current value, you must first kill klogd and then restart it with the -c option.
Alternatively, you can write a program to change the console
loglevel. You'll find a version of such a program in
misc-progs/setlevel.c in the source files
provided on the O'Reilly FTP site. The new level is specified as an
integer value between 1 and 8, inclusive. If it is set to 1, only
messages of level 0 (KERN_EMERG) will reach the
console; if it is set to 8, all messages, including debugging ones,
will be displayed.
You'll probably want to lower the loglevel if you work on the console
and you experience a kernel fault (see "Debugging System Faults"
later in this chapter), because the fault-handling code raises the
console_loglevel to its maximum value, causing
every subsequent message to appear on the console. You'll want to
raise the loglevel if you need to see your debugging messages; this is
useful if you are developing kernel code remotely and the text console
is not being used for an interactive session.
From version 2.1.31 on it is possible to read and modify the console
loglevel using the text file
/proc/sys/kernel/printk. The file hosts four
integer values. You may be interested in the first two: the current
console loglevel and the default level for messages.
With recent kernels, for instance, you can cause all kernel messages
to appear at the console by simply entering:

    # echo 8 > /proc/sys/kernel/printk
Linux allows for some flexibility in console logging policies by
letting you send messages to a specific virtual console (if your
console lives on the text screen). By default, the "console" is the
current virtual terminal. To select a different virtual terminal to
receive messages, you can issue ioctl(TIOCLINUX) on
any console device. The following program,
setconsole, can be used to choose which
console receives kernel messages; it must be run by the superuser and
is available in the misc-progs directory.
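The listing itself did not survive in this copy; what follows is a
minimal sketch in the spirit of the misc-progs version, assuming the
TIOCLINUX subcommand 11, which selects the virtual console that
receives kernel messages (error handling is reduced to the essentials):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>      /* TIOCLINUX */

int main(int argc, char **argv)
{
    char bytes[2] = {11, 0};   /* 11 is the TIOCLINUX subcommand */

    if (argc == 2)
        bytes[1] = atoi(argv[1]);   /* the chosen console */
    else {
        fprintf(stderr, "%s: need a single argument\n", argv[0]);
        exit(1);
    }
    if (ioctl(STDIN_FILENO, TIOCLINUX, bytes) < 0) {   /* use stdin */
        fprintf(stderr, "%s: ioctl(stdin, TIOCLINUX): %s\n",
                argv[0], strerror(errno));
        exit(1);
    }
    return 0;
}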
The printk function writes messages into a
circular buffer that is LOG_BUF_LEN (defined in
kernel/printk.c) bytes long. It then wakes any
process that is waiting for messages, that is, any process that is
sleeping in the syslog system call or that
is reading /proc/kmsg. These two interfaces to
the logging engine are almost equivalent, but note that reading from
/proc/kmsg consumes the data from the log buffer,
whereas the syslog system call can optionally
return log data while leaving it for other processes as well. In
general, reading the /proc file is easier, which
is why it is the default behavior for
klogd.
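As a user-space illustration of the nondestructive path, here is a
hedged sketch using klogctl, glibc's wrapper for the syslog system
call; command 3 ("read all") returns the current contents of the ring
buffer without consuming them, unlike a read from /proc/kmsg. The
buffer size chosen here is arbitrary.

#include <stdio.h>
#include <sys/klog.h>   /* klogctl() */

int main(void)
{
    char buf[8192];
    int n = klogctl(3, buf, sizeof(buf) - 1);   /* 3: read all, nondestructive */

    if (n < 0) {
        perror("klogctl");
        return 1;
    }
    buf[n] = '\0';
    fputs(buf, stdout);
    return 0;
}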
If the circular buffer fills up, printk wraps
around and starts adding new data to the beginning of the buffer,
overwriting the oldest data. The logging process thus loses the
oldest data. This problem is negligible compared with the advantages
of using such a circular buffer. For example, a circular buffer allows
the system to run even without a logging process, while minimizing
memory waste by overwriting old data should nobody read it. Another
feature of the Linux approach to messaging is that
printk can be invoked from anywhere, even from an
interrupt handler, with no limit on how much data can be printed. The
only disadvantage is the possibility of losing some data.
If the klogd process is running, it
retrieves kernel messages and dispatches them to
syslogd, which in turn checks
/etc/syslog.conf to find out how to deal with
them. syslogd differentiates between
messages according to a facility and a priority; allowable values for
both the facility and the priority are defined in
<sys/syslog.h>. Kernel messages are logged
by the LOG_KERN facility, at a priority
corresponding to the one used in printk (for
example, LOG_ERR is used for
KERN_ERR messages). If
klogd isn't running, data remains in the
circular buffer until someone reads it or the buffer overflows.
During the early stages of driver development,
printk can help considerably in debugging and
testing new code. When you officially release the driver, on the
other hand, you should remove, or at least disable, such print
statements. Unfortunately, you're likely to find that as soon as you
think you no longer need the messages and remove them, you'll
implement a new feature in the driver (or somebody will find a bug)
and you'll want to turn at least one of the messages back on. There
are several ways to solve both issues, to globally enable or disable
your debug messages and to turn individual messages on or off.
The following code fragment implements these features and comes
directly from the header scull.h.
#undef PDEBUG             /* undef it, just in case */
#ifdef SCULL_DEBUG
#  ifdef __KERNEL__
     /* This one if debugging is on, and kernel space */
#    define PDEBUG(fmt, args...) printk( KERN_DEBUG "scull: " fmt, ## args)
#  else
     /* This one for user space */
#    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
#  endif
#else
#  define PDEBUG(fmt, args...) /* not debugging: nothing */
#endif

#undef PDEBUGG
#define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */
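As a usage illustration (the function and its arguments are hypothetical
placeholders, not scull's actual code), a call like the following
compiles to a printk when SCULL_DEBUG is defined and disappears
entirely otherwise:

#include <linux/fs.h>       /* struct file, loff_t */

/* Hypothetical read method fragment: trace every read request. With
 * SCULL_DEBUG undefined, PDEBUG expands to nothing and costs nothing. */
static ssize_t sketch_read(struct file *filp, char *buf,
                           size_t count, loff_t *f_pos)
{
    PDEBUG("read: %li bytes requested at offset %li\n",
           (long)count, (long)*f_pos);
    return 0;   /* the actual data transfer is omitted in this sketch */
}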
But every driver has its own features and monitoring needs. The art of
good programming is in choosing the best trade-off between flexibility
and efficiency, and we can't tell what is the best for you. Remember
that preprocessor conditionals (as well as constant expressions in the
code) are executed at compile time, so you must recompile to turn
messages on or off. A possible alternative is to use C conditionals,
which are executed at runtime and therefore permit you to turn
messaging on and off during program execution. This is a nice
feature, but it requires additional processing every time the code is
executed, which can affect performance even when the messages are
disabled. Sometimes this performance hit is unacceptable.
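A hedged sketch of that runtime alternative follows; the flag and the
RDEBUG name are inventions for illustration, and the flag could just as
well be exposed as a module parameter or toggled through ioctl:

#include <linux/kernel.h>   /* printk */

static int scull_debug_enable;   /* hypothetical flag, off by default */

/* Tested on every call, so messages can be turned on and off while the
 * module is loaded, at the cost of one comparison even when disabled. */
#define RDEBUG(fmt, args...) \
    do { \
        if (scull_debug_enable) \
            printk(KERN_DEBUG "scull: " fmt, ## args); \
    } while (0)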
Two main techniques are available to driver developers for querying
the system: creating a file in the /proc filesystem and using the ioctl driver method. You
may use devfs as an alternative to
/proc, but /proc is an
easier tool to use for information retrieval.
The /proc filesystem is a special,
software-created filesystem that is used by the kernel to export
information to the world. Each file under /proc is tied to a kernel function that generates the file's "contents" on
the fly when the file is read. We have already seen some of these
files in action; /proc/modules, for example,
always returns a list of the currently loaded modules.
/proc is heavily used in the Linux system. Many
utilities on a modern Linux distribution, such as
ps, top, and
uptime, get their information from
/proc. Some device drivers also export
information via /proc, and yours can do so as
well. The /proc filesystem is dynamic, so your
module can add or remove entries at any time.
Time for an example. Here is a simple read_proc implementation for the scull device:
int scull_read_procmem(char *buf, char **start, off_t offset,
                   int count, int *eof, void *data)
{
    int i, j, len = 0;
    int limit = count - 80; /* Don't print more than this */

    for (i = 0; i < scull_nr_devs && len <= limit; i++) {
        Scull_Dev *d = &scull_devices[i];
        if (down_interruptible(&d->sem))
            return -ERESTARTSYS;
        len += sprintf(buf+len,"\nDevice %i: qset %i, q %i, sz %li\n",
                       i, d->qset, d->quantum, d->size);
        for (; d && len <= limit; d = d->next) { /* scan the list */
            len += sprintf(buf+len, "  item at %p, qset at %p\n", d,
                           d->data);
            if (d->data && !d->next) /* dump only the last item - save space */
                for (j = 0; j < d->qset; j++) {
                    if (d->data[j])
                        len += sprintf(buf+len, "    % 4i: %8p\n",
                                       j, d->data[j]);
                }
        }
        up(&scull_devices[i].sem);
    }
    *eof = 1;
    return len;
}
Once you have a read_proc function defined, you
need to connect it to an entry in the /proc hierarchy. There are two ways of setting up this connection, depending
on what versions of the kernel you wish to support. The easiest
method, only available in the 2.4 kernel (and 2.2 too if you use our
sysdep.h header), is to simply call
create_proc_read_entry. Here is the call used by
scull to make its
/proc function available as
/proc/scullmem:
create_proc_read_entry("scullmem",
                       0    /* default mode */,
                       NULL /* parent dir */,
                       scull_read_procmem,
                       NULL /* client data */);
The directory entry pointer can be used to create entire directory
hierarchies under /proc. Note, however, that an
entry may be more easily placed in a subdirectory of
/proc simply by giving the directory name as part
of the name of the entry -- as long as the directory itself already
exists. For example, an emerging convention says that
/proc entries associated with device drivers
should go in the subdirectory driver/;
scull could place its entry there simply by
giving its name as driver/scullmem.
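A minimal sketch of that variant, assuming the driver/ directory is
already present (the call is otherwise identical to the one shown
above):

create_proc_read_entry("driver/scullmem",   /* lands in /proc/driver */
                       0    /* default mode */,
                       NULL /* parent dir */,
                       scull_read_procmem,
                       NULL /* client data */);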
The entry should be removed when the module is unloaded; scull does so
with the following call:
remove_proc_entry("scullmem", NULL /* parent dir */);
The alternative method for creating a /proc entry
is to create and initialize a proc_dir_entry
structure and pass it to proc_register_dynamic (version 2.0) or proc_register (version 2.2,
which assumes a dynamic file if the inode number in the structure is
0). As an example, consider the following code that
scull uses when compiled against 2.0
headers:
static int scull_get_info(char *buf, char **start, off_t offset,
                int len, int unused)
{
    int eof = 0;
    return scull_read_procmem (buf, start, offset, len, &eof, NULL);
}

struct proc_dir_entry scull_proc_entry = {
    namelen:    8,
    name:       "scullmem",
    mode:       S_IFREG | S_IRUGO,
    nlink:      1,
    get_info:   scull_get_info,
};

static void scull_create_proc()
{
    proc_register_dynamic(&proc_root, &scull_proc_entry);
}

static void scull_remove_proc()
{
    proc_unregister(&proc_root, scull_proc_entry.low_ino);
}
Using ioctl this way to get information is
somewhat more difficult than using /proc, because
you need another program to issue the ioctl and
display the results. This program must be written, compiled, and kept
in sync with the module you're testing. On the other hand, the
driver's code is easier than what is needed to implement a
/proc file.
Another interesting advantage of the ioctl approach is that
information-retrieval commands can be left in the
driver even when debugging would otherwise be disabled. Unlike a
/proc file, which is visible to anyone who looks
in the directory (and too many people are likely to wonder "what that
strange file is"), undocumented ioctl commands
are likely to remain unnoticed. In addition, they will still be there
should something weird happen to the driver. The only drawback is that
the module will be slightly bigger.
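As an illustration of such an information-retrieval command, here is a
hedged sketch; SCULL_IOCGSTATS, struct scull_stats, and the helper
function are hypothetical names, not part of scull's real command set.
The driver's ioctl method would dispatch to the helper, which copies a
snapshot of the device state to user space.

#include <linux/errno.h>    /* EFAULT */
#include <asm/uaccess.h>    /* copy_to_user (2.4-era location) */
/* assumes scull.h has been included for the Scull_Dev type */

/* Hypothetical structure returned to user space. */
struct scull_stats {
    int quantum, qset;
    unsigned long size;
};

/* Hypothetical helper, called from the ioctl method when the made-up
 * SCULL_IOCGSTATS command is received. */
static int scull_ioctl_getstats(Scull_Dev *dev, unsigned long arg)
{
    struct scull_stats s;

    s.quantum = dev->quantum;
    s.qset    = dev->qset;
    s.size    = dev->size;
    if (copy_to_user((void *)arg, &s, sizeof(s)))
        return -EFAULT;
    return 0;
}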
Sometimes minor problems can be tracked down by watching the behavior
of an application in user space. Watching programs can also help in
building confidence that a driver is working correctly. For example,
we were able to feel confident about scull after looking at how its read implementation
reacted to read requests for different amounts of data.
There are various ways to watch a user-space program working. You can
run a debugger on it to step through its functions, add print
statements, or run the program under
strace. Here we'll discuss just the last
technique, which is most interesting when the real goal is examining
kernel code.
The strace command is a powerful tool that
shows all the system calls issued by a user-space program. Not only
does it show the calls, but it can also show the arguments to the
calls, as well as return values in symbolic form. When a system call
fails, both the symbolic value of the error (e.g.,
ENOMEM) and the corresponding string (Out
of memory) are displayed. strace has many command-line options; the
most useful are -t to display the time when each call is executed,
-T to display the time spent in the call, -e to limit the types of
calls traced, and -o to redirect the output to a file. By default,
strace prints tracing information on stderr.
The trace information is often used to support bug reports sent to
application developers, but it's also invaluable to kernel
programmers. We've seen how driver code executes by reacting to system
calls; strace allows us to check the
consistency of input and output data of each call.
For example, the following screen dump shows the last lines of running
the command strace ls /dev > /dev/scull0:
[...]
open("/dev", O_RDONLY|O_NONBLOCK) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
brk(0x8055000) = 0x8055000
lseek(4, 0, SEEK_CUR) = 0
getdents(4, /* 70 entries */, 3933) = 1260
[...]
getdents(4, /* 0 entries */, 3933) = 0
close(4) = 0
fstat(1, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0
ioctl(1, TCGETS, 0xbffffa5c) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "MAKEDEV\natibm\naudio\naudio1\na"..., 4096) = 4000
write(1, "d2\nsdd3\nsdd4\nsdd5\nsdd6\nsdd7"..., 96) = 96
write(1, "4\nsde5\nsde6\nsde7\nsde8\nsde9\n"..., 3325) = 3325
close(1) = 0
_exit(0) = ?
It's apparent in the first write call that after
ls finished looking in the target
directory, it tried to write 4 KB. Strangely (for
ls), only four thousand bytes were written,
and the operation was retried. However, we know that the
write implementation in
scull writes a single quantum at a time, so
we could have expected the partial write. After a few steps,
everything sweeps through, and the program exits successfully.
As another example, let's read the
scull device (using the
wc command):
[...]
open("/dev/scull0", O_RDONLY) = 4
fstat(4, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0
read(4, "MAKEDEV\natibm\naudio\naudio1\na"..., 16384) = 4000
read(4, "d2\nsdd3\nsdd4\nsdd5\nsdd6\nsdd7"..., 16384) = 3421
read(4, "", 16384) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(3, 7), ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, " 7421 /dev/scull0\n", 20) = 20
close(4) = 0
_exit(0) = ?
Even if you've used all the monitoring and debugging techniques,
sometimes bugs remain in the driver, and the system faults when the
driver is executed. When this happens it's important to be able to
collect as much information as possible to solve the problem.
Note that "fault" doesn't mean "panic." The Linux code is robust
enough to respond gracefully to most errors: a fault usually results
in the destruction of the current process while the system goes on
working. The system can panic, and it may if a
fault happens outside of a process's context, or if some vital part of
the system is compromised. But when the problem is due to a driver
error, it usually results only in the sudden death of the process
unlucky enough to be using the driver. The only unrecoverable damage
when a process is destroyed is that some memory allocated to the
process's context is lost; for instance, dynamic lists allocated by
the driver through kmalloc might be lost.
However, since the kernel calls the close operation for any open device when a process dies, your driver can
release what was allocated by the open method.
We've already said that when kernel code misbehaves, an informative
message is printed on the console. The next section explains how to
decode and use such messages. Even though they appear rather obscure
to the novice, processor dumps are full of interesting information,
often sufficient to pinpoint a program bug without the need for
additional testing.
This message was generated by writing to a device owned by the
faulty module, a module built deliberately
to demonstrate failures. The implementation of the
write method of faulty.c is
trivial:
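The listing was lost in this copy; a sketch that matches the
description below (a write method whose only action is a NULL-pointer
dereference) would look something like this:

#include <linux/fs.h>   /* struct file, loff_t */

ssize_t faulty_write(struct file *filp, const char *buf, size_t count,
                     loff_t *pos)
{
    /* make a simple fault by dereferencing a NULL pointer */
    *(int *)0 = 0;
    return 0;
}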
As you can see, what we do here is dereference a
NULL pointer. Since 0 is never a valid pointer
value, a fault occurs, which the kernel turns into the oops message
shown earlier. The calling process is then killed.
The main problem users face with oops messages is that the
hexadecimal values carry little intrinsic meaning; to be
meaningful to the programmer they need to be resolved to symbols. A
couple of utilities are available to perform this resolution for
developers: klogd and
ksymoops. The former tool performs symbol
decoding by itself whenever it is running; the latter needs to be
purposely invoked by the user. In the following discussion we use the
data generated in our first oops example by dereferencing a
NULL pointer.
The klogd daemon can decode oops messages
before they reach the log files. In many situations,
klogd can provide all the information a
developer needs to track down a problem, though sometimes the
developer must give it a little help.
klogd provides most of the necessary
information to track down the problem. In this case we see that the
instruction pointer (EIP) was executing in the
function faulty_write, so we know where to start
looking. The 3/576 string tells us that the
processor was at byte 3 of a function that appears to be 576 bytes
long. Note that the values are decimal, not hex.
The developer must exercise some care, however, to get useful
information for errors that occur within loadable modules.
klogd loads all of the available symbol
information when it starts, and uses those symbols thereafter. If you
load a module after klogd has initialized
itself (usually at system boot), klogd will
not have your module's symbol information. To force
klogd to go out and get that information,
send the klogd process a
SIGUSR1 signal after your module has been loaded
(or reloaded), and before you do anything that could cause it to oops.
It is also possible to run klogd with the
-p ("paranoid") option, which will cause
it to reread symbol information anytime it sees an oops message. The
klogd manpage recommends against this mode
of operation, however, since it makes klogd query the kernel for information after the problem has occurred.
Information obtained after an error could be plain wrong.
For klogd to work properly, it must have a
current copy of the System.map symbol table file.
Normally this file is found in /boot; if you have
built and installed a kernel from a nonstandard location you may have
to copy System.map into
/boot, or tell klogd to look elsewhere. klogd refuses to decode
symbols if the symbol table doesn't match the current kernel. If a
symbol is decoded on the system log, you can be reasonably sure it is
decoded correctly.
Prior to the 2.3 development series,
ksymoops was distributed with the kernel
source, in the scripts directory. It now lives
on its own FTP site and is maintained independently of the kernel.
Even if you are working with an older kernel, you probably should go
to its FTP site and get an
updated version of the tool.
To operate at its best, ksymoops needs a
lot of information in addition to the error message; you can use
command-line options to tell it where to find the various items. The
program needs the following items:
- A System.map file
This map must correspond to the kernel that was running at the time
the oops occurred. The default is
/usr/src/linux/System.map.
- A list of modules that were loaded at the time of the oops
ksymoops needs to know what modules were
loaded when the oops occurred, in order to extract symbolic
information from them. If you do not supply this list,
ksymoops will look at
/proc/modules.
- A list of kernel symbols defined when the oops occurred
The default is to take this list from /proc/ksyms.
- A copy of the kernel image that was running
Note that ksymoops needs a straight kernel
image, not the compressed version (vmlinuz,
zImage, or bzImage) that
most systems boot. The default is to use no kernel image because most
people don't keep it. If you have the exact image handy, you should
tell the program where it is by using the -v option.
- The locations of the object files for any kernel modules that were loaded
ksymoops will look in the standard
directories for modules, but during development you will almost
certainly have to tell it where your module lives using the
-o option.
Although ksymoops will go to files in
/proc for some of its needed information, the
results can be unreliable. The system, of course, will almost
certainly have been rebooted between the time the oops occurs and when
ksymoops is run, and the information from
/proc may not match the state of affairs when the
failure occurred. When possible, it is better to save copies of
/proc/modules and
/proc/ksyms prior to causing the oops to
happen.
We urge driver developers to read the manual page for
ksymoops because it is a very informative
document.
The last argument on the tool's command line is the location of the
oops message; if it is missing, the tool will read
stdin in the best Unix tradition. The message can
be recovered from the system logs with luck; in the case of a very bad
crash you may end up writing it down off the screen and typing it back
in (unless you were using a serial console, a nice tool for kernel
developers).
Note that ksymoops will be confused by an
oops message that has already been processed by
klogd. If you are running
klogd, and your system is still running after an
oops occurs, a clean oops message can often be obtained by invoking
the dmesg command.
In this case, moreover, you also get an assembly language dump of the
code where the fault occurred. This information can often be used to
figure out exactly what was happening; here it's clearly an
instruction that writes a 0 to address 0.
Note how the instruction dump doesn't start from the instruction that
caused the fault but three instructions earlier: that's because the
RISC platforms execute several instructions in parallel and may
generate deferred exceptions, so one must be able to look back at the
last few instructions.
Learning to decode an oops message requires some practice
and an understanding of the target processor you are using, as well as
of the conventions used to represent assembly language, but it's worth
doing. The time spent learning will be quickly repaid. Even if you
have previous expertise with the PC assembly language under non-Unix
operating systems, you may need to devote some time to learning,
because the Unix syntax is different from the Intel syntax. (A good
description of the differences is in the Info documentation file for
as, in the chapter called
"i386-specific.")
You can prevent an endless loop by inserting
schedule invocations at strategic points. The
schedule call (as you might guess) invokes the
scheduler and thus allows other processes to steal CPU time from the
current process. If a process is looping in kernel space due to a bug
in your driver, the schedule calls enable you to
kill the process, after tracing what is happening.
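For instance, a hedged sketch of a driver busy-wait that stays killable
(the flag and the function name are hypothetical):

#include <linux/sched.h>    /* schedule() */

/* Hypothetical polling loop: without the schedule() call, a bug in the
 * exit condition would hog the processor; with it, other processes run
 * and the looping process can still be killed. */
static void wait_for_flag(volatile int *flag)
{
    while (!*flag)
        schedule();
}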
If the keyboard isn't accepting input, the best thing to do is log
into the system through your network and kill any offending processes,
or reset the keyboard (with kbd_mode -a).
However, discovering that the hang is only a keyboard lockup is of
little use if you don't have a network available to help you recover.
If this is the case, you could set up alternative input devices to be
able at least to reboot the system cleanly. A shutdown and reboot
cycle is easier on your computer than hitting the so-called big red
button, and it saves you from the lengthy
fsck scanning of your disks.
Such an alternative input device can be, for example, the
mouse. Version 1.10 or newer of the gpm mouse server features a command-line option to enable a similar
capability, but it works only in text mode. If you don't have a
network connection and run in graphics mode, we suggest running some
custom solution, like a switch connected to the DCD pin of the serial
line and a script that polls for status change.
The magic SysRq key is invoked with the combination of the Alt and
SysRq keys on the PC keyboard, together with one of a set of action
keys. Among the more useful are the following:
k
Invokes the "secure attention" (SAK) function. SAK will kill all
processes running on the current console, leaving you with a clean
terminal.
p
Prints the current register information.
t
Prints the current task list.
Other magic SysRq functions exist; see sysrq.txt
in the Documentation directory of the kernel
source for the full list. Note that magic SysRq must be explicitly
enabled in the kernel configuration, and that most distributions do
not enable it, for obvious security reasons. For a system used to
develop drivers, however, enabling magic SysRq is worth the trouble of
building a new kernel in itself. Magic SysRq must be enabled at
runtime with a command like the following:

    echo 1 > /proc/sys/kernel/sysrq
Another precaution to use when reproducing system hangs is to mount
all your disks as read-only (or unmount them). If the disks are
read-only or unmounted, there's no risk of damaging the filesystem or
leaving it in an inconsistent state. Another possibility is using a
computer that mounts all of its filesystems via NFS, the network file
system. The "NFS-Root" capability must be enabled in the kernel, and
special parameters must be passed at boot time. In this case you'll
avoid any filesystem corruption without even resorting to SysRq,
because filesystem coherence is managed by the NFS server, which is
not brought down by your device driver.
The debugger must be invoked as though the kernel were an application.
In addition to specifying the filename for the uncompressed kernel
image, you need to provide the name of a core file on the command
line. For a running kernel, that core file is the kernel core image,
/proc/kcore. A typical invocation of
gdb looks like the following:

    gdb /usr/src/linux/vmlinux /proc/kcore
The first argument is the name of the uncompressed kernel executable,
not the zImage or bzImage or
anything compressed.
The second argument on the gdb command line
is the name of the core file. Like any file in
/proc, /proc/kcore is
generated when it is read. When the read system
call executes in the /proc filesystem, it maps to
a data-generation function rather than a data-retrieval one; we've
already exploited this feature in "Using the /proc Filesystem"
earlier in this chapter. kcore is used to
represent the kernel "executable" in the format of a core file; it
is a huge file because it represents the whole kernel address space,
which corresponds to all physical memory. From within
gdb, you can look at kernel variables by
issuing the standard gdb commands. For
example, p jiffies prints the number of clock
ticks from system boot to the current time.
When you print data from gdb, the kernel is
still running, and the various data items have different values at
different times; gdb, however, optimizes
access to the core file by caching data that has already been read. If
you try to look at the jiffies variable once again,
you'll get the same answer as before. Caching values to avoid extra
disk access is a correct behavior for conventional core files, but is
inconvenient when a "dynamic" core image is used. The solution is to
issue the command core-file /proc/kcore whenever
you want to flush the gdb cache; the
debugger prepares to use a new core file and discards any old
information. You won't, however, always need to issue
core-file when reading a new datum;
gdb reads the core in chunks of a few
kilobytes and caches only chunks it has already referenced.
Numerous capabilities normally provided by
gdb are not available when you are working
with the kernel. For example, gdb is not
able to modify kernel data; it expects to be running a program to be
debugged under its own control before playing with its memory image.
It is also not possible to set breakpoints or watchpoints, or to
single-step through kernel functions.
On non-PC computers, the game is different. On the Alpha,
make boot strips the kernel before creating the
bootable image, so you end up with both the
vmlinux and the vmlinux.gz files. The former is usable by gdb, and
you can boot from the latter. On the SPARC, the kernel (at least the
2.0 kernel) is not stripped by default.
When you compile the kernel with -g and run the debugger using vmlinux together with
/proc/kcore, gdb can
return a lot of information about the kernel internals. You can, for
example, use commands such as p *module_list,
p *module_list->next, and p
*chrdevs[4]->fops to dump structures. To get the best out
of p, you'll need to keep a kernel map and the
source code handy.
Another useful task that gdb performs on
the running kernel is disassembling functions, via the
disassemble command (which can be abbreviated to
disass) or the "examine instructions"
(x/i) command. The
disassemble command can take as its argument
either a function name or a memory range, whereas
x/i takes a single memory address, also in the
form of a symbol name. You can invoke, for example,
x/20i to disassemble 20 instructions. Note that
you can't disassemble a module function, because the debugger is
acting on vmlinux, which doesn't know about your
module. If you try to disassemble a module by address,
gdb is most likely to reply "Cannot access
memory at xxxx." For the same reason, you can't look at data items
belonging to a module. They can be read from
/dev/mem if you know the address of your
variables, but it's hard to make sense out of raw data extracted from
system RAM.
If you want to disassemble a module function, you're better off
running the objdump utility on the module
object file. Unfortunately, the tool runs on the disk copy of the
file, not the running one; therefore, the addresses as shown by
objdump will be the addresses before
relocation, unrelated to the module's execution environment. Another
disadvantage of disassembling an unlinked object file is that function
calls are still unresolved, so you can't easily tell a call to
printk from a call to
kmalloc.
As you see, gdb is a useful tool when your
aim is to peek into the running kernel, but it lacks some features
that are vital to debugging device drivers.
Other kernel developers, however, see an occasional use for
interactive debugging tools. One such tool is the
kdb built-in kernel debugger, available as
a nonofficial patch from oss.sgi.com. To use
kdb, you must obtain the patch (be sure to
get a version that matches your kernel version), apply it, and rebuild
and reinstall the kernel. Note that, as of this writing,
kdb works only on IA-32 (x86) systems
(though a version for the IA-64 existed for a while in the mainline
kernel source before being removed).
Note that just about everything the kernel does stops when
kdb is running. Nothing else should be
running on a system where you invoke kdb;
in particular, you should not have networking turned on -- unless,
of course, you are debugging a network driver. It is generally a good
idea to boot the system in single-user mode if you will be using
kdb.
As an example, consider a quick scull debugging session. Assuming that the driver is already loaded, we can
tell kdb to set a breakpoint in
scull_read as follows:
[1]kdb> bp scull_read
Instruction(i) BP #0 at 0xc8833514 (scull_read)
is enabled on cpu 1
[1]kdb> go
The bp command tells
kdb to stop the next time the kernel enters
scull_read. We then type go to continue execution. After
putting something into one of the sculldevices, we can attempt to read it by running
cat under a shell on another terminal,
yielding the following:
Entering kdb (0xc3108000) on processor 0 due to Breakpoint @ 0xc8833515
Instruction(i) breakpoint #0 at 0xc8833514
scull_read+0x1: movl %esp,%ebp
[0]kdb>
We are now positioned at the beginning of
scull_read. To see how we got there, we can get
a stack trace:
[0]kdb> bt
EBP EIP Function(args)
0xc3109c5c 0xc8833515 scull_read+0x1
0xc3109fbc 0xfc458b10 scull_read+0x33c255fc( 0x3, 0x803ad78, 0x1000, 0x1000, 0x804ad78)
0xbffffc88 0xc010bec0 system_call
[0]kdb>
kdb attempts to print out the arguments to
every function in the call trace. It gets confused, however, by
optimization tricks used by the compiler. Thus it prints five
arguments for scull_read, which only has four.
Time to look at some data. The mds command
manipulates data; we can query the value of the
scull_devices pointer with a command like:
[0]kdb> mds scull_devices 1
c8836104: c4c125c0 ....
Here we asked for one (four-byte) word of data starting at the
location of scull_devices; the answer tells us that
our device array was allocated starting at the address
c4c125c0. To look at a device structure itself we
need to use that address:
The eight lines here correspond to the eight fields in the
Scull_Dev structure. Thus we see that the memory
for the first device is allocated at 0xc3785000,
that there is no next item in the list, that the quantum is
4000 (hex fa0) and the array size is 1000 (hex 3e8), that there are
154 bytes of data in the device (hex 9a), and so on.
kdb can change data as well. Suppose we
wanted to trim some of the data from the device:
A subsequent cat on the device will now
return less data than before.
kdb has a number of other capabilities,
including single-stepping (by instructions, not lines of C source
code), setting breakpoints on data access, disassembling code,
stepping through linked lists, accessing register data, and more.
After you have applied the kdb patch, a
full set of manual pages can be found in the
Documentation/kdb directory in your kernel source
tree.
A number of kernel developers have contributed to an unofficial patch
called the integrated kernel debugger, or IKD.
IKD provides a number of interesting kernel debugging facilities. The
x86 is the primary platform for this patch, but much of it works on
other architectures as well. As of this writing, the IKD patch can be
found at
.
It is a patch that must be applied to the source for your kernel; the
patch is version specific, so be sure to download the one that matches
the kernel you are working with.
One of the features of the IKD patch is a kernel stack debugger. If
you turn this feature on, the kernel will check the amount of free
space on the kernel stack at every function call, and force an oops if
it gets too small.
If something in your kernel is causing stack corruption, this tool may
help you to find it. There is also a "stack meter" feature that you
can use to see how close to filling up the stack you get at any
particular time.
Finally, IKD also includes a version of the
kdb debugger discussed in the previous
section. As of this writing, however, the version of
kdb included in the IKD patch is somewhat
old. If you need kdb, we recommend that you
go directly to the source at oss.sgi.com for the current version.
kgdb is a patch that allows the full
use of the gdb debugger on the Linux
kernel, but only on x86 systems. It works by hooking into the system
to be debugged via a serial line, with gdb
running on the far end. You thus need two systems to use
kgdb -- one to run the debugger and one
to run the kernel of interest. Like kdb,
kgdb is currently available from
oss.sgi.com.
Setting up kgdb involves installing a
kernel patch and booting the modified kernel. You need to connect the
two systems with a serial cable (of the null modem variety) and to
install some support files on the gdb side
of the connection. The patch places detailed instructions in the file
Documentation/i386/gdb-serial.txt; we won't
reproduce them here. Be sure to read the instructions on debugging
modules: toward the end there are some nice
gdb macros that have been written for this
purpose.
Crash dump analyzers enable the system to record its state when an
oops occurs, so that it may be examined at leisure afterward. They can
be especially useful if you are supporting a driver for a user at a
different site. Users can be somewhat reluctant to copy down oops
messages for you, so installing a crash dump system can let you get the
information you need to track down a user's problem without requiring
work from him. It is thus not surprising that the available crash dump
analyzers have been written by companies in the business of supporting
systems for users.
There are currently two crash dump analyzer patches available for
Linux. Both were relatively new when this section was written, and
both were in a state of flux. Rather than provide detailed
information that is likely to go out of date, we'll restrict ourselves
to providing an overview and pointers to where more information can be
found.
The first analyzer is LKCD (Linux Kernel Crash Dumps). It's
available, once again, from oss.sgi.com. When a
kernel oops occurs, LKCD will write a copy of the current system state
(memory, primarily) into the dump device you specified in advance. The
dump device must be a system swap area. A utility called
LCRASH is run on the next reboot (before swapping
is enabled) to generate a summary of the crash, and optionally to save
a copy of the dump in a conventional file.
LCRASH can be run interactively and provides a
number of debugger-like commands for querying the state of the
system.
LKCD is currently supported for the Intel 32-bit architecture only,
and only works with swap partitions on SCSI disks.
Having a copy of the kernel running as a user-mode process brings a
number of advantages. Because it is running on a constrained, virtual
processor, a buggy kernel cannot damage the "real" system.
Different hardware and software configurations can be tried easily on
the same box. And, perhaps most significantly for kernel developers,
the user-mode kernel can be easily manipulated with
gdb or another debugger. After all, it is
just another process. User-Mode Linux clearly has the potential to
accelerate kernel development.
The word is that it will be integrated into an early 2.4 release after
2.4.0; it may well be there by the time this book is published.
User-Mode Linux also has some significant limitations as of this
writing, most of which will likely be addressed soon. The virtual
processor currently works in a uniprocessor mode only; the port runs
on SMP systems without a problem, but it can only emulate a
uniprocessor host. The biggest problem for driver writers, though, is
that the user-mode kernel has no access to the host system's hardware.
Thus, while it can be useful for debugging most of the sample drivers
in this book, User-Mode Linux is not yet useful for debugging drivers
that have to deal with real hardware. Finally, User-Mode Linux only
runs on the IA-32 architecture.
Because work is under way to fix all of these problems, User-Mode
Linux will likely be an indispensable tool for Linux device driver
programmers in the very near future.
The Linux Trace Toolkit
LTT, along with extensive documentation, can be found on the Web.
Dynamic Probes (or DProbes) is a debugging tool released (under the
GPL) by IBM for Linux on the IA-32 architecture. It allows the
placement of a "probe" at almost any place in the system, in both
user and kernel space. The probe consists of some code (written in a
specialized, stack-oriented language) that is executed when control
hits the given point. This code can report information back to user
space, change registers, or do a number of other things. The useful
feature of DProbes is that once the capability has been built into the
kernel, probes can be inserted anywhere within a running system
without kernel builds or reboots. DProbes can also work with the
Linux Trace Toolkit to insert new tracing events at arbitrary
locations.
© 2001, O'Reilly & Associates, Inc.