
Category: LINUX

2006-09-20 14:46:03


Linux Device Drivers, 2nd Edition


2nd Edition June 2001
0-59600-008-1, Order Number: 0081
586 pages, $39.95

Chapter 4
Debugging Techniques






One of the most compelling problems for anyone writing kernel code is how to approach debugging. Kernel code cannot be easily executed under a debugger, nor can it be easily traced, because it is a set of functionalities not related to a specific process. Kernel code errors can also be exceedingly hard to reproduce and can bring down the entire system with them, thus destroying much of the evidence that could be used to track them down.

This chapter introduces techniques you can use to monitor kernel code and trace errors under such trying circumstances.

KERN_ERR

Used to report error conditions; device drivers will often use KERN_ERR to report hardware difficulties.

KERN_NOTICE

Situations that are normal, but still worthy of note. A number of security-related conditions are reported at this level.

A printk statement with no specified priority defaults to DEFAULT_MESSAGE_LOGLEVEL, specified in kernel/printk.c as an integer. The default loglevel value has changed several times during Linux development, so we suggest that you always specify an explicit loglevel.

Based on the loglevel, the kernel may print the message to the current console, be it a text-mode terminal, a serial line printer, or a parallel printer. If the priority is less than the integer variable console_loglevel, the message is displayed. If both klogd and syslogd are running on the system, kernel messages are appended to /var/log/messages (or otherwise treated depending on your syslogd configuration), independent of console_loglevel. If klogd is not running, the message won't reach user space unless you read /proc/kmsg.

The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL and can be modified through the sys_syslog system call. One way to change it is by specifying the -c switch when invoking klogd, as described in the klogd manpage. Note that to change the current value, you must first kill klogd and then restart it with the -c option. Alternatively, you can write a program to change the console loglevel. You'll find a version of such a program in misc-progs/setlevel.c in the source files provided on the O'Reilly FTP site. The new level is specified as an integer value between 1 and 8, inclusive. If it is set to 1, only messages of level 0 (KERN_EMERG) will reach the console; if it is set to 8, all messages, including debugging ones, will be displayed.
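The display rule described above can be restated as a one-line predicate. This is only a user-space sketch for illustration, not kernel code; the function name reaches_console is hypothetical, and it assumes the usual numbering in which levels run from 0 (KERN_EMERG) to 7 (KERN_DEBUG).

```c
#include <stdbool.h>

/* Hypothetical helper restating the rule from the text: a message is
 * shown on the console when its priority number is lower than
 * console_loglevel. Levels run from 0 (KERN_EMERG) to 7 (KERN_DEBUG). */
bool reaches_console(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

With console_loglevel set to 1, only level 0 passes the test; with 8, every level from 0 through 7 does.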

You'll probably want to lower the loglevel if you work on the console and you experience a kernel fault (see "Debugging System Faults" later in this chapter), because the fault-handling code raises the console_loglevel to its maximum value, causing every subsequent message to appear on the console. You'll want to raise the loglevel if you need to see your debugging messages; this is useful if you are developing kernel code remotely and the text console is not being used for an interactive session.

From version 2.1.31 on it is possible to read and modify the console loglevel using the text file /proc/sys/kernel/printk. The file hosts four integer values. You may be interested in the first two: the current console loglevel and the default level for messages. With recent kernels, for instance, you can cause all kernel messages to appear at the console by simply writing the value 8 to this file.

Linux allows for some flexibility in console logging policies by letting you send messages to a specific virtual console (if your console lives on the text screen). By default, the "console" is the current virtual terminal. To select a different virtual terminal to receive messages, you can issue ioctl(TIOCLINUX) on any console device. The following program, setconsole, can be used to choose which console receives kernel messages; it must be run by the superuser and is available in the misc-progs directory.

The printk function writes messages into a circular buffer that is LOG_BUF_LEN (defined in kernel/printk.c) bytes long. It then wakes any process that is waiting for messages, that is, any process that is sleeping in the syslog system call or that is reading /proc/kmsg. These two interfaces to the logging engine are almost equivalent, but note that reading from /proc/kmsg consumes the data from the log buffer, whereas the syslog system call can optionally return log data while leaving it for other processes as well. In general, reading the /proc file is easier, which is why it is the default behavior for klogd.

If the circular buffer fills up, printk wraps around and starts adding new data to the beginning of the buffer, overwriting the oldest data. The logging process thus loses the oldest data. This problem is negligible compared with the advantages of using such a circular buffer. For example, a circular buffer allows the system to run even without a logging process, while minimizing memory waste by overwriting old data should nobody read it. Another feature of the Linux approach to messaging is that printk can be invoked from anywhere, even from an interrupt handler, with no limit on how much data can be printed. The only disadvantage is the possibility of losing some data.
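The wrap-around behavior is easy to see in a small user-space model. The sketch below is not the code from kernel/printk.c; the names ring, ring_write, and ring_snapshot are hypothetical, and RING_LEN is a deliberately tiny stand-in for LOG_BUF_LEN.

```c
#include <stddef.h>

#define RING_LEN 8   /* illustrative size; stands in for LOG_BUF_LEN */

struct ring {
    char   data[RING_LEN];
    size_t head;    /* next write position */
    size_t count;   /* number of valid bytes, at most RING_LEN */
};

/* Append len bytes; once the buffer is full, the oldest bytes are
 * overwritten, so the writer never blocks and never fails. */
void ring_write(struct ring *r, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_LEN;
        if (r->count < RING_LEN)
            r->count++;
    }
}

/* Copy the surviving bytes, oldest first, into out (capacity cap).
 * Returns the number of bytes copied. Does not consume the data. */
size_t ring_snapshot(const struct ring *r, char *out, size_t cap)
{
    size_t start = (r->head + RING_LEN - r->count) % RING_LEN;
    size_t n = r->count < cap ? r->count : cap;

    for (size_t i = 0; i < n; i++)
        out[i] = r->data[(start + i) % RING_LEN];
    return n;
}
```

Writing ten bytes into this eight-byte ring leaves only the last eight available to a reader, which is the data loss the text accepts in exchange for a writer that can be called from any context.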

If the klogd process is running, it retrieves kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to deal with them. syslogd differentiates between messages according to a facility and a priority; allowable values for both the facility and the priority are defined in the system's syslog header file. Kernel messages are logged by the LOG_KERN facility, at a priority corresponding to the one used in printk (for example, LOG_ERR is used for KERN_ERR messages). If klogd isn't running, data remains in the circular buffer until someone reads it or the buffer overflows.

During the early stages of driver development, printk can help considerably in debugging and testing new code. When you officially release the driver, on the other hand, you should remove, or at least disable, such print statements. Unfortunately, you're likely to find that as soon as you think you no longer need the messages and remove them, you'll implement a new feature in the driver (or somebody will find a bug) and you'll want to turn at least one of the messages back on. There are several ways to solve both issues, to globally enable or disable your debug messages and to turn individual messages on or off.

The following code fragment implements these features and comes directly from the header scull.h.

 
#undef PDEBUG             /* undef it, just in case */
#ifdef SCULL_DEBUG
#  ifdef __KERNEL__
     /* This one if debugging is on, and kernel space */
#    define PDEBUG(fmt, args...) printk( KERN_DEBUG "scull: " fmt, ## args)
#  else
     /* This one for user space */
#    define PDEBUG(fmt, args...) fprintf(stderr, fmt, ## args)
#  endif
#else
#  define PDEBUG(fmt, args...) /* not debugging: nothing */
#endif

#undef PDEBUGG
#define PDEBUGG(fmt, args...) /* nothing: it's a placeholder */

But every driver has its own features and monitoring needs. The art of good programming is in choosing the best trade-off between flexibility and efficiency, and we can't tell what is the best for you. Remember that preprocessor conditionals (as well as constant expressions in the code) are executed at compile time, so you must recompile to turn messages on or off. A possible alternative is to use C conditionals, which are executed at runtime and therefore permit you to turn messaging on and off during program execution. This is a nice feature, but it requires additional processing every time the code is executed, which can affect performance even when the messages are disabled. Sometimes this performance hit is unacceptable.
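As a rough illustration of the runtime alternative, the following user-space sketch guards the output with an ordinary C conditional. The names debug_enabled and dbg_printf are hypothetical and do not come from scull; in a driver, the flag might be set at load time or through some control interface.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical runtime switch; it can be flipped while the program
 * is running, with no need to recompile. */
int debug_enabled = 0;

/* Print only when debug_enabled is nonzero. Returns the number of
 * characters written, or 0 when the message was suppressed. The test
 * on debug_enabled runs on every call, which is the runtime cost the
 * text mentions. */
int dbg_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    if (!debug_enabled)
        return 0;
    va_start(ap, fmt);
    n = vfprintf(stderr, fmt, ap);
    va_end(ap);
    return n;
}
```

Even with the flag cleared, each call still costs a function call and a comparison, whereas the preprocessor version compiles to nothing.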

[22]The minus is a "magic" marker to prevent syslogd from flushing the file to disk at every new message, documented in syslog.conf(5), a manual page worth reading.

Two main techniques are available to driver developers for querying the system: creating a file in the /proc filesystem and using the ioctl driver method. You may use devfs as an alternative to /proc, but /proc is an easier tool to use for information retrieval.

The /proc filesystem is a special, software-created filesystem that is used by the kernel to export information to the world. Each file under /proc is tied to a kernel function that generates the file's "contents" on the fly when the file is read. We have already seen some of these files in action; /proc/modules, for example, always returns a list of the currently loaded modules.

/proc is heavily used in the Linux system. Many utilities on a modern Linux distribution, such as ps, top, and uptime, get their information from /proc. Some device drivers also export information via /proc, and yours can do so as well. The /proc filesystem is dynamic, so your module can add or remove entries at any time.

Time for an example. Here is a simple read_proc implementation for the scull device:

 
int scull_read_procmem(char *buf, char **start, off_t offset,
                       int count, int *eof, void *data)
{
    int i, j, len = 0;
    int limit = count - 80; /* Don't print more than this */

    for (i = 0; i < scull_nr_devs && len <= limit; i++) {
        Scull_Dev *d = &scull_devices[i];
        if (down_interruptible(&d->sem))
            return -ERESTARTSYS;
        len += sprintf(buf+len,"\nDevice %i: qset %i, q %i, sz %li\n",
                       i, d->qset, d->quantum, d->size);
        for (; d && len <= limit; d = d->next) { /* scan the list */
            len += sprintf(buf+len, " item at %p, qset at %p\n", d,
                           d->data);
            if (d->data && !d->next) /* dump only the last item
                                        - save space */
                for (j = 0; j < d->qset; j++) {
                    if (d->data[j])
                        len += sprintf(buf+len," % 4i: %8p\n",
                                       j,d->data[j]);
                }
        }
        up(&scull_devices[i].sem);
    }
    *eof = 1;
    return len;
}
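The function above stops adding output once len exceeds limit, which leaves 80 bytes of headroom because sprintf cannot bound what it writes. In user space, the same "never overrun the buffer" idea can be sketched with snprintf instead. The helper below is hypothetical and is not part of scull; it only illustrates the bounded-append pattern.

```c
#include <stdio.h>

/* Append one formatted record to buf, which already holds len bytes
 * out of a total capacity of cap. Returns the new length, or len
 * unchanged if the record would not fit completely. */
int append_record(char *buf, int len, int cap, int index, long size)
{
    int room = cap - len;
    int n;

    if (room <= 0)
        return len;                 /* no space left at all */
    n = snprintf(buf + len, (size_t)room,
                 "Device %i: sz %li\n", index, size);
    if (n < 0 || n >= room) {
        buf[len] = '\0';            /* drop the truncated record */
        return len;
    }
    return len + n;
}
```

A caller can loop over its devices and stop as soon as the returned length stops growing, which plays the same role as the len <= limit test in scull_read_procmem.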

Once you have a read_proc function defined, you need to connect it to an entry in the /proc hierarchy. There are two ways of setting up this connection, depending on what versions of the kernel you wish to support. The easiest method, only available in the 2.4 kernel (and 2.2 too if you use our sysdep.h header), is to simply call create_proc_read_entry. Here is the call used by scull to make its /proc function available as /proc/scullmem:

 
create_proc_read_entry("scullmem",
                       0    /* default mode */,
                       NULL /* parent dir */,
                       scull_read_procmem,
                       NULL /* client data */);

The directory entry pointer can be used to create entire directory hierarchies under /proc. Note, however, that an entry may be more easily placed in a subdirectory of /proc simply by giving the directory name as part of the name of the entry -- as long as the directory itself already exists. For example, an emerging convention says that /proc entries associated with device drivers should go in the subdirectory driver/; scull could place its entry there simply by giving its name as driver/scullmem.

 
remove_proc_entry("scullmem", NULL /* parent dir */);

The alternative method for creating a /proc entry is to create and initialize a proc_dir_entry structure and pass it to proc_register_dynamic (version 2.0) or proc_register (version 2.2, which assumes a dynamic file if the inode number in the structure is 0). As an example, consider the following code that scull uses when compiled against 2.0 headers:

 


static int scull_get_info(char *buf, char **start, off_t offset,
                          int len, int unused)
{
    int eof = 0;
    return scull_read_procmem (buf, start, offset, len, &eof, NULL);
}

struct proc_dir_entry scull_proc_entry = {
    namelen:  8,
    name:     "scullmem",
    mode:     S_IFREG | S_IRUGO,
    nlink:    1,
    get_info: scull_get_info,
};

static void scull_create_proc()
{
    proc_register_dynamic(&proc_root, &scull_proc_entry);
}

static void scull_remove_proc()
{
    proc_unregister(&proc_root, scull_proc_entry.low_ino);
}

Using ioctl this way to get information is somewhat more difficult than using /proc, because you need another program to issue the ioctl and display the results. This program must be written, compiled, and kept in sync with the module you're testing. On the other hand, the driver's code is easier than what is needed to implement a /proc file.

Another interesting advantage of the ioctl approach is that information-retrieval commands can be left in the driver even when debugging would otherwise be disabled. Unlike a /proc file, which is visible to anyone who looks in the directory (and too many people are likely to wonder "what that strange file is"), undocumented ioctl commands are likely to remain unnoticed. In addition, they will still be there should something weird happen to the driver. The only drawback is that the module will be slightly bigger.

Sometimes minor problems can be tracked down by watching the behavior of an application in user space. Watching programs can also help in building confidence that a driver is working correctly. For example, we were able to feel confident about scull after looking at how its read implementation reacted to read requests for different amounts of data.

There are various ways to watch a user-space program working. You can run a debugger on it to step through its functions, add print statements, or run the program under strace. Here we'll discuss just the last technique, which is most interesting when the real goal is examining kernel code.

The strace command is a powerful tool that shows all the system calls issued by a user-space program. Not only does it show the calls, but it can also show the arguments to the calls, as well as return values in symbolic form. When a system call fails, both the symbolic value of the error (e.g., ENOMEM) and the corresponding string (Out of memory) are displayed. strace has many command-line options; the most useful are -t to display the time when each call is executed, -T to display the time spent in the call, -e to limit the types of calls traced, and -o to redirect the output to a file. By default, strace prints tracing information on stderr.

The trace information is often used to support bug reports sent to application developers, but it's also invaluable to kernel programmers. We've seen how driver code executes by reacting to system calls; strace allows us to check the consistency of input and output data of each call.

For example, the following screen dump shows the last lines of running the command strace ls /dev > /dev/scull0:

[...]
open("/dev", O_RDONLY|O_NONBLOCK) = 4
fcntl(4, F_SETFD, FD_CLOEXEC) = 0
brk(0x8055000) = 0x8055000
lseek(4, 0, SEEK_CUR) = 0
getdents(4, /* 70 entries */, 3933) = 1260
[...]
getdents(4, /* 0 entries */, 3933) = 0
close(4) = 0
fstat(1, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0
ioctl(1, TCGETS, 0xbffffa5c) = -1 ENOTTY (Inappropriate ioctl for device)
write(1, "MAKEDEV\natibm\naudio\naudio1\na"..., 4096) = 4000
write(1, "d2\nsdd3\nsdd4\nsdd5\nsdd6\nsdd7"..., 96) = 96
write(1, "4\nsde5\nsde6\nsde7\nsde8\nsde9\n"..., 3325) = 3325
close(1) = 0
_exit(0) = ?

It's apparent in the first write call that after ls finished looking in the target directory, it tried to write 4 KB. Strangely (for ls), only four thousand bytes were written, and the operation was retried. However, we know that the write implementation in scull writes a single quantum at a time, so we could have expected the partial write. After a few steps, everything sweeps through, and the program exits successfully.
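The retry behavior seen in the trace is what any well-behaved writer does when write returns fewer bytes than requested. A minimal user-space sketch of such a loop follows; the helper name write_all is hypothetical, and this is not the code ls actually uses.

```c
#include <errno.h>
#include <unistd.h>

/* Write all count bytes to fd, retrying after partial writes.
 * Returns 0 on success or -1 on failure. */
int write_all(int fd, const char *buf, size_t count)
{
    while (count > 0) {
        ssize_t n = write(fd, buf, count);

        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing: retry */
            return -1;         /* real error, errno set by write */
        }
        if (n == 0)
            return -1;         /* no progress; give up rather than spin */
        buf   += n;            /* skip what the device accepted */
        count -= (size_t)n;
    }
    return 0;
}
```

Against a device like scull, which accepts at most one quantum per call, a loop of this kind simply issues as many write calls as it takes, just as the trace shows.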

As another example, let's read the scull device (using the wc command):

[...]
open("/dev/scull0", O_RDONLY) = 4
fstat(4, {st_mode=S_IFCHR|0664, st_rdev=makedev(253, 0), ...}) = 0
read(4, "MAKEDEV\natibm\naudio\naudio1\na"..., 16384) = 4000
read(4, "d2\nsdd3\nsdd4\nsdd5\nsdd6\nsdd7"..., 16384) = 3421
read(4, "", 16384) = 0
fstat(1, {st_mode=S_IFCHR|0600, st_rdev=makedev(3, 7), ...}) = 0
ioctl(1, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, " 7421 /dev/scull0\n", 20) = 20
close(4) = 0
_exit(0) = ?

Even if you've used all the monitoring and debugging techniques, sometimes bugs remain in the driver, and the system faults when the driver is executed. When this happens it's important to be able to collect as much information as possible to solve the problem.

Note that "fault" doesn't mean "panic." The Linux code is robust enough to respond gracefully to most errors: a fault usually results in the destruction of the current process while the system goes on working. The system can panic, and it may if a fault happens outside of a process's context, or if some vital part of the system is compromised. But when the problem is due to a driver error, it usually results only in the sudden death of the process unlucky enough to be using the driver. The only unrecoverable damage when a process is destroyed is that some memory allocated to the process's context is lost; for instance, dynamic lists allocated by the driver through kmalloc might be lost. However, since the kernel calls the close operation for any open device when a process dies, your driver can release what was allocated by the open method.

We've already said that when kernel code misbehaves, an informative message is printed on the console. The next section explains how to decode and use such messages. Even though they appear rather obscure to the novice, processor dumps are full of interesting information, often sufficient to pinpoint a program bug without the need for additional testing.

This message was generated by writing to a device owned by the faulty module, a module built deliberately to demonstrate failures. The implementation of the write method of faulty.c is trivial:

As you can see, what we do here is dereference a NULL pointer. Since 0 is never a valid pointer value, a fault occurs, which the kernel turns into the oops message shown earlier. The calling process is then killed.

The main problem users face with oops messages is that the hexadecimal values carry little intrinsic meaning; to be meaningful to the programmer they need to be resolved to symbols. A couple of utilities are available to perform this resolution for developers: klogd and ksymoops. The former tool performs symbol decoding by itself whenever it is running; the latter needs to be purposely invoked by the user. In the following discussion we use the data generated in our first oops example by dereferencing a NULL pointer.

The klogd daemon can decode oops messages before they reach the log files. In many situations, klogd can provide all the information a developer needs to track down a problem, though sometimes the developer must give it a little help.

klogd provides most of the necessary information to track down the problem. In this case we see that the instruction pointer (EIP) was executing in the function faulty_write, so we know where to start looking. The 3/576 string tells us that the processor was at byte 3 of a function that appears to be 576 bytes long. Note that the values are decimal, not hex.
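To make the symbol+offset/size notation concrete, the following user-space sketch splits such a string into its three parts, reading both numbers as decimal as noted above. The helper name parse_symbol_offset is hypothetical, and the sketch assumes the surrounding brackets and the rest of the log line have already been stripped.

```c
#include <stdio.h>

/* Split a string of the form "symbol+offset/size" (decimal numbers)
 * into its parts. Returns 1 on success, 0 on a malformed string.
 * sym must have room for at least 64 bytes. */
int parse_symbol_offset(const char *s, char *sym,
                        unsigned long *offset, unsigned long *size)
{
    /* %63[^+] reads the symbol name up to, but not including, '+' */
    return sscanf(s, "%63[^+]+%lu/%lu", sym, offset, size) == 3;
}
```

For the string faulty_write+3/576, this yields the symbol faulty_write, an offset of 3, and a function size of 576 bytes.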

The developer must exercise some care, however, to get useful information for errors that occur within loadable modules. klogd loads all of the available symbol information when it starts, and uses those symbols thereafter. If you load a module after klogd has initialized itself (usually at system boot), klogd will not have your module's symbol information. To force klogd to go out and get that information, send the klogd process a SIGUSR1 signal after your module has been loaded (or reloaded), and before you do anything that could cause it to oops.

It is also possible to run klogd with the -p ("paranoid") option, which will cause it to reread symbol information anytime it sees an oops message. The klogd manpage recommends against this mode of operation, however, since it makes klogd query the kernel for information after the problem has occurred. Information obtained after an error could be plain wrong.

For klogd to work properly, it must have a current copy of the System.map symbol table file. Normally this file is found in /boot; if you have built and installed a kernel from a nonstandard location you may have to copy System.map into /boot, or tell klogd to look elsewhere. klogd refuses to decode symbols if the symbol table doesn't match the current kernel. If a symbol is decoded on the system log, you can be reasonably sure it is decoded correctly.

Prior to the 2.3 development series, ksymoops was distributed with the kernel source, in the scripts directory. It now lives on its own FTP site and is maintained independently of the kernel. Even if you are working with an older kernel, you should probably get an updated version of the tool from that site.

To operate at its best, ksymoops needs a lot of information in addition to the error message; you can use command-line options to tell it where to find the various items. The program needs the following items:

A System.map file

This map must correspond to the kernel that was running at the time the oops occurred. The default is /usr/src/linux/System.map.

ksymoops needs to know what modules were loaded when the oops occurred, in order to extract symbolic information from them. If you do not supply this list, ksymoops will look at /proc/modules.

A list of kernel symbols defined when the oops occurred

A copy of the kernel image that was running

Note that ksymoops needs a straight kernel image, not the compressed version (vmlinuz, zImage, or bzImage) that most systems boot. The default is to use no kernel image because most people don't keep it. If you have the exact image handy, you should tell the program where it is by using the -v option.

ksymoops will look in the standard directories for modules, but during development you will almost certainly have to tell it where your module lives using the -o option.

Although ksymoops will go to files in /proc for some of its needed information, the results can be unreliable. The system, of course, will almost certainly have been rebooted between the time the oops occurs and when ksymoops is run, and the information from /proc may not match the state of affairs when the failure occurred. When possible, it is better to save copies of /proc/modules and /proc/ksyms prior to causing the oops to happen.

We urge driver developers to read the manual page for ksymoops because it is a very informative document.

The last argument on the tool's command line is the location of the oops message; if it is missing, the tool will read stdin in the best Unix tradition. The message can be recovered from the system logs with luck; in the case of a very bad crash you may end up writing it down off the screen and typing it back in (unless you were using a serial console, a nice tool for kernel developers).

Note that ksymoops will be confused by an oops message that has already been processed by klogd. If you are running klogd, and your system is still running after an oops occurs, a clean oops message can often be obtained by invoking the dmesg command.

In this case, moreover, you also get an assembly language dump of the code where the fault occurred. This information can often be used to figure out exactly what was happening; here it's clearly an instruction that writes a 0 to address 0.

Note how the instruction dump doesn't start from the instruction that caused the fault but three instructions earlier: that's because the RISC platforms execute several instructions in parallel and may generate deferred exceptions, so one must be able to look back at the last few instructions.

Learning to decode an oops message requires some practice and an understanding of the target processor you are using, as well as of the conventions used to represent assembly language, but it's worth doing. The time spent learning will be quickly repaid. Even if you have previous expertise with the PC assembly language under non-Unix operating systems, you may need to devote some time to learning, because the Unix syntax is different from Intel syntax. (A good description of the differences is in the Info documentation file for as, in the chapter called "i386-specific.")

You can prevent an endless loop by inserting schedule invocations at strategic points. The schedule call (as you might guess) invokes the scheduler and thus allows other processes to steal CPU time from the current process. If a process is looping in kernel space due to a bug in your driver, the schedule calls enable you to kill the process, after tracing what is happening.

If the keyboard isn't accepting input, the best thing to do is log into the system through your network and kill any offending processes, or reset the keyboard (with kbd_mode -a). However, discovering that the hang is only a keyboard lockup is of little use if you don't have a network available to help you recover. If this is the case, you could set up alternative input devices to be able at least to reboot the system cleanly. A shutdown and reboot cycle is easier on your computer than hitting the so-called big red button, and it saves you from the lengthy fsck scanning of your disks.

Such an alternative input device can be, for example, the mouse. Version 1.10 or newer of the gpm mouse server features a command-line option to enable a similar capability, but it works only in text mode. If you don't have a network connection and run in graphics mode, we suggest running some custom solution, like a switch connected to the DCD pin of the serial line and a script that polls for status change.

Invokes the "secure attention" (SAK) function. SAK will kill all processes running on the current console, leaving you with a clean terminal.

Prints the current register information.

Prints the current task list.

Other magic SysRq functions exist; see sysrq.txt in the Documentation directory of the kernel source for the full list. Note that magic SysRq must be explicitly enabled in the kernel configuration, and that most distributions do not enable it, for obvious security reasons. For a system used to develop drivers, however, enabling magic SysRq is worth the trouble of building a new kernel in itself. Magic SysRq must be enabled at runtime with a command like the following:

Another precaution to use when reproducing system hangs is to mount all your disks as read-only (or unmount them). If the disks are read-only or unmounted, there's no risk of damaging the filesystem or leaving it in an inconsistent state. Another possibility is using a computer that mounts all of its filesystems via NFS, the network file system. The "NFS-Root" capability must be enabled in the kernel, and special parameters must be passed at boot time. In this case you'll avoid any filesystem corruption without even resorting to SysRq, because filesystem coherence is managed by the NFS server, which is not brought down by your device driver.

The debugger must be invoked as though the kernel were an application. In addition to specifying the filename for the uncompressed kernel image, you need to provide the name of a core file on the command line. For a running kernel, that core file is the kernel core image, /proc/kcore. A typical invocation of gdb looks like the following:

The first argument is the name of the uncompressed kernel executable, not the zImage or bzImage or anything compressed.

The second argument on the gdb command line is the name of the core file. Like any file in /proc, /proc/kcore is generated when it is read. When the read system call executes in the /proc filesystem, it maps to a data-generation function rather than a data-retrieval one; we've already exploited this feature in "Using the /proc Filesystem" earlier in this chapter. kcore is used to represent the kernel "executable" in the format of a core file; it is a huge file because it represents the whole kernel address space, which corresponds to all physical memory. From within gdb, you can look at kernel variables by issuing the standard gdb commands. For example, p jiffies prints the number of clock ticks from system boot to the current time.

When you print data from gdb, the kernel is still running, and the various data items have different values at different times; gdb, however, optimizes access to the core file by caching data that has already been read. If you try to look at the jiffies variable once again, you'll get the same answer as before. Caching values to avoid extra disk access is a correct behavior for conventional core files, but is inconvenient when a "dynamic" core image is used. The solution is to issue the command core-file /proc/kcore whenever you want to flush the gdb cache; the debugger prepares to use a new core file and discards any old information. You won't, however, always need to issue core-file when reading a new datum; gdb reads the core in chunks of a few kilobytes and caches only chunks it has already referenced.

Numerous capabilities normally provided by gdb are not available when you are working with the kernel. For example, gdb is not able to modify kernel data; it expects to be running a program to be debugged under its own control before playing with its memory image. It is also not possible to set breakpoints or watchpoints, or to single-step through kernel functions.

On non-PC computers, the game is different. On the Alpha, make boot strips the kernel before creating the bootable image, so you end up with both the vmlinux and the vmlinux.gz files. The former is usable by gdb, and you can boot from the latter. On the SPARC, the kernel (at least the 2.0 kernel) is not stripped by default.

When you compile the kernel with -g and run the debugger using vmlinux together with /proc/kcore, gdb can return a lot of information about the kernel internals. You can, for example, use commands such as p *module_list, p *module_list->next, and p *chrdevs[4]->fops to dump structures. To get the best out of p, you'll need to keep a kernel map and the source code handy.

Another useful task that gdb performs on the running kernel is disassembling functions, via the disassemble command (which can be abbreviated to disass) or the "examine instructions" (x/i) command. The disassemble command can take as its argument either a function name or a memory range, whereas x/i takes a single memory address, also in the form of a symbol name. You can invoke, for example, x/20i to disassemble 20 instructions. Note that you can't disassemble a module function, because the debugger is acting on vmlinux, which doesn't know about your module. If you try to disassemble a module by address, gdb is most likely to reply "Cannot access memory at xxxx." For the same reason, you can't look at data items belonging to a module. They can be read from /dev/mem if you know the address of your variables, but it's hard to make sense out of raw data extracted from system RAM.

If you want to disassemble a module function, you're better off running the objdump utility on the module object file. Unfortunately, the tool runs on the disk copy of the file, not the running one; therefore, the addresses as shown by objdump will be the addresses before relocation, unrelated to the module's execution environment. Another disadvantage of disassembling an unlinked object file is that function calls are still unresolved, so you can't easily tell a call to printk from a call to kmalloc.

As you see, gdb is a useful tool when your aim is to peek into the running kernel, but it lacks some features that are vital to debugging device drivers.

Other kernel developers, however, see an occasional use for interactive debugging tools. One such tool is the kdb built-in kernel debugger, available as a nonofficial patch from oss.sgi.com. To use kdb, you must obtain the patch (be sure to get a version that matches your kernel version), apply it, and rebuild and reinstall the kernel. Note that, as of this writing, kdb works only on IA-32 (x86) systems (though a version for the IA-64 existed for a while in the mainline kernel source before being removed).

Note that just about everything the kernel does stops when kdb is running. Nothing else should be running on a system where you invoke kdb; in particular, you should not have networking turned on -- unless, of course, you are debugging a network driver. It is generally a good idea to boot the system in single-user mode if you will be using kdb.

As an example, consider a quick scull debugging session. Assuming that the driver is already loaded, we can tell kdb to set a breakpoint in scull_read as follows:

[1]kdb> bp scull_read
Instruction(i) BP #0 at 0xc8833514 (scull_read)
is enabled on cpu 1
[1]kdb> go

The bp command tells kdb to stop the next time the kernel enters scull_read. We then type go to continue execution. After putting something into one of the scull devices, we can attempt to read it by running cat under a shell on another terminal, yielding the following:

Entering kdb (0xc3108000) on processor 0 due to Breakpoint @ 0xc8833515
Instruction(i) breakpoint #0 at 0xc8833514
scull_read+0x1: movl %esp,%ebp
[0]kdb>

We are now positioned at the beginning of scull_read. To see how we got there, we can get a stack trace:

[0]kdb> bt
EBP EIP Function(args)
0xc3109c5c 0xc8833515 scull_read+0x1
0xc3109fbc 0xfc458b10 scull_read+0x33c255fc( 0x3, 0x803ad78, 0x1000, 0x1000, 0x804ad78)
0xbffffc88 0xc010bec0 system_call
[0]kdb>

kdb attempts to print out the arguments to every function in the call trace. It gets confused, however, by optimization tricks used by the compiler. Thus it prints five arguments for scull_read, which only has four.
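The Function column of the trace is an address rewritten as a symbol name plus an offset. The sketch below shows one way such a label can be derived: pick the symbol with the highest start address that does not exceed the address in question. It is an illustration of the arithmetic, not kdb's actual code; the two-entry table is hypothetical and holds only addresses that appear in the session above. Under that assumption, the odd second frame is what one would expect when a garbage EIP (0xfc458b10) lies above every known symbol: the nearest one below it is scull_read, and the distance is 0x33c255fc.

```c
#include <stddef.h>
#include <stdint.h>

struct ksym {
    uint32_t    addr;   /* start address of the symbol */
    const char *name;
};

/* Hypothetical table, filled with addresses from the session above. */
static const struct ksym symtab[] = {
    { 0xc010bec0u, "system_call" },
    { 0xc8833514u, "scull_read"  },
};

/*
 * Return the symbol with the highest start address that is <= addr and
 * store the distance from that start in *offset.  Returns NULL (and
 * leaves *offset untouched) if addr lies below every known symbol.
 */
const struct ksym *nearest_symbol(uint32_t addr, uint32_t *offset)
{
    const struct ksym *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr <= addr &&
            (best == NULL || symtab[i].addr > best->addr))
            best = &symtab[i];
    }
    if (best != NULL)
        *offset = addr - best->addr;
    return best;
}
```

Looking up 0xc8833515 yields scull_read with offset 0x1, which matches the first line of the trace.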

Time to look at some data. The mds command manipulates data; we can query the value of the scull_devices pointer with a command like:

[0]kdb> mds scull_devices 1
c8836104: c4c125c0 ....

Here we asked for one (four-byte) word of data starting at the location of scull_devices; the answer tells us that our device array was allocated starting at the address c4c125c0. To look at a device structure itself we need to use that address:

The eight lines here correspond to the eight fields in the Scull_Dev structure. Thus we see that the memory for the first device is allocated at 0xc3785000, that there is no next item in the list, that the quantum is 4000 (hex fa0) and the array size is 1000 (hex 3e8), that there are 154 bytes of data in the device (hex 9a), and so on.
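Reading such dumps means converting hex words by hand. If you capture the output (over a serial console, say), a small user-space helper can do the conversion for you. The sketch below assumes each line has the shape shown earlier for scull_devices: an address, a colon, and one hex word, followed by text we ignore. The function name is ours, not part of kdb.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse one dump line of the form "c8836104: c4c125c0 ....".
 * On success, store the address and the data word and return 0;
 * return -1 if the line does not have that shape.
 */
int parse_dump_line(const char *line, uint32_t *addr, uint32_t *word)
{
    if (sscanf(line, "%" SCNx32 ": %" SCNx32, addr, word) != 2)
        return -1;
    return 0;
}
```

Given the line from the scull_devices query, the helper returns the address 0xc8836104 and the word 0xc4c125c0; a word of 00000fa0 comes back as 4000, the quantum mentioned above.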

kdb can change data as well. Suppose we wanted to trim some of the data from the device:

A subsequent cat on the device will now return less data than before.

kdb has a number of other capabilities, including single-stepping (by instructions, not lines of C source code), setting breakpoints on data access, disassembling code, stepping through linked lists, accessing register data, and more. After you have applied the kdb patch, a full set of manual pages can be found in the Documentation/kdb directory in your kernel source tree.

A number of kernel developers have contributed to an unofficial patch called the integrated kernel debugger, or IKD. IKD provides a number of interesting kernel debugging facilities. The x86 is the primary platform for this patch, but much of it works on other architectures as well. As of this writing, the IKD patch can be found at . It is a patch that must be applied to the source for your kernel; the patch is version specific, so be sure to download the one that matches the kernel you are working with.

One of the features of the IKD patch is a kernel stack debugger. If you turn this feature on, the kernel will check the amount of free space on the kernel stack at every function call, and force an oops if it gets too small. If something in your kernel is causing stack corruption, this tool may help you to find it. There is also a "stack meter" feature that you can use to see how close to filling up the stack you get at any particular time.
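The arithmetic behind such a check is simple, and can be sketched in a few lines. The code below is not taken from IKD; it assumes an 8-KB kernel stack aligned on an 8-KB boundary and growing downward, and it leaves the size of whatever occupies the bottom of that area (reserved_bytes) as a parameter. All names are ours.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed: 8-KB stack area, 8-KB aligned */

/* Lowest address of the stack area that contains sp. */
uint32_t stack_base(uint32_t sp)
{
    return sp & ~(THREAD_SIZE - 1);
}

/*
 * Bytes still available below sp.  The stack grows down toward the
 * base, where reserved_bytes are assumed to be in use already.
 */
uint32_t stack_room(uint32_t sp, uint32_t reserved_bytes)
{
    uint32_t above_base = sp & (THREAD_SIZE - 1);

    return above_base > reserved_bytes ? above_base - reserved_bytes : 0;
}

/* Nonzero when fewer than min_free bytes remain. */
int stack_too_low(uint32_t sp, uint32_t reserved_bytes, uint32_t min_free)
{
    return stack_room(sp, reserved_bytes) < min_free;
}
```

Applied to the frame pointer 0xc3109c5c from the kdb backtrace shown earlier, stack_base gives 0xc3108000, the same value kdb printed in parentheses on entry, and the pointer sits 7260 bytes (hex 1c5c) above that base.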

Finally, IKD also includes a version of the kdb debugger discussed in the previous section. As of this writing, however, the version of kdb included in the IKD patch is somewhat old. If you need kdb, we recommend that you go directly to the source at oss.sgi.com for the current version.

kgdb is a patch that allows the full use of the gdb debugger on the Linux kernel, but only on x86 systems. It works by hooking into the system to be debugged via a serial line, with gdb running on the far end. You thus need two systems to use kgdb -- one to run the debugger and one to run the kernel of interest. Like kdb, kgdb is currently available from oss.sgi.com.

Setting up kgdb involves installing a kernel patch and booting the modified kernel. You need to connect the two systems with a serial cable (of the null modem variety) and to install some support files on the gdb side of the connection. The patch places detailed instructions in the file Documentation/i386/gdb-serial.txt; we won't reproduce them here. Be sure to read the instructions on debugging modules: toward the end there are some nice gdb macros that have been written for this purpose.

Crash dump analyzers enable the system to record its state when an oops occurs, so that it may be examined at leisure afterward. They can be especially useful if you are supporting a driver for a user at a different site. Users can be somewhat reluctant to copy down oops messages for you, so installing a crash dump system can let you get the information you need to track down a user's problem without requiring any work from the user. It is thus not surprising that the available crash dump analyzers have been written by companies in the business of supporting systems for users.

There are currently two crash dump analyzer patches available for Linux. Both were relatively new when this section was written, and both were in a state of flux. Rather than provide detailed information that is likely to go out of date, we'll restrict ourselves to providing an overview and pointers to where more information can be found.

The first analyzer is LKCD (Linux Kernel Crash Dumps). It's available, once again, from oss.sgi.com. When a kernel oops occurs, LKCD will write a copy of the current system state (memory, primarily) into the dump device you specified in advance. The dump device must be a system swap area. A utility called LCRASH is run on the next reboot (before swapping is enabled) to generate a summary of the crash, and optionally to save a copy of the dump in a conventional file. LCRASH can be run interactively and provides a number of debugger-like commands for querying the state of the system.

LKCD is currently supported for the Intel 32-bit architecture only, and only works with swap partitions on SCSI disks.

Having a copy of the kernel running as a user-mode process brings a number of advantages. Because it is running on a constrained, virtual processor, a buggy kernel cannot damage the "real" system. Different hardware and software configurations can be tried easily on the same box. And, perhaps most significantly for kernel developers, the user-mode kernel can be easily manipulated with gdb or another debugger. After all, it is just another process. User-Mode Linux clearly has the potential to accelerate kernel development.

The word is that User-Mode Linux will be integrated into an early 2.4 release after 2.4.0; it may well be there by the time this book is published.

User-Mode Linux also has some significant limitations as of this writing, most of which will likely be addressed soon. The virtual processor currently works in a uniprocessor mode only; the port runs on SMP systems without a problem, but it can only emulate a uniprocessor host. The biggest problem for driver writers, though, is that the user-mode kernel has no access to the host system's hardware. Thus, while it can be useful for debugging most of the sample drivers in this book, User-Mode Linux is not yet useful for debugging drivers that have to deal with real hardware. Finally, User-Mode Linux only runs on the IA-32 architecture.

Because work is under way to fix all of these problems, User-Mode Linux will likely be an indispensable tool for Linux device driver programmers in the very near future.

The Linux Trace Toolkit

LTT, along with extensive documentation, can be found on the Web at .

Dynamic Probes (or DProbes) is a debugging tool released (under the GPL) by IBM for Linux on the IA-32 architecture. It allows the placement of a "probe" at almost any place in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control hits the given point. This code can report information back to user space, change registers, or do a number of other things. The useful feature of DProbes is that once the capability has been built into the kernel, probes can be inserted anywhere within a running system without kernel builds or reboots. DProbes can also work with the Linux Trace Toolkit to insert new tracing events at arbitrary locations.




© 2001, O'Reilly & Associates, Inc.
