The following article offers an accessible yet in-depth discussion of kernel threads. It comes from a good Linux techniques website.
*******************************************************************************
Kernel Thread
Gearheads Written by Sreekrishnan Venkateswaran
Thursday, 15 September 2005
Threads are programming abstractions used in concurrent
processing. A kernel thread is a way to implement background
tasks inside the kernel. A background task can be busy handling
asynchronous events or can be asleep, waiting for an event to occur. Kernel
threads are similar to user processes, except that they live in kernel space and
have access to kernel functions and data structures. Like user processes, kernel
threads appear to monopolize the processor because of preemptive
scheduling.
In this month’s “Gearheads,” let’s discuss kernel
threads and develop an example that also demonstrates kernel concepts such as
process states, wait queues, and user-mode helpers.
Built-in Kernel Threads
To see the kernel threads (also called kernel processes)
running on your system, run the command ps -ef. You should see something
similar to Figure One.
FIGURE ONE: A typical list of Linux kernel threads

$ ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 22:36 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
root         3     1  0 22:36 ?        00:00:00 [events/0]
root        38     3  0 22:36 ?        00:00:00 [pdflush]
root        39     3  0 22:36 ?        00:00:00 [pdflush]
root        29     1  0 22:36 ?        00:00:00 [khubd]
root       695     1  0 22:36 ?        00:00:00 [kjournald]
…
root      3914     1  0 22:37 ?        00:00:00 [nfsd]
root      3915     1  0 22:37 ?        00:00:00 [nfsd]
…
root      4015  3364  0 22:55 tty3     00:00:00 -bash
root      4066  4015  0 22:59 tty3     00:00:00 ps -ef
The output of ps -ef is a list of the user and kernel
processes running on your system. Kernel process names are surrounded by
square brackets ([]).
The [ksoftirqd/0] kernel thread is an aid to implement
soft IRQs. Soft IRQs are raised by interrupt handlers to request “bottom half”
processing of portions of the interrupt handler whose execution can be
deferred. The idea is to minimize the code inside interrupt handlers, which
reduces interrupt-off times in the system and thus lowers latencies. ksoftirqd
ensures that a high load of soft IRQs neither starves the soft IRQs nor
overwhelms the system. (On Symmetric Multi-Processing (SMP) machines, where
multiple thread instances can run on different processors in parallel, one
instance of ksoftirqd is created per processor to improve throughput. The
kernel processes are then named ksoftirqd/n, where n is the processor
number.)
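To make the deferral idea concrete, here is a minimal sketch of how an
interrupt handler might push the bulk of its work into softirq context via a
tasklet (tasklets are built on top of soft IRQs). The mydev names and both
functions are hypothetical, and the 2.6-era handler prototype is assumed:

#include <linux/interrupt.h>

/* Hypothetical bottom half: the deferrable bulk of the work.
 * Runs later in softirq context (serviced by ksoftirqd under load) */
static void mydev_do_bottom_half (unsigned long data)
{
  /* Time-consuming processing goes here */
}

static DECLARE_TASKLET (mydev_tasklet, mydev_do_bottom_half, 0);

/* Hypothetical interrupt handler: does only the urgent work,
 * keeping interrupt-off time short */
static irqreturn_t mydev_interrupt (int irq, void *dev_id, struct pt_regs *regs)
{
  tasklet_schedule (&mydev_tasklet); /* Raises TASKLET_SOFTIRQ */
  return IRQ_HANDLED;
}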
The events/n threads (where n is the processor number)
help implement work queues, which are another way of deferring work in the
kernel. If a part of the kernel wants to defer execution of work, it can
either create its own work queue or make use of the default events/n worker
thread.
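As a hedged illustration (the mydev names are made up, and the three-argument
pre-2.6.20 work queue interface is assumed), deferring work to the default
events/n threads can look like this:

#include <linux/workqueue.h>

/* Hypothetical deferred routine: runs later in process context,
 * in one of the events/n worker threads */
static void mydev_work_fn (void *data)
{
  /* The deferred work goes here */
}

static DECLARE_WORK (mydev_work, mydev_work_fn, NULL);

/* Called from code that wants to defer execution */
static void mydev_defer (void)
{
  schedule_work (&mydev_work); /* Queue onto the default events/n thread */
}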
The pdflush
kernel thread flushes dirty pages from the page cache. The page cache
buffers accesses to the disk. To improve performance, actual writes to the disk
are delayed until the pdflush daemon writes out dirtied data to disk. This is
done if the available free memory dips below a threshold or if the page has
remained dirty for a sufficiently long time. In the 2.4.* kernels, these two
tasks were respectively performed by separate kernel threads, bdflush
and kupdated.
You may have noticed that there are two instances of
pdflush in the ps output. A new instance is created if the kernel senses that
existing instances are becoming intolerably busy servicing disk queues.
Launching new instances of pdflush improves throughput, especially if your
system has multiple disks and many of them are busy.
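Both triggers mentioned above, the free-memory threshold and the maximum age
of a dirty page, are tunable at run time through /proc/sys/vm/. For example
(the values shown are common 2.6 defaults and may differ on your system):

$ cat /proc/sys/vm/dirty_expire_centisecs
3000
$ cat /proc/sys/vm/dirty_background_ratio
10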
The khubd
thread, part of the Linux USB core, monitors the machine’s USB hub and
configures USB devices when they are hot-plugged into the system. kjournald is
the generic kernel journaling thread, which is used by file systems like ext3.
The Linux Network File System (NFS) server is implemented using a set of kernel
threads named nfsd.
Creating a Kernel Thread
To illustrate kernel threads, let’s implement a simple
example. Assume that you’d like the kernel to asynchronously invoke a user-mode
program to send you a page or an email alert whenever it senses that the health
of certain kernel data structures is unsatisfactory — for instance, free space
in network receive buffers has dipped below a low
watermark.
This is a candidate for a kernel thread
because:
*It’s a background task, since it has to wait for
asynchronous events.
*It needs access to kernel data structures, since the
actual detection of events must be done by other parts of the
kernel.
*It has to invoke a user-mode helper program, which is a
time-consuming operation.
The kernel thread relinquishes the processor till it
gets woken up by parts of the kernel that are responsible for monitoring the
data structures of interest. It then invokes the user-mode helper program and
passes on the appropriate identity code to the program’s environment. The
user-mode program is registered with the kernel via the /proc file
system.
Listing One creates the kernel
thread.
Listing One: Creating a Linux kernel thread

ret = kernel_thread (mykthread, NULL,
                     CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
The thread can be created in an appropriate place, for
example, in init/main.c. The flags specify the resources to be shared between
the parent and child threads: CLONE_FILES specifies that open files are to be
shared, while CLONE_SIGHAND requests that signal handlers be
shared.
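(As an aside, 2.6 kernels also offer a higher-level kthread API. A minimal
sketch, assuming a kernel that provides kthread_create(), might look like the
following; the init function name is hypothetical:)

#include <linux/err.h>
#include <linux/kthread.h>

static int __init myexample_init (void)
{
  struct task_struct *task;

  /* Create the thread, giving it the name shown by ps */
  task = kthread_create (mykthread, NULL, "mykthread");
  if (IS_ERR (task))
    return PTR_ERR (task);
  wake_up_process (task); /* Start it running */
  return 0;
}

A thread created this way starts out without attached user-space resources,
so it wouldn't need the daemonize() call shown in Listing Two.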
Listing Two is the actual kernel thread. daemonize() rids
the calling thread of attached user resources, while reparent_to_init()
changes the parent of the calling thread to the init task. Each Linux thread
has a single parent. If a parent process dies without waiting for its child to
exit, the child becomes a zombie process and wastes resources. Re-parenting
the child to the init task avoids this. In the 2.6 kernel, the daemonize()
function itself internally invokes reparent_to_init().
Since daemonize() blocks all signals by default, you have
to call allow_signal() to enable delivery if your thread desires to handle a
particular signal. There are no signal handlers inside the kernel, so use
signal_pending() to check for signals and perform the appropriate action. For
debugging purposes, the code in Listing Two requests delivery of SIGKILL and
dies if it’s received.
Listing Two: Implementing the Kernel Thread

static DECLARE_WAIT_QUEUE_HEAD (myevent_waitqueue);
rwlock_t myevent_lock;
unsigned int myevent_id; /* Identity of the troubled data
                          * structure, filled in by Listing Three */

static int mykthread (void *unused)
{
  unsigned int event_id = 0;
  DECLARE_WAITQUEUE (wait, current);

  /* The stuff required to become a kernel thread
   * without attached user resources */
  daemonize ("mykthread");
  reparent_to_init (); /* In 2.4 kernels */

  /* Request delivery of SIGKILL */
  allow_signal (SIGKILL);

  /* The thread will sleep on this wait queue till it is
   * woken up by parts of the kernel in charge of sensing
   * the health of data structures of interest */
  add_wait_queue (&myevent_waitqueue, &wait);

  for (;;) {
    /* Relinquish the processor till the event occurs */
    set_current_state (TASK_INTERRUPTIBLE);
    schedule ();

    /* Die if I receive SIGKILL */
    if (signal_pending (current)) break;

    /* Control gets here when the thread is woken up */
    read_lock (&myevent_lock); /* Critical section starts */
    if (myevent_id) { /* Guard against spurious wakeups */
      event_id = myevent_id;
      read_unlock (&myevent_lock); /* Critical section ends */

      /* Invoke the registered user-mode helper and
       * pass the identity code in its environment */
      run_umode_handler (event_id); /* See Listing Five */
    } else {
      read_unlock (&myevent_lock);
    }
  }

  set_current_state (TASK_RUNNING);
  remove_wait_queue (&myevent_waitqueue, &wait);
  return 0;
}
If you compile this as part of the kernel, you can see
the newly created thread, mykthread, in the ps output, as shown in Figure
Two.
FIGURE TWO: The new thread, mykthread, is a child of init

$ ps -ef
UID        PID  PPID  C STIME TTY      TIME     CMD
root         1     0  0 21:56 ?        00:00:00 init [3]
root         2     1  0 22:36 ?        00:00:00 [ksoftirqd/0]
…
root       111     1  0 21:56 ?        00:00:00 [mykthread]
…
Before delving further into the thread implementation,
let’s look at a code snippet that detects the event and awakens mykthread. Refer
to Listing Three.
Listing Three: Waking up the kernel thread

/* Executed by parts of the kernel that own the
 * data structures whose health you want to monitor */
/* ... */
if (my_key_datastructure looks troubled) {
  write_lock (&myevent_lock);
  /* Fill in the identity of the data structure */
  myevent_id = datastructure_id;
  write_unlock (&myevent_lock);

  /* Wake up mykthread */
  wake_up_interruptible (&myevent_waitqueue);
}
/* ... */
The kernel accomplishes useful work using a combination
of process contexts and interrupt contexts. Code in process context runs on
behalf of a task and may sleep, while code in interrupt context isn’t tied to
any task and must not sleep. Listing Two executes in process context, while
Listing Three can run from both process and interrupt contexts.
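Code that may be entered from either context can test where it’s running.
Here is a small sketch; the function is hypothetical, and the header that
exports in_interrupt() varies slightly across kernel versions:

#include <linux/hardirq.h> /* in_interrupt(); location varies by version */

static void mydev_checkpoint (void)
{
  if (in_interrupt ()) {
    /* Interrupt or softirq context: must not sleep */
  } else {
    /* Process context: sleeping is allowed */
  }
}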
Process and interrupt contexts communicate via kernel
data structures. In the example, myevent_id and myevent_waitqueue are used for
this communication. myevent_id contains the identity of the data structure
that’s in trouble. Access to myevent_id is serialized using spin
locks.
(Kernel threads are preemptible only if CONFIG_PREEMPT is
turned on at compile time. If CONFIG_PREEMPT is off, or if you are running a
2.4 kernel without the preemption patch, your thread will freeze the system if
it doesn’t go to sleep. For instance, if you comment out schedule() in Listing
Two and disable CONFIG_PREEMPT in your kernel configuration, your system will
lock up.)
Process States and Wait Queues
Let’s take a closer look at the code snippet that puts
mykthread to sleep while waiting for events. The snippet is shown in Listing
Four.
LISTING FOUR: How to put a thread to sleep

add_wait_queue (&myevent_waitqueue, &wait);
for (;;) {
  /* ... */
  set_current_state (TASK_INTERRUPTIBLE);
  schedule ();
  /* Point A */
  /* ... */
}
set_current_state (TASK_RUNNING);
remove_wait_queue (&myevent_waitqueue, &wait);
Wait queues hold threads that need to wait for an event
or a system resource. A thread in a wait queue sleeps until it’s woken by
another thread or an interrupt handler that’s responsible for detecting the
event. Queuing and de-queuing are done using the add_wait_queue() and
remove_wait_queue() functions, while waking up queued tasks is accomplished via
the wake_up_interruptible() routine.
In the above code snippet, set_current_state() is used
to set the run state of the kernel thread. A kernel thread (or a normal
process) can be in any of the following states: running, interruptible,
uninterruptible, zombie, stopped, traced, or dead. These states are defined in
include/linux/sched.h.
*A process in the running state (TASK_RUNNING) is in the
scheduler run queue and is a candidate for CPU time according to the scheduling
algorithm.
*A task in the interruptible state (TASK_INTERRUPTIBLE)
is waiting for an event to occur and isn’t in the scheduler run queue. When the
task gets woken up or if a signal is delivered to it, it re-enters the run
queue.
*The uninterruptible state (TASK_UNINTERRUPTIBLE) is
similar to the interruptible state except that receipt of a signal won’t put the
task back into the run queue.
*A task in the zombie state (EXIT_ZOMBIE) has
terminated, but its parent did not wait for the task to
complete.
*A stopped task (TASK_STOPPED) has stopped execution due
to receipt of certain signals.
mykthread sleeps on a wait queue (myevent_waitqueue) and
changes its state to TASK_INTERRUPTIBLE, signaling that it desires to opt out of
the scheduler run queue. The call to schedule() asks the scheduler to choose and
run a new task from its run queue.
When another part of the kernel awakens mykthread using
wake_up_interruptible() as shown in Listing Three, the thread is put back into
the scheduler run queue. The process state also gets changed to TASK_RUNNING, so
there’s no race condition even if the wake up occurs between the time the task
state is set to TASK_INTERRUPTIBLE and the schedule() function is called. The
thread also gets back into the run queue if a SIGKILL signal is delivered to it.
When the scheduler subsequently picks mykthread from the run queue, execution
resumes at Point A.
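Incidentally, the same race-free sleep-until-condition pattern can be written
more compactly with the wait_event_interruptible() macro, which performs the
state changes and the wakeup re-check internally. A sketch of what mykthread’s
loop could look like (the locking while consuming myevent_id still applies, as
in Listing Two):

for (;;) {
  /* Sleeps until myevent_id becomes nonzero; returns nonzero
   * if interrupted by a signal (e.g., our SIGKILL) */
  if (wait_event_interruptible (myevent_waitqueue, myevent_id != 0))
    break;
  /* Handle the event as in Listing Two */
}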
User-Mode Helpers
The kernel supports a mechanism for invoking user-mode
programs to help perform certain functions. For example, if module auto-loading
is enabled, the kernel dynamically loads necessary modules on demand using a
user-mode module loader. The default loader is /sbin/modprobe, but you can
change it by registering your own loader in /proc/sys/kernel/modprobe.
Similarly, the kernel notifies user space about hot-plug events by invoking the
program registered in /proc/sys/kernel/hotplug, which is by default
/sbin/hotplug.
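For instance, you can inspect or change the registered module loader like
this (substitute your own wrapper path; /sbin/mymodprobe is hypothetical):

$ cat /proc/sys/kernel/modprobe
/sbin/modprobe
$ echo /sbin/mymodprobe > /proc/sys/kernel/modprobe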
Listing Five contains the function used by mykthread to
notify user space about detected events. The user-mode program to invoke can
be registered via the sysctl interface in the /proc file system. To do this,
make sure that CONFIG_SYSCTL is enabled in your kernel configuration and add
an entry to the kern_table array in kernel/sysctl.c:
{KERN_MYEVENT_HANDLER, "myevent_handler",
 &myevent_handler, 256, 0644, NULL,
 &proc_dostring, &sysctl_string},
This creates an entry /proc/sys/kernel/myevent_handler
in the /proc file system. To register your user-mode helper, do the
following:
$ echo /path/to/helper > \
/proc/sys/kernel/myevent_handler
This makes /path/to/helper execute when the function in
Listing Five runs.
Listing Five: Invoking User-Mode Helpers

/* Called from Listing Two */
static void run_umode_handler (int event_id)
{
  int i = 0;
  char *argv[2], *envp[4], *buffer = NULL;
  int value;

  argv[i++] = myevent_handler; /* Defined earlier in kernel/sysctl.c */
  /* If no user-mode handler is registered, return
   * (checked before kmalloc() so nothing leaks) */
  if (!argv[0]) return;
  argv[i] = NULL;

  /* Fill in the id corresponding to the data structure in trouble */
  if (!(buffer = kmalloc (32, GFP_KERNEL))) return;
  sprintf (buffer, "TROUBLED_DS=%d", event_id);

  /* Prepare the environment for /path/to/helper */
  i = 0;
  envp[i++] = "HOME=/";
  envp[i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
  envp[i++] = buffer;
  envp[i] = NULL;

  /* Execute the user-mode program, /path/to/helper */
  value = call_usermodehelper (argv[0], argv, envp, 0);

  /* Check return values */
  …

  kfree (buffer);
}
The identity of the troubled kernel data structure is
passed as an environment variable (TROUBLED_DS) to the user-mode helper. The
helper can be a simple script like the following that sends you an email alert
containing the information that it gleaned from its
environment:
#!/bin/bash
echo Kernel data structure $TROUBLED_DS \
     is in trouble | mail -s Alert root
call_usermodehelper() has to be executed from a process
context and runs with root capabilities. It’s implemented using a work queue in
2.6 kernels.
Looking at the Sources
In the 2.6 source tree, the ksoftirqd, pdflush, and
khubd kernel threads live in kernel/softirq.c, mm/pdflush.c, and
drivers/usb/core/hub.c, respectively.
The daemonize() function can be found in kernel/exit.c
in the 2.6 sources and in kernel/sched.c in the 2.4 sources. For the
implementation of invoking user-mode helpers, look at
kernel/kmod.c.
Sreekrishnan Venkateswaran has been working for IBM
India since 1996. His recent Linux projects include putting Linux onto a
wristwatch, a PDA, and a pacemaker programmer. You can reach Krishnan at .
*************************************************************************************