Kernel Threads
Gearheads Written by Sreekrishnan Venkateswaran Thursday, 15 September 2005
Threads are programming abstractions used in concurrent processing. A
kernel thread is a way to implement background tasks inside the kernel. A
background task can be busy handling asynchronous events or can be
asleep, waiting for an event to occur. Kernel threads are similar to
user processes, except that they live in kernel space and have access to
kernel functions and data structures. Like user processes, kernel
threads appear to monopolize the processor because of preemptive
scheduling.
In this month’s “Gearheads,” let’s discuss kernel threads and develop an
example that also demonstrates concepts such as process states, wait queues, and
user-mode helpers.
Built-in Kernel Threads
To see the kernel threads (also called kernel processes) running on your
system, run the command ps -ef. You should see something similar to
Figure One.
FIGURE ONE: A typical list of Linux kernel threads
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 22:36 ? 00:00:00 init [3]
root 2 1 0 22:36 ? 00:00:00 [ksoftirqd/0]
root 3 1 0 22:36 ? 00:00:00 [events/0]
root 38 3 0 22:36 ? 00:00:00 [pdflush]
root 39 3 0 22:36 ? 00:00:00 [pdflush]
root 29 1 0 22:36 ? 00:00:00 [khubd]
root 695 1 0 22:36 ? 00:00:00 [kjournald]
…
root 3914 1 0 22:37 ? 00:00:00 [nfsd]
root 3915 1 0 22:37 ? 00:00:00 [nfsd]
…
root 4015 3364 0 22:55 tty3 00:00:00 -bash
root 4066 4015 0 22:59 tty3 00:00:00 ps -ef
The output of ps -ef is a list of user and kernel processes running on
your system. Kernel process names are surrounded by square brackets
([]).
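The bracket convention is easy to check programmatically. The following is a minimal user-space sketch, not part of the original column; the function name is_kernel_thread_name() is hypothetical, and it assumes the caller passes in the CMD column of the ps output. Because ps also prints a bracketed name whenever it cannot read a process's argument list, treat the result as a heuristic rather than a definitive test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a ps CMD field looks like a kernel
 * thread name, i.e. it is wrapped in square brackets, and 0 otherwise. */
int is_kernel_thread_name(const char *cmd)
{
    size_t len;

    assert(cmd != NULL);
    len = strlen(cmd);
    /* Need at least "[x]": opening bracket, one character, closing bracket */
    return len >= 3 && cmd[0] == '[' && cmd[len - 1] == ']';
}
```

Given the entries in Figure One, this returns 1 for [ksoftirqd/0] and 0 for both -bash and init [3], since the latter does not start with a bracket.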
The [ksoftirqd/0] kernel thread is an aid to implement soft IRQs. Soft
IRQs are raised by interrupt handlers to request “bottom half”
processing of portions of the interrupt handler whose execution can be
deferred. The idea is to minimize the code inside interrupt handlers,
which reduces interrupt-off times in the system and thus
lowers latencies. ksoftirqd ensures that a high load of soft
IRQs neither starves the soft IRQs nor overwhelms the system. (On
Symmetric Multi-Processing (SMP) machines, where multiple thread
instances can run on different processors in parallel, one instance of
ksoftirqd is created per processor to improve throughput. On SMP
machines, the kernel processes are named ksoftirqd/n, where n is the
processor number.)
The events/n threads (where n is the processor number) help implement
work queues, which are another way of deferring work in the kernel. If a
part of the kernel wants to defer execution of work, it can either
create its own work queue or make use of the default events/n worker
thread.
The pdflush kernel thread flushes dirty pages from the page cache. The
page cache buffers accesses to the disk. To improve performance, actual
writes to the disk are delayed until the pdflush daemon writes out
dirtied data to disk. This is done if the available free memory dips
below a threshold or if the page has remained dirty for a sufficiently
long time. In the 2.4.* kernels, these two tasks were respectively
performed by separate kernel threads, bdflush and kupdated.
You may have noticed that there are two instances of pdflush in the ps
output. A new instance is created if the kernel senses that existing
instances are becoming intolerably busy servicing disk queues. Launching
new instances of pdflush improves throughput, especially if your system
has multiple disks and many of them are busy.
The khubd thread, part of the Linux USB core, monitors the machine’s USB
hub and configures USB devices when they are hot-plugged into the
system. kjournald is the generic kernel journaling thread, which is used
by file systems like ext3. The Linux Network File System (NFS) server
is implemented using a set of kernel threads named nfsd.
Creating a Kernel Thread
To illustrate kernel threads, let’s implement a simple example. Assume
that you’d like the kernel to asynchronously invoke a user-mode program
to send you a page or an email alert whenever it senses that the health
of certain kernel data structures is unsatisfactory — for instance, free
space in network receive buffers has dipped below a low watermark.
This is a candidate for a kernel thread because:
*It’s a background task, since it has to wait for asynchronous events.
*It needs access to kernel data structures, since the actual detection of events must be done by other parts of the kernel.
*It has to invoke a user-mode helper program, which is a time-consuming operation.
The kernel thread relinquishes the processor till it gets woken up by
parts of the kernel that are responsible for monitoring the data
structures of interest. It then invokes the user-mode helper program and
passes on the appropriate identity code to the program’s environment.
The user-mode program is registered with the kernel via the /proc file
system.
Listing One creates the kernel thread.
Listing One: Creating a Linux kernel thread
ret = kernel_thread (mykthread, NULL,
                     CLONE_FS | CLONE_FILES | CLONE_SIGHAND | SIGCHLD);
The thread can be created in an appropriate place, for example, in
init/main.c. The flags specify the resources to be shared between the
parent and child threads: CLONE_FS shares file system information such
as the root and current working directories, CLONE_FILES specifies that
open files are to be shared, and CLONE_SIGHAND requests that signal
handlers be shared.
Listing Two is the actual kernel thread. daemonize() creates the thread
without attached user resources, while reparent_to_init() changes the
parent of the calling thread to the init task.
Each Linux thread has a single parent. If a child exits and its parent
never waits for it, the child lingers as a zombie process and wastes
resources. Re-parenting the child to the init task avoids this, because
init waits for its exited children.
In the 2.6 kernel, the daemonize() function itself internally invokes
reparent_to_init().
Since daemonize() blocks all signals by default, you have to call
allow_signal() to enable delivery if your thread desires to handle a
particular signal. There are no signal handlers inside the kernel, so
use signal_pending() to check for signals and perform the appropriate
action. For debugging purposes, the code in Listing Two requests
delivery of SIGKILL and dies if it’s received.
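The "poll for pending signals instead of installing a handler" pattern has a rough user-space counterpart, sketched below. This is an analogy rather than kernel code: in user space you block the signal so that no handler runs and then inspect the pending set, whereas a kernel thread calls allow_signal() and tests signal_pending(). The function names block_signal() and signal_is_pending() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Block signo so that it stays pending instead of invoking a handler.
 * Returns 0 on success, -1 on failure. */
int block_signal(int signo)
{
    sigset_t set;

    assert(signo > 0);
    sigemptyset(&set);
    if (sigaddset(&set, signo) != 0)
        return -1;
    return sigprocmask(SIG_BLOCK, &set, NULL);
}

/* Poll for a signal: returns 1 if signo is pending, 0 if it is not,
 * and -1 on error. */
int signal_is_pending(int signo)
{
    sigset_t pending;

    assert(signo > 0);
    if (sigpending(&pending) != 0)
        return -1;
    return sigismember(&pending, signo);
}
```

A loop that calls signal_is_pending() after each wake-up and breaks out when it returns 1 mirrors the structure of the signal check in Listing Two.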
Listing Two: Implementing the Kernel Thread
static DECLARE_WAIT_QUEUE_HEAD (myevent_waitqueue);
rwlock_t myevent_lock;
extern unsigned int myevent_id; /* Identity of the troubled data
                                 * structure; set in Listing Three */

static int mykthread (void *unused)
{
  unsigned int event_id = 0;
  DECLARE_WAITQUEUE (wait, current);

  /* The stuff required to become a kernel thread
   * without attached user resources */
  daemonize ("mykthread");
  reparent_to_init (); /* In 2.4 kernels */

  /* Request delivery of SIGKILL */
  allow_signal (SIGKILL);

  /* The thread will sleep on this wait queue till it is
   * woken up by parts of the kernel in charge of sensing
   * the health of data structures of interest */
  add_wait_queue (&myevent_waitqueue, &wait);

  for (;;) {
    /* Relinquish the processor till the event occurs */
    set_current_state (TASK_INTERRUPTIBLE);
    schedule ();

    /* Die if I receive SIGKILL */
    if (signal_pending (current))
      break;

    /* Control gets here when the thread is woken up */
    read_lock (&myevent_lock); /* Critical section starts */
    if (myevent_id) { /* Guard against spurious wakeups */
      event_id = myevent_id;
      read_unlock (&myevent_lock); /* Critical section ends */
      /* Invoke the registered user-mode helper and
       * pass the identity code in its environment */
      run_umode_handler (event_id); /* See Listing Five */
    } else {
      read_unlock (&myevent_lock);
    }
  }

  set_current_state (TASK_RUNNING);
  remove_wait_queue (&myevent_waitqueue, &wait);
  return 0;
}
If you compile this as part of the kernel, you can see the newly created
thread, mykthread, in the ps output, as shown in Figure Two.
FIGURE TWO: The new thread, mykthread, is a child of init
$ ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 21:56 ? 00:00:00 init [3]
root 2 1 0 22:36 ? 00:00:00 [ksoftirqd/0]
…
root 111 1 0 21:56 ? 00:00:00 [mykthread]
…
Before delving further into the thread implementation, let’s look at a
code snippet that detects the event and awakens mykthread. Refer to
Listing Three.
Listing Three: Waking up the kernel thread
/* Executed by parts of the kernel that own the
   data structures whose health you want to monitor */
/* ... */
if (my_key_datastructure looks troubled) {
  write_lock (&myevent_lock);
  /* Fill in the identity of the data structure */
  myevent_id = datastructure_id;
  write_unlock (&myevent_lock);

  /* Wake up mykthread */
  wake_up_interruptible (&myevent_waitqueue);
}
/* ... */
The kernel accomplishes useful work using a combination of process
contexts and interrupt contexts. Process contexts aren’t tied to any
interrupt context and vice versa. Listing Two executes in a process
context, while Listing Three can run from both process and interrupt
contexts.
Process and interrupt contexts communicate via kernel data structures.
In the example, myevent_id and myevent_waitqueue are used for this
communication. myevent_id contains the identity of the data structure
that’s in trouble. Access to myevent_id is serialized using a
reader-writer spin lock, myevent_lock.
(Kernel threads are preemptible only if CONFIG_PREEMPT is turned on
during compile time. If CONFIG_PREEMPT is off or if you are running a
2.4 kernel without the preemption patch, your thread will freeze the
system if it doesn’t go to sleep. If you comment out schedule() in
Listing Two and disable CONFIG_PREEMPT in your kernel configuration,
your system will lock up, too.)
Process States and Wait Queues
Let’s take a closer look at the code snippet that puts mykthread to
sleep while waiting for events. The snippet is shown in Listing Four.
LISTING FOUR: How to put a thread to sleep
add_wait_queue (&myevent_waitqueue, &wait);
for (;;) {
  /* .. */
  set_current_state (TASK_INTERRUPTIBLE);
  schedule ();
  /* Point A */
  /* .. */
}
set_current_state (TASK_RUNNING);
remove_wait_queue (&myevent_waitqueue, &wait);
Wait queues hold threads that need to wait for an event or a system
resource. A thread in a wait queue sleeps until it’s woken by another
thread or an interrupt handler that’s responsible for detecting the
event. Queuing and de-queuing are done using the add_wait_queue() and
remove_wait_queue() functions, while waking up queued tasks is
accomplished via the wake_up_interruptible() routine.
In the above code snippet, set_current_state() is used to set the run
state of the kernel thread. A kernel thread (or a normal process) can be
in any of the following states: running, interruptible,
uninterruptible, zombie, stopped, traced, or dead. These states are
defined in include/linux/sched.h.
*A process in the running state (TASK_RUNNING) is in the scheduler run
queue and is a candidate for CPU time according to the scheduling
algorithm.
*A task in the interruptible state (TASK_INTERRUPTIBLE) is waiting for
an event to occur and isn’t in the scheduler run queue. When the task
gets woken up or if a signal is delivered to it, it re-enters the run
queue.
*The uninterruptible state (TASK_UNINTERRUPTIBLE) is similar to the
interruptible state except that receipt of a signal won’t put the task
back into the run queue.
*A task in the zombie state (EXIT_ZOMBIE) has terminated, but its parent did not wait for the task to complete.
*A stopped task (TASK_STOPPED) has stopped execution due to receipt of certain signals.
mykthread sleeps on a wait queue (myevent_waitqueue) and changes its
state to TASK_INTERRUPTIBLE, signaling that it desires to opt out of the
scheduler run queue. The call to schedule() asks the scheduler to
choose and run a new task from its run queue.
When another part of the kernel awakens mykthread using
wake_up_interruptible() as shown in Listing Three, the thread is put
back into the scheduler run queue. The process state also gets changed
to TASK_RUNNING, so there’s no race condition even if the wake up occurs
between the time the task state is set to TASK_INTERRUPTIBLE and the
schedule() function is called. The thread also gets back into the run
queue if a SIGKILL signal is delivered to it. When the scheduler
subsequently picks mykthread from the run queue, execution resumes at
Point A.
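The reason the early wake-up is harmless can be shown with a toy, single-threaded model. Everything below is a simplification invented for illustration: the toy_ names are hypothetical, the model tracks one task only, and real scheduling involves locking and memory ordering that are ignored here. It shows only that schedule() takes a task off the run queue just when the task is still marked as sleeping.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the names echo the kernel's, but this is not kernel code. */
enum toy_state { TOY_RUNNING, TOY_INTERRUPTIBLE };

struct toy_task {
    enum toy_state state;
    int on_run_queue;      /* 1 while the task is a candidate for CPU time */
};

/* Models set_current_state(): records the intent to sleep, nothing more. */
void toy_set_state(struct toy_task *t, enum toy_state s)
{
    assert(t != NULL);
    t->state = s;
}

/* Models wake_up_interruptible(): the task becomes runnable again. */
void toy_wake_up(struct toy_task *t)
{
    assert(t != NULL);
    t->state = TOY_RUNNING;
    t->on_run_queue = 1;
}

/* Models schedule(): the task leaves the run queue only if it is still
 * marked as sleeping. Returns 1 if the task went to sleep, 0 otherwise. */
int toy_schedule(struct toy_task *t)
{
    assert(t != NULL);
    if (t->state != TOY_RUNNING) {
        t->on_run_queue = 0;
        return 1;
    }
    return 0;
}
```

If toy_wake_up() runs between toy_set_state(&t, TOY_INTERRUPTIBLE) and toy_schedule(&t), the state is already TOY_RUNNING when the scheduler looks at it, so the task stays on the run queue and the wake-up is not lost.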
User-Mode Helpers
The kernel supports a mechanism for invoking user-mode programs to help
perform certain functions. For example, if module auto-loading is
enabled, the kernel dynamically loads necessary modules on demand using a
user-mode module loader. The default loader is /sbin/modprobe, but you
can change it by registering your own loader in
/proc/sys/kernel/modprobe. Similarly, the kernel notifies user space
about hot-plug events by invoking the program registered in
/proc/sys/kernel/hotplug, which is by default /sbin/hotplug.
Listing Five contains the function used by mykthread to notify user
space about detected events. The user-mode program to invoke can be
registered via the sysctl interface in the /proc file system. To do
this, make sure that CONFIG_SYSCTL is enabled in your kernel
configuration and add an entry to the kern_table array in
kernel/sysctl.c:
{KERN_MYEVENT_HANDLER, "myevent_handler",
&myevent_handler, 256,
0644, NULL, &proc_dostring,
&sysctl_string}
This creates an entry /proc/sys/kernel/myevent_handler in the /proc file
system. To register your user-mode helper, do the following:
$ echo /path/to/helper > \
/proc/sys/kernel/myevent_handler
This makes /path/to/helper execute when the function in Listing Five runs.
Listing Five: Invoking User Mode Helpers
/* Called from Listing Two */
static void run_umode_handler (int event_id)
{
  int i = 0;
  char *argv[2], *envp[4], *buffer = NULL;
  int value;

  argv[i++] = myevent_handler; /* Defined earlier in kernel/sysctl.c */

  /* If no user-mode handlers are found, return. This check comes
   * before the allocation so that the buffer is never leaked */
  if (!argv[0]) return;
  argv[i] = 0; /* Terminate the argument list */

  /* Fill in the id corresponding to the data structure in trouble */
  if (!(buffer = kmalloc (32, GFP_KERNEL))) return;
  sprintf (buffer, "TROUBLED_DS=%d", event_id);

  /* Prepare the environment for /path/to/helper */
  i = 0;
  envp[i++] = "HOME=/";
  envp[i++] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
  envp[i++] = buffer;
  envp[i] = 0; /* Terminate the environment list */

  /* Execute the user-mode program, /path/to/helper */
  value = call_usermodehelper (argv[0], argv, envp, 0);

  /* Check return values */
  …
  kfree (buffer);
}
The identity of the troubled kernel data structure is passed as an
environment variable (TROUBLED_DS) to the user-mode helper. The helper
can be a simple script like the following that sends you an email alert
containing the information that it gleaned from its environment:
#!/bin/bash
echo Kernel datastructure $TROUBLED_DS \
is in trouble | mail -s Alert root
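The helper does not have to be a shell script. As a sketch of how a helper written in C might recover the identity code, the function below parses TROUBLED_DS from its environment. The function name and the choice of -1 as the error value are assumptions made for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse the TROUBLED_DS variable set by the kernel
 * thread. Returns the identity code, or -1 if it is missing or malformed. */
long troubled_ds_from_env(void)
{
    const char *s = getenv("TROUBLED_DS");
    char *end;
    long id;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    id = strtol(s, &end, 10);
    /* Reject overflow, trailing junk, and negative values */
    if (errno != 0 || *end != '\0' || id < 0)
        return -1;
    return id;
}
```

Validating the value before using it is a reasonable precaution, since anything can set an environment variable before running the helper by hand.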
call_usermodehelper() has to be executed from a process context and runs
with root capabilities. It’s implemented using a work queue in 2.6
kernels.
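To experiment with a helper before wiring it into the kernel, you can imitate what Listing Five does from user space. The sketch below is an analogy built on fork() and execve(), not a description of how call_usermodehelper() is implemented; run_helper() is a hypothetical name, and the environment simply copies the one prepared in Listing Five.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* User-space analogy of run_umode_handler(): run argv[0] with a minimal
 * environment that carries the identity code in TROUBLED_DS.
 * Returns the helper's exit status, or -1 on failure. */
int run_helper(char *const argv[], int event_id)
{
    char buffer[32];
    char *envp[4];
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);
    snprintf(buffer, sizeof(buffer), "TROUBLED_DS=%d", event_id);
    envp[0] = "HOME=/";
    envp[1] = "PATH=/sbin:/bin:/usr/sbin:/usr/bin";
    envp[2] = buffer;
    envp[3] = NULL;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);            /* reached only if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, passing an argument vector of /bin/sh, -c, and a one-line script that tests $TROUBLED_DS lets you confirm that the variable reaches the child process with the expected value.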