Original article: http://blog.chinaunix.net/uid-20737871-id-1881244.html
Because the implementation of the topics discussed here is architecture-specific, this post is based on the ARM architecture.
What we call a kernel thread here is actually a process created by the kernel_thread function: it has its own task_struct and can be scheduled by the scheduler. What makes this kind of process special is that it runs only in kernel mode.
In the Linux source code, rest_init() in init/main.c already calls kernel_thread to create kernel threads, for example:
kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND);
kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES);
By tracing the call kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES) through the source code, we will uncover what goes on behind this special kind of kernel-mode process in Linux.
On ARM, kernel_thread is defined as follows:
/*
 * Create a kernel thread.
 */
pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags)
{
	struct pt_regs regs;

	memset(&regs, 0, sizeof(regs));

	regs.ARM_r1 = (unsigned long)arg;
	regs.ARM_r2 = (unsigned long)fn;
	regs.ARM_r3 = (unsigned long)do_exit;
	regs.ARM_pc = (unsigned long)kernel_thread_helper;
	regs.ARM_cpsr = SVC_MODE;

	return do_fork(flags|CLONE_VM|CLONE_UNTRACED, 0, &regs, 0, NULL, NULL);
}
[Note: a small debugging tip. Back when Dolphin was debugging the BALI board with a BDI3000, in order to debug the code of the generated kernel threads, the CLONE_UNTRACED flag had to be removed from the do_fork call above and the kernel rebuilt; only then would the debugger stop at breakpoints set inside the kernel thread functions.]
The last line of kernel_thread calls do_fork to create the skeleton of a process (its main body being a task_struct). Inside do_fork, the execution entry point of the new process is set to ret_from_fork(), so when the new process is picked by the scheduler it starts executing at ret_from_fork(). ret_from_fork in turn jumps to the ARM_pc value set in kernel_thread, which means it calls kernel_thread_helper.
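For reference, the place where ret_from_fork is installed as the new task's entry point is the architecture-specific copy_thread(), called on the do_fork()/copy_process() path. Below is a trimmed sketch of the ARM version from kernels of roughly this vintage, not a verbatim copy of the source:
int copy_thread(int nr, unsigned long clone_flags, unsigned long stack_start,
		unsigned long stk_sz, struct task_struct *p, struct pt_regs *regs)
{
	struct thread_info *thread = task_thread_info(p);
	struct pt_regs *childregs = task_pt_regs(p);

	*childregs = *regs;		/* the pt_regs prepared in kernel_thread() */
	childregs->ARM_r0 = 0;
	childregs->ARM_sp = stack_start;

	memset(&thread->cpu_context, 0, sizeof(struct cpu_context_save));
	thread->cpu_context.sp = (unsigned long)childregs;
	/* entry point used the first time the new task is scheduled */
	thread->cpu_context.pc = (unsigned long)ret_from_fork;

	return 0;
}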
kernel_thread_helper
/*
* Shuffle the argument into the correct register before calling the
* thread function. r1 is the thread argument, r2 is the pointer to
* the thread function, and r3 points to the exit function.
*/
extern void kernel_thread_helper(void);
asm( ".section .text\n"
" .align\n"
" .type kernel_thread_helper, #function\n"
"kernel_thread_helper:\n"
" mov r0, r1\n"
" mov lr, r3\n"
" mov pc, r2\n"
" .size kernel_thread_helper, . - kernel_thread_helper\n"
" .previous");
This assembly moves r1 into r0; r0 is the register used to pass the first argument in a function call. In the kernel_thread function shown above, regs.ARM_r1 = (unsigned long)arg, so r0 ends up holding the thread argument.
r3 is moved into lr, which is the return address used when the kernel thread function returns; in our example, it is the function called after kthreadd returns. That function is do_exit, which means that once the kernel thread function exits, the process it runs in is destroyed. For this reason, kernel thread functions generally do not return lightly.
Executing mov pc, r2 then calls the kthreadd kernel thread function.
kthreadd
int kthreadd(void *unused)
{
	struct task_struct *tsk = current;

	/* Setup a clean context for our children to inherit. */
	set_task_comm(tsk, "kthreadd");
	ignore_signals(tsk);
	set_user_nice(tsk, KTHREAD_NICE_LEVEL);
	set_cpus_allowed(tsk, CPU_MASK_ALL);

	current->flags |= PF_NOFREEZE;

	for (;;) {
		set_current_state(TASK_INTERRUPTIBLE);
		if (list_empty(&kthread_create_list))
			schedule();
		__set_current_state(TASK_RUNNING);

		spin_lock(&kthread_create_lock);
		while (!list_empty(&kthread_create_list)) {
			struct kthread_create_info *create;

			create = list_entry(kthread_create_list.next,
					    struct kthread_create_info, list);
			list_del_init(&create->list);
			spin_unlock(&kthread_create_lock);

			create_kthread(create);

			spin_lock(&kthread_create_lock);
		}
		spin_unlock(&kthread_create_lock);
	}

	return 0;
}
The heart of kthreadd is a for loop with a while loop inside. In the for loop, if kthread_create_list is empty, schedule() is called; since the task's state was set to TASK_INTERRUPTIBLE just before, the call to schedule() puts the current process to sleep. If kthread_create_list is not empty, the while loop is entered: it walks kthread_create_list, and for each entry on the list obtains a pointer create to the corresponding struct kthread_create_info node.
The entry corresponding to create is then removed from kthread_create_list, and create_kthread(create) is called with the create pointer as argument.
Inside create_kthread(), kernel_thread is called to create a new process whose kernel thread function is kthread, with create as the argument:
kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
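A simplified sketch of create_kthread() as it looks in kernels of this era (field names follow struct kthread_create_info of that time; details trimmed):
static void create_kthread(struct kthread_create_info *create)
{
	int pid;

	/* fork a new kernel-mode process whose thread function is kthread() */
	pid = kernel_thread(kthread, create, CLONE_FS | CLONE_FILES | SIGCHLD);
	if (pid < 0) {
		create->result = ERR_PTR(pid);
	} else {
		/* wait until kthread() has copied threadfn/data and gone to sleep */
		wait_for_completion(&create->started);
		read_lock(&tasklist_lock);
		create->result = find_task_by_pid(pid);
		read_unlock(&tasklist_lock);
	}
	/* tell the caller sleeping in kthread_create() that we are done */
	complete(&create->done);
}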
kthread--
static int kthread(void *_create)
{
	struct kthread_create_info *create = _create;
	int (*threadfn)(void *data);
	void *data;
	int ret = -EINTR;

	/* Copy data: it's on kthread's stack */
	threadfn = create->threadfn;
	data = create->data;

	/* OK, tell user we're spawned, wait for stop or wakeup */
	__set_current_state(TASK_UNINTERRUPTIBLE);
	complete(&create->started);
	schedule();

	if (!kthread_should_stop())
		ret = threadfn(data);

	/* It might have exited on its own, w/o kthread_stop. Check. */
	if (kthread_should_stop()) {
		kthread_stop_info.err = ret;
		complete(&kthread_stop_info.done);
	}
	return 0;
}
kthread sets the state of its process to TASK_UNINTERRUPTIBLE and then calls schedule(). As a result, the process kthread runs in goes to sleep until some other process wakes it up. Once woken, it calls create->threadfn(create->data).
kthread_should_stop() returns true if some other process has called kthread_stop(p) on the current process p; otherwise it returns false.
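From a driver's point of view, the usual pattern built on top of this is kthread_create() + wake_up_process() + kthread_stop(). A minimal usage sketch; my_thread_fn, my_task and my_driver_init/my_driver_exit are hypothetical names, not part of the kernel API:
#include <linux/kthread.h>
#include <linux/sched.h>
#include <linux/err.h>

static struct task_struct *my_task;

static int my_thread_fn(void *data)
{
	while (!kthread_should_stop()) {
		/* ... do the real work here ... */
		schedule_timeout_interruptible(HZ);	/* sleep about a second */
	}
	return 0;	/* the return value is handed back to kthread_stop() */
}

static int my_driver_init(void)
{
	my_task = kthread_create(my_thread_fn, NULL, "my_thread");
	if (IS_ERR(my_task))
		return PTR_ERR(my_task);
	wake_up_process(my_task);	/* it was created sleeping inside kthread() */
	return 0;
}

static void my_driver_exit(void)
{
	kthread_stop(my_task);	/* makes kthread_should_stop() return true, waits for exit */
}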
Summary--
kthreadd works on a global list named kthread_create_list; if the list is empty, kthreadd sleeps until someone else wakes it up.
Now let's see who updates kthread_create_list, i.e. inserts a node into the list. kthread_create() inserts a node named create into the list. After inserting create into kthread_create_list, it calls wake_up_process(kthreadd_task) to wake up the process whose kernel thread function is kthreadd. kthreadd then creates a new process whose initial state is TASK_UNINTERRUPTIBLE, so the new process sleeps until someone wakes it up.
The work queue infrastructure makes use of kthread_create().
That brings us to the last question: who wakes up the process created by kthreadd?
For the work queue, when the driver calls __create_workqueue_key, the latter calls start_workqueue_thread to wake up the process created by kthreadd. worker_thread then sleeps on a wait queue until the driver calls queue_work to insert a work node; at that point worker_thread is woken up by queue_work...
The fact that worker_thread has been woken up does not mean it runs immediately after queue_work is called; queue_work merely changes the state of the worker_thread process to TASK_RUNNING. worker_thread will not actually run until the next schedule point, but because of its higher scheduling priority it will usually be picked quickly once that schedule point arrives.
So, from the work queue point of view, __create_workqueue_key() can be divided into two major parts: the first creates a new process in the system with the help of kthreadd, whose kernel thread function is worker_thread; the new process then goes to sleep. The second part calls start_workqueue_thread() to wake up the process created in the first part. Once woken, worker_thread starts executing, but it goes back to sleep because its work list is still empty.
When the driver calls queue_work, it inserts a work node into the list the worker is waiting on and wakes up worker_thread (putting the worker_thread process into the TASK_RUNNING state). At the next schedule point, worker_thread runs and handles all the nodes on the list it has been waiting for.
On how the worker_thread kernel thread is created:
It all starts from void __init init_workqueues(void) (Workqueue.c). The key calls, in order, are as follows (slightly adapted from the kernel code for readability, without changing the core call sequence):
create_workqueue("events");
__create_workqueue("events", 0, 0);
__create_workqueue_key("events", 0, 0, NULL, NULL);
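These three steps correspond to macros along these lines in kernels of this era (a simplified sketch; the lockdep variants are omitted):
#define create_workqueue(name)			__create_workqueue((name), 0, 0)
#define create_singlethread_workqueue(name)	__create_workqueue((name), 1, 0)
#define __create_workqueue(name, singlethread, freezeable)	\
	__create_workqueue_key((name), (singlethread), (freezeable), NULL, NULL)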
Inside __create_workqueue_key("events", 0, 0, NULL, NULL):
1) First, a struct workqueue_struct is allocated and a pointer wq to it is obtained, with wq->name = "events". This newly created queue pointer is recorded in the global variable keventd_wq.
workqueue_struct is defined as follows:
struct workqueue_struct {
	struct cpu_workqueue_struct *cpu_wq;
	struct list_head list;
	const char *name;
	int singlethread;
	int freezeable;		/* Freeze threads during suspend */
#ifdef CONFIG_LOCKDEP
	struct lockdep_map lockdep_map;
#endif
};
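For reference, the per-CPU structure it points to, struct cpu_workqueue_struct, looks roughly like this in kernels of this era (worklist, more_work and wq are the fields used by worker_thread below):
struct cpu_workqueue_struct {
	spinlock_t lock;

	struct list_head worklist;	/* pending work nodes */
	wait_queue_head_t more_work;	/* worker_thread sleeps here */
	struct work_struct *current_work;

	struct workqueue_struct *wq;	/* back pointer to the owning workqueue */
	struct task_struct *thread;	/* the worker_thread process */
} ____cacheline_aligned;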
2) The list node of wq is added to a global list named workqueues (list_add(&wq->list, &workqueues)).
3) create_workqueue_thread() is called, which in the end calls kthread_create(worker_thread, cwq, fmt, wq->name, cpu) to create the worker_thread kernel thread (a sketch of kthread_create follows below).
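kthread_create() itself is what hands the request over to kthreadd: it queues a kthread_create_info node on kthread_create_list and wakes kthreadd_task. A simplified sketch from kernels of this era (varargs name formatting and error handling trimmed):
struct task_struct *kthread_create(int (*threadfn)(void *data),
				   void *data, const char namefmt[], ...)
{
	struct kthread_create_info create;

	create.threadfn = threadfn;
	create.data = data;
	init_completion(&create.started);
	init_completion(&create.done);

	/* put the request on the global list and wake up kthreadd */
	spin_lock(&kthread_create_lock);
	list_add_tail(&create.list, &kthread_create_list);
	spin_unlock(&kthread_create_lock);
	wake_up_process(kthreadd_task);

	/* wait until create_kthread()/kthread() have set up the new task */
	wait_for_completion(&create.done);

	/* create.result is the new task (or an ERR_PTR on failure);
	 * the caller still has to wake_up_process() it */
	return create.result;
}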
In the Linux kernel, the creation of a kernel thread ultimately goes through do_fork (as noted above, this is really a schedulable process with its own task_struct). That path is fairly involved; the important points are: when do_fork is called from the current process to create a new one, large parts of the current process's task_struct are copied into the new process's task_struct. When the new process is scheduled to run, its entry point (pc value) is the kernel_thread_helper function, which in turn sets pc to the function pointer passed to kernel_thread(), i.e. the function given as the first argument of kernel_thread.
So, during Linux system initialization a new process is created whose execution thread/function is worker_thread. Right after creation that process is asleep (TASK_UNINTERRUPTIBLE), which means it cannot enter the run queue until wake_up_process() is called on it; only then does it really enter the run queue, and once scheduled it starts running worker_thread. The new process inherits the nice value KTHREAD_NICE_LEVEL (-5); what this value really means will be covered in the post on Linux process scheduling. The worker_thread function itself then calls set_user_nice(current, -5), which sets the nice value to -5 (set_user_nice sets an absolute value rather than adding to it). In a preemptible Linux kernel such a high scheduling priority easily triggers a scheduling point: even if the current process happens to be running in kernel mode, it may be switched off the CPU and replaced by worker_thread.
In kernel_thread_helper, before the kernel thread function is called, the return address is set to do_exit(), so if the kernel thread function ever returns, the process disappears.
For example, the kernel_thread_helper code on ARM (the same code shown earlier):
extern void kernel_thread_helper(void);
asm( ".section .text\n"
" .align\n"
" .type kernel_thread_helper, #function\n"
"kernel_thread_helper:\n"
" mov r0, r1\n"
" mov lr, r3\n"
" mov pc, r2\n"
" .size kernel_thread_helper, . - kernel_thread_helper\n"
" .previous");
mov lr, r3 sets the return address used after the kernel thread function returns; the Linux kernel sets it to do_exit().
As one would expect, a kernel thread function like worker_thread does not normally return, unless kthread_stop is called on the process it runs in.
The mov pc, r2 in the kernel_thread_helper function above causes worker_thread() to be called; it is defined as follows:
static int worker_thread(void *__cwq)
{
	struct cpu_workqueue_struct *cwq = __cwq;
	DEFINE_WAIT(wait);

	if (cwq->wq->freezeable)
		set_freezable();

	set_user_nice(current, -5);

	for (;;) {
		prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
		if (!freezing(current) &&
		    !kthread_should_stop() &&
		    list_empty(&cwq->worklist))
			schedule();
		finish_wait(&cwq->more_work, &wait);

		try_to_freeze();

		if (kthread_should_stop())
			break;

		run_workqueue(cwq);
	}

	return 0;
}
As anticipated, this function does not return easily; its core is a for loop. If there are new nodes on the work list, the functions attached to those nodes are executed (inside run_workqueue()); otherwise the process worker_thread runs in goes back to sleep.
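run_workqueue() itself just drains cwq->worklist and calls each work item's handler. A simplified sketch of how it looks in kernels of this era (locking kept, lockdep and recursion-depth details dropped):
static void run_workqueue(struct cpu_workqueue_struct *cwq)
{
	spin_lock_irq(&cwq->lock);
	while (!list_empty(&cwq->worklist)) {
		struct work_struct *work = list_entry(cwq->worklist.next,
						      struct work_struct, entry);
		work_func_t f = work->func;

		list_del_init(cwq->worklist.next);
		spin_unlock_irq(&cwq->lock);

		f(work);	/* run the driver's work handler in process context */

		spin_lock_irq(&cwq->lock);
	}
	spin_unlock_irq(&cwq->lock);
}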
A deeper understanding of work queues requires understanding the Linux scheduling mechanism, and above all when the kernel schedules, because that tells a driver writer roughly when the functions they register on a work queue get to run. In the 2.4 kernel, apart from a process voluntarily calling schedule(), scheduling happens at the transition from kernel mode back to user mode (on return from interrupts and system calls); since the kernel is not preemptible, no scheduling happens on kernel-to-kernel transitions. In the 2.6 kernel, kernel preemption is supported, which means that besides the kernel-to-user transition, a process running in kernel mode can also be scheduled off the processor, for example when the current process re-enables preemption (calls preempt_enable()). Allowing processes in kernel mode to be preempted means finer-grained scheduling for high-priority processes: if the current process can be preempted, then as soon as the run queue contains a higher-priority process, the current one is switched off the processor. The most common case is an interrupt arriving while system call code is running: on return from the interrupt, 2.4 would simply resume the interrupted system call, whereas in 2.6 preemption creates a scheduling point, and the process running the interrupted system call may be replaced by a higher-priority process from the run queue. Process scheduling will be covered in detail in another post.
create_singlethread_workqueue(name) vs create_workqueue(name)
A driver calls these two macros to create its own work queue and the corresponding kernel process (whose kernel thread function is worker_thread; for brevity, that process is simply called the worker_thread process below).
1. create_singlethread_workqueue(name)
The implementation mechanism is illustrated by a figure in the original post. The function returns a pointer of type struct workqueue_struct; the memory it points to is allocated dynamically inside the function with kzalloc. When the driver no longer needs the work queue, it should therefore call void destroy_workqueue(struct workqueue_struct *wq) to release that memory.
The cwq in that figure is a per-CPU area. For create_singlethread_workqueue, even on a multi-CPU system the kernel creates only one worker_thread kernel process. After that process is created, it first defines a wait node, then in a loop checks cwq->worklist: if the list is empty, it adds the wait node to cwq->more_work and sleeps on that wait queue.
The driver calls queue_work(struct workqueue_struct *wq, struct work_struct *work) to add work nodes to wq. Each work is appended to the list headed by cwq->worklist. Besides adding a work node to cwq->worklist, queue_work also calls wake_up to wake the worker_thread process sleeping on cwq->more_work. wake_up first calls the autoremove_wake_function attached to the wait node and then removes the wait node from cwq->more_work.
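The queuing path boils down to taking cwq->lock, appending the node and waking the wait queue. Roughly, in kernels of this era (per-CPU queue selection and the pending-bit test simplified away):
static void __queue_work(struct cpu_workqueue_struct *cwq,
			 struct work_struct *work)
{
	unsigned long flags;

	spin_lock_irqsave(&cwq->lock, flags);
	/* append the work node to the list worker_thread scans */
	list_add_tail(&work->entry, &cwq->worklist);
	/* wake the worker_thread process sleeping on more_work */
	wake_up(&cwq->more_work);
	spin_unlock_irqrestore(&cwq->lock, flags);
}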
When worker_thread is scheduled again, it processes all the work nodes on cwq->worklist... Once every node has been handled, worker_thread puts the wait node back onto cwq->more_work and sleeps on that wait queue again, until the driver next calls queue_work...
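Putting the driver-facing side together, a minimal usage sketch (my_wq, my_work and my_work_fn are hypothetical names, not from the original post):
#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;
static struct work_struct my_work;

static void my_work_fn(struct work_struct *work)
{
	/* runs in process context, inside worker_thread -> run_workqueue() */
}

static int my_driver_init(void)
{
	my_wq = create_singlethread_workqueue("my_wq");
	if (!my_wq)
		return -ENOMEM;

	INIT_WORK(&my_work, my_work_fn);
	queue_work(my_wq, &my_work);	/* wakes the worker_thread process */
	return 0;
}

static void my_driver_exit(void)
{
	flush_workqueue(my_wq);		/* wait for queued work to finish */
	destroy_workqueue(my_wq);
}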
create_workqueue
Compared with create_singlethread_workqueue, create_workqueue also allocates a wq work queue. The difference is that on a multi-CPU system a per-CPU cwq structure is created for every CPU, and for each cwq a separate worker_thread process is created. When queue_work is used to submit a work node, the node is added to the worklist of the cwq belonging to whichever CPU the calling code is running on (see the sketch below).
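The distinction shows up as a branch inside __create_workqueue_key(); a trimmed sketch from kernels of this era (error handling and CPU-hotplug locking omitted):
	if (singlethread) {
		/* one cwq and one worker_thread process for the whole system */
		cwq = init_cpu_workqueue(wq, singlethread_cpu);
		err = create_workqueue_thread(cwq, singlethread_cpu);
		start_workqueue_thread(cwq, -1);
	} else {
		/* one cwq and one worker_thread process per possible CPU */
		for_each_possible_cpu(cpu) {
			cwq = init_cpu_workqueue(wq, cpu);
			err = create_workqueue_thread(cwq, cpu);
			start_workqueue_thread(cwq, cpu);
		}
	}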