Processes（八）The pidhash table and chained lists-thomas

Linux Study

首页　| 　博文目录　| 　关于我

thomas_ar

博客访问： 56225
博文数量： 47
博客积分： 2095
博客等级：大尉
技术积分： 560
用户组：普通用户
注册时间： 2008-04-01 18:42

文章分类

全部博文（47）

Tools（1）
Tips（4）
杂项（0）
ULK笔记（41）

Interrupts and E（7）

Processes（18）

Memory Addressin（16）
未分配的博文（1）

文章存档

2011年（1）

2008年（46）

我的朋友

相关博文

Processes（八）The pidhash table and chained lists

分类： LINUX

2008-04-19 14:44:39

The pidhash table and chained lists

In several circumstances, the kernel must be able to derive the process descriptor pointer corresponding to a PID. This occurs, for instance, in servicing the kill( ) system call. When process P1 wishes to send a signal to another process, P2, it invokes the kill( ) system call specifying the PID of P2 as the parameter. The kernel derives the process descriptor pointer from the PID and then extracts the pointer to the data structure that records the pending signals from P2's process descriptor.

在某些情形下，内核需要能够根据PID得到对应的进程描述符。比如，在提供 kill()系统调用时。当进程P1希望发送一个信号给另一个进程P2，它就会以P2的PID作为参数调用kill()系统调用。内核根据PID得到进程描述符，然后从P2的进程描述符中取出指向记录待处理信号的数据结构。（没说完啊，然后内核会将P1发送的信号挂在这个数据结构后面）

Scanning the process list sequentially and checking the pid fields of the process descriptors is feasible but rather inefficient. To speed up the search, four hash tables have been introduced. Why multiple hash tables? Simply because the process descriptor includes fields that represent different types of PID (see Table 3-5), and each type of PID requires its own hash table.

依次遍历进程链表，检查进程描述符的pid成员是可行的但是相当不划算。为了加速搜索，引入了4个hash表。为什么是多个hash表？很简单，因为进程描述符包含了代表不同类型PID的成员，每种类型的PID需要自己的hash表。

Table 3-5. The four hash tables and corresponding fields in the process descriptor

Hash table type	Field name	Description
PIDTYPE_PID	pid	PID of the process 本进程的PID
PIDTYPE_TGID	tgid	PID of thread group leader process 线程组组长的PID
PIDTYPE_PGID	pgrp	PID of the group leader process 进程组组长的PID
PIDTYPE_SID	session	PID of the session leader process 对话组组长的PID

The four hash tables are dynamically allocated during the kernel initialization phase, and their addresses are stored in the pid_hash array. The size of a single hash table depends on the amount of available RAM; for example, for systems having 512 MB of RAM, each hash table is stored in four page frames and includes 2,048 entries.

这四个hash表是在内核初始化阶段动态分配的，它们的地址存储在pid_hash数组中。一个hash表的大小取决于可用RAM的数量；比如， RAM为512MB的系统中，每个hash表存储在4个页帧中，包含2,048个元素。

The PID is transformed into a table index using the pid_hashfn macro, which expands to:
使用pid_hashfn将PID值宏转换为表下标：

#define pid_hashfn(x) hash_long((unsigned long) x, pidhash_shift)

The pidhash_shift variable stores the length in bits of a table index (11, in our example). The hash_long( ) function is used by many hash functions; on a 32-bit architecture it is essentially equivalent to:

pidhash_shift变量存放了表索引的bit长度（在我们的例子中是11）。hash_long()函数用在很多hash函数中；在32bit架构上，它近似：

unsigned long hash_long(unsigned long val, unsigned int bits) { unsigned long hash = val * 0x9e370001UL; return hash >> (32 - bits); }

Because in our example pidhash_shift is equal to 11, pid_hashfn yields values ranging between 0 and 211 - 1 = 2047.

因为我们例子中pidhash_shift等于11，pid_hashfn取值在0到2047（2^11-1）之间。

As every basic computer science course explains, a hash function does not always ensure a one-to-one correspondence between PIDs and table indexes. Two different PIDs that hash into the same table index are said to be colliding.

正如基础计算机课程中解释的，hash函数不能总是保证PID值与表下标之间一对一的关系。两个不同PID拥有同一个表下标的情况叫做冲突。

Linux uses chaining to handle colliding PIDs; each table entry is the head of a doubly linked list of colliding process descriptors. Figure 3-5 illustrates a PID hash table with two lists. The processes having PIDs 2,890 and 29,384 hash into the 200th element of the table, while the process having PID 29,385 hashes into the 1,466th element of the table.

Linux使用chain处理冲突的PID；每个表项是一个冲突进程描述符list的头。图示了一个有2个list的PID hash表。PID为2,890和29,384的进程hash到了表的第200个元素，而PID为29,385的进程hash到了表的第1,466个元素。

Hashing with chaining is preferable to a linear transformation from PIDs to table indexes because at any given instance, the number of processes in the system is usually far below 32,768 (the maximum number of allowed PIDs). It would be a waste of storage to define a table consisting of 32,768 entries, if, at any given instance, most such entries are unused.

带chain的hash表，对于将PID线性地转换为表下标是可取的，因为在任意情形下，系统中进程数目总是远远小于32,768（允许的PID的最大值）的。如果，在大多数情况下，hash表的表项是空的，那么定义一个包含32.768个项的表就是存储上的浪费。

The data structures used in the PID hash tables are quite sophisticated, because they must keep track of the relationships between the processes. As an example, suppose that the kernel must retrieve all processes belonging to a given thread group, that is, all processes whose tgid field is equal to a given number. Looking in the hash table for the given thread group number returns just one process descriptor, that is, the descriptor of the thread group leader. To quickly retrieve the other processes in the group, the kernel must maintain a list of processes for each thread group. The same situation arises when looking for the processes belonging to a given login session or belonging to a given process group.

用在PID hash表中的数据结构有一点复杂，因为它们必须能追踪进程间的关系。比如，假设内核必须获得属于一个给定线程组的所有进程，也就是说，所有tgid成员等于给定值的进程。在hash 表中查找给定线程组值，仅返回一个进程描述符，即线程组组长的进程描述符。为了快速地获得组内的其他进程，内核必须为每个线程组维护一个进程list。相同的情形发生在当查找属于给定对话组或就存在的进程时。

The PID hash tables' data structures solve all these problems, because they allow the definition of a list of processes for any PID number included in a hash table. The core data structure is an array of four pid structures embedded in the pids field of the process descriptor; the fields of the pid structure are shown in Table 3-6.

PID hash表的数据结构解决了所有这些问题，因为它们允许为hash表中的每个PID值（每个下标）定义一个进程链表。核心的数据结构是一个有四个pid结构组成的数组，pid结构内嵌在进程描述符的pids成员中；pid结构的成员如下：

Table 3-6. The fields of the pid data structures

Type	Name	Description
int	nr	The PID number PID值
struct hlist_node	pid_chain	The links to the nex t and previous elements in the hash chain list hash chain中指向下一个和前一个元素的链接
struct list_head	pid_list	The head of the per-PID list 每PID list的头

Figure 3-6 shows an example based on the PIDTYPE_TGID hash table. The second entry of the pid_hash array stores the address of the hash table, that is, the array of hlist_head structures representing the heads of the chain lists. In the chain list rooted at the 71st entry of the hash table, there are two process descriptors corresponding to the PID numbers 246 and 4,351 (double-arrow lines represent a couple of forward and backward pointers). The PID numbers are stored in the nr field of the pid structure embedded in the process descriptor (by the way, because the thread group number coincides with the PID of its leader, these numbers also are stored in the pid field of the process descriptors). Let us consider the per-PID list of the thread group 4,351: the head of the list is stored in the pid_list field of the process descriptor included in the hash table, while the links to the next and previous elements of the per-PID list also are stored in the pid_list field of each list element.

图中展示了一个基于PIDTYPE_TGID的hash表。pid_hash数组的第二项存储了hash表的地址，即那个hlist_head结构的数组，它代表所有 chain 的头。在这个hash表的第70个元素的chain，上面有2个对应PID值为246和4,351的进程描述符。PID值存储在内嵌在进程描述符的pid 结构中的nr成员（另外，因为线程组值等于它组长的PID，这些值也存储在进程描述符的pid成员中）。让我们考虑线程组4,351的每PID list：这个list的头存储在hash表中进程描述符的pid_list成员，而链接到每PID list中下一个和前一个的元素的指针也存放在每个list元素的pid_list成员。

The following functions and macros are used to handle the PID hash tables:

do_each_task_pid(nr, type, task)
while_each_task_pid(nr, type, task)

Mark begin and end of a do-while loop that iterates over the per-PID list associated with the PID number nr of type type; in any iteration, task points to the process descriptor of the currently scanned element.

标记一个do-while循环的开始和结束，这个循环迭代对应给定类型为type、PID值为nr整个每PID链表；在每次迭代中，指针task指向当前扫描的进程描述符。

find_task_by_pid_type(type, nr)

Looks for the process having PID nr in the hash table of type type. The function returns a process descriptor pointer if a match is found, otherwise it returns NULL.

在类型为type的hash表中寻找PID值为nr的进程。如果找到的话，这个函数返回一个进程描述符指针，否则返回NULL。

find_task_by_pid(nr)

Same as find_task_by_pid_type(PIDTYPE_PID, nr).

attach_pid(task, type, nr)

Inserts the process descriptor pointed to by task in the PID hash table of type type according to the PID number nr; if a process descriptor having PID nr is already in the hash table, the function simply inserts task in the per-PID list of the already present process.

将tash指向的进程描述符，通过其PID值nr插入类型为type的hash表中；如果PID为nr的进程描述符已经存在在hash表中，这个函数会将task插入已存在进程的每PID list中。

detach_pid(task, type)

Removes the process descriptor pointed to by task from the per-PID list of type type to which the descriptor belongs. If the per-PID list does not become empty, the function terminates. Otherwise, the function removes the process descriptor from the hash table of type type; finally, if the PID number does not occur in any other hash table, the function clears the corresponding bit in the PID bitmap, so that the number can be recycled.

从这个描述符属于的类型为type的每PID list中移除由task指向的进程描述符。如果这个每PID list没有变为空，函数结束。否则函数将进程描述符从类型为type的hash表中移除；最后，如果这个PID值没有出现在任何其他hash表，函数清空PID bitmap中对应的bit，使得这个值可以被重用。

next_thread(task)

Returns the process descriptor address of the lightweight process that follows task in the hash table list of type PIDTYPE_TGID. Because the hash table list is circular, when applied to a conventional process the macro returns the descriptor address of the process itself.

返回类型为PIDTYPE_TGID的hash表list中跟随task的轻进程的进程描述符的地址。因为hash表list是环状的，当应用在一个普通的进程上时，这个宏返回进程本身的描述符地址。

阅读(533) | 评论(0) | 转发(0) |

上一篇：Processes（七）Relationships Among Processes

下一篇：Processes（九）How Processes Are Organized

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6