分类: LINUX
2011-04-02 15:28:12
主要数据结构:
struct kmem_cache_s {
struct array_cache *array[NR_CPUS];
unsigned int batchcount;
unsigned int limit;
struct kmem_list3 lists;
unsigned int objsize;
unsigned int flags;
unsigned int num;
unsigned int free_limit;
spinlock_t spinlock;
unsigned int gfporder;
unsigned int gfpflags;
size_t colour;
unsigned int colour_off;
unsigned int colour_next;
kmem_cache_t *slabp_cache;
unsigned int slab_size;
unsigned int dflags;
void (*ctor)(void *, kmem_cache_t *, unsigned long);
void (*dtor)(void *, kmem_cache_t *, unsigned long);
const char *name;
struct list_head next;
#if STATS
unsigned long num_active;
unsigned long num_allocations;
unsigned long high_mark;
unsigned long grown;
unsigned long reaped;
unsigned long errors;
unsigned long max_freeable;
unsigned long node_allocs;
atomic_t allochit;
atomic_t allocmiss;
atomic_t freehit;
atomic_t freemiss;
#endif
#if DEBUG
int dbghead;
int reallen;
#endif
};
struct slab {
struct list_head list;
unsigned long colouroff;
void *s_mem;
unsigned int inuse;
kmem_bufctl_t free;
};
struct kmem_list3 {
struct list_head slabs_partial;
struct list_head slabs_full;
struct list_head slabs_free;
unsigned long free_objects;
int free_touched;
unsigned long next_reap;
struct array_cache *shared;
};
cache由 kmem_cache_t(struct kmem_cache_s) 来描述.
slab 由 struct slab 来描述
objec 由 kmem_bufctl_t 来描述
所有的kmem_cache_t 组成链表 cache_chain.该链表由信号量 cache_chain_sem来进行访问保护:
static struct semaphore cache_chain_sem;
static struct list_head cache_chain;
cache_chain 的第一个高速缓存是 cache_cache. 可以从名字看出来,这个是缓存的缓存.也就是说它存储的是其它缓存的缓存描述符.
高速缓存分为general 和 specific 两种.
对于general的高速缓存,就是cache_cache 和 13个 kmalloc caches.
其中这13个kmalloc caches 的大小呈几何分布:32,64, 128, ... 他们在malloc_sizes表中定义.
specific 的高速缓存是由kmem_cache_create()来创建的.
在内核启动的时候,初始化函数kmem_cache_init()来初始化cache_chain. 实际上它创建了general的高速缓存并初始化相应的内容.
整个的cache_chain的情形如下图所示:
cache_cache
+------------+
| |
+------------+
| gfporder |
+------------+
| lists |
+------------+ /-------------\
/---------->| next |<----------->| cache_chain |<----------------\
| +------------+ \-------------/ |
| | | |
| +------------+ |
| | | |
| +------------+ |
| |
| |
| |
| kmem_cache_t kmem_cache_t |
| +------------+ +------------+ |
| | | | | |
| +------------+ +------------+ |
| | gfporder | | gfporder | |
| +------------+ +------------+ |
| | lists | | lists | |
| +------------+ +------------+ |
\---->| next |<--------- ...... ---------->| next |<-----/
+------------+ +------------+
| | | |
+------------+ +------------+
| | | |
+------------+ +------------+
一个cache是包含若干个slab,每个slab又包含若干个object. 最终的object才是我们用来存储数据的memory
对于cache, slab, object
这里引用ULK3上的一张图来说明他们之间的关系:
对于每一个cache,其每个slab包含固定数目的连续页框, 由 cachep->gfporder 表示
对于指定的cache, 其object的大小都是固定的,由 cachep->objsize 表示
在cache中,每个slab中的object的个数也是固定的, 由 cachep->num 表示
因而,对于指定的cache,slab的大小也是固定的(sizeof(struct slab) + cachep->num * sizeof(kmem_bufctl_t)), 由 cachep->slab_size 表示
在创建cache的时候(kmem_cache_create()),我们只需要指定object的size,即 cachep->objsize是通过参数来确定的(kmem_cache_create()会根据对齐的要求对齐size)
cachep->gfporder, 一个slab中object的个数,以及slab的大小都是根据kmem_cache_t->objsize 计算出来的.
对于cache描述符kmem_cache_t, slab描述符, 和object之间的结构关系,如下图所示:
slab slab
+-----------+ +-----------+
/--->| list |<--------- ...... ---------->| list |-----\
| +-----------+ +-----------+ |
| | colouroff | | colouroff | |
| +-----------+ +-----------+ |
| | *s_mem | /-----| *s_mem | |
| +-----------+ | +-----------+ |
| | in_use | | | in_use | |
| +-----------+ | +-----------+ |
| | free | | | free | |
| +-----------+ | +-----------+ |
| v |
| +--------+-------- |
| | object | ..... |
| +--------+-------- |
| |
| |
| slab slab |
| +-----------+ +-----------+ |
| /--->| list |<--- ... --->| list |<----\ |
| | +-----------+ +-----------+ | |
| | | colouroff | | colouroff | | |
| | +-----------+ +-----------+ | |
| | | *s_mem | /---| *s_mem | | |
| | +-----------+ | +-----------+ | |
| | | in_use | | | in_use | | |
| | +-----------+ | +-----------+ | |
| | | free | | | free | | |
| | +-----------+ | +-----------+ | |
| | v | |
| | +--------+-------- | |
| | | object | ..... | |
| | +--------+-------- | |
| | | |
| | kmem_cache_t | |
| | +---------------------+ | |
| | | | | |
| | +---------------------+ | |
| | | gfporder | | |
| | +---------------------+ | |
| | | | | |
| | | kmem_list3 | | |
| | | +---------------+ | | |
| \-------------->| slabs_partial |<---------------/ |
| | +---------------+ | |
\---------------------->| slabs_full |<-------------------------/
| +---------------+ |
/----------------->| slabs_free |<----------------------\
| | +---------------+ | |
| | | free_objects | | |
| | +---------------+ | |
| | | free_touched | | |
| | +---------------+ | |
| | | next_reap | | |
| | +---------------+ | |
| | | *shared | | |
| | +---------------+ | |
| | | |
| +---------------------+ |
| | next | |
| +---------------------+ |
| | objsize | |
| +---------------------+ |
| | num | |
| +---------------------+ |
| | | |
| +---------------------+ |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| slab slab |
| +-----------+ +-----------+ |
\---->| list |<------ ...... ------>| list |-----/
+-----------+ +-----------+
| colouroff | | colouroff |
+-----------+ +-----------+
| *s_mem |----\ /------| *s_mem |
+-----------+ | | +-----------+
| in_use | | | | in_use |
+-----------+ | | +-----------+
| free | | | | free |
+-----------+ | | +-----------+
| |
| v
| +--------+--------
| | object | .....
| +--------+--------
|
v
+--------+---------+--------+
| object | ... | object |
+--------+---------+--------+
如果需要从一个cache中分配一个object,则通过调用 kmem_cache_alloc()来实现:
kmem_cache_alloc(kmem_cache_t *cachep, unsigned long flag);
前面我们介绍了高速缓存的基本概念,下面我们将介绍它和memory中的页框是如何关联到一起的.
slab alloctor是通过buddy alloctor来申请页框的.
这在 kmem_cache_alloc() 中,通过调用kmem_getpages();而kmem_getpages()又会相应的调用alloc_pages().
当一个页框被分配给slab之后,page->flag 中的 PG_slab会被置位.同时,该page描述符的lru会分别指向相应的cache和slab, 如下图所示:
+-------------+
| page |
| +------+ |
| | flag | |
prev | +------+ | next
/-----------------------| lru |---------------------\
| | +------+ | |
| | | | | |
| | +------+ | |
| +-------------+ |
| | | |
| | . | |
| | . | |
| | . | |
| | . | |
| | . | |
| | . | |
| | | |
| +-------------+ |
| | page | |
| | +------+ | |
| | | flag | | |
| prev | +------+ | next |
+-----------------------| lru |---------------------+
| | +------+ | |
| | | | | |
| | +------+ | |
| +-------------+ |
| |
| |
| |
| |
| |
| kmem_cache_t <----/
\----> slab +------------+
+-----------+ | |
| list | +------------+
+-----------+ | gfporder |
| colouroff | +------------+
+-----------+ | lists |
| s_mem | +------------+
+-----------+ | next |
| inuse | +------------+
+-----------+ | |
| free | +------------+
+-----------+ | |
+------------+
object 描述符会紧挨着slab描述符来存放,他们的大小为: cachep->slab_size
对于slab描述符,会根据 cachep->objsize 的大小来确定是存放到 slab的内部还是外部.
如果是存放在内部,则会在 kmem_cache_alloc() 分配的页框中存放slab描述符和object描述符
如果是存放在外部,则会根据 cachep->slab_size 的大小, 从13个 kmalloc caches 中选择一个合适的来存放.
object 描述符通常就是一个short int型的值, 对于一个slab,共有 cachep->num 个object 描述符
object描述符存放的是 下一个空闲object的下标, 它只有在object空闲的时候才有意义.
最后一个object描述符的值为 BUFCTL_END,用来标记object的结束.
slab->s_mem 指向第一个slab的地址,
slab->free 指向下一个空闲object的下标,如果没有空闲的object,则为BUFCTL_END
因而,我们可以通过 slab->s_mem + slab->free * cachep->objsize 来找到当前第一个空闲的object的地址
通过(kmem_bufctl_t *)(slab + 1)[slab->free]得到下一个空闲object的下标
下图是一个简单的示例,来说明slab的结构:
+-----------------+
| slab |
| +-----------+ |
| | list | |
| +-----------+ |
| | colouroff | |
| +-----------+ |
| | *s_mem | -------\
| +-----------+ | |
| | in_use | | |
| +-----------+ | |
| | free |--------------\
| +-----------+ | | |
+-----------------+ | |
| kmem_bufctl_t | | |
+-----------------+ | |
/-------| kmem_bufctl_t | | |
| +-----------------+ | |
| | kmem_bufctl_t | | |
| +-----------------+ | |
| | kmem_bufctl_t | | |
+----------+ +-----------------+ | |
|BUFCTL_END|<-------------| kmem_bufctl_t | | |
+----------+ | +-----------------+<----/ |
| | | |
| | | |
| | object 0 | 0 |
| | | |
| | | |
| +-----------------+ |
| | | |
| | |+-+ |
| | object 1 ||1|<-------/
| | (free) |+-+
| | |
| +-----------------+
| | |
| | |
| | object 2 | 2
| | |
| | |
| +-----------------+
| | |
| | |
| | object 3 | 3
| | |
| | |
| +-----------------+
| | |
| +-+ | |
\-->|4| | object 4 | 4
+-+ | (free) |
| |
+-----------------+
这里再引用ULK3上的一幅图来做补充说明:
====
http://blog.sina.com.cn/s/blog_5219094a0100b0qp.html