Chinaunix Blog
Category: LINUX

2015-02-10 20:35:44

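I had run into caches and the like here and there before, and they always felt a bit mysterious. The point of a cache, of course, is to make memory reads and writes more efficient. But when you look at /proc/meminfo or /proc/slabinfo, do you really have a clear picture of how the kernel's memory machinery works?
Reference kernel: Linux 3.8.13
Let's first look at the function that calls kmem_cache_init():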

/*
 * Set up kernel memory allocators
 */
static void __init mm_init(void)
{
    /*
     * page_cgroup requires contiguous pages,
     * bigger than MAX_ORDER unless SPARSEMEM.
     */
    page_cgroup_init_flatmem();
    mem_init();
    kmem_cache_init();
    percpu_init_late();
    pgtable_cache_init();
    vmalloc_init();
}
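This function is called from start_kernel(). Now let's look at kmem_cache_init() as implemented by the SLAB allocator in mm/slab.c (note that SLUB, not SLAB, is the Kconfig default; this post follows SLAB):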

/*
 * Initialisation. Called after the page allocator have been initialised and
 * before smp_init().
 */
void __init kmem_cache_init(void)
{
    struct cache_sizes *sizes;
    struct cache_names *names;
    int i;

    kmem_cache = &kmem_cache_boot;
    setup_nodelists_pointer(kmem_cache);
    /*
     * Why is this needed? The following patch explains it:
     *
     * From 3c58346525d82625e68e24f071804c2dc057b6f4 Mon Sep 17 00:00:00 2001
     * From: Christoph Lameter <cl@linux.com>
     * Date: Wed, 28 Nov 2012 16:23:01 +0000
     * Subject: [PATCH] slab: Simplify bootstrap
     *
     * The nodelists field in kmem_cache is pointing to the first unused
     * object in the array field when bootstrap is complete.
     *
     * A problem with the current approach is that the statically sized
     * kmem_cache structure use on boot can only contain NR_CPUS entries.
     * If the number of nodes plus the number of cpus is greater then we
     * would overwrite memory following the kmem_cache_boot definition.
     *
     * Increase the size of the array field to ensure that also the node
     * pointers fit into the array field.
     *
     * Once we do that we no longer need the kmem_cache_nodelists
     * array and we can then also use that structure elsewhere.
     *
     * Acked-by: Glauber Costa <glommer@parallels.com>
     * Signed-off-by: Christoph Lameter <cl@linux.com>
     * Signed-off-by: Pekka Enberg <penberg@kernel.org>
     */

    if (num_possible_nodes() == 1)
        use_alien_caches = 0;

    for (i = 0; i < NUM_INIT_LISTS; i++)
        kmem_list3_init(&initkmem_list3[i]);

    set_up_list3s(kmem_cache, CACHE_CACHE);

    /*
     * Fragmentation resistance on low memory - only use bigger
     * page orders on machines with more than 32MB of memory if
     * not overridden on the command line.
     */
    if (!slab_max_order_set && totalram_pages > (32 << 20) >> PAGE_SHIFT)
        slab_max_order = SLAB_MAX_ORDER_HI;

    /* Bootstrap is tricky, because several objects are allocated
     * from caches that do not exist yet:
     * 1) initialize the kmem_cache cache: it contains the struct
     * kmem_cache structures of all caches, except kmem_cache itself:
     * kmem_cache is statically allocated.
     * Initially an __init data area is used for the head array and the
     * kmem_list3 structures, it's replaced with a kmalloc allocated
     * array at the end of the bootstrap.
     * 2) Create the first kmalloc cache.
     * The struct kmem_cache for the new cache is allocated normally.
     * An __init data area is used for the head array.
     * 3) Create the remaining kmalloc caches, with minimally sized
     * head arrays.
     * 4) Replace the __init data head arrays for kmem_cache and the first
     * kmalloc cache with kmalloc allocated arrays.
     * 5) Replace the __init data for kmem_list3 for kmem_cache and
     * the other cache's with kmalloc allocated memory.
     * 6) Resize the head arrays of the kmalloc caches to their final sizes.
     */

    /* 1) create the kmem_cache */

    /*
     * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
     */
    create_boot_cache(kmem_cache, "kmem_cache",
        offsetof(struct kmem_cache, array[nr_cpu_ids]) +
                 nr_node_ids * sizeof(struct kmem_list3 *),
                 SLAB_HWCACHE_ALIGN);
    list_add(&kmem_cache->list, &slab_caches);   // once created, kmem_cache is added to the global slab_caches list

    /* 2+3) create the kmalloc caches */
    sizes = malloc_sizes;
    names = cache_names;

    /*
     * Initialize the caches that provide memory for the array cache and the
     * kmem_list3 structures first. Without this, further allocations will
     * bug.
     */

    sizes[INDEX_AC].cs_cachep = create_kmalloc_cache(names[INDEX_AC].name,
                    sizes[INDEX_AC].cs_size, ARCH_KMALLOC_FLAGS);

    if (INDEX_AC != INDEX_L3)
        sizes[INDEX_L3].cs_cachep =
            create_kmalloc_cache(names[INDEX_L3].name,
                sizes[INDEX_L3].cs_size, ARCH_KMALLOC_FLAGS);

    slab_early_init = 0;

    while (sizes->cs_size != ULONG_MAX) {
        /*
         * For performance, all the general caches are L1 aligned.
         * This should be particularly beneficial on SMP boxes, as it
         * eliminates "false sharing".
         * Note for systems short on memory removing the alignment will
         * allow tighter packing of the smaller caches.
         */
        if (!sizes->cs_cachep)
            sizes->cs_cachep = create_kmalloc_cache(names->name,
                    sizes->cs_size, ARCH_KMALLOC_FLAGS);

#ifdef CONFIG_ZONE_DMA
        sizes->cs_dmacachep = create_kmalloc_cache(
            names->name_dma, sizes->cs_size,
            SLAB_CACHE_DMA|ARCH_KMALLOC_FLAGS);
#endif
        sizes++;
        names++;
    }
    /* 4) Replace the bootstrap head arrays */
    {
        struct array_cache *ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        memcpy(ptr, cpu_cache_get(kmem_cache),
         sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        kmem_cache->array[smp_processor_id()] = ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)
         != &initarray_generic.cache);
        memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
         sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
         ptr;
    }
    /* 5) Replace the bootstrap kmem_list3's */
    {
        int nid;

        for_each_online_node(nid) {
            init_list(kmem_cache, &initkmem_list3[CACHE_CACHE + nid], nid);

            init_list(malloc_sizes[INDEX_AC].cs_cachep,
                 &initkmem_list3[SIZE_AC + nid], nid);

            if (INDEX_AC != INDEX_L3) {
                init_list(malloc_sizes[INDEX_L3].cs_cachep,
                     &initkmem_list3[SIZE_L3 + nid], nid);
            }
        }
    }

    slab_state = UP;
}
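The first statement sets up the global pointer variable kmem_cache, which is used to create the very first cache (itself named kmem_cache). It is declared in mm/slab_common.c:
struct kmem_cache *kmem_cache;
Every cache that gets created is linked onto the global list LIST_HEAD(slab_caches); you can see them with cat /proc/slabinfo.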
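slab_caches is an intrusive list: each struct kmem_cache embeds a struct list_head member named list, and list_add() links that member onto the global head. The sketch below imitates this in userspace. toy_cache is a hypothetical stand-in for struct kmem_cache, and list_add()/container_of() are reimplemented locally rather than taken from kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Minimal doubly linked list node in the style of the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
/* Recover the containing struct from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Insert entry right after head, i.e. at the front of the list. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    assert(entry != head);
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Hypothetical, trimmed stand-in for struct kmem_cache. */
struct toy_cache {
    const char *name;
    unsigned int size;
    struct list_head list;   /* link into slab_caches */
};

static struct list_head slab_caches = LIST_HEAD_INIT(slab_caches);

/* Walk every registered cache, slabinfo-style; returns how many were seen. */
static int dump_caches(void)
{
    int n = 0;

    for (struct list_head *p = slab_caches.next; p != &slab_caches; p = p->next) {
        struct toy_cache *c = container_of(p, struct toy_cache, list);
        printf("%-12s %u\n", c->name, c->size);
        n++;
    }
    return n;
}
```

Because list_add() inserts at the head, the most recently registered cache is visited first in this sketch.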
Here is struct kmem_cache, from include/linux/slab_def.h:

struct kmem_cache {
/* 1) Cache tunables. Protected by cache_chain_mutex */
    unsigned int batchcount;
    unsigned int limit;
    unsigned int shared;

    unsigned int size;
    u32 reciprocal_buffer_size;
/* 2) touched by every alloc & free from the backend */

    unsigned int flags;        /* constant flags */
    unsigned int num;        /* # of objs per slab */

/* 3) cache_grow/shrink */
    /* order of pgs per slab (2^n) */
    unsigned int gfporder;

    /* force GFP flags, e.g. GFP_DMA */
    gfp_t allocflags;

    size_t colour;            /* cache colouring range */
    unsigned int colour_off;    /* colour offset */
    struct kmem_cache *slabp_cache;
    unsigned int slab_size;

    /* constructor func */
    void (*ctor)(void *obj);

/* 4) cache creation/removal */
    const char *name;
    struct list_head list;
    int refcount;
    int object_size;
    int align;

/* 5) statistics */
#ifdef CONFIG_DEBUG_SLAB
    unsigned long num_active;
    unsigned long num_allocations;
    unsigned long high_mark;
    unsigned long grown;
    unsigned long reaped;
    unsigned long errors;
    unsigned long max_freeable;
    unsigned long node_allocs;
    unsigned long node_frees;
    unsigned long node_overflow;
    atomic_t allochit;
    atomic_t allocmiss;
    atomic_t freehit;
    atomic_t freemiss;

    /*
     * If debugging is enabled, then the allocator can add additional
     * fields and/or padding to every object. size contains the total
     * object size including these internal fields, the following two
     * variables contain the offset to the user object and its size.
     */
    int obj_offset;
#endif /* CONFIG_DEBUG_SLAB */
#ifdef CONFIG_MEMCG_KMEM
    struct memcg_cache_params *memcg_params;
#endif

/* 6) per-cpu/per-node data, touched during every alloc/free */
    /*
     * We put array[] at the end of kmem_cache, because we want to size
     * this array to nr_cpu_ids slots instead of NR_CPUS
     * (see kmem_cache_init())
     * We still use [NR_CPUS] and not [1] or [0] because cache_cache
     * is statically defined, so we reserve the max number of cpus.
     *
     * We also need to guarantee that the list is able to accomodate a
     * pointer for each node since "nodelists" uses the remainder of
     * available pointers.
     */
    struct kmem_list3 **nodelists;
    struct array_cache *array[NR_CPUS + MAX_NUMNODES];
    /*
     * Do not add fields after array[]
     */
};
A few of the key members of this structure were already covered in the earlier kmalloc post.
kmem_cache_boot is defined as:

/* internal cache of cache description objs */
static struct kmem_cache kmem_cache_boot = {
    .batchcount = 1,
    .limit = BOOT_CPUCACHE_ENTRIES,   // 1 by default
    .shared = 1,
    .size = sizeof(struct kmem_cache),
    .name = "kmem_cache",
};
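The comments already explain this clearly.
setup_nodelists_pointer() makes nodelists point into the array[] member of struct kmem_cache, at the first slot after the per-CPU entries (array[nr_cpu_ids]). The per-node kmem_list3 pointers therefore share array[] with the per-CPU pointers, which is why array[] is declared with NR_CPUS + MAX_NUMNODES entries; nodelists is simply a convenient handle on that tail of the array.
On a UMA (uniform memory access) system there is only one node.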
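To make that layout concrete, here is a userspace model of the tail of struct kmem_cache. NR_CPUS = 8 and MAX_NUMNODES = 4 are assumed values, toy_kmem_cache is a hypothetical stand-in, and toy_setup_nodelists_pointer() reproduces my reading of setup_nodelists_pointer() (nodelists = &array[nr_cpu_ids]) rather than quoting it. The size helper spells out the same arithmetic as the create_boot_cache() call in kmem_cache_init().

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS      8   /* assumed compile-time maximum for this sketch */
#define MAX_NUMNODES 4   /* assumed compile-time maximum for this sketch */

struct array_cache;      /* left opaque: only pointers are stored */
struct kmem_list3;

/* Hypothetical, trimmed model of struct kmem_cache: only the tail matters. */
struct toy_kmem_cache {
    unsigned int size;
    struct kmem_list3 **nodelists;
    struct array_cache *array[NR_CPUS + MAX_NUMNODES];
};

/* Per-node pointers start right after the nr_cpu_ids per-CPU slots. */
static void toy_setup_nodelists_pointer(struct toy_kmem_cache *c, int nr_cpu_ids)
{
    assert(nr_cpu_ids > 0 && nr_cpu_ids <= NR_CPUS);
    c->nodelists = (struct kmem_list3 **)&c->array[nr_cpu_ids];
}

/* Object size for the "kmem_cache" cache: same value as the kernel's
 * offsetof(struct kmem_cache, array[nr_cpu_ids]) + node pointers. */
static size_t toy_kmem_cache_size(int nr_cpu_ids, int nr_node_ids)
{
    return offsetof(struct toy_kmem_cache, array) +
           nr_cpu_ids * sizeof(struct array_cache *) +
           nr_node_ids * sizeof(struct kmem_list3 *);
}
```

With nr_cpu_ids = 2 and a single node, each dynamically allocated struct kmem_cache only needs three pointer slots after the fixed fields, while the static kmem_cache_boot still reserves NR_CPUS + MAX_NUMNODES.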

/*
 * Need this for bootstrapping a per node allocator.
 */
static struct kmem_list3 __initdata initkmem_list3[NUM_INIT_LISTS];

It is a static global variable in mm/slab.c.
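kmem_list3_init() initializes the three slab lists of each kmem_list3: slabs_full, slabs_partial and slabs_free. Why they need initializing comes down to how a cache is organized:

[Figure: organization of a slab cache]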
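The three lists partition a node's slabs by how many of their objects are in use. A rough userspace sketch of that invariant follows. toy_list3 only counts slabs per list instead of holding the real list_head chains, and list_for_slab()/relist() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

enum slab_list { SLABS_FREE, SLABS_PARTIAL, SLABS_FULL };

/* Which list a slab belongs on, given inuse of its num objects. */
static enum slab_list list_for_slab(unsigned int inuse, unsigned int num)
{
    assert(num > 0 && inuse <= num);
    if (inuse == 0)
        return SLABS_FREE;
    if (inuse == num)
        return SLABS_FULL;
    return SLABS_PARTIAL;
}

/* Hypothetical per-node bookkeeping: number of slabs on each list. */
struct toy_list3 {
    unsigned int nr[3];
};

static void toy_list3_init(struct toy_list3 *l3)
{
    l3->nr[SLABS_FREE] = l3->nr[SLABS_PARTIAL] = l3->nr[SLABS_FULL] = 0;
}

/* Move one slab after an alloc/free changed its inuse count. */
static void relist(struct toy_list3 *l3, unsigned int old_inuse,
                   unsigned int new_inuse, unsigned int num)
{
    enum slab_list from = list_for_slab(old_inuse, num);
    enum slab_list to = list_for_slab(new_inuse, num);

    assert(l3->nr[from] > 0);
    l3->nr[from]--;
    l3->nr[to]++;
}
```

In this model a fresh slab starts on the free list, moves to partial after its first allocation and to full when the last free object is taken.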

CACHE_CACHE is defined as 0 near the top of mm/slab.c.

/*
 * For setting up all the kmem_list3s for cache whose buffer_size is same as
 * size of kmem_list3.
 */
static void __init set_up_list3s(struct kmem_cache *cachep, int index)
{
    int node;

    for_each_online_node(node) {
        cachep->nodelists[node] = &initkmem_list3[index + node];
        cachep->nodelists[node]->next_reap = jiffies +
         REAPTIMEOUT_LIST3 +
         ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
    }
}
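set_up_list3s() points node N of a cache at initkmem_list3[index + N] and staggers each cache's first reap using its own address. The sketch below assumes MAX_NUMNODES = 4 and, going from my reading of mm/slab.c in 3.8 (verify against the source), SIZE_AC = MAX_NUMNODES, SIZE_L3 = 2 * MAX_NUMNODES and NUM_INIT_LISTS = 3 * MAX_NUMNODES; only CACHE_CACHE = 0 is stated above. init_list_slot() and first_next_reap() are hypothetical names.

```c
#include <assert.h>

#define MAX_NUMNODES   4               /* assumed for this sketch */
#define CACHE_CACHE    0
#define SIZE_AC        MAX_NUMNODES    /* assumed */
#define SIZE_L3        (2 * MAX_NUMNODES)   /* assumed */
#define NUM_INIT_LISTS (3 * MAX_NUMNODES)   /* assumed */

/* Slot in the static initkmem_list3[] for a cache group and a node. */
static int init_list_slot(int index, int node)
{
    assert(node >= 0 && node < MAX_NUMNODES);
    assert(index + node < NUM_INIT_LISTS);
    return index + node;
}

/* First reap deadline: now + timeout + an address-derived offset,
 * so that different caches do not all get reaped in the same tick. */
static unsigned long first_next_reap(unsigned long jiffies,
                                     unsigned long timeout,
                                     unsigned long cache_addr)
{
    return jiffies + timeout + cache_addr % timeout;
}
```

Each of the three bootstrap groups (kmem_cache, the arraycache cache, the kmem_list3 cache) gets its own MAX_NUMNODES-wide block of slots under these assumptions.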
Next comes the real work of creating caches; the source comment spells out the bootstrap steps:

    /* Bootstrap is tricky, because several objects are allocated
     * from caches that do not exist yet:
     * 1) initialize the kmem_cache cache: it contains the struct
     * kmem_cache structures of all caches, except kmem_cache itself:
     * kmem_cache is statically allocated.
     * Initially an __init data area is used for the head array and the
     * kmem_list3 structures, it's replaced with a kmalloc allocated
     * array at the end of the bootstrap.
     * 2) Create the first kmalloc cache.
     * The struct kmem_cache for the new cache is allocated normally.
     * An __init data area is used for the head array.
     * 3) Create the remaining kmalloc caches, with minimally sized
     * head arrays.
     * 4) Replace the __init data head arrays for kmem_cache and the first
     * kmalloc cache with kmalloc allocated arrays.
     * 5) Replace the __init data for kmem_list3 for kmem_cache and
     * the other cache's with kmalloc allocated memory.
     * 6) Resize the head arrays of the kmalloc caches to their final sizes.
     */

    /* 1) create the kmem_cache */

    /*
     * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
     */
    create_boot_cache(kmem_cache, "kmem_cache",
        offsetof(struct kmem_cache, array[nr_cpu_ids]) +
                 nr_node_ids * sizeof(struct kmem_list3 *),
                 SLAB_HWCACHE_ALIGN);
    list_add(&kmem_cache->list, &slab_caches);
It first creates the very first cache, named kmem_cache; at this point the kmem_cache pointer already points to the statically allocated kmem_cache_boot.
Now let's look at create_boot_cache():

#ifndef CONFIG_SLOB
/* Create a cache during boot when no slab services are available yet */
void __init create_boot_cache(struct kmem_cache *s, const char *name, size_t size,
        unsigned long flags)
{
    int err;

    s->name = name;
    s->size = s->object_size = size;
    s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
    err = __kmem_cache_create(s, flags);

    if (err)
        panic("Creation of kmalloc slab %s size=%zd failed. Reason %d\n",
                    name, size, err);

    s->refcount = -1;    /* Exempt from merging for now */
}

struct kmem_cache *__init create_kmalloc_cache(const char *name, size_t size,
                unsigned long flags)
{
    struct kmem_cache *s = kmem_cache_zalloc(kmem_cache, GFP_NOWAIT);

    if (!s)
        panic("Out of memory when creating slab %s\n", name);

    create_boot_cache(s, name, size, flags);
    list_add(&s->list, &slab_caches);
    s->refcount = 1;
    return s;
}

#endif /* !CONFIG_SLOB */
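create_boot_cache() derives the alignment with calculate_alignment(), whose body is not quoted here. As I understand the 3.8 version in mm/slab_common.c (please check against the source), SLAB_HWCACHE_ALIGN starts from the cache line size and halves it while the object still fits in half a line, so small objects are not padded to a whole line. A sketch under that assumption; the cache line size and ARCH_SLAB_MINALIGN are passed in as parameters, and TOY_HWCACHE_ALIGN is a placeholder bit, not the kernel's flag value.

```c
#include <assert.h>

#define TOY_HWCACHE_ALIGN 0x1UL   /* placeholder flag bit for this sketch */

/* Round x up to a multiple of a (a must be a power of two). */
static unsigned long align_up(unsigned long x, unsigned long a)
{
    assert(a && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

static unsigned long calc_alignment(unsigned long flags, unsigned long align,
                                    unsigned long size, unsigned long cache_line,
                                    unsigned long arch_slab_minalign)
{
    if (flags & TOY_HWCACHE_ALIGN) {
        unsigned long ralign = cache_line;

        /* Halve while two objects would still share one aligned chunk. */
        while (size <= ralign / 2)
            ralign /= 2;
        if (ralign > align)
            align = ralign;
    }
    if (align < arch_slab_minalign)
        align = arch_slab_minalign;
    /* Never less than pointer alignment. */
    return align_up(align, sizeof(void *));
}
```

With a 64-byte line, a 20-byte object is aligned to 32 bytes rather than 64, while a 100-byte object gets the full line.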
create_boot_cache() in turn calls __kmem_cache_create(), the most important function here:

/**
 * __kmem_cache_create - Create a cache.
 * @cachep: cache management descriptor
 * @flags: SLAB flags
 *
 * Returns a ptr to the cache on success, NULL on failure.
 * Cannot be called within a int, but can be interrupted.
 * The @ctor is run when new pages are allocated by the cache.
 *
 * The flags are
 *
 * %SLAB_POISON - Poison the slab with a known test pattern (a5a5a5a5)
 * to catch references to uninitialised memory.
 *
 * %SLAB_RED_ZONE - Insert `Red' zones around the allocated memory to check
 * for buffer overruns.
 *
 * %SLAB_HWCACHE_ALIGN - Align the objects in this cache to a hardware
 * cacheline. This can be beneficial if you're counting cycles as closely
 * as davem.
 */
int
__kmem_cache_create (struct kmem_cache *cachep, unsigned long flags)
{
    size_t left_over, slab_size, ralign;
    gfp_t gfp;
    int err;
    size_t size = cachep->size;

#if DEBUG
#if FORCED_DEBUG
    /*
     * Enable redzoning and last user accounting, except for caches with
     * large objects, if the increased size would increase the object size
     * above the next power of two: caches with object sizes just above a
     * power of two have a significant amount of internal fragmentation.
     */
    if (size < 4096 || fls(size - 1) == fls(size-1 + REDZONE_ALIGN +
                        2 * sizeof(unsigned long long)))
        flags |= SLAB_RED_ZONE | SLAB_STORE_USER;
    if (!(flags & SLAB_DESTROY_BY_RCU))
        flags |= SLAB_POISON;
#endif
    if (flags & SLAB_DESTROY_BY_RCU)
        BUG_ON(flags & SLAB_POISON);
#endif

    /*
     * Check that size is in terms of words. This is needed to avoid
     * unaligned accesses for some archs when redzoning is used, and makes
     * sure any on-slab bufctl's are also correctly aligned.
     */
    if (size & (BYTES_PER_WORD - 1)) {
        size += (BYTES_PER_WORD - 1);
        size &= ~(BYTES_PER_WORD - 1);
    } // round up to a multiple of BYTES_PER_WORD (4 bytes on 32-bit, 8 on 64-bit)

    /*
     * Redzoning and user store require word alignment or possibly larger.
     * Note this will be overridden by architecture or caller mandated
     * alignment if either is greater than BYTES_PER_WORD.
     */
    if (flags & SLAB_STORE_USER)
        ralign = BYTES_PER_WORD;

    if (flags & SLAB_RED_ZONE) {
        ralign = REDZONE_ALIGN;
        /* If redzoning, ensure that the second redzone is suitably
         * aligned, by adjusting the object size accordingly. */
        size += REDZONE_ALIGN - 1;
        size &= ~(REDZONE_ALIGN - 1);
    }

    /* 3) caller mandated alignment */
    if (ralign < cachep->align) {
        ralign = cachep->align;
    }
    /* disable debug if necessary */
    if (ralign > __alignof__(unsigned long long))
        flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
    /*
     * 4) Store it.
     */
    cachep->align = ralign;

    if (slab_is_available())
        gfp = GFP_KERNEL;
    else
        gfp = GFP_NOWAIT;
    // Why this note: slab_is_available() only tests slab_state. slab_state is
    // a global that nobody initializes explicitly, so it starts out as 0,
    // i.e. DOWN:
    //
    //   /*
    //    * State of the slab allocator.
    //    *
    //    * This is used to describe the states of the allocator during bootup.
    //    * Allocators use this to gradually bootstrap themselves. Most allocators
    //    * have the problem that the structures used for managing slab caches are
    //    * allocated from slab caches themselves.
    //    */
    //   enum slab_state {
    //       DOWN,               /* No slab functionality yet */
    //       PARTIAL,            /* SLUB: kmem_cache_node available */
    //       PARTIAL_ARRAYCACHE, /* SLAB: kmalloc size for arraycache available */
    //       PARTIAL_L3,         /* SLAB: kmalloc size for l3 struct available */
    //       UP,                 /* Slab caches usable but not all extras yet */
    //       FULL                /* Everything is working */
    //   };
    //
    // While slab_is_available() is false, gfp is GFP_NOWAIT, defined as:
    //
    //   #define GFP_NOWAIT    (GFP_ATOMIC & ~__GFP_HIGH)

    setup_nodelists_pointer(cachep);
#if DEBUG

    /*
     * Both debugging options require word-alignment which is calculated
     * into align above.
     */
    if (flags & SLAB_RED_ZONE) {
        /* add space for red zone words */
        cachep->obj_offset += sizeof(unsigned long long);
        size += 2 * sizeof(unsigned long long);
    }
    if (flags & SLAB_STORE_USER) {
        /* user store requires one word storage behind the end of
         * the real object. But if the second red zone needs to be
         * aligned to 64 bits, we must allow that much space.
         */
        if (flags & SLAB_RED_ZONE)
            size += REDZONE_ALIGN;
        else
            size += BYTES_PER_WORD;
    }
#if FORCED_DEBUG && defined(CONFIG_DEBUG_PAGEALLOC)
    if (size >= malloc_sizes[INDEX_L3 + 1].cs_size
     && cachep->object_size > cache_line_size()
     && ALIGN(size, cachep->align) < PAGE_SIZE) {
        cachep->obj_offset += PAGE_SIZE - ALIGN(size, cachep->align);
        size = PAGE_SIZE;
    }
#endif
#endif

    /*
     * Determine if the slab management is 'on' or 'off' slab.
     * (bootstrapping cannot cope with offslab caches so don't do
     * it too early on. Always use on-slab management when
     * SLAB_NOLEAKTRACE to avoid recursive calls into kmemleak)
     */
    // Note: whether the slab management data lives on the slab's own pages
    // is decided by the test below. Off-slab requires size >= PAGE_SIZE/8
    // (512 bytes with 4 KB pages, 1024 with 8 KB pages). slab_early_init is 1
    // while kmem_cache itself is being created and is only cleared to 0 once
    // the arraycache and kmem_list3 kmalloc caches exist. The flags passed
    // for the very first cache are SLAB_HWCACHE_ALIGN.
    if ((size >= (PAGE_SIZE >> 3)) && !slab_early_init &&
     !(flags & SLAB_NOLEAKTRACE))
        /*
         * Size is large, assume best to place the slab management obj
         * off-slab (should allow better packing of objs).
         */
        flags |= CFLGS_OFF_SLAB;

    size = ALIGN(size, cachep->align);

    left_over = calculate_slab_order(cachep, size, cachep->align, flags);
    // From the object size, work out how many pages (2^gfporder) one slab
    // spans, how many objects it holds, and how much space is left after
    // the management data. Straightforward.
    if (!cachep->num)
        return -E2BIG;

    slab_size = ALIGN(cachep->num * sizeof(kmem_bufctl_t)
             + sizeof(struct slab), cachep->align);

    /*
     * If the slab has been placed off-slab, and we have enough space then
     * move it on-slab. This is at the expense of any extra colouring.
     */
    if (flags & CFLGS_OFF_SLAB && left_over >= slab_size) {
        flags &= ~CFLGS_OFF_SLAB;
        left_over -= slab_size;
    }

    if (flags & CFLGS_OFF_SLAB) {
        /* really off slab. No need for manual alignment */
        slab_size =
         cachep->num * sizeof(kmem_bufctl_t) + sizeof(struct slab);

#ifdef CONFIG_PAGE_POISONING
        /* If we're going to use the generic kernel_map_pages()
         * poisoning, then it's going to smash the contents of
         * the redzone and userword anyhow, so switch them off.
         */
        if (size % PAGE_SIZE == 0 && flags & SLAB_POISON)
            flags &= ~(SLAB_RED_ZONE | SLAB_STORE_USER);
#endif
    }

    cachep->colour_off = cache_line_size(); // L1 cache line size, e.g. 32 bytes (64 on typical x86)
    /* Offset must be a multiple of the alignment. */
    if (cachep->colour_off < cachep->align)
        cachep->colour_off = cachep->align;
    cachep->colour = left_over / cachep->colour_off;      // slab colouring setup
    cachep->slab_size = slab_size;
    cachep->flags = flags;
    cachep->allocflags = 0;
    if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
        cachep->allocflags |= GFP_DMA;
    cachep->size = size;
    cachep->reciprocal_buffer_size = reciprocal_value(size);

    if (flags & CFLGS_OFF_SLAB) {
        cachep->slabp_cache = kmem_find_general_cachep(slab_size, 0u);
        /*
         * This is a possibility for one of the malloc_sizes caches.
         * But since we go off slab only for object size greater than
         * PAGE_SIZE/8, and malloc_sizes gets created in ascending order,
         * this should not happen at all.
         * But leave a BUG_ON for some lucky dude.
         */
        BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));
    }

    err = setup_cpu_cache(cachep, gfp);
    if (err) {
        __kmem_cache_shutdown(cachep);
        return err;
    }

    if (flags & SLAB_DEBUG_OBJECTS) {
        /*
         * Would deadlock through slab_destroy()->call_rcu()->
         * debug_object_activate()->kmem_cache_alloc().
         */
        WARN_ON_ONCE(flags & SLAB_DESTROY_BY_RCU);

        slab_set_debugobj_lock_classes(cachep);
    } else if (!OFF_SLAB(cachep) && !(flags & SLAB_DESTROY_BY_RCU))
        on_slab_lock_classes(cachep);

    return 0;
}
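Three small calculations in __kmem_cache_create() are easy to check by hand: rounding the size up to a word, the initial off-slab guess, and the colouring range. The helpers below are hypothetical userspace restatements of those lines, with the word size, page size and cache line size passed in instead of taken from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round size up to a multiple of word (a power of two), like the
 * BYTES_PER_WORD step near the top of __kmem_cache_create(). */
static size_t round_to_word(size_t size, size_t word)
{
    assert(word && (word & (word - 1)) == 0);
    if (size & (word - 1)) {
        size += word - 1;
        size &= ~(word - 1);
    }
    return size;
}

/* Initial off-slab guess: objects of at least PAGE_SIZE/8 keep their
 * management data off-slab, except during early bootstrap or with
 * SLAB_NOLEAKTRACE. */
static bool want_off_slab(size_t size, size_t page_size,
                          bool slab_early_init, bool noleaktrace)
{
    return size >= (page_size >> 3) && !slab_early_init && !noleaktrace;
}

/* Colouring: the step is the larger of the cache line size and the
 * alignment; colour is how many such steps fit in the leftover. */
static size_t colour_count(size_t left_over, size_t cache_line, size_t align,
                           size_t *colour_off)
{
    size_t off = cache_line;

    if (off < align)
        off = align;
    *colour_off = off;
    return left_over / off;
}
```

Keep in mind that the off-slab guess can still be undone afterwards: if the leftover space can hold the management data, the kernel moves it back on-slab at the expense of colouring.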
Inside it is a very interesting and very important function, calculate_slab_order(), which reveals how a slab lays out and manages its objects:

/**
 * calculate_slab_order - calculate size (page order) of slabs
 * @cachep: pointer to the cache that is being created
 * @size: size of objects to be created in this cache.
 * @align: required alignment for the objects.
 * @flags: slab allocation flags
 *
 * Also calculates the number of objects per slab.
 *
 * This could be made much more intelligent. For now, try to avoid using
 * high order pages for slabs. When the gfp() functions are more friendly
 * towards high-order requests, this should be changed.
 */
static size_t calculate_slab_order(struct kmem_cache *cachep,
            size_t size, size_t align, unsigned long flags)
{
    unsigned long offslab_limit;
    size_t left_over = 0;
    int gfporder;

    for (gfporder = 0; gfporder <= KMALLOC_MAX_ORDER; gfporder++) {
        unsigned int num;
        size_t remainder;

        cache_estimate(gfporder, size, align, flags, &remainder, &num);
        // Depending on on-slab vs off-slab, works out how many objects fit
        // in 2^gfporder pages after the management data, and how much space
        // is left over. Worth reading closely.
        if (!num)   // the slab must hold at least one object
            continue;

        if (flags & CFLGS_OFF_SLAB) {
            /*
             * Max number of objs-per-slab for caches which
             * use off-slab slabs. Needed to avoid a possible
             * looping condition in cache_grow().
             */
            offslab_limit = size - sizeof(struct slab);
            offslab_limit /= sizeof(kmem_bufctl_t);

            if (num > offslab_limit)
                break;
        }

        /* Found something acceptable - save it away */
        cachep->num = num;
        cachep->gfporder = gfporder;
        left_over = remainder;

        /*
         * A VFS-reclaimable slab tends to have most allocations
         * as GFP_NOFS and we really don't want to have to be allocating
         * higher-order pages when we are unable to shrink dcache.
         */
        if (flags & SLAB_RECLAIM_ACCOUNT)
            break;

        /*
         * Large number of objects is good, but very large slabs are
         * currently bad for the gfp()s.
         */
        if (gfporder >= slab_max_order)
            break;

        /*
         * Acceptable internal fragmentation?
         */
        if (left_over * 8 <= (PAGE_SIZE << gfporder))
            break;
    }
    return left_over;
}
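calculate_slab_order() leans on cache_estimate(), which is not quoted above. The sketch below is a simplified userspace model of both under stated assumptions: a 4 KB page, a 32-byte header standing in for sizeof(struct slab), a 4-byte kmem_bufctl_t, a loop bound of 10 in place of KMALLOC_MAX_ORDER, and no SLAB_LIMIT cap or SLAB_RECLAIM_ACCOUNT check. The on-slab formula follows my reading of cache_estimate() (header plus one bufctl per object, rounded up to the alignment); estimate() and slab_order() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumed page size */
#define TOY_SLAB_HDR  32UL     /* stand-in for sizeof(struct slab) */
#define TOY_BUFCTL    4UL      /* stand-in for sizeof(kmem_bufctl_t) */

static size_t align_up(size_t x, size_t a)
{
    assert(a && (a & (a - 1)) == 0);
    return (x + a - 1) & ~(a - 1);
}

/* On-slab management: header + one bufctl per object, aligned. */
static size_t mgmt_size(size_t nr, size_t align)
{
    return align_up(TOY_SLAB_HDR + nr * TOY_BUFCTL, align);
}

/* Simplified cache_estimate(): objects per 2^order pages and leftover. */
static void estimate(unsigned order, size_t size, size_t align, int off_slab,
                     size_t *left, unsigned *num)
{
    size_t slab = TOY_PAGE_SIZE << order, nr, mgmt = 0;

    if (off_slab) {
        nr = slab / size;
    } else {
        nr = (slab - TOY_SLAB_HDR) / (size + TOY_BUFCTL);
        if (mgmt_size(nr, align) + nr * size > slab)
            nr--;   /* alignment padding pushed us over */
        mgmt = mgmt_size(nr, align);
    }
    *num = (unsigned)nr;
    *left = slab - nr * size - mgmt;
}

/* Simplified calculate_slab_order(): smallest order that wastes at most
 * 1/8 of the slab, capped at max_order. */
static size_t slab_order(size_t size, size_t align, int off_slab,
                         unsigned max_order, unsigned *num, unsigned *order)
{
    size_t left_over = 0;

    *num = 0;
    *order = 0;
    for (unsigned o = 0; o <= 10; o++) {   /* 10: assumed KMALLOC_MAX_ORDER */
        size_t rem;
        unsigned n;

        estimate(o, size, align, off_slab, &rem, &n);
        if (!n)
            continue;
        if (off_slab && n > (size - TOY_SLAB_HDR) / TOY_BUFCTL)
            break;   /* offslab_limit */
        *num = n;
        *order = o;
        left_over = rem;
        if (o >= max_order)
            break;
        if (left_over * 8 <= (TOY_PAGE_SIZE << o))
            break;
    }
    return left_over;
}
```

With these assumptions a 256-byte, 32-byte-aligned cache fits 15 objects in one page with 160 bytes left, under 1/8 of the page, so order 0 is accepted. A 1500-byte cache wastes 1056 bytes at order 0 and moves to order 1 (5 objects, 636 bytes left) when slab_max_order is 1.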
With the setup above done, a final call to setup_cpu_cache() completes the creation of the cache. Next come steps 2 and 3:

    /* 2+3) create the kmalloc caches */
    sizes = malloc_sizes;
    names = cache_names;

    /*
     * Initialize the caches that provide memory for the array cache and the
     * kmem_list3 structures first. Without this, further allocations will
     * bug.
     */

    // create the kmalloc size class that fits struct arraycache_init
    sizes[INDEX_AC].cs_cachep = create_kmalloc_cache(names[INDEX_AC].name,
                    sizes[INDEX_AC].cs_size, ARCH_KMALLOC_FLAGS);

    if (INDEX_AC != INDEX_L3)
        sizes[INDEX_L3].cs_cachep =
            // create the kmalloc size class that fits struct kmem_list3
            create_kmalloc_cache(names[INDEX_L3].name,
                sizes[INDEX_L3].cs_size, ARCH_KMALLOC_FLAGS);

    slab_early_init = 0;

    while (sizes->cs_size != ULONG_MAX) {   // create the general caches listed in malloc_sizes / cache_names
        /*
         * For performance, all the general caches are L1 aligned.
         * This should be particularly beneficial on SMP boxes, as it
         * eliminates "false sharing".
         * Note for systems short on memory removing the alignment will
         * allow tighter packing of the smaller caches.
         */
        if (!sizes->cs_cachep)
            sizes->cs_cachep = create_kmalloc_cache(names->name,
                    sizes->cs_size, ARCH_KMALLOC_FLAGS);

#ifdef CONFIG_ZONE_DMA
        sizes->cs_dmacachep = create_kmalloc_cache(
            names->name_dma, sizes->cs_size,
            SLAB_CACHE_DMA|ARCH_KMALLOC_FLAGS);
#endif
        sizes++;
        names++;
    }
A few words on cache_names and malloc_sizes:

/*
 * These are the default caches for kmalloc. Custom caches can have other sizes.
 */
struct cache_sizes malloc_sizes[] = {
#define CACHE(x) { .cs_size = (x) },
#include <linux/kmalloc_sizes.h>
    CACHE(ULONG_MAX)
#undef CACHE
};

I won't go into the details here.

/* Must match cache_sizes above. Out of line to keep cache footprint low. */
struct cache_names {
    char *name;
    char *name_dma;
};

static struct cache_names __initdata cache_names[] = {
#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },
#include <linux/kmalloc_sizes.h>
    {NULL,}
#undef CACHE
};
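malloc_sizes[] and cache_names[] are kept in step with an X-macro: the same list is expanded twice under different definitions of CACHE(x). In the sketch below, a local TOY_KMALLOC_SIZES macro (hypothetical, with an illustrative size list rather than the real contents of <linux/kmalloc_sizes.h>) plays the role of the header. find_general_index() roughly mimics how a size-class lookup such as kmem_find_general_cachep() walks the table until the ULONG_MAX sentinel; find_by_name() is likewise a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for <linux/kmalloc_sizes.h>; sizes are illustrative. */
#define TOY_KMALLOC_SIZES \
    CACHE(32) CACHE(64) CACHE(128) CACHE(256) CACHE(512) CACHE(1024)

struct cache_sizes { unsigned long cs_size; };
struct cache_names { const char *name; const char *name_dma; };

/* First expansion: the size table, closed by a ULONG_MAX sentinel. */
static struct cache_sizes malloc_sizes[] = {
#define CACHE(x) { .cs_size = (x) },
    TOY_KMALLOC_SIZES
    CACHE(ULONG_MAX)
#undef CACHE
};

/* Second expansion of the same list: names built by stringizing x. */
static struct cache_names cache_names[] = {
#define CACHE(x) { .name = "size-" #x, .name_dma = "size-" #x "(DMA)" },
    TOY_KMALLOC_SIZES
#undef CACHE
    { NULL, NULL }
};

/* Index of the smallest general cache that fits size, or -1 if none does. */
static int find_general_index(unsigned long size)
{
    int i = 0;

    while (size > malloc_sizes[i].cs_size)   /* the sentinel stops the walk */
        i++;
    return malloc_sizes[i].cs_size == ULONG_MAX ? -1 : i;
}

/* Index of a general cache by name, or -1. */
static int find_by_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; cache_names[i].name; i++)
        if (strcmp(cache_names[i].name, name) == 0)
            return i;
    return -1;
}
```

Because both arrays come from one list, entry i of malloc_sizes and entry i of cache_names always describe the same size class; adding a size to the list updates both tables at once.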
create_kmalloc_cache() allocates a zeroed struct kmem_cache from the kmem_cache cache and then calls create_boot_cache() on it; this is done for every general-purpose cache the kernel predefines. Next come steps 4 and 5:

    /* 4) Replace the bootstrap head arrays */
    {
        struct array_cache *ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        memcpy(ptr, cpu_cache_get(kmem_cache),
         sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        kmem_cache->array[smp_processor_id()] = ptr;

        ptr = kmalloc(sizeof(struct arraycache_init), GFP_NOWAIT);

        BUG_ON(cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep)
         != &initarray_generic.cache);
        memcpy(ptr, cpu_cache_get(malloc_sizes[INDEX_AC].cs_cachep),
         sizeof(struct arraycache_init));
        /*
         * Do not assume that spinlocks can be initialized via memcpy:
         */
        spin_lock_init(&ptr->lock);

        malloc_sizes[INDEX_AC].cs_cachep->array[smp_processor_id()] =
         ptr;
    }
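Step 4 moves a bootstrap head array from static __init data into kmalloc'd memory: memcpy() carries the counters across, but the lock is re-initialized, presumably because a spinlock may carry state (for example lock-debugging data) that should not be cloned byte for byte. A userspace sketch follows, with a hypothetical toy_lock in place of spinlock_t, a trimmed toy_array_cache in place of struct arraycache_init, and malloc() in place of kmalloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lock type standing in for spinlock_t. */
struct toy_lock { int locked; void *owner; };

static void toy_lock_init(struct toy_lock *l)
{
    l->locked = 0;
    l->owner = NULL;
}

#define BOOT_ENTRIES 1   /* like BOOT_CPUCACHE_ENTRIES */

/* Trimmed stand-in for struct arraycache_init. */
struct toy_array_cache {
    unsigned int avail, limit, batchcount, touched;
    struct toy_lock lock;
    void *entry[BOOT_ENTRIES];
};

/* Static bootstrap copy, usable before any allocator exists. */
static struct toy_array_cache boot_ac = {
    0, BOOT_ENTRIES, 1, 0, { 0, NULL }, { NULL }
};

/* Copy the static head array to heap memory, then give the copy a
 * freshly initialised lock instead of trusting the memcpy. */
static struct toy_array_cache *replace_boot_array(const struct toy_array_cache *boot)
{
    struct toy_array_cache *ptr;

    assert(boot != NULL);
    ptr = malloc(sizeof(*ptr));
    if (!ptr)
        return NULL;
    memcpy(ptr, boot, sizeof(*ptr));
    toy_lock_init(&ptr->lock);
    return ptr;
}
```

The counters (avail, limit, batchcount) survive the move unchanged; only the lock is rebuilt, mirroring the spin_lock_init() call after each memcpy() in step 4.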

    /* 5) Replace the bootstrap kmem_list3's */
    {
        int nid;

        for_each_online_node(nid) {
            init_list(kmem_cache, &initkmem_list3[CACHE_CACHE + nid], nid);

            init_list(malloc_sizes[INDEX_AC].cs_cachep,
                 &initkmem_list3[SIZE_AC + nid], nid);

            if (INDEX_AC != INDEX_L3) {
                init_list(malloc_sizes[INDEX_L3].cs_cachep,
                     &initkmem_list3[SIZE_L3 + nid], nid);
            }
        }
    }

    slab_state = UP;
Finally slab_state is set to UP, meaning the slab allocator can now be used normally. Although most of the above is code, the actual allocation path was already covered in the earlier kmalloc post; the goal here was only to understand what a cache really is and how it gets initialized.
After kmem_cache_init() there is also kmem_cache_init_late().
It mainly calls enable_cpucache() for every cache on slab_caches and registers a CPU notifier chain:

    /*
     * Register a cpu startup notifier callback that initializes
     * cpu_cache_get for all new cpus
     */
    register_cpu_notifier(&cpucache_notifier);
Remember the puzzle we ran into earlier when analyzing batchcount?

/* Called with slab_mutex held always */
static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp)
{
    int err;
    int limit = 0;
    int shared = 0;
    int batchcount = 0;

    if (!is_root_cache(cachep)) {
        struct kmem_cache *root = memcg_root_cache(cachep);
        limit = root->limit;
        shared = root->shared;
        batchcount = root->batchcount;
    }

    if (limit && shared && batchcount)
        goto skip_setup;
    /*
     * The head array serves three purposes:
     * - create a LIFO ordering, i.e. return objects that are cache-warm
     * - reduce the number of spinlock operations.
     * - reduce the number of linked list operations on the slab and
     * bufctl chains: array operations are cheaper.
     * The numbers are guessed, we should auto-tune as described by
     * Bonwick.
     */
    if (cachep->size > 131072)
        limit = 1;
    else if (cachep->size > PAGE_SIZE)
        limit = 8;
    else if (cachep->size > 1024)
        limit = 24;
    else if (cachep->size > 256)
        limit = 54;
    else
        limit = 120;

    /*
     * CPU bound tasks (e.g. network routing) can exhibit cpu bound
     * allocation behaviour: Most allocs on one cpu, most free operations
     * on another cpu. For these cases, an efficient object passing between
     * cpus is necessary. This is provided by a shared array. The array
     * replaces Bonwick's magazine layer.
     * On uniprocessor, it's functionally equivalent (but less efficient)
     * to a larger limit. Thus disabled by default.
     */
    shared = 0;
    if (cachep->size <= PAGE_SIZE && num_possible_cpus() > 1)
        shared = 8;

#if DEBUG
    /*
     * With debugging enabled, large batchcount lead to excessively long
     * periods with disabled local interrupts. Limit the batchcount
     */
    if (limit > 32)
        limit = 32;
#endif
    batchcount = (limit + 1) / 2;
skip_setup:
    err = do_tune_cpucache(cachep, limit, batchcount, shared, gfp);
    if (err)
        printk(KERN_ERR "enable_cpucache failed for %s, error %d.\n",
         cachep->name, -err);
    return err;
}
It derives limit from the object size and then sets batchcount to (limit + 1) / 2.
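The defaults chosen by enable_cpucache() can be tabulated directly from the code above. A sketch for a root cache, assuming a 4 KB page and leaving out the memcg inheritance path and the DEBUG cap of 32; cpucache_tune and default_tune() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumed page size */

struct cpucache_tune { int limit, batchcount, shared; };

/* Default per-CPU head-array tuning for a root cache, following the
 * size thresholds in enable_cpucache(). */
static struct cpucache_tune default_tune(size_t obj_size, int nr_cpus)
{
    struct cpucache_tune t;

    assert(obj_size > 0);
    if (obj_size > 131072)
        t.limit = 1;
    else if (obj_size > TOY_PAGE_SIZE)
        t.limit = 8;
    else if (obj_size > 1024)
        t.limit = 24;
    else if (obj_size > 256)
        t.limit = 54;
    else
        t.limit = 120;

    /* The shared array is only used for page-sized or smaller objects on SMP. */
    t.shared = (obj_size <= TOY_PAGE_SIZE && nr_cpus > 1) ? 8 : 0;
    /* Refill or flush roughly half of the head array at a time. */
    t.batchcount = (t.limit + 1) / 2;
    return t;
}
```

For example, a 32-byte object gets limit 120, batchcount 60 and, with more than one CPU, shared 8; a 2 KB object gets limit 24 and batchcount 12.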

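This is only a small beginning. Memory management is a deep subject, and the best way to understand it better is to analyze concrete problems as they come up.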
