Category: LINUX

2015-12-03 01:45:24

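Earlier posts covered SLUB initialization, cache creation, object allocation, and object freeing. This final post looks at how the SLUB allocator destroys a slab cache.

The entry point for destroying a slab cache is kmem_cache_destroy(), implemented as follows: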

    【file:/mm/slab_common.c】
    void kmem_cache_destroy(struct kmem_cache *s)
    {
        /* Destroy all the children caches if we aren't a memcg cache */
        kmem_cache_destroy_memcg_children(s);

        get_online_cpus();
        mutex_lock(&slab_mutex);
        s->refcount--;
        if (!s->refcount) {
            list_del(&s->list);

            if (!__kmem_cache_shutdown(s)) {
                memcg_unregister_cache(s);
                mutex_unlock(&slab_mutex);
                if (s->flags & SLAB_DESTROY_BY_RCU)
                    rcu_barrier();

                memcg_free_cache_params(s);
                kfree(s->name);
                kmem_cache_free(kmem_cache, s);
            } else {
                list_add(&s->list, &slab_caches);
                mutex_unlock(&slab_mutex);
                printk(KERN_ERR "kmem_cache_destroy %s: Slab cache still has objects\n",
                    s->name);
                dump_stack();
            }
        } else {
            mutex_unlock(&slab_mutex);
        }
        put_online_cpus();
    }

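kmem_cache_destroy_memcg_children() first destroys the child caches associated with this cache in memcg. get_online_cpus() then takes the CPU hotplug read lock, which protects the set of online CPUs (cpu_online_map); it is paired with put_online_cpus() at the end of the function. Next, mutex_lock() acquires slab_mutex, which protects global slab state. The function then decrements the cache's reference count, refcount. If if (!s->refcount) is true after the decrement, the count has reached 0, no other cache is attached to this one as an alias, and the kmem_cache structure can be deleted. Otherwise an alias still depends on it, so the function only unlocks slab_mutex, calls put_online_cpus() to release the hotplug lock, and returns.

In the branch where if (!s->refcount) is true, list_del() first unlinks the kmem_cache from the global slab_caches list, and then __kmem_cache_shutdown() tears down the cache's internal state. On success __kmem_cache_shutdown() returns 0, so if (!__kmem_cache_shutdown(s)) is true. In that case memcg_unregister_cache() unregisters the cache from memcg, slab_mutex is released, rcu_barrier() waits for pending RCU callbacks if the cache was created with SLAB_DESTROY_BY_RCU, memcg_free_cache_params() frees the memcg_params allocated at creation time, and kfree() and kmem_cache_free() release the cache's name string and the kmem_cache structure itself. If __kmem_cache_shutdown() fails, the cache is put back on the slab_caches list and an error is logged together with a stack dump.

That completes the destruction of the slab cache.

The core of kmem_cache_destroy() is __kmem_cache_shutdown(), so let's look at its implementation: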
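The reference-count logic can be sketched in user space. The following is a minimal model, not kernel code: struct toy_cache, toy_shutdown() and toy_destroy() are hypothetical names, locking is omitted, and a simple counter of live objects stands in for the real slab bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kmem_cache; only the fields needed here. */
struct toy_cache {
    int refcount;      /* 1 for the owner plus 1 per attached alias */
    int live_objects;  /* objects that users have not freed yet */
    bool on_list;      /* models membership in slab_caches */
    bool destroyed;    /* set where the descriptor would be freed */
};

/* Models __kmem_cache_shutdown(): 0 on success, nonzero if objects remain. */
static int toy_shutdown(const struct toy_cache *c)
{
    return c->live_objects != 0;
}

/* Models the control flow of kmem_cache_destroy() without any locking. */
static void toy_destroy(struct toy_cache *c)
{
    assert(c != NULL && c->refcount > 0);

    c->refcount--;
    if (c->refcount)
        return;                 /* an alias still uses the cache */

    c->on_list = false;         /* list_del() */
    if (!toy_shutdown(c))
        c->destroyed = true;    /* name and descriptor would be freed here */
    else
        c->on_list = true;      /* list_add(): put it back and report */
}
```

With refcount 2, the first toy_destroy() call only drops a reference; the second one actually tears the cache down, provided no objects are still live.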

    【file:/mm/slub.c】
    int __kmem_cache_shutdown(struct kmem_cache *s)
    {
        int rc = kmem_cache_close(s);

        if (!rc) {
            /*
             * We do the same lock strategy around sysfs_slab_add, see
             * __kmem_cache_create. Because this is pretty much the last
             * operation we do and the lock will be released shortly after
             * that in slab_common.c, we could just move sysfs_slab_remove
             * to a later point in common code. We should do that when we
             * have a common sysfs framework for all allocators.
             */
            mutex_unlock(&slab_mutex);
            sysfs_slab_remove(s);
            mutex_lock(&slab_mutex);
        }

        return rc;
    }

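This function mainly calls kmem_cache_close() to release the resources managed by the kmem_cache. If that succeeds, the if branch removes the cache's sysfs entries, temporarily dropping slab_mutex around sysfs_slab_remove().

Here is kmem_cache_close() in detail: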

    【file:/mm/slub.c】
    /*
     * Release all resources used by a slab cache.
     */
    static inline int kmem_cache_close(struct kmem_cache *s)
    {
        int node;

        flush_all(s);
        /* Attempt to free all objects */
        for_each_node_state(node, N_NORMAL_MEMORY) {
            struct kmem_cache_node *n = get_node(s, node);

            free_partial(s, n);
            if (n->nr_partial || slabs_node(s, node))
                return 1;
        }
        free_percpu(s->cpu_slab);
        free_kmem_cache_nodes(s);
        return 0;
    }

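The function calls flush_all() to flush the per-CPU slabs, that is, the slabs managed by each CPU's kmem_cache_cpu. It then walks the memory nodes with for_each_node_state(), uses get_node() to fetch each node's kmem_cache_node, and calls free_partial() to free the slabs on that node's partial list. If the node still has partial slabs, or any slabs at all, after that, objects are still in use and the function returns 1. Finally, free_percpu() returns the per-CPU kmem_cache_cpu structures to the system, and free_kmem_cache_nodes() frees the kmem_cache_node structures of the memory nodes.

Finally, let's analyze flush_all(), which is somewhat more involved: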
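The early return is the part that is easy to overlook, so here is a small sketch of it. It is a simplified model under stated assumptions: struct toy_node and its two counters are hypothetical, and toy_free_partial() is assumed to release only slabs that hold no live objects.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NODES 2

/* Hypothetical per-node bookkeeping, loosely modelled on kmem_cache_node. */
struct toy_node {
    int nr_empty_partial;  /* partial-list slabs with no objects in use */
    int nr_busy_slabs;     /* slabs that still hold live objects */
};

/* Models free_partial(): only slabs without live objects are released. */
static void toy_free_partial(struct toy_node *n)
{
    n->nr_empty_partial = 0;
}

/* Returns 0 when every node ends up empty, 1 at the first node that
 * still has slabs after its partial list was processed. */
static int toy_cache_close(struct toy_node nodes[], size_t count)
{
    assert(nodes != NULL);

    for (size_t i = 0; i < count; i++) {
        toy_free_partial(&nodes[i]);
        if (nodes[i].nr_busy_slabs)
            return 1;       /* later nodes are left untouched */
    }
    return 0;
}
```

Note that a failure on one node stops the walk, so nodes after it keep their partial slabs. This matches the caller's behaviour of putting the cache back on the global list when shutdown fails.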

    【file:/mm/slub.c】
    static void flush_all(struct kmem_cache *s)
    {
        on_each_cpu_cond(has_cpu_slab, flush_cpu_slab, s, 1, GFP_ATOMIC);
    }

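It looks like a thin wrapper around on_each_cpu_cond(), but on_each_cpu_cond() itself does not release any resources. It iterates over the CPUs and calls has_cpu_slab(), passed in as an argument, to decide whether each processor holds any slab. For those that do, flush_cpu_slab() is then run on that processor to release it.

As usual, let's look at on_each_cpu_cond() in detail: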

    【file:/kernel/smp.c】
    /*
     * on_each_cpu_cond(): Call a function on each processor for which
     * the supplied function cond_func returns true, optionally waiting
     * for all the required CPUs to finish. This may include the local
     * processor.
     * @cond_func: A callback function that is passed a cpu id and
     *             the info parameter. The function is called
     *             with preemption disabled. The function should
     *             return a boolean value indicating whether to IPI
     *             the specified CPU.
     * @func:      The function to run on all applicable CPUs.
     *             This must be fast and non-blocking.
     * @info:      An arbitrary pointer to pass to both functions.
     * @wait:      If true, wait (atomically) until function has
     *             completed on other CPUs.
     * @gfp_flags: GFP flags to use when allocating the cpumask
     *             used internally by the function.
     *
     * The function might sleep if the GFP flags indicates a non
     * atomic allocation is allowed.
     *
     * Preemption is disabled to protect against CPUs going offline but not online.
     * CPUs going online during the call will not be seen or sent an IPI.
     *
     * You must not call this function with disabled interrupts or
     * from a hardware interrupt handler or from a bottom half handler.
     */
    void on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
                smp_call_func_t func, void *info, bool wait,
                gfp_t gfp_flags)
    {
        cpumask_var_t cpus;
        int cpu, ret;

        might_sleep_if(gfp_flags & __GFP_WAIT);

        if (likely(zalloc_cpumask_var(&cpus, (gfp_flags|__GFP_NOWARN)))) {
            preempt_disable();
            for_each_online_cpu(cpu)
                if (cond_func(cpu, info))
                    cpumask_set_cpu(cpu, cpus);
            on_each_cpu_mask(cpus, func, info, wait);
            preempt_enable();
            free_cpumask_var(cpus);
        } else {
            /*
             * No free cpumask, bother. No matter, we'll
             * just have to IPI them one by one.
             */
            preempt_disable();
            for_each_online_cpu(cpu)
                if (cond_func(cpu, info)) {
                    ret = smp_call_function_single(cpu, func,
                                    info, wait);
                    WARN_ON_ONCE(!ret);
                }
            preempt_enable();
        }
    }

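The cond_func parameter is a callback that decides, from the CPU id and info it is given, whether that CPU needs to be interrupted (sent an IPI) to run func. info is passed to both cond_func and func. wait is a bool: when true, the call waits until func has finished on every target CPU. The last parameter, gfp_flags, supplies the allocation flags used for the cpumask.

With the parameters understood, the implementation is straightforward. might_sleep_if() is a debugging annotation stating that the function may sleep when gfp_flags permits waiting (__GFP_WAIT). zalloc_cpumask_var() then allocates a zeroed cpumask. If the allocation succeeds, preempt_disable() disables kernel preemption, for_each_online_cpu() walks the online CPUs, and cond_func() (here has_cpu_slab()) decides for each CPU whether it must be interrupted; if so, cpumask_set_cpu() marks that CPU in the mask. Once the mask is complete, on_each_cpu_mask() interrupts every marked CPU to run func() (here flush_cpu_slab()). Preemption is then re-enabled and the cpumask is freed. If zalloc_cpumask_var() cannot allocate the mask, the CPUs are interrupted one at a time with smp_call_function_single(); the end result is the same as in the cpumask case, so that path is not analyzed further.

Next, has_cpu_slab(), the condition callback passed to on_each_cpu_cond():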
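The two-phase pattern (build a mask with a predicate, then act only on the selected CPUs) can be modelled in plain C. This is only a sketch: toy_on_each_cpu_cond() and the callback types are hypothetical, a 32-bit integer stands in for the cpumask, and func is called directly with the CPU id instead of being delivered by IPI to run on that CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_CPUS 8

typedef bool (*toy_cond_fn)(int cpu, void *info);
typedef void (*toy_func_fn)(int cpu, void *info);

/* Phase 1 builds the mask; phase 2 runs func only on the selected CPUs.
 * The mask is returned so callers can see which CPUs were chosen. */
static uint32_t toy_on_each_cpu_cond(toy_cond_fn cond, toy_func_fn func,
                                     void *info)
{
    uint32_t mask = 0;

    assert(cond && func);

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (cond(cpu, info))
            mask |= UINT32_C(1) << cpu;     /* cpumask_set_cpu() */

    for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
        if (mask & (UINT32_C(1) << cpu))
            func(cpu, info);                /* stands in for the IPI */

    return mask;
}

/* Example callbacks: info points at an int array of per-CPU work counts. */
static bool toy_has_work(int cpu, void *info)
{
    return ((int *)info)[cpu] != 0;
}

static void toy_drain(int cpu, void *info)
{
    ((int *)info)[cpu] = 0;
}
```

Calling toy_on_each_cpu_cond(toy_has_work, toy_drain, work) on an array with non-zero entries for CPUs 1, 4 and 7 selects exactly those three CPUs and leaves the others alone, which is the point of the predicate: idle CPUs are never interrupted.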

    【file:/mm/slub.c】
    static bool has_cpu_slab(int cpu, void *info)
    {
        struct kmem_cache *s = info;
        struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

        return c->page || c->partial;
    }

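The function checks whether the given CPU holds a slab, either an active page or per-CPU partial slabs, and returns true if it does. A true result means that CPU must be interrupted to release its local slabs.

The other callback passed to on_each_cpu_cond() is flush_cpu_slab():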

    【file:/mm/slub.c】
    static void flush_cpu_slab(void *d)
    {
        struct kmem_cache *s = d;

        __flush_cpu_slab(s, smp_processor_id());
    }

It wraps __flush_cpu_slab(), which is implemented as:

    【file:/mm/slub.c】
    /*
     * Flush cpu slab.
     *
     * Called from IPI handler with interrupts disabled.
     */
    static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu)
    {
        struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

        if (likely(c)) {
            if (c->page)
                flush_slab(s, c);

            unfreeze_partials(s, c);
        }
    }

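The function is simple: it releases the current CPU's slabs. It first fetches that CPU's kmem_cache_cpu structure. If the CPU has an active slab (c->page), flush_slab() releases it; unfreeze_partials() then releases the CPU's per-CPU partial list.

The implementation of flush_slab():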
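As a rough illustration of how the predicate and the flush fit together, the per-CPU state can be reduced to two counters. Everything here is a hypothetical simplification: struct toy_cpu_slab and struct toy_node_lists are invented for the sketch, and the real code hands pages back through deactivate_slab() and unfreeze_partials() rather than incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state: an active slab id (0 means none) and a
 * count of per-CPU partial slabs. The real kmem_cache_cpu holds pages. */
struct toy_cpu_slab {
    int active;
    int nr_partial;
};

/* Hypothetical node-level state receiving whatever the CPU gives up. */
struct toy_node_lists {
    int nr_slabs_returned;
};

/* Same test as has_cpu_slab(): anything to flush on this CPU? */
static bool toy_has_cpu_slab(const struct toy_cpu_slab *c)
{
    return c->active || c->nr_partial;
}

/* Same order as __flush_cpu_slab(): active slab first, then partials. */
static void toy_flush_cpu_slab(struct toy_cpu_slab *c,
                               struct toy_node_lists *n)
{
    assert(c != NULL && n != NULL);

    if (c->active) {                        /* flush_slab() */
        n->nr_slabs_returned++;
        c->active = 0;
    }
    n->nr_slabs_returned += c->nr_partial;  /* unfreeze_partials() */
    c->nr_partial = 0;
}
```

After a flush, toy_has_cpu_slab() returns false for that CPU, so a second pass of the predicate would not select it again.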

    【file:/mm/slub.c】
    static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
    {
        stat(s, CPUSLAB_FLUSH);
        deactivate_slab(s, c->page, c->freelist);

        c->tid = next_tid(c->tid);
        c->page = NULL;
        c->freelist = NULL;
    }

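It mainly calls deactivate_slab() to deactivate the CPU's slab, that is, to hand the slab back; it then advances the transaction id and clears page and freelist. deactivate_slab() is implemented as: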

    【file:/mm/slub.c】
    /*
     * Remove the cpu slab
     */
    static void deactivate_slab(struct kmem_cache *s, struct page *page,
                    void *freelist)
    {
        enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
        struct kmem_cache_node *n = get_node(s, page_to_nid(page));
        int lock = 0;
        enum slab_modes l = M_NONE, m = M_NONE;
        void *nextfree;
        int tail = DEACTIVATE_TO_HEAD;
        struct page new;
        struct page old;

        if (page->freelist) {
            stat(s, DEACTIVATE_REMOTE_FREES);
            tail = DEACTIVATE_TO_TAIL;
        }

        /*
         * Stage one: Free all available per cpu objects back
         * to the page freelist while it is still frozen. Leave the
         * last one.
         *
         * There is no need to take the list->lock because the page
         * is still frozen.
         */
        while (freelist && (nextfree = get_freepointer(s, freelist))) {
            void *prior;
            unsigned long counters;

            do {
                prior = page->freelist;
                counters = page->counters;
                set_freepointer(s, freelist, prior);
                new.counters = counters;
                new.inuse--;
                VM_BUG_ON(!new.frozen);

            } while (!__cmpxchg_double_slab(s, page,
                prior, counters,
                freelist, new.counters,
                "drain percpu freelist"));

            freelist = nextfree;
        }

        /*
         * Stage two: Ensure that the page is unfrozen while the
         * list presence reflects the actual number of objects
         * during unfreeze.
         *
         * We setup the list membership and then perform a cmpxchg
         * with the count. If there is a mismatch then the page
         * is not unfrozen but the page is on the wrong list.
         *
         * Then we restart the process which may have to remove
         * the page from the list that we just put it on again
         * because the number of objects in the slab may have
         * changed.
         */
    redo:

        old.freelist = page->freelist;
        old.counters = page->counters;
        VM_BUG_ON(!old.frozen);

        /* Determine target state of the slab */
        new.counters = old.counters;
        if (freelist) {
            new.inuse--;
            set_freepointer(s, freelist, old.freelist);
            new.freelist = freelist;
        } else
            new.freelist = old.freelist;

        new.frozen = 0;

        if (!new.inuse && n->nr_partial > s->min_partial)
            m = M_FREE;
        else if (new.freelist) {
            m = M_PARTIAL;
            if (!lock) {
                lock = 1;
                /*
                 * Taking the spinlock removes the possibility
                 * that acquire_slab() will see a slab page that
                 * is frozen
                 */
                spin_lock(&n->list_lock);
            }
        } else {
            m = M_FULL;
            if (kmem_cache_debug(s) && !lock) {
                lock = 1;
                /*
                 * This also ensures that the scanning of full
                 * slabs from diagnostic functions will not see
                 * any frozen slabs.
                 */
                spin_lock(&n->list_lock);
            }
        }

        if (l != m) {

            if (l == M_PARTIAL)

                remove_partial(n, page);

            else if (l == M_FULL)

                remove_full(s, n, page);

            if (m == M_PARTIAL) {

                add_partial(n, page, tail);
                stat(s, tail);

            } else if (m == M_FULL) {

                stat(s, DEACTIVATE_FULL);
                add_full(s, n, page);

            }
        }

        l = m;
        if (!__cmpxchg_double_slab(s, page,
                    old.freelist, old.counters,
                    new.freelist, new.counters,
                    "unfreezing slab"))
            goto redo;

        if (lock)
            spin_unlock(&n->list_lock);

        if (m == M_FREE) {
            stat(s, DEACTIVATE_EMPTY);
            discard_slab(s, page);
            stat(s, FREE_SLAB);
        }
    }

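if (page->freelist) checks whether the slab page's own freelist is non-empty. If it is empty, all of the slab's free objects have been moved to the freelist in the CPU's kmem_cache_cpu. If it is not empty, objects belonging to this CPU's slab were freed by other CPUs, so the statistics are updated and tail is set to DEACTIVATE_TO_TAIL.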

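The while loop that follows is stage one of deactivating the CPU slab. It walks the per-CPU freelist, using get_freepointer() to fetch the next free object, and in the inner do-while loop uses the compare-and-exchange __cmpxchg_double_slab() to return each object by pushing it onto the head of the page's freelist. __cmpxchg_double_slab() is the atomic operation described in an earlier post, so it is not repeated here. One point is worth noting: this stage does not return every object. When nextfree = get_freepointer(s, freelist) yields a null pointer for the next free object, the while loop exits. So if the freelist argument of deactivate_slab() was non-empty, freelist is still non-empty when the loop ends, pointing at the last object; the reason is covered in stage two. In short, the purpose of this stage is to return the available per-CPU objects, all but the last, to the page freelist while the page is still frozen.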
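The "leave the last one" behaviour falls out of the loop condition, which a small single-threaded model makes visible. The names struct toy_obj, struct toy_page and toy_drain_all_but_last() are hypothetical, and because there is no concurrency here a plain store replaces the cmpxchg retry loop of the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* A free object stores the pointer to the next free object in itself. */
struct toy_obj {
    struct toy_obj *next;
};

/* Hypothetical slab page state: its own freelist and in-use counter. */
struct toy_page {
    struct toy_obj *freelist;
    int inuse;
};

/* Push every object except the last from the per-CPU list onto the head
 * of the page freelist. Returns the remaining last object, or NULL if
 * the per-CPU list was empty. */
static struct toy_obj *toy_drain_all_but_last(struct toy_page *page,
                                              struct toy_obj *cpu_freelist)
{
    struct toy_obj *nextfree;

    assert(page != NULL);

    /* The loop stops when the current object has no successor, so the
     * final object is never pushed here. */
    while (cpu_freelist && (nextfree = cpu_freelist->next)) {
        cpu_freelist->next = page->freelist;   /* set_freepointer() */
        page->freelist = cpu_freelist;
        page->inuse--;
        cpu_freelist = nextfree;
    }
    return cpu_freelist;
}

/* Helper for inspecting the result: length of a freelist. */
static int toy_list_len(const struct toy_obj *o)
{
    int len = 0;

    for (; o; o = o->next)
        len++;
    return len;
}
```

For a per-CPU list of three objects and a page freelist that already holds one remotely freed object, two objects are pushed, the page freelist grows to three entries, and the third per-CPU object is returned to the caller for stage two to handle.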

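Stage two is everything from the redo label onward. It first saves the page's freelist and counters in the temporary old structure for later use. If if (freelist) is true, the one object that stage one left behind is returned to the slab and new is updated, so that new.freelist now holds all of the slab's free objects. new.frozen = 0 then marks the new state as unfrozen. if (!new.inuse && n->nr_partial > s->min_partial) means that no object in the slab is in use and the node already has more partial slabs than the minimum, so the slab should be destroyed and m is set to M_FREE. else if (new.freelist) means the freelist is non-empty and only some objects are in use, so m is set to M_PARTIAL. The final else branch means the freelist is empty and every object in the slab is in use, so m is set to M_FULL.

Next, if (l != m) compares the list state l applied on the previous pass with the target state m. If they differ, the list membership has to change: depending on whether l is M_PARTIAL or M_FULL, remove_partial() or remove_full() unlinks the page, and then, depending on m, add_partial() adds it to the partial list or add_full() adds it to the full list. l is then updated to m. Finally, if (!__cmpxchg_double_slab()) checks whether the page changed between redo and this point. If it did not, the free objects held in new are installed into the page and counters is updated; otherwise execution jumps back to redo and repeats the steps above. If all goes well, the slab is now deactivated.

Finally, if m is M_FREE, the slab is no longer needed and discard_slab() destroys it.

This concludes the analysis of the SLUB allocator.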
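The three-way decision on the target state can be isolated as a pure function. The sketch below uses hypothetical names (enum toy_slab_mode, toy_target_mode()) and takes the relevant values as plain parameters instead of reading them from struct page and kmem_cache_node.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_slab_mode { TOY_NONE, TOY_PARTIAL, TOY_FULL, TOY_FREE };

/* Mirrors the if / else-if / else chain in stage two:
 *  - no object in use and the node has more partial slabs than the
 *    minimum: the slab can be freed
 *  - otherwise, some free objects remain: it goes on the partial list
 *  - otherwise, no free objects: the slab is full */
static enum toy_slab_mode toy_target_mode(int inuse, bool has_free_objects,
                                          unsigned long nr_partial,
                                          unsigned long min_partial)
{
    assert(inuse >= 0);

    if (!inuse && nr_partial > min_partial)
        return TOY_FREE;
    if (has_free_objects)
        return TOY_PARTIAL;
    return TOY_FULL;
}
```

One consequence worth spelling out: a completely unused slab is not always freed. With nr_partial equal to or below min_partial, toy_target_mode(0, true, 5, 5) yields TOY_PARTIAL, so the empty slab is kept on the partial list as a reserve.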
