memcached refcount: Personal Notes - djjsindy - ChinaUnix Blog


Category: C/C++

2012-11-15 13:27:38

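My earlier post on memcached item lock granularity showed how the basic item operations in thread.c (item_get, item_link and so on) behave in a multithreaded environment. Each of them takes a lock, either a striped (segmented) lock or the global lock depending on the current lock type, so each one is strictly atomic across threads. Client requests, however, arrive through libevent and are handled by process_command, and that function takes no lock. In other words, once item_get has atomically returned an item, everything process_command subsequently does with that item happens without a lock. Now suppose two threads obtain the same item one after the other, one to delete it and one to read it. If the delete arrives first and immediately hands the item's memory back to the slab allocator, the reader is left holding a pointer to freed memory.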

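In my view, the item is protected by its refcount field. Its job is to guarantee, in a multithreaded environment, that an item's memory is not released by one thread while other threads are still working on the same item. I read its value as the number of places, across all threads, that currently hold a reference to the item's memory. When a function obtains an item, that function (in its thread) holds one reference. The hash table and the LRU also refer to the item, and they share a single reference between them. For example, if three threads are using the item at the same time, one having found it through the LRU and two through the hash table, those threads account for 3 references, and the hash/LRU link adds 1 more. When a function is done with the item it must give its reference back by calling item_remove, which decrements refcount by 1. It follows that, normally, once an item has been inserted into the hash table and the LRU and the inserting function has returned, refcount = 1: the remaining reference belongs to the hash table and the LRU, and it prevents other threads from freeing the item's memory.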
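The counting rule can be modelled in a few lines of single-threaded C. This is a toy sketch, not memcached code: `toy_item`, `toy_alloc`, `toy_link`, `toy_get`, `toy_release` and `toy_unlink` are hypothetical names, the hash table and LRU are collapsed into one `linked` flag (because they share one reference), and the counter uses plain increments where memcached uses its atomic refcount_incr/refcount_decr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy item: only what is needed to follow the reference count. */
typedef struct {
    unsigned short refcount; /* outstanding references, across all threads */
    bool linked;             /* stands in for "in the hash table and the LRU" */
} toy_item;

static int toy_frees = 0; /* how many items were handed back to the "slab" */

/* Allocation gives the caller one reference, like it->refcount = 1. */
static toy_item *toy_alloc(void) {
    toy_item *it = malloc(sizeof *it);
    if (it != NULL) {
        it->refcount = 1;
        it->linked = false;
    }
    return it;
}

/* Drop one reference; the memory goes away only when nobody holds it. */
static void toy_release(toy_item *it) {
    assert(it->refcount > 0);
    if (--it->refcount == 0) {
        toy_frees++;
        free(it);
    }
}

/* Hash table + LRU together take ONE reference. */
static void toy_link(toy_item *it) {
    it->linked = true;
    it->refcount++;
}

/* A lookup through the hash table or the LRU takes a reference. */
static void toy_get(toy_item *it) {
    it->refcount++;
}

/* Unlink: clear the flag, then give back the link's reference. */
static void toy_unlink(toy_item *it) {
    if (it->linked) {
        it->linked = false;
        toy_release(it);
    }
}
```

Allocating, linking and then dropping the allocating caller's reference leaves the count at 1, held by the link. Three concurrent lookups on top of that give 4, and the item is only freed after all three callers have released it and the item has been unlinked.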

Operations on an item's refcount are atomic. From thread.c:

unsigned short refcount_incr(unsigned short *refcount) {
#ifdef HAVE_GCC_ATOMICS
    return __sync_add_and_fetch(refcount, 1); // GCC built-in atomic operation
#elif defined(__sun)
    return atomic_inc_ushort_nv(refcount);
#else
    unsigned short res;
    mutex_lock(&atomics_mutex); // fallback: make the update atomic with a mutex
    (*refcount)++;              // refcount + 1
    res = *refcount;
    mutex_unlock(&atomics_mutex);
    return res;
#endif
}
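For comparison, the same pair of operations can be written with C11 `<stdatomic.h>` instead of GCC builtins or a mutex. This is a sketch assuming a C11 compiler; `refcount_incr_c11` and `refcount_decr_c11` are hypothetical names, and the counter is declared `_Atomic` rather than as the plain `unsigned short` memcached uses. `atomic_fetch_add` returns the value *before* the addition, so 1 is added to reproduce the "new value" result of `__sync_add_and_fetch`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Returns the value after incrementing, like __sync_add_and_fetch(p, 1). */
static unsigned short refcount_incr_c11(_Atomic unsigned short *refcount) {
    return (unsigned short)(atomic_fetch_add(refcount, 1) + 1);
}

/* Returns the value after decrementing, like __sync_sub_and_fetch(p, 1). */
static unsigned short refcount_decr_c11(_Atomic unsigned short *refcount) {
    return (unsigned short)(atomic_fetch_sub(refcount, 1) - 1);
}
```

Because the returned value comes from the atomic operation itself, a caller should test that value (as do_item_remove does with `refcount_decr(...) == 0`) instead of re-reading the counter afterwards, when another thread may already have changed it.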

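Thus, in a multithreaded environment, both refcount_incr and its counterpart refcount_decr are atomic. Next, let's look at how memcached uses refcount to constrain what threads may do. The code below sets an item's refcount to 1, whether the item was freshly allocated from the slab or reclaimed from the LRU.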

From items.c:

if (!tried_alloc && (tries == 0 || search == NULL)) // slab allocation not tried yet, and the LRU gave us no item
    it = slabs_alloc(ntotal, id);                    // allocate a fresh item from the slab

if (it == NULL) {
    itemstats[id].outofmemory++;
    mutex_unlock(&cache_lock);
    return NULL;
}

assert(it->slabs_clsid == 0);
assert(it != heads[id]);

/* Item initialization can happen outside of the lock; the item's already
 * been removed from the slab LRU.
 */
it->refcount = 1; /* the caller will have a reference */
// either way refcount starts at 1: the current caller references this item's memory

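Once the item has been inserted into the LRU queue and the hash table, both structures refer to its memory, so the refcount is incremented by 1 (a single increment covers both):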

ITEM_set_cas(it, (settings.use_cas) ? get_cas_id() : 0);
assoc_insert(it, hv);
item_link_q(it);
refcount_incr(&it->refcount); // +1 for the hash table / LRU reference
mutex_unlock(&cache_lock);

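item_remove can be thought of as the call made at the end of a function to give back the reference that function held on the item's memory (the function incremented refcount when it obtained the item, so it must decrement it when it is done):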

void do_item_remove(item *it) {
    MEMCACHED_ITEM_REMOVE(ITEM_key(it), it->nkey, it->nbytes);
    assert((it->it_flags & ITEM_SLABBED) == 0);

    if (refcount_decr(&it->refcount) == 0) { // that was the last reference: give the memory back to the slab
        item_free(it);
    }
}

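Every time an item is fetched from the hash table, the caller acquires a reference to it, so at that point the item's reference count must be incremented by 1: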

item *do_item_get(const char *key, const size_t nkey, const uint32_t hv) {
    //mutex_lock(&cache_lock);
    item *it = assoc_find(key, nkey, hv);
    if (it != NULL) {
        refcount_incr(&it->refcount); // refcount + 1
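What makes this safe is that the lookup and the increment happen together while the caller holds a lock (as noted at the start, item_get in thread.c is protected by the item lock), so the count cannot drop to zero between "found it" and "now I hold it". A minimal sketch of that pattern, assuming one pthread mutex and a tiny fixed-size table with one entry per slot; `toy_entry`, `toy_insert`, `toy_find_and_ref` and the djb2-style hash are hypothetical stand-ins, not memcached APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define TOY_SLOTS 8 /* tiny table; one entry per slot, no collision chains */

typedef struct {
    const char *key;
    _Atomic unsigned short refcount;
} toy_entry;

static toy_entry *toy_table[TOY_SLOTS];
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

/* djb2-style string hash reduced to a slot index. */
static size_t toy_slot(const char *key) {
    size_t h = 5381;
    while (*key != '\0') {
        h = h * 33 + (unsigned char)*key++;
    }
    return h % TOY_SLOTS;
}

/* Publish an entry; the table takes its own reference. */
static void toy_insert(toy_entry *e) {
    pthread_mutex_lock(&toy_lock);
    toy_table[toy_slot(e->key)] = e;
    atomic_fetch_add(&e->refcount, 1);
    pthread_mutex_unlock(&toy_lock);
}

/* Look up a key and take a reference while toy_lock is held. The table's own
 * reference is still in place at that moment, so (assuming removal from the
 * table also happens under toy_lock) the count cannot reach zero in between. */
static toy_entry *toy_find_and_ref(const char *key) {
    toy_entry *e;
    pthread_mutex_lock(&toy_lock);
    e = toy_table[toy_slot(key)];
    if (e != NULL && strcmp(e->key, key) == 0) {
        atomic_fetch_add(&e->refcount, 1);
    } else {
        e = NULL;
    }
    pthread_mutex_unlock(&toy_lock);
    return e;
}
```

The increment itself is still atomic because releases happen elsewhere, outside the lock, which matches memcached's split between lock-protected lookups and atomic refcount updates.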

The slab_move code also operates on refcount (see my earlier post if you need background on slab_move):

527 refcount = refcount_incr(&it->refcount); // this function now holds a reference, so refcount + 1
528 if (refcount == 1) { /* item is unlinked, unused */
    // 1 means the count was 0 before our increment: nobody else references the item
529     if (it->it_flags & ITEM_SLABBED) { // the item is already sitting on the slab free list
530         /* remove from slab freelist */
531         if (s_cls->slots == it) {
532             s_cls->slots = it->next;
533         }
534         if (it->next) it->next->prev = it->prev;
535         if (it->prev) it->prev->next = it->next;
536         s_cls->sl_curr--;
537         status = MOVE_DONE;
538     } else {
539         status = MOVE_BUSY; // the item has been allocated but not fully initialized yet
540     }
541 } else if (refcount == 2) { /* item is linked but not busy */
    // this suggests the item is held by the LRU and hash table
542     if ((it->it_flags & ITEM_LINKED) != 0) { // the item is still in the LRU and hash table
543         do_item_unlink_nolock(it, hash(ITEM_key(it), it->nkey, 0));
544         status = MOVE_DONE;
545     } else {
        // the flag has already been cleared, but the item has not yet been removed from the LRU and hash table
546         /* refcount == 1 + !ITEM_LINKED means the item is being
547          * uploaded to, or was just unlinked but hasn't been freed
548          * yet. Let it bleed off on its own and try again later */
549         status = MOVE_BUSY;
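The decision in this excerpt reduces to a pure function of two inputs: the value returned by refcount_incr (which includes the mover's own reference) and the item's flags. The sketch below restates it under stated assumptions: the `TOY_` flag values and enum are made up for the example, the side effects (unhooking from the free list, unlinking) are omitted, and counts above 2 are assumed to mean MOVE_BUSY because the excerpt stops before that branch.

```c
#include <assert.h>

/* Hypothetical flag values for this sketch; memcached defines its own. */
#define TOY_LINKED  0x1u
#define TOY_SLABBED 0x2u

typedef enum { TOY_MOVE_DONE, TOY_MOVE_BUSY } toy_move_status;

/* refcount: value returned by refcount_incr, i.e. including the mover's own
 * reference. flags: the item's flags at that moment. */
static toy_move_status toy_classify(unsigned short refcount, unsigned flags) {
    if (refcount == 1) {
        /* Nobody else held it: a free slot, or allocated but not set up yet. */
        return (flags & TOY_SLABBED) ? TOY_MOVE_DONE : TOY_MOVE_BUSY;
    }
    if (refcount == 2) {
        /* One other reference: movable only if it is the hash/LRU link's. */
        return (flags & TOY_LINKED) ? TOY_MOVE_DONE : TOY_MOVE_BUSY;
    }
    /* Assumption: any further references mean some thread is using the item. */
    return TOY_MOVE_BUSY;
}
```

Every "busy" outcome simply defers the move to a later retry, so the mover never reclaims memory that another thread might still be reading.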

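The else branch at line 545 is taken when the move hits the window at lines 319-324 of do_item_unlink below: ITEM_LINKED has already been cleared, but the reference held by the hash table and LRU has not been dropped yet. As the source comment notes, the same branch also covers an item that is still being uploaded to, i.e. allocated but not yet linked.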

314 void do_item_unlink(item *it, const uint32_t hv) {
315     MEMCACHED_ITEM_UNLINK(ITEM_key(it), it->nkey, it->nbytes);
316     mutex_lock(&cache_lock);
317     if ((it->it_flags & ITEM_LINKED) != 0) {
318         it->it_flags &= ~ITEM_LINKED; // the flag is cleared here, but the reference has not been dropped yet
319         STATS_LOCK();
320         stats.curr_bytes -= ITEM_ntotal(it);
321         stats.curr_items -= 1;
322         STATS_UNLOCK();
323         assoc_delete(ITEM_key(it), it->nkey, hv);
324         item_unlink_q(it);
325         do_item_remove(it); // refcount is only decremented here
326     }
327     mutex_unlock(&cache_lock);
328 }
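The ordering in do_item_unlink is what keeps a concurrent reader safe: the item is detached from the hash table and LRU, but its memory is released only when the last reference goes away. A single-threaded replay of that sequence, using hypothetical `sim_` names (the real code runs under cache_lock with atomic refcounts, and a `freed` flag stands in for item_free so the replay can be checked):

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned short refcount;
    bool linked;
    bool freed; /* recorded instead of calling free(), so the replay can check it */
} sim_item;

/* Like do_item_remove: drop a reference, "free" on the last one. */
static void sim_remove(sim_item *it) {
    assert(it->refcount > 0);
    if (--it->refcount == 0) {
        it->freed = true; /* stands in for item_free() */
    }
}

/* Same order as do_item_unlink: clear the flag, detach, drop the link's ref. */
static void sim_unlink(sim_item *it) {
    if (it->linked) {
        it->linked = false; /* from here on the item no longer looks linked */
        /* assoc_delete() and item_unlink_q() would run here */
        sim_remove(it);     /* only now is the hash/LRU reference given back */
    }
}
```

A reader that obtained the item before the unlink still holds valid memory afterwards (refcount 1, not freed); the item is released only when that reader calls its own item_remove.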

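Summary: based on the analysis above, I believe refcount exists to stop one thread, in a multithreaded environment, from freeing item memory that other threads are still referencing. It is a compromise: item operations do not have to hold a lock for as long as the item is in use, yet when it matters, other threads are prevented from freeing or reusing the memory the current thread is working with. This greatly increases the concurrency of item operations.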
