Understanding the Various Locks in the Linux Kernel - Larpenteur - ChinaUnix Blog

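Spinlocks

The spinlock is the most basic locking primitive in the kernel.
A spinlock never puts the caller to sleep. If the lock is already held by another execution unit, the caller loops, repeatedly checking whether the holder has released it. This busy-waiting is where the name "spin" comes from.
Spinlocks are suited to cases where the lock is held only briefly.

Watch for possible deadlocks when using spinlocks. Consider this code: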

static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags;
spin_lock_irqsave(&xxx_lock, flags);
/* ... critical section here ... */
spin_unlock_irqrestore(&xxx_lock, flags);

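Here spin_lock_irqsave disables interrupts on the local CPU (saving the previous interrupt state in flags), so the code above is safe in every context. The drawback is that disabling interrupts is expensive.
What happens if spin_lock_irqsave/spin_unlock_irqrestore is replaced with spin_lock/spin_unlock?
The answer: if an interrupt handler takes the same lock with spin_lock, the code can deadlock!
For example: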

spin_lock(&lock);
...
<- interrupt comes in:
spin_lock(&lock);

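Note that if the interrupt arrives on a different CPU from the one whose process context holds the lock, there is no problem: the handler simply spins until the holder releases the lock. This is why the irq variants of spin_lock only need to disable interrupts on the local CPU.

Conclusion: process context can use spin_lock instead of spin_lock_irqsave only if no interrupt handler ever takes the same lock.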
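
The same-CPU deadlock can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming C11 atomics; struct toy_spinlock and every toy_* function are hypothetical names invented for this illustration and are not the kernel's spinlock_t API. The "interrupt handler" uses a trylock so that the would-be deadlock can be observed instead of actually hanging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy user-space spinlock (hypothetical); NOT the kernel's spinlock_t. */
struct toy_spinlock {
	atomic_bool locked;
};

/* Try once; returns true if the lock was free and now belongs to the caller. */
static inline bool toy_spin_trylock(struct toy_spinlock *l)
{
	return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Busy-wait until the lock is free: the caller never sleeps. */
static inline void toy_spin_lock(struct toy_spinlock *l)
{
	while (!toy_spin_trylock(l))
		;	/* spin */
}

static inline void toy_spin_unlock(struct toy_spinlock *l)
{
	/* Unlocking a lock that is not held is a caller bug. */
	assert(atomic_load_explicit(&l->locked, memory_order_relaxed));
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/*
 * Models an interrupt handler that needs the lock. A real handler would
 * call toy_spin_lock() and spin forever if the code it interrupted on the
 * same CPU already held the lock; the trylock only reports that case.
 */
static inline bool toy_irq_handler_would_deadlock(struct toy_spinlock *l)
{
	if (toy_spin_trylock(l)) {
		toy_spin_unlock(l);
		return false;
	}
	return true;
}
```

If the "process context" takes the lock with toy_spin_lock() and the handler then runs on the same flow of control, toy_irq_handler_would_deadlock() returns true. Once the lock is released it returns false, which corresponds to the safe case where interrupts were disabled while the lock was held.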

When to use a spinlock?
Use a spinlock in contexts that must not sleep and when the critical section is short.

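Reader-writer spinlocks

If a reader-writer lock currently has neither readers nor a writer, a writer can acquire it immediately; otherwise the writer must spin until no writer or reader remains. If the lock has no writer, a reader can acquire it immediately; otherwise the reader must spin until the writer releases the lock.
Reader-writer locks suit data structures that are read far more often than they are written.

Note: reader-writer locks require more atomic memory operations than plain spinlocks. Unless the read-side critical section is long, a plain spinlock is the better choice.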
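
The acquisition rules above can be sketched in user space as a single counter. This is only an illustration, assuming C11 atomics; struct toy_rwlock and the toy_* functions are hypothetical names, not the kernel's rwlock_t implementation. A positive state counts readers, 0 means free, and -1 means one writer holds the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Toy reader-writer lock (hypothetical); NOT the kernel's rwlock_t.
 * state > 0: number of readers, state == 0: free, state == -1: one writer.
 */
struct toy_rwlock {
	atomic_int state;
};

/* A reader gets in whenever no writer holds the lock. */
static inline bool toy_read_trylock(struct toy_rwlock *l)
{
	int s = atomic_load_explicit(&l->state, memory_order_relaxed);

	while (s >= 0) {
		/* On failure, s is reloaded and the writer check is repeated. */
		if (atomic_compare_exchange_weak_explicit(&l->state, &s, s + 1,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}

/* A writer gets in only when there are no readers and no writer. */
static inline bool toy_write_trylock(struct toy_rwlock *l)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->state, &expected, -1,
			memory_order_acquire, memory_order_relaxed);
}

static inline void toy_read_unlock(struct toy_rwlock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->state, 1, memory_order_release);

	assert(prev > 0);
	(void)prev;
}

static inline void toy_write_unlock(struct toy_rwlock *l)
{
	int prev = atomic_exchange_explicit(&l->state, 0, memory_order_release);

	assert(prev == -1);
	(void)prev;
}

/* The blocking variants simply spin on the trylock. */
static inline void toy_read_lock(struct toy_rwlock *l)
{
	while (!toy_read_trylock(l))
		;
}

static inline void toy_write_lock(struct toy_rwlock *l)
{
	while (!toy_write_trylock(l))
		;
}
```

Even the read path needs a compare-and-swap loop here, which hints at why such a lock costs more atomic operations than a plain spinlock.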

Reader-writer lock code:

rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
unsigned long flags;

read_lock_irqsave(&xxx_lock, flags);
/* ... critical section that only reads the info ... */
read_unlock_irqrestore(&xxx_lock, flags);

write_lock_irqsave(&xxx_lock, flags);
/* ... read and write exclusive access to the info ... */
write_unlock_irqrestore(&xxx_lock, flags);

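Reader-writer locks fit data structures such as linked lists well, especially when lookups far outnumber modifications.

The read-write and irq variants can also be mixed flexibly. For example, if interrupt handlers only ever take the read lock, process context can use the non-irq read lock together with the irq-safe write lock.

Note: RCU is better suited than reader-writer locks to list traversal, but it demands more attention to detail. At the time of writing, the kernel community was working to replace reader-writer locks with RCU.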

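Semaphores

The difference between a semaphore and a spinlock is that a semaphore can put the caller to sleep.
The semaphore data structure shows that, besides a spinlock, it holds a counter and a wait queue. When a process tries to acquire the semaphore while count is 0, it is added to wait_list and sleeps until a holder releases the semaphore.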

struct semaphore {
	raw_spinlock_t lock;
	unsigned int count;
	struct list_head wait_list;
};
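
How the counter and the wait queue interact can be shown with a small single-threaded model. This is a sketch under stated assumptions, not kernel code: struct toy_semaphore, toy_down() and toy_up() are hypothetical names, tasks are represented by plain integer ids, "sleeping" is modeled as being appended to a FIFO array, and no locking is done because the model runs in one thread.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8
#define TOY_NO_TASK (-1)

/* Toy model of a counting semaphore (hypothetical); NOT struct semaphore. */
struct toy_semaphore {
	unsigned int count;		/* free slots */
	int waiters[TOY_MAX_WAITERS];	/* FIFO of "sleeping" task ids */
	size_t head;			/* index of the oldest waiter */
	size_t nwaiters;
};

static inline void toy_sema_init(struct toy_semaphore *s, unsigned int count)
{
	s->count = count;
	s->head = 0;
	s->nwaiters = 0;
}

/* Returns 1 if the task acquired a slot, 0 if it was queued (would sleep). */
static inline int toy_down(struct toy_semaphore *s, int task)
{
	if (s->count > 0) {
		s->count--;
		return 1;
	}
	assert(s->nwaiters < TOY_MAX_WAITERS);
	s->waiters[(s->head + s->nwaiters) % TOY_MAX_WAITERS] = task;
	s->nwaiters++;
	return 0;
}

/*
 * Releases a slot. If a task is waiting, the slot is handed directly to the
 * oldest waiter and its id is returned; otherwise count is incremented and
 * TOY_NO_TASK is returned.
 */
static inline int toy_up(struct toy_semaphore *s)
{
	int task;

	if (s->nwaiters == 0) {
		s->count++;
		return TOY_NO_TASK;
	}
	task = s->waiters[s->head];
	s->head = (s->head + 1) % TOY_MAX_WAITERS;
	s->nwaiters--;
	return task;
}
```

With an initial count of 2, the first two callers of toy_down() succeed, later callers are queued, and each toy_up() wakes the oldest waiter before count is ever incremented again.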

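When to use a semaphore?
Use a semaphore when the context may sleep, the critical section is long, and the count needs to be greater than 1.

Reader-writer semaphores also exist; they are not covered here.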

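Mutexes

A mutex can be thought of as a semaphore whose count is only ever 0 or 1, with the added rule that only the task that locked it may unlock it.

Given that semaphores exist, why does the kernel also need a mutex?
Because binary semaphores are in heavy demand in the kernel, and a dedicated mutex makes code easier to write and clearer to read.

Drawback of mutexes:
To enable certain performance optimizations, the mutex data structure has grown larger than that of a semaphore (contrary to the original design intent), so it consumes more CPU cache and has a larger memory footprint.

When to use a mutex?
Use a mutex when the context may sleep, the critical section is long, and the count only needs to be 0 or 1.
The kernel documentation recommends preferring a mutex over other locking primitives whenever locking is needed and a mutex meets the requirements.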

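RCU

RCU stands for Read-Copy Update. For a resource protected by RCU, readers never have to wait. A writer first makes a copy, modifies the copy, and then redirects the pointer from the old data to the new data. The old data is freed only at a suitable time, namely once no reader that could still hold a reference to it remains in its critical section.

RCU APIs:

rcu_read_lock()
The reader enters its critical section.
rcu_read_unlock()
The reader leaves its critical section.
synchronize_rcu()
Called by the writer. It returns only after every reader that entered its critical section before the update has exited.
call_rcu()
Called by the writer, but does not block. It takes a callback function, which is invoked after every reader that entered its critical section before the update has exited.
rcu_assign_pointer()
Publishes a new value for the protected pointer.
rcu_dereference()
Fetches the protected pointer for use inside a read-side critical section.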

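Typical flow for an RCU writer:
1. Copy the protected resource. There are now two copies, called here the old resource and the new resource; the writer modifies the new one.
2. Replace the old resource with the new one, so that subsequent readers see the new resource.
3. Wait for all readers still using the old resource to exit.
4. At this point no reader is using the old resource, so free it.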
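
The four steps can be walked through in user space. The sketch below is a toy, assuming C11 atomics and a single writer; struct toy_cfg and the toy_* names are hypothetical and none of this is the kernel's RCU implementation. In particular, a global reader counter stands in for grace-period tracking, which is much cruder than real RCU: it waits for all readers, not only those that started before the update. toy_current must point at an initial object before any of these functions is called.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_cfg {
	int a;
	int b;
};

static _Atomic(struct toy_cfg *) toy_current;	/* the published pointer */
static atomic_int toy_readers;			/* crude grace-period stand-in */

/* Reader entry: register as a reader, then fetch the current pointer. */
static inline const struct toy_cfg *toy_read_lock(void)
{
	atomic_fetch_add(&toy_readers, 1);
	return atomic_load(&toy_current);
}

/* Reader exit: the pointer obtained from toy_read_lock() is now off limits. */
static inline void toy_read_unlock(void)
{
	int prev = atomic_fetch_sub(&toy_readers, 1);

	assert(prev > 0);
	(void)prev;
}

/*
 * Steps 1 and 2 (single writer assumed): copy the current object, modify
 * the copy, and publish it. Returns the old object, which may be freed only
 * once toy_readers_gone() is true, or NULL if allocation failed and nothing
 * was published.
 */
static inline struct toy_cfg *toy_update_a(int new_a)
{
	struct toy_cfg *old_cfg = atomic_load(&toy_current);
	struct toy_cfg *new_cfg = malloc(sizeof(*new_cfg));

	if (!new_cfg)
		return NULL;
	*new_cfg = *old_cfg;			/* step 1: copy ... */
	new_cfg->a = new_a;			/* ... and modify the copy */
	atomic_store(&toy_current, new_cfg);	/* step 2: publish */
	return old_cfg;
}

/* Step 3: true once no reader is inside a read-side section. */
static inline bool toy_readers_gone(void)
{
	return atomic_load(&toy_readers) == 0;
}
```

A reader that called toy_read_lock() before toy_update_a() keeps seeing the old values until it calls toy_read_unlock(). Only after toy_readers_gone() returns true may the writer perform step 4 and free() the old object.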

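The kernel provides RCU versions of common data structures such as list and hlist. With RCU, updates to shared data must become visible in the right order to readers that use no synchronization, so memory barriers are essential: rcu_assign_pointer() orders the initialization of the new data before the pointer is published. On the read side, rcu_dereference() needs an actual barrier instruction only on the Alpha architecture.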

Replacing a reader-writer lock with RCU:

@@ -13,15 +14,15 @@
 	struct list_head *lp;
 	struct el *p;
-	read_lock(&listmutex);
-	list_for_each_entry(p, head, lp) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(p, head, lp) {
 		if (p->key == key) {
 			*result = p->data;
-			read_unlock(&listmutex);
+			rcu_read_unlock();
 			return 1;
 		}
 	}
-	read_unlock(&listmutex);
+	rcu_read_unlock();
 	return 0;
 }
@@ -29,15 +30,16 @@
 {
 	struct el *p;
-	write_lock(&listmutex);
+	spin_lock(&listmutex);
 	list_for_each_entry(p, head, lp) {
 		if (p->key == key) {
-			list_del(&p->list);
-			write_unlock(&listmutex);
+			list_del_rcu(&p->list);
+			spin_unlock(&listmutex);
+			synchronize_rcu();
 			kfree(p);
 			return 1;
 		}
 	}
-	write_unlock(&listmutex);
+	spin_unlock(&listmutex);
 	return 0;
 }

Protecting a data structure with RCU:

struct foo {
	int a;
	char b;
	long c;
};
DEFINE_SPINLOCK(foo_mutex);

struct foo *gbl_foo;

/*
 * Create a new struct foo that is the same as the one currently
 * pointed to by gbl_foo, except that field "a" is replaced
 * with "new_a". Points gbl_foo to the new structure, and
 * frees up the old structure after a grace period.
 *
 * Uses rcu_assign_pointer() to ensure that concurrent readers
 * see the initialized version of the new structure.
 *
 * Uses synchronize_rcu() to ensure that any readers that might
 * have references to the old structure complete before freeing
 * the old structure.
 */
void foo_update_a(int new_a)
{
	struct foo *new_fp;
	struct foo *old_fp;

	new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
	spin_lock(&foo_mutex);
	old_fp = gbl_foo;
	*new_fp = *old_fp;
	new_fp->a = new_a;
	rcu_assign_pointer(gbl_foo, new_fp);
	spin_unlock(&foo_mutex);
	synchronize_rcu();
	kfree(old_fp);
}

/*
 * Return the value of field "a" of the current gbl_foo
 * structure. Use rcu_read_lock() and rcu_read_unlock()
 * to ensure that the structure does not get deleted out
 * from under us, and use rcu_dereference() to ensure that
 * we see the initialized version of the structure (important
 * for DEC Alpha and for people reading the code).
 */
int foo_get_a(void)
{
	int retval;

	rcu_read_lock();
	retval = rcu_dereference(gbl_foo)->a;
	rcu_read_unlock();
	return retval;
}

The following version does not block the writer:

struct foo {
	int a;
	char b;
	long c;
	struct rcu_head rcu;
};

/*
 * RCU callback: runs after a grace period, once no reader can still
 * hold a reference to the old structure. Defined before foo_update_a()
 * so that it is declared at the point where call_rcu() uses it.
 */
void foo_reclaim(struct rcu_head *rp)
{
	struct foo *fp = container_of(rp, struct foo, rcu);

	foo_cleanup(fp->a);
	kfree(fp);
}

/*
 * Create a new struct foo that is the same as the one currently
 * pointed to by gbl_foo, except that field "a" is replaced
 * with "new_a". Points gbl_foo to the new structure, and
 * frees up the old structure after a grace period.
 *
 * Uses rcu_assign_pointer() to ensure that concurrent readers
 * see the initialized version of the new structure.
 *
 * Uses call_rcu() to ensure that any readers that might have
 * references to the old structure complete before freeing the
 * old structure.
 */
void foo_update_a(int new_a)
{
	struct foo *new_fp;
	struct foo *old_fp;

	new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
	spin_lock(&foo_mutex);
	old_fp = gbl_foo;
	*new_fp = *old_fp;
	new_fp->a = new_a;
	rcu_assign_pointer(gbl_foo, new_fp);
	spin_unlock(&foo_mutex);
	call_rcu(&old_fp->rcu, foo_reclaim);
}

When to use RCU?
Use RCU when reads far outnumber writes and the writes are not particularly urgent.

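Seqlocks

A seqlock gives writers higher priority: a writer never waits for readers. The drawback is that readers sometimes have to read the data more than once to get a consistent result.
Besides a spinlock, the seqlock data structure contains a sequence number. Each time a writer acquires the lock, the sequence number is incremented, so readers can detect whether a writer touched the data while they were reading. A reader samples the sequence number before and after reading the data; if the two values differ, a writer modified the data during the read, so the read is invalid and must be retried.

Typical usage:
Read side: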

do {
	seqnum = read_seqbegin(&seqlock_a);
	/* read-side critical section */
	...
} while (read_seqretry(&seqlock_a, seqnum));

Write side:

write_seqlock(&seqlock_a);	/* also takes the seqlock's internal spinlock, so no extra spin_lock() is needed */
...
write_sequnlock(&seqlock_a);

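A writer acquires and releases the seqlock by calling write_seqlock() and write_sequnlock(). write_seqlock() takes the spinlock in the seqlock_t structure and then increments the sequence counter; write_sequnlock() increments the sequence counter again and then releases the spinlock. This guarantees that the counter is odd for the whole duration of a write and even whenever no writer is modifying the data.
read_seqbegin() returns the seqlock's current sequence number. If the reader's local copy seqnum is odd (a writer was in the middle of an update when read_seqbegin() was called), or if seqnum does not match the current value of the sequence counter (a writer started working while the reader was in its critical section), read_seqretry() returns 1, meaning the read failed and must be repeated. In newer kernels read_seqbegin() itself waits until the counter is even, so read_seqretry() only has to compare the two values.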
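
The odd/even protocol described above can be sketched in user space. This is a minimal illustration, assuming C11 atomics and a single writer (the kernel serializes writers with the spinlock inside seqlock_t, which is omitted here); struct toy_seq_pair and the toy_* functions are hypothetical names, not the kernel's seqlock API. The protected fields are relaxed atomics so that a reader racing with the writer stays well defined.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy sequence-counted pair (hypothetical); NOT the kernel's seqlock_t. */
struct toy_seq_pair {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int x;		/* protected data */
	atomic_int y;
};

/* Writer (single writer assumed): the counter is odd while data changes. */
static inline void toy_write_pair(struct toy_seq_pair *p, int x, int y)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no write may already be in progress */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->x, x, memory_order_relaxed);
	atomic_store_explicit(&p->y, y, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* Reader entry: sample the sequence number. */
static inline unsigned int toy_read_begin(struct toy_seq_pair *p)
{
	return atomic_load_explicit(&p->seq, memory_order_acquire);
}

/*
 * Reader exit: true if the data read since toy_read_begin() must be
 * discarded, either because a write was in progress at the start (odd
 * value) or because the counter has changed since then.
 */
static inline bool toy_read_retry(struct toy_seq_pair *p, unsigned int start)
{
	atomic_thread_fence(memory_order_acquire);
	return (start & 1) ||
	       atomic_load_explicit(&p->seq, memory_order_relaxed) != start;
}

/* Complete read loop: retry until a consistent snapshot is obtained. */
static inline void toy_read_pair(struct toy_seq_pair *p, int *x, int *y)
{
	unsigned int start;

	do {
		start = toy_read_begin(p);
		*x = atomic_load_explicit(&p->x, memory_order_relaxed);
		*y = atomic_load_explicit(&p->y, memory_order_relaxed);
	} while (toy_read_retry(p, start));
}
```

The writer never waits for readers; the whole cost of a conflict falls on the reader, which discards its snapshot and reads again.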

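Not every resource can be protected by a seqlock. In general, a seqlock may be used only when the following conditions hold:
1. The data protected for readers contains no pointers that writers modify and readers dereference; otherwise a writer could invalidate a pointer and a reader dereferencing it would trigger an Oops.
2. The readers' critical-section code has no side effects.

When to use a seqlock?
Use a seqlock when reads far outnumber writes and the writes are urgent.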

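Summary

This article analyzed and compared the synchronization mechanisms provided by the various locks in the Linux kernel. Corrections for any omissions or errors are welcome.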
