对LINUX内核各种锁的理解-silence0607-ChinaUnix博客

爱梦想&nbsp;爱生活

首页　| 　博文目录　| 　关于我

silence0607

博客访问： 674176
博文数量： 87
博客积分： 0
博客等级：民兵
技术积分： 2022
用户组：普通用户
注册时间： 2014-04-23 11:16

个人简介

西邮大三狗！！！

文章分类

全部博文（87）

Docker（1）
HA/LB（6）
Nosql（2）
网络（3）
Linux（5）
数据结构&算法（0）
Nginx（7）
Mysql（14）
监控（4）
源码编译Lnmp（5）
操作系统（5）
C/C++（12）
Shell/Python（12）
正则（4）
杂技（7）
未分配的博文（0）

文章存档

2015年（47）

2014年（40）

我的朋友

自旋锁

自旋锁是内核中最基础的锁机制。
自旋锁不会引起调用者睡眠，如果自旋锁已经被别的执行单元持有，调用者就一直循环在那里看是否该自旋锁的持有者已经释放了锁，"自旋"一词就是因此而得名。
自旋锁适用于锁使用者保持锁时间比较短的情况。

使用自旋锁需要注意有可能造成的死锁情况:

static DEFINE_SPINLOCK(xxx_lock);
unsigned long flags;
spin_lock_irqsave(&xxx_lock, flags);
... critical section here ..
spin_unlock_irqrestore(&xxx_lock, flags);

代码中spin_lock_irqsave会禁止本地cpu中断的抢占。以上代码在任何情况下都是安全的。但问题是关中断的代价太大。
如果把spin_lock_irqsave/spin_unlock_irqrestore换成spin_lock/spin_unlock会有什么问题吗？
答案是，如果中断中调用了spin_lock，可能会引起死锁！
例如：

spin_lock(&lock);
...
<- interrupt comes in:
spin_lock(&lock);

值得注意的是，如果产生中断的cpu和进程中调用spin_lock的cpu不是同一个，则不会有问题。这也是irq版本的spin_lock函数实现时只需要禁止本地cpu中断的原因。

结论：要想在进程中用spin_lock代替spin_lock_irqsave，条件是中断中不会使用相应的spin_lock

何时使用自旋锁？
不允许睡眠的上下文且临界区操作较短时使用自旋锁。

读写自旋锁

如果读写锁当前没有读者，也没有写者，那么写者可以立刻获得读写锁，否则它必须自旋在那里，直到没有任何写者或读者。如果读写锁没有写者，那么读者可以立即获得该读写锁，否则读者必须自旋在那里，直到写者释放该读写锁。
读写锁适合于对数据结构的读次数比写次数多得多的情况。

注意：读写锁需要比spinlocks更多的访问原子内存操作，如果读临界区不是很大，最好别使用读写锁。

读写锁代码：

点击(此处)折叠或打开

rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
unsigned long flags;
read_lock_irqsave(&xxx_lock, flags);
.. critical section that only reads the info ...
read_unlock_irqrestore(&xxx_lock, flags);
write_lock_irqsave(&xxx_lock, flags);
.. read and write exclusive access to the info ...
write_unlock_irqrestore(&xxx_lock, flags);

读写锁比较适合链表等数据结构，特别是查找远多于修改的情况。

另外，可以灵活的使用read-write和irq版本的自旋锁。例如，如果中断中只是用了读锁，进程中就可以使用non-irq版本的读锁和irq版本的写锁。

注意：RCU比读写锁更适合遍历list，但需要更关注细节。目前kernel社区正在努力用RCU代替读写锁。

信号量

semaphore和spin lock的区别是semaphore会引起睡眠。
查看semaphore的数据结构可以发现，semaphore除了拥有spinlock，还有一个计数器和一个等待队列。当某个进程获取信号量的count值小于等于0时，被添加到wait_list中。

struct semaphore {
raw_spinlock_t lock;
unsigned int count;
struct list_head wait_list;
};

何时使用semaphore？
允许睡眠的上下文、临界区操作较长、计数值大于1时使用semaphore

信号量也有读写信号量，在此略过。

mutex

mutex可以理解成计数值只有0和1的semaphore

既然有了semaphore，内核为何还需要mutex？
因为内核中对二值信号量的需求很大，单独提供一个mutex更利于代码编写和清晰度。

mutex缺点：
为了实现某些性能上的优化，mutex数据结构比semaphore更大（这已经违背了mutex刚设计时的意愿），这也会消耗更多的CPU cache和memory footprint.

何时使用mutex？
允许睡眠的上下文、临界区操作较长、计数值只为0或1时使用mutex
kernel文档建议，在任何需要加锁且mutex可以满足需求的情况都应该使用mutex而不是其他锁。

RCU

RCU(Read-Copy Update)即读-拷贝，更新。对于用RCU保护的资源，读者不需要任何等待，而写者访问它时，需要先拷贝一个副本，然后对副本修改，最后在适当的时机把指向原来数据的指针指向新的数据。这个“适当的时机”指的是没有任何读者操作该资源时。

RCU相关API：

rcu_read_lock()
读者进入临界区
rcu_read_unlock()
读者退出临界区
synchronize_rcu()
由写者调用，当读者都退出老更新前的临界区后，写者才可以返回该函数。
call_rcu()
由写者调用，但不阻塞。该函数的参数中有一个回调函数，当读者都退出更新前的临界区后，调用该回调函数。
rcu_assign_pointer()
给临界区资源赋新值
rcu_dereference()
使用临界区资源

RCU 写者的典型流程：
1. 拷贝一份临界区资源，此时有两份临界区资源，这里称为老资源和新资源
2. 用新资源代替老资源，使得之后的读者访问的是新资源
3. 等待读取老临界区的读者全部退出
4. 此时，老资源已没有读者操作，释放该资源

内核提供了对list,hlist等常用数据结构的RCU版本。对于RCU，对共享数据的操作必须保证能够被没有使用同步机制的读者看到，所以内存栅是非常必要的。内存栅只在alpha架构上才使用。

RCU代替读写锁：

点击(此处)折叠或打开

@@ -13,15 +14,15 @@
struct list_head *lp;
struct el *p;
- read_lock();
- list_for_each_entry(p, head, lp) {
+ rcu_read_lock();
+ list_for_each_entry_rcu(p, head, lp) {
if (p->key == key) {
*result = p->data;
- read_unlock();
+ rcu_read_unlock();
return 1;
}
}
- read_unlock();
+ rcu_read_unlock();
return 0;
}
@@ -29,15 +30,16 @@
{
struct el *p;
- write_lock(&listmutex);
+ spin_lock(&listmutex);
list_for_each_entry(p, head, lp) {
if (p->key == key) {
- list_del(&p->list);
- write_unlock(&listmutex);
+ list_del_rcu(&p->list);
+ spin_unlock(&listmutex);
+ synchronize_rcu();
kfree(p);
return 1;
}
}
- write_unlock(&listmutex);
+ spin_unlock(&listmutex);
return 0;
}

用RCU保护数据结构:

点击(此处)折叠或打开

struct foo {
int a;
char b;
long c;
};
DEFINE_SPINLOCK(foo_mutex);
struct foo *gbl_foo;
/*
* Create a new struct foo that is the same as the one currently
* pointed to by gbl_foo, except that field "a" is replaced
* with "new_a". Points gbl_foo to the new structure, and
* frees up the old structure after a grace period.
*
* Uses rcu_assign_pointer() to ensure that concurrent readers
* see the initialized version of the new structure.
*
* Uses synchronize_rcu() to ensure that any readers that might
* have references to the old structure complete before freeing
* the old structure.
*/
void foo_update_a(int new_a)
{
struct foo *new_fp;
struct foo *old_fp;
new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
spin_lock(&foo_mutex);
old_fp = gbl_foo;
*new_fp = *old_fp;
new_fp->a = new_a;
rcu_assign_pointer(gbl_foo, new_fp);
spin_unlock(&foo_mutex);
synchronize_rcu();
kfree(old_fp);
}
/*
* Return the value of field "a" of the current gbl_foo
* structure. Use rcu_read_lock() and rcu_read_unlock()
* to ensure that the structure does not get deleted out
* from under us, and use rcu_dereference() to ensure that
* we see the initialized version of the structure (important
* for DEC Alpha and for people reading the code).
*/
int foo_get_a(void)
{
int retval;
rcu_read_lock();
retval = rcu_dereference(gbl_foo)->a;
rcu_read_unlock();
return retval;
}

以下是写者不阻塞的代码：

点击(此处)折叠或打开

struct foo {
int a;
char b;
long c;
struct rcu_head rcu;
};
/*
* Create a new struct foo that is the same as the one currently
* pointed to by gbl_foo, except that field "a" is replaced
* with "new_a". Points gbl_foo to the new structure, and
* frees up the old structure after a grace period.
*
* Uses rcu_assign_pointer() to ensure that concurrent readers
* see the initialized version of the new structure.
*
* Uses call_rcu() to ensure that any readers that might have
* references to the old structure complete before freeing the
* old structure.
*/
void foo_update_a(int new_a)
{
struct foo *new_fp;
struct foo *old_fp;
new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
spin_lock(&foo_mutex);
old_fp = gbl_foo;
*new_fp = *old_fp;
new_fp->a = new_a;
rcu_assign_pointer(gbl_foo, new_fp);
spin_unlock(&foo_mutex);
call_rcu(&old_fp->rcu, foo_reclaim);
}
void foo_reclaim(struct rcu_head *rp)
{
struct foo *fp = container_of(rp, struct foo, rcu);
foo_cleanup(fp->a);
kfree(fp);
}

何时使用RCU？
读操作远多于写操作、且写操作不是特别紧急时使用RCU

顺序锁

顺序锁为写者赋予更高的优先级，写者永远不会等待读者。缺点是读者有时不得不读多次数据以获取正确的结果。
顺序锁的数据结构中除了有spinlock外，还有一个顺序号。如果成功获得锁，顺序锁的顺序号会加1，以便读者能够检查出是否在读期间有写者访问过。读者在读取数据前后两次读顺序值，如果两次值不相同，则说明读取期间有新的写者操作过数据了，那么本次读取就是无效的。

典型使用：
读端：

do {
seqnum = read_seqbegin(&seqlock_a);
//读操作代码块
...
} while (read_seqretry(&seqlock_a, seqnum));

写端：

spin_lock(&lock);
write_seqlock(&seqlock_a)
...
write_sequnlock(&seqlock_a)
spin_unlock(&lock);

写者通过调用write_seqlock()和write_sequnlock()获取和释放顺序锁。write_seqlock()函数获取seqlock_t数据结构中的自旋锁，然后使顺序计数器sequence加1；write_sequnlock()函数再次增加顺序计数器sequence，然后释放自旋锁。这样可以保证写者在整个写的过程中，计数器sequence的值是奇数，并且当没有写者在改变数据的时候，计数器的值是偶数。
read_seqbegin()返回顺序锁的当前顺序号；如果局部变量seq的值是奇数（写者在read_seqbegin()函数被调用后，正更新数据结构），或seq的值与顺序锁的顺序计数器的当前值不匹配（当读者正执行临界区代码时，写者开始工作），read_seqretry()就返回1，说明本次读取失败，需要重新读取。

并不是每一种资源都可以使用顺序锁来保护。一般来说，必须在满足下述条件时才能使用顺序锁：
1. 读者的临界区资源不包括被写者修改和被读者取值的指针，否则，写者有可能使指针失效，读者读取时会产生OPPs。
2. 读者的临界区代码没有副作用。

何时使用顺序锁？
读操作远多于写操作、且写操作很紧急时使用顺序锁。

小结

本文对Linux内核的多种锁的同步机制进行了分析对比，如有疏漏或错误，请各位不吝指正。

阅读(3372) | 评论(0) | 转发(0) |

上一篇：系统性能分析—vmstat

下一篇：fsck原理详解

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6