POSIX threads-djkpengjun-ChinaUnix博客

Chinaunix首页 | 论坛 | 博客

没有代码的日子会死djkpengjun.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

djkpengjun

博客访问： 1620615
博文数量： 399
博客积分： 8508
博客等级：中将
技术积分： 5302
用户组：普通用户
注册时间： 2009-10-14 09:28

个人简介

能力强的人善于解决问题，有智慧的人善于绕过问题。区别很微妙，小心谨慎做后者。

文章分类

全部博文（399）

Kotlin（0）
Archtecture（71）

数据库（1）

Kafka（1）

Domain_Driven_De（2）

搜索（1）

Linux Swiss（1）

编程（5）

Scrum（1）

前端架构（5）

MongoDB（3）

项目架构（10）

Node.js（1）

Angular（2）

AOP（1）

Guava（1）

Web Crawler（2）

Play（8）

高并发（15）

Load_Balance（3）

Hadoop（4）

REST（4）
金融IT常识（1）
信息压缩理论（1）
EMC的日子（16）

Shell Ahead（2）
简历（1）
wingdb调试（1）
职业规划（4）
养生（1）
分布式（4）
五险一金（1）
linux内核研究（15）
人际交往（2）
算法导论（1）
VS2005（0）
概率（21）
google（1）
百度分享（1）
跳槽必看（9）
智力题（7）
SHELL 脚本（2）
大规模数据处理（6）
POJ（16）
wince（1）
笔试面试（28）
ACM（17）
操作系统（10）
网络（14）
算法（55）

国际大学ACM程序（0）

国际大学ACM程序（15）

ACM程序设计培训（23）
数据结构（11）
c++（45）
嵌入式（19）
未分配的博文（17）

文章存档

2018年（3）

2017年（1）

2016年（1）

2015年（69）

2013年（14）

2012年（17）

2011年（12）

2010年（189）

2009年（93）

我的朋友

相关博文

POSIX threads

分类： LINUX

2010-10-11 10:29:02

POSIX threads(简称Pthreads)是在多核平台上进行并行编程的一套常用的API。线程同步(Thread Synchronization)是并行编程中非常重要的通讯手段，其中最典型的应用就是用Pthreads提供的锁机制(lock)来对多个线程之间共享的临界区(Critical Section)进行保护(另一种常用的同步机制是barrier)。

Pthreads提供了多种锁机制：
(1) Mutex（互斥量）：pthread_mutex_***
(2) Spin lock（自旋锁）：pthread_spin_***
(3) Condition Variable（条件变量）：pthread_con_***
(4) Read/Write lock（读写锁）：pthread_rwlock_***

Pthreads提供的Mutex锁操作相关的API主要有：

Pthreads提供的与Spin Lock锁操作相关的API主要有：

从实现原理上来讲，Mutex属于sleep-waiting类型的锁。例如在一个双核的机器上有两个线程(线程A和线程B)，它们分别运行在Core0和Core1上。假设线程A想要通过pthread_mutex_lock操作去得到一个临界区的锁，而此时这个锁正被线程B所持有，那么线程A就会被阻塞(blocking)，Core0 会在此时进行上下文切换(Context Switch)将线程A置于等待队列中，此时Core0就可以运行其他的任务(例如另一个线程C)而不必进行忙等待。而Spin lock则不然，它属于busy-waiting类型的锁，如果线程A是使用pthread_spin_lock操作去请求锁，那么线程A就会一直在 Core0上进行忙等待并不停的进行锁请求，直到得到这个锁为止。

如果大家去查阅Linux glibc中对pthreads API的实现NPTL() 的源码的话(使用”getconf GNU_LIBPTHREAD_VERSION”命令可以得到我们系统中NPTL的版本号)，就会发现pthread_mutex_lock()操作如果没有锁成功的话就会调用system_wait()的系统调用（现在NPTL的实现采用了用户空间的，不需要频繁进行系统调用，性能已经大有改善），并将当前线程加入该mutex的等待队列里。而spin lock则可以理解为在一个while(1)循环中用内嵌的汇编代码实现的锁操作(印象中看过一篇论文介绍说在linux内核中spin lock操作只需要两条CPU指令，解锁操作只用一条指令就可以完成)。有兴趣的朋友可以参考另一个名为的微内核中pthreds API的实现：，尽管与NPTL中的代码实现不尽相同，但是因为它的实现非常简单易懂，对我们理解spin lock和mutex的特性还是很有帮助的。

那么在实际编程中mutex和spin lcok哪个的性能更好呢？我们知道spin lock在Linux内核中有非常广泛的利用，那么这是不是说明spin lock的性能更好呢？下面让我们来用实际的代码测试一下（请确保你的系统中已经安装了最近的g++）。

001
// Name: spinlockvsmutex1.cc 

002
// Source:  

003
// Compiler(spin lock version): g++ -o spin_version -DUSE_SPINLOCK spinlockvsmutex1.cc -lpthread 

004
// Compiler(mutex version): g++ -o mutex_version spinlockvsmutex1.cc -lpthread 

005
#include  

006
#include  

007
#include  

008
#include  

009
#include  

010
#include  

011
#include  

012
  
013
#define LOOPS 50000000 

014
  
015
using namespace std; 

016
  
017
list<int> the_list; 

018
  
019
#ifdef USE_SPINLOCK 

020
pthread_spinlock_t spinlock; 

021
#else 

022
pthread_mutex_t mutex; 

023
#endif 

024
  
025
//Get the thread id 

026
pid_t gettid() { return syscall( __NR_gettid ); } 

027
  
028
void *consumer(void *ptr) 

029
{ 

030
    int i; 

031
  
032
    printf("Consumer TID %lun", (unsigned long)gettid()); 

033
  
034
    while (1) 

035
    { 

036
#ifdef USE_SPINLOCK 

037
        pthread_spin_lock(&spinlock); 

038
#else 

039
        pthread_mutex_lock(&mutex); 

040
#endif 

041
  
042
        if (the_list.empty()) 

043
        { 

044
#ifdef USE_SPINLOCK 

045
            pthread_spin_unlock(&spinlock); 

046
#else 

047
            pthread_mutex_unlock(&mutex); 

048
#endif 

049
            break; 

050
        } 

051
  
052
        i = the_list.front(); 

053
        the_list.pop_front(); 

054
  
055
#ifdef USE_SPINLOCK 

056
        pthread_spin_unlock(&spinlock); 

057
#else 

058
        pthread_mutex_unlock(&mutex); 

059
#endif 

060
    } 

061
  
062
    return NULL; 

063
} 

064
  
065
int main() 

066
{ 

067
    int i; 

068
    pthread_t thr1, thr2; 

069
    struct timeval tv1, tv2; 

070
  
071
#ifdef USE_SPINLOCK 

072
    pthread_spin_init(&spinlock, 0); 

073
#else 

074
    pthread_mutex_init(&mutex, NULL); 

075
#endif 

076
  
077
    // Creating the list content... 

078
    for (i = 0; i < LOOPS; i++) 

079
        the_list.push_back(i); 

080
  
081
    // Measuring time before starting the threads... 

082
    gettimeofday(&tv1, NULL); 

083
  
084
    pthread_create(&thr1, NULL, consumer, NULL); 

085
    pthread_create(&thr2, NULL, consumer, NULL); 

086
  
087
    pthread_join(thr1, NULL); 

088
    pthread_join(thr2, NULL); 

089
  
090
    // Measuring time after threads finished... 

091
    gettimeofday(&tv2, NULL); 

092
  
093
    if (tv1.tv_usec > tv2.tv_usec) 

094
    { 

095
        tv2.tv_sec--; 

096
        tv2.tv_usec += 1000000; 

097
    } 

098
  
099
    printf("Result - %ld.%ldn", tv2.tv_sec - tv1.tv_sec, 

100
        tv2.tv_usec - tv1.tv_usec); 

101
  
102
#ifdef USE_SPINLOCK 

103
    pthread_spin_destroy(&spinlock); 

104
#else 

105
    pthread_mutex_destroy(&mutex); 

106
#endif 

107
  
108
    return 0; 

109
}

该程序运行过程如下：主线程先初始化一个list结构，并根据LOOPS的值将对应数量的entry插入该list，之后创建两个新线程，它们都执行consumer()这个任务。两个被创建的新线程同时对这个list进行pop操作。主线程会计算从创建两个新线程到两个新线程结束之间所用的时间，输出为下文中的”Result “。

测试机器参数：
Ubuntu 9.04 X86_64
Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
4.0 GB Memory

从下面是测试结果：

01
gchen@gchen-desktop:~/Workspace/mutex$ g++ -o spin_version -DUSE_SPINLOCK spinvsmutex1.cc -lpthread 

02
gchen@gchen-desktop:~/Workspace/mutex$ g++ -o mutex_version spinvsmutex1.cc -lpthread 

03
gchen@gchen-desktop:~/Workspace/mutex$ time ./spin_version 

04
Consumer TID 5520 

05
Consumer TID 5521 

06
Result - 5.888750 

07
  
08
real    0m10.918s 

09
user    0m15.601s 

10
sys    0m0.804s 

11
  
12
gchen@gchen-desktop:~/Workspace/mutex$ time ./mutex_version 

13
Consumer TID 5691 

14
Consumer TID 5692 

15
Result - 9.116376 

16
  
17
real    0m14.031s 

18
user    0m12.245s 

19
sys    0m4.368s

可以看见spin lock的版本在该程序中表现出来的性能更好。另外值得注意的是sys时间，mutex版本花费了更多的系统调用时间，这就是因为mutex会在锁冲突时调用system wait造成的。

但是，是不是说spin lock就一定更好了呢？让我们再来看一个锁冲突程度非常剧烈的实例程序：

01
//Name: svm2.c 

02
//Source:  

03
//Compile(spin lock version): gcc -o spin -DUSE_SPINLOCK svm2.c -lpthread 

04
//Compile(mutex version): gcc -o mutex svm2.c -lpthread 

05
#include  

06
#include  

07
#include  

08
#include  

09
  
10
#define        THREAD_NUM     2 

11
  
12
pthread_t g_thread[THREAD_NUM]; 

13
#ifdef USE_SPINLOCK 

14
pthread_spinlock_t g_spin; 

15
#else 

16
pthread_mutex_t g_mutex; 

17
#endif 

18
__uint64_t g_count; 

19
  
20
pid_t gettid() 

21
{ 

22
    return syscall(SYS_gettid); 

23
} 

24
  
25
void *run_amuck(void *arg) 

26
{ 

27
       int i, j; 

28
  
29
       printf("Thread %lu started.n", (unsigned long)gettid()); 

30
  
31
       for (i = 0; i < 10000; i++) { 

32
#ifdef USE_SPINLOCK 

33
           pthread_spin_lock(&g_spin); 

34
#else 

35
               pthread_mutex_lock(&g_mutex); 

36
#endif 

37
               for (j = 0; j < 100000; j++) { 

38
                       if (g_count++ == 123456789) 

39
                               printf("Thread %lu wins!n", (unsigned long)gettid()); 

40
               } 

41
#ifdef USE_SPINLOCK 

42
           pthread_spin_unlock(&g_spin); 

43
#else 

44
               pthread_mutex_unlock(&g_mutex); 

45
#endif 

46
       } 

47
  
48
       printf("Thread %lu finished!n", (unsigned long)gettid()); 

49
  
50
       return (NULL); 

51
} 

52
  
53
int main(int argc, char *argv[]) 

54
{ 

55
       int i, threads = THREAD_NUM; 

56
  
57
       printf("Creating %d threads...n", threads); 

58
#ifdef USE_SPINLOCK 

59
       pthread_spin_init(&g_spin, 0); 

60
#else 

61
       pthread_mutex_init(&g_mutex, NULL); 

62
#endif 

63
       for (i = 0; i < threads; i++) 

64
               pthread_create(&g_thread[i], NULL, run_amuck, (void *) i); 

65
  
66
       for (i = 0; i < threads; i++) 

67
               pthread_join(g_thread[i], NULL); 

68
  
69
       printf("Done.n"); 

70
  
71
       return (0); 

72
}

这个程序的特征就是临界区非常大，这样两个线程的锁竞争会非常的剧烈。当然这个是一个极端情况，实际应用程序中临界区不会如此大，锁竞争也不会如此激烈。测试结果显示mutex版本性能更好：

01
gchen@gchen-desktop:~/Workspace/mutex$ time ./spin 

02
Creating 2 threads... 

03
Thread 31796 started. 

04
Thread 31797 started. 

05
Thread 31797 wins! 

06
Thread 31797 finished! 

07
Thread 31796 finished! 

08
Done. 

09
  
10
real    0m5.748s 

11
user    0m10.257s 

12
sys    0m0.004s 

13
  
14
gchen@gchen-desktop:~/Workspace/mutex$ time ./mutex 

15
Creating 2 threads... 

16
Thread 31801 started. 

17
Thread 31802 started. 

18
Thread 31802 wins! 

19
Thread 31802 finished! 

20
Thread 31801 finished! 

21
Done. 

22
  
23
real    0m4.823s 

24
user    0m4.772s 

25
sys    0m0.032s

另外一个值得注意的细节是spin lock耗费了更多的user time。这就是因为两个线程分别运行在两个核上，大部分时间只有一个线程能拿到锁，所以另一个线程就一直在它运行的core上进行忙等待，CPU占用率一直是100%；而mutex则不同，当对锁的请求失败后上下文切换就会发生，这样就能空出一个核来进行别的运算任务了。（其实这种上下文切换对已经拿着锁的那个线程性能也是有影响的，因为当该线程释放该锁时它需要通知操作系统去唤醒那些被阻塞的线程，这也是额外的开销）

总结
（1）Mutex适合对锁操作非常频繁的场景，并且具有更好的适应性。尽管相比spin lock它会花费更多的开销（主要是上下文切换），但是它能适合实际开发中复杂的应用场景，在保证一定性能的前提下提供更大的灵活度。

（2）spin lock的lock/unlock性能更好(花费更少的cpu指令)，但是它只适应用于临界区运行时间很短的场景。而在实际软件开发中，除非程序员对自己的程序的锁操作行为非常的了解，否则使用spin lock不是一个好主意(通常一个多线程程序中对锁的操作有数以万次，如果失败的锁操作(contended lock requests)过多的话就会浪费很多的时间进行空等待)。

（3）更保险的方法或许是先（保守的）使用 Mutex，然后如果对性能还有进一步的需求，可以尝试使用spin lock进行调优。毕竟我们的程序不像Linux kernel那样对性能需求那么高(Linux Kernel最常用的锁操作是spin lock和rw lock)。

2010年3月3日补记：这个观点在Oracle的文档中得到了支持：

During configuration, Berkeley DB selects a mutex implementation for the architecture. Berkeley DB normally prefers blocking-mutex implementations over non-blocking ones. For example, Berkeley DB will select POSIX pthread mutex interfaces rather than assembly-code test-and-set spin mutexes because pthread mutexes are usually more efficient and less likely to waste CPU cycles spinning without getting any work accomplished.

p.s.调用syscall(SYS_gettid)和syscall( __NR_gettid )都可以得到当前线程的id:)

阅读(1216) | 评论(0) | 转发(0) |

上一篇：linuxe network 专家

下一篇：keepalive

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6