cacheline_aligned-sigsegv11-ChinaUnix博客

unixmindwangbj.blog.chinaunix.net

首页　| 　博文目录　| 　关于我

sigsegv11

博客访问： 181506
博文数量： 42
博客积分： 2185
博客等级：大尉
技术积分： 455
用户组：普通用户
注册时间： 2009-06-11 21:32

文章分类

全部博文（42）

plan（1）
algo（6）
misc（9）
linux（19）
u-boot（0）
qemu（7）
未分配的博文（0）

文章存档

2012年（5）

2011年（13）

2010年（6）

2009年（18）

我的朋友

相关博文

cacheline_aligned

分类： LINUX

2010-01-15 16:56:55

刚好找到一台可用的8 core cpu，测试多线程版本的使用的线程均为8个。

考虑一个简单的并发计数器程序（仅仅用于演示，很简单的问题）

single threaded:

#include <stdio.h> static unsigned long counter; #define LOOP_TOTAL (1UL << 31) int main(int argc, char* argv[]) { unsigned long i; for (i = 0; i < LOOP_TOTAL; i++) { ++counter; } printf("counter = %lu\n", counter); return 0; }

$ time ./counter-st
counter = 2147483648

real    0m2.245s
user    0m2.244s
sys     0m0.000s

如果采用多线程？怎么来做？

try 1: 使用互斥变量

pthread_mutex_lock(&counter_lock); ++counter; pthread_mutex_unlock(&counter_lock);

pthread_mutex_lock() / pthread_mutex_unlock() 开销比较大，尤其当core很多的时候，绝大多数时间都花在了临界区。互斥访问(mutual exclusion)是非常昂贵的，详情可以参见Lamport's bakery algorithm(derived from Peterson Lock)，其复杂度正比于线程的数量。

$ time ./counter-mt1
counter = 2147483648

real    4m59.876s
user    3m8.431s
sys     34m58.903s

try 2: 使用原子操作

atomic_inc(&counter);

没有了软件实现的临界区保护，保护机制有硬件实现，很大程度上减少了互斥访问开销，但硬件互斥访问也不是“免费”的。

$ time ./counter-mt2
counter = 2147483648

real    1m2.040s
user    7m19.202s
sys     0m0.030s

try 3: 每个线程使用独立的counter，total_counter =
foreach local_counter in thread_local_counters:
    total_counter <- total_counter + local_counter;

这时候完全避免了需要使用互斥/锁的使用，因此效率进一步得到提高：

$ time ./counter-mt3
counter = 2147483648

real    0m0.300s
user    0m2.241s
sys     0m0.001s

try 4: 每个线程使用独立的counter，但CPU的cache line通常为32/64 bytes, 因此cache访问数据都是基于cacheline，而不是字节或是WORD。多个处理器共享一个cache line毫无意义(false sharing)，而且会降低性能，因此我们应该考虑将counter对其到一个cache line。

#define ____cacheline_aligned __attribute__((__aligned__(64))) #define ____cacheline_aligned_in_smp __attribute__((__aligned__(64)))

struct per_cpu_data { unsigned long counter; } ____cacheline_aligned_in_smp;

由于消除了false cache line sharing，效率得以进一步提高（测试比较弱智，显示不出来了）。
（如果观察Linux内核的代码，Linux都用了这种机制用于消除cache false sharing）

$ time ./counter-mt4
counter = 2147483648

real    0m0.287s
user    0m2.244s
sys     0m0.000s

阅读(4110) | 评论(0) | 转发(2) |

上一篇：using inotify to monitor/watch file changes

下一篇：Install Debian testing/lenny into Qemu MIPS Malta

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6