分类: LINUX
2010-10-20 09:24:24
总共需要3次shift, 3次and, 2次add, 1次mul, 1次mod 共10次算数运算。这是32位整数的版本,改成适用于64位整数的版本也很简单。主要思想也是分治以及并行加法。只是与原始的shift_and_add有两点不同:typedef unsigned __int32 uint32;
const uint32 m1=033333333333;
const uint32 m2=011111111111;
const uint32 m3=030707070707;
int popcount_hakmem(uint32 x) {
x -= (x>>1 & m1 + x>>2 & m2);
x += x>>3 & m3;
return x % 63;
}
总共需要 4次shift, 4次and, 2次加法,1次减法,1次乘法共12次算数运算。const uint64 h01 = 0x0101010101010101;
int popcount_3(uint64 x) {
x -= (x >> 1) & m1;
x = (x & m2) + ((x >> 2) & m2);
x = (x + (x >> 4)) & m4;
return (x * h01)>>56;
}
之前几种方法中的所有操作都可以在寄存器中完成,而查找表法由于无法将查找表放在寄存器中,不可避免地要出现访问cache或内存的操作。但由于我们采用一个循环频繁访问查找表,在所有的实现中查找表都应该被锁定在cache中,访问开销很小[4]。因此从后面的评测我们可以看到,这种方法的效率最高。const int lut[256] =
{
0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5,2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
2,3,3,4,3,4,4,5,3,4,4,5,4,5,5,6,3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,
3,4,4,5,4,5,5,6,4,5,5,6,5,6,6,7,4,5,5,6,5,6,6,7,5,6,6,7,6,7,7,8
};
static inline int popcount_lut(uint64 x) {
int i, ret = 0;
for (i = 0; i < 64; i += 8)
ret +=lut[(x >> i) & 0xff];
return ret;
}
popcnt,在Itanium中就已提供。既然CPU提供原生支持,那想必应该是最快的了(据说1个时钟周期就可以[10])。
chinaunix网友2010-10-20 16:35:50
很好的, 收藏了 推荐一个博客,提供很多免费软件编程电子书下载: http://free-ebooks.appspot.com