Chinaunix首页 | 论坛 | 博客
  • 博客访问: 244714
  • 博文数量: 76
  • 博客积分: 1400
  • 博客等级: 上尉
  • 技术积分: 660
  • 用 户 组: 普通用户
  • 注册时间: 2008-10-17 21:43
文章分类
文章存档

2011年(1)

2010年(9)

2009年(20)

2008年(46)

我的朋友

分类: C/C++

2008-10-21 23:30:48

Table Lookup

Use a pre-built lookup table of all the 1-bit counts for every possibly byte, then index into that for each byte that comprises the word. This is the fastest method (slightly) for 32 bits on both Intel and Sparc, and (even more slightly) the fastest for 64 bits on Sparc, falling to second fastest on 64 bits on Intel. Changing the lookup table from anything but unsigned or int makes it a little slower (what with the extra casting and byte-loading the compiler is forced to add.)
unsigned numbits_lookup_table[256] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 2, 2, 3, 2, 3, 3, 4, 2,
    3, 3, 4, 3, 4, 4, 5, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3,
    3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3,
    4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4,
    3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5,
    6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4,
    4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5,
    6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5,
    3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 3,
    4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6,
    6, 7, 6, 7, 7, 8
};

unsigned numbits_lookup(unsigned i)
{
    unsigned n;
    
    n = numbits_lookup_table[i & 0xff];
    n += numbits_lookup_table[i>>8  & 0xff];
    n += numbits_lookup_table[i>>16 & 0xff];
    n += numbits_lookup_table[i>>24 & 0xff];
    
    return n;
}

Counters

If you want a full explanation of how this works, read my writeup at counting 1 bits, but suffice it to say that you are essentially partitioning the word into groups, and combining the groups by adding them together in pairs until you are left with only one group, which is the answer. (performance notes in the next section.)
unsigned numbits(unsigned int i)
{

    unsigned int const MASK1  = 0x55555555;
    unsigned int const MASK2  = 0x33333333;
    unsigned int const MASK4  = 0x0f0f0f0f;
    unsigned int const MASK8  = 0x00ff00ff;
    unsigned int const MASK16 = 0x0000ffff;
    
    i = (i&MASK1 ) + (i>>1 &MASK1 );
    i = (i&MASK2 ) + (i>>2 &MASK2 );
    i = (i&MASK4 ) + (i>>4 &MASK4 );
    i = (i&MASK8 ) + (i>>8 &MASK8 );
    i = (i&MASK16) + (i>>16&MASK16);
    
    return i;
}

Subtract 1 and AND

See counting 1 bits SPOILER for a fuller explanation of this one, but basically the lowest 1-bit gets zeroed out every iteration, so when you run out of 1s to zero, you've iterated to the number of bits in the word. Clever. Unfortunately, not that terribly fast; it's roughly two to three times slower than the lookup and counters methods on both architectures.
unsigned numbits_subtraction(unsigned i)
{
    unsigned n;

    for(n=0; i; n++)
        i &= i-1;
        
    return n;
}
阅读(646) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~