2010-09-06 06:48:36
Consider the bits within a byte. The least significant bit is number 0 and the most significant bit is number 7:
| MSB |   |   |   |   |   |   | LSB |
|  7  | 6 | 5 | 4 | 3 | 2 | 1 |  0  |
Numbering the bits this way is the only way that makes sense. With this numbering, left-shifting 1 by n gives a mask with only bit n set, so the bit number is simply the exponent of the bit's weight, 2^n:
/* function to set bit "n" in byte "mask" */
unsigned char le_set_bit(unsigned char mask, int n)
{
    return mask | (1u << n);
}
/* function to clear bit "n" in byte "mask" */
unsigned char le_clear_bit(unsigned char mask, int n)
{
    return mask & ~(1u << n);
}
/* function to return the value of bit "n" in byte "mask" */
int le_test_bit(unsigned char mask, int n)
{
    return (mask & (1u << n)) != 0;
}
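To see the numbering at work, here is a small hypothetical helper, `lsb0_to_string` (not from the original post), that renders a byte in the same left-to-right order as the diagram, testing bit n with the mask `1u << n` exactly as `le_test_bit` does. It assumes 8-bit bytes only because it prints exactly eight characters; the mask itself never needs to know the byte width.

```c
/* Hypothetical helper: write the bits of "mask" into "out" from bit 7
 * down to bit 0 (MSB first, as in the diagram) and NUL-terminate it.
 * Assumes 8-bit bytes, so "out" must hold at least 9 chars. */
void lsb0_to_string(unsigned char mask, char out[9])
{
    for (int n = 7; n >= 0; n--) {
        /* bit n has weight 2^n, i.e. the mask 1 << n */
        out[7 - n] = (mask & (1u << n)) ? '1' : '0';
    }
    out[8] = '\0';
}
```

For example, `lsb0_to_string(0x08, buf)` fills `buf` with `"00001000"`: bit 3 is set because 0x08 is 2^3.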
Now suppose, perversely, we decided to number the bits in the opposite order[1]:
| MSB |   |   |   |   |   |   | LSB |
|  0  | 1 | 2 | 3 | 4 | 5 | 6 |  7  |
We can still set, clear and test bits, but now we have to know how wide a byte is in order to do it (usually, but not always, eight bits; this is why the Internet RFCs refer to "octets" rather than "bytes"):
/* function to set bit "n" in byte "mask" */
/* (0x80 is the most significant bit only if a byte is eight bits wide) */
unsigned char be_set_bit(unsigned char mask, int n)
{
    return mask | (0x80u >> n);
}
/* function to clear bit "n" in byte "mask" */
unsigned char be_clear_bit(unsigned char mask, int n)
{
    return mask & ~(0x80u >> n);
}
/* function to return the value of bit "n" in byte "mask" */
int be_test_bit(unsigned char mask, int n)
{
    return (mask & (0x80u >> n)) != 0;
}
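The hard-coded `0x80` is where the byte width sneaks in. A sketch that makes the dependency explicit uses the standard `CHAR_BIT` macro from `<limits.h>` to convert an MSB-first bit number to the LSB-first one; the names `msb0_to_lsb0` and `msb0_mask` are hypothetical, not from the original post.

```c
#include <limits.h>

/* Hypothetical helper: MSB-first bit n is LSB-first bit (CHAR_BIT - 1 - n).
 * Valid for 0 <= n < CHAR_BIT. */
int msb0_to_lsb0(int n)
{
    return CHAR_BIT - 1 - n;
}

/* Hypothetical helper: mask for MSB-first bit n. This equals 0x80 >> n
 * only when CHAR_BIT == 8. */
unsigned int msb0_mask(int n)
{
    return 1u << msb0_to_lsb0(n);
}
```

The conversion needs `CHAR_BIT`; the LSB-first mask `1u << n` does not, which is the asymmetry the text describes.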
I don't think there's any controversy about which order is the right order for numbering the bits within a byte. So why on earth would you ever want to number the bytes in the opposite order?
Consider the bytes within a word (and here we mean "word" in the DEC sense: an integer two bytes wide, often called a halfword). These bytes are stored at sequential addresses in memory, so one of those addresses is lower than the other. The lower address holds byte 0 and the higher address holds byte 1:
| MSB |   |   |   |   |   |   | LSB | MSB |   |   |   |   |   |   | LSB |
|  7  | 6 | 5 | 4 | 3 | 2 | 1 |  0  |  7  | 6 | 5 | 4 | 3 | 2 | 1 |  0  |
|      byte 1 (higher address)      |       byte 0 (lower address)      |
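When byte 0 holds the low-order bits, the two numbering schemes line up: bit n of the word is bit n % 8 of byte n / 8. A minimal sketch with hypothetical helper names, working on an explicit two-byte array so that it does not depend on the host's byte order (8-bit bytes assumed):

```c
/* Hypothetical helper: test LSB-first bit n (0..15) of a 16-bit value
 * stored as two bytes, with byte 0 holding the low-order bits. */
int bytes_test_bit(const unsigned char bytes[2], int n)
{
    return (bytes[n / 8] >> (n % 8)) & 1;
}

/* Hypothetical helper: split a 16-bit value into low-order-first bytes
 * using shifts, so the result is the same on any host. */
void split_low_first(unsigned int value, unsigned char bytes[2])
{
    bytes[0] = value & 0xFFu;        /* bits 0..7  */
    bytes[1] = (value >> 8) & 0xFFu; /* bits 8..15 */
}
```

With this layout, `bytes_test_bit(b, n)` agrees with `(value >> n) & 1` for every n from 0 to 15.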
Now, if we have a pointer to this word, it holds the address of byte 0, whatever the endianness of the processor:
unsigned short word = 0x00FF;
unsigned char *byte = (unsigned char *) &word;
On either a little- or big-endian machine, the following expression evaluates to true:
(void *) byte == (void *) &word
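A minimal sketch to check the address claim on a real machine. It also prints both bytes, which answers the next question empirically for whatever host it runs on. `inspect_word` is a hypothetical name, and the code assumes `unsigned short` is two bytes wide.

```c
#include <stdio.h>

/* Hypothetical helper: store 0x00FF in a word, view it through an
 * unsigned char pointer, print both bytes, and return 1 if the byte
 * pointer and &word compare equal (expected on any host). */
int inspect_word(void)
{
    unsigned short word = 0x00FF;
    unsigned char *byte = (unsigned char *) &word;

    printf("byte[0] = 0x%02X, byte[1] = 0x%02X\n", byte[0], byte[1]);
    return (void *) byte == (void *) &word;
}
```

Comparing through `void *` rather than `long` avoids depending on `long` being wide enough to hold a pointer, which it is not on some 64-bit platforms.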
So what are the values in byte[0] and byte[1]?
On a little-endian machine, byte[0] == 0xFF and byte[1] == 0x00. This has the nice property that, in addition to the statement above, the following statement also evaluates as true for all values of word that fit in a byte:
*byte == word
In other words, dereferencing a pointer-to-byte that holds the address of a word-sized value gives that same value, provided the value happens to fit in a byte.
On a big-endian machine, byte[0] == 0x00 and byte[1] == 0xFF, so even though the addresses are still equal and the value fits in a byte, the two reads give different values:
*byte != word
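The same comparison doubles as a runtime byte-order check. A sketch, with the hypothetical name `host_is_little_endian`, assuming a two-byte `unsigned short` and ignoring mixed-endian layouts:

```c
/* Hypothetical helper: returns 1 if the host stores the low-order byte
 * of an unsigned short at the lower address (little-endian), else 0. */
int host_is_little_endian(void)
{
    unsigned short word = 0x00FF;
    unsigned char *byte = (unsigned char *) &word;

    /* *byte reads the byte at the lower address; it equals word (0xFF)
     * exactly when that byte holds the low-order bits. */
    return *byte == word;
}
```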
Nonetheless, on either little- or big-endian machines:
*(unsigned short *)byte == word
There are those who would argue that by comparing two different widths of integer I get what I deserve. However, C applies the integer promotions to both operands of ==, so *byte and word are both widened (to int, on typical platforms) before they are compared. The fact that

if (*byte == word) {
    /* ... */
}

won't take the branch but

if (*(unsigned short *)byte == word) {
    /* ... */
}

will, even though both statements are perfectly valid standard C and neither will even generate a compiler warning, is at the very least a trap set for the unwary on big-endian machines.
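One way to stay clear of this trap, offered here as a suggestion rather than anything from the original post, is to extract the low-order byte arithmetically instead of through a pointer: shifts and masks operate on values, not on memory layout, so they give the same answer on little- and big-endian machines. `low_byte` is a hypothetical name, and 8-bit bytes are assumed.

```c
/* Hypothetical helper: the low-order 8 bits of "word", computed from its
 * value rather than from its storage, so the result does not depend on
 * the host's byte order. */
unsigned char low_byte(unsigned short word)
{
    return (unsigned char) (word & 0xFFu);
}
```

With this helper, `low_byte(word) == word` holds for every `word` that fits in a byte, on either kind of machine.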