Category: LINUX

2007-05-16 17:20:39

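This is again from the assembly-language book I mentioned a couple of days ago. In its discussion of sign extension and zero extension of binary numbers, it looks, from the point of view of how they are implemented in assembly, at what happens when signed and unsigned data are promoted to a wider type, and at a common mistake in using fgetc() that this behavior causes. The explanation is concise and easy to follow, so I am noting it down for future reference.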

Extension of unsigned and signed integers also occurs in C. Variables in
C may be declared as either signed or unsigned (int is signed). Consider
the code in Figure 2.1. In line 3, the value of uchar is extended using the
rules for unsigned values (using MOVZX) and stored in a, but in line 4, the
signed rules are used for schar (using MOVSX) and the result is stored in b.

1 unsigned char uchar = 0xFF;
2 signed char schar = 0xFF;
3 int a = (int) uchar; /* a = 255 (0x000000FF) */
4 int b = (int) schar; /* b = -1 (0xFFFFFFFF) */
Figure 2.1:
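
As a rough, compilable sketch of what Figure 2.1 describes, the two conversions can be wrapped in small functions. The names widen_unsigned, widen_signed and show_extension are made up for illustration and do not come from the book; the printed hex width assumes a 32-bit int, and the signed argument is written as -1 rather than 0xFF because converting 0xFF to signed char is implementation-defined.

```c
#include <stdio.h>

/* Zero extension: an unsigned byte widens to int with zeros in the upper bits. */
int widen_unsigned(unsigned char byte)
{
    return (int)byte;
}

/* Sign extension: a signed byte widens to int with copies of its sign bit. */
int widen_signed(signed char byte)
{
    return (int)byte;
}

/* Hypothetical helper: prints both widenings of the bit pattern FF. */
void show_extension(void)
{
    int a = widen_unsigned(0xFF);
    int b = widen_signed((signed char)-1);   /* -1 has the bit pattern FF */

    printf("a = %d (0x%08X)\n", a, (unsigned)a);
    printf("b = %d (0x%08X)\n", b, (unsigned)b);
}
```

Calling show_extension() from main() prints the two values side by side, which makes it easy to compare them with the comments in Figure 2.1.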

char ch;
while( (ch = fgetc(fp)) != EOF ) {
/* do something with ch */
}
Figure 2.2:

There is a common C programming bug that directly relates to this
subject. Consider the code in Figure 2.2. The prototype of fgetc() is:

int fgetc( FILE * );

One might ask why the function returns an int when it reads characters.
The reason is that it normally does return a char (read as an unsigned
char and extended to an int value using zero extension). However, there is
one value that it may return that is not a character: EOF. This is a macro
that is usually defined as -1. Thus, fgetc() returns either a char extended
to an int value (which looks like 000000xx in hex) or EOF (which looks like
FFFFFFFF in hex).
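
The two kinds of return value can be observed directly. The following is only a sketch: read_ff_then_eof is a hypothetical helper, and it assumes tmpfile() is able to create a temporary file on the system where it runs. It writes the single byte FF, reads it back, and then reads once more at end of file.

```c
#include <stdio.h>

/* Hypothetical helper: writes the byte 0xFF to a temporary file and reads
   it back with fgetc(). Stores the first two return values through the
   pointers. Returns 0 on success, -1 if the file could not be used. */
int read_ff_then_eof(int *first, int *second)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fputc(0xFF, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    *first = fgetc(fp);    /* the byte FF, returned as the int 255 */
    *second = fgetc(fp);   /* nothing left to read: EOF */

    fclose(fp);
    return 0;
}
```

As long as both values are kept in int variables, the byte FF (255) and EOF remain different numbers.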

The basic problem with the program in Figure 2.2 is that fgetc() returns
an int, but this value is stored in a char. C will truncate the higher
order bits to fit the int value into the char. The problem is that the
numbers (in hex) 000000FF and FFFFFFFF are both truncated to the
byte FF. Thus, the while loop cannot distinguish between reading the byte
FF from the file and end of file.
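
The truncation step on its own can be sketched as a one-line function. low_byte is a hypothetical name, and the sketch assumes 8-bit bytes; it uses unsigned char because that conversion is fully defined by the language.

```c
/* Keeps only the low-order byte of v, as storing an int in a char does. */
unsigned char low_byte(int v)
{
    return (unsigned char)v;
}
```

Both low_byte(0x000000FF) and low_byte(-1) give the byte FF, so the information that separated a real byte from EOF is gone after the assignment.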

Exactly what the code does in this case depends on whether char is
signed or unsigned. Why? Because in the while condition (the second line
of Figure 2.2), ch is compared with EOF. Since EOF is an int value, ch is
extended to an int so that the two values being compared are the same
size. As Figure 2.1 showed, whether the variable is signed or unsigned is
very important.

If char is unsigned, FF is extended to be 000000FF. This is compared to
EOF (FFFFFFFF) and found to be not equal. Thus, the loop never ends!
If char is signed, FF is extended to FFFFFFFF. This does compare as
equal and the loop ends. However, since the byte FF may have been read
from the file, the loop could be ending prematurely.
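
Both cases can be tried without depending on how the compiler treats plain char, by spelling out the signedness. This is a sketch with hypothetical function names; it assumes EOF is -1, and the conversion of 255 to signed char is implementation-defined (common compilers produce -1).

```c
#include <stdio.h>

/* Stores an fgetc()-style return value in an unsigned char, then widens
   it again and compares with EOF, as the loop condition would. */
int matches_eof_as_unsigned(int ret)
{
    unsigned char ch = (unsigned char)ret;
    int widened = ch;          /* zero extension: always 0..255 */
    return widened == EOF;     /* never true, since EOF is negative */
}

/* Same steps with a signed char. */
int matches_eof_as_signed(int ret)
{
    signed char ch = (signed char)ret;  /* implementation-defined for 255 */
    int widened = ch;          /* sign extension: FF becomes -1 */
    return widened == EOF;
}
```

With these assumptions, matches_eof_as_unsigned(EOF) is 0, which corresponds to the loop that never ends, and matches_eof_as_signed(0xFF) is 1, which corresponds to the loop that stops early on a real FF byte.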

The solution to this problem is to define the ch variable as an int, not a
char. When this is done, no truncation or extension takes place in the
while condition. Inside the loop, it is safe to truncate the value since ch
must actually be a simple byte there.
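
A minimal sketch of the corrected loop follows. The function read_bytes and its buffer parameters are hypothetical and chosen only to give the loop something to do with each byte.

```c
#include <stdio.h>
#include <stddef.h>

/* Reads up to cap bytes from fp into buf and returns how many were stored. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t cap)
{
    int ch;        /* int, so EOF stays distinct from the byte FF */
    size_t n = 0;

    while (n < cap && (ch = fgetc(fp)) != EOF) {
        buf[n++] = (unsigned char)ch;   /* safe: ch holds a byte value here */
    }
    return n;
}
```

Because the comparison with EOF happens on the untruncated int, a file containing the byte FF is read through to its real end.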