老虎的试验田 (lewis.blog.chinaunix.net)

chg.s

Sign Extension and Zero Extension of Data, and Their Effect on fgetc()

Category: LINUX

2007-05-16 17:20:39

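This is again from the assembly language book I mentioned a couple of days ago. In its section on sign extension and zero extension of binary numbers, the book looks at what happens when signed and unsigned data are promoted to a wider type, from the point of view of how the promotion is implemented in assembly language, and at a common misuse of fgetc() that this behavior causes. The explanation is concise and easy to follow, so I am copying it here for future reference.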

Extending of unsigned and signed integers also occurs in C. Variables in
C may be declared as either signed or unsigned (int is signed). Consider
the code in Figure 2.1. In line 3, the variable a is extended using the rules
for unsigned values (using MOVZX), but in line 4, the signed rules are used
for b (using MOVSX).

1 unsigned char uchar = 0xFF;
2 signed char schar = 0xFF;
3 int a = (int) uchar;   /* a = 255 (0x000000FF) */
4 int b = (int) schar;   /* b = −1 (0xFFFFFFFF) */
Figure 2.1
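
A runnable sketch of Figure 2.1 may help when experimenting. It assumes a 32-bit, two's complement int, as on most current platforms. The function names are made up for illustration, and memcpy is used to place the bit pattern FF into the signed char so that no out-of-range conversion is involved.

```c
#include <stdio.h>
#include <string.h>

/* Widen the byte FF as an unsigned char: zero extension (MOVZX on x86). */
int widen_unsigned(void)
{
    unsigned char uchar = 0xFF;
    return (int) uchar;
}

/* Widen the same bit pattern as a signed char: sign extension (MOVSX on x86). */
int widen_signed(void)
{
    unsigned char bits = 0xFF;
    signed char schar;

    memcpy(&schar, &bits, 1);   /* copy the bit pattern FF unchanged */
    return (int) schar;
}

/* Print both results in decimal and as 32-bit hex (assumes 32-bit int). */
void print_widened(void)
{
    printf("a = %d (0x%08X)\n", widen_unsigned(), (unsigned) widen_unsigned());
    printf("b = %d (0x%08X)\n", widen_signed(), (unsigned) widen_signed());
}
```

Calling print_widened() from main() should show the same two values as the comments in Figure 2.1.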

char ch;
while( (ch = fgetc(fp)) != EOF ) {
    /* do something with ch */
}
Figure 2.2

There is a common C programming bug that directly relates to this
subject. Consider the code in Figure 2.2. The prototype of fgetc() is:

int fgetc( FILE * );

One might ask why the function returns an int when it reads
characters. The reason is that it normally does return a char (extended
to an int value using zero extension). However, there is one value
that it may return that is not a character, EOF. This is a macro that is
usually defined as −1. Thus, fgetc() either returns a char extended
to an int value (which looks like 000000xx in hex) or EOF (which looks like
FFFFFFFF in hex).
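
The two kinds of return value can be observed directly. The sketch below is only an illustration: nth_fgetc_result is a hypothetical helper, the −2 error code is an arbitrary choice, and the temporary file comes from the standard tmpfile() function.

```c
#include <stdio.h>

/* Return the result of the n-th fgetc() call (n >= 1) on a temporary file
   that contains the single byte FF, or -2 (an arbitrary error code chosen
   for this sketch) if the temporary file cannot be created. */
int nth_fgetc_result(int n)
{
    FILE *fp = tmpfile();
    int result = EOF;
    int i;

    if (fp == NULL)
        return -2;
    fputc(0xFF, fp);
    rewind(fp);
    for (i = 0; i < n; i++)
        result = fgetc(fp);   /* kept as int, so nothing is truncated */
    fclose(fp);
    return result;
}
```

The first call reads the byte and yields 255 (000000FF), while the second call hits end of file and yields EOF.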

The basic problem with the program in Figure 2.2 is that fgetc() returns
an int, but this value is stored in a char. C will truncate the higher
order bits to fit the int value into the char. The problem is that the
numbers (in hex) 000000FF and FFFFFFFF will both be truncated to the
byte FF. Thus, the while loop cannot distinguish between reading the byte
FF from the file and end of file.
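
The collision can be checked in a few lines. This is a minimal sketch that assumes EOF is −1, as it is on common implementations; low_byte and ff_and_eof_collide are names invented here.

```c
#include <stdio.h>

/* Keep only the low-order byte of an int, as storing it in a char does. */
unsigned char low_byte(int value)
{
    return (unsigned char) value;
}

/* Returns 1 when the byte FF and EOF collapse to the same stored byte. */
int ff_and_eof_collide(void)
{
    return low_byte(0x000000FF) == low_byte(EOF);
}
```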

Exactly what the code does in this case depends on whether char is
signed or unsigned. Why? Because in line 2 of Figure 2.2 (the while
condition), ch is compared with EOF. Since EOF is an int value, ch will be
extended to an int so that the two values being compared are of the same
size. As Figure 2.1 showed, whether the variable is signed or unsigned is
very important.

If char is unsigned, FF is extended to be 000000FF. This is compared to
EOF (FFFFFFFF) and found to be not equal. Thus, the loop never ends!
If char is signed, FF is extended to FFFFFFFF. This does compare as
equal and the loop ends. However, since the byte FF may have been read
from the file, the loop could be ending prematurely.
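
Both outcomes can be reproduced without relying on the platform's default for char. The sketch below models "store in a char, then widen for the comparison" with small helper functions. All names are hypothetical, MAX_ITER is an artificial cap so that the never-ending case still returns, and the signed case assumes that EOF is −1 and that converting 255 to signed char gives −1, as GCC and most compilers do.

```c
#include <stdio.h>

#define MAX_ITER 10  /* hypothetical cap so the never-ending case still returns */

/* Each helper models storing the int from fgetc() in a variable of the
   given type and widening it back to int for the comparison with EOF. */
static int via_signed_char(int c)   { return (signed char) c; }    /* sign extension */
static int via_unsigned_char(int c) { return (unsigned char) c; }  /* zero extension */
static int via_int(int c)           { return c; }                  /* nothing is lost */

/* Run the loop of Figure 2.2 over a temporary file holding 'A', FF, 'B'.
   Returns the number of loop iterations, or -1 if no file could be created. */
static int count_iterations(int (*store)(int))
{
    FILE *fp = tmpfile();
    int n = 0;

    if (fp == NULL)
        return -1;
    fputc('A', fp);
    fputc(0xFF, fp);
    fputc('B', fp);
    rewind(fp);

    while (n < MAX_ITER && store(fgetc(fp)) != EOF)
        n++;

    fclose(fp);
    return n;
}

int iterations_signed_char(void)   { return count_iterations(via_signed_char); }
int iterations_unsigned_char(void) { return count_iterations(via_unsigned_char); }
int iterations_int(void)           { return count_iterations(via_int); }
```

With a signed char the loop stops after one byte, because the FF in the file is mistaken for EOF. With an unsigned char it runs until the cap is reached. With an int it processes exactly the three bytes in the file.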

The solution to this problem is to define the ch variable as an int, not a
char. When this is done, no truncation or extension takes place in line 2
(the while condition). Inside the loop, it is safe to truncate the value,
since ch must actually hold a simple byte there.
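
A corrected loop might look like the following sketch. read_bytes and demo_read_bytes are illustrative names that are not from the book, and the buffer handling is just one possible way to "do something with ch".

```c
#include <stdio.h>

/* Read fp to end of file, storing at most cap bytes in buf.
   Returns the total number of bytes read from the file. */
size_t read_bytes(FILE *fp, unsigned char *buf, size_t cap)
{
    int ch;                 /* int, so EOF stays distinct from the byte FF */
    size_t n = 0;

    while ((ch = fgetc(fp)) != EOF) {
        if (n < cap)
            buf[n] = (unsigned char) ch;  /* safe: ch holds a byte value here */
        n++;
    }
    return n;
}

/* Hypothetical self-check: returns 1 if 'A', FF, 'B' are read back intact. */
int demo_read_bytes(void)
{
    unsigned char buf[3] = {0};
    FILE *fp = tmpfile();
    size_t n;

    if (fp == NULL)
        return 0;
    fputc('A', fp);
    fputc(0xFF, fp);
    fputc('B', fp);
    rewind(fp);
    n = read_bytes(fp, buf, sizeof buf);
    fclose(fp);
    return n == 3 && buf[0] == 'A' && buf[1] == 0xFF && buf[2] == 'B';
}
```

Narrowing to unsigned char happens only inside the loop body, after the comparison with EOF has already been made on the full int value.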
