Dark Corners of C: Side Effects and Sequence Points - slimzhao - ChinaUnix blog


Category: C/C++

2010-06-21 18:01:14

Side Effects and Sequence Points

int a = i++;

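The variable a receives the value that i had before the increment. The primary effect of the expression i++ is to yield the value of i (an rvalue, not an lvalue); its side effect is to make sure that i is incremented by 1. Everybody knows that. What follows is something not everybody knows: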

      mov eax, [esp-12]          ; load the value of i
      mov [esp-16], eax          ; store that value into a
      inc [esp-12]               ; increment i by 1

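Swapping the second and third instructions would be equally valid. In a simple statement like this, that is nothing remarkable. But consider: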

      int i = 1;
      int a = (i++) + (i++);

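Even though I did not write something as perverse as i+++++i, and even though I added redundant parentheses to make the intent plain, the value of a still hides a trap: more than one result is consistent with the C standard. It may be 2 or 3, because the standard does not specify when the side effects take place in this case, and the compiler is free to decide. (Strictly, the standard goes further and makes the behavior undefined, as explained later in this post.)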

      mov eax, [esp-12]          ; load the value of i
      inc [esp-12]               ; increment i by 1
      mov ebx, [esp-12]          ; load the value of i
      inc [esp-12]               ; increment i by 1
      add eax, ebx               ; add
      mov [esp-16], eax          ; store the result into a

This yields 3.

      mov eax, [esp-12]          ; load the value of i
      mov ebx, [esp-12]          ; load the value of i
      inc [esp-12]               ; increment i by 1
      inc [esp-12]               ; increment i by 1
      add eax, ebx               ; add
      mov [esp-16], eax          ; store the result into a

This yields 2.
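The two instruction orderings above can be mimicked in well-defined C by splitting the expression into separate statements, so that every read and every increment is separated by a sequence point. This is only a sketch: the function names are made up for illustration, and a real compiler is not limited to these two outcomes.

```c
#include <assert.h>

/* Mirrors the first listing: read, increment, read, increment. */
int sum_increment_between_reads(int *i)
{
    int first = *i;     /* read i */
    (*i)++;             /* first increment */
    int second = *i;    /* read i again, now one larger */
    (*i)++;             /* second increment */
    return first + second;
}

/* Mirrors the second listing: read, read, increment, increment. */
int sum_increment_after_reads(int *i)
{
    int first = *i;     /* read i */
    int second = *i;    /* read i again, same value */
    (*i)++;             /* first increment */
    (*i)++;             /* second increment */
    return first + second;
}

/* Hypothetical usage check, starting from i = 1 as in the post. */
void demo_two_orderings(void)
{
    int i = 1;
    assert(sum_increment_between_reads(&i) == 3 && i == 3);
    i = 1;
    assert(sum_increment_after_reads(&i) == 2 && i == 3);
}
```

Both functions leave i at 3. Only the sum differs, which is exactly the ambiguity the single-expression version hides.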

The C99 standard says (5.1.2.3, paragraph 2):

2 Accessing a volatile object, modifying an object, modifying a file, or calling a function
  that does any of those operations are all side effects, which are changes in the state of
  the execution environment. Evaluation of an expression may produce side effects. At
  certain specified points in the execution sequence called sequence points, all side effects
  of previous evaluations shall be complete and no side effects of subsequent evaluations
  shall have taken place. (A summary of the sequence points is given in annex C.)

Annex C lists the sequence points:

Sequence points
1 The following are the sequence points described in 5.1.2.3:
--- The call to a function, after the arguments have been evaluated (6.5.2.2).
--- The end of the first operand of the following operators: logical AND && (6.5.13);
    logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
--- The end of a full declarator: declarators (6.7.5);
--- The end of a full expression: an initializer (6.7.8); the expression in an expression
    statement (6.8.3); the controlling expression of a selection statement (if or switch)
    (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
    expressions of a for statement (6.8.5.3); the expression in a return statement
    (6.8.6.4).
--- Immediately before a library function returns (7.1.4).
--- After the actions associated with each formatted input/output function conversion
    specifier (7.19.6, 7.24.2).
--- Immediately before and immediately after each call to a comparison function, and
    also between any call to a comparison function and any movement of the objects
    passed as arguments to that call (7.20.5).

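Why the name "sequence point"? My guess is that the term was chosen with this in mind: the machine that ultimately executes the program (or the abstract machine that C imagines) has to implement each C statement with more primitive operations. These operations do not necessarily correspond one-to-one with C's high-level statements, so certain special points among them need to be identified: when execution reaches such a point, a C statement or expression has completed all of its semantics, side effects included. At these points the C expressions are semantically complete. Take int a = i++; for example: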

      mov eax, [esp-12]          ; load the value of i
      mov [esp-16], eax          ; store that value into a
      inc [esp-12]               ; increment i by 1

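The sequence point cannot fall after the first or the second instruction, because at those moments the state of i++ is still undetermined: its side effect has not yet been carried out.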
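Seen from the C side, the sequence point in question is the end of the initializer, which is a full expression. A minimal sketch, assuming a hypothetical helper name and a value of *i below INT_MAX so that the increment cannot overflow:

```c
#include <assert.h>

/* Returns the value stored in a; *i is incremented as a side effect. */
int copy_then_increment(int *i)
{
    int a = (*i)++;     /* sequence point at the end of this initializer */

    /* From here on, both effects are complete: a holds the old value
       and *i holds the incremented one, whichever order was chosen. */
    assert(*i == a + 1);
    return a;
}
```

Called with i equal to 1, the function returns 1 and leaves i at 2, regardless of whether the compiler placed the store to a before or after the increment.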

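The precise definition of sequence points determines the range within which the result is not defined by the standard when side effects hit the same object more than once. I once saw a poster called "时代兔子" write: "The standard stipulates that between two sequence points, the stored value of an object may be modified at most once."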

This "may be modified at most once" invites readings such as:

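If side effects modify the same object twice between two sequence points, the program will in fact modify it only once at run time.
The programmer is only allowed to modify it once; if it is modified more than once, (1) will the compiler report an error? (2) what happens at run time?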

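What it should say is: if an object is modified more than once between two sequence points, the behavior is undefined. Annex J.2 of C99 lists this among the undefined behaviors: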

Between two sequence points, an object is modified more than once, or is modified
and the prior value is read other than to determine the value to be stored (6.5).

This also covers a case like:

     a = i + i++;

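In other words: between two sequence points, an object is modified more than once, or (even though it is modified only once) it is also read, and that read is not used to compute the new value. Suppose the i++ above is implemented by (1) loading the value of i into a register, (2) adding 1 to the register, and (3) storing the register back into the memory location of i. Then the read in step (1) is the read "to determine the value to be stored". The sentence in the standard means that if, between two sequence points, there is any other read besides this one, the behavior is undefined even though the object is modified only once. In i + i++, obtaining the value of i as the first operand of + requires a read, and that read is not the one used to modify i.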
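One way to stay out of trouble is to decide which meaning is intended and write it as separate statements. The sketch below shows two plausible intentions behind a = i + i++. Both function names are hypothetical, and neither is "the" meaning of the original expression, which has none.

```c
#include <assert.h>

/* Intention A: both operands see the old value, then i is incremented. */
int add_then_increment(int *i)
{
    int a = *i + *i;
    (*i)++;
    return a;
}

/* Intention B: the increment happens first and the left operand sees the new value. */
int increment_then_add(int *i)
{
    int old = (*i)++;   /* value yielded by i++ */
    return *i + old;    /* left operand reads the updated i */
}

/* Hypothetical usage check, starting from i = 1. */
void demo_explicit_orderings(void)
{
    int i = 1;
    assert(add_then_increment(&i) == 2 && i == 2);
    i = 1;
    assert(increment_then_add(&i) == 3 && i == 2);
}
```

In both versions every read of i that is not part of the increment is separated from the modification by a sequence point, so the rule quoted above is satisfied.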

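I believe that even for people who understand the concept of sequence points, this case is more insidious than modifying the same object more than once between two adjacent sequence points.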

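Note that implementing i++ with steps (1)(2)(3) above is entirely plausible, because not every instruction set can increment the contents of a memory location directly the way Intel's can. It may well be that every modification of memory has to go through a register (as in MIPS, a typical RISC instruction set).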
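Steps (1)(2)(3) can be spelled out in C as a rough model of what a load/store machine does. The local variable reg merely stands in for a machine register, and the function name is hypothetical.

```c
#include <assert.h>

/* Post-increment written as load, add, store; returns the old value of *i. */
int post_increment_load_store(int *i)
{
    int reg = *i;       /* (1) load the value of i into a "register" */
    int old = reg;      /* the value that the expression i++ yields */
    reg = reg + 1;      /* (2) add 1 in the register */
    *i = reg;           /* (3) store the register back into i */
    return old;
}

/* Hypothetical usage check. */
void demo_load_store_increment(void)
{
    int i = 5;
    assert(post_increment_load_store(&i) == 5 && i == 6);
}
```

Any other read of i that the compiler schedules between steps (1) and (3) may see either the old or the new value, which is why the standard refuses to define the result.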
