Category: C/C++
2013-01-21 23:43:32
Here’s a short quiz. What will the following code print?

    #include <iostream>
    using namespace std;

    class Foo {
    public:
        Foo(const char* s = "") {
            cerr << "Constructing Foo with " << s << endl;
        }
    };

    void somefunc() {
        static Foo funcstatic("funcstatic");
        Foo funcauto("funcauto");
    }

    static Foo glob("global");

    int main() {
        cerr << "Entering main\n";
        somefunc();
        somefunc();
        somefunc();
        return 0;
    }
Try to think about it for a moment before reading on. Foo is a dummy class with the sole purpose of demonstrating when its constructor is being called. There are a few Foo instances here: one global, one function static (by which I mean static in a function scope) and one function local (automatic).
Recently I ran into (a variation of) this code and was surprised that its output is:
    Constructing Foo with global
    Entering main
    Constructing Foo with funcstatic
    Constructing Foo with funcauto
    Constructing Foo with funcauto
    Constructing Foo with funcauto
What’s surprising here is the construction of funcstatic happening after entering main. Actually, it’s happening when somefunc is first called. Why was I surprised? Because I always kind-of assumed that function static variables are handled similarly to global static variables, except their visibility is limited only to the function. While this is true in C, it’s only partially true in C++, and here’s why.
In C++, variables not only have to be initialized – sometimes, they also have to be constructed. While for POD (Plain Old Data) types the behavior is C-like (the compiler just writes the initialization value into the .data segment, no special code required), for types with custom constructors this can’t work. Some code has to be generated to call these constructors.
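The difference between the two kinds of function statics can be sketched as follows. This is only an illustration: `next_ticket`, `Widget`, `widget_value` and the counter are hypothetical names that do not appear in the original code, and the counter exists purely to make the constructor call observable.

```cpp
#include <cassert>

// Case 1: a POD static with a constant initializer. The value 100 can be
// stored directly in the data segment, so no constructor code is needed.
int next_ticket() {
    static int ticket = 100;
    return ticket++;
}

// Case 2: a hypothetical class with a user-written constructor. Some code
// has to run in order to construct it.
static int g_widget_ctor_calls = 0;  // counts constructor invocations

struct Widget {
    int value;
    explicit Widget(int v) : value(v) { ++g_widget_ctor_calls; }
};

int widget_value() {
    static Widget w(7);  // the compiler must emit a constructor call for this
    return w.value;
}

// Exercises both cases; the static objects keep their state across calls.
void demo_pod_vs_constructed() {
    assert(next_ticket() == 100);
    assert(next_ticket() == 101);
    assert(widget_value() == 7);
    assert(widget_value() == 7);
    assert(g_widget_ctor_calls == 1);  // constructed once, not once per call
}
```

Calling `demo_pod_vs_constructed()` from `main` runs the checks. Both statics persist across calls, but only the second one needs generated code to come into existence.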
It turns out that in case of function static variables, this code can be placed in the function and thus is executed when the function is first called. This behavior is actually allowed by the C++ standard. Here’s an excerpt from section 6.7 of a working draft (N1095) of the current C++ standard (C++98):
The zero-initialization (8.5) of all local objects with static storage duration (3.7.1) is performed before any other initialization takes place. A local object of POD type (3.9) with static storage duration initialized with constant-expressions is initialized before its block is first entered. An implementation is permitted to perform early initialization of other local objects with static storage duration under the same conditions that an implementation is permitted to statically initialize an object with static storage duration in namespace scope (3.6.2). Otherwise such an object is initialized the first time control passes through its declaration; such an object is considered initialized upon the completion of its initialization.
What this means, less formally, is that while the compiler is permitted to construct function static variables early, together with the globals, it’s also free to do so inside the function, the first time control passes through the declaration.
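The "first time control passes through its declaration" rule is easy to observe when the declaration sits behind a branch. The sketch below assumes a compiler that defers construction to first use, as the mainstream ones do. `Tracer`, `maybe_construct` and `g_events` are hypothetical names introduced for the illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log that records when things happen.
static std::vector<std::string> g_events;

struct Tracer {
    explicit Tracer(const char* name) { g_events.push_back(name); }
};

void maybe_construct(bool take_branch) {
    g_events.push_back("enter");
    if (take_branch) {
        // Constructed only when control first reaches this declaration.
        static Tracer t("branch-static");
    }
}

void demo_first_pass() {
    maybe_construct(false);  // declaration never reached: no construction
    assert(g_events.size() == 1);

    maybe_construct(true);   // first pass through the declaration
    assert(g_events.size() == 3);
    assert(g_events[2] == "branch-static");

    maybe_construct(true);   // already constructed: nothing new happens
    assert(g_events.size() == 4);
}
```

Calling the function without taking the branch leaves the object unconstructed, which would not be possible if the constructor had run before `main`.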
And apparently, most modern compilers indeed choose to construct function static objects when the function is first called. This makes sense as an optimization: calling too many constructors before main runs can have a negative impact on program start-up time. Not to mention that dependencies between statically constructed objects are one of the trickier problems C++ has to offer.
But herein lies a problem: this construction of static function variables is not thread safe! If somefunc is being called from multiple threads, it may so happen that the constructor of funcstatic will be called multiple times. After all, being static, funcstatic is shared between all threads. The C++ standard doesn’t protect us from this happening – it doesn’t even acknowledge the existence of threads (this is C++98 we’re talking about).
So keep this in mind: such code is not thread safe. You cannot assume that, in the presence of multiple threads, the function static variable will be constructed only once. It is the job of the programmer to guarantee that it is.
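One way a programmer might provide that guarantee is to take a lock around the lazy construction. The following is a minimal sketch, not a recommended pattern for every code base. It assumes a C++11 or later compiler for `std::mutex`; a C++98 code base would have to substitute a platform lock such as a pthread mutex or a Windows critical section. C++11 and later also require compilers to make this initialization thread safe on their own, so an explicit lock mainly matters for older toolchains. `get_foo`, `g_foo_mutex` and the counter are hypothetical names.

```cpp
#include <cassert>
#include <mutex>

static int g_foo_ctor_calls = 0;  // counts constructor invocations

class Foo {
public:
    explicit Foo(const char* s) : name_(s) { ++g_foo_ctor_calls; }
    const char* name() const { return name_; }
private:
    const char* name_;
};

// Namespace-scope lock. std::mutex has a constexpr constructor, so the lock
// is usable before any thread can reach get_foo().
static std::mutex g_foo_mutex;
static Foo* g_foo = nullptr;

Foo& get_foo() {
    // Every caller takes the lock, so the check and the construction
    // happen as one indivisible step.
    std::lock_guard<std::mutex> lock(g_foo_mutex);
    if (g_foo == nullptr) {
        g_foo = new Foo("guarded");  // intentionally leaked for simplicity
    }
    return *g_foo;
}

void demo_guarded_init() {
    Foo& a = get_foo();
    Foo& b = get_foo();
    assert(&a == &b);               // same object every time
    assert(g_foo_ctor_calls == 1);  // constructed exactly once
}
```

The price is a lock acquisition on every call. Whether that matters depends on how hot the function is.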
This is the main point I wanted to make in this post. The rest is going to examine in more detail the code generated by popular compilers for this scenario and discuss the implications.
Let’s start with MS Visual C++ 2008. Here’s the disassembly of somefunc, skipping the function prologue:
    static Foo funcstatic("funcstatic");
    00E314FD  mov   eax,dword ptr [$S1 (0E3A148h)]
    00E31502  and   eax,1
    00E31505  jne   somefunc+71h (0E31531h)
    00E31507  mov   eax,dword ptr [$S1 (0E3A148h)]
    00E3150C  or    eax,1
    00E3150F  mov   dword ptr [$S1 (0E3A148h)],eax
    00E31514  mov   dword ptr [ebp-4],0
    00E3151B  push  offset string "funcstatic" (0E3890Ch)
    00E31520  mov   ecx,offset funcstatic (0E3A14Ch)
    00E31525  call  Foo::Foo (0E31177h)
    00E3152A  mov   dword ptr [ebp-4],0FFFFFFFFh
    Foo funcauto("funcauto");
    00E31531  push  offset string "funcauto" (0E38900h)
    00E31536  lea   ecx,[ebp-11h]
    00E31539  call  Foo::Foo (0E31177h)
Here’s what this does: a special flag is kept in memory (at address 0x0E3A148 for this particular run). Its goal is to make sure the constructor of funcstatic is only called once. The code fetches the flag into eax and looks at its lowest bit. If that bit is already turned on, it just skips the call and goes to the next line. Otherwise, it places 1 in the lowest bit and calls the constructor.
Note how this code blissfully ignores the existence of threads. Suppose two threads, A and B, enter somefunc simultaneously. Both can check the flag at the same time, see it’s still 0 and then call the constructor. Nothing here prevents that from happening. And this is all good and fine according to the C++ standard.
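Written back as C++, the logic of that disassembly looks roughly like the sketch below. It is a hand-made approximation, not actual compiler output. `g_init_flags` stands in for the `$S1` flag, `g_storage` for the memory reserved for funcstatic, and `somefunc_lowered` is a hypothetical name. `alignas` assumes C++11 or later.

```cpp
#include <cassert>
#include <new>

static int g_ctor_calls = 0;  // counts constructor invocations

class Foo {
public:
    explicit Foo(const char*) { ++g_ctor_calls; }
};

static unsigned g_init_flags = 0;                          // role of $S1
alignas(Foo) static unsigned char g_storage[sizeof(Foo)];  // room for funcstatic

void somefunc_lowered() {
    if ((g_init_flags & 1u) == 0) {  // mov / and / jne: test the lowest bit
        // Race window: a second thread arriving here has also seen a 0 bit
        // and will run the constructor as well.
        g_init_flags |= 1u;          // or / mov: in the listing the flag is
                                     // set before the constructor is called
        new (g_storage) Foo("funcstatic");  // call Foo::Foo
    }
    Foo funcauto("funcauto");        // the automatic object, built every call
}

void demo_lowered() {
    somefunc_lowered();
    somefunc_lowered();
    somefunc_lowered();
    // One construction of the static plus three of the automatic object.
    assert(g_ctor_calls == 4);
}
```

In a single thread the flag does its job. Because the listing stores the flag before calling the constructor, a second thread could in principle also see the bit set and use an object that has not finished constructing.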
With GCC, however, things get more interesting. Here’s the same function compiled with g++ -O0 -g:
    0000000000400a9d <_Z8somefuncv>:
      400a9d: 55                    push   rbp
      400a9e: 48 89 e5              mov    rbp,rsp
      400aa1: 48 83 ec 40           sub    rsp,0x40
      400aa5: b8 a8 21 60 00        mov    eax,0x6021a8
      400aaa: 0f b6 00              movzx  eax,BYTE PTR [rax]
      400aad: 84 c0                 test   al,al
      400aaf: 75 76                 jne    400b27 <_Z8somefuncv+0x8a>
      400ab1: bf a8 21 60 00        mov    edi,0x6021a8
      400ab6: e8 cd fd ff ff        call   400888 <__cxa_guard_acquire@plt>
      400abb: 85 c0                 test   eax,eax
      400abd: 0f 95 c0              setne  al
      400ac0: 84 c0                 test   al,al
      400ac2: 74 63                 je     400b27 <_Z8somefuncv+0x8a>
      400ac4: c6 45 df 00           mov    BYTE PTR [rbp-0x21],0x0
      400ac8: be aa 0c 40 00        mov    esi,0x400caa
      400acd: bf b0 21 60 00        mov    edi,0x6021b0
      400ad2: e8 89 00 00 00        call   400b60 <_ZN3FooC1EPKc>
      400ad7: c6 45 df 01           mov    BYTE PTR [rbp-0x21],0x1
      400adb: bf a8 21 60 00        mov    edi,0x6021a8
      400ae0: e8 03 fe ff ff        call   4008e8 <__cxa_guard_release@plt>
      400ae5: eb 40                 jmp    400b27 <_Z8somefuncv+0x8a>
      400ae7: 48 89 45 c8           mov    QWORD PTR [rbp-0x38],rax
      400aeb: 48 89 55 d0           mov    QWORD PTR [rbp-0x30],rdx
      400aef: 8b 45 d0              mov    eax,DWORD PTR [rbp-0x30]
      400af2: 89 45 ec              mov    DWORD PTR [rbp-0x14],eax
      400af5: 48 8b 45 c8           mov    rax,QWORD PTR [rbp-0x38]
      400af9: 48 89 45 e0           mov    QWORD PTR [rbp-0x20],rax
      400afd: 0f b6 45 df           movzx  eax,BYTE PTR [rbp-0x21]
      400b01: 83 f0 01              xor    eax,0x1
      400b04: 84 c0                 test   al,al
      400b06: 74 0a                 je     400b12 <_Z8somefuncv+0x75>
      400b08: bf a8 21 60 00        mov    edi,0x6021a8
      400b0d: e8 06 fe ff ff        call   400918 <__cxa_guard_abort@plt>
      400b12: 48 8b 45 e0           mov    rax,QWORD PTR [rbp-0x20]
      400b16: 48 89 45 c8           mov    QWORD PTR [rbp-0x38],rax
      400b1a: 48 63 45 ec           movsxd rax,DWORD PTR [rbp-0x14]
      400b1e: 48 8b 7d c8           mov    rdi,QWORD PTR [rbp-0x38]
      400b22: e8 11 fe ff ff        call   400938 <_Unwind_Resume@plt>
      400b27: 48 8d 7d ff           lea    rdi,[rbp-0x1]
What’s going on here? It turns out that, by default, GCC generates "guard" calls that ensure multi-threaded safety for this kind of initialization. To better understand the code above, see the relevant section of the Itanium C++ ABI (which GCC follows). GCC also lets you disable these guards by passing the -fno-threadsafe-statics flag during compilation. With this flag, the code GCC generates for our sample is quite similar to the one generated by MSVC.
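The shape of the guarded code can be modelled in portable C++ as below. This is a simplified sketch of the acquire / release / abort protocol visible in the listing, not the real runtime implementation. `ModelGuard` and the `model_guard_*` functions are hypothetical stand-ins for the guard variable and the `__cxa_guard_*` calls, and a C++11 or later compiler is assumed for `std::atomic` and `std::mutex`.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <new>

static int g_ctor_calls = 0;  // counts constructor invocations

class Foo {
public:
    explicit Foo(const char*) { ++g_ctor_calls; }
};

// Hypothetical stand-in for the guard variable.
struct ModelGuard {
    std::atomic<bool> done{false};  // role of the byte tested inline
    std::mutex lock;
};

// Returns true if the caller must run the initializer; the caller then
// holds the lock until it calls release or abort.
bool model_guard_acquire(ModelGuard& g) {
    g.lock.lock();
    if (g.done.load(std::memory_order_relaxed)) {
        g.lock.unlock();  // another thread finished while we were waiting
        return false;
    }
    return true;
}

// Marks initialization as complete and lets waiting threads proceed.
void model_guard_release(ModelGuard& g) {
    g.done.store(true, std::memory_order_release);
    g.lock.unlock();
}

// The initializer threw: leave the guard unset so a later call retries.
void model_guard_abort(ModelGuard& g) { g.lock.unlock(); }

static ModelGuard g_guard;
alignas(Foo) static unsigned char g_storage[sizeof(Foo)];
static Foo* g_funcstatic = nullptr;

Foo& funcstatic_guarded() {
    if (!g_guard.done.load(std::memory_order_acquire)) {  // inline fast path
        if (model_guard_acquire(g_guard)) {
            try {
                g_funcstatic = new (g_storage) Foo("funcstatic");
            } catch (...) {
                model_guard_abort(g_guard);
                throw;
            }
            model_guard_release(g_guard);
        }
    }
    return *g_funcstatic;
}

void demo_guarded() {
    Foo& a = funcstatic_guarded();
    Foo& b = funcstatic_guarded();
    assert(&a == &b);
    assert(g_ctor_calls == 1);
}
```

The inline check keeps the common case cheap once the object exists, while the lock serializes the threads that arrive during construction. The abort path corresponds to the exception-handling code in the second half of the listing.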
On one hand, this is nice of GCC to do. On the other hand, it’s one of those things that introduce insidious portability problems. Develop the code for GCC and everything is peachy for function static constructors – no multithreading problems because of the guard code. Then port the code to Windows and start witnessing intermittent failures due to races between threads. Not fun.
The only solution is, of course, to write code that adheres to the C++ standard and doesn’t make assumptions that must not be made.