Category: C/C++
2008-07-23 13:04:30
Even relatively new C programmers have no trouble reading simple C declarations such as
int foo[5]; // foo is an array of 5 ints
char *foo; // foo is a pointer to char
double foo(); // foo is a function returning a double
but as the declarations get a bit more involved, it's more difficult to know exactly what you're looking at.
char *(*(**foo[][8])())[]; // huh ?????
It turns out that the rules for reading an arbitrarily-complex C variable declaration are easily learned by even beginning programmers (though how to actually use the variable so declared may be well out of reach).
This Tech Tip shows how to do it.
In addition to one variable name, a declaration is composed of one "basic type" and zero or more "derived types", and it's crucial to understand the distinction between them.
The complete list of basic types is:
• char • signed char • unsigned char
• short • unsigned short
• int • unsigned int
• long • unsigned long
• float • double • void
• struct tag • union tag • enum tag
• long long • unsigned long long • long double (ANSI/ISO C only)
A declaration can have exactly one basic type, and it's always on the far left of the expression.
The "basic types" are augmented with "derived types", and C has three of them:
• * pointer to
• [] array of
• () function returning
A derived type always modifies something that follows, whether it be the basic type or another derived type, and to make a declaration read properly one must always include the preposition ("to", "of", "returning"). Saying "pointer" instead of "pointer to" will make your declarations fall apart.
It's possible that a type expression may have no derived types (e.g., "int i" describes "i is an int"), or it can have many. Interpreting the derived types is usually the sticking point when reading a complex declaration, but this is resolved with operator precedence in the next section.
Almost every C programmer is familiar with the operator precedence tables, which give rules that say (for instance) multiply and divide have higher precedence than ("are performed before") addition or subtraction, and parentheses can be used to alter the grouping. This seems natural for "normal" expressions, but the same rules do indeed apply to declarations - they are type expressions rather than computational ones.
The "array of" [] and "function returning" () type operators have higher precedence than "pointer to" *, and this leads to some fairly straightforward rules for decoding.
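The effect of this precedence can be seen by declaring the same tokens with and without grouping parentheses. The following is a minimal sketch; the names arr_of_ptr, ptr_to_arr and the two helper functions are made up for illustration and do not come from the text.

```c
#include <stddef.h>

/* [] binds tighter than *, so this reads
   "arr_of_ptr is array of 3 pointer to int". */
int *arr_of_ptr[3];

/* Grouping parentheses force the * to be consumed first:
   "ptr_to_arr is pointer to array of 3 int". */
int (*ptr_to_arr)[3];

/* Element count of arr_of_ptr: it really is an array of pointers. */
size_t arr_of_ptr_count(void)
{
    return sizeof arr_of_ptr / sizeof arr_of_ptr[0];
}

/* Size of the thing ptr_to_arr points at: a whole array of 3 ints.
   sizeof does not evaluate its operand, so no dereference happens. */
size_t ptr_to_arr_target_size(void)
{
    return sizeof *ptr_to_arr;
}
```

Calling arr_of_ptr_count() gives the number of pointers in the array, while ptr_to_arr_target_size() gives the size of one int[3], which shows that the two declarations describe different types even though they use the same tokens.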
Always start with the variable name:
foo is ...
and always end with the basic type:
foo is ... int
The "filling in the middle" part is usually the trickier part, but it can be summarized with this rule:
"go right when you can, go left when you must"
Working your way out from the variable name, honor the precedence rules and consume derived-type tokens to the right as far as possible without bumping into a grouping parenthesis. Then go left to the matching paren.
We'll start with a simple example:
long **foo[7];
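We'll approach this systematically, focusing on just one or two small parts at a time as we develop the description in English.
Start with the variable name and end with the basic type:
foo is ... long
The name is touching two derived types, "array of 7" on the right and "pointer to" on the left. Go right when you can, so consume the "array of 7" first:
foo is array of 7 ... long
Nothing remains on the right, so go left and consume the first "pointer to":
foo is array of 7 pointer to ... long
Only one "pointer to" remains, so consume it as well:
foo is array of 7 pointer to pointer to long
This completes the declaration!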
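Reading long **foo[7] as "array of 7 pointer to pointer to long" also tells us how to use it, because expressions follow the same precedence as the declaration. The sketch below is only an illustration; the helper names read_through_foo and foo_element_count and the value 42 are assumptions, not part of the original example.

```c
#include <stddef.h>

long **foo[7];   /* foo is array of 7 pointer to pointer to long */

/* Store something in foo[0] and read it back through both pointers. */
long read_through_foo(void)
{
    static long value = 42;
    static long *p = &value;   /* pointer to long */

    foo[0] = &p;               /* each element is pointer to pointer to long */
    return **foo[0];           /* [] is applied first, then the two * */
}

/* Number of elements in foo. */
size_t foo_element_count(void)
{
    return sizeof foo / sizeof foo[0];
}
```

The expression **foo[0] mirrors the declaration: index the array first, then dereference twice to reach the long.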
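To really test our skills, we'll try a very complex declaration that may well never appear in real life (indeed, we're hard-pressed to think of how it could actually be used). But it shows that the rules scale to very complex declarations.
char *(*(**foo[][8])())[];
Working outward from the name with the same rule gives:
foo is array of array of 8 pointer to pointer to function returning pointer to array of pointer to char
We have no idea how this variable is useful, but at least we can describe the type correctly.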
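One way to check a reading like this is to build the type up in stages with typedefs and let the compiler compare the two spellings. The sketch below assumes the declaration in question is the one from the introduction, char *(*(**foo[][8])())[], with the first dimension set to 2 so that it defines an object. The typedef names, get_names and first_name_via_foo are hypothetical.

```c
#include <stddef.h>

/* The type, built one derived type at a time. */
typedef char *strings_t[];      /* array of pointer to char               */
typedef strings_t *getter_t();  /* function returning pointer to that     */
typedef getter_t **elem_t;      /* pointer to pointer to such a function  */

/* The declaration from the introduction (first dimension filled in). */
char *(*(**foo[2][8])())[];

/* Same object declared again through the typedefs. The compiler
   rejects this line if the two types are not compatible. */
extern elem_t foo[2][8];

static char *names[] = { "alpha", "beta" };

/* A function returning pointer to array of pointer to char. */
static char *(*get_names(void))[] { return &names; }

/* Walk the whole chain: index twice, dereference twice, call,
   dereference the returned array pointer, then index the array. */
char *first_name_via_foo(void)
{
    static getter_t *fp = get_names;  /* pointer to function            */

    foo[0][0] = &fp;                  /* pointer to pointer to function */
    return (*(**foo[0][0])())[0];
}
```

The expression in first_name_via_foo is the declaration read in the same order: array, array, pointer, pointer, function call, pointer, array.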
The C standard describes an "abstract declarator", which is used when a type needs to be described but not associated with a variable name. These are most often seen in two places -- casts, and as arguments to sizeof -- though unnamed parameters in a function prototype use them too, and they can look intimidating:
int (*(*)())()
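To the obvious question of "where does one start?", the answer is "find where the variable name would go, then treat it like a normal declaration". There is only one place where a variable name could possibly go, and locating it is actually straightforward. Using the syntax rules, we know that the variable name must be:
• to the right of all the "pointer to" derived-type tokens
• to the left of all the "array of" derived-type tokens
• to the left of all the "function returning" derived-type tokens
• inside all the grouping parentheses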
Looking at the example, we see that the rightmost "pointer to" sets one boundary, and the leftmost "function returning" sets another one:
int (*(* • ) • ())()
The • indicators show the only two places that could possibly hold the variable name, but the leftmost one is the only one that fits the "inside the grouping parens" rule. This gives us our declaration as:
int (*(*foo)())()
which our "normal" rules describe as:
foo is a pointer to function returning pointer to function returning int
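To see the abstract declarator at work in both places, the sketch below uses int (*(*)())() once as an argument to sizeof and once in a cast. The functions forty_two and pick and the three helpers are invented for this illustration.

```c
#include <stddef.h>

/* A function returning int. */
static int forty_two(void) { return 42; }

/* A function returning pointer to function returning int. */
static int (*pick(void))(void) { return forty_two; }

/* sizeof applied to the abstract declarator (no variable name). */
size_t abstract_size(void)
{
    return sizeof(int (*(*)())());
}

/* sizeof applied to a named variable declared with the same type. */
size_t named_size(void)
{
    int (*(*named)())() = pick;
    return sizeof named;
}

/* The same abstract declarator used in a cast: a generic function
   pointer is converted back to its real type and then called twice. */
int call_after_cast(void)
{
    void (*generic)(void) = (void (*)(void))pick;

    return ((int (*(*)())())generic)()();
}
```

The first call in call_after_cast invokes pick and yields a pointer to function returning int; the second call invokes that function.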
Not all combinations of derived types are allowed, and it's possible to create a declaration that perfectly follows the syntax rules but is nevertheless not legal in C (i.e., syntactically valid but semantically invalid). We'll touch on them here.
void *foo; // legal
void foo(); // legal
void foo; // not legal
void foo[]; // not legal
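Besides the void cases above, standard C also rejects arrays of functions, functions returning functions, and functions returning arrays. These restrictions come from the language rules rather than from the examples shown here. In each case a "pointer to" in the right place gives a legal type, as the following sketch suggests; all names in it are hypothetical.

```c
/* Not legal C (each of these is rejected by the compiler):
 *   int foo[5]();    array of function
 *   int foo()();     function returning function
 *   int foo()[5];    function returning array
 *   void foo[5];     array of void
 */

static int one(void) { return 1; }
static int two(void) { return 2; }

/* Legal: array of 2 pointer to function returning int. */
int (*table[2])(void) = { one, two };

/* Legal: function returning pointer to function returning int. */
int (*choose(int i))(void) { return table[i]; }

static int digits[3] = { 7, 8, 9 };

/* Legal: function returning pointer to array of 3 int. */
int (*get_digits(void))[3] { return &digits; }

/* Legal: pointer to void. */
void *as_void = digits;
```

For example, choose(1)() calls the second function in the table, and (*get_digits())[2] reads the last element of the array.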
On the Windows platform, it's common to decorate a function declaration with an indication of its calling convention. These tell the compiler which mechanism should be used to call the function in question, and the method used to call the function must be the same one which the function expects. They look like:
extern int __cdecl main(int argc, char **argv);
extern BOOL __stdcall DrvQueryDriverInfo(DWORD dwMode, PVOID pBuffer,
DWORD cbBuf, PDWORD pcbNeeded);
These decorations are very common in Win32 development, and are straightforward enough to understand. More information can be found in
Where it gets somewhat more tricky is when the calling convention must be incorporated into a pointer (including via a typedef), because the tag doesn't seem to fit into the normal scheme of things. These are often used (for instance) when dealing with the LoadLibrary() and GetProcAddress() API calls to call a function from a freshly-loaded DLL.
We commonly see this with typedefs:
typedef BOOL (__stdcall *PFNDRVQUERYDRIVERINFO)(
DWORD dwMode,
PVOID pBuffer,
DWORD cbBuf,
PDWORD pcbNeeded
);
...
/* get the function address from the DLL */
pfnDrvQueryDriverInfo = (PFNDRVQUERYDRIVERINFO)
GetProcAddress(hDll, "DrvQueryDriverInfo");
The calling convention is an attribute of the function, not the pointer, so the usual reading puts it after the "pointer to", even though in the declaration it sits inside the grouping parentheses, just before the *:
BOOL (__stdcall *foo)(...);
is read as:
foo is a pointer
to a __stdcall function
returning BOOL.
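The same pattern can be tried without loading a DLL. The sketch below is an assumption-laden illustration: CALLCONV is a hypothetical macro that expands to __stdcall on Windows and to nothing elsewhere (so the code also compiles on other platforms), and PFNADD, add and call_via_pointer are made-up names.

```c
/* __stdcall is a compiler extension used on Windows; CALLCONV is a
   hypothetical macro so the example also builds on other platforms. */
#if defined(_WIN32)
#define CALLCONV __stdcall
#else
#define CALLCONV
#endif

/* PFNADD is a pointer to a CALLCONV function returning int. */
typedef int (CALLCONV *PFNADD)(int a, int b);

/* A function declared with the same calling convention. */
static int CALLCONV add(int a, int b) { return a + b; }

/* The caller and the callee agree on the convention because it is
   part of the pointer's target type. */
int call_via_pointer(int a, int b)
{
    PFNADD pfn = add;

    return pfn(a, b);
}
```

Note where the tag appears: before the function name in the definition of add, but inside the grouping parentheses in the typedef, just as in the PFNDRVQUERYDRIVERINFO example.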