The Practice of Programming
Brian W. Kernighan, Rob Pike. The Practice of Programming. Addison-Wesley, 1999.
1 Style
2011-03-01 Tue
It is an old observation that the best writers sometimes disregard the rules of rhetoric. When they do so, however, the reader will usually find in the sentence some compensating merit, attained at the cost of the violation. Unless he is certain of doing as well, he will probably do best to follow the rules.
—William Strunk and E.B. White, The Elements of Style
1.1 Names
Use descriptive names for globals, short names for locals. It's also helpful to include a brief comment with the declaration of each global:
int npending = 0;  // current length of input queue
Global functions, classes, and structures should also have descriptive names that suggest their role in a program.
Be consistent. Give related things related names that show their relationship and highlight their difference.
Use active names for functions. Function names should be based on active verbs, perhaps followed by nouns:
now = date.getTime();
putchar('\n');
Functions that return a boolean (true or false) value should be named so that the return value is unambiguous. Thus
? if (checkoctal(c)) ...
does not indicate which value is true and which is false, while
if (isoctal(c)) ...
makes it clear that the function returns true if the argument is octal and false if not.
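As an illustration, here is one way such a predicate might be written. The notes only name the function; the body below is an assumption, a minimal sketch rather than the authors' code.

```c
#include <stdbool.h>

/* isoctal: report whether c is an octal digit ('0' through '7') */
bool isoctal(int c)
{
    return c >= '0' && c <= '7';
}
```

Because the name reads as a question, a call such as if (isoctal(c)) needs no comment to explain which branch handles octal digits.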
Be accurate. A name not only labels, it conveys information to the reader. A misleading name can result in mystifying bugs.
1.2
1.3 Expressions and Statements
Use spaces around operators to suggest grouping; more generally, format to help readability.
Indent to show structure. A consistent indentation style is the lowest-energy way to make a program's structure self-evident.
for (n++; n < 100; n++)
    field[n] = '\0';
*i = '\0';
return '\n';
Use the natural form for expressions. Write expressions as you might speak them aloud.
Parenthesize to resolve ambiguity. Parentheses specify grouping and can be used to make the intent clear even when they are not required.
When mixing unrelated operators, though, it's a good idea to parenthesize.
while ((c = getchar()) != EOF)
...
if ((x&MASK) == BITS)
...
leap_year = ((y%4 == 0) && (y%100 != 0)) || (y%400 == 0);
We also removed some of the blanks: grouping the operands of higher-precedence operators helps the reader to see the structure more quickly.
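The leap-year test can be wrapped in a small function to check that the parenthesized grouping behaves as intended. The function name is_leap_year is hypothetical, chosen for this sketch.

```c
/* is_leap_year: return 1 if y is a leap year in the Gregorian calendar, else 0 */
int is_leap_year(int y)
{
    /* divisible by 4 but not by 100, or divisible by 400 */
    return ((y%4 == 0) && (y%100 != 0)) || (y%400 == 0);
}
```

Since && binds more tightly than ||, the outer parentheses around the first pair are not required, but they spare the reader from having to recall that rule.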
Break up complex expressions.
? *x += (*xp=(2*k < (n-m) ? c[k+1] : d[k--]));
if (2*k < n-m)
    *xp = c[k+1];
else
    *xp = d[k--];
*x += *xp;
Be clear. The goal is to write clear code, not clever code.
Some constructs seem to invite abuse. The ?: operator can lead to mysterious code:
? child = (!LC&&!RC) ? 0 : (!LC ? RC : LC);
if (LC == 0 && RC == 0)
    child = 0;
else if (LC == 0)
    child = RC;
else
    child = LC;
The ?: operator is fine for short expressions where it can replace four lines of if-else with one, as in
max = (a > b) ? a : b;
but it is not a general replacement for conditional statements.
Clarity is not the same as brevity. Often the clearer code will be shorter, but it can also be longer. The proper criterion is ease of understanding.
Be careful with side effects. Operators like ++ have side effects: besides returning a value, they also modify an underlying variable. Side effects can be extremely convenient, but they can also cause trouble because the actions of retrieving the value and updating the variable might not happen at the same time.
? array[i++] = i;
If i is initially 3, the array element might be set to 3 or 4.
It's not just increments and decrements that have side effects; I/O is another source of behind-the-scenes action. This example is an attempt to read two related numbers from standard input:
? scanf("%d %d", &yr, &profit[yr]);
It is broken because part of the expression modifies yr and another part uses it. The real issue is that all the arguments to scanf are evaluated before the routine is called, so &profit[yr] will always be evaluated using the old value of yr. The fix is to break up the expression:
scanf("%d", &yr);
scanf("%d", &profit[yr]);
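Another way to avoid the problem is to read both numbers into temporaries and index the array only afterwards. The sketch below is an assumption-laden variant, not the original code: it uses sscanf on a string instead of scanf on standard input, and the function name store_profit and its bounds check are hypothetical additions.

```c
#include <stdio.h>

/* store_profit: parse "year profit" from line and record it in profit[];
   return 1 on success, 0 if the line is malformed or the year is outside
   0..nyears-1 */
int store_profit(const char *line, int profit[], int nyears)
{
    int yr, value;

    if (sscanf(line, "%d %d", &yr, &value) != 2)
        return 0;
    if (yr < 0 || yr >= nyears)
        return 0;
    profit[yr] = value;   /* yr is fully read before it is used as an index */
    return 1;
}
```

Checking the value returned by sscanf also catches malformed input, which the two bare scanf calls would silently ignore.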
1.4 Consistency and Idioms
Consistency leads to better programs.
Use a consistent indentation and brace style. The specific style is much less important than its consistent application. Pick one style, use it consistently.
When one if immediately follows another, always use braces:
if (month == FEB) {
    if (year%4 == 0) {
        if (day > 29)
            legal = FALSE;
    } else {
        if (day > 28)
            legal = FALSE;
    }
}
By the way, if you work on a program you didn't write, preserve the style you find there. The program's consistency is more important than your own, because it makes life easier for those who follow.
Use idioms for consistency. Like natural languages, programming languages have idioms, conventional ways that experienced programmers write common pieces of code. A central part of learning any language is developing a familiarity with its idioms.
One of the most common idioms is the form of a loop.
for (i = 0; i < n; i++)
    array[i] = 1.0;
This is not an arbitrary choice. It visits each member of an n-element array indexed from 0 to n-1. It places all the loop control in the for itself, runs in increasing order, and uses the very idiomatic ++ operator to update the loop variable. It leaves the index variable at a known value just beyond the last array element.
In C++ or Java:
for (int i = 0; i < n; i++)
    array[i] = 1.0;
Here is the standard loop for walking along a list in C:
for (p = list; p != NULL; p = p->next)
    ...
For an infinite loop, we prefer
for (;;)
    ...
Indentation should be idiomatic, too.
for (ap = arr; ap < arr+128; ap++)
    *ap = 0;
Another common idiom is to nest an assignment inside a loop condition, as in
while ((c = getchar()) != EOF)
    putchar(c);
The do-while statement is used much less often than for and while, because it always executes at least once, testing at the bottom of the loop instead of the top. The do-while loop is the right one only when the body of the loop must always be executed at least once.
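Counting the decimal digits of a number is one case where the body must run at least once, because zero still has one digit. The function below is a hypothetical example written for these notes.

```c
/* ndigits: return the number of decimal digits needed to print n */
int ndigits(unsigned int n)
{
    int count = 0;

    do {                /* runs once even when n is 0, which has one digit */
        count++;
        n /= 10;
    } while (n != 0);
    return count;
}
```

Written with a while loop that tests at the top, the same code would return 0 for an input of 0 unless that case were handled separately.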
One advantage of the consistent use of idioms is that it draws attention to non-standard loops, a frequent sign of trouble:
? int i, *iArray, nmemb;
?
? iArray = malloc(nmemb * sizeof(int));
? for (i = 0; i <= nmemb; i++)
?     iArray[i] = i;
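The fragment allocates room for nmemb items, iArray[0] through iArray[nmemb-1], but the <= test makes the loop write one element past the end. A corrected sketch in the idiomatic form might look like this; the function name make_iota and the NULL check are assumptions added for illustration.

```c
#include <stdlib.h>

/* make_iota: allocate an array of nmemb ints holding 0, 1, ..., nmemb-1;
   returns NULL if allocation fails. The caller frees the result. */
int *make_iota(size_t nmemb)
{
    int *a;
    size_t i;

    a = malloc(nmemb * sizeof(a[0]));
    if (a == NULL)
        return NULL;
    for (i = 0; i < nmemb; i++)   /* < rather than <=: last valid index is nmemb-1 */
        a[i] = (int) i;
    return a;
}
```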
C and C++ also have idioms for allocating space for strings and then manipulating it, and code that doesn't use them often harbors a bug:
? char *p, buf[256];
?
? gets(buf);
? p = malloc(strlen(buf));
? strcpy(p, buf);
One should never use gets, since there is no way to limit the amount of input it will read. There is another problem as well: strlen does not count the '\0' that terminates a string, while strcpy copies it. So not enough space is allocated, and strcpy writes past the end of the allocated space. The idiom is
p = malloc(strlen(buf)+1);
strcpy(p, buf);
or
p = new char[strlen(buf)+1];
strcpy(p, buf);
in C++. If you don't see the +1, beware.
Java doesn't suffer from this specific problem, since strings are not represented as null-terminated arrays. Array subscripts are checked as well, so it is not possible to access outside the bounds of an array in Java.
In a real program the return value from malloc, realloc, strdup, or any other allocation routine should always be checked.
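Putting the +1 idiom and the allocation check together gives something like the following. This is a minimal sketch; the name copy_string is hypothetical, and returning NULL on failure is just one possible policy.

```c
#include <stdlib.h>
#include <string.h>

/* copy_string: return a freshly allocated copy of s, or NULL if malloc fails;
   the caller is responsible for freeing the result */
char *copy_string(const char *s)
{
    char *p;

    p = malloc(strlen(s) + 1);   /* +1 for the terminating '\0' */
    if (p == NULL)
        return NULL;
    strcpy(p, s);
    return p;
}
```

A caller would test the result before using it, for example by printing an error and exiting when copy_string returns NULL.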
Use else-ifs for multi-way decisions. Multi-way decisions are idiomatically expressed as a chain of if...else if...else, like this:
if (condition1)
    statement1
else if (condition2)
    statement2
...
else if (conditionN)
    statementN
else
    default-statement
The conditions are read from top to bottom; at the first condition that is satisfied, the statement that follows is executed, and then the rest of the construct is skipped. The statement part may be a single statement or a group of statements enclosed in braces. The last else handles the "default" situation, where none of the other alternatives was chosen. This trailing else part may be omitted if there is no action for the default, although leaving it in with an error message may help to catch conditions that "can't happen."
Align all of the else clauses vertically rather than lining up each else with the corresponding if. Vertical alignment emphasizes that the tests are done in sequence and keeps them from marching off the right side of the page.
A sequence of nested if statements is often a warning of awkward code, if not outright errors:
? if (argc == 3)
?     if ((fin = fopen(argv[1], "r")) != NULL)
?         if ((fout = fopen(argv[2], "w")) != NULL) {
?             while ((c = getc(fin)) != EOF)
?                 putc(c, fout);
?             fclose(fin);
?             fclose(fout);
?         } else
?             printf("Can't open output file %s\n", argv[2]);
?     else
?         printf("Can't open input file %s\n", argv[1]);
? else
?     printf("Usage: cp inputfile outputfile \n");
The sequence of ifs requires us to maintain a mental pushdown stack of what tests were made, so that at the appropriate point we can pop them until we determine the corresponding action. Since at most one action will be performed, we really want an else if. Changing the order in which the decisions are made leads to a clearer version, in which we have also corrected the resource leak in the original:
if (argc != 3)
    printf("Usage: cp inputfile outputfile \n");
else if ((fin = fopen(argv[1], "r")) == NULL)
    printf("Can't open input file %s\n", argv[1]);
else if ((fout = fopen(argv[2], "w")) == NULL) {
    printf("Can't open output file %s\n", argv[2]);
    fclose(fin);
} else {
    while ((c = getc(fin)) != EOF)
        putc(c, fout);
    fclose(fin);
    fclose(fout);
}
We read down the tests until the first one that is true, do the corresponding action, and continue after the last else. The rule is to follow each decision as closely as possible by its associated action. Or, to put it another way, each time you make a test, do something.
1.5 Function Macros
With modern machines and compilers, the drawbacks of function macros outweigh their benefits.
Avoid function macros.
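One well-known drawback is that a macro parameter appearing more than once in the body is evaluated more than once. The sketch below makes that visible with a counter; SQUARE_MACRO, square, next_value, and ncalls are all hypothetical names invented for this illustration.

```c
#define SQUARE_MACRO(x) ((x) * (x))   /* textual substitution: x appears twice */

int ncalls = 0;   /* counts calls to next_value */

/* next_value: return 1, 2, 3, ... on successive calls */
int next_value(void)
{
    return ++ncalls;
}

/* square: an ordinary function evaluates its argument exactly once */
int square(int x)
{
    return x * x;
}
```

Starting from ncalls equal to 0, square(next_value()) calls next_value once and yields 1, while SQUARE_MACRO(next_value()) expands to two calls and yields 2. The function version behaves the way a reader would expect.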
1.6 Magic Numbers
Magic numbers are the constants, array sizes, character positions, conversion factors, and other literal numeric values that appear in programs.
Give names to magic numbers. As a guideline, any number other than 0 or 1 is likely to be magic and should have a name of its own. A raw number in program source gives no indication of its importance or derivation, making the program harder to understand and modify.
Define numbers as constants, not macros. In C and C++, integer constants can be defined with an enum statement:
enum {
    MINROW = 1,                 /* top edge */
    MINCOL = 1,                 /* left edge */
    MAXROW = 24,                /* bottom edge (<=) */
    MAXCOL = 80,                /* right edge (<=) */
    NLET = 26,                  /* size of alphabet */
    HEIGHT = MAXROW - 4,        /* height of bars */
    WIDTH = (MAXCOL - 1)/NLET   /* width of bars */
};
Constants of any type can be declared with const in C++:
const int MAXROW = 24, MAXCOL = 80;
or final in Java:
static final int MAXROW = 24, MAXCOL = 80;
C also has const values but they cannot be used as array bounds, so the enum statement remains the method of choice in C.
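To show an enum constant serving as an array bound in C, here is a small letter-frequency counter. It is a sketch under stated assumptions: the function count_letters is hypothetical, and the indexing assumes the letters are contiguous in the character set, as they are in ASCII.

```c
#include <ctype.h>

enum {
    NLET = 26   /* size of alphabet */
};

/* count_letters: add the number of occurrences of each letter in s
   (ignoring case) to freq[0..NLET-1] */
void count_letters(const char *s, int freq[NLET])
{
    for (; *s != '\0'; s++) {
        if (isalpha((unsigned char) *s))
            freq[tolower((unsigned char) *s) - 'a']++;
    }
}
```

A caller declares the table as int freq[NLET] = {0}; so the size of the alphabet is written down in exactly one place.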
Use character constants, not integers.
Best is to use the library to test the properties of characters:
if (isupper(c))
...
in C or C++, or
if (Character.isUpperCase(c))
...
in Java.
A related issue is that the number 0 appears often in programs, in many contexts. Don't write
? str = 0;
? name[i] = 0;
? x = 0;
but rather:
str = NULL;
name[i] = '\0';
x = 0.0;
We prefer to use different explicit constants, reserving 0 for a literal integer zero, because they indicate the use of the value and thus provide a bit of documentation.
Use the language to calculate the size of an object. Don't use an explicit size for any data type; use sizeof(int) instead of 2 or 4, for instance. For similar reasons, sizeof(array[0]) may be better than sizeof(int) because it's one less thing to change if the type of the array changes.
The sizeof operator is sometimes a convenient way to avoid inventing names for the numbers that determine array sizes.
char buf[1024];
fgets(buf, sizeof(buf), stdin);
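A common C idiom builds on the same idea to compute the number of elements in an array, so that loops never hard-code a length. The sketch below is illustrative; the array scale and the function sum_scale are hypothetical. It is a function macro, but one that does something a function cannot, since an array passed to a function decays to a pointer and loses its size.

```c
#include <stddef.h>

/* NELEMS: number of elements in an array; valid only for a true array,
   not for a pointer such as an array received as a function parameter */
#define NELEMS(array) (sizeof(array) / sizeof((array)[0]))

static const double scale[] = { 0.5, 1.0, 2.0, 4.0 };   /* illustrative data */

/* sum_scale: add up every element of scale without hard-coding its length */
double sum_scale(void)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < NELEMS(scale); i++)
        sum += scale[i];
    return sum;
}
```

Adding or removing an initializer in scale changes the loop bound automatically.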
Java arrays have a length field that gives the number of elements:
char buf[] = new char[1024];
for (int i = 0; i < buf.length; i++)
...
1.7 Comments
Comments are meant to help the reader of a program. They do not help by saying what the code already plainly says, or by contradicting the code, or by distracting the reader with elaborate typographical displays. The best comments aid the understanding of a program by briefly pointing out salient details or by providing a larger-scale view of the proceedings.
Don't belabor the obvious. Comments shouldn't report self-evident information, such as the fact that i++ has incremented i. Here are some of our favorite worthless comments:
? /*
?  * default
?  */
? default:
?     break;
? /* return SUCCESS */
? return SUCCESS;
? zerocount++; /* Increment zero entry counter */
? /* Initialize "total" to "number_received" */
? node->total = node->number_received;
All of these comments should be deleted; they're just clutter.
Comments should add something that is not immediately evident from the code, or collect into one place information that is spread through the source. When something subtle is happening, a comment may clarify, but if the actions are obvious already, restating them in words is pointless:
? while ((c = getchar()) != EOF && isspace(c))
?     ; /* skip white space */
? if (c == EOF) /* end of file */
?     type = endoffile;
? else if (c == '(') /* left paren */
?     type = leftparen;
? else if (c == ';') /* semicolon */
?     ...
These comments should also be deleted, since the well-chosen names already convey the information.
Comment functions and global data. We comment functions, global variables, constant definitions, fields in structures and classes, and anything else where a brief summary can aid understanding.
Global variables have a tendency to crop up intermittently throughout a program; a comment serves as a reminder to be referred to as needed.
struct State { /* prefix + suffix list */
    char *pref[NPREF]; /* prefix words */
    Suffix *suf; /* list of suffixes */
    State *next; /* next in hash table */
};
A comment that introduces each function sets the stage for reading the code itself. If the code isn't too long or technical, a single line is enough:
// random: return an integer in the range [0..r-1]
int random(int r)
{
    return (int)(Math.floor(Math.random()*r));
}
Sometimes code is genuinely difficult, perhaps because the algorithm is complicated or the data structures are intricate. In that case, a comment that points to a source of understanding can aid the reader. It may also be valuable to suggest why particular decisions were made.
/*
 * idct: Scaled integer implementation of
 * Inverse two dimensional 8x8 Discrete Cosine Transform,
 * Chen-Wang algorithm (IEEE ASSP-32, pp 803-816, Aug 1984)
 * 32-bit integer arithmetic
 * Coefficients extended to 12 bits for IEEE 1180-1990 compliance
 */
static void idct(int b[8*8])
{
    ...
}
This helpful comment cites the reference, briefly describes the data used, indicates the performance of the algorithm, and tells how and why the original algorithm has been modified.
Don't comment bad code, rewrite it. Comment anything unusual or potentially confusing, but when the comment outweighs the code, the code probably needs fixing. This example uses a long, muddled comment and a conditionally-compiled debugging print statement to explain a single statement:
? /* If 'result' is 0 a match was found so return true (non-zero).
?    Otherwise, 'result' is non-zero so return false (zero). */
? #ifdef DEBUG
?     printf("*** isword returns !result = %d\n", !result);
?     fflush(stdout);
? #endif
?
? return (!result);
Negations are hard to understand and should be avoided. Part of the problem is the uninformative variable name, result. A more descriptive name, matchfound, makes the comment unnecessary and cleans up the print statement, too.
#ifdef DEBUG
    printf("*** isword returns matchfound = %d\n", matchfound);
    fflush(stdout);
#endif

return matchfound;
Don't contradict the code. When you change code, make sure the comments are still accurate.
Comments should not only agree with code, they should support it.
time(&now);
strcpy(date, ctime(&now));
/* ctime() puts newline at the end of string; delete it */
date[strlen(date)-1] = '\0';
The last expression is the C idiom for removing the last character from a string.
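The bare idiom is safe for the output of ctime, which always ends in a newline. For strings of unknown origin, a more defensive variant checks first. The helper below is a hypothetical sketch, not part of the original notes.

```c
#include <string.h>

/* strip_newline: remove one trailing '\n' from s, if present; return s */
char *strip_newline(char *s)
{
    size_t len = strlen(s);

    if (len > 0 && s[len-1] == '\n')   /* guard: an empty string has no last char */
        s[len-1] = '\0';
    return s;
}
```

Without the length check, applying the idiom to an empty string would write to s[-1], outside the array.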
Clarify, don't confuse. Comments are supposed to help readers over the hard parts, not create more obstacles.
/* strcmp: return < 0 if s1<s2, > 0 if s1>s2, 0 if equal */
/* ANSI C, section 4.11.4.2 */
int strcmp(const char *s1, const char *s2)
{
...
}
Because strcmp implements a standard function, its comment can help by summarizing the behavior and telling us where the definition originates; that's all that's needed.
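For readers who want to see a body behind that one-line summary, here is one possible implementation. It is a sketch only: the name my_strcmp is hypothetical (chosen to avoid clashing with the library function), and real library versions are usually more heavily optimized.

```c
/* my_strcmp: return < 0 if s1<s2, > 0 if s1>s2, 0 if equal */
int my_strcmp(const char *s1, const char *s2)
{
    /* advance while the strings agree and s1 has not ended */
    while (*s1 != '\0' && *s1 == *s2) {
        s1++;
        s2++;
    }
    /* compare as unsigned char, as the standard function does */
    return (unsigned char) *s1 - (unsigned char) *s2;
}
```

The short header comment says everything a caller needs; the loop itself is simple enough to need only the two brief notes.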
Comments are meant to help a reader understand parts of the program that are not readily understood from the code itself. As much as possible, write code that is easy to understand; the better you do this, the fewer comments you need. Good code needs fewer comments than bad code.
1.8 Why Bother?
The answer is that well-written code is easier to read and to understand, almost surely has fewer errors, and is likely to be smaller than code that has been carelessly tossed together and never polished.
The key observation is that good style should be a matter of habit. If you think about style as you write code originally, and if you take the time to revise and improve it, you will develop good habits. Once they become automatic, your subconscious will take care of many of the details for you, and even the code you produce under pressure will be better.