SUMMARY OF THE RULES
The following list restates each rule from Chapters 4 and 5 and then briefly summarizes the major points made in the text. A list of the names of the rules can be found in Section 7.2 on page 110.
SPACE-FOR-TIME RULES
Space-For-Time Rule 1-Data Structure Augmentation: The time required for common operations on data can often be reduced by augmenting the structure with extra information or by changing the information within the structure so that it can be accessed more easily. (Page 39.)
- Reference counters facilitate garbage collection by keeping additional information in dynamically allocated nodes.
- Hints augment data structures by keeping a fast but possibly inaccurate structure along with a slow but robust structure.
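A minimal C sketch of the reference-count augmentation; the node layout and helper names are invented for illustration:

    #include <stdlib.h>

    /* A list node augmented with a reference count, so that storage
       can be reclaimed the moment the last pointer to it is dropped. */
    struct node {
        int refs;              /* the augmentation */
        int value;
        struct node *next;
    };

    struct node *retain(struct node *p) { p->refs++; return p; }

    void release(struct node *p)
    {
        if (p != NULL && --p->refs == 0) {
            release(p->next);  /* drop this node's reference to its successor */
            free(p);
        }
    }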
Space-For-Time Rule 2-Store Precomputed Results: The cost of recomputing an expensive function can be reduced by computing the function only once and storing the results. Subsequent requests for the function are then handled by table lookup rather than by computing the function. (Page 40.)
- Peterson stored the values of evaluated board positions to reduce the time of a game-playing program from 27.10 seconds to 0.18 seconds.
- A procedure for computing Fibonacci numbers can be replaced by a table of the numbers.
- Stu Feldman precomputed the number of ones in all eight-bit strings to reduce run time from over a week to less than two hours.
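The technique in a minimal C sketch, patterned after the bit-counting example; the table size and names are illustrative, not the program described above:

    #include <stdio.h>

    /* One-bit counts for every eight-bit value, computed once. */
    static unsigned char ones[256];

    static void init_ones(void)
    {
        for (int i = 1; i < 256; i++)
            ones[i] = (unsigned char)((i & 1) + ones[i >> 1]);
    }

    /* Count the one bits in a 32-bit word with four table lookups
       instead of 32 individual bit tests. */
    static int count_ones(unsigned long w)
    {
        return ones[w & 0xff] + ones[(w >> 8) & 0xff] +
               ones[(w >> 16) & 0xff] + ones[(w >> 24) & 0xff];
    }

    int main(void)
    {
        init_ones();
        printf("%d\n", count_ones(0xF0F0F0F0UL)); /* prints 16 */
        return 0;
    }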
Space-For-Time Rule 3-Caching: Data that is accessed most often should be the cheapest to access. (Page 42.)
- Jalics found that caching the last element retrieved from a table reduced the access cost from 2004 instructions to 4 instructions for 99% of the queries.
- Chris Van Wyk's storage allocator cached the most popular kind of node and reduced the system run time by over fifty percent; Peter Deutsch cached activation records in an allocator and reduced run time by thirty percent.
- In implementing a dictionary, keep most of the dictionary on disk but cache the most common words in core.
- Rick Cattell cached recently-used tuples in a database system to reduce the time of an access from 8 milliseconds to 1.4 milliseconds.
- Caching can "backfire" and increase the run time of a program if locality is not present in the underlying data.
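A C sketch of the simplest cache, a single remembered query, in the spirit of the Jalics example; the table and its sequential search are invented for illustration:

    #define TABLE_SIZE 1000

    /* An invented table of key/value pairs, searched sequentially. */
    static int keys[TABLE_SIZE];
    static int values[TABLE_SIZE];

    /* Cache of the most recent successful query. */
    static int last_key;
    static int last_value;
    static int cache_valid = 0;

    /* Look up a key; repeated queries are answered from the cache.
       This pays off only if the queries show locality.
       Returns -1 if the key is absent. */
    int lookup(int key)
    {
        if (cache_valid && key == last_key)    /* fast hit */
            return last_value;
        for (int i = 0; i < TABLE_SIZE; i++)   /* slow miss */
            if (keys[i] == key) {
                last_key = key;
                last_value = values[i];
                cache_valid = 1;
                return last_value;
            }
        return -1;
    }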
Space-For-Time Rule 4-Lazy Evaluation: The strategy of never evaluating an item until it is needed avoids evaluations of unnecessary items. (Page 43.)
- In building a table of Fibonacci numbers, only compute the numbers actually used.
- Al Aho evaluated the elements of a table as they were needed and reduced the run time of a program from 30 seconds to less than half a second.
- Brian Kernighan reduced the run time of a document formatter by twenty percent by calculating the width of the current line only as needed rather than for every input character.
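A C sketch of the lazy Fibonacci table; bounds checking is omitted, and the entries must fit the integer type:

    /* Fibonacci numbers evaluated lazily: the table grows only as
       far as callers actually ask. */
    static unsigned long fib_table[64] = { 0, 1 };
    static int known = 1;          /* largest index computed so far */

    unsigned long fib(int n)
    {
        while (known < n) {        /* evaluate only what is needed */
            known++;
            fib_table[known] = fib_table[known - 1] + fib_table[known - 2];
        }
        return fib_table[n];
    }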
TIME-FOR-SPACE RULES
Time-For-Space Rule 1-Packing: Dense storage representations can decrease storage costs by increasing the time required to store and retrieve data. (Page 45.)
- Storing integers as one decimal digit per eight-bit byte, as two digits per byte, or in binary format represents three increasing levels of packing.
- The space of a database system could be reduced by one-third by packing three integers (between 0 and 1000) in two 16-bit words.
- John Laird reduced the time required to read a file of real numbers by a factor of over 80 by packing the file.
- Stu Feldman found that unpacking a table slightly increased the data space but decreased the code space by over four thousand words.
- Overlaying reduces data space by storing data items that are never simultaneously active in the same memory space.
- Code overlaying reduces code space by using the same storage for routines that are never simultaneously needed. Many operating systems provide this service automatically in their virtual memory systems.
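One way to realize the database example above in C; the ten-bit fields assume values between 0 and 1000:

    #include <assert.h>

    /* Pack three integers in [0, 1000] into one 32-bit quantity
       (ten bits each), i.e. two 16-bit words instead of three. */
    unsigned long pack3(unsigned a, unsigned b, unsigned c)
    {
        assert(a <= 1000 && b <= 1000 && c <= 1000);
        return ((unsigned long)a << 20) | (b << 10) | c;
    }

    /* Retrieval pays for the saved space with shifts and masks. */
    void unpack3(unsigned long w, unsigned *a, unsigned *b, unsigned *c)
    {
        *a = (unsigned)(w >> 20) & 0x3ff;
        *b = (unsigned)(w >> 10) & 0x3ff;
        *c = (unsigned)w & 0x3ff;
    }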
Time-For-Space Rule 2-Interpreters: The space required to represent a program can often be decreased by the use of interpreters in which common sequences of operations are represented compactly. (Page 47.)
- Finite State Machines (FSM's) can be implemented by small tables; they are easy to define, code, prove correct, and maintain.
- Brooks describes how an interpreter led to small space requirements for a console interpreter, and how the time spent in decoding a dense representation of a FORTRAN compiler was paid for by drastically reduced input and output costs.
- In some systems the programmer should use the interpreter provided by the underlying machine architecture and "compile" common operations into machine code.
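A tiny table-driven FSM in C; the word-counting task is an invented illustration of the table technique, not an example from the text:

    #include <stdio.h>

    /* State 0 is "between words", state 1 is "inside a word";
       input class 0 is white space, class 1 is anything else.
       One table gives the next state, the other the action
       (whether a new word begins) for each (state, class) pair. */
    static const int next_state[2][2] = { { 0, 1 }, { 0, 1 } };
    static const int new_word[2][2]   = { { 0, 1 }, { 0, 0 } };

    int count_words(const char *s)
    {
        int state = 0, words = 0;
        for (; *s != '\0'; s++) {
            int cls = (*s == ' ' || *s == '\t' || *s == '\n') ? 0 : 1;
            words += new_word[state][cls];
            state = next_state[state][cls];
        }
        return words;
    }

    int main(void)
    {
        printf("%d\n", count_words("two states, one small table")); /* 5 */
        return 0;
    }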
LOOP RULES
Loop Rule 1-Code Motion Out of Loops: Instead of performing a certain computation in each iteration of a loop, it is better to perform it only once, outside the loop. (Page 52.)
- Moving the calculation of a constant factor outside a for loop reduced its time from 13.8N microseconds to 7.9N microseconds.
- Code cannot be moved out of loops if it has side effects that are desired on every iteration.
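A C sketch of the transformation; the routine and its invariant factor are invented for illustration:

    #include <math.h>

    /* Before: sqrt(scale) is loop-invariant but recomputed n times. */
    void scale_slow(double *x, int n, double scale)
    {
        for (int i = 0; i < n; i++)
            x[i] *= sqrt(scale);
    }

    /* After: the invariant factor is computed once, outside the loop. */
    void scale_fast(double *x, int n, double scale)
    {
        double factor = sqrt(scale);
        for (int i = 0; i < n; i++)
            x[i] *= factor;
    }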
Loop Rule 2-Combining Tests: An efficient inner loop should contain as few tests as possible, and preferably only one. The programmer should therefore try to simulate some of the exit conditions of the loop by other exit conditions. (Page 53.)
- Adding a sentinel in the last element of an unsorted vector reduced the time to search it from 7.3C to 4.1C microseconds.
- Sentinels can decrease the robustness of a program. Improper use of a sentinel caused a C compiler to generate non-reentrant code; the bug surfaced rarely, but was fatal in those circumstances.
- Sentinels are a common application of Loop Rule 2: we place a sentinel at the boundary of a data structure to reduce the cost of testing whether our search has exhausted the structure.
- Bob Sproull described how the lexical analyzer of the SAIL compiler used a control character at the end of the input buffer as a sentinel to avoid testing for end-of-buffer on each input character.
- Combining tests in the sequential search of a sorted array increased the run time from 6.8C microseconds to 7.3C microseconds (due to a system-dependent peculiarity); using sentinels finally reduced the search time to 4.1C microseconds.
- Bob Sproull described how three tests could be combined into one to increase the speed of the inner loop of a screen editor.
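A C sketch of the sentinel search described above; the caller must reserve one spare slot at x[n]:

    /* Planting the key at x[n] guarantees the scan terminates, so
       the loop needs no separate end-of-array test. */
    int search(int x[], int n, int key)
    {
        x[n] = key;                /* the sentinel */
        int i = 0;
        while (x[i] != key)        /* one test per element, not two */
            i++;
        return i < n ? i : -1;     /* -1 means the key was absent */
    }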
Loop Rule 3-Loop Unrolling: A large cost of some short loops is in modifying the loop indices. That cost can often be reduced by unrolling the loop. (Page 56.)
- Unrolling a loop to sum an array of ten real numbers reduced the run time from 63.4 microseconds to 22.1 microseconds.
- Unrolling the loop of a sequential search reduced its time from 4.1C microseconds to 3.4C microseconds.
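A C sketch of unrolling the ten-element summation loop:

    /* Rolled: loop control (increment, test, branch) on every element. */
    double sum10(const double x[10])
    {
        double s = 0.0;
        for (int i = 0; i < 10; i++)
            s += x[i];
        return s;
    }

    /* Fully unrolled: the loop control disappears entirely. */
    double sum10_unrolled(const double x[10])
    {
        return x[0] + x[1] + x[2] + x[3] + x[4] +
               x[5] + x[6] + x[7] + x[8] + x[9];
    }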
Loop Rule 4-Transfer-Driven Loop Unrolling: If a large cost of an inner loop is devoted to trivial assignments, then those assignments can often be removed by repeating the code and changing the use of variables. Specifically, to remove the assignment I := J, the subsequent code must treat J as though it were I. (Page 59.)
- Unrolling the inner loop of a routine for Fibonacci numbers reduced its time from 273 microseconds to 143 microseconds.
- Knuth used unrolling to decrease the time of inserting an element into a linked list by 16 percent.
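A C reconstruction of the Fibonacci example; the book's own fragment may differ in detail:

    /* Straightforward loop: each iteration performs the trivial
       assignments a = b; b = c; just to shuffle variables. */
    unsigned long fib_plain(int n)
    {
        unsigned long a = 0, b = 1, c;
        for (int i = 0; i < n; i++) {
            c = a + b;
            a = b;
            b = c;
        }
        return a;
    }

    /* Transfer-driven unrolling: the body is written twice, with the
       roles of the variables interchanged in the second copy, so the
       shuffling assignments disappear.  Assumes n is even. */
    unsigned long fib_unrolled(int n)
    {
        unsigned long a = 0, b = 1;
        for (int i = 0; i < n; i += 2) {
            a += b;   /* first copy: a takes the role of c */
            b += a;   /* second copy: b takes the role of c */
        }
        return a;
    }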
Loop Rule 5-Unconditional Branch Removal: A fast loop should contain no unconditional branches. An unconditional branch at the end of a loop can be removed by "rotating" the loop to have a conditional branch at the bottom. (Page 62.)
- This technique is applicable only in low-level languages.
Loop Rule 6-Loop Fusion: If two nearby loops operate on the same set of elements, then combine their operational parts and use only one set of loop control operations. (Page 63.)
- To find the maximum and minimum elements in an array, we make only one iterative pass through the array.
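A C sketch of the fused maximum-and-minimum loop, assuming at least one element:

    /* One fused pass finds both extremes; two separate loops would
       pay the loop control cost twice. */
    void max_min(const int x[], int n, int *max, int *min)
    {
        *max = *min = x[0];
        for (int i = 1; i < n; i++) {
            if (x[i] > *max)      *max = x[i];
            else if (x[i] < *min) *min = x[i];
        }
    }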
LOGIC RULES
Logic Rule 1-Exploit Algebraic Identities: If the evaluation of a logical expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 66.)
- Simple optimizations are often done by compilers; programmers must be careful that a change of this type does not result in slower code.
- An algebraic identity allowed us to remove the square root in Fragment A2 to yield Fragment A3; this gave a speedup of almost a factor of two.
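A C sketch in the spirit of that change; the fragments themselves appear in the text, so this reconstruction is only illustrative:

    #include <math.h>

    /* Before: a square root evaluated just to compare two distances. */
    int closer_slow(double dx1, double dy1, double dx2, double dy2)
    {
        return sqrt(dx1*dx1 + dy1*dy1) < sqrt(dx2*dx2 + dy2*dy2);
    }

    /* After: sqrt is monotone, so comparing the squared distances
       gives the same answer without the square roots. */
    int closer_fast(double dx1, double dy1, double dx2, double dy2)
    {
        return dx1*dx1 + dy1*dy1 < dx2*dx2 + dy2*dy2;
    }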
Logic Rule 2-Short-Circuiting Monotone Functions: If we wish to test whether some monotone nondecreasing function of several variables is over a certain threshold, then we need not evaluate any of the variables once the threshold has been reached. (Page 67.)
- A simple application is the evaluation of "and" and "or": to evaluate "A and B" we need not evaluate B if A is false.
- Short-circuiting the distance evaluation in Fragment A5 to give Fragment A6 reduced the run time by forty percent.
- A more complex application of this rule exits from a loop as soon as the purpose of the loop has been accomplished.
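A C sketch of short-circuiting a monotone sum; the Manhattan-distance setting is an invented illustration:

    #include <stdlib.h>

    /* The sum of absolute differences is monotone nondecreasing as
       terms are added, so once the running sum reaches the threshold
       the remaining terms need not be examined.  Returns 1 if the
       full distance is below the threshold. */
    int within(const int a[], const int b[], int n, int threshold)
    {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += abs(a[i] - b[i]);
            if (sum >= threshold)      /* short circuit */
                return 0;
        }
        return 1;
    }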
Logic Rule 3-Reordering Tests: Logical tests should be arranged such that inexpensive and often successful tests precede expensive and rarely successful tests. (Page 69.)
- This was used in testing the character types in a lexical analyzer.
- This rule is used to push an expensive test inside a cheaper test.
- Peter Weinberger used a single-line test in a Scrabble program that was able to avoid an expensive test in over 99% of the cases.
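A C sketch of the reordering; the cheap first test and the assumed expensive helper are invented for illustration:

    int expensive_check(int c);     /* assumed helper, rarely reached */

    /* The cheap test that usually settles the answer runs first;
       the expensive test runs only in the remaining rare cases. */
    int wanted(int c)
    {
        if (c >= 'a' && c <= 'z')   /* cheap, usually decisive */
            return 1;
        return expensive_check(c);  /* rare and expensive */
    }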
Logic Rule 4-Precompute Logical Functions: A logical function over a small finite domain can be replaced by a lookup in a table that represents the domain. (Page 72.)
- Testing character types in a lexical analyzer is often implemented by a table of character types indexed by characters; Brian Kernighan reports that this reduced the run time of some programs by thirty to forty percent.
- David Moon designed a fast interpreter for a PDP-8 that had one table entry for each of the 4096 possible instructions.
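A C sketch of a character-class table; it assumes the contiguous letter and digit codes of ASCII:

    /* Character classes precomputed into a table indexed by character,
       so one lookup replaces a chain of range tests. */
    enum ctype { OTHER, LETTER, DIGIT, BLANK };

    static unsigned char ctable[256];

    void init_ctable(void)
    {
        for (int c = 'a'; c <= 'z'; c++) ctable[c] = LETTER;
        for (int c = 'A'; c <= 'Z'; c++) ctable[c] = LETTER;
        for (int c = '0'; c <= '9'; c++) ctable[c] = DIGIT;
        ctable[' '] = ctable['\t'] = ctable['\n'] = BLANK;
    }

    #define is_letter(c) (ctable[(unsigned char)(c)] == LETTER)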
Logic Rule 5-Boolean Variable Elimination: We can remove boolean variables from a program by replacing the assignment to a boolean variable V by an if-then-else statement in which one branch represents the case that V is true and the other represents the case that V is false. (This generalizes to case statements and other logical control structures.) (Page 73.)
- This rule usually decreases time slightly (say, less than 25 percent), but greatly increases code space.
- More complex applications of this rule remove boolean variables from data structures by keeping separate structures for the true and false records.
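A minimal C sketch of removing one boolean variable; the helper procedures are assumed, for illustration only:

    void setup(void);              /* assumed helpers */
    void handle_positive(void);
    void handle_negative(void);

    /* Before: a boolean variable is assigned, carried along, and
       tested later. */
    void with_flag(int x)
    {
        int positive = (x > 0);    /* the boolean variable V */
        setup();
        if (positive) handle_positive();
        else          handle_negative();
    }

    /* After: the assignment becomes an if-then-else whose branches
       each know the value V would have had.  The duplicated code is
       why this rule trades space for a little time. */
    void without_flag(int x)
    {
        if (x > 0) {
            setup();
            handle_positive();     /* the "V is true" branch */
        } else {
            setup();
            handle_negative();     /* the "V is false" branch */
        }
    }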
PROCEDURE RULES
Procedure Rule 1-Collapsing Procedure Hierarchies: The run times of the elements of a set of procedures that (nonrecursively) call one another can often be reduced by rewriting procedures in line and binding the passed variables. (Page 75.)
- Rewriting the distance procedure in line reduced the run time of Fragment A4 from 21.2N² microseconds to 14.0N² microseconds.
- Dennis Ritchie increased the speed of a macro processor by a factor of four by writing procedures in line.
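A C sketch of collapsing one level of hierarchy; the procedures are invented for illustration:

    /* Before: a tiny procedure called once per element of an inner loop. */
    double sq_dist(double dx, double dy) { return dx*dx + dy*dy; }

    double total(const double dx[], const double dy[], int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += sq_dist(dx[i], dy[i]);      /* call overhead each time */
        return s;
    }

    /* After: the body written in line, with the parameters bound to
       the actual arguments. */
    double total_inline(const double dx[], const double dy[], int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += dx[i]*dx[i] + dy[i]*dy[i];  /* no call, no passing */
        return s;
    }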
Procedure Rule 2-Exploit Common Cases: Procedures should be organized to handle all cases correctly and common cases efficiently. (Page 76.)
- Mary Shaw used this technique to increase the efficiency of the register SAVE and UNSAVE operations on the Rice University Computer; efficiently handling the special case of operating on all possible registers reduced the run time of some programs by thirty percent.
- This rule encourages us to remove unneeded generality from subroutines; Chris Van Wyk increased the speed of a program by a factor of three by using a special-purpose procedure for intersecting line segments.
- We should organize systems so that efficient cases are common cases; by ensuring that bit fields always start in the same positions in words, Rob Pike increased the efficiency of a raster graphics operation by a factor of two.
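A C sketch of a procedure organized around its common case; the buffer, its size, and the flush helper are invented:

    /* All cases handled correctly, the common case handled fast: the
       usual path is a bounds test and a store; the rare full-buffer
       case falls through to an expensive flush. */
    struct outbuf { char data[4096]; int len; };

    void flush(struct outbuf *b);    /* assumed: empties the buffer */

    void put_char(struct outbuf *b, char c)
    {
        if (b->len < (int)sizeof b->data) {   /* common case */
            b->data[b->len++] = c;
            return;
        }
        flush(b);                             /* rare case */
        b->data[b->len++] = c;
    }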
Procedure Rule 3-Coroutines: A multiple-pass algorithm can often be turned into a single-pass algorithm by use of coroutines. (Page 79.)
- An intermediate file that is written sequentially and then read sequentially can often be removed by linking together the two programs as coroutines; this increases space requirements but reduces costly input/output operations.
Procedure Rule 4-Transformations on Recursive Procedures: The run time of recursive procedures can often be reduced by applying the following transformations: (Page 80.)
- Code the recursion explicitly by use of a program stack.
- If the final action of a procedure P is to call itself recursively, replace that call by a goto to its first statement; this is usually known as removing tail recursion. That goto can often be transformed into a loop.
- If a procedure contains only one recursive call on itself, then it is not necessary to store the return address on the stack.
- It is often more efficient to solve small subproblems by use of an auxiliary procedure, rather than by recurring down to problems of size zero or one.
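A C sketch of the tail-recursion transformation, using binary search as an invented example:

    /* Recursive binary search: the final action is a recursive call. */
    int find_rec(const int x[], int lo, int hi, int key)
    {
        if (lo > hi) return -1;
        int mid = lo + (hi - lo) / 2;
        if (x[mid] == key) return mid;
        if (x[mid] < key)
            return find_rec(x, mid + 1, hi, key);   /* tail call */
        return find_rec(x, lo, mid - 1, key);       /* tail call */
    }

    /* The tail recursion replaced by a loop: no stack frames, no
       return addresses, same answers. */
    int find_iter(const int x[], int lo, int hi, int key)
    {
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (x[mid] == key) return mid;
            if (x[mid] < key) lo = mid + 1;
            else              hi = mid - 1;
        }
        return -1;
    }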
Procedure Rule 5-Parallelism: A program should be structured to exploit as much of the parallelism as possible in the underlying hardware. (Page 80.)
- Kulsrud, Sedgewick, Smith, and Szymanski used techniques at many design levels to build a Quicksort program on a Cray-1 that can sort 800,000 elements in less than 1.5 seconds.
EXPRESSION RULES
Expression Rule 1-Compile-Time Initialization: As many variables as possible should be initialized before program execution. (Page 82.)
- John Laird preprocessed data that did not change between runs of a program to reduce the program's run time from 120 seconds to 4 seconds.
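A minimal C contrast between run-time and before-execution initialization; the table is invented for illustration:

    /* Before: a table built at run time on every execution. */
    double powers_rt[11];
    void init_powers(void)
    {
        powers_rt[0] = 1.0;
        for (int i = 1; i <= 10; i++)
            powers_rt[i] = powers_rt[i - 1] * 2.0;
    }

    /* After: the same table initialized before execution, as a
       compile-time constant. */
    static const double powers[11] = {
        1.0, 2.0, 4.0, 8.0, 16.0, 32.0,
        64.0, 128.0, 256.0, 512.0, 1024.0
    };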
Expression Rule 2-Exploit Algebraic Identities: If the evaluation of an expression is costly, replace it by an algebraically equivalent expression that is cheaper to evaluate. (Page 82.)
- An algebraic identity yields a fast range test that compiler writers can use on two's-complement architectures.
- We can often multiply or divide by powers of two by shifting left or right.
- Strength reduction on a loop that iterates through the elements of an array replaces a multiplication by an addition. This technique generalizes to a large class of incremental algorithms.
- David Jefferson used an incremental algorithm to reduce the number of characters sent to a terminal by a factor of over five.
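A C sketch of the range test and the shift identities; the range test relies on unsigned wraparound, which is the two's-complement behavior the text mentions:

    /* The pair of comparisons lo <= x && x <= hi collapses to one
       unsigned comparison: x - lo wraps to a huge unsigned value
       whenever x < lo. */
    int in_range(int x, int lo, int hi)
    {
        return (unsigned)x - (unsigned)lo <= (unsigned)hi - (unsigned)lo;
    }

    /* Multiplication and division by powers of two become shifts. */
    unsigned times8(unsigned x) { return x << 3; }   /* x * 8 */
    unsigned over8(unsigned x)  { return x >> 3; }   /* x / 8 */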
Expression Rule 3-Common Subexpression Elimination: If the same expression is evaluated twice with none of its variables altered between evaluations, then the second evaluation can be avoided by storing the result of the first and using that in place of the second. (Page 84.)
- We cannot eliminate the common evaluation of an expression with important side-effects.
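A minimal C sketch; the expression is invented for illustration:

    #include <math.h>

    /* Before: sqrt(a*a + b*b) is evaluated twice with a and b
       unchanged between the evaluations. */
    double f_twice(double a, double b)
    {
        return sqrt(a*a + b*b) + 1.0 / sqrt(a*a + b*b);
    }

    /* After: the common subexpression is evaluated once and reused. */
    double f_once(double a, double b)
    {
        double r = sqrt(a*a + b*b);
        return r + 1.0 / r;
    }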
Expression Rule 4-Pairing Computation: If two similar expressions are frequently evaluated together, then we should make a new procedure that evaluates them as a pair. (Page 84.)
- Knuth reported that both the sine and the cosine of a given angle can be computed together for 1.5 times the cost of computing either individually. Similarly, the maximum and the minimum elements of a vector can be found at about 1.5 times the cost of finding either one.
Expression Rule 5-Exploit Word Parallelism: Use the full word width of the underlying computer architecture to evaluate expensive expressions. (Page 85.)
- When we OR together two 32-bit sets to form their 32-bit union, we perform 32 single-bit operations in parallel.
- Stu Feldman's program to count one bits in a word (described in Space-For-Time Rule 2) and Peter Weinberger's Scrabble program (described in Logic Rule 3) both use this rule.
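A C sketch of the set-union example; the set size and type names are illustrative:

    #include <stdint.h>

    #define WORDS 8   /* a 256-element set stored as eight 32-bit words */

    /* The union of two bit sets, one word at a time: each OR computes
       32 single-bit unions in parallel. */
    void set_union(const uint32_t a[WORDS], const uint32_t b[WORDS],
                   uint32_t out[WORDS])
    {
        for (int i = 0; i < WORDS; i++)
            out[i] = a[i] | b[i];
    }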
From Writing Efficient Programs by Jon Bentley, Prentice-Hall, 1982, ISBN 0-13-970244-X.