分类:
2008-09-03 19:53:18
3. Given a tape containing 1,050,000 twenty-bit integers, how
can you find one that appears at least twice? (Explain, first,
why this must happen.) You may assume:
a. two tape drives
b. one tape drive
Solution:
This problem is somehow similar to the previous one.
a. [Solution in the book, page 165]
Binary search finds an element that occurs at least twice by recursively
searching the subinterval that contains more than half of the integers.
My original solution did not guarantee that the number of integers is halved
in each iteration, so the worst-case run time of its log2 N passes was
proportional to N log N. Jim Saxe of Carnegie-Mellon University reduced
that to linear time by observing that the search can avoid carrying too
many duplicates. When his search knows that a duplicate must be in a current
range of M integers, it will only store M + I integers on its current work
tape; if more integers would have gone on the tape, his program merely
discards them. Although his method frequently ignores input variables,
its strategy is conservative enough to ensure that it finds at least one
duplicate.
b. We can do a traversal of the tape at most 4 times, first checking to
see how many times does each 5-bit binary word appear in the last five bits
of each word. At least one of them (call it "edcba") must have at least
1050000/32 > 2^15 appearances (this happens due to Dirichlet's/Pigeonhole
Principle)
Scan the tape a second time, looking only at the words ending in "edcba".
This time, count for how many words does each 5-bit word appears as a "9-through-5"
position substring. Obviously, also from Pigeonhole Principle, there must exist
one 5-bit word "jihgf" having its counter strictly larger than 2^15 / 32 = 2^10.
(Obviously, then, the number of words ending in "jihgfedcba" on the tape is
the actual value of this counter.)
Scan the tape two more times, looking at the other 5-long groups of bits of
each of the "remaining" words, namely bits "14-through-10", and "19-through-15".
(Note that we scan the tape completely every time, but we only check fewer
words for substrings).