2011-03-08 13:39:39
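#include <stdio.h>

/* Probability that n people all have different birthdays in a 365-day year. */
double different_birthdays(int n)
{
    return n == 1 ? 1.0 : different_birthdays(n-1) * (365.0-(n-1))/365.0;
}

int main(void)
{
    int n;
    for (n = 1; n <= 365; n++)
        printf("%3d: %e\n", n, 1.0-different_birthdays(n));
    return 0;
}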
The Birthday Paradox
Philip J. Erdelsky
July 4, 2001
Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.
BUT WHAT ABOUT LEAP YEAR?
The original problem can be solved with a slide rule, which is exactly what I did when I first heard it many, many years ago.
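If we add February 29 to the mix, it gets considerably more complicated. In this case, we make some additional assumptions: births are equally likely on each of the 365 days other than February 29, and a birth on February 29 is one-fourth as likely as a birth on any other specified day.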
Hence the probability that a randomly selected person was born on February 29 is 0.25/365.25, and the probability that a randomly selected person was born on another specified day is 1/365.25.
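One way to see where these numbers come from is to count days over a four-year cycle containing one leap year, assuming births are spread evenly over the cycle. The cycle has 3 × 365 + 366 = 1461 days, February 29 occurs once, and every other date occurs four times:

```latex
\[
P(\text{February 29}) = \frac{1}{1461} = \frac{0.25}{365.25},
\qquad
P(\text{any other specified day}) = \frac{4}{1461} = \frac{1}{365.25}
\]
\[
365 \cdot \frac{1}{365.25} + \frac{0.25}{365.25} = 1
\]
```

The second line is a sanity check that the 366 possible birthdays together account for all of the probability.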
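The probability that N persons, possibly including one born on February 29, have distinct birthdays is the sum of two probabilities: the probability that the N persons were born on N different days other than February 29, and the probability that they were born on N different days with exactly one of them born on February 29.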
The probabilities add because the two cases are mutually exclusive.
Now each probability can be expressed recursively:
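As a sketch, write E_N for the probability that N persons have distinct birthdays with nobody born on February 29, and I_N for the probability that they have distinct birthdays with exactly one person born on February 29. These symbols are notation assumed here for illustration, not taken from the original text. Under the day probabilities given above, the recursions can be written as:

```latex
\[
E_1 = \frac{365}{365.25},
\qquad
E_N = E_{N-1} \cdot \frac{365 - (N-1)}{365.25}
\]
\[
I_1 = \frac{0.25}{365.25},
\qquad
I_N = I_{N-1} \cdot \frac{365 - (N-2)}{365.25} + E_{N-1} \cdot \frac{0.25}{365.25}
\]
\[
P(\text{at least two of } N \text{ persons share a birthday}) = 1 - E_N - I_N
\]
```

The two terms of I_N correspond to the two ways the N-th person can keep all birthdays distinct: either February 29 is already taken and the newcomer lands on one of the 365 - (N-2) unused ordinary days, or nobody has February 29 yet and the newcomer is born on it.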
A program to display the probabilities goes something like this:
The result is something like this:
As expected, the probabilities are slightly lower, because there is a lower probability of matching birthdays when there are more possible birthdays. But the smallest number with probability greater than 0.5 is still 23.
Of course, a mathematical purist may argue that leap years don't always come every four years, so the calculations need further modification. However, the last quadrennial year that wasn't a leap year was 1900, and the next one will be 2100. The number of persons now living who were born in 1900 is so small that I think our approximation is valid for all practical purposes. But you are welcome to make the required modifications if you wish.
The Birthday Paradox has implications beyond the world of parlor betting. A standard technique in data storage is to assign each item a number called a hash code. The item is then stored in a bin corresponding to its hash code. This speeds up retrieval because only a single bin must be searched. The Birthday Paradox shows that the probability that two or more items will end up in the same bin is high even if the number of items is considerably less than the number of bins. Hence efficient handling of bins containing two or more items is required in all cases.