2011-03-08 13:39:39

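The birthday paradox comes from probability theory. The probability that n people all have different birthdays is:
\bar p(n) = 1 \cdot \left(1-\frac{1}{365}\right) \cdot \left(1-\frac{2}{365}\right) \cdots \left(1-\frac{n-1}{365}\right) = \frac{364}{365} \cdot \frac{363}{365} \cdot \frac{362}{365} \cdots \frac{365-n+1}{365}
This holds because the second person must not share a birthday with the first (probability 364/365), the third must not share a birthday with either of the first two (probability 363/365), and so on. Using factorials, it can be written as:
\bar p(n) = \frac{365!}{365^n (365-n)!}
p(n) denotes the probability that at least two of the n people share a birthday:
p(n) = 1 - \bar p(n) = 1 - \frac{365!}{365^n (365-n)!}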
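As a quick sanity check of the formula, here is a worked evaluation for the two smallest non-trivial cases, under the same assumption of 365 equally likely birthdays. The arithmetic is only illustrative:

```latex
p(2) = 1 - \frac{364}{365} = \frac{1}{365} \approx 0.00274

p(3) = 1 - \frac{364 \cdot 363}{365^2}
     = 1 - \frac{132132}{133225}
     = \frac{1093}{133225} \approx 0.00820
```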


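A favorite topic in elementary probability and statistics courses is the Birthday Problem: what is the probability that at least two of N randomly selected people have the same birthday? (Same month and day, but not necessarily the same year.)
A second question is: how large must N be for that probability to exceed 50%? The answer is 23, which strikes most people as unreasonably small. For this reason the problem is often called the Birthday Paradox.
The analysis makes two assumptions:
1. Nobody was born on February 29.
2. Birthdays are distributed evenly over the 365 days of the year.
The first step is to solve the complementary problem, which is also the easier one: what is the probability that N randomly selected people all have different birthdays? It can be written as a recursive function: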
double different_birthdays(int n)
{
    return n == 1 ? 1.0 : different_birthdays(n-1) * (365.0-(n-1))/365.0;
}
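Obviously, the probability for N = 1 is 1. For N > 1, it is the product of two probabilities:
1. That the first N-1 people all have different birthdays.
2. That the N-th person's birthday differs from those of the first N-1.
A program to display the probabilities goes something like this:
#include <stdio.h>

int main(void)
{
    int n;
    for (n = 1; n <= 365; n++)
        printf("%3d: %e\n", n, 1.0-different_birthdays(n));
    return 0;
}
The result is something like this: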
1: 0.000000e+00
2: 2.739726e-03
3: 8.204166e-03
4: 1.635591e-02
5: 2.713557e-02
***
20: 4.114384e-01
21: 4.436883e-01
22: 4.756953e-01
23: 5.072972e-01
24: 5.383443e-01
25: 5.686997e-01
***
The conclusion is that 23 is the smallest N for which the probability that at least two of N people share a birthday exceeds 0.5.
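The threshold can also be found directly, without printing the whole table. The following is a minimal iterative sketch, assuming the same 365 equally likely birthdays; the function names `shared_birthday_probability` and `smallest_n_over_half` are made up for this illustration and do not appear in the original article.

```c
/* Probability that at least two of n people share a birthday,
   assuming 365 equally likely birthdays (no February 29). */
double shared_birthday_probability(int n)
{
    double all_different = 1.0;
    int k;
    /* Person k+1 must avoid the k birthdays already taken. */
    for (k = 1; k < n; k++)
        all_different *= (365.0 - k) / 365.0;
    return 1.0 - all_different;
}

/* Smallest n for which a shared birthday is more likely than not. */
int smallest_n_over_half(void)
{
    int n = 1;
    while (shared_birthday_probability(n) <= 0.5)
        n++;
    return n;
}
```

Calling `smallest_n_over_half()` returns 23, in agreement with the table above. The loop replaces the recursion, but the product it accumulates is the same.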
Original source:
Note: the function double different_birthdays_including_Feb_29(int) in the original article is wrong.

The Birthday Paradox
Philip J. Erdelsky
July 4, 2001

Please e-mail comments, corrections and additions to the webmaster at pje@efgh.com.

BUT WHAT ABOUT LEAP YEAR?

The original problem can be solved with a slide rule, which is exactly what I did when I first heard it many, many years ago.

If we add February 29 to the mix, it gets considerably more complicated. In this case, we make some additional assumptions:

  1. Equal numbers of people are born on days other than February 29.
  2. The number of people born on February 29 is one-fourth of the number of people born on any other day.

Hence the probability that a randomly selected person was born on February 29 is 0.25/365.25, and the probability that a randomly selected person was born on another specified day is 1/365.25.

The probability that N persons, possibly including one born on February 29, have distinct birthdays is the sum of two probabilities:

  1. That the N persons were born on N different days other than February 29.
  2. That the N persons were born on N different days, and include one person born on February 29.

The probabilities add because the two cases are mutually exclusive.
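For reference, the two cases can also be written in closed form. This is a sketch derived from the stated assumptions, not taken from the article; P_1 and P_2 are labels introduced here for cases 1 and 2. In case 2, any one of the N persons may be the one born on February 29, which is where the factor N comes from:

```latex
P_1(N) = \frac{365!}{(365-N)! \; 365.25^{N}}

P_2(N) = N \cdot \frac{0.25}{365.25} \cdot \frac{365!}{(365-N+1)! \; 365.25^{N-1}}

P(\text{at least one shared birthday}) = 1 - P_1(N) - P_2(N)
```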

Now each probability can be expressed recursively:

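double different_birthdays_excluding_Feb_29(int n)
{
  return n == 1 ? 365.0/365.25 :
    different_birthdays_excluding_Feb_29(n-1) * (365.0-(n-1)) / 365.25;
}

/* Corrected from the version as published: person n either avoids the
   days used by the other n-1 (one of whom was born on February 29), or
   is the one born on February 29 while the other n-1 were not. */
double different_birthdays_including_Feb_29(int n)
{
  return n == 1 ? 0.25 / 365.25 :
    different_birthdays_including_Feb_29(n-1) * (365.0-(n-2)) / 365.25 +
    different_birthdays_excluding_Feb_29(n-1) * 0.25 / 365.25;
}

A program to display the probabilities goes something like this:

#include <stdio.h>

int main()
{
    int n;
    for (n = 1; n <= 36; n++)
        printf("%3d: %e\n", n, 1.0-different_birthdays_excluding_Feb_29(n) -
                different_birthdays_including_Feb_29(n));
    return 0;
}

The output below is from the function as originally published, which lacked the second term of the recursion, so its values for n >= 2 are too high. For n = 2, for example, the exact probability is (365 + 0.0625)/365.25^2, or about 2.7364e-03, rather than 3.420440e-03:

  1: -8.348357e-18
  2: 3.420440e-03
  3: 9.557661e-03
  4: 1.836877e-02
  5: 2.978904e-02
  6: 4.373274e-02
  ***
 19: 3.867019e-01
 20: 4.190237e-01
 21: 4.512327e-01
  ***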

As expected, the probabilities are slightly lower, because there is a lower probability of matching birthdays when there are more possible birthdays. But the smallest number with probability greater than 0.5 is still 23.
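The leap-year case can be checked with an iterative sketch that tracks both mutually exclusive cases at once. It assumes the article's model (February 29 is one-fourth as likely as any other day); the name `shared_birthday_probability_leap` is hypothetical, and the update for the February 29 case includes the possibility that the newest person is the one born on that day.

```c
/* Probability that at least two of n people share a birthday when
   February 29 is allowed and is one-fourth as likely as any other day. */
double shared_birthday_probability_leap(int n)
{
    const double p_day = 1.0 / 365.25;     /* any day other than Feb 29 */
    const double p_feb29 = 0.25 / 365.25;  /* February 29 */
    double none = 1.0;  /* all distinct so far, nobody born on Feb 29 */
    double one = 0.0;   /* all distinct so far, exactly one born on Feb 29 */
    int k;
    for (k = 1; k <= n; k++) {
        /* Person k either avoids the k-2 ordinary days already used
           (someone else has Feb 29), or is the first Feb 29 birthday. */
        double next_one = one * (365.0 - (k - 2)) * p_day + none * p_feb29;
        /* Person k avoids the k-1 ordinary days already used. */
        double next_none = none * (365.0 - (k - 1)) * p_day;
        one = next_one;
        none = next_none;
    }
    return 1.0 - none - one;
}
```

Under these assumptions the value for 22 people stays below 0.5 and the value for 23 people stays above it, which is consistent with the article's statement that the threshold is still 23.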

Of course, a mathematical purist may argue that leap years don't always come every four years, so the calculations need further modification. However, the last quadrennial year that wasn't a leap year was 1900, and the next one will be 2100. The number of persons now living who were born in 1900 is so small that I think our approximation is valid for all practical purposes. But you are welcome to make the required modifications if you wish.

The Birthday Paradox has implications beyond the world of parlor betting. A standard technique in data storage is to assign each item a number called a hash code. The item is then stored in a bin corresponding to its hash code. This speeds up retrieval because only a single bin must be searched. The Birthday Paradox shows that the probability that two or more items will end up in the same bin is high even if the number of items is considerably less than the number of bins. Hence efficient handling of bins containing two or more items is required in all cases.
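The same calculation carries over to hashing if the days are replaced by bins. The sketch below assumes an idealized hash function that assigns each item to one of `bins` bins uniformly and independently, with at least one bin; both function names are invented for this example.

```c
/* Probability that at least two of `items` land in the same bin,
   assuming uniform, independent assignment to `bins` bins. */
double collision_probability(unsigned long items, unsigned long bins)
{
    double no_collision = 1.0;
    unsigned long k;
    if (items > bins)
        return 1.0;  /* pigeonhole: more items than bins */
    for (k = 1; k < items; k++)
        no_collision *= 1.0 - (double)k / (double)bins;
    return 1.0 - no_collision;
}

/* Smallest number of items for which a collision is more likely than not. */
unsigned long items_for_even_odds(unsigned long bins)
{
    double no_collision = 1.0;  /* one item can never collide */
    unsigned long items = 1;
    while (no_collision >= 0.5 && items <= bins) {
        /* Adding one more item: it must miss the `items` occupied bins. */
        no_collision *= 1.0 - (double)items / (double)bins;
        items++;
    }
    return items;
}
```

With 365 bins, `items_for_even_odds` reproduces the birthday answer of 23. For larger tables the result grows roughly like the square root of the number of bins, which is why collisions have to be handled even in sparsely filled tables.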

