博客首页 注册 建议与交流 排行榜 加入友情链接
推荐 投诉 搜索: 帮助

ypxing

学而不思则罔,思而不学则殆

见贤思齐焉,见不贤而内自省也

人不知而不愠,不亦君子乎?

   ypxing.cublog.cn
关于作者  
姓名:星云鹏 (Yunpeng Xing)
职业:IT相关
年龄:28
位置:北京
个性介绍:
Love me, feed me, 
never leave me.
失败只有一种, 那就是半途而废

我的分类  




朴素贝页斯分类(Naive Bayes Classifier)
Naive Bayes Classifier Introductory Overview

The Naive Bayes Classifier technique is based on the so-called Bayesian theorem and is particularly suited when the dimensionality of the inputs is high. Despite its simplicity, Naive Bayes can often outperform more sophisticated classification methods.

To demonstrate the concept of Naïve Bayes Classification, consider the example displayed in the illustration above. As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., decide to which class label they belong, based on the currently exiting objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which hasn't been observed yet) is twice as likely to have membership GREEN rather than RED. In the Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and often used to predict outcomes before they actually happen.

Thus, we can write:

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

Having formulated our prior probability, we are now ready to classify a new object (WHITE circle). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects in the vicinity of X, the more likely that the new cases belong to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihood:

From the illustration above, it is clear that Likelihood of X given GREEN is smaller than Likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN compared to RED) the likelihood indicates otherwise; that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In the Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using the so-called Bayes' rule (named after Rev. Thomas Bayes 1702-1761).

Finally, we classify X as RED since its class membership achieves the largest posterior probability.

Note.The above probabilities are not normalized. However, this does not affect the classification outcome since their normalizing constants are the same.

 

Technical Notes

In the previous section, we provided an intuitive example for understanding classification using Naive Bayes. In this section are further details of the technical issues involved. Naive Bayes classifiers can handle an arbitrary number of independent variables whether continuous or categorical. Given a set of variables, X = {x1,x2,x...,xd}, we want to construct the posterior probability for the event Cj among a set of possible outcomes C = {c1,c2,c...,cd}. In a more familiar language, X is the predictors and C is the set of categorical levels present in the dependent variable. Using Bayes' rule:

where p(Cj | x1,x2,x...,xd) is the posterior probability of class membership, i.e., the probability that X belongs to Cj. Since Naive Bayes assumes that the conditional probabilities of the independent variables are statistically independent we can decompose the likelihood to a product of terms:

and rewrite the posterior as:

Using Bayes' rule above, we label a new case X with a class level Cj that achieves the highest posterior probability.

Although the assumption that the predictor (independent) variables are independent is not always accurate, it does simplify the classification task dramatically, since it allows the class conditional densities p(xk | Cj) to be calculated separately for each variable, i.e., it reduces a multidimensional task to a number of one-dimensional ones. In effect, Naive Bayes reduces a high-dimensional density estimation task to a one-dimensional kernel density estimation. Furthermore, the assumption does not seem to greatly affect the posterior probabilities, especially in regions near decision boundaries, thus, leaving the classification task unaffected.

Naive Bayes can be modeled in several different ways including normal, lognormal, gamma and Poisson density functions:

Note. Poisson variables are regarded here as continuous since they are ordinal rather than truly categorical. For categorical variables, a discrete probability is used with values of the categorical level being proportional to their conditional frequency in the training data.

 原文地址 http://www.statsoft.com/textbook/stnaiveb.html
 发表于: 2007-12-06,修改于: 2007-12-06 16:18 已浏览1076次,有评论1条 推荐 投诉

  网友评论
  本站网友 时间:2007-12-20 14:47:23 IP地址:58.211.16.★
联系方式  QQ: 1127080

全国第一家虚拟主机:支持伪静态.有利于提高排名 

15G全能空间年付500元/月付70元 可免费试用 
5GB 独立WEB空间、5GB 企业邮箱空间、5GB MSSQL数据库
(可划分5个数据库。可独立放5个不同的站点)

IIS连接数据 500 个、500GB/月流量、共享日志文件空间 

企业邮箱功能 
赠送5GB 超大企业邮箱,500个Email企业邮箱用户 
自动回复、自动转发、POP3、SMTP收发信、SMTP发信认证 
邮件过滤、邮件拒收、邮件夹管理、邮件域管理、定制邮件数 

数据库功能 
支持5GB MSSQL数据库空间,5个用户数据库、Access 

主机功能支持 
采用安全稳定的Win2003 .net2.0 架构 
支持ASP、PHP、ASP.NET、PERL等脚本、支持自定义CGI 
全面支持.net2.0版本,独立的Application应用池, 
支持SSI(Shtml),支持FrontPage扩展 
可免费自行绑定5个域名、500个解析、500个子域名 

详情咨询QQ:1127080
官方网站:www.abcnic.com


  发表评论



Copyright © 2001-2006 ChinaUnix.net All Rights Reserved

感谢所有关心和支持过ChinaUnix的朋友们
页面生成时间:0.00873

京ICP证041476号