Example for document classification using NLTK and Python - OowarrioroO - ChinaUnix Blog

Category: Python/Ruby

2015-09-16 10:36:32

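Statement: This article is original; please credit the source when reposting. Most of my articles are notes taken while learning and my technical ability is limited, so please point out any errors so we can discuss and improve together. For the same reason, articles are revised from time to time; for the latest version, see: http://blog.chinaunix.net/uid/29454152.html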
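
1. Load the movie reviews together with their categories. Each review in NLTK's movie_reviews corpus is labeled either pos (positive) or neg (negative). The corpus must be downloaded once beforehand with nltk.download('movie_reviews').
The code is as follows: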

>>> import nltk
>>> import random
>>> from nltk.corpus import movie_reviews
>>> documents = [(list(movie_reviews.words(fileid)), category)  # (list of words, 'pos' or 'neg')
...              for category in movie_reviews.categories()
...              for fileid in movie_reviews.fileids(category)]
>>> random.shuffle(documents)  # fileids come grouped by category, so mix them up
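
In step 1, movie_reviews.fileids(category) is iterated one category at a time, so before random.shuffle runs, every review of one category precedes every review of the other. Without the shuffle, the 100-document test slice taken in step 3 would contain only a single category. The stdlib-only Python 3 sketch below shows this on a made-up toy_corpus; the corpus and the make_documents helper are hypothetical stand-ins for movie_reviews, not NLTK API.

```python
import random

# Hypothetical toy corpus standing in for movie_reviews:
# category -> list of documents, each document a list of words.
toy_corpus = {
    "neg": [["boring", "plot"], ["bad", "acting"], ["dull", "and", "slow"]],
    "pos": [["great", "fun"], ["superb", "acting"], ["loved", "it"]],
}

def make_documents(corpus):
    """Build (words, category) pairs category by category, the same
    iteration order as the movie_reviews list comprehension."""
    return [(list(words), category)
            for category in sorted(corpus)
            for words in corpus[category]]

documents = make_documents(toy_corpus)
test_size = 2  # plays the role of the 100 held-out documents

# Unshuffled: the held-out slice comes from a single category.
print(sorted({c for _, c in documents[:test_size]}))  # ['neg']

# Shuffle with a fixed seed so the split is repeatable, then split.
rng = random.Random(0)
rng.shuffle(documents)
test_set, train_set = documents[:test_size], documents[test_size:]
print(len(train_set), len(test_set))  # 4 2
print(sorted(c for _, c in test_set))  # the category mix now depends on the shuffle
```

The sketch seeds its own random.Random instance, which the original code does not do; seeding like this makes the split, and therefore the measured accuracy, repeatable between runs.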

2. Extract features from each document: for each of the 2,000 most frequent words in the corpus, record whether that word appears in the document.
The code is as follows:

>>> all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())  # word -> count
>>> word_features = [w for (w, _) in all_words.most_common(2000)]  # 2,000 most frequent words
>>> def document_features(document):
...     document_words = set(document)  # a set makes the membership tests fast
...     features = {}
...     for word in word_features:
...         # one boolean feature per vocabulary word
...         features['contains(%s)' % word] = (word in document_words)
...     return features
...
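
The feature extractor in step 2 turns a document into a dictionary of boolean contains(word) features over a fixed vocabulary of frequent words. The stdlib-only Python 3 sketch below reproduces the idea on a tiny hypothetical corpus, using collections.Counter in place of nltk.FreqDist (FreqDist is a subclass of Counter, so most_common works on both). The toy words and the top_n value of 2 are made up for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of words.
corpus = [
    ["great", "film", "great", "plot"],
    ["dull", "film"],
    ["great", "fun"],
]

# Count lower-cased words over the whole corpus (nltk.FreqDist does the same job).
all_words = Counter(w.lower() for doc in corpus for w in doc)

# Keep the top_n most frequent words as the feature vocabulary
# (top_n stands in for the 2,000 used with movie_reviews).
top_n = 2
word_features = [w for (w, _) in all_words.most_common(top_n)]

def document_features(document):
    """Map a document to {'contains(word)': bool} for every vocabulary word."""
    document_words = set(document)  # set lookups are faster than list scans
    return {'contains(%s)' % word: (word in document_words)
            for word in word_features}

print(word_features)                        # ['great', 'film']
print(document_features(["dull", "film"]))  # {'contains(great)': False, 'contains(film)': True}
```

Words outside the vocabulary ("dull" here) are simply ignored. That is the main trade-off in choosing the vocabulary size: a larger vocabulary keeps more evidence per document but produces larger feature dictionaries.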

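3. Train a Naive Bayes classifier on all but the first 100 feature sets, then measure its accuracy on those 100 held-out documents. Because the documents are shuffled randomly, the exact accuracy varies from run to run.
The code is as follows: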

>>> featuresets = [(document_features(d), c) for (d, c) in documents]
>>> train_set, test_set = featuresets[100:], featuresets[:100]  # hold out 100 documents
>>> classifier = nltk.NaiveBayesClassifier.train(train_set)
>>> print(nltk.classify.accuracy(classifier, test_set))
0.73
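
nltk.NaiveBayesClassifier.train estimates the label probabilities P(label) and, for each feature, P(value | label) from the training set; to classify a document it picks the label with the highest P(label) multiplied by P(value | label) for every feature. nltk.classify.accuracy then reports the fraction of test documents whose predicted label matches the gold label. The Python 3 sketch below is a simplified stand-in for illustration only, not NLTK's implementation: it assumes boolean feature values (as produced by document_features), sums log-probabilities to avoid numeric underflow, and uses add-one (Laplace) smoothing, whereas NLTK smooths with expected likelihood estimation by default. The helpers train_nb, classify_nb and accuracy and the toy feature sets are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_featuresets):
    """Count labels and, per label, how often each (feature, value) pair occurs."""
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature name) -> Counter of values
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for fname, fval in features.items():
            value_counts[label, fname][fval] += 1
    return label_counts, value_counts

def classify_nb(model, features):
    """Pick the label maximising log P(label) + sum of log P(value | label)."""
    label_counts, value_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_label in sorted(label_counts.items()):
        score = math.log(n_label / total)
        for fname, fval in features.items():
            count = value_counts[label, fname][fval]
            # Add-one smoothing over the two boolean values (True/False).
            score += math.log((count + 1) / (n_label + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, labeled_featuresets):
    """Fraction of examples whose predicted label equals the gold label."""
    correct = sum(classify_nb(model, f) == gold for f, gold in labeled_featuresets)
    return correct / len(labeled_featuresets)

# Hypothetical toy feature sets in the same format as featuresets in step 3.
train_set = [
    ({"contains(great)": True,  "contains(dull)": False}, "pos"),
    ({"contains(great)": True,  "contains(dull)": False}, "pos"),
    ({"contains(great)": False, "contains(dull)": True},  "neg"),
    ({"contains(great)": False, "contains(dull)": True},  "neg"),
]
test_set = [
    ({"contains(great)": True,  "contains(dull)": False}, "pos"),
    ({"contains(great)": False, "contains(dull)": True},  "neg"),
]

model = train_nb(train_set)
print(accuracy(model, test_set))  # 1.0
```

Working in log space turns the product of many small probabilities into a sum, which matters once there are thousands of features per document, as with the 2,000 contains(word) features used here.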
