Category: Python/Ruby

2015-09-16 10:36:32

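        Statement: This article is original; please cite the source when reposting. Most of these posts are notes taken while learning and my technical ability is limited, so I hope readers will point out any mistakes and we can improve together. For the same reason the posts are revised from time to time; for the latest version, see: http://blog.chinaunix.net/uid/29454152.html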
1. Load the movie reviews and pair each one with its category (pos or neg)

    The code is as follows:


    >>> import nltk
    >>> import random
    >>> from nltk.corpus import movie_reviews
    >>> # Build one (word list, category) pair per review file.
    >>> documents = [(list(movie_reviews.words(fileid)), category)
    ...              for category in movie_reviews.categories()
    ...              for fileid in movie_reviews.fileids(category)]
    >>> # Shuffle so that pos and neg reviews are mixed before splitting.
    >>> random.shuffle(documents)
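
To make the shape of `documents` concrete, here is a minimal sketch that builds the same kind of list from a tiny hand-made corpus. The `toy_corpus` dictionary and its sentences are invented for illustration and are not part of `movie_reviews`; the seed is fixed only so the shuffle is repeatable.

```python
import random

# Hypothetical toy corpus standing in for movie_reviews: label -> review texts.
toy_corpus = {
    "pos": ["a great and moving film", "wonderful acting"],
    "neg": ["a dull and boring plot", "terrible pacing"],
}

# Same shape as `documents`: one (word list, label) pair per review.
toy_documents = [(text.split(), label)
                 for label, texts in toy_corpus.items()
                 for text in texts]

random.seed(0)  # assumption: fixed seed, used here only for repeatability
random.shuffle(toy_documents)

for words, label in toy_documents:
    print(label, words)
```

Each entry is a tuple whose first element is the list of words in one review and whose second element is its label, which is the form the later steps expect.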

2. Build the features for each document: for every selected word, record whether it occurs in the document
    The code is as follows:


    >>> all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
    >>> # Keep the 2000 most frequent words; in NLTK 3, keys() is not ordered by frequency.
    >>> word_features = [w for (w, _) in all_words.most_common(2000)]
    >>> def document_features(document):
    ...     document_words = set(document)  # set lookup is faster than list lookup
    ...     features = {}
    ...     for word in word_features:
    ...         features['contains(%s)' % word] = (word in document_words)
    ...     return features
    ...
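
The feature extraction above can be tried on a small scale without the corpus. The sketch below uses `collections.Counter` in place of `nltk.FreqDist` and two invented reviews; the names `toy_reviews`, `toy_word_features` and `toy_document_features` are hypothetical and exist only for this example.

```python
from collections import Counter

# Two invented reviews, already split into words.
toy_reviews = [
    "a great film with great acting".split(),
    "a boring film".split(),
]

# Count every word across all reviews, then keep the 3 most frequent.
counts = Counter(w.lower() for review in toy_reviews for w in review)
toy_word_features = [w for w, _ in counts.most_common(3)]

def toy_document_features(document, word_features):
    """Map each selected word to True/False: does it occur in the document?"""
    document_words = set(document)
    return {'contains(%s)' % w: (w in document_words) for w in word_features}

features = toy_document_features(toy_reviews[1], toy_word_features)
for name, value in sorted(features.items()):
    print(name, value)
```

Every document gets the same set of feature names, so the classifier sees a fixed-size description of each review regardless of its length.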

3. Train the classifier and test it on held-out documents
    The code is as follows:


    >>> featuresets = [(document_features(d), c) for (d, c) in documents]
    >>> # Hold out the first 100 documents for testing and train on the rest.
    >>> train_set, test_set = featuresets[100:], featuresets[:100]
    >>> classifier = nltk.NaiveBayesClassifier.train(train_set)
    >>> print(nltk.classify.accuracy(classifier, test_set))
    0.73
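
The accuracy figure is the fraction of test documents whose predicted label matches the true label. The sketch below shows that calculation in plain Python. It is not NLTK's implementation, and `toy_classify` is a hand-written rule standing in for a trained classifier; the feature sets are invented.

```python
def accuracy(classify, labeled_featuresets):
    """Fraction of (features, label) pairs where classify(features) == label."""
    correct = sum(1 for features, label in labeled_featuresets
                  if classify(features) == label)
    return float(correct) / len(labeled_featuresets)

# Hypothetical rule used in place of a trained classifier.
def toy_classify(features):
    return "pos" if features.get("contains(great)") else "neg"

# Invented labeled feature sets in the same (features, label) form as above.
toy_featuresets = [
    ({"contains(great)": True}, "pos"),
    ({"contains(great)": False}, "neg"),
    ({"contains(great)": True}, "neg"),   # the rule gets this one wrong
    ({"contains(great)": False}, "neg"),
]

print(accuracy(toy_classify, toy_featuresets))
```

Because the documents are shuffled before the split, the accuracy reported on the real corpus will vary somewhat from run to run.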

