Category: Big Data

2016-05-17 19:05:34

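A boolean preference is preference data that carries no preference strength: each record holds only a user ID and an item ID. It says that the user and the item are associated, but not how strong that association is.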
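To make the definition concrete, here is a minimal sketch in plain Java (no Mahout involved) of what boolean preference data boils down to: a set of item IDs per user. The class name `BooleanPrefsSketch`, the `parse` helper and the "userID,itemID" line format are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Mahout: groups "userID,itemID" pairs by user.
class BooleanPrefsSketch {

    // Parses lines such as "1,101" into userID -> set of associated item IDs.
    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> prefs = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            prefs.computeIfAbsent(userId, k -> new HashSet<>()).add(itemId);
        }
        return prefs;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> prefs = parse(Arrays.asList("1,101", "1,102", "2,101"));
        // An association either exists or it does not; there is no rating to read.
        System.out.println(prefs.get(1L).contains(102L)); // true
        System.out.println(prefs.get(2L).contains(102L)); // false
    }
}
```

The only question such a structure can answer is "is this user associated with this item?", which is exactly the information a boolean preference carries.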

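(Most of my notes are written directly in the code comments.)
Case 1: the error you get when you use GenericBooleanPrefDataModel exactly the way you would use a traditional GenericDataModel.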



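import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// PearsonCorrelationSimilarity throws java.lang.IllegalArgumentException here.
// With boolean preferences, every stand-in preference value defaults to 1.0.
// The Pearson correlation is the covariance of two data series divided by the
// product of their standard deviations. When every value is 1.0, both the
// covariance and the standard deviations are 0, so the result is 0/0, which
// Java evaluates to NaN.
public class GenericBooleanPrefDataModelTest {

    public static void main(String[] args) throws Exception {
        // Load ua.base and keep only the user-item associations.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // This is the line that fails on a model without preference values.
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build boolean models from the training data too.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // Train on 90% of each user's data and evaluate on all users (1.0).
        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, model, 0.9, 1.0);
        System.out.println(score);
    }
}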
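The 0/0 situation described in the comment above can be reproduced without Mahout. The following is a small sketch of a textbook Pearson correlation in plain Java; the class name `PearsonOnConstantData` and its `pearson` method are made up for this illustration and are not Mahout's implementation.

```java
// Hypothetical stand-alone demo: a textbook Pearson correlation, not Mahout code.
class PearsonOnConstantData {

    // Pearson correlation = covariance / (stddev(x) * stddev(y)).
    // The 1/n factors cancel, so plain sums of deviations are enough.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Two users with real ratings: the correlation is well defined.
        System.out.println(pearson(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // 1.0

        // Two "boolean" users whose stand-in values are all 1.0: 0.0 / 0.0.
        System.out.println(pearson(new double[] {1, 1, 1}, new double[] {1, 1, 1})); // NaN
    }
}
```

Because every deviation from the mean is zero, both the numerator and the denominator vanish, and floating-point division of 0.0 by 0.0 yields NaN.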
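Improvement: switch to a different similarity measure. The resulting score of 0 looks reasonable at first, but it is in fact meaningless as well.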


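import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Here LogLikelihoodSimilarity replaces the Pearson similarity used before.
// The result is 0.0, which would normally mean a perfect prediction, but the
// test is invalid: every stand-in preference value is 1.0, so the average
// difference between estimated and actual values can only be 0.
public class GenericBooleanPrefDataModel_LogLike {

    public static void main(String[] args) throws Exception {
        // Build the boolean model through toDataMap, as in the first example.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // LogLikelihoodSimilarity does not need preference values.
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build boolean models from the training data too.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // Train on 90% of each user's data and evaluate on all users (1.0).
        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, model, 0.9, 1.0);
        System.out.println(score);
    }
}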
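Why the score cannot be anything but 0 is easy to see with a hand-rolled average absolute difference. This is only a sketch of the metric in plain Java; the class and method names are invented here, and the sample numbers are illustrative rather than taken from ua.base.

```java
// Hypothetical stand-alone demo of the average absolute difference metric.
class AverageAbsoluteDifferenceSketch {

    // Mean of |actual - estimated| over all held-out preferences.
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - estimated[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        // With real ratings the metric tells us something about estimate quality.
        System.out.println(averageAbsoluteDifference(
                new double[] {4.0, 3.0, 5.0}, new double[] {3.5, 3.0, 4.0})); // 0.5

        // With boolean data both the actual and the estimated values are 1.0,
        // so every term is |1.0 - 1.0| = 0 regardless of the recommender.
        System.out.println(averageAbsoluteDifference(
                new double[] {1.0, 1.0, 1.0}, new double[] {1.0, 1.0, 1.0})); // 0.0
    }
}
```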
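The evaluation above measures a difference between values. Precision and recall, on the other hand, are still meaningful when similarity is computed with LogLikelihoodSimilarity.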





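import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// The average difference obtained with LogLikelihoodSimilarity is not
// informative, but the precision and recall values are still valid.
// However, this example still builds a GenericUserBasedRecommender, which ranks
// recommendations by estimated preference. Those estimates are all 1.0, so the
// resulting order is effectively random.
// GenericBooleanPrefUserBasedRecommender can be used instead. It weights each
// item by the similar users associated with it: the more similar those users
// are, the larger the weight. It does not compute a weighted average.
public class GenericBooleanPrefDataModel_Precision_Recall {

    public static void main(String[] args) throws Exception {
        // Build the boolean model through toDataMap, as in the first example.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build boolean models from the training data too.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // No rescorer (null), precision and recall at 10, threshold chosen by
        // the evaluator, all users evaluated (1.0).
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}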
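For readers who have not met precision and recall before, the sketch below shows how the two numbers are defined for a single user's recommendation list. It is a plain-Java illustration with made-up item IDs and a hypothetical class name; it does not reproduce how GenericRecommenderIRStatsEvaluator selects the held-out "relevant" items.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone demo of precision and recall for one user.
class PrecisionRecallSketch {

    // Number of recommended items that are actually relevant.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: share of the recommendations that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: share of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(101L, 102L, 103L, 104L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(102L, 104L, 105L));

        System.out.println(precision(recommended, relevant)); // 2 of 4 recommendations hit
        System.out.println(recall(recommended, relevant));    // 2 of 3 relevant items found
    }
}
```

Neither number depends on a preference value, only on whether an item appears in a list, which is why these metrics remain usable for boolean data.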



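import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

// Final version: same evaluation as before, but the recommender is now a
// GenericBooleanPrefUserBasedRecommender, so items are ranked by
// similarity-based weights instead of by estimates that are all 1.0.
public class GenericBooleanPrefDataModel_LogLike_Pre_Recall_Final {

    public static void main(String[] args) throws Exception {
        // Build the boolean model through toDataMap, as in the first example.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build boolean models from the training data too.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // No rescorer (null), precision and recall at 10, threshold chosen by
        // the evaluator, all users evaluated (1.0).
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}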
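The difference between the two ranking strategies can be sketched in a few lines of plain Java. This is an illustration of the idea described in the comments (a similarity-based weight versus a weighted average of values that are all 1.0), with invented neighbors, similarities and names; it is not Mahout's source code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-alone demo: two ways to score an item from neighbor data.
class BooleanPrefScoreSketch {

    // Weighted average of the neighbors' preference values, all of which are 1.0.
    static double weightedAverage(long itemId, Map<Long, Double> similarity,
                                  Map<Long, Set<Long>> items) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> neighbor : similarity.entrySet()) {
            if (items.get(neighbor.getKey()).contains(itemId)) {
                weightedSum += neighbor.getValue() * 1.0;
                totalSimilarity += neighbor.getValue();
            }
        }
        return weightedSum / totalSimilarity;
    }

    // Boolean-style score: just the summed similarity of the neighbors that
    // are associated with the item. No division, so it is not an average.
    static double similaritySum(long itemId, Map<Long, Double> similarity,
                                Map<Long, Set<Long>> items) {
        double total = 0.0;
        for (Map.Entry<Long, Double> neighbor : similarity.entrySet()) {
            if (items.get(neighbor.getKey()).contains(itemId)) {
                total += neighbor.getValue();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<Long, Double> similarity = new HashMap<>();
        similarity.put(2L, 0.5);   // neighbor 2 is fairly similar
        similarity.put(3L, 0.25);  // neighbor 3 is less similar
        Map<Long, Set<Long>> items = new HashMap<>();
        items.put(2L, new HashSet<>(Arrays.asList(101L, 102L)));
        items.put(3L, new HashSet<>(Arrays.asList(102L, 103L)));

        // The weighted average cannot tell the items apart.
        System.out.println(weightedAverage(101L, similarity, items)); // 1.0
        System.out.println(weightedAverage(102L, similarity, items)); // 1.0

        // The summed similarity ranks 102 above 101 above 103.
        System.out.println(similaritySum(102L, similarity, items)); // 0.75
        System.out.println(similaritySum(101L, similarity, items)); // 0.5
        System.out.println(similaritySum(103L, similarity, items)); // 0.25
    }
}
```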

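One last point: boolean and non-boolean data cannot be mixed in a single DataModel. One solution is to ignore the preference values and treat all of the data as boolean. Alternatively, you can attach hypothetical preference values to the boolean data. These values have to be inferred in some way, even if that just means filling in the average of the existing preference values.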
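The second option, filling in the average, might look like the sketch below. It is plain Java with an assumed convention that `Double.NaN` marks an association without a known rating; the class name, the `fillWithMean` helper and the sample values are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-alone demo: give boolean-only entries a stand-in rating.
class MeanFillSketch {

    // Replaces every NaN (association without a rating) by the mean of the
    // ratings that do exist. Returns a new array; the input is not modified.
    static double[] fillWithMean(double[] prefs) {
        double sum = 0.0;
        int known = 0;
        for (double p : prefs) {
            if (!Double.isNaN(p)) {
                sum += p;
                known++;
            }
        }
        double[] filled = Arrays.copyOf(prefs, prefs.length);
        if (known == 0) {
            return filled; // nothing to average, leave the data as it is
        }
        double mean = sum / known;
        for (int i = 0; i < filled.length; i++) {
            if (Double.isNaN(filled[i])) {
                filled[i] = mean;
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        double[] prefs = {4.0, Double.NaN, 2.0, Double.NaN};
        System.out.println(Arrays.toString(fillWithMean(prefs))); // [4.0, 3.0, 2.0, 3.0]
    }
}
```

After such a step every record has a numeric value, so the combined data can be loaded into an ordinary preference-valued model, at the cost of the filled-in values being guesses.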











