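A Boolean preference means that the preference data carries no preference strength, only a user ID and an item ID. It records that a user and an item are associated, but not how strong that association is.
(Most of my notes are written directly in the code comments.)
First case: the error that occurs when GenericBooleanPrefDataModel is used in the same way as a conventional GenericDataModel.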
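// PearsonCorrelationSimilarity throws java.lang.IllegalArgumentException here.
// With Boolean preferences, every preference is given the placeholder value 1.0.
// The Pearson correlation is the covariance of the two data sets divided by the
// product of their standard deviations. When every value is 1.0, both quantities
// are 0, so the result is 0/0, which Java evaluates to NaN.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class GenericBooleanPrefDataModelTest {

    public static void main(String[] args) throws Exception {
        // Load ua.base and drop the rating values, keeping only user-item associations.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // Pearson correlation needs real preference values, so this line fails.
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build Boolean models from the training data as well.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // Train on 90% of each user's data and evaluate over 100% of the users.
        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, model, 0.9, 1.0);
        System.out.println(score);
    }
}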
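The 0/0 case described in the comment can be reproduced without Mahout. The following is a minimal stand-alone sketch, not Mahout's own code: the class name `PearsonOnConstantData` and the method `pearson` are made up for illustration. It computes the textbook Pearson correlation and shows that two vectors consisting only of 1.0 produce NaN.

```java
final class PearsonOnConstantData {

    // Textbook Pearson correlation: covariance divided by the product of the
    // standard deviations (the common 1/n factors cancel out).
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // For constant data both numerator and denominator are 0.0.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] allOnes = {1.0, 1.0, 1.0};
        System.out.println(pearson(allOnes, allOnes)); // prints NaN (0.0 / 0.0)

        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0};
        System.out.println(pearson(a, b)); // prints 1.0
    }
}
```

Once the values vary, as in the second call, the denominator is non-zero and the correlation is well defined.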
As a fix, we switch to a different similarity measure. The resulting score of 0 looks reasonable, but it is in fact meaningless as well.
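// Here LogLikelihoodSimilarity replaces PearsonCorrelationSimilarity as the similarity measure.
// The result is 0.0, which would indicate perfect prediction, but the test is invalid:
// every preference has the placeholder value 1.0, so the average difference between
// estimated and actual values can only be 0.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class GenericBooleanPrefDataModel_LogLike {

    public static void main(String[] args) throws Exception {
        // Load ua.base and drop the rating values, keeping only user-item associations.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                // Log-likelihood similarity does not depend on preference values.
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build Boolean models from the training data as well.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // Train on 90% of each user's data and evaluate over 100% of the users.
        double score = evaluator.evaluate(recommenderBuilder, modelBuilder, model, 0.9, 1.0);
        System.out.println(score);
    }
}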
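To see why a score of 0.0 says nothing here, the average absolute difference can be computed by hand. This is a minimal stand-alone sketch under the assumption that the metric is simply the mean of |actual - estimated|; the class and method names are hypothetical and the rating values are invented for illustration.

```java
final class AverageAbsoluteDifferenceOnBooleanData {

    // Mean of |actual - estimated| over all held-out preferences.
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        double total = 0.0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - estimated[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        // With real ratings the metric reflects how far the estimates are off.
        double[] actualRatings = {4.0, 2.0, 5.0};
        double[] estimatedRatings = {3.5, 3.0, 4.0};
        System.out.println(averageAbsoluteDifference(actualRatings, estimatedRatings));

        // With Boolean data every actual and every estimated value is 1.0,
        // so the metric is 0.0 no matter how good the recommender is.
        double[] actualBoolean = {1.0, 1.0, 1.0};
        double[] estimatedBoolean = {1.0, 1.0, 1.0};
        System.out.println(averageAbsoluteDifference(actualBoolean, estimatedBoolean)); // prints 0.0
    }
}
```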
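The evaluation above measured the difference between estimated and actual values. Computing precision and recall with the same LogLikelihoodSimilarity approach, on the other hand, is meaningful.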
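// Although the average difference computed with LogLikelihoodSimilarity is not informative,
// the precision and recall values are still valid.
// However, this listing builds a GenericUserBasedRecommender, which ranks recommendations
// by estimated preference. Those estimates are all 1.0, so the resulting order is
// effectively random.
// GenericBooleanPrefUserBasedRecommender can be used instead. It weights each item by the
// similar users associated with it: the more similar those users are, the larger the
// weight. It does not compute a weighted average.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class GenericBooleanPrefDataModel_Precision_Recall {

    public static void main(String[] args) throws Exception {
        // Load ua.base and drop the rating values, keeping only user-item associations.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                // Ranks by estimated preference, which is always 1.0 for Boolean data.
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build Boolean models from the training data as well.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // No rescorer, precision and recall at 10, automatically chosen relevance
        // threshold, evaluated over 100% of the users.
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}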
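The difference between the two ranking strategies described in the comments can be illustrated with a few numbers. The sketch below is not Mahout code. It assumes that the Boolean-aware recommender scores an item by summing the similarities of the neighbours who are associated with it, which is how the comments describe the weighting. The class name, method names and similarity values are invented for illustration.

```java
final class BooleanPrefScoring {

    // Similarity-weighted average of the neighbours' preference values.
    // With Boolean data every preference value is 1.0.
    static double weightedAverage(double[] similarities, boolean[] hasItem) {
        double weighted = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (hasItem[i]) {
                weighted += similarities[i] * 1.0;
                totalWeight += similarities[i];
            }
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    // Sum of the similarities of the neighbours associated with the item,
    // without dividing by the total weight.
    static double summedSimilarity(double[] similarities, boolean[] hasItem) {
        double sum = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (hasItem[i]) {
                sum += similarities[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] similarities = {0.9, 0.5, 0.2};  // three neighbours
        boolean[] itemA = {true, true, false};    // held by the two most similar neighbours
        boolean[] itemB = {false, false, true};   // held only by the least similar neighbour

        // The weighted average cannot tell the two items apart.
        System.out.println(weightedAverage(similarities, itemA)); // prints 1.0
        System.out.println(weightedAverage(similarities, itemB)); // prints 1.0

        // The summed similarity ranks item A clearly above item B.
        System.out.println(summedSimilarity(similarities, itemA));
        System.out.println(summedSimilarity(similarities, itemB));
    }
}
```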
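
// Final version: the same precision and recall evaluation, but the recommender is a
// GenericBooleanPrefUserBasedRecommender, so items are ranked by the similarity of the
// users associated with them instead of by an estimated preference that is always 1.0.

import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class GenericBooleanPrefDataModel_LogLike_Pre_Recall_Final {

    public static void main(String[] args) throws Exception {
        // Load ua.base and drop the rating values, keeping only user-item associations.
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(new FileDataModel(new File("ua.base"))));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
                // Boolean-aware recommender: ranks by neighbour similarity.
                return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
            }
        };

        // Make the evaluator build Boolean models from the training data as well.
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };

        // No rescorer, precision and recall at 10, automatically chosen relevance
        // threshold, evaluated over 100% of the users.
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}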
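For reference, the two numbers printed by getPrecision() and getRecall() follow the usual information-retrieval definitions. The sketch below shows those definitions on a single made-up recommendation list. It is a simplified illustration, not the evaluator's actual implementation, and the class name, method names and item IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrecisionRecallAtN {

    // Number of recommended items that are also relevant.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemID : recommended) {
            if (relevant.contains(itemID)) {
                count++;
            }
        }
        return count;
    }

    // Fraction of the recommended items that are relevant.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of the relevant items that were recommended.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(10L, 20L, 30L, 40L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(20L, 40L, 50L));

        System.out.println(precision(recommended, relevant)); // prints 0.5 (2 of 4 recommendations are relevant)
        System.out.println(recall(recommended, relevant));    // 2 of 3 relevant items were recommended
    }
}
```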
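Finally, note that Boolean and non-Boolean data cannot be mixed in a single DataModel. One solution is to ignore the preference values and treat all of the data as Boolean. Alternatively, you can attach inferred preference values to the Boolean data. These values can be estimated in some way, even if that simply means filling in the average of the existing preference values.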
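The second option, filling in the average of the existing preference values, could look roughly like the sketch below. It is plain Java and makes several assumptions for illustration only: preferences are held in a map keyed by a hypothetical "userID:itemID" string, and the class and method names are invented. Turning the result into a Mahout DataModel is left out.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class FillWithAveragePreference {

    // Returns the rated preferences plus one entry per Boolean-only association,
    // each filled with the mean of the existing preference values.
    static Map<String, Double> fill(Map<String, Double> rated, Set<String> booleanOnly) {
        if (rated.isEmpty()) {
            throw new IllegalArgumentException("need at least one real preference value");
        }
        double sum = 0.0;
        for (double value : rated.values()) {
            sum += value;
        }
        double average = sum / rated.size();

        Map<String, Double> result = new LinkedHashMap<>(rated);
        for (String key : booleanOnly) {
            // Keep a real rating if one already exists for this user-item pair.
            if (!result.containsKey(key)) {
                result.put(key, average);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> rated = new LinkedHashMap<>();
        rated.put("1:101", 4.0);
        rated.put("1:102", 2.0);
        Set<String> booleanOnly = Collections.singleton("2:101");

        // The Boolean-only pair 2:101 receives the average of 4.0 and 2.0.
        System.out.println(fill(rated, booleanOnly).get("2:101")); // prints 3.0
    }
}
```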