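With Lucene's Payload feature you can raise the score of specific terms in a document, such as terms set in bold or italics, and so improve the ranking of search results.
The following again uses documents D0 and D1 as an example to show how to set and retrieve payloads. Here GPRS is a technical term, yet a search for "GPRS描述" ("GPRS description") returns D1 with a higher score than D0. That is not the result we want; we would probably like D0 to score higher. To get there, assign a custom weight to each term in incrementToken (for example, a higher weight for technical terms), then override Similarity to customize the score.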
D0 = "GPRS的问题" ("problems with GPRS")
D1 = "问题描述" ("problem description")
Step 1: During Analyzer processing, attach a scoring payload to special terms
ICTCLASTokenizer.java
/**
 * @see org.apache.lucene.analysis.TokenStream#incrementToken()
 */
@Override
public boolean incrementToken() throws IOException {
    clearAttributes();
    Word lexeme = segmentation.next();
    if (lexeme == null)
        return false;

    termAttr.setTermBuffer(lexeme.getText());
    offsetAttr.setOffset(lexeme.getStartPosition(), lexeme.getEndPosition());

    /*
     * If the word has a part-of-speech tag, store it in the payload
     */
    String payloadText = "";
    if (needPOSTagged && !StringUtils.isEmpty(lexeme.getPartOfSpeech()))
        payloadText = lexeme.getPartOfSpeech();

    /*
     * If the word is a designated keyword or technical term, append its weight to the payload
     */
    float keyweight = gmccKeyWordDeal.doDeal(lexeme.getText());
    if (keyweight > 0)
        payloadText = payloadText + "_" + keyweight;
    // Encode explicitly as UTF-8 so the bytes match the decoding in scorePayload
    if (!payloadText.equals(""))
        payloadAttr.setPayload(new Payload(payloadText.getBytes("UTF-8")));

    finalOffset = lexeme.getEndPosition();

    return true;
}
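The payload written above is just a short text of the form part-of-speech, underscore, weight. As a minimal sketch of that round trip, with no Lucene dependency: the class PayloadTextCodec and its methods are hypothetical names introduced here for illustration and are not part of Lucene or of the original tokenizer. The fallback to a weight of 1 on malformed input is also an assumption, added so that a bad payload does not abort a search.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Lucene or ICTCLASTokenizer. */
final class PayloadTextCodec {

    /** Builds "<pos>_<weight>" when the weight is positive, "<pos>" otherwise, as UTF-8 bytes. */
    static byte[] encode(String partOfSpeech, float keyweight) {
        String text = partOfSpeech == null ? "" : partOfSpeech;
        if (keyweight > 0) {
            text = text + "_" + keyweight;
        }
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Reads the weight after the first '_'; returns 1 when it is absent or malformed. */
    static float decodeWeight(byte[] payload, int offset, int length) {
        String text = new String(payload, offset, length, StandardCharsets.UTF_8);
        int idx = text.indexOf('_');
        if (idx == -1) {
            return 1f;
        }
        try {
            return Float.parseFloat(text.substring(idx + 1));
        } catch (NumberFormatException e) {
            return 1f; // assumption: treat a malformed weight as neutral
        }
    }

    public static void main(String[] args) {
        byte[] weighted = encode("n", 2.5f); // a term with a custom weight
        byte[] plain = encode("n", 0f);      // an ordinary word, tag only
        System.out.println(decodeWeight(weighted, 0, weighted.length)); // prints 2.5
        System.out.println(decodeWeight(plain, 0, plain.length));       // prints 1.0
    }
}
```

One limitation of this text format is worth keeping in mind: if a part-of-speech tag ever contained an underscore itself, the first underscore would no longer mark the start of the weight.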
Step 2: Override Similarity (which is mainly responsible for ranking and scoring)
BwSimilarity.java
public class BwSimilarity extends DefaultSimilarity {

    private static final long serialVersionUID = -8049061435299914513L;

    public BwSimilarity() {
        super();
    }

    @Override
    public float scorePayload(int docId, String fieldName, int start, int end,
            byte[] payload, int offset, int length) {
        String payloadStr = "";
        try {
            // Decode only this payload's slice; the byte array may be a larger reused buffer
            payloadStr = new String(payload, offset, length, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
            return 1;
        }
        // Read the configured keyweight; it defaults to 1
        String kwStr = "1";
        int kwIndex = payloadStr.indexOf("_");
        if (kwIndex != -1)
            kwStr = payloadStr.substring(kwIndex + 1);
        return Float.parseFloat(kwStr);
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        // 2^overlap / 2^maxOverlap: each unmatched query term halves the factor
        float overlap2 = (float) Math.pow(2, overlap);
        float maxOverlap2 = (float) Math.pow(2, maxOverlap);
        return (overlap2 / maxOverlap2);
    }
}
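To see what the overridden coord changes, it helps to put it next to the default, which in DefaultSimilarity is simply overlap / maxOverlap. The sketch below reproduces both formulas in plain Java; the class name CoordComparison and its method names are made up for this illustration.

```java
/** Hypothetical comparison of the two coord formulas; no Lucene dependency. */
final class CoordComparison {

    /** Default-style coord: the fraction of query terms that matched. */
    static float linearCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    /** Coord as written in BwSimilarity: 2^overlap / 2^maxOverlap. */
    static float exponentialCoord(int overlap, int maxOverlap) {
        float overlap2 = (float) Math.pow(2, overlap);
        float maxOverlap2 = (float) Math.pow(2, maxOverlap);
        return overlap2 / maxOverlap2;
    }

    public static void main(String[] args) {
        int maxOverlap = 3; // a query with three terms
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println(overlap + "/" + maxOverlap
                    + " linear=" + linearCoord(overlap, maxOverlap)
                    + " exponential=" + exponentialCoord(overlap, maxOverlap));
        }
    }
}
```

With three query terms, matching one term gives 0.25 instead of roughly 0.33, and matching two gives 0.5 instead of roughly 0.67, so partial matches are penalized more heavily. For a two-term query where one term matches, both formulas give 0.5, so any reordering of D0 and D1 for such a query has to come from the payload weight rather than from coord.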
Step 3: Search with the overridden Similarity, boostingSimilarity
PayloadTermQuery ptq = new PayloadTermQuery(new Term(field, term), new AveragePayloadFunction());

Searcher searcher = new IndexSearcher(…);
// setSimilarity is an instance method, so call it on the searcher object
searcher.setSimilarity(boostingSimilarity);
…
ScoreDoc[] hits = searcher.search(ptq, hitsPerPage).scoreDocs;
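How the payload weight ends up in the final score can be sketched roughly as follows. As far as I understand it, AveragePayloadFunction averages the scorePayload values of all occurrences of the term in a document (and yields 1 when no payload was seen), and PayloadTermQuery, when built with the two-argument constructor, multiplies that average into the ordinary term score. The class AveragePayloadSketch below is a hypothetical plain-Java illustration of that arithmetic, not Lucene's actual implementation.

```java
/** Hypothetical illustration of averaging payload scores; not Lucene code. */
final class AveragePayloadSketch {

    /** Averages the per-occurrence payload scores; returns 1 when there are none. */
    static float averagePayload(float[] payloadScores) {
        if (payloadScores.length == 0) {
            return 1f;
        }
        float sum = 0f;
        for (float s : payloadScores) {
            sum += s;
        }
        return sum / payloadScores.length;
    }

    /** Assumed combination: the ordinary term score multiplied by the averaged payload score. */
    static float combinedScore(float termScore, float[] payloadScores) {
        return termScore * averagePayload(payloadScores);
    }

    public static void main(String[] args) {
        // Illustrative numbers only: a term score of 0.5 and one occurrence with weight 2
        System.out.println(combinedScore(0.5f, new float[] {2f})); // prints 1.0
        // A term without any payload keeps its ordinary score
        System.out.println(combinedScore(0.5f, new float[] {}));   // prints 0.5
    }
}
```

Under these assumptions, giving a technical term such as GPRS a keyweight above 1 scales up the score of documents matched through that term, which is the effect the three steps above are aiming for.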