Category: Java

2013-03-22 15:12:11


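  In the if (!input.incrementToken()) check, the filter calls the tokenizer that was passed in
when the TokenStreamComponents was built. That tokenizer produces the next token and stores its
text in the CharTermAttribute, which the filter then reads and modifies.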
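
The delegation described above can be illustrated without Lucene on the classpath. The following is only a minimal sketch: MiniStream, MiniWhitespaceTokenizer, MiniLowerCaseFilter and MiniChainDemo are hypothetical stand-ins invented for this example, and a shared StringBuilder plays the role of CharTermAttribute. It is not how Lucene itself is implemented; it only shows why the filter must call input.incrementToken() before it touches the term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's TokenStream (not the real API).
abstract class MiniStream {
    // Plays the role of CharTermAttribute: holds the text of the current token.
    protected final StringBuilder term;

    MiniStream(StringBuilder term) {
        this.term = term;
    }

    // Advances to the next token; returns false once the input is exhausted.
    abstract boolean incrementToken();
}

// Hypothetical source stream: splits the text on whitespace.
class MiniWhitespaceTokenizer extends MiniStream {
    private final String[] words;
    private int pos = 0;

    MiniWhitespaceTokenizer(String text) {
        super(new StringBuilder());
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    @Override
    boolean incrementToken() {
        if (pos >= words.length) {
            return false;
        }
        term.setLength(0);          // overwrite the shared term slot
        term.append(words[pos++]);
        return true;
    }
}

// Hypothetical filter: wraps another stream and shares its term slot.
class MiniLowerCaseFilter extends MiniStream {
    private final MiniStream input;

    MiniLowerCaseFilter(MiniStream input) {
        super(input.term);          // same state object as the wrapped stream
        this.input = input;
    }

    @Override
    boolean incrementToken() {
        if (!input.incrementToken()) {   // the upstream tokenizer runs first
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            term.setCharAt(i, Character.toLowerCase(term.charAt(i)));
        }
        return true;
    }
}

class MiniChainDemo {
    // Pulls every token through the chain and copies out the term text.
    static List<String> collect(MiniStream stream) {
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        MiniStream chain = new MiniLowerCaseFilter(new MiniWhitespaceTokenizer("Hello Lucene World"));
        System.out.println(collect(chain)); // prints [hello, lucene, world]
    }
}
```

Because the filter and the tokenizer share one term object, the filter never receives a token as a return value; it simply reads whatever the upstream call left in the shared slot.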


import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// A TokenFilter is a TokenStream whose input is another TokenStream.
/**
 * Removes the trailing "'s" from a word, e.g. it's -> it.
 * Removes the "." from acronyms, e.g. U.S.A -> USA.
 */
public class StandardFilter extends TokenFilter {
    // Text of the current token, e.g. "teacher", "xy12@yahoo.com", "1421".
    private final CharTermAttribute termAtt;
    // Type of the current token; there are roughly 14 token types.
    private final TypeAttribute typeAtt;

    // Type names as emitted by Lucene's classic StandardTokenizer grammar.
    private static final String ACRONYM = "<ACRONYM>";
    private static final String APOSTROPHE = "<APOSTROPHE>";

    protected StandardFilter(TokenStream input) {
        super(input);
        termAtt = addAttribute(CharTermAttribute.class);
        typeAtt = addAttribute(TypeAttribute.class);
    }

    @Override
    public boolean incrementToken() throws IOException {
        // This immediately calls incrementToken() on the tokenizer you passed in.
        if (!input.incrementToken()) {
            return false;
        }
        final char[] buffer = termAtt.buffer();
        final int bufferLength = termAtt.length();
        final String type = typeAtt.type();
        // Compare with equals(): == only tests reference identity.
        if (APOSTROPHE.equals(type)
                && bufferLength >= 2
                && buffer[bufferLength - 2] == '\''
                && (buffer[bufferLength - 1] == 's' || buffer[bufferLength - 1] == 'S')) {
            // Drop the trailing 's by shortening the term.
            termAtt.setLength(bufferLength - 2);
        } else if (ACRONYM.equals(type)) {
            // Compact the buffer in place, skipping every '.'.
            int upto = 0;
            for (int i = 0; i < bufferLength; i++) {
                char c = buffer[i];
                if (c != '.') {
                    buffer[upto++] = c;
                }
            }
            termAtt.setLength(upto);
        }
        return true;
    }
}
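
To see what the two branches of incrementToken() do to the term buffer, the same logic can be run on a plain char[] without Lucene. This is only a sketch: TermBufferDemo, stripPossessive and removeDots are hypothetical names made up for the example, and the returned int stands in for the value the filter would pass to termAtt.setLength().

```java
// Hypothetical helper class for illustration; not part of Lucene.
class TermBufferDemo {
    // Returns the new term length after dropping a trailing 's or 'S, if present.
    static int stripPossessive(char[] buffer, int length) {
        if (length >= 2
                && buffer[length - 2] == '\''
                && (buffer[length - 1] == 's' || buffer[length - 1] == 'S')) {
            return length - 2;
        }
        return length;
    }

    // Compacts the buffer in place, skipping every '.', and returns the new length.
    static int removeDots(char[] buffer, int length) {
        int upto = 0;
        for (int i = 0; i < length; i++) {
            char c = buffer[i];
            if (c != '.') {
                buffer[upto++] = c;
            }
        }
        return upto;
    }

    public static void main(String[] args) {
        char[] a = "it's".toCharArray();
        System.out.println(new String(a, 0, stripPossessive(a, a.length))); // prints it

        char[] b = "U.S.A".toCharArray();
        System.out.println(new String(b, 0, removeDots(b, b.length)));      // prints USA
    }
}
```

Both operations only shrink the logical length of the term. The characters past the new length stay in the array but are ignored, which is why the filter can work in place without allocating a new buffer.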
