A Simple Example of Searching with Lucene - laoliulaoliu - ChinaUnix Blog


Category: Networking and Security

2012-05-03 14:18:33

Source: http://www.blogjava.net/wangdei/archive/2008/06/17/208696.html

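Lucene is not a complete full-text search application. It is a full-text indexing engine toolkit written in Java that can easily be embedded in other applications to give them their own full-text indexing and search features.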
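To make "full-text indexing" a little more concrete, here is a minimal sketch of an in-memory inverted index in plain Java. It only illustrates the general idea of mapping terms to the documents that contain them; it is not Lucene's implementation or API. The class name `TinyInvertedIndex`, its methods, and the whitespace tokenizer are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not part of Lucene.
public class TinyInvertedIndex {

    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Lowercase and split on whitespace; real analyzers do much more.
    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Record every term of the text under the given document id.
    public void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Ids of the documents that contain every term of the query (AND semantics).
    public List<Integer> search(String query) {
        String[] terms = tokenize(query);
        if (terms.length == 0) {
            return Collections.emptyList();
        }
        TreeSet<Integer> result = null;
        for (String term : terms) {
            Set<Integer> ids = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "NBA finals game one");
        index.add(2, "NBA draft news");
        index.add(3, "football news");
        System.out.println(index.search("nba"));      // prints [1, 2]
        System.out.println(index.search("nba news")); // prints [2]
    }
}
```

A lookup touches only the posting lists of the query terms rather than scanning every document, which is the property a toolkit such as Lucene provides at much larger scale.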
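To get a quick feel for Lucene, the author of the original article wrote a fairly simple class, shown below. You can see it in action on the BT下载 or 小说520网 sites.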

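import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

import org.apache.log4j.Logger;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Searcher;
import org.springframework.context.support.ClassPathXmlApplicationContext;

// The imports above assume Lucene 2.x, log4j 1.x and Spring, inferred from the calls used.
// The project's own classes (LuceneModel, LuceneInfo, Configure, Page, Bt285, the DAOs
// and services) are not shown in the article.
public class BtLucene {

    private static Logger logger = Logger.getLogger(BtLucene.class);

    // Stop words handed to StandardAnalyzer for both indexing and searching
    public static String[] StopStrs = {"BT285", "BT软件", "BT电影", "BT下载"};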

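    /**
     * Query the index.
     *
     * @param queryStr    the query string, URL-encoded as UTF-8
     * @param lucennePath location of the index directory
     * @return the matching documents together with the hit count and elapsed time
     * @throws Exception if decoding, parsing or searching fails
     */
    public LuceneModel query(String queryStr, String lucennePath) throws Exception {
        String queryUTF8 = URLDecoder.decode(queryStr, "UTF-8");
        LuceneModel luceneModel = new LuceneModel();
        List<LuceneInfo> lucneneInfoList = new ArrayList<LuceneInfo>();
        long begin = System.currentTimeMillis();
        StandardAnalyzer analyzer = new StandardAnalyzer(StopStrs);

        // A simple piece of search code: parse the query against the "title" field
        QueryParser queryParser = new QueryParser("title", analyzer);
        Query query = queryParser.parse(queryUTF8);

        // Search; lucennePath is the location of the index files
        Searcher searcher = new IndexSearcher(lucennePath);
        try {
            Hits hits = searcher.search(query);
            int size = hits.length();

            // Log the size of the result set
            if (logger.isDebugEnabled()) {
                logger.debug("result size is " + size);
            }
            luceneModel.setSize(size);

            for (int i = 0; i < size; i++) {
                LuceneInfo lucneneInfo = new LuceneInfo();
                Document doc = hits.doc(i);
                String id = doc.get("id");
                String title = doc.get("title");
                lucneneInfo.setId(id);

                // Split on the literal query text (String.split takes a regex, so quote it).
                // The empty strings around queryUTF8 are kept from the article as published;
                // they look like placeholders for highlight markup that was not preserved.
                String[] splitTitle = title.split(java.util.regex.Pattern.quote(queryUTF8), 2);
                if (splitTitle.length > 1) {
                    lucneneInfo.setTitle(splitTitle[0] + "" + queryUTF8 + "" + splitTitle[1]);
                } else {
                    // The title does not contain the query text verbatim; leave it unchanged
                    lucneneInfo.setTitle(title);
                }
                lucneneInfo.setTrip(title);
                lucneneInfoList.add(lucneneInfo);

                // Return at most the first 201 hits
                if (i == 200) {
                    break;
                }
            }
        } finally {
            // Release the index files even if the search fails
            searcher.close();
        }

        long needsTime = System.currentTimeMillis() - begin;
        // Whole seconds only; searches that take under one second are reported as 0
        long compiteTime = needsTime / 1000;
        luceneModel.setTime(String.valueOf(compiteTime));
        luceneModel.setLuceneInfoList(lucneneInfoList);
        logger.info("query the " + queryUTF8 + " needs " + needsTime + " ms");
        return luceneModel;
    }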

    /**
     * Build the index.
     *
     * @throws Exception if reading the data or writing the index fails
     */
    public void batchCreate() throws Exception {
        Configure.propertiesConfigure();
        BtBatchContentCreate contentCreate = new BtBatchContentCreate();
        ClassPathXmlApplicationContext appContext = new ClassPathXmlApplicationContext(
                "./mysqlContext.xml");
        WNewsDAO newsDAO = (WNewsDAO) appContext.getBean("wNewDaoProxy");
        Bt285DAO bt285DAO = (Bt285DAO) appContext.getBean("bt285DAO");
        contentCreate.setNewsDAO(newsDAO);
        contentCreate.setBt285DAO(bt285DAO);
        LieService lieService = (LieService) appContext.getBean("lieService");
        contentCreate.setLieService(lieService);

        StandardAnalyzer analyzer = new StandardAnalyzer(StopStrs);
        // true = create a new index, overwriting any existing one at this path
        IndexWriter writer = new IndexWriter(Configure.getCreateBtLucenePath(), analyzer, true);

        // Read the table page by page: pages 1 to 213, 1000 rows each (hard-coded)
        for (int i = 1; i < 214; i++) {
            Page page = new Page();
            logger.info("i=" + i);
            page.setPageIndex(i);
            page.setPageSize(1000);
            List<Bt285> list = bt285DAO.findPageByQuery(
                    "select t from Bt285 t ", null, page);

            for (Bt285 news : list) {
                logger.debug("news Id=" + news.getId());

                // Add one document
                Document doc = new Document();
                String title = news.getTitle();
                if (title == null) {
                    title = "no title";
                }
                // Strip the site suffix from the title before indexing
                String newTitle = title.replace("|BT285.cn|BT下载|BT电影|BT软件", "");

                // id is stored but not searchable; title is stored and tokenized
                doc.add(new Field("id", String.valueOf(news.getId()), Field.Store.YES, Field.Index.NO));
                doc.add(new Field("title", newTitle, Field.Store.YES,
                        Field.Index.TOKENIZED));
                doc.setBoost(news.getId() * 10);
                writer.addDocument(doc);
            }
        }

        // Optimize and close once, after all pages have been indexed
        writer.optimize();
        writer.close();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("server begin!");
        Configure.propertiesConfigure();
        BtLucene action = new BtLucene();

        // Uncomment to (re)build the index before querying
        //action.batchCreate();

        String path = Configure.getCreateBtLucenePath();
        action.query("nba", path);

        // Shows what a URL-encoded Chinese query looks like
        System.out.println(URLEncoder.encode("天兆", "UTF-8")); // %E5%A4%A9%E5%85%86
        System.out.println("server finish!");
    }
}
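The `query` method decodes its argument with `URLDecoder`, and `main` prints the URL-encoded form of 天兆. The sketch below shows that round trip using only the JDK. The class name `UrlCodecDemo` and its helper methods are hypothetical, and the Chinese text is written with Unicode escapes so the example does not depend on the source file encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the original article.
public class UrlCodecDemo {

    // Percent-encode a string as UTF-8, as a browser would for a form query.
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM
            throw new IllegalStateException(e);
        }
    }

    // Reverse of encode; this is the step query() performs on its input.
    public static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u5929\u5146"; // the two characters printed in main() above
        String encoded = encode(original);
        System.out.println(encoded); // prints %E5%A4%A9%E5%85%86
        System.out.println(decode(encoded).equals(original));
        System.out.println(decode("nba"));
    }
}
```

One thing to keep in mind: decoding turns `+` into a space, so a query that has already been decoded by the web framework should not be decoded a second time.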

The result of searching for 法证先锋II:

[TVB连续剧][法证先锋II][粤语中字][TV-RMVB]
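Titles like the one above are full of square brackets, which matters if the query text is ever used to split or mark up a title: `String.split` treats its argument as a regular expression, so a query containing `[` or `+` can fail or split in unexpected places. Below is a minimal sketch of one alternative, literal replacement with `String.replace`. The class `TitleHighlighter`, the `<b>` markers, and the ASCII stand-in title are assumptions made for this example; the original article does not show what markup it used.

```java
// Hypothetical helper for illustration; not part of the original article.
public class TitleHighlighter {

    // Wrap every literal occurrence of query in title with the given markers.
    // String.replace(CharSequence, CharSequence) does no regex interpretation.
    public static String highlight(String title, String query, String open, String close) {
        if (query.isEmpty()) {
            return title; // nothing to mark
        }
        return title.replace(query, open + query + close);
    }

    public static void main(String[] args) {
        String title = "[TVB][Forensic Heroes II][TV-RMVB]";
        System.out.println(highlight(title, "[TV-RMVB]", "<b>", "</b>"));
        // prints [TVB][Forensic Heroes II]<b>[TV-RMVB]</b>
        System.out.println(highlight(title, "nba", "<b>", "</b>"));
        // prints [TVB][Forensic Heroes II][TV-RMVB]
    }
}
```

Because the match is literal and case-sensitive, a title that Lucene matched through its analyzer (for example a lowercase query against an uppercase title) is simply returned unchanged.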
