Chinaunix首页 | 论坛 | 博客
  • 博客访问: 103739057
  • 博文数量: 19283
  • 博客积分: 9968
  • 博客等级: 上将
  • 技术积分: 196062
  • 用 户 组: 普通用户
  • 注册时间: 2007-02-07 14:28
文章分类

全部博文(19283)

文章存档

2011年(1)

2009年(125)

2008年(19094)

2007年(63)

分类: Oracle

2008-04-24 20:56:28

发表人:dbaoracle   来源:dbaoracle.itpub.net

使用Oracle Text,如何检索类似 PreFix_Word 这样包括下划线"_"的词语?

SQL> create table xx (x1 number,x2 varchar2(100));

Table created.

SQL> alter table xx add constraint xx_pk primary key (x1) ;

Table altered.

SQL> insert into xx values (1,'Hello who_is_there');

SQL> commit;

SQL> select * from xx;

X1 X2
---------- ------------------------------------------------------------
1 Hello who_is_there

SQL> create index xx_ind_ctx on xx(x2) indextype is ctxsys.context ;

Index created.

SQL> select * from xx where contains(x2,‘who_is_there’) >0;

no rows selected

为何Oracle不能够索引带有下划线(特殊字符)的词语哪?

首先,默认情况下, 语法分析器lexer (以basic lexer为例)负责将要索引的文本分成很多单独的词语(tokens) ,并且以文本中任何非数字或者字母(non-alphanumeric character) 的字符作为词语分隔符 (token separator).

文本'who_is_there' 默认会被分为 'who', 'is','there'3个单词,因此查询不到词语"who_is_there".

但我们可以通过制定参数 printjoins 设置哪些特殊字符不被包括在词语分隔符 (token separator)里。

Specify the non alphanumeric characters that, when they appear anywhere in a word (beginning, middle, or end), are processed as alphanumeric and included with the token in the Text index. This includes printjoins that occur consecutively.

For example, if the hyphen '-' and underscore '_' characters are defined as printjoins, terms such as pseudo-intellectual and _file_ are stored in the Text index as pseudo-intellectual and _file_.

SQL> create table xx (x1 number,x2 varchar2(100));
Table created.
SQL> insert into xx values (1,'oracle_text');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from xx;
X1 X2
---------- ------------------------------------------------------------
1 oracle_text

SQL> begin
ctx_ddl.create_preference('customer_lexer','BASIC_LEXER');
ctx_ddl.set_attribute('customer_lexer','printjoins','_');
end; 2 3 4
5 /

PL/SQL procedure successfully completed.

SQL> create index xx_ind_ctx on xx(x2) indextype is ctxsys.context
parameters('lexer customer_lexer');
Index created.

SQL> select x2 from xx where contains(x2,'Oracle') > 0;
no rows selected

SQL> select x2 from xx where contains(x2,'Oracle_text') > 0;
X2
--------------------------------------
Oracle_text

SQL> select token_text from DR$XX_IND_CTX$I;

TOKEN_TEXT
----------------------------------------------------------------
ORACLE_TEXT

Oracle 索引了整个单词“ORACLE_TEXT”。

阅读(366) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~