Oracle Context: Searching for Special Characters - sdccf - ChinaUnix Blog



Category: Oracle

2008-04-24 20:56:28

Posted by: dbaoracle  Source: dbaoracle.itpub.net

With Oracle Text, how do you search for a term that contains an underscore ("_"), such as PreFix_Word?

SQL> create table xx (x1 number,x2 varchar2(100));

Table created.

SQL> alter table xx add constraint xx_pk primary key (x1) ;

Table altered.

SQL> insert into xx values (1,'Hello who_is_there');

SQL> commit;

SQL> select * from xx;

X1 X2
---------- ------------------------------------------------------------
1 Hello who_is_there

SQL> create index xx_ind_ctx on xx(x2) indextype is ctxsys.context ;

Index created.

SQL> select * from xx where contains(x2,'who_is_there') >0;

no rows selected

Why doesn't Oracle index a term that contains an underscore (a special character)?

First, by default the lexer (taking BASIC_LEXER as the example) splits the text being indexed into individual words (tokens), and treats any non-alphanumeric character in the text as a token separator.

By default, then, the text 'who_is_there' is split into three words, 'who', 'is' and 'there', so a query for the term "who_is_there" finds no rows.
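To make the default behaviour concrete, here is a minimal Python sketch of a simplified lexer model. It is an illustrative assumption, not Oracle's implementation: only ASCII letters and digits count as alphanumeric (the real BASIC_LEXER is character-set aware), and tokens are upper-cased, as BASIC_LEXER does when mixed_case is off (the default). The function name basic_tokens is hypothetical.

```python
import re

def basic_tokens(text: str) -> list[str]:
    """Simplified model of the default BASIC_LEXER (an assumption for
    illustration): any character other than an ASCII letter or digit is a
    token separator, and tokens are upper-cased (mixed_case off)."""
    return [t.upper() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

tokens = basic_tokens("Hello who_is_there")
print(tokens)                      # ['HELLO', 'WHO', 'IS', 'THERE']
print("WHO_IS_THERE" in tokens)    # False: no such token exists in the index
```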

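We can, however, use the printjoins attribute to specify which special characters should not be treated as token separators. The BASIC_LEXER documentation describes it as follows: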

Specify the non alphanumeric characters that, when they appear anywhere in a word (beginning, middle, or end), are processed as alphanumeric and included with the token in the Text index. This includes printjoins that occur consecutively.

For example, if the hyphen '-' and underscore '_' characters are defined as printjoins, terms such as pseudo-intellectual and _file_ are stored in the Text index as pseudo-intellectual and _file_.
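The effect of printjoins can be added to the same kind of simplified model. This Python sketch (hypothetical function lex, ASCII-only, not Oracle's code) keeps every character listed in printjoins inside the token and still splits on everything else. It reproduces the pseudo-intellectual and _file_ example from the quoted documentation, with tokens upper-cased as in a default index.

```python
import re

def lex(text: str, printjoins: str = "") -> list[str]:
    """Simplified BASIC_LEXER model (illustrative assumption, not Oracle's code).

    ASCII letters, digits and every character listed in printjoins stay inside
    a token; any other character ends the current token. Tokens are
    upper-cased, as in an index built with mixed_case off.
    """
    # re.escape makes characters such as '-' literal inside the character class.
    keep = "0-9A-Za-z" + "".join(re.escape(c) for c in printjoins)
    return [t.upper() for t in re.findall(f"[{keep}]+", text)]

sample = "pseudo-intellectual and _file_"
print(lex(sample))               # ['PSEUDO', 'INTELLECTUAL', 'AND', 'FILE']
print(lex(sample, "-_"))         # ['PSEUDO-INTELLECTUAL', 'AND', '_FILE_']
print(lex("oracle_text", "_"))   # ['ORACLE_TEXT']
```

Consecutive printjoins are kept as well, so in this model lex("a--b", "-") yields ['A--B'].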

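Repeating the test with a printjoins lexer on a fresh xx table (the table from the first example must be dropped first):

SQL> create table xx (x1 number,x2 varchar2(100));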
Table created.
SQL> insert into xx values (1,'oracle_text');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from xx;
X1 X2
---------- ------------------------------------------------------------
1 oracle_text

SQL> begin
ctx_ddl.create_preference('customer_lexer','BASIC_LEXER');
ctx_ddl.set_attribute('customer_lexer','printjoins','_');
end;
/

PL/SQL procedure successfully completed.

SQL> create index xx_ind_ctx on xx(x2) indextype is ctxsys.context
parameters('lexer customer_lexer');
Index created.

SQL> select x2 from xx where contains(x2,'Oracle') > 0;
no rows selected

SQL> select x2 from xx where contains(x2,'Oracle_text') > 0;
X2
--------------------------------------
oracle_text

SQL> select token_text from DR$XX_IND_CTX$I;

TOKEN_TEXT
----------------------------------------------------------------
ORACLE_TEXT

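The $I token table shows that Oracle indexed the whole word "ORACLE_TEXT" as a single token, which is also why the query for 'Oracle' alone returned no rows.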
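The two queries above can be mimicked with a toy inverted index in Python that stands in for the DR$XX_IND_CTX$I token table. This is a sketch under simplifying assumptions: the helper names (lex, build_index, search) are hypothetical, matching is an exact single-token lookup, and it ignores CONTAINS query parsing, including the fact that % and _ in a CONTAINS query string are wildcard characters.

```python
import re
from collections import defaultdict

def lex(text, printjoins=""):
    # Simplified lexer model: ASCII alphanumerics plus printjoins form tokens.
    keep = "0-9A-Za-z" + "".join(re.escape(c) for c in printjoins)
    return [t.upper() for t in re.findall(f"[{keep}]+", text)]

def build_index(rows, printjoins=""):
    """Toy inverted index: token -> set of row ids (stand-in for the $I table)."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in lex(text, printjoins):
            index[token].add(row_id)
    return index

def search(index, term):
    """Exact single-token lookup; .get() does not add missing keys."""
    return sorted(index.get(term.upper(), set()))

rows = {1: "oracle_text"}
default_idx = build_index(rows)                  # default lexer
custom_idx = build_index(rows, printjoins="_")   # lexer with printjoins '_'

print(sorted(default_idx))                # ['ORACLE', 'TEXT']
print(sorted(custom_idx))                 # ['ORACLE_TEXT']
print(search(custom_idx, "Oracle"))       # []
print(search(custom_idx, "Oracle_text"))  # [1]
```

In this model the default index would still match 'Oracle' through its separate ORACLE token, while the printjoins index holds only ORACLE_TEXT, so 'Oracle' alone no longer matches.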
