Category: Oracle
2008-04-24 20:56:28
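By default, the lexer (BASIC_LEXER is used as the example here) splits the text to be indexed into individual tokens, treating every non-alphanumeric character as a token separator.
The text 'who_is_there' is therefore split into three tokens, 'who', 'is', and 'there', so the index holds no single token "who_is_there" for a query to match.
However, by setting the printjoins attribute we can specify which special characters are not treated as token separators. The Oracle Text documentation describes the attribute as follows: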
Specify the non alphanumeric characters that, when they appear anywhere in a word (beginning, middle, or end), are processed as alphanumeric and included with the token in the Text index. This includes printjoins that occur consecutively.
For example, if the hyphen '-' and underscore '_' characters are defined as printjoins, terms such as pseudo-intellectual and _file_ are stored in the Text index as pseudo-intellectual and _file_.
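The splitting rule described above can be approximated with a short Python sketch. This is only a simplified model and not Oracle's implementation: the function name `tokenize` is hypothetical, it assumes ASCII letters and digits only, and it uppercases tokens because the index in the session below stores them that way.

```python
import re

def tokenize(text, printjoins=""):
    """Simplified model of BASIC_LEXER tokenization (not Oracle's actual code)."""
    # Characters allowed inside a token: ASCII letters, digits, and any printjoins.
    keep = "A-Za-z0-9" + re.escape(printjoins)
    # Every run of other characters acts as a token separator.
    pieces = re.split(f"[^{keep}]+", text)
    # Drop empty pieces and uppercase the rest, as the indexed tokens appear.
    return [p.upper() for p in pieces if p]

print(tokenize("who_is_there"))                       # ['WHO', 'IS', 'THERE']
print(tokenize("who_is_there", printjoins="_"))       # ['WHO_IS_THERE']
print(tokenize("pseudo-intellectual _file_", "-_"))   # ['PSEUDO-INTELLECTUAL', '_FILE_']
```

With no printjoins the underscore separates tokens. Once it is declared as a printjoin, it stays inside the token wherever it occurs, including at the beginning or end.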
SQL> create table xx (x1 number,x2 varchar2(100));
Table created.
SQL> insert into xx values (1,'oracle_text');
1 row created.
SQL> commit;
Commit complete.
SQL> select * from xx;
X1 X2
---------- ------------------------------------------------------------
1 oracle_text
SQL> begin
ctx_ddl.create_preference('customer_lexer','BASIC_LEXER');
ctx_ddl.set_attribute('customer_lexer','printjoins','_');
end;
/
PL/SQL procedure successfully completed.
SQL> create index xx_ind_ctx on xx(x2) indextype is ctxsys.context
parameters('lexer customer_lexer');
Index created.
SQL> select x2 from xx where contains(x2,'Oracle') > 0;
no rows selected
SQL> select x2 from xx where contains(x2,'Oracle_text') > 0;
X2
--------------------------------------
oracle_text
SQL> select token_text from DR$XX_IND_CTX$I;
TOKEN_TEXT
----------------------------------------------------------------
ORACLE_TEXT
Oracle indexed the whole word "ORACLE_TEXT" as a single token.
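To see why the search for 'Oracle' returned no rows while 'Oracle_text' matched, the session can be mimicked with a toy inverted index in Python. This is a rough sketch under stated assumptions: `build_index` and `lookup` are hypothetical helpers, matching is an exact case-insensitive token comparison, and none of the CONTAINS query operators or wildcards are modelled.

```python
import re
from collections import defaultdict

def build_index(rows, printjoins=""):
    """Map each uppercase token to the set of row ids containing it (toy model)."""
    # Anything that is not an ASCII letter, digit, or printjoin separates tokens.
    separators = re.compile(f"[^A-Za-z0-9{re.escape(printjoins)}]+")
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in separators.split(text):
            if token:
                index[token.upper()].add(row_id)
    return index

def lookup(index, term):
    """Exact, case-insensitive token match; no wildcard or phrase handling."""
    return sorted(index.get(term.upper(), set()))

rows = {1: "oracle_text"}
with_printjoin = build_index(rows, printjoins="_")
default_lexer = build_index(rows)

print(lookup(with_printjoin, "Oracle"))       # []
print(lookup(with_printjoin, "Oracle_text"))  # [1]
print(lookup(default_lexer, "Oracle"))        # [1]
```

In this model the printjoin makes ORACLE_TEXT the only stored token, so the shorter term ORACLE no longer matches on its own. That is the trade-off to weigh before adding a character to printjoins.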