Chinaunix首页 | 论坛 | 博客
  • 博客访问: 841553
  • 博文数量: 199
  • 博客积分: 6363
  • 博客等级: 准将
  • 技术积分: 2225
  • 用 户 组: 普通用户
  • 注册时间: 2007-04-28 10:01
个人简介

来自农村的老实娃

文章分类

全部博文(199)

文章存档

2017年(1)

2014年(2)

2013年(3)

2012年(6)

2011年(26)

2010年(34)

2009年(50)

2008年(44)

2007年(33)

我的朋友

分类: Oracle

2008-08-14 14:58:12

4管理DML操作

对于CTXSYS.CONTEXT索引,当应用程序对基表进行DML操作后,对基表的索引维护是必须的。索引维护包括索引同步和索引优化。

在索引建好后,我们可以在该用户下查到Oracle自动产生了以下几个表:(假设索引名为myindex):

DR$myindex$IDR$myindex$KDR$myindex$RDR$myindex$N其中以I表最重要,可以查询一下该表,看看有什么内容:

SELECT token_text, token_count  FROM dr$i_rsk1$I  WHERE ROWNUM <= 20;

这里就不列出查询接过了。可以看到,该表中保存的其实就是Oracle 分析你的文档后,生成的term记录在这里,包括term出现的位置、次数、hash值等。当文档的内容改变后,可以想见这个I表的内容也应该相应改变,才能保证Oracle在做全文检索时正确检索到内容(因为所谓全文检索,其实核心就是查询这个表)。那么如何维护该表的内容呢?总不能每次数据改变都重新建立索引吧!这就用到sync optimize了。

同步(sync: 将新的term 保存到I表;

优化(optimize: 清除I表的垃圾,主要是将已经被删除的termI表删除。

 

 

当基表中的被索引文档发生insertupdatedelete操作的时候,基表的改变并不能马上影响到索引上直到同步索引。可以查询视图CTX_USER_PENDING查看相应的改动。例如:

SELECT pnd_index_name, pnd_rowid,

       TO_CHAR (pnd_timestamp, 'dd-mon-yyyy hh24:mi:ss') timestamp

  FROM ctx_user_pending;

该语句的输出类似如下:

PND_INDEX_NAME                 PND_ROWID          TIMESTAMP

------------------------------ ------------------ --------------------
MYINDEX                        AAADXnAABAAAS3SAAC 06-oct-1999 15:56:50

 

同步和优化方法: 可以使用Oracle提供的ctx_ddl包同步和优化索引。

一.     对于CTXCAT类型的索引来说, 当对基表进行DML操作的时候,Oracle自动维护索引。对文档的改变马上反映到索引中。CTXCAT是事务形的索引。

 

4.1索引同步

4.1.1CTXSRV(同步进程)

 

Oracle提供一个全文索引同步服务进程负责监视索引表变动并且第一时间同步索引。

 

只需要在后台运行这个进程,它会监视数据的变化,及时进行同步。但由于存在一些问题在未来的ORACLE版本中将要被取代。

启动同步索引服务进程方法:

HOST ctxsrv -user ctxsys/ctxsys>&/tmp/ctx.log&

 
当你启动了CTXSRV服务进程,在后台的同步请求处理就会象事时一样,在你提交修改12秒后新的数据马上就被索引了。

 

与手工同步相比,自动索引同步更容易使索引变的稀疏,需要执行DBMS_JOB定期优化和重建索引rebuild parameters( 'sync' )

 
默认情况下,如果你不启动CTXSRV进程,索引不会自动更新除非你手工告诉它们去更新自己。你可以使用 alter index  rebuild parameters ('sync') 更新索引。
 
ALTER INDEX search_idx  REBUILD parameters( 'sync' )
/
Index altered.
 
9i提供了新的专门用于更新索引的包ctx_ddl.sync_index(…)
 

4.1.2 CTX_DDL.SYNC_INDEX

在对基表插入,修改,删除之后同步索引。推荐使用sync同步索引。

语法:
ctx_ddl.sync_index(

idx_name  IN  VARCHAR2 DEFAULT NULL

memory IN VARCHAR2 DEFAULT NULL,

part_name IN VARCHAR2 DEFAULT NULL

parallel_degree IN NUMBER DEFAULT 1);

 

idx_name   索引名称

memory    指定同步索引需要的内存。默认是系统参数DEFAULT_INDEX_MEMORY

指定一个大的内存时候可以加快索引效率和查询速度,且索引有较少的碎片

part_name  同步哪个分区索引。

parallel_degree  并行同步索引。设置并行度。


例如:

使用2M内存同步索引myindex:

 

BEGIN

   ctx_ddl.sync_index ('myindex', '2M');

END;

 

 

NOTE执行者必须是索引所有者或者CTXSYS用户。如果执行者是CTXSYS用户,索引名称可以是空NULL,这样默认优化全部的CONTEXT索引。这样的同步效果就如同ctxsrv. 我们推荐定期执行作业job同步索引。-- 为每一个索引制定单独的作业job, 一个 ctxsys 作业job同步全部索引。这样就减少了使用ctxsrv的机率,也不用在每次数据库启动后都要启动CTXSRV服务进程。由于CTXSRV有一些缺陷,在未来将不再会被ORACLE使用或者被取代。

 

 

INSERT INTO mytable

     VALUES (2, 'first,second.this is the second rows before indexed');

COMMIT ;

EXEC ctx_ddl.sync_index('mytable_idx');


  SELECT /*+ FIRST_ROWS() */ ID, SCORE(1), TEXT

    FROM MYTABLE               

   WHERE CONTAINS (TEXT, 'searchterm', 1) > 0               

ORDER BY SCORE(1) DESC;

 

其中score(1)Oracle为全文查询计算的主题符合程度。

 

4.2索引优化

经常的索引同步将会导致你的CONTEXT索引产生碎片。索引碎片严重的影响了查询的反应速度。你可以定期优化索引来减少碎片,减少索引大小,提高查询效率。为了更好的理解索引优化,我们先看看索引的结构以及碎片是如何产生的。

CONTEXT索引是反向索引,每一个索引项目都包括单词和这个单词所出现过的文档地址。例如在一个初始化索引过程中,单词DOG可以包括如下条目

DOG DOC1 DOC3 DOC5

当新的文档被包含到表的时候,索引被同步。如果新行DOC7也包括单词DOG,将会形成如下条目。

DOG DOC1 DOC3 DOC5
DOG DOC7

很多的DML操作以后,单词DOG的条目可能如下情况:

DOG DOC1 DOC3 DOC5
DOG DOC7
DOG DOC9
DOG DOC11

同步新增加的文档产生了索引碎片。单词DOG的文挡列表会越来越长,索引越来越大。

你可以优化索引(CTX_DDL.OPTIMIZE_INDEX),使用FULL或者FAST参数都可以降低索引碎片,提高索引效率。

4.2.1文档垃圾处理

当文本从表中删除的时候,Oracle Text标记删除的文档,但是并不马上修改索引。因此,就的文档信息占据了不必要的空间,导致了查询额外的开销。你必须以FULL模式优化索引,从索引中删除无效的旧的信息。这个过程叫做垃圾处理。当你经常的对表文本数据进行更新,删除操作的时候,垃圾处理是很必要的。

BEGIN

   ctx_ddl.optimize_index ('myidx', 'full');

END;

4.2.2 Single Token Optimization

除了优化整个索引以外,你还可以专门对某个标记(token)进行优化。你可以仅仅优化那些经常查询的标记(token,而不必花太多时间在很少查询的单词上。

例如,你可以专门优化token DOG,它经常被检索或者经常被更新。这样可以提高查询这个token的查询效率。

BEGIN

   ctx_ddl.optimize_index ('myidx', 'token', token => 'DOG');

END;

4.2.3 FAST MODE

这种方法仅仅使碎片行紧凑。但是,旧的数据并不从索引中删除。

BEGIN

   ctx_ddl.optimize_index ('myidx', 'fast');

END;

二.     4.2.4使用job定时同步和优化

三.     用以下的两个job来完成(job要建在和表同一个用户下) :

-- sync:

VARIABLE jobno number;

BEGIN

DBMS_JOB.SUBMIT(:jobno,'ctx_ddl.sync_index(''myindex'');',

SYSDATE, 'SYSDATE + (1/24/4)');

commit;

END;

 

-- optimizer

VARIABLE jobno number;

BEGIN

DBMS_JOB.SUBMIT(:jobno,'ctx_ddl.optimize_index(''myindex'',''FULL'');',

SYSDATE, 'SYSDATE + 1');

commit;

四.    END;

其中, 第一个jobSYSDATE + (1/24/4)是指每隔15分钟同步一次,第二个jobSYSDATE + 1是每隔1天做一次全优化。具体的时间间隔,你可以根据自己的应用的需要而定。至此,你的全文检索功能已设置完成。

 

检索 CONTAINS

你建立好CONTEXT索引后,可以使用CONTAINS操作符检索数据了。你可以有很多查询选项。你可以进行逻辑,模糊,通配符,主题等查询,还可以查询HTMLXML文档。

 

5.1简单例子

SQL

SELECT语句中,可以在WHERE指定CONTAINS操作符。还可以指定返回记录的得分(SCORE)。

SELECT score (1) title

  FROM news

 WHERE contains (text, 'Oracle', 1) > 0;


得分SCORE是指查询结果的贴切程度。得分越高表示查询信息满意度越高。你可以根据SCORE进行排序。

 

SELECT score (1), title
    FROM news
   WHERE contains (text, 'Oracle', 1) > 0
ORDER BY score (1) DESC;
 
SELECT score (1), title, issue_date
    FROM news
   WHERE contains (text, 'Oracle', 1) > 0
     AND issue_date >= ('01-OCT-97')
ORDER BY score (1) DESC;

A structured query, also called a mixed queryFor example, the upper SELECT statement returns all articles that contain the word Oracle that were written on or after October 1, 1997.

SELECT id,score(1),score (2),score (1)+ score (2) total, text

    FROM mytable

   WHERE contains (text, 'biti', 1) > 0

      OR contains (text, 'hello', 2) > 0

ORDER BY total DESC;

根据contains中的不同数字标示各个contains返回的分数Score

 

PL/SQL形式:

DECLARE

   rowno   NUMBER := 0;

BEGIN

   FOR c1 IN  (  SELECT score (1) score, title

                   FROM news

                  WHERE contains (text, 'Oracle', 1) > 0

               ORDER BY score (1) DESC)

   LOOP

      rowno :=   rowno + 1;

      DBMS_OUTPUT.put_line ( c1.title || ': ' || c1.score);

      EXIT WHEN rowno = 10;

   END LOOP;

END;

 

返回了得分最高的前10条记录.

5.2 Logical Operators

允许你组合搜索条件,通过使用ANDOR等逻辑。

操作符

符号

描述

例子表达式

AND

&

Use the AND operator to search for documents that contain at least one occurrence of each of the query terms.

Score returned is the minimum of the operands.

'cats AND dogs'

'cats & dogs'

OR

|

Use the OR operator to search for documents that contain at least one occurrence of any of the query terms.

Score returned is the maximum of the operands.

'cats | dogs'

'cats OR dogs'

 

NOT

~

Use the NOT operator to search for documents that contain one query term and not another.

To obtain the documents that contain the term animals but not dogs, use the following expression:

'animals ~ dogs'

ACCUM

,

Use the ACCUM operator to search for documents that contain at least one occurrence of any of the query terms. The accumulate operator ranks documents according to the total term weight of a document.

The following query returns all documents that contain the terms dogs, cats and puppies giving the highest scores to the documents that contain all three terms:

'dogs, cats, puppies'

EQUIV

=

Use the EQUIV operator to specify an acceptable substitution for a word in a query.

The following example returns all documents that contain either the phrase alsatians are big dogs or German shepherds are big dogs:

'German

shepherds=alsatians are

big dogs'


可以查看更多的选项。



5.3
如何优化查询

5.3.1收集统计信息

By collecting statistics on the Text domain index, the Oracle cost-based optimizer is able to do the following:

  • estimate the selectivity of the CONTAINS predicate
  • estimate the I/O and CPU costs of using the Text index, that is, the cost of processing the CONTAINS predicate using the domain index
  • estimate the I/O and CPU costs of each invocation of CONTAINS

5.3.2使用 FIRST_ROWS(n) for ORDER BY Queries

With the FIRST_ROWS hint, Oracle instructs the Text index to return rowids in score-sorted order, if possible.

Without the hint, Oracle sorts the rowids after the Text index has returned all the rows in unsorted order that satisfy the CONTAINS predicate. Retrieving the entire result set as such takes time.

Since only the first 10 hits are needed in this query, using the hint results in better performance.

 

Note: Use the FIRST_ROWS(n) hint when you need only the first few hits of a query. When you need the entire result set, do not use this hint as it might result in poor performance.

 

 

 

6.分区

 

CONTEXT索引支持在分区表山建立本地分区索引,但仍然有一些限制。如下:

 

l         分区表必须是按范围分区,不支持HASH分区和复合分区。

l         你应该为每个分区指定索引分区名称。如果不指定,系统默认为每个分区表指定。分区索引列表的顺序必须和分区表的顺序一致。

l         可以为每个分区分别指定PRAMETERS参数,但是PARAMETERS只允许包括STORAGEMEMEROY参数。

l         不能指定ONLINE建立索引。

可以查询或者获得更多信息。

6.1创建一个本地分区索引:

------------------------BEGIN---------------------------
PROMPT create partitioned table and populate it

 

CREATE TABLE part_tab (a int, b varchar2(40)) PARTITION BY RANGE(a)

(partition p_tab1 values less than (10),

 partition p_tab2 values less than (20),

 partition p_tab3 values less than (30));

 

PROMPT create customer storage preference assigned each partition

 

Execute ctx_ddl.drop_preference('mystore1');

BEGIN

   ctx_ddl.create_preference ('mystore1', 'BASIC_STORAGE');

   ctx_ddl.set_attribute ('mystore1', 'I_TABLE_CLAUSE', 'tablespace indx ');

   ctx_ddl.set_attribute (

      'mystore1',

      'I_INDEX_CLAUSE',

      'tablespace users03 compress 2 '

   );

END;

 

PROMPT create partitioned index

 

CREATE INDEX part_idx on part_tab(b) INDEXTYPE IS CTXSYS.CONTEXT

LOCAL (partition p_idx1 parameters(‘storage mystore1’), partition p_idx2 parameters(‘storage mystore2’), partition p_idx3 parameters(‘storage mystore3’));

 

-------------------------END----------------------------

 

6.2 并行的创建一个本地分区索引


可以并行的建立分区索引,加快建立索引速度。但是建立索引不能“一步到位“。我们必须先建立一个unusable索引,然后利DBMS_PCLXUTIL.BUILD_PART_INDEX 并行建立索引。

------------------------BEGIN---------------------------

PROMPT the base table has three partitions.

PROMPT  We create a local partitioned unusable index first

 

CREATE INDEX tdrbip02bx ON tdrbip02b(text)

indextype is ctxsys.context local (partition tdrbip02bx1,

                                   partition tdrbip02bx2,

                                   partition tdrbip02bx3)

unusable;

 

PROMPT run the DBMS_PCLUTIL.BUILD_PART_INDEX,which builds the 3 partitions in parallel (inter-partition parallelism). Also inside each partition, index creation is done in parallel (intra-partition parallelism) with a parallel degree of 2.

 

BEGIN

   DBMS_PCLXUTIL.build_part_index (3, 2, 'TDRBIP02B', 'TDRBIP02BX', TRUE);

END;

-------------------------END----------------------------

 

6.3查询分区索引


你可以在单个表分区上进行全文检索。

SELECT *
    FROM part_tab PARTITION (p_tab4)
   WHERE contains (b, 'Oracle') > 0
ORDER BY score;

7Oracle Text支持对本地文件的检索

它的实现是依靠参数datastorefilter的组合。在数据库的文本列中只保存指向硬盘文件的指针。建立索引的时候,Oracle读取硬盘上的文件并且将索引存储在Oracle数据库中。

 

Oracle支持对很多格式的文件的文本检索,包括文本文件Txt,  Html文件,      Word文档,  Excel表格,  PowerPoint 的文本检索,也支持PDFpdf版本1.4目前还不支持)。

 

而且配合Lexer参数很好的支持了中文字符集的检索。

   

具体实现方法:

首先,建立存储选项参数。制定DATASTORE参数为FILE_DATASTROE,提示Oracle从文件路径中索引文本。

然后制定path参数,你可以指定多个文件存储的文件路径 windows环境用”;”号间隔(Path1;Path2;Path3;;, Unix环境用:号间隔(Path1:Path2:Path3::

BEGIN

   ctx_ddl.create_preference ('my_datastore_prefs', 'FILE_DATASTORE');

   ctx_ddl.set_attribute ('my_datastore_prefs', 'path', 'c:\TEMP');

END;

下一步,建立保存这些文件名称的表。Id列是主键,title列是对文本的简单说明,thefile列保存着磁盘中Path目录下文件的名称。文件必须能够在Path路径下找到,否则Oracle会报文件无法访问的错误信息。

然后向表中插入数据,注意:thefile列保存的必须是服务器上的指定的Path路径下面的文件。

 

CREATE TABLE mydocs( id NUMBER PRIMARY KEY, title VARCHAR2(255), thefile 
VARCHAR2(255) );
 
INSERT INTO mydocs( id, title, thefile ) VALUES( 1, 'Document1', 'WordDoc1.doc');
INSERT INTO mydocs( id, title, thefile ) VALUES( 2, 'Document2', 'WordDoc2.doc');
INSERT INTO mydocs( id, title, thefile ) VALUES( 3, 'Document3', 'WordDoc3.doc');

COMMIT;

 

建立全文索引,使用参数Datastore FilterLexer Filter可以帮助Oracle识别不同格式文件,可以是文本文件,Word文档,Pdf文档等。Lexer用来保证可以很好的从文件中索引中文信息。

 

CREATE INDEX mydocs_text_index ON mydocs(thefile) INDEXTYPE IS ctxsys.context
    PARAMETERS('datastore my_datastore_prefs Filter ctxsys.info_filter Lexer my_lexer');
 
--
-- 测试是否索引文件成功
-- 
SELECT id,title
  FROM mydocs

 WHERE contains( thefile, '你好' ) > 0;

 

 

指定路径带来的相关问题:

CTX_DDL.SET_ATTRIBUTE( 'my_datastore_prefs', 'path', 'c:\TEMP;c:\docs' );
如果在2个目录中均有同名的文件1.doc,如果在thefile列中保存的仅仅是文件名称 1.doc,则Oracle顺序查找路径下的文件,这样就会索引2次在C:\TEMP下的文件1.doc. 我们可以通过加上文件的路径信息。
 
在维护文档修改的时候同步索引的问题:
如果你修改了路径下面的某个文件的内容,加入了文本或者删除了文本,Oracle在同步的时候不会察觉到文档的内容的修改。有一个方法可以保证同步:
 
修改了内容之后,更新一下表thefile的信息,但仍保证文本路径不变。
 
UPDATE mydocs
   SET thefile = 'c:\source.doc'
 WHERE thefile = 'c:\source.dco';
 
再次执行同步索引的时候,Oracle才会保持文档内容同步。
 
关于建立以及同步索引的时候发生的错误信息可以从ctx_user_index_errors用户视图中查看。

 

8Oracle Text 支持检索对网页的文本检索

通过在表里面存储网络上各种格式的文本文件,HTML文件的路径UrlOracle在建立索引的时候,可以顺着Url读取文件的流信息,并且将索引存储在磁盘上。这样通过本地查找索引可以获得有用的网页的Url

通过自定义Datastore选项,指定URL_DATASTORE 类型。它支持Http访问,和Ftp访问,本地文件系统的访问。

存储在文本列中的Url格式如下:
 
[URL:]://[:]/[]

access_scheme 字符串可以是ftp http 或者file. 例如:



―――――――――――――――――――――――――――

注:

login:password@ 格式的语法只有在Ftp访问形式下才有效

―――――――――――――――――――――――――――

URL_DATASTORE 参数

URL_DATASTORE的一些参数,其中timeoutproxy是经常用到的:

属性

属性值

timeout

Specify the timeout in seconds. The valid range is 15 to 3600 seconds. The default is 30.这个参数根据网络性能调整。

maxthreads

Specify the maximum number of threads that can be running simultaneously. Use a number between 1and 1024. The default is 8.

Urlsize

Specify the maximum length of URL string in bytes. Use a number between 32 and 65535. The default is 256.

maxurls

Specify maximum size of URL buffer. Use a number between 32 and 65535. The defaults is 256.

maxdocsize

Specify the maximum document size. Use a number between 256 and 2,147,483,647 bytes (2 gigabytes). The defaults is 2,000,000.

http_proxy

Specify the host name of http proxy server. Optionally specify port number with a colon in the form hostname:port

ftp_proxy

Specify the host name of ftp proxy server. Optionally specify port number with a colon in the form hostname:port.

no_proxy

Specify the domain for no proxy server. Use a comma separated string of up to 16 domain names.

 
 
索引建立过程:
首先建立自己的URL_DATASTORE选项。如下指定了代理,Timeout时间。

BEGIN
   ctx_ddl.create_preference ('URL_PREF', 'URL_DATASTORE');
   ctx_ddl.set_attribute ('URL_PREF', 'Timeout', '300');
END;

建立存储Url路径的表:

CREATE TABLE urls(id NUMBER PRIMARY KEY, url VARCHAR2(2000));
INSERT INTO urls
     VALUES (1, 'http:// ');
INSERT INTO urls
     VALUES (2, '');
COMMIT ;
 
建立索引,索引Html文件可以使用HTML_SECTION_GROUP:

CREATE INDEX datastores_text ON urls ( url )  
INDEXTYPE IS ctxsys.CONTEXT   PARAMETERS (
 'Datastore URL_PREF Lexer my_lexer Section group ctxsys.HTML_SECTION_GROUP' );

SELECT token_text
  FROM dr$datastore_text$i;
 

关于建立以及同步索引的时候发生的错误信息可以从ctx_user_index_errors用户视图中查看。


9. 常见错误
下面就一些常见的错误信息给出解释和解决办法:

 

1sync 失败

 

DRG-10595: ALTER INDEX T_DOC6_CT失败

DRG-50857: Oracle error in drsxsopen

ORA-01480: STR 赋值变量缺少空后缀

 

解决:这是8i的一个bug, 但可以避免它,方法是在同步之前先发一个语句:

 

ALTER SESSION SET nls_language=american;
 

2create index 失败

ORA-29855: 执行 ODCIINDEXCREATE 例行程序时出错

ORA-20000: interMedia Text 错误:

ORA-06512: "CTXSYS.DRUE", line 126

ORA-06512: "CTXSYS.TEXTINDEXMETHODS", line 54

ORA-06512: line 1

 

解决:这是8.1.6.3之前的版本的一个bug, 在处理中文时,某个特殊字符造成的。向Oracle索取补丁,或者自己去metalink.Oracle.com 下载(需要CSI 号码)。

 

3create index 失败

 

RA-29855: 执行 ODCIINDEXCREATE 例行程序时出错

ORA-20000: interMedia Text 错误:

DRG-50704: Net8 监听器没有运行或无法启动外部过程

ORA-28575: 无法打开与外部过程代理程序的 RPC 连接

ORA-06512: "CTXSYS.DRUE", line 126

ORA-06512: "CTXSYS.TEXTINDEXMETHODS", line 54

ORA-06512: line 1

 

解决:明显的extproc配置不当。仔细阅读本文基本设置的第二步。

 

4.访问建有索引的表时失败

 

ora-29861: 域索引标记为loading/failed/unusable

 

解决:这是该表的一个intermedia索引有问题,该索引要么没有正确建立,要么是某次同步失败导致它状态异常。先查到是哪个索引:

SELECT idx_name, idx_status

  FROM ctxsys.ctx_indexes;

然后同步该索引或者强制删除它:
 

重建:ALTER INDEX myindex REBUILD ONLINE PARAMETERS('sync');

删除:DROP INDEX myindex FORCE;

 

5.使用chinese_lexer失败

ERROR at row 1:

ORA-29855: err on ODCIINDEXCREATE 

ORA-20000: interMedia Text err:

DRG-10502: index 1386 is not existing.

DRG-11102: the lexer cann't analyze as SIMPLIFIED CHINESE_CHINA.ZHS16GBK

ORA-06512: "CTXSYS.DRUE", line 126

ORA-06512: "CTXSYS.TEXTINDEXMETHODS", line 54

ORA-06512: line 1


解决chinese_lexer 只支持utf8字符集。现在你面临抉择:忍受chinese vgram lexer的愚蠢,或者将数据库字符集改到 utf8, 但面对可能引起你的应用不能正确处理中文的风险(先咨询Oracle support, 并且与你的应用软件提供商联系)。

 

6.升级或应用patch后失败


ORA-29856: err when execute ODCIINDEXDROP

ORA-20000: interMedia Texterr

ORA-06508: PL/SQL: can not find program unit beingcalled

ORA-06512: at "CTXSYS.DRUE", line 126

ORA-06512: at"CTXSYS.TEXTINDEXMETHODS", line 229

ORA-06512: at line 1


解决:这是intermedia的某个object 没有正确产生或者编译。用ctxsys用户登录后,运行:

$Oracle_home/ctx/admin/dr0pkh.sql

$Oracle_home/ctx/admin/dr0plb.sql
以重新产生所有的package.你也可以直接察看dba_objects视图,找出那些属于ctxsys用户并且statusinvalid的东西,重新产生或者重新编译。(你可能会发现有许多这种东西,不要惊讶,Oracle不会因此而崩溃)。

 

7create index 失败

ERROR 位于第 1 :

ORA-29855: 执行 ODCIINDEXCREATE 例行程序时出错

ORA-20000: interMedia Text 错误:

DRG-50857: Oracle error in driddl.IndexResume

ORA-04030: 在尝试分配 524288 字节 (cursor work he,QERHJ Bit vector)时进程内存不足

ORA-06512: "CTXSYS.DRUE", line 126

ORA-06512: "CTXSYS.TEXTINDEXMETHODS", line 214

ORA-06512: line 1


解决:引起这个问题可以有多种原因,首先你可以将sort_area_size这个参数减小到不多于2M,这可以防止Oracle在创建索引时分配太多的sort 内存而耗尽资源。 但如果这不起作用,而且你是8.1.7, 则恭喜,你hit bug 1391737. bug 在你要建索引的字段,如果某条记录的长度超过2000字符时引起Oracle耗尽内存资源。别无它法,除了打 8.1.7.1B 的补丁。

 

10.附录


10.1  ORACLE TEXT
资源:

Oracle Technology Network (OTN) discuss forum

 

10.2 关于索引性能的FAQ

This section answers some of the frequently asked questions about indexing performance.

How long should indexing take?

Answer: Indexing text is a resource-intensive process. Obviously, the speed of indexing will depend on the power of the hardware involved.

As a benchmark, with an average document size of 5K, Oracle Text can index approximately 200 documents per second with the following hardware and parallel configuration:

  • 4x400Mhz Sun Sparc CPUs
  • 4 gig of RAM
  • EMC symmetrix (24 disks striped)
  • Parallel degree of 5 with 5 partitions
  • Index memory of 600MB per index process
  • XML news documents that averaged 5K in size
  • USER_DATASTORE

Other factors such as your document format, location of your data, and the calls to user-defined datastores, filters, and lexers can have an impact on your indexing speed.

Which index memory settings should I use?

Answer: You can set your index memory with the system parameters DEFAULT_INDEX_MEMORY and MAX_INDEX_MEMORY. You can also set your index memory at run time with the CREATE INDEX memory parameter in the parameter string.

You should aim to set the DEFAULT_INDEX_MEMORY value as high as possible, without causing paging.

You can also improve Indexing performance by increasing the SORT_AREA_SIZE system parameter.

Experience has shown that using a large index memory setting, even into hundreds of megabytes, will improve the speed of indexing and reduce the fragmentation of the final indexes. However, if set too high, then the memory paging that occurs will cripple indexing speed.

With parallel indexing, each stream requires its own index memory. When dealing with very large tables, you can tune your database system global area (SGA) differently for indexing and retrieval. For querying, you are hoping to get as much information cached in the system global area's (SGA) block buffer cache as possible. So you should be allocating a large amount of memory to the block buffer cache. But this will not make any difference to indexing, so you would be better off reducing the size of the SGA to make more room for a large index memory settings during indexing.

You set the size of SGA in your Oracle initialization file.

See Also:

to learn more about Oracle Text system parameters.

for more information on setting SGA related parameters.

for more information on memory allocation and setting the SORT_AREA_SIZE parameter.

How much disk overhead will indexing require?

Answer: The overhead, the amount of space needed for the index tables, varies between about 50% of the original text volume and 200%. Generally, the larger the total amount of text, the smaller the overhead, but many small records will use more overhead than fewer large records. Also, clean data (such as published text) will require less overhead than dirty data such as emails or discussion notes, since the dirty data is likely to include many unique words from mis-spellings and abbreviations.

A text-only index is smaller than a combined text and theme index. A prefix and substring index makes the index significantly larger.

How does the format of my data affect indexing?

Answer: You can expect much lower storage overhead for formatted documents such as Microsoft Word files since such documents tend to be very large compared to the actual text held in them. So 1GB of Word documents might only require 50MB of index space, whereas 1GB of plain text might require 500MB, since there is ten times as much plain text in the latter set.

Indexing time is less clear-cut. Although the reduction in the amount of text to be indexed will have an obvious effect, you must balance this out against the cost of filtering the documents with the INSO filter or other user-defined filters.

Can I index in parallel?

Answer: Yes, you can index in parallel. Parallel indexing can improve index performance when you have a large amount of data, and have multiple CPUs.

You use the PARALLEL keyword when creating the index:

CREATE INDEX index_name ON table_name (column_name)

INDEXTYPE IS ctxsys.context PARAMETERS ('...') PARALLEL 3;

 

This will create the index with up to three separate indexing processes depending on your resources.


Note:

It is no longer necessary to create a partitioned table to index in parallel as was the case in earlier releases.


 


Note:

When you create a local index in parallel as such (which is actually run in serial), subsequent queries are processed in parallel by default. Creating a non-partitioned index in parallel does not turn on parallel query processing.

Parallel querying degrades query throughput especially on heavily loaded systems. Because of this, Oracle recommends that you disable parallel querying after indexing. To do so, use ALTER INDEX NOPARALLEL.



How do I create a local partitioned index in parallel?

Answer: You can improve indexing performance by creating a local index in parallel.

However, currently you cannot create a local partitioned index in parallel using the PARALLEL parameter with CREATE INDEX. In such cases the parameter is ignored and indexing proceeds serially.

To create a local index in parallel, create an unusable index first, then run the DBMS_PCLXUTIL.BUILD_PART_INDEX utility.

In this example, the base table has three partitions. We create a local partitioned unusable index first, the run the DBMS_PCLUTIL.BUILD_PART_INDEX, which builds the 3 partitions in parallel (inter-partition parallelism). Also inside each partition, index creation is done in parallel (intra-partition parallelism) with a parallel degree of 2.

CREATE INDEX tdrbip02bx ON tdrbip02b(text)

indextype is ctxsys.context local (partition tdrbip02bx1,

                                   partition tdrbip02bx2,

                                   partition tdrbip02bx3)

unusable;

 

exec dbms_pclxutil.build_part_index(3,2,'TDRBIP02B','TDRBIP02BX',TRUE);

How can I tell how far my indexing has got?

Answer: You can use the CTX_OUTPUT.START_LOG procedure to log output from the indexing process. Filename will normally be written to $ORACLE_HOME/ctx/log, but you can change the directory using the LOG_DIRECTORY parameter in CTX_ADM.SET_PARAMETER.

 

See Also:
to learn more about using this procedure.

 

 

8.3Frequently Asked Questions About Updating the Index

 

This section answers some of the frequently asked questions about updating your index and related performance issues.

How often should I index new or updated records?

Answer: How often do you need to? The less often you run reindexing with CTX_DLL.SYNC_INDEX then the less fragmented your indexes will be, and the less you will need to optimize them.

However, this means that your data will become progressively more out of date, which may be unacceptable for your users.

Many systems are OK with overnight indexing. This means data that is less than a day old is not searchable. Other systems use hourly, ten minute, or five minute updates.

See Also:

to learn more about using CTX_DDL.SYNC_INDEX.

in

How can I tell when my indexes are getting fragmented?

Answer: The best way is to time some queries, run index optimization, then time the same queries (restarting the database to clear the SGA each time, of course). If the queries speed up significantly, then optimization was worthwhile. If they don't, you can wait longer next time.

You can also use CTX_REPORT.INDEX_STATS to analyze index fragmentation.

See Also:

to learn more about using the CTX_REPORT package.

in .

Does memory allocation affect index synchronization?

Answer: Yes, the same way as for normal indexing. But of course, there are often far fewer records to be indexed during a synchronize operation, so it is not usually necessary to provide hundreds of megabytes of indexing memory

 

104示例JSP代码

This section describes the JSP web application.

Web Application Prerequisites

This application has the following requirements:

·          Your Oracle database (version 8.1.6 or higher) is up and running.

·          You have a web server such as Apache up and running and correctly configured to send requests to the Oracle9i server.

JSP Sample Code: search_html.jsp

 

 
<%@ page import="java.sql.* , Oracle.jsp.dbutil.*" %>
 
 
<%
  String connStr="jdbc:Oracle:thin:@localhost:1521:betadev";
 
  java.util.Properties info = new java.util.Properties();
 
  Connection conn = null;
  ResultSet  rset = null;
  Statement  stmt = null;
 
 
    if (name.isEmpty()) { %>
 
      
       search1 Search
       
       
         
         Search for:
         
         
         
       
       
     
     
 
   <%
   }
   else {
   %>
 
    
       Search
       
       
         
         Search for:
          size=30>
         
         
       
 
   <%
     try {
 
       DriverManager.registerDriver(new Oracle.jdbc.driver.OracleDriver() );
       info.put ("user", "ctxdemo");
       info.put ("password","ctxdemo");
       conn = DriverManager.getConnection(connStr,info);
 
       stmt = conn.createStatement();
       String theQuery =   request.getParameter("query");
 
       String myQuery = "select /*+ FIRST_ROWS */ rowid, tk, title,  score(1) 
scr from search_table where contains(text, '"+theQuery+"',1 ) > 0 order by 
score(1) desc";
       rset = stmt.executeQuery(myQuery);
 
       String color = "ffffff";
       int myTk = 0;
       String myTitle = null;
       int myScore = 0;
       int items = 0;
       while (rset.next()) {
         myTk = (int)rset.getInt(2);
         myTitle = (String)rset.getString(3);
         myScore = (int)rset.getInt(4);
         items++;
 
         if (items == 1) {
    %>
 
            
               
                  
                    
                    
                  
    <%   } %> 
 
         
           
           
         
 
   <%
         if (color.compareTo("ffffff") == 0)
               color = "eeeeee";
             else
               color = "ffffff";
 
 
       }
     } catch (SQLException e) {
     %>
        Error:  <%= e %>

     <%
     } finally {
       if (conn != null) conn.close();
       if (stmt != null) stmt.close();
       if (rset != null) rset.close();
     }
     %>
     
ScoreTitle
<%= myScore %>% <%= myTitle %>
           
     
     
     <%

   

 

(END)

阅读(3571) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~