Chinaunix首页 | 论坛 | 博客
  • 博客访问: 217668
  • 博文数量: 57
  • 博客积分: 1376
  • 博客等级: 中尉
  • 技术积分: 658
  • 用 户 组: 普通用户
  • 注册时间: 2010-05-09 09:51
文章分类

全部博文(57)

文章存档

2012年(1)

2011年(56)

分类: Oracle

2011-10-27 20:18:45

在一个客户处,发现其一套Oracle 10.2.0.4的RAC,突然的trace文件增大迅速。
在查看其系统事件状态时,发现并无相应event被激活。 查看方法:请查看这里
第一种可能的情况排除掉。 还有可能就是oracle报错带出的dump。

先说说下这几个上G的文件的内容特点,最开头会有一句话:“kksSearchChildList4 dump”,然后,后面就是一些堆栈的信息,紧接着大部分内容都是system state dump的内容。这上G的dump让人看了恶心。。。

看到一篇对于此事件的研究,我认为 分析得还是比较到位的:
摘自:



Waiting For KKSFBC CHILD COMPLETION?

客 户有一套AIX 5.3上的10.2.0.4.5生产库系统,最近频繁出现”KKSFBC CHILD COMPLETION”等待,同时导致session不断spin消耗CPU并hang住,从表象看这似乎是由bug引起的。以KKSFBC CHILD COMPLETION为关键字到MOS查询可以找到,该Bug的症状为进程不断spin且hang住、出现’KKSFBC CHILD COMPLETION’等待事件、还可能伴有’Waits for “cursor: pin S”‘等待事件,直接影响的版本有11.1.0.6、10.2.0.3和10.2.0.4。

对于该Bug的描述是在发生’kksfbc child completion’等待事件后会话陷入无休止的自旋(spins)中,这种自旋(spins)发生在由堆栈调用(stack call)kksSearchChildList->kkshgnc陷入对kksSearchChildList函数的无限循环中。

就当前用户提供的版本号及等待事件信息仍不足以定位到该Bug,我们需要更详细的stack call。所幸的是这个trouble是可以重现的(reproduceable),在之后的一次案发现场我们得到了必要的信息:


  1. Name PID CPU% PgSp Owner
  2. oracle 3723390 10.0 7.0 oracle

  3. SQL> oradebug setospid 3723390
  4. Oracle pid: 155, Unix process pid: 3723390, image: oracle@lmsdb1

  5. SQL> oradebug unlimit;
  6. Statement processed.

  7. SQL> oradebug short_stack;
  8. ksdxfstk+002c<-ksdxcb+04e4<-sspuser+0074<-000044F0
  9. <-kksfbc+0b9c<-kkspsc0+0ffc<-kksParseCursor+00d4
  10. <-opiosq0+0ae0<-kpooprx+0168<-kpoal8+0400<-opiodr+0ae0<-ttcpip+1020<-opitsk+1124
  11. <-opiino+0990<-opiodr+0ae0<-opidrv+0484<-sou2o+0090<-opimai_real+01bc<-main+0098<-__start+0098

  12. SQL> oradebug dump processstate 10;
  13. Statement processed.

  14. SQL> oradebug dump systemstate 266;
  15. Statement processed.

  16. =========================process state dump=========================

  17.     SO: 7000003f72e3378, type: 4, owner: 7000003f225c798, flag: INIT/-/-/0x00
  18.     (session) sid: 270 trans: 0, creator: 7000003f225c798, flag: (41) USR/- BSY/-/-/-/-/-
  19.               DID: 0001-009B-0000017F, short-term DID: 0000-0000-00000000
  20.               txn branch: 0
  21.               oct: 0, prv: 0, sql: 700000243ef9540, psql: 7000002fe76d9c0, user: 28/OLSUSER
  22.     service name: SYS$USERS
  23.     O/S info: user: newprepay, term: unknown, ospid: 1234, machine: newprepayC
  24.               program: JDBC Thin Client
  25.     application name: JDBC Thin Client, hash value=2546894660
  26.     last wait for 'kksfbc child completion' blocking sess=0x0 seq=4397 wait_time=48840 seconds since wait started=144918
  27.                 =0, =0, =0
  28.     Dumping Session Wait History
  29.      for 'kksfbc child completion' count=1 wait_time=48840
  30.                 =0, =0, =0

可以从以上trace中看到会话确实曾长时间处于’kksfbc child completion’等待中,之后陷入无限自旋(spins)中消耗了大量CPU时间。但这里实际的表现又存有差异,引发无限循环的函数是kksfbc 而不是kksSearchChildList(常规的调用序列是:kksParseCursor->kkspsc0->kksfbc ->kksSearchChildList->kkshgnc)。

kksfbc意为K[Kernel]K[Kompile]S[Shared]F[Find]B[Best]C[Child]该函数用以在软解析时找 寻合适的子游标,在10.2.0.2以后引入了mutex互斥体来取代原有的Cursor Pin机制,Mutex较Latch更为轻量级。著名的Tanel Poder指出虽然mutex的引入改变了众多cursor pin的内部机制,但kksfbc仍需要持有library cache latches才能扫描library cache hash chains(However the traversing of library cache hash chains (the right child cursor lookup using kksfbc()) was still protected by library cache latches)。另一方面当kksfbc函数针对某个parent cursor找到合适child cursor后,可能使用KKSCHLPINx方法将该child cursor pin住,这个时候就需要exclusive地持有该child cursor相应的mutex,如:


  1. SQL> select mutex_addr,requesting_session rs,blocking_session bs,location from x$mutex_sleep_history;

  2. MUTEX_ADDR RS BS LOCATION
  3. ---------------- ---------- ---------- ----------------------------------------
  4. 00000000A3CF8928 159 148 kksfbc [KKSCHLPIN1]
  5. 00000000A3CEA8A8 159 0 kksfbc [KKSCHLPIN1]
  6. 00000000A3CF5508 159 0 kksfbc [KKSCHLPIN1]
  7. 00000000A3CF10E8 148 159 kksLockDelete [KKSCHLPIN6]

当有一个进程执行kksfbc,而其他进程可能需要陷入’kksfbc child completion’等待中(更多的是cursor:pin S等待事件),但这种等待一般是十分轻微的,你很难从某个”仪表”上观察到这一事件。因为一系列kksfbc相关的bug仅发生在10.2.0.2以后, 可以猜测是由于mutex的引入引起的。

打破’kksfbc child completion’异常等待的一种行之有效的workaround方法是设置隐藏参数_use_kks_mutex_pin为false,即使用传统 的由latch保护的Cursor pin;设置该hidden parameter需要重启实例,具体的设置方法如下:


  1. alter system set "_kks_use_mutex_pin"=false scope=spfile;

  2. restart instance...........

但这种方法也仅仅是行之有效,而非万试万灵。从消极的角度来说,因为放弃了mutex保护cursor pin的机制,在解析频繁的系统中library cache latch的争用将白热化,有可能成为Top 5等待事件。

另一种值得推荐的workaround是减少硬编码的SQL语句,使用session_cached_cursor减少实际的软解析操作也有助于避免引发该Bug;当然这要求我们修改代码,但从长远来看这种改良是值得的。

最后Oracle在10.2.0.4上提供了该Bug的one-off ,其在10.2.0.4 psu4以后的等价补丁为(Equivalent patch)为merge patch 9696904:

8557428 9696904 7527908 Both fixes are needed. 6795880 superceded by 8575528 in 9696904 which includes extra files so may cause new conflicts

但merge patch 9696904目前仅有Linux x86/64平台上的版本,而问题数据库所在平台为IBM AIX on POWER Systems (64-bit)。如果要通过补丁来fix这个问题的话,AIX平台的用户可以要求Oracle development开发部门build一个Power版的9696904 patch,也可以升级到10.2.0.5上去;

注意目 前有IBM AIX on POWER Systems (64-bit)平台上10.2.0.4.4的版本,经过验证该one-off patch可以在10.2.0.4 PSU4以后的版本上实施(包括10.2.0.4.5/10.2.0.4.6等),不需要如以上描述地去apply 9696904这个merge patch。当然升级到10.2.0.5/11.1.0.7或以上版本依然有效。

此外apply以上8575528:后需要修改隐藏参数”_cursor_features_enabled”并重启才能使fix生效,具体设置防范如下:


  1. ALTER SYSTEM SET "_cursor_features_enabled"=10 scope=spfile;

  2. restart instance.............

注意以上参数仅在使用one-off patch时需要设置,而当通过升级到10.2.0.5/11.1.0.7来修复问题时无需设置上述”_cursor_features_enabled”参数。


4 comments to Waiting For KKSFBC CHILD COMPLETION?

  • Getting ORA-00600 [kkshgnc-nextchild] With Wait For [kksfbc child completion]
    Applies to:

    Oracle Server – Enterprise Edition – Version: 10.2.0.4 and later [Release: 10.2 and later ]
    All Platforms
    Symptoms

    The following error can be raised when using 10.2.0.4 (or lower releases of 10gR2) or 11.1.0.6:

    ORA-00600: internal error code, arguments: [kkshgnc-nextchild], [], [], [], [], [], [], []

    The call stack looks similar to:

    ksedmp kgerinv kgeasnmierr kkshgnc kksSearchChildList kksfbc kkspsc0 kksParseCursor opiosq0 kpooprx kpoal8 opiodr ttcpip opitsk opiino opiodr opidrv sou2o opimai_real main

    and the trace file shows waits for “[kksfbc child completion]” similar to:

    KSL WAIT END [latch: cache buffers chains]
    KSL WAIT BEG [kksfbc child completion]
    KSL WAIT END [kksfbc child completion]
    KSL WAIT BEG [cursor: pin S wait on X]
    KSL WAIT END [cursor: pin S wait on X]

    Cause

    The cause of this problem has been identified in Bug 6795880. It is caused by a session going into an infinite spin just after a wait for ‘kksfbc child completion’. The spin can be identified in the call stack by the following routines: kksSearchChildList -> kkshgnc, where kksSearchChildList loops forever.

    This problem can also lead to internal error such as:
    ORA-600 [kksSearchChildList1]
    ORA-600 [kksSearchChildList2]
    ORA-600 [kksSearchChildList3]
    ORA-600 [kkshgnc-nextchild]
    Solution

    Bug 6795880 is fixed in RDBMS 10.2.0.5 patchset (only available for Linux x86 and Linux x64 platforms at time of publishing this article, i.e. MAY 2010) and in 11.1.0.7 patchset. There is no known workaround for this bug.

    Please execute both the actions in Step A and in Step B as follows to resolve this issue.

    A) As this bug introduces the problem described in unpublished Bug 8575528, download and apply Patch 8575528. Steps for obtaining the patch

    1) Go to My Oracle Support
    2) Activate the Patches & Updates tab
    3) Click on Patch ID or Number
    4) Enter the patch number 8575528 in the text field
    5) Select your platform
    6) Click on Search

    AND

    B) Set the following instance parameter and restart the instance.Please make a note that this parameter is needed to enable the fix. Just applying Patch is not sufficient. When using an SPFILE, issue:

    ALTER SYSTEM SET “_cursor_features_enabled”=10 scope=spfile;

    and restart the instance.

    For PFILE usage, set following parameter in PFILE and restart the instance.

    _cursor_features_enabled=10

    The parameter is only necessary when installing the one-off fix. When installing the 10.2.0.5 or 11.1.0.7 patch sets there is no need for this parameter.

  • Bug 6795880 Session spins / OERI after ‘kksfbc child completion’ wait – superceded

    This note gives a brief overview of bug 6795880.
    The content was last updated on: 05-APR-2011
    Click here for details of each of the sections below.
    Affects:

    Product (Component) Oracle Server (Rdbms)
    Range of versions believed to be affected Versions BELOW 11.2
    Versions confirmed as being affected
    11.1.0.6
    10.2.0.4
    10.2.0.3
    Platforms affected Generic (all / most platforms affected)

    Note that this fix can cause / expose the problem described in Bug:8575528

    Note that this fix has been superceded by the fix in Bug:8575528
    Fixed:

    This issue is fixed in
    11.1.0.7 (Server Patch Set)
    10.2.0.5 (Server Patch Set)
    10.2.0.4 Patch 5 on Windows Platforms
    Symptoms:

    Related To:

    Hang (Process Spins)
    Internal Error May Occur (ORA-600)
    ORA-600 [kksSearchChildList1]
    ORA-600 [kksSearchChildList2]
    ORA-600 [kksSearchChildList3]
    ORA-600 [kkshgnc-nextchild]
    Dump in or under kksSearchChildList4
    Waits for “cursor: pin S”
    Waits for “kksfbc child completion”
    This fix is DISABLED by default
    _CURSOR_FEATURES_ENABLED
    Description

    This fix has been superseded by bug:8575528.

    A session may go into an infinite spin just after a wait
    for ‘kksfbc child completion’. The spin occurs with
    a stack including kksSearchChildList -> kkshgnc where
    kksSearchChildList loops forever.

    This problem can also lead to internal error such as any of
    ORA-600 [kksSearchChildList1], ORA-600 [kksSearchChildList2]
    ORA-600 [kksSearchChildList3], ORA-600 [kkshgnc-nextchild]

    Or Trace dumps on kksSearchChildList4 .

    Note:
    This fix is disabled by default in 10g.
    To enable this fix you must explicitly set the following
    parameter for instance startup:
    “_cursor_features_enabled” = 10

  • Applying p8575528_102044_Linux-x86-64.zip on PSU 5 10.2.0.4.5

    [oracle@rh2 patches]$ cp /home/oracle/p8575528_102044_Linux-x86-64.zip ./
    [oracle@rh2 patches]$ unzip p8575528_102044_Linux-x86-64.zip
    Archive: p8575528_102044_Linux-x86-64.zip
    creating: 8575528/
    creating: 8575528/files/
    creating: 8575528/files/lib/
    creating: 8575528/files/lib/libserver10.a/
    inflating: 8575528/files/lib/libserver10.a/kks1.o
    inflating: 8575528/files/lib/libserver10.a/kksc.o
    inflating: 8575528/files/lib/libserver10.a/kksh.o
    inflating: 8575528/files/lib/libserver10.a/ksmp.o
    creating: 8575528/etc/
    creating: 8575528/etc/config/
    inflating: 8575528/etc/config/inventory
    inflating: 8575528/etc/config/actions
    creating: 8575528/etc/xml/
    inflating: 8575528/etc/xml/GenericActions.xml
    inflating: 8575528/etc/xml/ShiphomeDirectoryStructure.xml
    inflating: 8575528/README.txt
    [oracle@rh2 patches]$ cd 8575528/
    [oracle@rh2 8575528]$ $ORACLE_HOME/OPatch/opatch apply
    Invoking OPatch 11.2.0.1.3

    Oracle Interim Patch Installer version 11.2.0.1.3
    Copyright (c) 2010, Oracle Corporation. All rights reserved.

    Oracle Home : /s01/db_1
    Central Inventory : /s01/oraInventory
    from : /etc/oraInst.loc
    OPatch version : 11.2.0.1.3
    OUI version : 10.2.0.4.0
    OUI location : /s01/db_1/oui
    Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-09_22-03-44PM.log

    Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt

    ApplySession applying interim patch ’8575528′ to OH ‘/s01/db_1′

    Running prerequisite checks…
    Provide your email address to be informed of security issues, install and
    initiate Oracle Configuration Manager. Easier for you if you use your My
    Oracle Support Email address/User Name.
    Visit for details.
    Email address/User Name:

    You have not provided an email address for notification of security issues.
    Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Yes

    OPatch detected non-cluster Oracle Home from the inventory and will patch the local system only.

    Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
    (Oracle Home = ‘/s01/db_1′)

    Is the local system ready for patching? [y|n]
    y
    User Responded with: Y
    Backing up files and inventory (not for auto-rollback) for the Oracle Home
    Backing up files affected by the patch ’8575528′ for restore. This might take a while…
    Backing up files affected by the patch ’8575528′ for rollback. This might take a while…

    Patching component oracle.rdbms, 10.2.0.4.0…
    Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kks1.o”
    Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kksc.o”
    Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/kksh.o”
    Updating archive file “/s01/db_1/lib/libserver10.a” with “lib/libserver10.a/ksmp.o”
    Running make for target ioracle
    ApplySession adding interim patch ’8575528′ to inventory

    Verifying the update…
    Inventory check OK: Patch ID 8575528 is registered in Oracle Home inventory with proper meta-data.
    Files check OK: Files from Patch ID 8575528 are present in Oracle Home.

    The local system has been patched and can be restarted.

    OPatch succeeded.

    [oracle@rh2 8575528]$ $ORACLE_HOME/OPatch/opatch lsinventory
    Invoking OPatch 11.2.0.1.3

    Oracle Interim Patch Installer version 11.2.0.1.3
    Copyright (c) 2010, Oracle Corporation. All rights reserved.

    Oracle Home : /s01/db_1
    Central Inventory : /s01/oraInventory
    from : /etc/oraInst.loc
    OPatch version : 11.2.0.1.3
    OUI version : 10.2.0.4.0
    OUI location : /s01/db_1/oui
    Log file location : /s01/db_1/cfgtoollogs/opatch/opatch2011-05-09_22-16-21PM.log

    Patch history file: /s01/db_1/cfgtoollogs/opatch/opatch_history.txt

    Lsinventory Output file location : /s01/db_1/cfgtoollogs/opatch/lsinv/lsinventory2011-05-09_22-16-21PM.txt

    ——————————————————————————–
    Installed Top-level Products (2):

    Oracle Database 10g 10.2.0.1.0
    Oracle Database 10g Release 2 Patch Set 3 10.2.0.4.0
    There are 2 products installed in this Oracle Home.

    Interim patches (4) :

    Patch 8575528 : applied on Mon May 09 22:06:33 CST 2011
    Unique Patch ID: 12363258
    Created on 5 Apr 2010, 16:16:27 hrs PST8PDT
    Bugs fixed:
    8575528
    This patch overlays patches:
    9352164
    This patch needs patches:
    9352164
    as prerequisites

    Patch 7527908 : applied on Fri May 06 22:25:33 CST 2011
    Unique Patch ID: 12546933
    Created on 30 Apr 2010, 12:48:09 hrs PST8PDT
    Bugs fixed:
    7527908
    This patch overlays patches:
    9352164
    This patch needs patches:
    9352164
    as prerequisites

    Patch 9654991 : applied on Fri May 06 21:33:38 CST 2011
    Unique Patch ID: 12816418
    Created on 30 Jun 2010, 18:30:40 hrs PST8PDT
    Bugs fixed:
    9714832, 9713537, 9678695, 9655017, 9173248, 8309642, 9678697, 9678690
    9573054, 9654991
    This patch overlays patches:
    9352164
    This patch needs patches:
    9352164
    as prerequisites

    Patch 9352164 : applied on Fri May 06 21:29:20 CST 2011
    Unique Patch ID: 12307477
    Created on 1 Apr 2010, 11:33:14 hrs PST8PDT
    Bugs fixed:
    9377578, 6418420, 7835247, 7207654, 7592346, 6724797, 7936993, 7331867
    9093300, 7535152, 7552067, 5879114, 8866013, 5457450, 8344348, 7272297
    7136866, 7196894, 6258480, 7013124, 6512622, 6355586, 7196532, 8568395
    8309587, 7557226, 5399699, 6509115, 8568397, 8568398, 7127618, 5701695
    6052226, 7424804, 6817593, 7553884, 6741425, 7513673, 6783812, 8437213
    6452766, 6469211, 7527650, 8309592, 8479537, 5991038, 5686407, 6945157
    7639602, 9119226, 6403091, 7589862, 7552082, 6711853, 8304589, 6052169
    8199266, 6327692, 5756769, 9352191, 7460818, 6268409, 8232056, 6687381
    6972843, 8230457, 6800507, 7027551, 6778714, 6200820, 6645719, 7393804
    6775231, 3934160, 6683178, 6650256, 5085288, 7528105, 7329252, 6378112
    6151380, 6844866, 4723109, 8544896, 5126719, 5890312, 5934363, 7036453
    7610362, 8426816, 8433026, 7270434, 7172531, 6451626, 8247855, 5497338
    6324944, 6874522, 7175513, 6960489, 7341598, 8576156, 6797677, 8342923
    5895190, 6756739, 7150470, 7593835, 7356443, 7044551, 8227106, 4695511
    7298688, 5747462, 8556340, 7197445, 5348308, 7937113, 8341623, 7569205
    8409848, 6053134, 6163771, 6851669, 6181488, 6375150, 6825866, 7210921
    7295780, 6345573, 7033630, 6954722, 6824129, 7523475, 7457766, 7309458
    8324577, 6840740, 6804746, 7375611, 8268054, 6981690, 6512811, 6988017
    7375613, 8344399, 7340448, 8362683, 7375617, 8251247, 5933656, 6005347
    9145204, 6599920, 7238230, 6379441, 6452375, 6352003, 6833965, 7136489
    6610218, 7612639, 6392076, 7225204, 9119194, 5476236, 9442328, 7609057
    7609058, 6605106, 6374297, 6193945, 4693355, 8217795, 7039896, 7432514
    7330909, 6952701, 7190270, 8287155, 7587008, 7207932, 6802650, 7189447
    8481935, 4598439, 9442331, 6615740, 7155655, 6749617, 9442335, 7159505
    5868257, 5727166, 7173005, 6917874, 9442339, 7013768, 7691766, 7385253
    7291739, 7225720, 7257770, 7363767, 7244238, 6941717, 8267348, 7710551
    8354686, 7247217, 8328954, 7299153, 8909984, 6681695, 8702276, 9119284
    8217011, 7661251, 6265559, 6823287, 6991626, 6954829, 5259835, 6500033
    5923486, 7432601, 7022234, 8534387, 5147386, 7697802, 6653934, 7375644
    6490140, 7662491, 8331466, 5623467, 6070225, 6635214, 7396409, 6638558
    7038750, 6714608, 6838714, 6870937, 7219752, 7263842, 7278117, 6882739
    5404871, 8836667, 8373286, 7393292, 6678845, 6903051, 7936793, 6600051
    7155248, 4966512, 7155249, 7197637, 8836308, 8568402, 8568404, 8568405
    8431487, 5704108, 6343150, 7280764, 6923450, 7643632, 6145177, 8836671
    8310931, 6640411, 8347704, 8836675, 7155250, 7155251, 8836677, 7155252
    8836678, 7155253, 8292378, 7155254, 6219529, 7411865, 8227091, 8340379
    7276960, 6145687, 7659217, 5863926, 7022905, 6852598, 7123643, 6596564
    6413089, 6851438, 8836681, 8836683, 8836684, 8836686, 7579469, 7494333
    7315642, 8340383, 6786022, 8340387, 6926448, 7600026, 7462072, 6679303
    8815639, 7197583, 7172752, 7326645, 7008262, 9173244, 9173248, 7573151
    8490879, 7477934, 6725634, 6733655, 6799205, 6980597, 7499353, 6084232
    6014513, 7140204, 7254987, 8833280, 6647480, 6120004, 7693128, 6760697
    6051177, 8247215, 6858062, 7189645, 6844739, 6768251, 7196863, 5630796
    7378661, 7378735, 5970301, 6705822, 8290506, 6658484, 7599944, 9173253
    8309623, 7125408, 7257461, 6987790, 7568556, 6919819, 8886674, 5883691
    6955744, 7801939, 6074620, 7149004, 6857917, 8283650, 6445948, 5929055
    6110752, 7552042, 8210889, 8287504, 6506617, 7306915, 6271590, 5386204
    6976005, 8330783, 7606362, 5377242, 7043989, 8309632, 7575925, 6870047
    8309637, 5902053, 8309639, 7028176, 6827260, 7588384, 4726401, 6720712
    5910650, 6752765, 6971433, 6024730, 8315482, 6628122, 8239142, 9352164
    5695562, 4637902, 7345904, 8309642, 6994160, 8556586, 6404447, 8220734
    6919764, 7597354, 7523787, 6029179, 5231155, 6455659

  • admin

    Earlier there was a dictionary corruption during CTAS in parallel due to inconsistency in SEG$ table. Then customer did PITR. Now the customer is getting an ORA-600 [kksfbc-reparse-infinite-loop] during a data load following a point in time recovery.



阅读(1916) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~