分类:
2008-05-17 17:09:13
|
实例:解决SQL911 RC=68 lock timeout 问题: 客户抱怨数据库应用程序经常会SQL911 RC=68错误退出。找出原因。 server是FP10,因此不大可能是APAR 思路: 我们知道SQL911 RC=68是lock timeout,因为lock timeout和deadlock不同,没有event monitor可以抓到它,我们只能根据TIMEOUT VALUE来写脚本循环作snapshot…… 然后就要决定究竟什么snapshot对我们有帮助 首先snapshot for applications , snapshot for locks不用说了,然后由于已经是8.2了,我们可以用db2pd获取更加详细的信息。 客户的locktimeout=60,所以有脚本如下: while [ 1 -eq 1 ]; do wc= $(db2 "list applications show detail" | grep -i "Lock-Wait" | wc -l) if [ $wc -gt 0 ]; then db2pd -db dwbllog -locks showlocks > 58121.locks db2pd -db dwbllog -transactions > 58121.trans db2pd -db dwbllog -agents > 58121.agents db2pd -db dwbllog -applications > 58121.app db2 "get snapshot for locks on dwbllog global" > 58121.locksnap db2 "get snapshot for applications on dwbllog global" > 58121.appsnap db2pd -db dwbllog -locks showlocks > 58121.locks.2 print "Done snapshots" fi sleep 10 done 经过2个小时lock timeout再次出现,客户拿到最后一次的snapshot,并且出错退出进程为66681 分析: 首先是看application snapshot: Application Snapshot Application handle = 66681 Application status = Lock-wait Status change time = 04/19/2006 09:54:40.879548 Application code page = 1208 Application country/region code = 1 DUOW correlation token = G93D0D4A.AF0E.0DE779133109 Application name = java Application ID = G93D0D4A.AF0E.0DE779133109 Sequence number = 0001 TP Monitor client user ID = TP Monitor client workstation name = TP Monitor client application name = TP Monitor client accounting string = 。。。。 这里Application status = Lock-wait看起来没有错误,继续往下: Subsection waiting for lock = 0 ID of agent holding lock = 65836 Application ID holding lock = G93D0E35.A0E8.057C39131247 Database partition lock wait occurred on = 1 Lock name = 0x53595353483230320619C45A41 Lock attributes = 0x00000000 Release flags = 0x40000000 Lock object type = Internal Plan Lock Lock mode = Exclusive Lock (X) Lock mode requested = Share Lock (S) Lock wait start timestamp = 04/19/2006 09:54:40.879549 看出我们正在等65836的一个X lock,这里S等X很正常,不过Internal Plan Lock的X就比较有疑问了。 我们去65836的locksnapshot看看: Application handle = 65836 Application ID = G93D0E35.A0E8.057C39131247 Sequence number = 0001 Application name = java CONNECT Authorization ID = WCONN Application status = UOW Waiting Status change time = 04/19/2006 09:55:04.481159 Application code page = 1208 Locks held = 56 Total wait time (ms) = 0 List Of Locks Database partition = 1 Lock Name = 0x5359534C48323030BB989F3241 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 2 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = X Database partition = 1 Lock Name = 0x53595353483230313F94F89E41 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 1 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = S Database partition = 1 Lock Name = 0x53514C4445464C540763DD2841 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 1 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = S Database partition = 1 Lock Name = 0x53595353483230320619C45A41 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 2 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = X Database partition = 1 Lock Name = 0x535953534832303028EFECDC41 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 1 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = S Database partition = 2 Lock Name = 0x535953534832303028EFECDC41 Lock Attributes = 0x00000000 Release Flags = 0x40000000 Lock Count = 1 Hold Count = 0 Lock Object Name = 0 Object Type = Internal Plan Lock Mode = S 。。。。(还有其他的N多lock) 我们看到有很多X的Internal Plan Lock,很有意思,从文档中我们知道 - If one attempts an operation that either Binds/Rebinds a package, drops a package, invalidates a package or inoperates a package, the package lock for that package will be acquired in Exclusive mode also for the duration of the unit of work. So, for example if one or more applications are running from a package A, and an attempt is made from another application to drop package A, then the application issuing the drop will be attempting to acquire an X lock on that package, but will have to wait behind the applications that are executing from that package as they have S locks on that package 所以4种可能: 1)正在bind 2)正在drop package 3)invalidate package 4)inoperates a package 我们看一看到底是哪个lock产生的问题。 在db2pd -application中我们66681: 0x07800000003C1C00 66681 [001-01145] 1 3829886 Lock-wait 0 0 108 895 G93D0D4A.AF0E.0DE779133109 然后db2pd -transaction里的66681: 404C8600 66681 [001-01145] 67 4 READ 0x00000000 0x00000000 0x000000000000 0x000000000000 0 0 0x0000350BE23A 1 n/a 得到transaction handle 67 在db2pd -lock中有以下4个lock 0x07800000413C77C0 67 53595353483230313F94F89E41 Internal P ..S G 67 1 0 0x0000 0x40000000 Pkg UniqueID 53595353 48323031 Name 3f94f89e Loading = 0 0x07800000411FC980 67 53514C4445464C540763DD2841 Internal P ..S G 67 1 0 0x0000 0x40000000 Pkg UniqueID 53514c44 45464c54 Name 0763dd28 Loading = 0 0x0780000041235C00 67 53595353483230320619C45A41 Internal P ..S W 30 1 0 0x0000 0x40000000 Pkg UniqueID 53595353 48323032 Name 0619c45a Loading = 0 0x078000004115E0C0 67 535953534832303028EFECDC41 Internal P ..S G 67 1 0 0x0000 0x40000000 Pkg UniqueID 53595353 48323030 Name 28efecdc Loading = 0 正好对应db2pd -transaction里的4 然后同样步骤找到65836的lock。 由于65836的lock很多,我们只需要找到Internal P的X lock: 只有两个: 0x078000004123E200 30 53595353483230320619C45A41 Internal P ..X G 30 2 0 0x0000 0x40000000 Pkg UniqueID 53595353 48323032 Name 0619c45a Loading = 0 0x0780000041241300 30 5359534C48323030BB989F3241 Internal P ..X G 30 2 0 0x0000 0x40000000 Pkg UniqueID 5359534c 48323030 Name bb989f32 Loading = 0 然后对比发现0619c45a 在两个lock里都存在,因而可以断定是这个引起的问题。 然后检查pckage ID: 53595353 48323032 0x53 0x59 0x53 0x53 0x48 0x32 0x30 0x32 在ASCII中是SYSSH202 因此,现在做 select pkgname,last_bind_time from syscat.packages 得到3个SYSSH202 SYSSH202 2006-04-19-10.17.53.372088 SYSSH202 2005-11-15-20.40.59.511091 SYSSH202 2005-11-15-20.34.19.734507 然后做: select * from syscat.packages where pkgname='SYSSH202' 得到: PKGSCHEMA PKGNAME BOUNDBY DEFINER DEFAULT_SCHEMA VALID UNIQUE_ID TOTAL_SECT FORMAT ISOLATION BLOCKING INSERT_BUF LANG_LEVEL FUNC_PATH QUERYOPT EXPLAIN_LEVEL EXPLAIN_MODE EXPLAIN_SNAPSHOT SQLWARN SQLMATHWARN EXPLICIT_BIND_TIME LAST_BIND_TIME CODEPAGE DEGREE MULTINODE_PLANS INTRA_PARALLEL VALIDATE DYNAMICRULES SQLERROR REFRESHAGE REMARKS FEDERATED TRANSFORMGROUP REOPTVAR OS_PTR_SIZE PKGVERSION PKG_CREATE_TIME -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- ----- --------- ---------- ------ --------- -------- ---------- ---------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ------------- ------------ ---------------- ------- ----------- -------------------------- -------------------------- -------- ------ --------------- -------------- -------- ------------ -------- ---------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- ----------- ---------------------------------------------------------------- -------------------------- NULLID SYSSH202 DWBLLOG DWBLLOG DWBLLOG Y SYSLVL01 65 3 CS B N 0 "SYSIBM","SYSFUN","SYSPROC","DWBLLOG" 5 N N Y N 2004-02-14-16.58.21.045669 2006-04-19-10.17.53.372088 819 1 Y N B R N 0.000000 - U - N 64 2004-02-14-16.58.21.045669 NULLIDR1 SYSSH202 DWBLLOG DWBLLOG DWBLLOG Y SYSLVL01 65 3 CS B N 0 "SYSIBM","SYSFUN","SYSPROC","DWBLLOG" 7 N N Y N 2005-11-15-20.34.19.734507 2005-11-15-20.34.19.734507 819 ANY Y N B R N 0.000000 - U - O 32 2005-11-15-20.34.19.734507 NULLIDRA SYSSH202 DWBLLOG DWBLLOG DWBLLOG Y SYSLVL01 65 3 CS B N 0 "SYSIBM","SYSFUN","SYSPROC","DWBLLOG" 7 N N Y N 2005-11-15-20.40.59.511091 2005-11-15-20.40.59.511091 819 ANY Y N B R N 0.000000 - U - A 32 2005-11-15-20.40.59.511091 看来NULLID schema下的package的最新绑定是错误附近,因而我们可以想象估计是绑定包问题。 询问得知最近他们从32bit省级到64bit,然后让客户运行: db2 bind @db2ubind.lst db2 bind @db2cli.lst 问题解决 后备解决方案: DB2_NO_PKG_LOCK=ON 但是产生的副作用过于严重,不推荐 |