Category: DB2/Informix
2013-07-21 09:23:39
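Two DB2 databases ran into trouble after the storage used for archived logs (NBU) failed: log archiving stopped working and the log files piled up rapidly. The investigation is recorded below.
Symptoms on the first database: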
db2pb8> db2pd -d pb8 -logs
Database Partition 0 -- Database PB8 -- Active -- Up 82 days 09:58:53 -- Date 07/19/2013 09:25:28
Logs:
Current Log Number 447561
Pages Written 30864
Cur Commit Disk Log Reads 0
Cur Commit Total Log Reads 0
Method 1 Archive Status Success
Method 1 Next Log to Archive 447505
Method 1 First Failure n/a
Method 2 Archive Status n/a
Method 2 Next Log to Archive n/a
Method 2 First Failure n/a
Log Chain ID 3
Current LSN 0x000033A33F07840B
Address StartLSN State Size Pages Filename
0x0A000307E64C87B0 000033A3200E8010 0x00000000 32000 32000 S0447558.LOG
db2pd reports no archiving problem (Archive Status is Success), yet when the command is rerun a few minutes apart, Next Log to Archive stays at 447505:
db2pb8> db2pd -d pb8 -logs
Database Partition 0 -- Database PB8 -- Active -- Up 82 days 10:00:07 -- Date 07/19/2013 09:26:42
Logs:
Current Log Number 447561
Pages Written 30952
Cur Commit Disk Log Reads 0
Cur Commit Total Log Reads 0
Method 1 Archive Status Success
Method 1 Next Log to Archive 447505
Method 1 First Failure n/a
Method 2 Archive Status n/a
Method 2 Next Log to Archive n/a
Method 2 First Failure n/a
Log Chain ID 3
Current LSN 0x000033A33F0D0DC4
Address StartLSN State Size Pages Filename
0x0A000307E64C87B0 000033A3200E8010 0x00000000 32000 32000 S0447558.LOG
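The comparison of the two db2pd samples above can be sketched in Python. This is only a rough heuristic, not an official check: the function names are made up for illustration, the regular expressions assume the field labels appear exactly as in the output shown here, and an unchanged archive pointer over a short interval is only suspicious when files are already waiting.

```python
import re

def parse_db2pd_logs(text):
    """Extract a few numeric fields from `db2pd -d <db> -logs` output."""
    patterns = {
        "current_log": r"Current Log Number\s+(\d+)",
        "pages_written": r"Pages Written\s+(\d+)",
        "next_to_archive": r"Method 1 Next Log to Archive\s+(\d+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = int(match.group(1))
    return fields

def archiver_looks_stalled(first, second):
    """Heuristic: logging advanced, files are waiting, archive pointer did not move."""
    wrote_more = (second["current_log"], second["pages_written"]) > (
        first["current_log"], first["pages_written"])
    backlog = second["current_log"] - second["next_to_archive"]
    pointer_stuck = second["next_to_archive"] == first["next_to_archive"]
    return wrote_more and backlog > 0 and pointer_stuck

# Values copied from the two PB8 samples taken at 09:25:28 and 09:26:42.
sample_1 = """Current Log Number 447561
Pages Written 30864
Method 1 Archive Status Success
Method 1 Next Log to Archive 447505"""
sample_2 = """Current Log Number 447561
Pages Written 30952
Method 1 Archive Status Success
Method 1 Next Log to Archive 447505"""

first = parse_db2pd_logs(sample_1)
second = parse_db2pd_logs(sample_2)
stalled = archiver_looks_stalled(first, second)
backlog = second["current_log"] - second["next_to_archive"]
print(stalled, backlog)  # True 56
```

The point of the sketch is that Archive Status alone is not enough: here it says Success while 56 log files are queued behind an archive pointer that is not moving.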
Check the archive history:
db2 list history archive log since 20130719 for pb8 | more
List History File for pb8
Number of matching file entries = 151
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719000119 1 O S0447420.LOG C0000003
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719000119
End Time: 20130719000452
Status: A
----------------------------------------------------------------------------
EID: 450564 Location: /usr/openv/netbackup/bin/nbdb2.sl64
(The entry above succeeded.)
……………………………………….
(The entries below failed.)
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719081539 P D S0447559.LOG C0000003
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719081539
End Time:
Status: A
----------------------------------------------------------------------------
EID: 450712 Location: /db2/PB8a/log_dir/NODE0000/S0447559.LOG
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719083717 P D S0447560.LOG C0000003
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719083717
End Time:
Status: A
----------------------------------------------------------------------------
EID: 450713 Location: /db2/PB8a/log_dir/NODE0000/S0447560.LOG
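The failed archive entries have no End Time, and the archive location is wrong as well: it points straight into the active log directory. Note also that their Type and Dev columns read P D instead of 1 O. In LIST HISTORY output, P normally denotes the primary log path and D a disk device, which fits a file that never left the active log directory.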
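Picking out the entries with a blank End Time by eye gets tedious when there are 151 of them. A minimal Python sketch of that filter follows. It assumes the text layout of `db2 list history archive log` matches the excerpts above (an entry line starting with X, followed later by Start Time and End Time lines); the function name is hypothetical.

```python
import re

def pending_archive_entries(history_text):
    """Return (log_file, start_time) for entries whose End Time is blank."""
    pending = []
    log_file = None
    start_time = None
    for raw_line in history_text.splitlines():
        line = raw_line.strip()
        found = re.search(r"\b(S\d{7}\.LOG)\b", line)
        if line.startswith("X ") and found:
            log_file = found.group(1)
        elif line.startswith("Start Time:"):
            start_time = line.split(":", 1)[1].strip()
        elif line.startswith("End Time:"):
            end_time = line.split(":", 1)[1].strip()
            if log_file and not end_time:
                pending.append((log_file, start_time))
            # An End Time line closes the entry; reset for the next one.
            log_file = None
            start_time = None
    return pending

# Condensed from the PB8 history shown above.
history = """X D 20130719000119 1 O S0447420.LOG C0000003
Start Time: 20130719000119
End Time: 20130719000452
X D 20130719081539 P D S0447559.LOG C0000003
Start Time: 20130719081539
End Time:
X D 20130719083717 P D S0447560.LOG C0000003
Start Time: 20130719083717
End Time:
"""

pending = pending_archive_entries(history)
for log_file, started in pending:
    print(log_file, started)
```

On the sample this reports S0447559.LOG and S0447560.LOG, the two entries marked as failed above.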
Symptoms on the second database:
arlpsap13:db2pb9:PB9:NODE0:/db2/PB9/db2dump> db2pd -d pb9 -logs
Database Partition 0 -- Database PB9 -- Active -- Up 61 days 11:47:29 -- Date 07/19/2013 09:49:19
Logs:
Current Log Number 230447
Pages Written 49791
Cur Commit Disk Log Reads 962
Cur Commit Total Log Reads 8697
Method 1 Archive Status Success
Method 1 Next Log to Archive 230395
Method 1 First Failure 230394
Method 2 Archive Status n/a
Method 2 Next Log to Archive n/a
Method 2 First Failure n/a
Log Chain ID 0
Current LSN 0x000021D79A8478F5
Address StartLSN State Size Pages Filename
0x0A000201F8CE5550 000021D77D458010 0x00000000 70000 34333 S0230446.LOG
Repeated runs show that the logs are still not being archived:
arlpsap13:db2pb9:PB9:NODE0:/db2/PB9/db2dump> db2pd -d pb9 -logs
Database Partition 0 -- Database PB9 -- Active -- Up 61 days 11:49:34 -- Date 07/19/2013 09:51:24
Logs:
Current Log Number 230447
Pages Written 50916
Cur Commit Disk Log Reads 962
Cur Commit Total Log Reads 8697
Method 1 Archive Status Success
Method 1 Next Log to Archive 230395
Method 1 First Failure 230394
Method 2 Archive Status n/a
Method 2 Next Log to Archive n/a
Method 2 First Failure n/a
Log Chain ID 0
Current LSN 0x000021D79ACAC25A
Address StartLSN State Size Pages Filename
0x0A000201F8CE5550 000021D77D458010 0x00000000 70000 34333 S0230446.LOG
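To get a feel for how much disk the unarchived logs occupy, the backlog can be estimated from the db2pd figures. The sketch below is an approximation under two assumptions: that the Size column is the log file size in 4 KiB pages, and that every log between Next Log to Archive and Current Log Number is still on disk at full size. The helper name is made up for illustration.

```python
LOG_PAGE_BYTES = 4096  # assumption: log file size is counted in 4 KiB pages

def backlog_estimate(current_log, next_to_archive, log_size_pages):
    """Return (pending file count, approximate size in GiB)."""
    pending_files = current_log - next_to_archive
    size_gib = pending_files * log_size_pages * LOG_PAGE_BYTES / 2**30
    return pending_files, size_gib

# Figures taken from the db2pd output of each database.
pb8 = backlog_estimate(447561, 447505, 32000)
pb9 = backlog_estimate(230447, 230395, 70000)

print(f"PB8: {pb8[0]} files, about {pb8[1]:.2f} GiB")  # PB8: 56 files, about 6.84 GiB
print(f"PB9: {pb9[0]} files, about {pb9[1]:.2f} GiB")  # PB9: 52 files, about 13.89 GiB
```

Both numbers keep growing for as long as the archive target stays unavailable, so the free space in the active log filesystem is the thing to watch.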
Check the archive history:
db2 list history archive log since 20130719 for pb9 | more
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719000221 1 O S0230346.LOG C0000000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719000221
End Time: 20130719000426
Status: A
----------------------------------------------------------------------------
EID: 306259 Location: /usr/openv/netbackup/bin/nbdb2.sl64
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719000311 1 O S0230347.LOG C0000000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719000311
End Time: 20130719001544
Status: A
----------------------------------------------------------------------------
EID: 306261 Location: /usr/openv/netbackup/bin/nbdb2.sl64
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719001427 1 O S0230348.LOG C0000000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719001427
End Time: 20130719001708
Status: A
----------------------------------------------------------------------------
EID: 306262 Location: /usr/openv/netbackup/bin/nbdb2.sl64
(The entries above are correct archive records.)
…………………………………………………………………………………….
(The entries below are the faulty archive records.)
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719083556 P D S0230445.LOG C0000000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719083556
End Time:
Status: A
----------------------------------------------------------------------------
EID: 306369 Location: /db2/PB9/NODE0000/log_dir/NODE0000/S0230445.LOG
Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID
-- --- ------------------ ---- --- ------------ ------------ --------------
X D 20130719085100 P D S0230446.LOG C0000000
----------------------------------------------------------------------------
----------------------------------------------------------------------------
Comment:
Start Time: 20130719085100
End Time:
Status: A
----------------------------------------------------------------------------
EID: 306370 Location: /db2/PB9/NODE0000/log_dir/NODE0000/S0230446.LOG
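As on the first database, the failed entries have no End Time, and their location points into the active log directory.
Conclusion: a failed archive attempt shows up in the history with no End Time and with a location inside the active log directory instead of the archive target.
Open questions:
Why does db2pd report different log information for the two databases, with First Failure set on one and n/a on the other? Could DPF be the reason? (The first database is single-node; the second uses DPF.) This is still unclear and needs further investigation.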