Category: DB2/Informix
2013-07-27 21:27:34
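As we know, an online backup starts an application to perform the backup, and that application in turn brings up several EDUs that serve it and complete the backup together.
This post examines the relationship, during an online backup, between the backup application and the EDUs it brings up.
1. Before starting, check the state of the applications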
db2 list applications
SQL1611W No data was returned by Database System Monitor.
2. Start an online backup in one CLP
db2 backup database nit online to /db2/NIT/db2backup
3. Watch the application state from another CLP
db2 list applications show detail
CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path
Handle Agents partition number pid/thread
-------------------------------------------------------------------------------------------------------------------------------- -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- --------------------
DB2NIT db2fw4 38151 *LOCAL.DB2.130724075437 00001 1 0 9370 Connect Completed 07/24/2013 09:54:28.243810 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2taskd 38144 *LOCAL.DB2.130724075430 00001 1 0 2707 UOW Waiting 07/27/2013 13:58:44.453084 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw3 38150 *LOCAL.DB2.130724075436 00001 1 0 9113 Connect Completed 07/24/2013 09:54:28.242194 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2stmm 38143 *LOCAL.DB2.130724075429 00001 1 0 15486 UOW Waiting 07/27/2013 14:01:48.055214 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw2 38149 *LOCAL.DB2.130724075435 00001 1 0 8856 Connect Completed 07/24/2013 09:54:28.240569 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2evmg_DB2DETAILDEA 38155 *LOCAL.DB2.130724075441 00001 1 0 10398 Connect Completed 07/24/2013 09:54:28.354142 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw1 38148 *LOCAL.DB2.130724075434 00001 1 0 8599 Connect Completed 07/24/2013 09:54:28.238961 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw7 38154 *LOCAL.DB2.130724075440 00001 1 0 10141 Connect Completed 07/24/2013 09:54:28.248635 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2bp 43779 *LOCAL.db2nit.130727120328 00001 5 0 15743 Performing a Backup 07/27/2013 14:03:28.830789 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw0 38147 *LOCAL.DB2.130724075433 00001 1 0 3734 Connect Completed 07/24/2013 09:54:28.237365 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw6 38153 *LOCAL.DB2.130724075439 00001 1 0 9884 Connect Completed 07/24/2013 09:54:28.247030 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2lused 38146 *LOCAL.DB2.130724075432 00001 1 0 3477 UOW Waiting 07/27/2013 13:58:45.254383 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw5 38152 *LOCAL.DB2.130724075438 00001 1 0 9627 Connect Completed 07/24/2013 09:54:28.245419 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2wlmd 38145 *LOCAL.DB2.130724075431 00001 1 0 3220 Connect Completed 07/24/2013 09:54:28.233807 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
Note that application 43779 is performing the backup, and that 5 EDUs (Number of Agents) are serving it.
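As a rough sketch of how the backup application could be picked out of such a listing programmatically, the Python snippet below scans whitespace-collapsed rows like the ones above for the status text Performing a Backup. The function name find_backup_apps and the field positions are assumptions based on this particular listing (they would break if an Auth Id or Application Name contained spaces); this is not a DB2 API.

```python
# Two rows copied from the listing above, used as sample input.
SAMPLE_ROWS = """\
DB2NIT db2stmm 38143 *LOCAL.DB2.130724075429 00001 1 0 15486 UOW Waiting 07/27/2013 14:01:48.055214 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2bp 43779 *LOCAL.db2nit.130727120328 00001 5 0 15743 Performing a Backup 07/27/2013 14:03:28.830789 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
"""


def find_backup_apps(text):
    """Return (handle, number_of_agents, coordinator_edu) for backup rows.

    Assumed field positions after splitting on whitespace:
    2 = Appl. Handle, 5 = Number of Agents, 7 = Coordinator pid/thread.
    """
    found = []
    for line in text.splitlines():
        if "Performing a Backup" not in line:
            continue
        fields = line.split()
        found.append((int(fields[2]), int(fields[5]), int(fields[7])))
    return found


backup_apps = find_backup_apps(SAMPLE_ROWS)
print(backup_apps)  # [(43779, 5, 15743)]
```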
Use db2pd to look at the agents
db2pd -db nit -agents
Option -agents is an instance scope option. The database option has been ignored.
Database Partition 0 -- Active -- Up 15 days 00:10:55 -- Date 07/27/2013 14:03:55
Agents:
Current agents: 22
Idle agents: 0
Active coord agents: 15
Active agents total: 19
Pooled coord agents: 3
Pooled agents total: 3
Address AppHandl [nod-index] AgentEDUID Priority Type State ClientPid Userid ClientNm Rowsread Rowswrtn LkTmOt DBName LastApplId LastPooled
0x0780000000C40080 38143 [000-38143] 15486 0 Coord Inst-Active 13828348 db2nit db2stmm 0 0 NotSet NIT *LOCAL.DB2.130724075429 Wed Jul 24 09:53:09
0x0780000000B260A0 38144 [000-38144] 2707 0 Coord Inst-Active 13828348 db2nit db2taskd 2 0 NotSet NIT *LOCAL.DB2.130724075430 n/a
0x0780000000B00080 38145 [000-38145] 3220 0 Coord Inst-Active 13828348 db2nit db2wlmd 0 0 NotSet NIT *LOCAL.DB2.130724075431 n/a
0x0780000000B060A0 38146 [000-38146] 3477 0 Coord Inst-Active 13828348 db2nit db2lused 0 121 3 NIT *LOCAL.DB2.130724075432 n/a
0x0780000000F70080 38147 [000-38147] 3734 0 Coord Inst-Active 13828348 db2nit db2fw0 0 0 3 NIT *LOCAL.DB2.130724075433 n/a
0x0780000000F760A0 38148 [000-38148] 8599 0 Coord Inst-Active 13828348 db2nit db2fw1 0 0 3 NIT *LOCAL.DB2.130724075434 n/a
0x0780000000F50080 38149 [000-38149] 8856 0 Coord Inst-Active 13828348 db2nit db2fw2 0 0 3 NIT *LOCAL.DB2.130724075435 n/a
0x0780000000F560A0 38150 [000-38150] 9113 0 Coord Inst-Active 13828348 db2nit db2fw3 0 0 3 NIT *LOCAL.DB2.130724075436 n/a
0x0780000000D00080 38151 [000-38151] 9370 0 Coord Inst-Active 13828348 db2nit db2fw4 0 0 3 NIT *LOCAL.DB2.130724075437 n/a
0x0780000000D060A0 38152 [000-38152] 9627 0 Coord Inst-Active 13828348 db2nit db2fw5 0 0 3 NIT *LOCAL.DB2.130724075438 n/a
0x0780000000D30080 38153 [000-38153] 9884 0 Coord Inst-Active 13828348 db2nit db2fw6 0 0 3 NIT *LOCAL.DB2.130724075439 n/a
0x0780000000D360A0 38154 [000-38154] 10141 0 Coord Inst-Active 13828348 db2nit db2fw7 0 0 3 NIT *LOCAL.DB2.130724075440 n/a
0x0780000000B40080 38155 [000-38155] 10398 0 Coord Inst-Active 13828348 db2nit db2evmg_ 0 0 3 NIT *LOCAL.DB2.130724075441 n/a
0x0780000000B460A0 43772 [000-43772] 10998 0 Coord Inst-Active 19529738 db2nit db2bp 0 0 NotSet n/a *LOCAL.db2nit.130727120150 Sat Jul 27 14:01:50
0x0780000000C460A0 43779 [000-43779] 15743 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 NotSet NIT *LOCAL.db2nit.130727120328 Sat Jul 27 14:03:08
0x0780000000C60080 43779 [000-43779] 11255 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C660A0 43779 [000-43779] 11512 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C00080 43779 [000-43779] 11769 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C060A0 43779 [000-43779] 12282 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C10080 0 [000-00000] 10741 0 Coord Pooled n/a n/a n/a 0 0 NotSet NIT *LOCAL.db2nit.130727120311 Sat Jul 27 14:03:08
0x0780000000B20080 0 [000-00000] 3036 0 Coord Pooled n/a n/a n/a 0 0 NotSet NIT *LOCAL.db2nit.130727115811 Sat Jul 27 13:58:08
0x0780000000C160A0 0 [000-00000] 12010 0 Coord Pooled n/a n/a n/a 0 0 NotSet NIT *LOCAL.db2nit.130727101311 Sat Jul 27 12:13:08
The 5 EDUs serving application 43779 are visible here (AgentEDUID 15743, 11255, 11512, 11769 and 12282).
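The same grouping can be done mechanically. The sketch below groups AgentEDUID values by AppHandl from rows of db2pd -agents output; the sample rows are copied from the listing above, and the function name agents_by_handle and the assumption that data rows start with a 0x address are mine.

```python
from collections import defaultdict

# Rows copied from the db2pd -agents listing above.
SAMPLE = """\
0x0780000000C40080 38143 [000-38143] 15486 0 Coord Inst-Active 13828348 db2nit db2stmm 0 0 NotSet NIT *LOCAL.DB2.130724075429 Wed Jul 24 09:53:09
0x0780000000C460A0 43779 [000-43779] 15743 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 NotSet NIT *LOCAL.db2nit.130727120328 Sat Jul 27 14:03:08
0x0780000000C60080 43779 [000-43779] 11255 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C660A0 43779 [000-43779] 11512 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C00080 43779 [000-43779] 11769 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
0x0780000000C060A0 43779 [000-43779] 12282 0 Coord Inst-Active 12189750 db2nit db2bp 0 0 0 n/a *LOCAL.db2nit.130727120328 n/a
"""


def agents_by_handle(text):
    """Map AppHandl -> list of AgentEDUID, in listing order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and summary lines: data rows start with an address.
        if len(fields) < 4 or not fields[0].startswith("0x"):
            continue
        groups[int(fields[1])].append(int(fields[3]))
    return dict(groups)


groups = agents_by_handle(SAMPLE)
print(groups[43779])  # [15743, 11255, 11512, 11769, 12282]
```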
Use db2pd to look at the EDUs
db2nit> db2pd -edus
Database Partition 0 -- Active -- Up 15 days 00:10:44 -- Date 07/27/2013 14:03:44
List of all EDUs for database partition 0
db2sysc PID: 15204508
db2wdog PID: 5898334
db2acd PID: 11993336
EDU ID TID Kernel TID EDU Name USR (s) SYS (s)
========================================================================================================================================
12539 12539 28377129 db2med.15743.0 (NIT) 0 0.009949 3.527274
12282 12282 54460625 db2bm.15743.3 (NIT) 0 0.086214 0.069770
11769 11769 52297955 db2bm.15743.2 (NIT) 0 0.095598 0.079419
11512 11512 50921689 db2bm.15743.1 (NIT) 0 0.087625 0.071135
11255 11255 27918559 db2bm.15743.0 (NIT) 0 0.096776 0.090947
10998 10998 44695611 db2agent (instance) 0 2.666427 1.181999
10741 10741 18940019 db2agntdp (NIT ) 0 0.949897 0.434677
12010 12010 32702677 db2agntdp (NIT ) 0 0.045836 0.042250
3036 3036 32178401 db2agntdp (NIT ) 0 12.292270 5.721113
10398 10398 48496773 db2evmgi (DB2DETAILDEADLOCK) 0 0.034382 0.061275
10141 10141 43057229 db2fw7 (NIT) 0 0.021623 0.031903
9884 9884 12779697 db2fw6 (NIT) 0 0.021317 0.031419
9627 9627 53674083 db2fw5 (NIT) 0 0.019852 0.030094
9370 9370 43712753 db2fw4 (NIT) 0 0.020648 0.031635
9113 9113 19595459 db2fw3 (NIT) 0 0.026875 0.054205
8856 8856 31326355 db2fw2 (NIT) 0 0.023148 0.040782
8599 8599 18677861 db2fw1 (NIT) 0 0.023846 0.042452
3734 3734 14942213 db2fw0 (NIT) 0 0.022034 0.033279
3477 3477 18022487 db2lused (NIT) 0 3.086854 4.427099
3220 3220 20119721 db2wlmd (NIT) 0 0.025568 0.034810
2707 2707 13107337 db2taskd (NIT) 0 2.891258 4.199364
15743 15743 27721767 db2agent (NIT) 0 10.609634 5.167409
15486 15486 43909123 db2stmm (NIT) 0 4.798147 6.865923
8226 8226 11862151 db2hadrp (NIT) 0 57.869623 76.315442
7969 7969 20840581 db2pfchr (NIT) 0 0.095132 1.245918
7712 7712 19398811 db2pfchr (NIT) 0 0.109968 1.521396
7455 7455 53870787 db2pfchr (NIT) 0 0.132954 1.647029
7198 7198 11468949 db2pfchr (NIT) 0 0.308849 1.862162
6941 6941 22020127 db2pclnr (NIT) 0 0.001253 0.002788
6684 6684 37748757 db2pclnr (NIT) 0 0.001457 0.003638
6427 6427 21758149 db2pclnr (NIT) 0 0.001411 0.003269
6170 6170 19464325 db2pclnr (NIT) 0 0.001507 0.004287
5913 5913 54525969 db2pclnr (NIT) 0 0.001033 0.001735
5656 5656 13762703 db2pclnr (NIT) 0 0.000934 0.001045
5399 5399 28704993 db2pclnr (NIT) 0 0.001145 0.001963
5142 5142 12320979 db2dlock (NIT) 0 510.050841 2.735717
4885 4885 13959197 db2lfr (NIT) 0 0.022174 0.130779
4628 4628 13893729 db2loggw (NIT) 0 19.547848 29.913097
4371 4371 19726443 db2loggr (NIT) 0 20.021171 125.796453
4114 4114 30474269 db2logmgr (NIT) 0 4.097033 5.137900
3857 3857 33947759 db2logts (NIT) 0 12.317832 2.196172
2314 2314 34340895 db2resync 0 0.015151 0.049634
2057 2057 30015721 db2tcpcm 0 0.000042 0.000011
1800 1800 46923981 db2tcpcm 0 0.000051 0.000011
1543 1543 36569253 db2tcpcm 0 0.000062 0.000011
1286 1286 46334151 db2tcpcm 0 0.000097 0.000017
1029 1029 42795011 db2ipccm 0 3.755539 4.190704
772 772 38469701 db2licc 0 0.001021 0.010097
515 515 51052777 db2thcln 0 0.000725 0.000228
2 2 52166777 db2alarm 0 5.667825 2.984628
258 258 54394911 db2sysc 0 85.164899 128.548930
The EDUs with the corresponding EDU IDs appear in this list.
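In this listing the helper EDUs carry the ID of the coordinating agent inside their names (db2bm.15743.0 and so on), and a db2med.15743.0 EDU follows the same pattern. The sketch below extracts that relationship with a regular expression; the pattern and the function name helpers_by_coordinator are assumptions derived from this output only, not a documented naming contract.

```python
import re

# Rows copied from the db2pd -edus listing above.
SAMPLE = """\
12539 12539 28377129 db2med.15743.0 (NIT) 0 0.009949 3.527274
12282 12282 54460625 db2bm.15743.3 (NIT) 0 0.086214 0.069770
11769 11769 52297955 db2bm.15743.2 (NIT) 0 0.095598 0.079419
11512 11512 50921689 db2bm.15743.1 (NIT) 0 0.087625 0.071135
11255 11255 27918559 db2bm.15743.0 (NIT) 0 0.096776 0.090947
15743 15743 27721767 db2agent (NIT) 0 10.609634 5.167409
"""

# EDU ID, TID, kernel TID, then a name of the form <kind>.<coordinator>.<seq>
HELPER = re.compile(r"^(\d+)\s+\d+\s+\d+\s+(db2bm|db2med)\.(\d+)\.(\d+)\s")


def helpers_by_coordinator(text):
    """Map coordinator EDU ID -> sorted list of (kind, seq, helper EDU ID)."""
    result = {}
    for line in text.splitlines():
        match = HELPER.match(line)
        if not match:
            continue
        edu_id, kind, coordinator, seq = match.groups()
        result.setdefault(int(coordinator), []).append(
            (kind, int(seq), int(edu_id))
        )
    for helpers in result.values():
        helpers.sort()
    return result


helpers = helpers_by_coordinator(SAMPLE)
for kind, seq, edu_id in helpers[15743]:
    print(kind, seq, edu_id)
```

Run against the sample, this lists the four db2bm EDUs (11255, 11512, 11769, 12282) and the single db2med EDU (12539), all keyed to EDU 15743.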
4. Force the running online backup to stop
In CLP2:
db2 list applications show detail
CONNECT Auth Id Application Name Appl. Application Id Seq# Number of Coordinating DB Coordinator Status Status Change Time DB Name DB Path
Handle Agents partition number pid/thread
-------------------------------------------------------------------------------------------------------------------------------- -------------------- ---------- -------------------------------------------------------------- ----- ---------- ---------------- --------------- ------------------------------ -------------------------- -------- --------------------
DB2NIT db2fw4 38151 *LOCAL.DB2.130724075437 00001 1 0 9370 Connect Completed 07/24/2013 09:54:28.243810 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2taskd 38144 *LOCAL.DB2.130724075430 00001 1 0 2707 UOW Waiting 07/27/2013 14:03:44.799227 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw3 38150 *LOCAL.DB2.130724075436 00001 1 0 9113 Connect Completed 07/24/2013 09:54:28.242194 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2stmm 38143 *LOCAL.DB2.130724075429 00001 1 0 15486 UOW Waiting 07/27/2013 14:01:48.055214 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw2 38149 *LOCAL.DB2.130724075435 00001 1 0 8856 Connect Completed 07/24/2013 09:54:28.240569 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2evmg_DB2DETAILDEA 38155 *LOCAL.DB2.130724075441 00001 1 0 10398 Connect Completed 07/24/2013 09:54:28.354142 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw1 38148 *LOCAL.DB2.130724075434 00001 1 0 8599 Connect Completed 07/24/2013 09:54:28.238961 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw7 38154 *LOCAL.DB2.130724075440 00001 1 0 10141 Connect Completed 07/24/2013 09:54:28.248635 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2bp 43779 *LOCAL.db2nit.130727120328 00001 5 0 15743 Performing a Backup 07/27/2013 14:03:28.830789 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw0 38147 *LOCAL.DB2.130724075433 00001 1 0 3734 Connect Completed 07/24/2013 09:54:28.237365 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw6 38153 *LOCAL.DB2.130724075439 00001 1 0 9884 Connect Completed 07/24/2013 09:54:28.247030 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2lused 38146 *LOCAL.DB2.130724075432 00001 1 0 3477 UOW Waiting 07/27/2013 13:58:45.254383 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2fw5 38152 *LOCAL.DB2.130724075438 00001 1 0 9627 Connect Completed 07/24/2013 09:54:28.245419 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
DB2NIT db2wlmd 38145 *LOCAL.DB2.130724075431 00001 1 0 3220 Connect Completed 07/24/2013 09:54:28.233807 NIT /db2/NIT/db2nit/NODE0000/SQL00001/
db2 "force application (43779)"
DB20000I The FORCE APPLICATION command completed successfully.
DB21024I This command is asynchronous and may not be effective immediately.
In CLP1:
db2 backup database nit online to /db2/NIT/db2backup
SQL1224N The database manager is not able to accept new requests, has
terminated all requests in progress, or has terminated the specified request
because of an error or a forced interrupt. SQLSTATE=55032
The backup has failed.
ls
The failed backup produces no backup file (ls shows nothing).
5. Look at the event records in the db2diag file.
First, the database backup prepares to start:
2013-07-27-14.03.28.680300+120 E1822A471 LEVEL: Info
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetDegreeParallelism, probe:762
DATA #1 :
Autonomic BAR - using parallelism = 4.
2013-07-27-14.03.28.724577+120 E2294A506 LEVEL: Info
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetAvailableHeapPages, probe:876
DATA #1 :
Autonomic BAR - heap consumption.
Targetting (50%) - 4960 of 9920 pages.
2013-07-27-14.03.28.724764+120 E2801A497 LEVEL: Info
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubTuneBuffers, probe:1127
DATA #1 :
Autonomic backup - tuning enabled.
Using buffer size = 1233, number = 4.
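We can see that EDU 15743 (an EDU in its own right, a db2agent) first does some preparation work. Once that is done it starts the backup and wakes other EDUs to do the work, and from then on it is largely idle; this matches the db2pd -edus output above.
Note that the log states the degree of parallelism is 4 (using parallelism = 4.).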
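The three tuning records fit together arithmetically. As a small worked check, under the assumption that both the heap figure and the buffer size are counted in 4 KB pages (DB2's usual unit for these values), the snippet below confirms that 4 buffers of 1233 pages stay just under the 50% target of 4960 pages. Reading this as "the buffers were sized to fit the target" is my interpretation of the log, not something the log states.

```python
# Values taken from the three db2diag records above.
heap_pages = 9920        # "4960 of 9920 pages"
target_percent = 50      # "Targetting (50%)"
buffer_pages = 1233      # "Using buffer size = 1233"
buffer_count = 4         # "number = 4"
page_bytes = 4096        # assumption: sizes are in 4 KB pages

target_pages = heap_pages * target_percent // 100
used_pages = buffer_pages * buffer_count
used_bytes = used_pages * page_bytes

print(target_pages)                 # 4960
print(used_pages)                   # 4932
print(used_pages <= target_pages)   # True
print(used_bytes)                   # 20201472
```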
Next, the database backup itself starts:
2013-07-27-14.03.28.850786+120 E3299A443 LEVEL: Info
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:1604
MESSAGE : Starting an online db backup.
This is EDU 15743 starting the actual backup. The EDUs it wakes can be seen as workers it has hired, while it becomes the foreman, as shown below:
2013-07-27-14.03.28.977211+120 E3743A516 LEVEL: Info
PID : 15204508 TID : 11255 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11255 EDUNAME: db2bm.15743.0 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1221
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
18
DATA #2 : String, 10 bytes
NIT#ES701D
2013-07-27-14.03.28.980869+120 E4260A516 LEVEL: Info
PID : 15204508 TID : 11512 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11512 EDUNAME: db2bm.15743.1 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1221
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
19
DATA #2 : String, 10 bytes
NIT#ES701I
2013-07-27-14.03.28.981929+120 E4777A514 LEVEL: Info
PID : 15204508 TID : 11769 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11769 EDUNAME: db2bm.15743.2 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1221
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
12
DATA #2 : String, 9 bytes
NIT#STABD
2013-07-27-14.03.28.982959+120 E5292A513 LEVEL: Info
PID : 15204508 TID : 12282 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 12282 EDUNAME: db2bm.15743.3 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1221
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
6
DATA #2 : String, 9 bytes
NIT#DDICD
…………………...
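Look at the EDUID and EDUNAME fields in the records above: these EDUs were all woken by EDU 15743, which is why they are named db2bm.15743.x (NIT) 0, where x is the sequence number of each EDU (here 0, 1, 2 and 3). This also matches the EDU names shown by db2pd -edus (db2bm.15743.x (NIT) 0, with x as above).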
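To pull this mapping out of a longer db2diag file, something like the following sketch could be used. It splits the text into records at timestamp lines and, for each "Backing up tablespace" record, pairs the EDUNAME with the string that follows "DATA #2". The sample is a trimmed copy of two records above; the function names and the parsing rules are assumptions based on this excerpt, not a general db2diag parser.

```python
import re

# Two records from above, trimmed to the lines the parser uses.
SAMPLE = """\
2013-07-27-14.03.28.977211+120 E3743A516 LEVEL: Info
EDUID : 11255 EDUNAME: db2bm.15743.0 (NIT) 0
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
18
DATA #2 : String, 10 bytes
NIT#ES701D
2013-07-27-14.03.28.980869+120 E4260A516 LEVEL: Info
EDUID : 11512 EDUNAME: db2bm.15743.1 (NIT) 0
MESSAGE : Backing up tablespace
DATA #1 : Pool ID, PD_TYPE_POOL_ID, 2 bytes
19
DATA #2 : String, 10 bytes
NIT#ES701I
"""

RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}")
EDU_LINE = re.compile(r"^EDUID\s*:\s*(\d+)\s+EDUNAME:\s*(\S+)")


def split_records(text):
    """Split db2diag text into records; each record is a list of lines."""
    records, current = [], []
    for line in text.splitlines():
        if RECORD_START.match(line) and current:
            records.append(current)
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        records.append(current)
    return records


def tablespaces_by_edu(text):
    """Map EDUNAME -> tablespace names from 'Backing up tablespace' records."""
    mapping = {}
    for record in split_records(text):
        if "MESSAGE : Backing up tablespace" not in record:
            continue
        edu_name, tablespace = None, None
        for i, line in enumerate(record):
            match = EDU_LINE.match(line)
            if match:
                edu_name = match.group(2)
            if line.startswith("DATA #2 : String") and i + 1 < len(record):
                tablespace = record[i + 1]
        if edu_name and tablespace:
            mapping.setdefault(edu_name, []).append(tablespace)
    return mapping


mapping = tablespaces_by_edu(SAMPLE)
print(mapping)
```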
After running for a while, some tablespaces finish backing up, and the following is logged:
………………………………………….
2013-07-27-14.04.17.999657+120 E20236A430 LEVEL: Info
PID : 15204508 TID : 11512 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11512 EDUNAME: db2bm.15743.1 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1480
MESSAGE : Finished tablespaces
2013-07-27-14.04.18.016114+120 E20667A430 LEVEL: Info
PID : 15204508 TID : 12282 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 12282 EDUNAME: db2bm.15743.3 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1480
MESSAGE : Finished tablespaces
2013-07-27-14.04.18.027257+120 E21098A430 LEVEL: Info
PID : 15204508 TID : 11769 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11769 EDUNAME: db2bm.15743.2 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1480
MESSAGE : Finished tablespaces
2013-07-27-14.05.17.991725+120 E21529A430 LEVEL: Info
PID : 15204508 TID : 11255 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 11255 EDUNAME: db2bm.15743.0 (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubBMCont, probe:1480
MESSAGE : Finished tablespaces
………………………………………..
The EDUs doing the backup work, however, are still the same 4 that were woken earlier.
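The timestamps give a rough idea of how long each db2bm EDU was busy. The sketch below parses the db2diag timestamp format (the trailing +120 is the offset from UTC in minutes) and computes, per EDU, the interval between its "Backing up tablespace" record shown earlier and its "Finished tablespaces" record. Records elided in the excerpt are not accounted for, and parse_diag_ts is a helper name of my own.

```python
from datetime import datetime, timedelta, timezone


def parse_diag_ts(stamp):
    """Parse a stamp like '2013-07-27-14.03.28.977211+120'."""
    # The first 26 characters are the local date and time.
    local = datetime.strptime(stamp[:26], "%Y-%m-%d-%H.%M.%S.%f")
    # The remainder is the signed UTC offset in minutes.
    offset = timezone(timedelta(minutes=int(stamp[26:])))
    return local.replace(tzinfo=offset)


# (first "Backing up tablespace", "Finished tablespaces") per EDU,
# copied from the records above.
EVENTS = {
    "db2bm.15743.0": ("2013-07-27-14.03.28.977211+120",
                      "2013-07-27-14.05.17.991725+120"),
    "db2bm.15743.1": ("2013-07-27-14.03.28.980869+120",
                      "2013-07-27-14.04.17.999657+120"),
    "db2bm.15743.2": ("2013-07-27-14.03.28.981929+120",
                      "2013-07-27-14.04.18.027257+120"),
    "db2bm.15743.3": ("2013-07-27-14.03.28.982959+120",
                      "2013-07-27-14.04.18.016114+120"),
}

durations = {
    name: parse_diag_ts(end) - parse_diag_ts(start)
    for name, (start, end) in EVENTS.items()
}

for name in sorted(durations):
    print(name, durations[name].total_seconds())
```

On these records db2bm.15743.0 comes out at about 109 seconds and the other three at about 49 seconds each.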
Later, we manually issued the force command from the client side; at that point db2diag contains the following records:
…………………..
2013-07-27-14.05.18.074037+120 E21960A542 LEVEL: Severe
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubPollMsg, probe:160
DATA #1 : Sqlcode, PD_TYPE_SQLCODE, 4 bytes
-1224
DATA #2 : Hexdump, 4 bytes
0x0A00020058623190 : FFFF FB38 ...8
2013-07-27-14.05.18.074210+120 E22503A1038 LEVEL: Severe
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubPollMsg, probe:160
MESSAGE : SQL1224N The database manager is not able to accept new requests,
has terminated all requests in progress, or has terminated the
specified request because of an error or a forced interrupt.
DATA #1 : SQLCA, PD_DB2_TYPE_SQLCA, 136 bytes
sqlcaid : SQLCA sqlcabc: 136 sqlcode: -1224 sqlerrml: 0
sqlerrmc:
sqlerrp : sqlubPol
sqlerrd : (1) 0x00000000 (2) 0x00000000 (3) 0x00000000
(4) 0x00000000 (5) 0x00000000 (6) 0x00000000
sqlwarn : (1) (2) (3) (4) (5) (6)
(7) (8) (9) (10) (11)
sqlstate:
…………………………..
2013-07-27-14.05.19.866952+120 E25855A421 LEVEL: Severe
PID : 15204508 TID : 15743 PROC : db2sysc 0
INSTANCE: db2nit NODE : 000 DB : NIT
APPHDL : 0-43779 APPID: *LOCAL.db2nit.130727120328
AUTHID : DB2NIT
EDUID : 15743 EDUNAME: db2agent (NIT) 0
FUNCTION: DB2 UDB, database utilities, sqlubcka, probe:911
MESSAGE : Backup terminated.
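The first to respond to the force command is the foreman, 15743, the EDU that woke the others to work, and it is also the one that finally reports that the backup was cancelled.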
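The Hexdump in the first Severe record (FFFF FB38) is the same SQLCODE in raw form. A quick check, assuming the four bytes are a signed 32-bit integer in big-endian order (the byte order is my assumption about this platform):

```python
def decode_sqlcode(hex_words, byteorder="big"):
    """Interpret a db2diag hexdump such as 'FFFF FB38' as a signed integer."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


sqlcode = decode_sqlcode("FFFF FB38")
print(sqlcode)  # -1224
```

This agrees with the "DATA #1 : Sqlcode" value of -1224 and with the SQL1224N message returned to CLP1.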
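From this we can see that, during an online backup, the relationship between the backup application and the EDUs serving it is, in plain terms, much like the following:
The application is like a manager who receives a project called "backup" (APPHDL : 0-43779). The manager finds a team leader, EDU 15743, hands the backup project to this team leader, and lets the team leader organize the team.
The team leader (15743) first prepares the project environment, then brings in 4 team members (the 4 db2bm EDUs) to form the project team and do the actual backup work, while the team leader takes care of management.
Everyone belongs to the same project under the same manager (application 43779), but the team members report to the team leader, and the team leader reports to the manager.
When the actual backup work is done, the team members say "Finished tablespaces"; when the force is detected, the team leader says "Backup terminated." Then everyone disbands (returns to the memory pool), but the manager is still around (in the agent pool, unless the agent is destroyed as well) and goes on to take the next project...
(This really is just like a real-world IT project...)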
The cast is as follows:
Manager: [application 43779]
Project name: [APPHDL : 0-43779]
Team leader: [EDUID : 15743 EDUNAME: db2agent (NIT) 0]
Team members:
[EDUID : 11255 EDUNAME: db2bm.15743.0 (NIT) 0]
[EDUID : 11512 EDUNAME: db2bm.15743.1 (NIT) 0]
[EDUID : 11769 EDUNAME: db2bm.15743.2 (NIT) 0]
[EDUID : 12282 EDUNAME: db2bm.15743.3 (NIT) 0]
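Conclusions:
1. An online backup starts a new, separate application to do the backup work.
2. This application brings in several EDUs to carry out the backup. In this run the parallelism was 4 and the application had 5 agents (four db2bm EDUs plus the coordinating db2agent), which suggests parallelism + 1; this needs further verification.
3. One of the EDUs brought in by the application acts as the managing EDU. After setting up the basic environment the backup needs, it wakes a number of EDUs that take orders from it (db2bm) to perform the backup, while it handles management and coordination, responds to the application's requests, and returns the status of the EDUs below it.
4. If the backup fails, no final backup file is produced.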