无聊之人--除了技术,还是技术,你懂得
分类: DB2/Informix
2016-02-21 16:39:01
Why Active/Active ? - Business RequirementsSENARIO OF TWO NODE QREPObjects needed in WebSphere MQOBJECTS IN QPREPARE ACTIONQ CONTROL TABLES -PARMTABLESCreate Q MapsCreate Q subscriptionsQ HOUSE-KEEPINT WORK ACTIONCHANLE ACTION:MQ QUEUE ACTIONQAPPLYLatency analysisLATENCY OUTLOOKCAPTURE LATENCY ANALYSISQ Apply latencyQREP RECOVERY
现在IDC大行其道,同城/异地双活甚至多活的概念已经日益普及。12年的时候,写过一篇mysql中关于MASTER/SLAVE的实现mysql replication---master/salve IN ONE PC 。针对不同的DBMS,双活具体的实现不同,但都基于相同的理论或是基于SQL REPLICATION,或是基于LOG REPLICATION。这里介绍一下IBM中关于A/A的实现,关于Q的架构以及实现后面专门抽出一章来进行介绍,这里是实践贴,使你对Q有一个感性认识。关于本文最后的Q latency部分,也是每一个系统运维人员都必须要掌握的,因为它对于RPO/RTO的重要性不言而喻。
Replication Technologies for Business Continuity
Requirements and trade-offs to consider in selecting technologies:
What needs to be recovered?: Application Data vs. DBMS vs. Entire Systems How long does it take? Recovery Time Objective (RTO): One hours or more vs. few seconds How much data could you lose? Recovery Point Objective (RPO): No data loss vs. seconds of data Distance required between sites?: 10s of kilometers vs. 100s of kilometers Hardware Utilization: Standby vs. Active Impact on applications: Direct overhead (synchronous technologies) vs. no impact (async technologies) CPU Overhead: Negligible (hardware e.g., PPRC) vs. Proportional to the workload (transaction replay technology)
OBJECT DEFINE DESCRIPTION
OBJECT NAMES | NODE1 | NODE2 |
---|---|---|
HOSTNAME | SITEA | SITEB |
IP | 10.1.1.1 | 10.2.1.1 |
DB2 VERSION | 10 | 11 |
SUBSYSTEM | DB1 | DB2 |
LOCATION | DB0A | DB0B |
DATABASE PORT | 8000 | 9000 |
DATABASE | DB1 | DB2 |
Q MGR | QMGR1 | QMGR2 |
MQ PROT | 8001 | 9001 |
RESTARTQ | MQ1.RESTARTQ | MQ2.RESTARTQ |
ADMINQ | MQ1.ADMINQ | MQ2.ADMINQ |
SENDQ | MQ1.SENDQ.Q1 | MQ2.SENDQ.Q1 |
SENDQ | MQ1.SENDQ.Q2 | MQ2.SENDQ.Q2 |
SENDQ | MQ1.SENDQ.Q3 | MQ2.SENDQ.Q3 |
REP SCHEMA | ASN1 | ASN2 |
BIND REPLICATION PROGRAM
APF AUTHORIZE Q-RELATED PROGRAM
ENABLE Q TABEL DATA CAPTURES
ALTER TABLE QTABLE DATA CAPTURE CHANGES;
CONFIG CDB TABLE
远程DB2连接使用DRDA协议,在使用之前需要配置对应的CDB信息。CDB 主要是Qapply用来连接Qcapture的属性配置,第一条SQLP配置了本端的连接信息,后面3条SQL配置了,对端的连接信息。
NODE1:
INSERT INTO SYSIBM.LOCATIONS(LOCATION, LINKNAME, PORT) VALUES ('DB0A', 'DB0A', ''); INSERT INTO SYSIBM.IPNAMES(LINKNAME, SECURITY_OUT, USERNAMES, IPADDR) VALUES ('DB0B', 'P', 'O', 'demo.TEST.com'); INSERT INTO SYSIBM.USERNAMES(TYPE, LINKNAME, NEWAUTHID, PASSWORD) VALUES ('O', 'DB0B', 'TESTUSER', 'NOTTELLYOU'); INSERT INTO SYSIBM.LOCATIONS(LOCATION, LINKNAME, PORT) VALUES ('DB0B', 'DB0B', '9000');
NODE2:
INSERT INTO SYSIBM.LOCATIONS(LOCATION, LINKNAME, PORT) VALUES ('DB0B', 'DB0B', ''); INSERT INTO SYSIBM.IPNAMES(LINKNAME, SECURITY_OUT, USERNAMES, IPADDR) VALUES ('DB0A', 'P', 'O', 'DEMO.TEST.com'); INSERT INTO SYSIBM.USERNAMES(TYPE, LINKNAME, NEWAUTHID, PASSWORD) VALUES ('O', 'DB0A', 'BRIDDEL', 'XXXXXXXX'); INSERT INTO SYSIBM.LOCATIONS(LOCATION, LINKNAME, PORT) VALUES ('DB0A', 'DB0A', '8000');
定义MQ传输使用的各种对象。
DEFPSIST
YES means that unless instructed otherwise, the Queue Manager logs transactions to these queues and can recover those messages in the event of failure or restart.
All control tables that are located on a node have to have the same schema.both the Q Capture and Q Apply on NODE 1 will use the replication schema ASN1 AND CONTROL TABLE ON NODE USE ASN2.
这里配置Q脚本是通过 ASNCLP配置的。它既支持交互方式,也支持批量方式。
举例说明
SET QMANAGER "MQ1A" FOR NODE 1; SET QMANAGER "MQ1B" FOR NODE 2; CREATE CONTROL TABLES FOR NODE 1 USING CAPPARMS RESTARTQ "MQ1.RESTARTQ" ADMINQ "MQ1.ADMINQ" MONITOR INTERVAL 10000 APPPARMS IN ZOS PAGE LOCK DB DEMODB QCNTLAP CREATE ROW LOCK DB DEMODB2 QCNTLAR CREATE MONITOR INTERVAL 10000; CREATE CONTROL TABLES FOR NODE 2 USING CAPPARMS RESTARTQ "MQ2.RESTARTQ" ADMINQ "MQ2.ADMINQ" MONITOR INTERVAL 10000 APPPARMS IN ZOS PAGE LOCK DB RBBDEMO QCNTLAP CREATE ROW LOCK DB RBBDEMO QCNTLAR CREATE MONITOR INTERVAL 10000;
Two Q Maps were used for the first part of this exercise. Even though a single Q
Map can efficiently contain hundreds of Q Subscriptions, you might want to
separate your subscriptions into multiple Q Maps for administrative purposes.
这里即确定了表的订阅关系。
CREATE REPLQMAP BIDIND1_TO_BIDIND2_MAP1 ( NODE 1, NODE 2 ) USING ADMINQ "BIDIND1.ADMINQ" RECVQ "BIDIND1.TO.BIDIND2.DATAQ1" SENDQ "BIDIND1.TO.BIDIND2.DATAQ1" NUM APPLY AGENTS 2; CREATE REPLQMAP BIDIND1_TO_BIDIND2_MAP2 ( NODE 1, NODE 2 ) USING ADMINQ "BIDIND1.ADMINQ" RECVQ "BIDIND1.TO.BIDIND2.DATAQ2" SENDQ "BIDIND1.TO.BIDIND2.DATAQ2" NUM APPLY AGENTS 2;
定义一张表属于哪一个Qmap,以及是否进行Q复制。
# # Use MAP1 for TABLE1 and TABLE2 # SET CONNECTION SOURCE DB1A.ASNB1 TARGET DB1C.ASNB2 REPLQMAP BIDIND1_TO_BIDIND2_MAP1 ; SET TABLES (DB1.ASN1.TABLE1); CREATE QSUB SUBTYPE U FROM NODE DB1A.ASNB1 SOURCE HAS LOAD PHASE N TARGET CONFLICT ACTION I FROM NODE DB1C.ASNB2 SOURCE HAS LOAD PHASE N TARGET CONFLICT ACTION F;
START CHANNEL(MQ1A.TO.MQ1B)
DISPLAY CHSTATUS(MQ1BA.TO.MQ1B)
START CHANNEL(MQ1B.TO.MQ1A)
DISPLAY CHSTATUS(MQ1B.TO.MQ1A)
+MQ2 CLEAR QLOCA(MQ2.SENDQ.Q1) DISPLAY QLOCAL(MQ2.SENDQ.Q1) CURDEPTH
关于Q latency它与我们搭建Q的目标息息相关,特别是对IDC RPT/ROT 两个指标的影响巨大。因此分析Q lantency变成了DBA一项必须掌握的技能。要想进行latency分析,你必须明确latency是如何定义的,现有的工具以及如何进行调整。
Control tables for replication monitoring
? IBMQREP_CAPMON
? IBMQREP_CAPQMON
? IBMQREP_APPLYMON
上图中每一个数字对应的动作如下:
t1 = transaction committed at the source site
t2 = the Q Capture program read the transaction from the log
t3 = the Q Capture program committed the message to the send queue
t4 = the Q Apply program read the message from the receive queue
t5 = the Q Apply successfully committed the transaction at the target site
对应的Q END-TO-END latency=T5-T1
即一个交易在目标端与源端commit之间的时间间隔。影响Q latency的因素一个有10个,具体的分布如下:
需要说明的是背景为蓝色的影响因素对Q 的配置以及SQL语句关联较大。而白色背景部分的影响相对可控,前提是MQ,DBMS WORK WELL.
下面给出上述10个影响因素的定义,对应的monitor信息,以及如何调优。
seq | Factor impacting latency | Relevant statistics |
---|---|---|
1 | Q Capture log latency | IBMQREP_CAPMON:CURRENT_LOG_TIME |
2 | Q Capture time reading DB2 log records (DB2 IFI calls) | IBMQREP_CAPMON:LOGREAD_API_TIME |
3 | Q Capture time waiting for memory | IBMQREP_CAPMON:LOGRDR_SLEEPTIME,NUM_END_OF_LOGS |
4 | Q Capture time to put WebSphere MQ messages in a send queue and Q Capture time to commit all WebSphere MQ transactions | IBMQREP_CAPQMON: MQPUT_TIME, QFULL_ERROR_COUNT IBMQREP_CAPMON:MQCMIT_TIME |
5 | WebSphere MQ time sending and staging WebSphere MQ messages | not saved in monitor tables |
6 | Q Apply time to retrieve WebSphere MQ messages from a receive queue | IBMQREP_APPLYMON:MQGET_TIME |
7 | Q Apply time waiting due to transaction dependencies | IBMQREP_APPLYMON:DEPENDENCY_DELAY |
8 | Q Apply waiting for agents to pick up DBMS transactions ready to be applied | IBMQREP_APPLYMON:WORKQ_WAIT_TIME |
9 | Q Apply time retrying SQL due to RI, unique violation,or deadlock | IBMQREP_APPLYMON:RETRY_TIME |
10 | Q Apply time in DB2 processing successful transactions | IBMQREP_APPLYMON:DBMS_TIME |
LOGREAD_API_TIME:IBMQREP_CAPMON The time spent in DB2 returning log records to Q Capture NUM_END_OF_LOGS:IBMQREP_CAPMON table, lists the number of times that the Q Capture program reached the end of the DB2 log LOGRDR_SLEEPTIME:IBMQREP_CAPMON LOGREADER SLEEP TIME FOR NUM_END_OF+_LOGS OR it reached its memory limit threshold MQPUT_TIME:IBMQREP_CAPMONThe time delay spent in WebSphere MQ to put new messages in the send queue MQCMIT_TIME: IBMQREP_CAPMON The time delay spent in WebSphere MQ to commit the MQ transactions XMITQDEPTH : IBMQREP_CAPQMON The transmit queue depth valu If the transmit queue depth keeps growing and there is a lot of I/O to the physical pageset of the transmit queue, this means that the capacity of the MQ channel has been exceeded
TRAN_BATCH_SZIE
平均每一个交易的大小 = MQ_BYTES/TRANS_PUBLISHED
平均每一个消息所包含的交易数目=MQ_MESSAGES /TRANS_PUBLISHED,该值应该近似等于TRAN_BATCH_SIZE
WORKQ_WAIT_TIME:IBMQREP_APPLYMON table tracks how long it takes transactions that are ready to be applied at the target site to be picked up by a Q Apply agent RETRY_TIME :IBMQREP_APPLYMON Q Apply is retrying some of the SQL statements included in the replicated transactions,SUCH RI,UI,TIMEOUT,DEADLOCK The DBMS_TIME :IBMQREP_APPLYMON tracks the average time spent in DB2 applying the transactions for a given Q Apply monitor interval. The transactions include user tables (replicated tables) and Q Replication tables (control table data that tracks the Q Apply progress)
STARTLRSN: lsn value from where it is safe to start reading the DB2 log
commit lsn: lsn value up to which it is safe to skip committed transactions as Q Capture is reading the log records, because the transactions were previously published.
本文所有的内容均整理自互联网,仅供参考学习,如有涉及版权问题,请自行删除本文,谢谢。