Among the many PostgreSQL HA solutions, streaming-replication-based HA offers a good balance of performance, reliability, and deployment cost, and it is the most widely adopted approach today. Among the tools for managing streaming replication clusters, Pacemaker+Corosync is comparatively mature and reliable.
However, building a PostgreSQL streaming replication HA cluster natively with Pacemaker+Corosync means complex configuration, a large number of parameters, and poor usability, especially for users with little Pacemaker experience. So, on top of the Pacemaker+Corosync streaming replication HA solution, I simplified the cluster configuration and wrapped the common cluster operations into commands, with the goal of making the cluster easier to deploy and operate. I also enhanced the pgsql RA from Resource Agents 3.9.7 by introducing a distributed lock service, which prevents split-brain in two-node clusters and ensures that no data is lost after a failover under synchronous replication.
The packaged and modified code is available from the pha4pgsql repository (https://github.com/Chenhuajun/pha4pgsql); detailed instructions follow.
The installation procedure is illustrated below by deploying a two-node HA cluster as an example.
Run on all nodes:
setenforce 0
sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
systemctl disable firewalld.service
systemctl stop firewalld.service
iptables -F
Run on all nodes:
yum install -y pacemaker pcs psmisc policycoreutils-python
Note: if the Pacemaker shipped with the OS is old, downloading a newer version is recommended. I hit quite a few bugs on Pacemaker 1.1.7, so that version and older ones are best avoided.
Run on all nodes:
systemctl start corosync.service
systemctl enable corosync.service
systemctl start pacemaker.service
systemctl enable pacemaker.service
systemctl start pcsd.service
systemctl enable pcsd.service
Run on all nodes:
echo hacluster | passwd hacluster --stdin
Run on any one node:
pcs cluster auth -u hacluster -p hacluster node1 node2
Run on any one node:
pcs cluster setup --last_man_standing=1 --name pgcluster node1 node2
Run on any one node:
pcs cluster start --all
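Optionally, verify that both nodes have joined the cluster before continuing; the standard pcs status commands work here (exact output varies with your pcs version):

pcs status corosync
pcs status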
Run on all nodes:
yum install postgresql-server
The PostgreSQL shipped with the OS is often quite old; installing the latest PostgreSQL release is recommended instead.
Run on node1:
Create the data directory
mkdir -p /data/postgresql/data
chown -R postgres:postgres /data/postgresql/
chmod 0700 /data/postgresql/data
Initialize the database
su - postgres
initdb -D /data/postgresql/data/
Edit postgresql.conf
listen_addresses = '*'
wal_level = hot_standby
synchronous_commit = on
max_wal_senders = 5
wal_keep_segments = 32
hot_standby = on
replication_timeout = 5000
wal_receiver_status_interval = 2
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
restart_after_crash = off
hot_standby_feedback = on
Note: on PostgreSQL 9.3 and later, replace replication_timeout with wal_sender_timeout; on PostgreSQL 9.5 and later, you can also add "wal_log_hints = on" so that pg_rewind can be used to repair the old Master.
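For example, on PostgreSQL 9.5 the two settings from the note above would look like this in postgresql.conf:

wal_sender_timeout = 5000
wal_log_hints = on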
Edit pg_hba.conf
local   all             all                             trust
host    all             all     192.168.41.0/24         md5
host    replication     all     192.168.41.0/24         md5
Start PostgreSQL
pg_ctl -D /data/postgresql/data/ start
Create the replication user
createuser --login --replication replication -P
On 9.5 and later, if pg_rewind support is needed, add the "-s" (superuser) option:
createuser --login --replication replication -P -s
Run on node2:
Create the data directory
mkdir -p /data/postgresql/data
chown -R postgres:postgres /data/postgresql/
chmod 0700 /data/postgresql/data
Create a base backup
su - postgres
pg_basebackup -h node1 -U replication -D /data/postgresql/data/ -X stream -P
Run on node1:
pg_ctl -D /data/postgresql/data/ stop
The distributed lock service exists to prevent split-brain in a two-node cluster. When a network failure partitions the cluster, the standby may be promoted to Master while, at the same time, the old Master switches its synchronous replication to asynchronous; that combination can lose data. The distributed lock service guarantees that of these two actions (promoting the new Master and switching the old Master to asynchronous replication), only one can succeed.
The distributed lock service is implemented on another PostgreSQL instance outside the HA cluster. The lock table must be created there in advance:
create table if not exists distlock(
    lockname text primary key,
    owner text not null,
    ts timestamptz not null,
    expired_time interval not null
);
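For reference, the sketch below shows one way a node could atomically grab or renew a lock in this table. It assumes the lock database is PostgreSQL 9.5 or later (for ON CONFLICT); the lock name, owner, and lease length are example values, and the bundled tools/distlock may implement the check differently.

-- node1 tries to acquire or renew lock pgsql_cls1 with a 10-second lease.
-- The update only wins when node1 already owns the lock or the previous
-- owner's lease has expired; an affected row count of zero means the
-- lock was not acquired.
insert into distlock(lockname, owner, ts, expired_time)
values ('pgsql_cls1', 'node1', now(), interval '10 seconds')
on conflict (lockname) do update
    set owner = 'node1', ts = now()
    where distlock.owner = 'node1'
       or distlock.ts + distlock.expired_time < now();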
Optionally, you can also create a lock history table; every change of a lock's owner (that is, every master/slave role switch) is then recorded in the history table (distlock_history).
create table if not exists distlock_history(
    id serial primary key,
    lockname text not null,
    owner text not null,
    ts timestamptz not null,
    expired_time interval not null
);

CREATE OR REPLACE FUNCTION distlock_log_update() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' or NEW.owner <> OLD.owner THEN
        INSERT INTO distlock_history(lockname, owner, ts, expired_time)
            values(NEW.lockname, NEW.owner, NEW.ts, NEW.expired_time);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS distlock_log_update ON distlock;

CREATE TRIGGER distlock_log_update AFTER INSERT OR UPDATE ON distlock
    FOR EACH ROW EXECUTE PROCEDURE distlock_log_update();
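With the history table in place, past owner changes (that is, failovers) can be reviewed with a simple query, for example:

select owner, ts from distlock_history where lockname = 'pgsql_cls1' order by ts;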
Run on any one node:
Download pha4pgsql
git clone git://github.com/Chenhuajun/pha4pgsql.git
Edit config.ini
cluster_type=dual
OCF_ROOT=/usr/lib/ocf
RESOURCE_LIST="msPostgresql vip-master vip-slave"
pha4pgsql_dir=/opt/pha4pgsql
writer_vip=192.168.41.136
reader_vip=192.168.41.137
node1=node1
node2=node2
vip_nic=eno33554984
vip_cidr_netmask=24
pgsql_pgctl=/usr/bin/pg_ctl
pgsql_psql=/usr/bin/psql
pgsql_pgdata=/data/postgresql/data
pgsql_restore_command=""
pgsql_rep_mode=sync
pgsql_repuser=replication
pgsql_reppassord=replication
pgsql_enable_distlock=true
pgsql_distlock_psql_cmd='/usr/bin/psql \\"host=node3 port=5439 dbname=postgres user=postgres connect_timeout=5\\"'
pgsql_distlock_lockname=pgsql_cls1
Adjust the parameters above to match your environment. When several clusters share one lock service, make sure each cluster's pgsql_distlock_lockname is unique.
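It is also worth confirming that the lock database configured in pgsql_distlock_psql_cmd is reachable from both nodes (node3:5439 is just this example's lock server):

/usr/bin/psql "host=node3 port=5439 dbname=postgres user=postgres connect_timeout=5" -c "select * from distlock;"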
Install pha4pgsql
sh install.sh
./setup.sh
Note that the installation only needs to be performed on one node.
Set the environment variable
export PATH=/opt/pha4pgsql/bin:$PATH
Start the cluster
cls_start
Confirm the cluster status
cls_status
Example output of cls_status:
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 02:01:01 2016
Last change: Fri Apr 22 02:01:00 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000070000D0
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : STREAMING|ASYNC
    + pgsql-status                      : HS:async

Migration summary:
* Node node2:
* Node node1:

pgsql_REPL_INFO:node1|1|00000000070000D0
Check the health of the cluster: in a fully healthy cluster both nodes are online, one node runs as Master (pgsql-status PRI), and the other streams from it as a Slave (pgsql-status HS:sync or HS:async).
The possible values of pgsql-data-status are explained in detail below:
The transitional state of the data is displayed. This state remains after stopping Pacemaker, and when Pacemaker next starts, it is used to judge whether the node's data is stale or not.

DISCONNECT
    The Master changes another node's state to DISCONNECT when it cannot detect a replication connection, because of a LAN failure, a breakdown of the Slave, and so on.

{state}|{sync_state}
    The Master changes another node's state to {state}|{sync_state} when it detects a replication connection. {state} and {sync_state} are the replication states retrieved with "select state, sync_state from pg_stat_replication" on the Master. For example, INIT, CATCHUP, and STREAMING appear in {state}, and ASYNC and SYNC appear in {sync_state}.

LATEST
    Displayed on the node that is Master.

These values describe the transitional state of the final data and may not be consistent with the state of the actual data. For instance, while a node is PRI its state is "LATEST"; if the node is then stopped or goes down, "LATEST" is kept as long as no other node is Master. A node never changes its own state to "DISCONNECT". When another node is newly promoted, the new Master changes the old Master's state to "DISCONNECT". When no node can become Master, "LATEST" is kept.
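The {state}|{sync_state} pair can be inspected directly on the Master with the query the description above refers to (run as the postgres user):

psql -c "select state, sync_state from pg_stat_replication;"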
Forcibly kill the postgres processes on the Master
[root@node1 pha4pgsql]# killall postgres
Check the cluster status
Because migration-threshold="3" is set, after a single ordinary failure Pacemaker restarts the postgres processes in place and no master/slave switchover occurs. (If the Master's machine or network fails, failover happens immediately instead.)
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 02:03:17 2016
Last change: Fri Apr 22 02:03:10 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000007000250
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node node2:
* Node node1:
   pgsql: migration-threshold=3 fail-count=1 last-failure='Mon Apr 18 09:14:28 2016'

Failed actions:
    pgsql_monitor_3000 on node1 'unknown error' (1): call=205, status=complete, exit-reason='none', last-rc-change='Fri Apr 22 02:02:56 2016', queued=0ms, exec=0ms

pgsql_REPL_INFO:node1|1|00000000070000D0
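The fail count can also be queried on its own; pcs provides a failcount subcommand for this (availability depends on your pcs version):

pcs resource failcount show pgsql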
Forcibly kill the postgres processes on the Master twice more, then check the cluster status again.
By now a failover has taken place: a new Master has been elected, and the timeline has been advanced.
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 02:07:33 2016
Last change: Fri Apr 22 02:07:31 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node2
 vip-slave      (ocf::heartbeat:IPaddr2):       Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Stopped: [ node1 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : DISCONNECT
    + pgsql-status                      : STOP
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000007000410
    + pgsql-status                      : PRI

Migration summary:
* Node node2:
* Node node1:
   pgsql: migration-threshold=3 fail-count=3 last-failure='Mon Apr 18 09:18:58 2016'

Failed actions:
    pgsql_monitor_3000 on node1 'not running' (7): call=237, status=complete, exit-reason='none', last-rc-change='Fri Apr 22 02:07:26 2016', queued=0ms, exec=0ms

pgsql_REPL_INFO:node2|2|0000000007000410
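The timeline bump can be confirmed on the new Master with pg_controldata, using this example's data directory (run as the postgres user):

su - postgres
pg_controldata /data/postgresql/data | grep TimeLineID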
Repair the old Master
Repair the old Master with pg_basebackup
[root@node1 pha4pgsql]# rm -rf /data/postgresql/data/*
[root@node1 pha4pgsql]# cls_rebuild_slave
22636/22636 kB (100%), 1/1 tablespace
All resources/stonith devices successfully cleaned up
wait for recovery complete
.....
slave recovery of node1 successed
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 02:40:48 2016
Last change: Fri Apr 22 02:40:36 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node2
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Slaves: [ node1 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000007000410
    + pgsql-status                      : PRI

Migration summary:
* Node node2:
* Node node1:

pgsql_REPL_INFO:node2|2|0000000007000410
On 9.5 and later, the old Master can also be repaired with pg_rewind:
[root@node1 pha4pgsql]# cls_repair_slave
connected to server
servers diverged at WAL position 0/7000410 on timeline 2
rewinding from last common checkpoint at 0/7000368 on timeline 2
reading source file list
reading target file list
reading WAL in target
need to copy 67 MB (total source directory size is 85 MB)
69591/69591 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
All resources/stonith devices successfully cleaned up
wait for recovery complete
....
slave recovery of node1 successed
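cls_repair_slave drives pg_rewind for you. For reference, a hand-run invocation on node1 would look roughly like the sketch below; the connection string is this example's (pg_rewind needs a superuser connection), and the local instance must have been shut down cleanly first:

su - postgres
pg_rewind --target-pgdata=/data/postgresql/data --source-server='host=node2 user=postgres dbname=postgres'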
Cluster state before the failure
Before the failure, the Master is node1.
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 11:28:26 2016
Last change: Fri Apr 22 11:25:56 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000009044898
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node node2:
* Node node1:

pgsql_REPL_INFO:node1|12|0000000009044898
Cut off communication between the Master and the other nodes
[root@node1 pha4pgsql]# iptables -A INPUT -j DROP -s node2
[root@node1 pha4pgsql]# iptables -A OUTPUT -j DROP -d node2
[root@node1 pha4pgsql]# iptables -A INPUT -j DROP -s node3
[root@node1 pha4pgsql]# iptables -A OUTPUT -j DROP -d node3
Wait ten seconds or so, then check the cluster status.
Viewed from node1 (the old Master): having lost the distributed lock, node1 has stopped all resources deployed on itself.
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 11:34:46 2016
Last change: Fri Apr 22 11:25:56 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node1 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 ]
OFFLINE: [ node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Stopped
 vip-slave      (ocf::heartbeat:IPaddr2):       Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Stopped: [ node1 node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : LATEST
    + pgsql-status                      : STOP

Migration summary:
* Node node1:
   pgsql: migration-threshold=3 fail-count=2 last-failure='Fri Apr 22 11:34:23 2016'

Failed actions:
    pgsql_promote_0 on node1 'unknown error' (1): call=990, status=complete, exit-reason='none', last-rc-change='Fri Apr 22 11:34:15 2016', queued=0ms, exec=7756ms

pgsql_REPL_INFO:node1|12|0000000009044898
Viewed from node2: node2 has been promoted to the new Master, and the PostgreSQL timeline has advanced from 12 to 13.
[root@node2 ~]# cls_status
Last updated: Sun May 8 01:02:04 2016
Last change: Sun May 8 00:57:47 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node2 ]
OFFLINE: [ node1 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node2
 vip-slave      (ocf::heartbeat:IPaddr2):       Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Stopped: [ node1 ]

Node Attributes:
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000009045828
    + pgsql-status                      : PRI

Migration summary:
* Node node2:

pgsql_REPL_INFO:node2|13|0000000009045828
Note that a network partition is in effect at this point, so the cluster state held by node1 and by node2 differs.
Restore the network on node1
[root@node1 pha4pgsql]# iptables -F
Check the cluster status on node1 again
After the node1 and node2 partitions merge, the cluster adopts node2's configuration rather than node1's, which is exactly what we want (node2's cluster configuration has the higher version, so it is the one taken). Meanwhile, when Pacemaker tries to restart the PostgreSQL process on node1, it finds that node1's latest checkpoint location is at or beyond the position of the last timeline promotion, so node1 cannot attach to the new Master as a Slave; the start fails with an error and node1 is kept offline.
[root@node1 pha4pgsql]# cls_status
Last updated: Fri Apr 22 11:49:44 2016
Last change: Sun May 8 00:57:47 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node2
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node1
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node2 ]
     Stopped: [ node1 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : DISCONNECT
    + pgsql-status                      : STOP
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 0000000009045828
    + pgsql-status                      : PRI

Migration summary:
* Node node2:
* Node node1:
   pgsql: migration-threshold=3 fail-count=1000000 last-failure='Sun May 8 01:12:57 2016'

Failed actions:
    pgsql_start_0 on node1 'unknown error' (1): call=1022, status=complete, exit-reason='The master's timeline forked off current database system timeline 13 before latest checkpoint location 0000000009045828, REPL_IN', last-rc-change='Fri Apr 22 11:49:35 2016', queued=0ms, exec=2123ms

pgsql_REPL_INFO:node2|13|0000000009045828
Repair node1 (the old Master)
Repair node1 (the old Master) the same way as before, with cls_repair_slave or cls_rebuild_slave.
[root@node1 pha4pgsql]# cls_repair_slave
connected to server
servers diverged at WAL position 0/9045828 on timeline 13
rewinding from last common checkpoint at 0/9045780 on timeline 13
reading source file list
reading target file list
reading WAL in target
need to copy 211 MB (total source directory size is 229 MB)
216927/216927 kB (100%) copied
creating backup label and updating control file
syncing target data directory
Done!
All resources/stonith devices successfully cleaned up
wait for recovery complete
..........
slave recovery of node1 successed
Forcibly kill the postgres processes on the Slave
[root@node2 pha4pgsql]# killall postgres
Check the cluster status
Because migration-threshold="3" is set, after a single ordinary failure Pacemaker restarts the postgres processes in place.
[root@node2 ~]# cls_status
Last updated: Sun May 8 01:34:36 2016
Last change: Sun May 8 01:33:01 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000090650F8
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node node2:
   pgsql: migration-threshold=3 fail-count=1 last-failure='Sun May 8 01:32:44 2016'
* Node node1:

Failed actions:
    pgsql_monitor_4000 on node2 'not running' (7): call=227, status=complete, exit-reason='none', last-rc-change='Sun May 8 01:32:44 2016', queued=0ms, exec=0ms

pgsql_REPL_INFO:node1|14|00000000090650F8
Forcibly kill the postgres processes on the Slave twice more, then check the cluster status.
Once fail-count reaches 3, Pacemaker no longer starts PostgreSQL and leaves it stopped.
[root@node2 ~]# cls_status
Last updated: Sun May 8 01:36:16 2016
Last change: Sun May 8 01:36:07 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Stopped
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Stopped: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000090650F8
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : -INFINITY
    + pgsql-data-status                 : DISCONNECT
    + pgsql-status                      : STOP

Migration summary:
* Node node2:
   pgsql: migration-threshold=3 fail-count=3 last-failure='Sun May 8 01:36:08 2016'
* Node node1:

Failed actions:
    pgsql_monitor_4000 on node2 'not running' (7): call=240, status=complete, exit-reason='none', last-rc-change='Sun May 8 01:36:08 2016', queued=0ms, exec=0ms

pgsql_REPL_INFO:node1|14|00000000090650F8
At the same time, replication on the Master (node1) is automatically switched to asynchronous, preventing write operations from hanging.
[root@node1 pha4pgsql]# tail /var/lib/pgsql/tmp/rep_mode.conf
synchronous_standby_names = ''
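The same can be checked from SQL on the Master (as the postgres user on node1):

psql -c 'show synchronous_standby_names;'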
Repair the Slave
Run cls_cleanup on node2 to clear the fail-count; Pacemaker then starts the PostgreSQL process again.
[root@node2 ~]# cls_cleanup
All resources/stonith devices successfully cleaned up
[root@node2 ~]# cls_status
Last updated: Sun May 8 01:43:13 2016
Last change: Sun May 8 01:43:08 2016 by root via crm_resource on node1
Stack: corosync
Current DC: node2 (2) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
4 Resources configured

Online: [ node1 node2 ]

Full list of resources:

 vip-master     (ocf::heartbeat:IPaddr2):       Started node1
 vip-slave      (ocf::heartbeat:IPaddr2):       Started node2
 Master/Slave Set: msPostgresql [pgsql]
     Masters: [ node1 ]
     Slaves: [ node2 ]

Node Attributes:
* Node node1:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 1000
    + pgsql-data-status                 : LATEST
    + pgsql-master-baseline             : 00000000090650F8
    + pgsql-status                      : PRI
* Node node2:
    + #cluster-name                     : pgcluster
    + #site-name                        : pgcluster
    + master-pgsql                      : 100
    + pgsql-data-status                 : STREAMING|SYNC
    + pgsql-status                      : HS:sync

Migration summary:
* Node node2:
* Node node1:

pgsql_REPL_INFO:node1|14|00000000090650F8
At the same time, replication on the Master (node1) automatically switches back to synchronous.
[root@node1 pha4pgsql]# tail /var/lib/pgsql/tmp/rep_mode.conf
synchronous_standby_names = 'node2'
The expgsql RA used by this project is a modification of the pgsql RA from Resource Agents 3.9.7. The changes are as follows:
Introduced a distributed lock service to prevent split-brain in two-node clusters and to prevent data loss during failover.
Both promote and monitor's switch from synchronous to asynchronous replication must first acquire the lock, which guarantees that the two cannot happen at the same time and therefore prevents data loss on failover under synchronous replication. Several parameters were introduced for this (exposed in config.ini above as pgsql_enable_distlock, pgsql_distlock_psql_cmd, and pgsql_distlock_lockname).
A PostgreSQL-based distributed lock implementation is also bundled, in tools/distlock.
Promote a Slave to Master via either a restart or pg_ctl promote, chosen dynamically depending on whether the Master has changed.
When the Master has changed, the Slave is promoted with pg_ctl promote; when it has not, promotion is done via a restart. The original pgsql RA's restart_on_promote parameter is deprecated accordingly.
Record the timeline and xlog position from before PostgreSQL's last timeline switch.
This information is kept in the cluster configuration variable pgsql_REPL_INFO. Its value consists of three parts joined by '|': the Master's node name, the timeline, and the xlog baseline position of the last timeline switch (for example, node2|13|0000000009045828 in the status output above).
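For scripting, the current value can be pulled straight out of the cls_status output, for example:

cls_status | grep pgsql_REPL_INFO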
When the RA starts, it checks whether the current node's state conflicts with what is recorded in pgsql_REPL_INFO; on a conflict it raises an error and refuses to start the resource. Thanks to this check, the original pgsql RA's PGSQL.lock lock file has been dropped.
When the resources start, the Master node recorded in pgsql_REPL_INFO continues to serve as Master.
This speeds up cluster startup and avoids unnecessary master/slave switchovers. Only at the initial startup, when pgsql_REPL_INFO is still empty, does the cluster compare xlog positions to decide which node becomes Master.
For the original functionality of the pgsql RA, refer to the pgsql RA documentation.