Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2801867
  • 博文数量: 423
  • 博客积分: 7770
  • 博客等级: 少将
  • 技术积分: 4766
  • 用 户 组: 普通用户
  • 注册时间: 2006-11-09 11:58
个人简介

Oracle/DB2/Postgresql/Mysql/Hadoop/Greenplum/Postgres-xl/Mongodb

文章分类

全部博文(423)

文章存档

2019年(3)

2018年(6)

2017年(27)

2016年(23)

2015年(30)

2014年(16)

2013年(31)

2012年(73)

2011年(45)

2010年(14)

2009年(30)

2008年(30)

2007年(63)

2006年(32)

分类: Mysql/postgreSQL

2016-07-19 12:28:00

当前环境:


主节点:108.88.3.116
备节点:108.88.3.180




一、主节点(116)环境配置及状态


[postgres@116 data]$  postgresql.conf 


#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------


# - Settings -


wal_level = logical # minimal, replica, or logical
# (change requires restart)
#fsync = on # flush data to disk for crash safety
# (turning this off can cause
# unrecoverable data corruption)




#checkpoint_timeout = 5min # range 30s-1h
max_wal_size = 1GB
min_wal_size = 80MB




# - Sending Server(s) -


# Set these on the master and on any standby that will send replication data.


max_wal_senders = 5 # max number of walsender processes
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables




# - Standby Servers -


# These settings are ignored on a master server.


hot_standby = on # "on" allows queries during recovery
# (change requires restart)




# - Archiving -


archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = 'cp %p /archivelog/%f' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
#               %f = file name only


[postgres@116 data]$ vi pg_hba.conf 


# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             108.88.3.0/24            trust  //这里为了方便直接设置trust






[postgres@116 data]$ vi recovery.done  //用于节点降为备节点的准备文件,在主节点不起作用,为了切换备节点时方便


#---------------------------------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------------------------------
#
# standby_mode
#
# When standby_mode is enabled, the PostgreSQL server will work as a
# standby. It will continuously wait for the additional XLOG records, using
# restore_command and/or primary_conninfo.
#
standby_mode = on
#
# primary_conninfo
#
# If set, the PostgreSQL server will try to connect to the primary using this
# connection string and receive XLOG records continuously.
#
primary_conninfo = 'host=108.88.3.180 port=5432  user=replication password=6220104' # e.g. 'host=localhost port=5432'
#
# If set, the PostgreSQL server will use the specified replication slot when
# connecting to the primary via streaming replication to control resource
# removal on the upstream node. This setting has no effect if primary_conninfo
# is not set.
#
#primary_slot_name = ''
#
# By default, a standby server keeps restoring XLOG records from the
# primary indefinitely. If you want to stop the standby mode, finish recovery
# and open the system in read/write mode, specify a path to a trigger file.
# The server will poll the trigger file path periodically and start as a
# primary server when it's found.
#
trigger_file = 'faile'




[postgres@node02 data]$  psql -d ghan
ghan=# \x
扩展显示已打开。
ghan=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+-----------------------------
pid              | 8101
usesysid         | 16396
usename          | replication
application_name | walreceiver
client_addr      | 108.88.3.180
client_hostname  | 
client_port      | 51184
backend_start    | 2016-07-11 13:10:58.32561+08
backend_xmin     | 
state            | streaming
sent_location    | 1/174A1240
write_location   | 1/174A1240
flush_location   | 1/174A1240
replay_location  | 1/174A1240
sync_priority    | 0
sync_state       | async


ghan=# 










二、备节点(180)环境配置及状态




[postgres@180 data]$  postgresql.conf 
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------


# - Settings -


wal_level = logical # minimal, replica, or logical
# (change requires restart)
#fsync = on # flush data to disk for crash safety
# (turning this off can cause
# unrecoverable data corruption)




#checkpoint_timeout = 5min # range 30s-1h
max_wal_size = 1GB
min_wal_size = 80MB




# - Sending Server(s) -


# Set these on the master and on any standby that will send replication data.


max_wal_senders = 5 # max number of walsender processes
# (change requires restart)
#wal_keep_segments = 0 # in logfile segments, 16MB each; 0 disables




# - Standby Servers -


# These settings are ignored on a master server.


hot_standby = on # "on" allows queries during recovery
# (change requires restart)




# - Archiving -


archive_mode = on # enables archiving; off, on, or always
# (change requires restart)
archive_command = 'cp %p /archivelog/%f' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
#               %f = file name only




[postgres@180 data]$ vi pg_hba.conf 


# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             108.88.3.0/24            trust  //这里为了方便直接设置trust




[postgres@node01 data]$ vi recovery.conf   //由于正处于备份节点,故所以为recovery.conf


#---------------------------------------------------------------------------
# STANDBY SERVER PARAMETERS
#---------------------------------------------------------------------------
#
# standby_mode
#
# When standby_mode is enabled, the PostgreSQL server will work as a
# standby. It will continuously wait for the additional XLOG records, using
# restore_command and/or primary_conninfo.
#
standby_mode = on
#
# primary_conninfo
#
# If set, the PostgreSQL server will try to connect to the primary using this
# connection string and receive XLOG records continuously.
#
primary_conninfo = 'host=108.88.3.116 port=5432  user=replication password=6220104' # e.g. 'host=localhost port=5432'
#
# If set, the PostgreSQL server will use the specified replication slot when
# connecting to the primary via streaming replication to control resource
# removal on the upstream node. This setting has no effect if primary_conninfo
# is not set.
#
#primary_slot_name = ''
#
# By default, a standby server keeps restoring XLOG records from the
# primary indefinitely. If you want to stop the standby mode, finish recovery
# and open the system in read/write mode, specify a path to a trigger file.
# The server will poll the trigger file path periodically and start as a
# primary server when it's found.
#
trigger_file = 'faile'








三、备节点(180)提升为主节点


[postgres@180 data]$ touch faile
[postgres@180 data]$ cd ../
[postgres@180 data]$ tail -f 999.log 
LOG:  trigger file found: faile
FATAL:  terminating walreceiver process due to administrator command
LOG:  invalid record length at 1/174A1320: wanted 24, got 0
LOG:  redo done at 1/174A12E8
LOG:  last completed transaction was at log time 2016-07-11 13:09:25.476144+08
LOG:  selected new timeline ID: 9
LOG:  archive recovery complete
LOG:  MultiXact member wraparound protections are now enabled
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started








四、新主节点(180)增加数据(表及数据)


[postgres@node01 data]$ psql -d ghan
WARNING: password file "/home/postgres/.pgpass" has group or world access; permissions should be u=rw (0600) or less
psql (9.6beta2)
Type "help" for help.


ghan=# create table t11(id int);
CREATE TABLE
ghan=#              
insert into t11 values(generate_series(1,122179));
INSERT 0 122179
ghan=#              
insert into t11 values(generate_series(1,122179));
INSERT 0 122179
ghan=#              
insert into t11 values(generate_series(1,122179));
INSERT 0 122179
ghan=#              
insert into t11 values(generate_series(1,122179));
INSERT 0 122179
ghan=# select count(*) from t11
;
 count  
--------
 488716
(1 row)


ghan=#              
insert into t8 values(generate_series(1,122179));
INSERT 0 122179
ghan=# select count(*) from t11
;
 count  
--------
 488716
(1 row)


ghan=# select count(*) from t8;
 count  
--------
 382375
(1 row)


ghan=# checkpoint;








五、旧主节点(116)降为备节点




[postgres@116 data]$   pg_ctl -D data/ stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@116 data]$


[postgres@116 data]$  pg_rewind --target-pgdata data --source-server='host=108.88.3.180 port=5432 user=postgres dbname=ghan'
servers diverged at WAL position 1/174A1320 on timeline 8
rewinding from last common checkpoint at 1/174A1278 on timeline 8
Done!
[postgres@116 data]$ 


[postgres@116 data]$ mv recovery.done recovery.conf
[postgres@116 data]$ vi recovery.conf


primary_conninfo = 'host=108.88.3.180 port=5432  user=replication password=6220104' # e.g. 'host=localhost port=5432'






[postgres@116 data]$ pg_ctl -D data start -l ok.log   //如果长时间没有做切换,部份新备机的pg_xlog会找不到部份文件而无法启动,这时需要时主节点下/archivelg下归档复制到新pg_xlog 
server starting
[postgres@node02 data]$ tail -f ok.log 
LOG:  database system was interrupted while in recovery at log time 2016-07-11 13:40:26 CST
HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
LOG:  entering standby mode
LOG:  redo starts at 1/174A1240
LOG:  invalid record length at 1/1A33B220: wanted 24, got 0
LOG:  started streaming WAL from primary at 1/1A000000 on timeline 9
LOG:  consistent recovery state reached at 1/1A33B258
LOG:  database system is ready to accept read only connections














[postgres@116 data]$ psql -d ghan
psql (9.4.4, 服务器 9.6beta2)
警告:psql 主版本9.4,服务器主版本为9.6.
     一些psql功能可能无法工作.
输入 "help" 来获取帮助信息.


ghan=# select count(*) from t8;
 count  
--------
 382375
(1 行记录)


ghan=# select count(*) from t11;
 count  
--------
 488716
(1 行记录)


ghan=# 






六、主备节点状态




备节点状态:


postgres  8271     1  0 13:48 pts/0    00:00:00 /usr/local/pg9.6/bin/postgres -D data
postgres  8272  8271  0 13:48 ?        00:00:00 postgres: startup process   recovering 00000009000000010000001A
postgres  8274  8271  0 13:48 ?        00:00:00 postgres: checkpointer process       
postgres  8275  8271  0 13:48 ?        00:00:00 postgres: writer process             
postgres  8276  8271  0 13:48 ?        00:00:00 postgres: wal receiver process   streaming 1/1A33E648
postgres  8277  8271  0 13:48 ?        00:00:00 postgres: stats collector process    
postgres  8290  7839  0 13:50 pts/0    00:00:00 ps -ef
[postgres@116 data]$






主节点


postgres 20924     1  0 13:09 pts/0    00:00:00 /usr/local/pg9.6/bin/postgres -D data
postgres 20930 20924  0 13:10 ?        00:00:00 postgres: checkpointer process       
postgres 20931 20924  0 13:10 ?        00:00:00 postgres: writer process             
postgres 20932 20924  0 13:10 ?        00:00:00 postgres: stats collector process    
postgres 21009 20924  0 13:40 ?        00:00:00 postgres: wal writer process         
postgres 21010 20924  0 13:40 ?        00:00:00 postgres: autovacuum launcher process   
postgres 21011 20924  0 13:40 ?        00:00:00 postgres: archiver process   last was 000000090000000100000019
postgres 21044 20924  0 13:47 ?        00:00:00 postgres: wal sender process replication 108.88.3.116(36878) streaming 1/1A33E648
postgres 21055 20731  0 13:49 pts/0    00:00:00 ps -ef
[postgres@180 archivelog]$ 


ghan=# \x
Expanded display is on.
ghan=# SELECT * FROM pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid              | 21044
usesysid         | 16396
usename          | replication
application_name | walreceiver
client_addr      | 108.88.3.116
client_hostname  | 
client_port      | 36878
backend_start    | 2016-07-11 13:47:55.931392+08
backend_xmin     | 
state            | streaming
sent_location    | 1/1A33E728
write_location   | 1/1A33E728
flush_location   | 1/1A33E728
replay_location  | 1/1A33E728
sync_priority    | 0
sync_state       | async


ghan=# 








错误解决:




pg_rewind—使一个PostgreSQL数据目录与另一个数据目录(该目录从第一个PostgreSQL数据目录创建而来)一致。


描述


pg_rewind是一个在集群的时间线参数偏离之后,用于使一个PostgreSQL集群与另一个相同集群的拷贝同步的工具。一个典型的场景是在故障转移之后,
让一个老的主服务器重新在线作为一个standby跟随新主服务器。


其结果相当于使用源数据目录替换目标数据目录。所有的文件都被拷贝,包括配置文件。与做一个基础备份或者像rsync这样的工具相比,
pg_rewind的优势是pg_rewind不需要读取所有集群中没有更改的文件。当数据库很大,并且只有一小部分不同的集群之间,使它的速度快得多。


pg_rewind检查源集群与目标集群的时间线历史来检测它们产生分歧的点,并希望在目标集群的pg_xlog目录找到WAL回到分歧点的所有方式。
在典型的故障转移场景:目标集群在分歧之后立即被关闭,那是没有问题的,但是,如果目标集群在分歧之后运行了很长一段时间,老的WAL文件可能不存在了。
在这种情况下,它们可以手动从WAL归档复制到pg_xlog目录。目前不支持从一个WAL归档中自动获取丢失的文件。


在运行pg_rewind之后,当目标服务器第一次被启动,它将进入恢复模式并重放从分歧点之后源服务器产生的所有WAL。
当pg_rewind被运行时,如果一些 WAL在源服务器上不再可用,因此不能用pg_rewind回话复制,当目标服务器被启动时时可以的。
这可以通过在目标数据目录创建一个带有合适的restore_command命令的recovery.conf文件来实现。






阅读(4542) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~