Redis master-slave failover drill
Environment

| IP address     | Port | Role   |
|----------------|------|--------|
| 192.168.31.208 | 6379 | master |
| 192.168.31.208 | 6378 | slave  |
| 192.168.31.209 | 6379 | master |
| 192.168.31.209 | 6378 | slave  |
| 192.168.31.210 | 6379 | master |
| 192.168.31.210 | 6378 | slave  |
Environment variables
PATH=$PATH:$HOME/cpprelease/redis-3.0.2/src/:$HOME/bin
BASE_PATH=/home/beidou_soa/cpprelease/
export PATH
alias redisstart1='cd ~/redis/ && redis-server $BASE_PATH/cfg/redis/redis1.conf && cd -'
alias redisstart2='cd ~/redis/ && redis-server $BASE_PATH/cfg/redis/redis2.conf && cd -'
alias redisstop1='cd ~/redis/ && redis-cli -p 6379 shutdown && cd -'
alias redisstop2='cd ~/redis/ && redis-cli -p 6378 shutdown && cd -'
1. Check the environment
Connecting to node 192.168.31.208:6379: OK
Connecting to node 192.168.31.210:6378: OK
Connecting to node 192.168.31.209:6378: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.208:6378: OK
>>> Performing Cluster Check (using node 192.168.31.208:6379)
M: 273aa3c0416e7d1795ce678d56bd2db148613f7e 192.168.31.208:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 08f61dcd66389dae5c39e375d4f52e1defa77ec1 192.168.31.210:6378
slots: (0 slots) slave
replicates 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0
S: d8e8369bfdbf4f9e0bebd04911181785e3ee1129 192.168.31.209:6378
slots: (0 slots) slave
replicates 4f11d4265178d72e0ccf7edf0ddabf835e9c56df
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 4b6e2b13b1be1a081db2153dc4beaf430b489605 192.168.31.208:6378
slots: (0 slots) slave
replicates 273aa3c0416e7d1795ce678d56bd2db148613f7e
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
127.0.0.1:6379> get testkey001
(nil)
127.0.0.1:6379> set testkey002 testvalue002
-> Redirected to slot [401] located at 192.168.31.210:6379
OK
192.168.31.210:6379>
192.168.31.210:6379> get testkey002
"testvalue002"
192.168.31.210:6379> set testkey003 testvalue003
OK
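The redirect to slot 401 above comes from Redis Cluster's key hashing: slot = CRC16(key) mod 16384, where the CRC is the CRC-16/XMODEM variant. A minimal sketch, runnable without any cluster (the real implementation additionally honors `{...}` hash tags, omitted here):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0): the CRC used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of the 16384 cluster hash slots."""
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("testkey002"))  # 401, matching the MOVED redirect above
```

This is the same computation `CLUSTER KEYSLOT testkey002` would run server-side.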
2. Shut down the master on 192.168.31.208
3. Check cluster status
View the slave's log:
6874:S 30 Dec 15:28:04.755 # Error condition on socket for SYNC: Connection refused
6874:S 30 Dec 15:28:05.758 * Connecting to MASTER 192.168.31.208:6379
6874:S 30 Dec 15:28:05.758 * MASTER <-> SLAVE sync started
6874:S 30 Dec 15:28:05.759 # Error condition on socket for SYNC: Connection refused
6874:S 30 Dec 15:28:06.647 * FAIL message received from 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 about 273aa3c0416e7d1795ce678d56bd2db148613f7e
6874:S 30 Dec 15:28:06.647 # Cluster state changed: fail
6874:S 30 Dec 15:28:06.662 # Start of election delayed for 842 milliseconds (rank #0, offset 105547).
6874:S 30 Dec 15:28:06.762 * Connecting to MASTER 192.168.31.208:6379
6874:S 30 Dec 15:28:06.762 * MASTER <-> SLAVE sync started
6874:S 30 Dec 15:28:06.763 # Error condition on socket for SYNC: Connection refused
6874:S 30 Dec 15:28:07.565 # Starting a failover election for epoch 4.
6874:S 30 Dec 15:28:07.567 # Failover election won: I'm the new master.
6874:S 30 Dec 15:28:07.567 # configEpoch set to 4 after successful failover
6874:M 30 Dec 15:28:07.567 * Discarding previously cached master state.
6874:M 30 Dec 15:28:07.567 # Cluster state changed: ok
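The "delayed for 842 milliseconds (rank #0)" line follows the delay formula from the Redis Cluster spec: DELAY = 500 ms + a random 0-500 ms jitter + SLAVE_RANK * 1000 ms, where rank orders slaves by replication offset (rank 0 has the freshest data, so it tends to win). A sketch of that formula (the function name is illustrative, not a Redis source identifier):

```python
import random

def failover_election_delay(rank: int) -> int:
    """Delay (ms) before a slave starts a failover election, per the
    Redis Cluster spec: fixed 500 ms + random jitter + 1 s per rank.
    Rank 0 is the slave with the most up-to-date replication offset."""
    return 500 + random.randint(0, 500) + rank * 1000

# A rank-#0 slave, as in the log, always lands in [500, 1000] ms; 842 fits.
print(failover_election_delay(0))
```

The jitter spreads out competing slaves so that one usually requests votes before the others, and the rank term biases the election toward the most up-to-date replica.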
Check cluster status:
Connecting to node 192.168.31.208:6378: OK
Connecting to node 192.168.31.209:6378: OK
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6378: OK
Connecting to node 192.168.31.210:6379: OK
>>> Performing Cluster Check (using node 192.168.31.208:6378)
M: 4b6e2b13b1be1a081db2153dc4beaf430b489605 192.168.31.208:6378
slots:10923-16383 (5461 slots) master
0 additional replica(s)
S: d8e8369bfdbf4f9e0bebd04911181785e3ee1129 192.168.31.209:6378
slots: (0 slots) slave
replicates 4f11d4265178d72e0ccf7edf0ddabf835e9c56df
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 08f61dcd66389dae5c39e375d4f52e1defa77ec1 192.168.31.210:6378
slots: (0 slots) slave
replicates 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
4. Shut down a slave
5. Check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.210:6378: OK
Connecting to node 192.168.31.209:6378: OK
Connecting to node 192.168.31.208:6378: OK
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: 08f61dcd66389dae5c39e375d4f52e1defa77ec1 192.168.31.210:6378
slots: (0 slots) slave
replicates 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0
S: d8e8369bfdbf4f9e0bebd04911181785e3ee1129 192.168.31.209:6378
slots: (0 slots) slave
replicates 4f11d4265178d72e0ccf7edf0ddabf835e9c56df
M: 4b6e2b13b1be1a081db2153dc4beaf430b489605 192.168.31.208:6378
slots:10923-16383 (5461 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
6. Shut down the three slaves
7. Check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.208:6378: OK
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
M: 4b6e2b13b1be1a081db2153dc4beaf430b489605 192.168.31.208:6378
slots:10923-16383 (5461 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
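The "[OK] All 16384 slots covered" verdict can be sanity-checked by hand: the three masters' ranges 0-5460, 5461-10922 and 10923-16383 are contiguous and their sizes (5461 + 5462 + 5461) sum to 16384. A throwaway check:

```python
# Slot ranges of the three masters, taken from the check output above.
ranges = [(0, 5460), (5461, 10922), (10923, 16383)]

# Contiguous: each range starts right after the previous one ends.
assert all(b[0] == a[1] + 1 for a, b in zip(ranges, ranges[1:]))

# Complete: the per-range sizes sum to the full slot space.
total = sum(hi - lo + 1 for lo, hi in ranges)
print(total)  # 16384
```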
8. With the slaves down, shut down one master
9. Check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
The cluster enters the fail state and becomes unavailable.
10. Start one slave while the cluster is down
11. Check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.208:6379: OK
*** WARNING: 192.168.31.208:6379 claims to be slave of unknown node ID 4b6e2b13b1be1a081db2153dc4beaf430b489605.
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
S: 273aa3c0416e7d1795ce678d56bd2db148613f7e 192.168.31.208:6379
slots: (0 slots) slave
replicates 4b6e2b13b1be1a081db2153dc4beaf430b489605
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes
The cluster stays in the fail state; a master cannot be elected automatically, so the cluster remains unavailable.
12. Start the master and check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.208:6379: OK
Connecting to node 192.168.31.208:6378: OK
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
S: 273aa3c0416e7d1795ce678d56bd2db148613f7e 192.168.31.208:6379
slots: (0 slots) slave
replicates 4b6e2b13b1be1a081db2153dc4beaf430b489605
M: 4b6e2b13b1be1a081db2153dc4beaf430b489605 192.168.31.208:6378
slots:10923-16383 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
13. Shut down one master
14. Check cluster status
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.208:6379: OK
*** WARNING: 192.168.31.208:6379 claims to be slave of unknown node ID 4b6e2b13b1be1a081db2153dc4beaf430b489605.
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
S: 273aa3c0416e7d1795ce678d56bd2db148613f7e 192.168.31.208:6379
slots: (0 slots) slave
replicates 4b6e2b13b1be1a081db2153dc4beaf430b489605
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[ERR] Not all 16384 slots are covered by nodes.
Roughly one minute later:
Connecting to node 192.168.31.209:6379: OK
Connecting to node 192.168.31.210:6379: OK
Connecting to node 192.168.31.208:6379: OK
>>> Performing Cluster Check (using node 192.168.31.209:6379)
M: 4f11d4265178d72e0ccf7edf0ddabf835e9c56df 192.168.31.209:6379
slots:5461-10922 (5462 slots) master
0 additional replica(s)
M: 40cecda23f32cb3b8ff60752c00514f2d7d9c3d0 192.168.31.210:6379
slots:0-5460 (5461 slots) master
0 additional replica(s)
M: 273aa3c0416e7d1795ce678d56bd2db148613f7e 192.168.31.208:6379
slots:10923-16383 (5461 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Election complete; the Redis cluster has recovered.
Redis architecture
Architecture details:
(1) All Redis nodes are interconnected (PING-PONG mechanism) and use a binary protocol internally to optimize transfer speed and bandwidth.
(2) A node is marked as failed only when more than half of the nodes in the cluster detect the failure.
(3) Clients connect directly to Redis nodes; no intermediate proxy layer is needed. A client does not have to connect to every node in the cluster; connecting to any single reachable node is enough.
(4) redis-cluster maps all physical nodes onto slots [0-16383]; the cluster maintains the node <-> slot <-> value mapping.
2) redis-cluster elections: fault tolerance
(1) Leader election involves all masters in the cluster. If more than half of the master nodes cannot communicate with a given master for longer than cluster-node-timeout, that master is considered down.
(2) When does the whole cluster become unavailable (cluster_state:fail)? While the cluster is unavailable, every operation against it fails with the error ((error) CLUSTERDOWN The cluster is down).
a: If any master goes down and it has no slave, the cluster enters the fail state. Equivalently, the cluster enters the fail state whenever the slot mapping [0-16383] is incomplete.
b: If more than half of the masters go down, the cluster enters the fail state regardless of whether they have slaves.
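Conditions a and b can be summarized in a tiny availability model (a sketch of the rules above, not real Redis logic; the function and field names are illustrative):

```python
def cluster_is_down(masters: list) -> bool:
    """masters: one entry per master shard, e.g.
    {"alive": True, "live_slaves": 1}.  Sketch of conditions a/b."""
    down = [m for m in masters if not m["alive"]]
    # b: more than half of the masters down -> fail, slaves or not.
    if len(down) > len(masters) // 2:
        return True
    # a: any down master with no live slave leaves its slots uncovered.
    return any(m["live_slaves"] == 0 for m in down)

# Step 8 above: all slaves stopped, then one master stopped -> fail state.
print(cluster_is_down([
    {"alive": False, "live_slaves": 0},
    {"alive": True,  "live_slaves": 0},
    {"alive": True,  "live_slaves": 0},
]))  # True
```

This mirrors the drill: one master down with a live slave kept the cluster healthy (steps 2-3), while one master down with no slaves left slots 10923-16383 uncovered and failed the whole cluster (steps 8-9).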