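Heartbeating monitors the availability of network interfaces, communication devices, and IP labels (service, non-service, persistent), as well as the availability of nodes.
Starting with HACMP 5.1, HA uses only heartbeating based on RSCT Topology Services. Classic heartbeating, which used Network Interface Modules (NIMs) monitored directly by clstrmgrES, is no longer used.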
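HA implements heartbeating by exchanging messages between the nodes over every communication interface and device. Each node sends heartbeats to the other nodes at a specified interval and expects to receive the corresponding node's heartbeats within a specified interval. If no heartbeat arrives, RSCT treats this as a failure and reports it to HACMP, which takes the appropriate recovery action.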
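The send-and-expect mechanism described above can be illustrated with a minimal sketch. This is not RSCT's implementation: the class `HeartbeatMonitor` and its `interval` and `sensitivity` parameters are hypothetical names chosen for illustration, and timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Track heartbeats from one peer and flag it after too many missed beats.

    Hypothetical illustration only; not how RSCT Topology Services is built.
    """

    def __init__(self, interval, sensitivity):
        self.interval = interval        # expected seconds between heartbeats
        self.sensitivity = sensitivity  # missed beats tolerated before failure
        self.last_seen = None           # time of the most recent heartbeat

    def beat(self, now):
        """Record that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def missed(self, now):
        """Number of whole intervals elapsed since the last heartbeat."""
        if self.last_seen is None:
            return 0
        return int((now - self.last_seen) // self.interval)

    def is_down(self, now):
        """True once the missed-beat count reaches the sensitivity."""
        return self.missed(now) >= self.sensitivity


monitor = HeartbeatMonitor(interval=1.0, sensitivity=10)
monitor.beat(now=100.0)
status_at_105 = monitor.is_down(now=105.0)  # 5 missed beats: still up
status_at_111 = monitor.is_down(now=111.0)  # 11 missed beats: reported down
```

In a real cluster the "down" decision is only reported upward; the recovery action belongs to the cluster manager, as described above.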
Heartbeats can be exchanged in two ways:
Over IP networks
Over non-IP networks
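Cluster islands: when a TCP/IP network problem (switch, router, hub) prevents IP-based heartbeats from being sent and received, and no non-IP heartbeat path exists, each node concludes that the other nodes have failed and tries to acquire the resources itself. This threatens data consistency and integrity, so HACMP must be able to tell an IP network failure apart from a node failure in order to prevent islands.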
Non-IP heartbeating does not use the TCP/IP network to carry heartbeats, so it effectively avoids cluster islands caused by TCP/IP network failures.
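The reasoning behind combining IP and non-IP heartbeat paths can be sketched as a small decision function. It is a simplified illustration under stated assumptions, not HACMP's actual logic: the function `classify_peer` and its return strings are hypothetical, and each list entry is True when heartbeats are still arriving on that path.

```python
def classify_peer(ip_paths, non_ip_paths):
    """Classify a peer node from the heartbeat status of each path.

    ip_paths / non_ip_paths: lists of booleans, True = heartbeats arriving.
    Hypothetical helper for illustration only.
    """
    if any(ip_paths):
        # At least one IP ring still works, so the peer is reachable.
        return "peer up"
    if any(non_ip_paths):
        # IP heartbeats are gone but serial/disk heartbeats still arrive:
        # the peer is alive and only the TCP/IP network has failed.
        return "ip network failure"
    if not non_ip_paths:
        # No non-IP path is configured, so a dead network and a dead node
        # look identical. This is the situation that leads to islands.
        return "ambiguous"
    # Every path, IP and non-IP, is silent.
    return "node failure"


print(classify_peer([False, False], [True]))   # ip network failure
print(classify_peer([False, False], [False]))  # node failure
print(classify_peer([False], []))              # ambiguous
```

The "ambiguous" branch is the case the text warns about: without a non-IP path, both nodes may take over resources at the same time.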
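Disk heartbeating
Supported only in HACMP 5.1 and later.
This type of heartbeat supports SSA, SCSI, and FC storage and uses a disk (diskhb) to exchange heartbeat information. The disk must belong to an enhanced concurrent VG. The same disk can also be used to store other shared data.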
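1. One disk belongs to one network (two nodes), and the disk ID must be the same on both nodes
2. Configure one network for each pair of nodes
3. The disk must be part of an enhanced concurrent VG, but it does not need to belong to any RG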
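Heartbeating over IP aliases
With heartbeating over IP aliases, HACMP adds an IP alias to each existing IP interface at startup and uses it to exchange heartbeats. The aliases must be on a different subnet and are not part of any name resolution. RSCT uses the aliases to build a communication group (heartbeat ring) for each communication interface and exchanges heartbeats over it. In this mode the base IP addresses are no longer monitored; the communication interfaces and service IPs are monitored instead. The subnet mask of the IP aliases must be the same as that of the service IP. To configure HACMP for heartbeating over IP aliases, you specify a starting IP address.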
For example, in a two-node HA cluster where each node has two network interfaces, en0 and en1, and the starting heartbeat alias address is 192.168.1.1:
Adapter/Node   Node1          Node2          Ring
en0            192.168.1.1    192.168.1.2    Ring1
en1            192.168.2.1    192.168.2.2    Ring2
Adapters that use IP aliases are stored in the HACMPadapter ODM class.
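The addressing pattern in the example above (one subnet per interface, consecutive host addresses per node) can be reproduced with a short sketch. The function `alias_plan` is hypothetical, and the 255.255.255.0 netmask is an assumption, since the example does not state one. How HACMP itself allocates the addresses may differ in detail.

```python
import ipaddress


def alias_plan(start_ip, netmask, interfaces, nodes):
    """Derive one heartbeat alias per (interface, node) from a starting IP.

    Each interface gets its own subnet (one heartbeat ring); within a ring,
    nodes get consecutive addresses. Hypothetical helper for illustration.
    """
    base = ipaddress.ip_interface(f"{start_ip}/{netmask}")
    subnet_size = base.network.num_addresses  # 256 for a 255.255.255.0 mask
    plan = {}
    for ring, ifname in enumerate(interfaces):
        for offset, node in enumerate(nodes):
            # Step to the next subnet per interface, next host per node.
            plan[(ifname, node)] = str(base.ip + ring * subnet_size + offset)
    return plan


# Netmask is assumed; the source example only gives the starting address.
plan = alias_plan("192.168.1.1", "255.255.255.0",
                  ["en0", "en1"], ["Node1", "Node2"])
for (ifname, node), alias in sorted(plan.items()):
    print(ifname, node, alias)
```

With these inputs the sketch yields the same four addresses as the example: 192.168.1.1 and 192.168.1.2 on en0, 192.168.2.1 and 192.168.2.2 on en1.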
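Heartbeat communication test
Test environment: H80 (OS520008+HA5.3), F50 (OS520008+HA5.3), FAStT600
Heartbeat configuration: heartbeating over IP aliases, starting alias IP: 10.0.3.1
Serial heartbeat: serial port 3 on each host --> tty1
Disk heartbeat: enhanced concurrent VG, hbvg --> hdisk4, FAStT storage
Test:
1. Start HA on both nodes
2. Run lssrc -ls topsvcs on the primary node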
# lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 491574 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
net_ether_01_0 [ 0] 2 2 S 10.0.4.2 10.0.4.2
net_ether_01_0 [ 0] en1 0x4508653b 0x45086568
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 249 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 387 ICMP 0 Dropped: 0
NIM's PID: 454686
net_ether_01_1 [ 1] 2 2 S 10.0.3.2 10.0.3.2
net_ether_01_1 [ 1] en0 0x4508653c 0x45086569
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 249 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 387 ICMP 0 Dropped: 0
NIM's PID: 503972
rs232_0 [ 2] 2 2 S 255.255.0.1 255.255.0.1
rs232_0 [ 2] tty1 0x8508656b 0x8508656e
HB Interval = 2.000 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 186 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 178 ICMP 0 Dropped: 0
NIM's PID: 532510
diskhb_0 [ 3] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 3] rhdisk4 0x8508653a 0x8508656c
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 126 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 125 ICMP 0 Dropped: 0
NIM's PID: 434228
2 locally connected Clients with PIDs:
haemd(413824) hagsd(450668)
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Configuration Instance = 3
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 767 KB. Static data segment size: 957 KB.
Dynamic data segment size: 4233. Number of outstanding malloc: 222
User time 0 sec. System time 0 sec.
Number of page faults: 263. Process swapped out 0 times.
Number of nodes up: 2. Number of nodes down: 0.
Processes used for heartbeating
# ps -ef|grep nim
root 434228 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_diskhb_nim
root 454686 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_nim
root 503972 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_nim
root 532510 491574 1 15:08:23 - 0:01 /usr/sbin/rsct/bin/hats_rs232_nim
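The PIDs in this listing match the "NIM's PID" values in the lssrc output above, and all four processes share the topsvcs PID (491574) as their parent. A minimal sketch of that cross-check follows. The helper `nim_processes` is hypothetical, and the `ps -ef` text is pasted in as a string rather than read from a live system.

```python
import os

PS_OUTPUT = """\
root 434228 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_diskhb_nim
root 454686 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_nim
root 503972 491574 0 15:08:23 - 0:00 /usr/sbin/rsct/bin/hats_nim
root 532510 491574 1 15:08:23 - 0:01 /usr/sbin/rsct/bin/hats_rs232_nim
"""

# "NIM's PID" values copied from the lssrc -ls topsvcs output above.
NIM_PIDS = {
    "net_ether_01_0": 454686,
    "net_ether_01_1": 503972,
    "rs232_0": 532510,
    "diskhb_0": 434228,
}
TOPSVCS_PID = 491574


def nim_processes(ps_output):
    """Map PID -> (parent PID, NIM binary name) for lines ending in '_nim'."""
    procs = {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8 or not fields[-1].endswith("_nim"):
            continue  # skip the grep process itself and unrelated lines
        procs[int(fields[1])] = (int(fields[2]), os.path.basename(fields[-1]))
    return procs


procs = nim_processes(PS_OUTPUT)
for network, pid in NIM_PIDS.items():
    ppid, binary = procs[pid]
    print(network, pid, binary, "child of topsvcs:", ppid == TOPSVCS_PID)
```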
3. Run lssrc -ls topsvcs on the backup node
# lssrc -ls topsvcs
Subsystem Group PID Status
topsvcs topsvcs 29482 active
Network Name Indx Defd Mbrs St Adapter ID Group ID
net_ether_01_0 [ 0] 2 2 S 10.0.4.1 10.0.4.2
net_ether_01_0 [ 0] en1 0x45087231 0x45086568
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 1 Current group: 1
Packets sent : 1174 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 1726 ICMP 0 Dropped: 0
NIM's PID: 28006
net_ether_01_1 [ 1] 2 2 S 10.0.3.1 10.0.3.2
net_ether_01_1 [ 1] en0 0x45087232 0x45086569
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 1175 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 1724 ICMP 0 Dropped: 0
NIM's PID: 27640
rs232_0 [ 2] 2 2 S 255.255.0.0 255.255.0.1
rs232_0 [ 2] tty1 0x85087233 0x8508656e
HB Interval = 2.000 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 1 Current group: 1
Packets sent : 13173 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 810 ICMP 0 Dropped: 0
NIM's PID: 25294
diskhb_0 [ 3] 2 2 S 255.255.10.0 255.255.10.1
diskhb_0 [ 3] rhdisk4 0x85087234 0x8508656c
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent : 568 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 570 ICMP 0 Dropped: 0
NIM's PID: 26892
2 locally connected Clients with PIDs:
haemd( 26836) hagsd( 27390)
Dead Man Switch Enabled:
reset interval = 1 seconds
trip interval = 20 seconds
Configuration Instance = 3
Daemon employs no security
Segments pinned: Text Data.
Text segment size: 767 KB. Static data segment size: 957 KB.
Dynamic data segment size: 4169. Number of outstanding malloc: 222
User time 1 sec. System time 3 sec.
Number of page faults: 423. Process swapped out 0 times.
Number of nodes up: 2. Number of nodes down: 0.
The output above shows four heartbeat rings carrying heartbeats: two Ethernet, one serial, and one disk. Heartbeats travel within each ring.
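To pull the per-ring fields out of lssrc -ls topsvcs output programmatically, a regex-based sketch like the following may help. It assumes the line layout matches the output shown above (a different RSCT level may format it differently), and the function name `parse_topsvcs` and the dictionary keys are hypothetical.

```python
import re

SAMPLE = """\
net_ether_01_0 [ 0] 2 2 S 10.0.4.2 10.0.4.2
net_ether_01_0 [ 0] en1 0x4508653b 0x45086568
HB Interval = 1.000 secs. Sensitivity = 10 missed beats
Missed HBs: Total: 0 Current group: 0
rs232_0 [ 2] 2 2 S 255.255.0.1 255.255.0.1
rs232_0 [ 2] tty1 0x8508656b 0x8508656e
HB Interval = 2.000 secs. Sensitivity = 5 missed beats
Missed HBs: Total: 0 Current group: 0
diskhb_0 [ 3] 2 2 S 255.255.10.1 255.255.10.1
diskhb_0 [ 3] rhdisk4 0x8508653a 0x8508656c
HB Interval = 2.000 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
"""

# Name [Indx] Defd Mbrs St AdapterID GroupID
STATE = re.compile(
    r"^(\S+)\s+\[\s*\d+\]\s+(\d+)\s+(\d+)\s+(\S)\s+(\S+)\s+(\S+)$")
# Name [Indx] device 0x... 0x...
DEVICE = re.compile(
    r"^(\S+)\s+\[\s*\d+\]\s+(\S+)\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+$")
HB = re.compile(r"HB Interval = ([\d.]+) secs\. Sensitivity = (\d+)")
MISSED = re.compile(r"Missed HBs: Total: (\d+) Current group: (\d+)")


def parse_topsvcs(text):
    """Return one dict per heartbeat ring found in the output text."""
    rings = []
    for raw in text.splitlines():
        line = raw.strip()
        m = STATE.match(line)
        if m:
            rings.append({"network": m.group(1),
                          "defined": int(m.group(2)),
                          "members": int(m.group(3)),
                          "state": m.group(4),
                          "adapter_id": m.group(5),
                          "group_id": m.group(6)})
            continue
        if not rings:
            continue  # header lines before the first ring
        ring = rings[-1]
        m = DEVICE.match(line)
        if m and m.group(1) == ring["network"]:
            ring["device"] = m.group(2)
            continue
        m = HB.search(line)
        if m:
            ring["interval"] = float(m.group(1))
            ring["sensitivity"] = int(m.group(2))
            continue
        m = MISSED.search(line)
        if m:
            ring["missed_total"] = int(m.group(1))
    return rings


rings = parse_topsvcs(SAMPLE)
for ring in rings:
    print(ring["network"], ring["device"], ring["interval"],
          ring["sensitivity"], ring["members"] == ring["defined"])
```

Comparing `members` with `defined` gives a quick check that every adapter defined for a ring has actually joined it.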
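Heartbeat traffic can also be checked in the logs:
The files nim.topsvcs.en0.wh, nim.topsvcs.en1.wh, nim.topsvcs.tty1.wh, and nim.topsvcs.rhdisk4.wh in the /var/ha/log directory
3. Bring down adapter en0
HA performed a normal adapter swap. The heartbeat log nim.topsvcs.en0.wh at this point shows: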
09/14 10:22:05.356: Error sending to 10.0.3.1: Bad file number.
09/14 10:22:05.356: Dispatching netmon request while another in progress.
09/14 10:22:05.356: Received a SEND MSG command. Dst: 10.0.3.1.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:08.538: netmon response: Adapter is down
09/14 10:22:08.538: Adapter status successfully sent.
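At this point en0 no longer sends heartbeats. The backup node's en0 finds that heartbeats sent to 10.0.3.2 fail and receives a command to stop sending heartbeats; afterwards the address it sends heartbeats to changes to 10.0.3.255.
After en0 is brought back up, heartbeats resume normally.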
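The NIM log excerpt above can be summarized with a few lines of parsing, for example to see how long it took from the first send error until netmon reported the adapter down. This is a minimal sketch: the helper names are hypothetical, and because the log lines carry no year, a placeholder year is assumed purely so the timestamps can be parsed.

```python
from datetime import datetime

LOG = """\
09/14 10:22:05.356: Error sending to 10.0.3.1: Bad file number.
09/14 10:22:05.356: Dispatching netmon request while another in progress.
09/14 10:22:05.356: Received a SEND MSG command. Dst: 10.0.3.1.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:05.376: Error sending to 10.0.3.1: Network is down.
09/14 10:22:08.538: netmon response: Adapter is down
09/14 10:22:08.538: Adapter status successfully sent.
"""

ASSUMED_YEAR = 2000  # placeholder only; the log format has no year field


def parse_line(line):
    """Split 'MM/DD HH:MM:SS.mmm: message' into (datetime, message)."""
    stamp, _, message = line.partition(": ")
    when = datetime.strptime(f"{ASSUMED_YEAR}/{stamp}",
                             "%Y/%m/%d %H:%M:%S.%f")
    return when, message


def summarize(log_text):
    """Count send errors and time from first error to 'Adapter is down'."""
    errors = []
    down_at = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        when, message = parse_line(line)
        if message.startswith("Error sending to"):
            errors.append(when)
        elif "Adapter is down" in message and down_at is None:
            down_at = when
    delay = None
    if errors and down_at is not None:
        delay = (down_at - errors[0]).total_seconds()
    return {"send_errors": len(errors), "seconds_to_adapter_down": delay}


summary = summarize(LOG)
print(summary)  # {'send_errors': 6, 'seconds_to_adapter_down': 3.182}
```

For this excerpt, the adapter was declared down about 3.2 seconds after the first failed send.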
4. The other heartbeat rings behave similarly.
5. Change heartbeat-related parameters
Extended Configuration --> Extended Topology Configuration -->
Configure HACMP Network Modules -->
Change a Network Module using Predefined Values
Select ether, diskhb, and rs232 in turn
Change a Cluster Network Module using Pre-defined Values
[Entry Fields]
* Network Module Name diskhb
Description Disk Heartbeat Serial protocol
Failure Detection Rate Slow
NOTE: Changes made to this panel must be
propagated to the other nodes by
Verifying and Synchronizing the cluster
The Slow value can be changed to Normal or Fast
The Show a Network Module menu can also be used to view the current settings
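The Failure Detection Rate setting determines the heartbeat interval and the number of missed beats tolerated, which together bound how quickly a failure is noticed. The sketch below estimates that time from the "HB Interval" and "Sensitivity" values seen in the lssrc output earlier. The multiplier of 2 is an assumption based on the commonly cited rule of thumb (interval x missed beats x 2) and is not stated in this post; verify it against the documentation for your HACMP level. The function name `detection_time` is hypothetical.

```python
def detection_time(hb_interval, sensitivity, factor=2):
    """Estimate seconds needed to detect a failure on one heartbeat ring.

    factor=2 is an assumed rule-of-thumb multiplier, not taken from this post.
    """
    return hb_interval * sensitivity * factor


# (HB Interval in seconds, Sensitivity in missed beats) from the lssrc output.
observed = {
    "ether": (1.0, 10),
    "rs232": (2.0, 5),
    "diskhb": (2.0, 4),
}

times = {name: detection_time(interval, beats)
         for name, (interval, beats) in observed.items()}
for name, seconds in times.items():
    print(f"{name}: about {seconds:.0f} s")
```

Under that assumption the Ethernet and serial rings would detect a failure in about 20 seconds and the disk ring in about 16 seconds; a faster detection rate shortens these times at the cost of more false alarms on a loaded system.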