Resolving the "Could not establish cib_ro connection: Connection refused (111)" error
Environment: pacemaker, corosync, and crmsh installed from RPM packages; the firewall is stopped, SELinux is disabled, and the other prerequisites are in place.
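As a quick sanity check, here is a minimal sketch of verifying those two prerequisites on a CentOS/RHEL 6 style system (assumes SysV init scripts; run on both nodes):
service iptables stop        # stop the firewall for the current session
chkconfig iptables off       # keep it off across reboots
getenforce                   # should print "Disabled"; if not, set SELINUX=disabled in /etc/selinux/config and reboot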
I. The error
The /etc/corosync/corosync.conf in use when the error occurred was:
totem {
version: 2
secauth: off
threads: 0
interface {
ringnumber: 0
bindnetaddr: 192.168.85.0
mcastaddr: 230.100.100.7
mcastport: 5405
ttl: 1
}
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
to_syslog: no
logfile: /var/log/cluster/corosync.log
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
amf {
mode: disabled
}
service {
ver: 0
name: pacemaker # have corosync start pacemaker automatically when corosync starts
}
aisexec { # run corosync's AIS component as the following user and group
user: root
group: root
}
corosync started successfully on both nodes, and the log showed:
1. Check whether the corosync engine started:
[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
Nov 05 21:32:49 corosync [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Nov 05 21:32:49 corosync [MAIN ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
2. Check whether cluster membership was formed correctly:
[root@node1 ~]# grep TOTEM /var/log/cluster/corosync.log
Nov 05 21:32:49 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Nov 05 21:32:49 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Nov 05 21:32:49 corosync [TOTEM ] The network interface [192.168.85.144] is now up.
Nov 05 21:32:49 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Nov 05 21:34:06 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
3. Check whether pacemaker started:
[root@node1 ~]# grep pcmk_startup /var/log/cluster/corosync.log
Nov 05 21:32:49 corosync [pcmk ] info: pcmk_startup: CRM: Initialized
Nov 05 21:32:49 corosync [pcmk ] Logging: Initialized pcmk_startup
Nov 05 21:32:49 corosync [pcmk ] info: pcmk_startup: Maximum core file size is: 4294967295
Nov 05 21:32:49 corosync [pcmk ] info: pcmk_startup: Service: 9
Nov 05 21:32:49 corosync [pcmk ] info: pcmk_startup: Local hostname: node1.a.com
4. Check whether any errors were produced during startup:
[root@node1 ~]# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Nov 05 21:32:49 corosync [pcmk ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Nov 05 21:32:49 corosync [pcmk ] ERROR: process_ais_conf: Please see Chapter 8 of 'Clusters from Scratch' () for details on using Pacemaker with CMAN
[pcmk ] ERROR: pcmk_wait_dispatch: Child process cib terminated with signal 6 (pid=1511, core=true)
...followed by many more, largely similar, error messages.
5. Check the cluster status:
[root@node1 ~]# crm status
Could not establish cib_ro connection: Connection refused (111)
ERROR: status: crm_mon (rc=107): Connection to cluster failed: Transport endpoint is not connected
[root@node1 ~]# crm_mon
crm_mon also fails with an error saying it cannot connect to the cluster. I checked and pacemaker was not running; on top of that, service pacemaker start kept failing, and both nodes behaved the same way.
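For reference, a minimal sketch of those checks (assumes SysV init scripts; crm_mon -1 is the one-shot mode):
service pacemaker status     # showed pacemaker as stopped in the broken setup
service pacemaker start      # kept failing with the ver: 0 plugin configuration
crm_mon -1                   # one-shot status check; failed with the same connection error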
II. The fix
After checking the documentation on the official site, I left everything else unchanged and modified only the configuration file:
[root@node1 corosync]# cat corosync.conf | egrep -v '#'
compatibility: whitetank
totem {
version: 2
secauth: on
threads: 0
interface {
ringnumber: 0
bindnetaddr: 192.168.85.0
mcastaddr: 239.245.4.1
mcastport: 5405
ttl: 1
}
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: no
debug: off
timestamp: on
logger_subsys {
subsys: AMF
debug: off
}
}
service {
name: pacemaker
ver: 1
}
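corosync.conf must be identical on both nodes; a minimal sketch of copying it to the second node (assuming it is reachable as node2.a.com, as in the logs above):
scp /etc/corosync/corosync.conf node2.a.com:/etc/corosync/corosync.conf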
Worried that the authentication key file might also be a problem (the new configuration turns secauth on), I replaced the key I had previously obtained with a shortcut with one generated from genuine manual keyboard input;
Restarting corosync then succeeded on both nodes, and starting pacemaker succeeded as well;
Check whether pacemaker started its child processes:
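A minimal sketch of regenerating and distributing that key (corosync-keygen reads from /dev/random, so it may block until enough entropy is gathered from keyboard input; node2.a.com is assumed to be the peer node):
corosync-keygen                                              # writes /etc/corosync/authkey; press keys until it finishes
scp /etc/corosync/authkey node2.a.com:/etc/corosync/authkey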
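A minimal sketch of that startup sequence with ver: 1 (run on both nodes; assumes SysV init scripts):
service corosync restart     # with ver: 1 this only brings up membership and quorum
service pacemaker start      # the pacemaker daemons now have to be started separately
chkconfig corosync on        # optional: also start both services at boot, corosync first
chkconfig pacemaker on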
[root@node1 ~]# grep -e pacemakerd.*get_config_opt -e pacemakerd.*start_child -e "Starting Pacemaker" /var/log/cluster/corosync.log
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: get_config_opt: Found 'no' for option: to_syslog
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: get_config_opt: Defaulting to 'daemon' for option: syslog_facility
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: notice: main: Starting Pacemaker 1.1.11 (Build: 97629de): generated-manpages agent-manpages ascii-docs ncurses libqb-logging libqb-ipc nagios corosync-plugin cman acls
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Using uid=189 and group=189 for process cib
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30944 for process cib
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30945 for process stonith-ng
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30946 for process lrmd
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Using uid=189 and group=189 for process attrd
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30947 for process attrd
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Using uid=189 and group=189 for process pengine
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30948 for process pengine
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Using uid=189 and group=189 for process crmd
Nov 05 21:35:12 [30938] node1.a.com pacemakerd: info: start_child: Forked child 30949 for process crmd
[root@node1 ~]# ps axf
............
30938 pts/1 S 0:00 pacemakerd
30944 ? Ss 0:00 \_ /usr/libexec/pacemaker/cib
30945 ? Ss 0:00 \_ /usr/libexec/pacemaker/stonithd
30946 ? Ss 0:00 \_ /usr/libexec/pacemaker/lrmd
30947 ? Ss 0:00 \_ /usr/libexec/pacemaker/attrd
30948 ? Ss 0:00 \_ /usr/libexec/pacemaker/pengine
30949 ? Ss 0:00 \_ /usr/libexec/pacemaker/crmd
Check the cluster status:
[root@node1 ~]# crm_mon
Last updated: Thu Nov 5 21:39:24 2015
Last change: Thu Nov 5 21:39:09 2015
Stack: classic openais (with plugin)
Current DC: node1.a.com - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
0 Resources configured
Online: [ node1.a.com node2.a.com ]
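With both nodes online, re-running the command that originally failed should now connect to the CIB instead of printing the error:
crm status                   # should now show the same summary as crm_mon rather than "Connection refused (111)"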
Notes:
1. In my case the cause was probably that this environment does not support having corosync launch pacemaker as a plugin, so corosync could start but pacemaker kept running into problems. That matches what I saw with the broken configuration: after corosync started, the pacemaker service showed as stopped and could not be started by hand, because with ver: 0 pacemaker runs as a corosync plugin and is supposed to be launched by corosync rather than started manually;
2. On the ver value (0 or 1) in the service section:
In short:
0 means pacemaker runs as a corosync plugin and is started by corosync itself;
1 means pacemaker runs as a standalone set of daemons, i.e. after starting corosync you must also start the pacemaker daemon manually (see the sketch below);
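A side-by-side sketch of the two modes (startup commands assume SysV init scripts):
# ver: 0 - plugin mode: corosync forks the pacemaker daemons itself
service {
name: pacemaker
ver: 0
}
# only "service corosync start" is needed (but, as above, this mode is unsupported here)
# ver: 1 - standalone mode: corosync only provides membership and quorum
service {
name: pacemaker
ver: 1
}
# "service corosync start" followed by "service pacemaker start"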
3. The relevant passage from the documentation:
service {
name: pacemaker
ver: 1
}
When run in version 1 mode, the plugin does not start the Pacemaker daemons. Instead
it just sets up the quorum and messaging interfaces needed by the rest of the stack.
Starting the daemons occurs when the Pacemaker init script is invoked. This resolves
two long standing issues:
a. Forking inside a multi-threaded process like Corosync causes all sorts of pain.
This has been problematic for Pacemaker as it needs a number of daemons to be
spawned.
b. Corosync was never designed for staggered shutdown - something previously needed
in order to prevent the cluster from leaving before Pacemaker could stop all
active resources.