Installing the GFS Cluster File System on RHEL5 (author: chenwenming)

Install the required packages:
rpm -ivh perl-Net-Telnet-3.03-5.noarch.rpm
rpm -ivh perl-XML-SAX-0.14-5.noarch.rpm
rpm -ivh perl-XML-NamespaceSupport-1.09-1.2.1.noarch.rpm
rpm -ivh perl-XML-LibXML-Common-0.13-8.2.2.i386.rpm
rpm -ivh perl-XML-LibXML-1.58-5.i386.rpm
rpm -ivh pexpect-2.3-1.el5.noarch.rpm
rpm -ivh openais-0.80.3-22.el5.i386.rpm
rpm -ivh ipvsadm-1.24-8.1.i386.rpm
rpm -ivh piranha-0.8.4-11.el5.i386.rpm
rpm -ivh gfs2-utils-0.1.53-1.el5.i386.rpm
rpm -ivh gfs-utils-0.1.18-1.el5.i386.rpm
rpm -ivh kmod-gfs-xen-0.1.31-3.el5.i686.rpm
rpm -ivh cman-2.0.98-1.el5.i386.rpm
rpm -ivh rgmanager-2.0.46-1.el5.centos.i386.rpm
rpm -ivh system-config-cluster-1.0.55-1.0.noarch.rpm
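Note that kmod-gfs-xen supplies the gfs kernel module only for the Xen kernel variant (consistent with the 2.6.18-128.el5xen kernel visible in the `last` output at the end of this article). A quick sanity check, assuming you want to confirm the module is loadable:

uname -r                             # should end in "xen"; on a plain kernel install kmod-gfs instead
modprobe gfs && lsmod | grep gfs     # the gfs module should now load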
Configure hosts
Edit /etc/hosts and add:
192.168.0.23 gfs3
192.168.0.22 gfs2
192.168.0.21 gfs1
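Every node needs identical name resolution, and these names must match the clusternode names in cluster.conf below. The simplest approach is to copy the file to the other nodes:

scp /etc/hosts gfs2:/etc/hosts
scp /etc/hosts gfs3:/etc/hosts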
Configure the cluster
Edit /etc/cluster/cluster.conf and add:
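The cluster.conf XML did not survive the original page's formatting, so the following is a reconstruction, assuming the cluster name alpha_cluster (it appears in the clustat and gfs_mkfs output later), the three nodes defined above, and the three iLO interfaces tested in the next step. The fence device names (ilo-gfs1 etc.) are made up, and older RHEL5 fence_ilo builds take the iLO address as hostname= while newer ones use ipaddr=, so verify against fence_ilo -h on your system:

<?xml version="1.0"?>
<cluster name="alpha_cluster" config_version="1">
  <cman/>
  <clusternodes>
    <clusternode name="gfs1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-gfs1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gfs2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-gfs2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gfs3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="ilo-gfs3"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" name="ilo-gfs1" hostname="192.168.0.11" login="Administrator" passwd="123456"/>
    <fencedevice agent="fence_ilo" name="ilo-gfs2" hostname="192.168.0.12" login="Administrator" passwd="123456"/>
    <fencedevice agent="fence_ilo" name="ilo-gfs3" hostname="192.168.0.13" login="Administrator" passwd="123456"/>
  </fencedevices>
  <rm/>
</cluster>

Copy the same file to /etc/cluster/ on all three nodes (e.g. with scp) before starting cman.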
Test the fence devices
fence_ilo -a 192.168.0.11 -l Administrator -p 123456 -o status
Status: ON
fence_ilo -a 192.168.0.12 -l Administrator -p 123456 -o status
Status: ON
fence_ilo -a 192.168.0.13 -l Administrator -p 123456 -o status
Status: ON
This confirms that the fence devices on all three servers are working.
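The same agent performs the power actions the cluster will later trigger automatically; for example, the following should hard-reboot gfs3 through its iLO (only try this on a test machine):

fence_ilo -a 192.168.0.13 -l Administrator -p 123456 -o reboot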
Start the cluster services
[root@gfs1 ~]# service cman start
Starting cluster:
Enabling workaround for Xend bridged networking... done
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... done
[ OK ]
[root@gfs1 ~]# service rgmanager start
Start both services on each of the three servers.
Check the cluster status
[root@gfs1 ~]# clustat
Cluster Status for alpha_cluster @ Fri Sep 11 16:06:05 2009
Member Status: Quorate
Member Name                  ID   Status
------ ----                  ---- ------
gfs1                            1 Online, Local
gfs2                            2 Online
gfs3                            3 Online
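clustat is part of rgmanager; membership and quorum can also be checked through cman itself, which is handy when rgmanager is not running:

cman_tool status    # quorum, votes, config version
cman_tool nodes     # one line per member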
At this point the cluster itself is configured and working; what is still missing is the GFS filesystem.
Since this environment has no NAS or SAN array, I simulate shared storage with the iscsi-initiator-utils / scsi-target-utils combination; the detailed steps are in another post on my blog.
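For completeness, a rough sketch of that simulation, assuming a separate storage host at 192.168.0.10 (a made-up address) exporting its /dev/sdb, with scsi-target-utils on the storage host and iscsi-initiator-utils on the nodes; see the referenced post for the exact steps:

# on the storage host: start the target daemon and export one LUN
service tgtd start
tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2009-09.local:gfs.disk1
tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

# on each cluster node: discover the target and log in;
# the LUN then shows up as a new SCSI disk (/dev/sda below)
service iscsi start
iscsiadm -m discovery -t sendtargets -p 192.168.0.10
iscsiadm -m node -T iqn.2009-09.local:gfs.disk1 -p 192.168.0.10 -l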
Create the GFS filesystem
gfs_mkfs -p lock_dlm -t alpha_cluster:gfs -j 3 /dev/sda1
It appears to contain a GFS filesystem.
Are you sure you want to proceed? [y/n] y
Device: /dev/sda1
Blocksize: 4096
Filesystem Size: 669344
Journals: 3
Resource Groups: 12
Locking Protocol: lock_dlm
Lock Table: alpha_cluster:gfs
Syncing...
All Done
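About the flags: -p lock_dlm selects the DLM cluster locking protocol, -t takes <clustername>:<fsname> and the cluster name must match cluster.conf (alpha_cluster here), and -j creates one journal per node that will mount the filesystem. Should a fourth node be added later, an extra journal can be added to the mounted filesystem:

gfs_jadd -j 1 /mnt/gfs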
Mount the filesystem on all three nodes
[root@gfs1 cluster] # mount -t gfs /dev/sda1 /mnt/gfs
[root@gfs2 cluster] # mount -t gfs /dev/sda1 /mnt/gfs
[root@gfs3 cluster] # mount -t gfs /dev/sda1 /mnt/gfs
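A quick check that the three nodes really share one filesystem: create a file on one node and it should be visible immediately on the others.

[root@gfs1 ~]# touch /mnt/gfs/hello
[root@gfs2 ~]# ls /mnt/gfs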
Finishing up:
Edit /etc/fstab and add:
/dev/sda1 /mnt/gfs gfs defaults 0 0
Enable the services at boot (the start-up order is discussed below):
chkconfig --level 2345 rgmanager on
chkconfig --level 2345 gfs on
chkconfig --level 2345 cman on
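The order in which the chkconfig commands are typed does not matter; the boot sequence comes from the start priorities embedded in the init scripts themselves (cman must be up before gfs can mount, and rgmanager starts last). They can be inspected with:

grep '^# chkconfig:' /etc/init.d/cman /etc/init.d/gfs /etc/init.d/rgmanager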
-----------------------------------------------------------------------
Failure testing:
Unplug the network cable from host gfs3.
The log on gfs1 then shows the following:
Sep 11 16:38:01 gfs1 openais[3408]: [TOTEM] The token was lost in the OPERATIONAL state.
Sep 11 16:38:01 gfs1 openais[3408]: [TOTEM] Receive multicast socket recv buffer size (288000 bytes).
Sep 11 16:38:01 gfs1 openais[3408]: [TOTEM] Transmit multicast socket send buffer size (262142 bytes).
Sep 11 16:38:01 gfs1 openais[3408]: [TOTEM] entering GATHER state from 2.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] entering GATHER state from 0.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] Creating commit token because I am the rep.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] Saving state aru 50 high seq received 50
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] Storing new sequence id for ring 153b0
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] entering COMMIT state.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] entering RECOVERY state.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] position [0] member 192.168.0.21:
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] previous ring seq 86956 rep 192.168.0.21
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] aru 50 high delivered 50 received flag 1
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] position [1] member 192.168.0.22:
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] previous ring seq 86956 rep 192.168.0.21
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] aru 50 high delivered 50 received flag 1
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] Did not need to originate any messages in recovery.
Sep 11 16:38:06 gfs1 kernel: dlm: closing connection to node 3
Sep 11 16:38:06 gfs1 fenced[3428]: gfs3 not a cluster member after 0 sec post_fail_delay
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] Sending initial ORF token
Sep 11 16:38:06 gfs1 fenced[3428]: fencing node "gfs3"
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] CLM CONFIGURATION CHANGE
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] New Configuration:
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] r(0) ip(192.168.0.21)
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] r(0) ip(192.168.0.22)
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] Members Left:
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] r(0) ip(192.168.0.23)
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] Members Joined:
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] CLM CONFIGURATION CHANGE
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] New Configuration:
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] r(0) ip(192.168.0.21)
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] r(0) ip(192.168.0.22)
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] Members Left:
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] Members Joined:
Sep 11 16:38:06 gfs1 openais[3408]: [SYNC ] This node is within the primary component and will provide service.
Sep 11 16:38:06 gfs1 openais[3408]: [TOTEM] entering OPERATIONAL state.
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] got nodejoin message 192.168.0.21
Sep 11 16:38:06 gfs1 openais[3408]: [CLM ] got nodejoin message 192.168.0.22
Sep 11 16:38:06 gfs1 openais[3408]: [CPG ] got joinlist message from node 2
Sep 11 16:38:06 gfs1 openais[3408]: [CPG ] got joinlist message from node 1
Sep 11 16:38:19 gfs1 fenced[3428]: fence "gfs3" success
Sep 11 16:38:19 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Trying to acquire journal lock...
Sep 11 16:38:19 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Looking at journal...
Sep 11 16:38:20 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Acquiring the transaction lock...
Sep 11 16:38:20 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Replaying journal...
Sep 11 16:38:22 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Replayed 0 of 1 blocks
Sep 11 16:38:22 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: replays = 0, skips = 0, sames = 1
Sep 11 16:38:22 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Journal replayed in 3s
Sep 11 16:38:22 gfs1 kernel: GFS: fsid=alpha_cluster:gfs.2: jid=0: Done
The gfs3 host whose cable was pulled has been successfully fenced, i.e. rebooted by the fence_ilo agent; its login records confirm the reboot:
[root@gfs3 ~]# last
root pts/1 192.168.13.120 Sat Sep 12 00:45 still logged in
reboot system boot 2.6.18-128.el5xe Sat Sep 12 00:42 (00:03)
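Once gfs3 finishes booting (with cman, gfs and rgmanager enabled above, it rejoins and remounts automatically), clustat on any node should again show all three members Online:

[root@gfs1 ~]# clustat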