Category: LINUX
2009-08-12 09:51:48
References for this article:
Red Hat official site:
Red Hat GFS 6.1 Administrator's Guide
Red Hat Cluster Suite: Configuring and Managing a Cluster
ChinaUnix (CU) forum:
GFS 6.1 on RHAS4 U2 installation guide
RedHat GFS cluster file system beginner's and advanced resource thread
Building an Exchange cluster with VMware GSX and W2K Cluster Service
Note: sections 5-10 of this article largely follow suran007's write-up, minus the LVM parts, which I could not reproduce in my environment. Consider this the GPL notice.
Later I will write up a configuration with LVS on the front end and GFS on the back end; I hope it proves useful.
1. Test Environment
Host: one PC with a 64-bit AMD CPU and 1 GB of RAM, running CentOS-4.4-x86_64.
Three virtual machines were installed on this host, all running CentOS-4.4-x86_64, with no kernel customization and no kernel patches; the X Window System was installed as well. The shared disk was created by following the article referenced above: all three virtual machines share one disk. Unlike that article, I configured two network cards from the start.
2. Required Packages and Installation Order
Package download location:
2.1 Install the required packages on all nodes. For the complete package list, see the GFS 6.1 user manual (INSTALL).
rgmanager — Manages cluster services and resources
system-config-cluster — Contains the Cluster Configuration Tool, used to graphically configure the cluster and display the current status of the nodes, resources, fencing agents, and cluster services
ccsd — Contains the cluster configuration services daemon (ccsd) and associated files
magma — Contains an interface library for cluster lock management
magma-plugins — Contains plugins for the magma library
cman — Contains the Cluster Manager (CMAN), which is used for managing cluster membership, messaging, and notification
cman-kernel — Contains required CMAN kernel modules
dlm — Contains distributed lock management (DLM) library
dlm-kernel — Contains required DLM kernel modules
fence — The cluster I/O fencing system that allows cluster nodes to connect to a variety of network power switches, fibre channel switches, and integrated
power management interfaces
gulm — Contains the GULM lock management userspace tools and libraries (an alternative to using CMAN and DLM).
iddev — Contains libraries used to identify the file system (or volume manager) in which a device is formatted
Also, you can optionally install Red Hat GFS on your Red Hat Cluster Suite. Red Hat GFS consists of the following RPMs:
GFS — The Red Hat GFS module
GFS-kernel — The Red Hat GFS kernel module
gnbd — The GFS Network Block Device module
gnbd-kernel — Kernel module for the GFS Network Block Device
lvm2-cluster — Cluster extensions for the logical volume manager
GFS-kernheaders — GFS kernel header files
gnbd-kernheaders — gnbd kernel header files
2.2 Software Installation Order
Installation script, install.sh:
#!/bin/bash
rpm -ivh kernel-smp-2.6.9-42.EL.x86_64.rpm
rpm -ivh kernel-smp-devel-2.6.9-42.EL.x86_64.rpm
rpm -ivh perl-Net-Telnet-3.03-3.noarch.rpm
rpm -ivh magma-1.0.6-0.x86_64.rpm
rpm -ivh magma-devel-1.0.6-0.x86_64.rpm
rpm -ivh ccs-1.0.7-0.x86_64.rpm
rpm -ivh ccs-devel-1.0.7-0.x86_64.rpm
rpm -ivh cman-kernel-2.6.9-45.4.centos4.x86_64.rpm
rpm -ivh cman-kernheaders-2.6.9-45.4.centos4.x86_64.rpm
rpm -ivh cman-1.0.11-0.x86_64.rpm
rpm -ivh cman-devel-1.0.11-0.x86_64.rpm
rpm -ivh dlm-kernel-2.6.9-42.12.centos4.x86_64.rpm
rpm -ivh dlm-kernheaders-2.6.9-42.12.centos4.x86_64.rpm
rpm -ivh dlm-1.0.1-1.x86_64.rpm
rpm -ivh dlm-devel-1.0.1-1.x86_64.rpm
rpm -ivh fence-1.32.25-1.x86_64.rpm
rpm -ivh GFS-6.1.6-1.x86_64.rpm
rpm -ivh GFS-kernel-2.6.9-58.2.centos4.x86_64.rpm
rpm -ivh GFS-kernheaders-2.6.9-58.2.centos4.x86_64.rpm
#gnbd:
rpm -ivh gnbd-kernel-2.6.9-9.43.centos4.x86_64.rpm
rpm -ivh gnbd-kernheaders-2.6.9-9.43.centos4.x86_64.rpm
rpm -ivh gnbd-1.0.7-0.x86_64.rpm
rpm -ivh gulm-1.0.7-0.x86_64.rpm
rpm -ivh gulm-devel-1.0.7-0.x86_64.rpm
rpm -ivh iddev-2.0.0-3.x86_64.rpm
rpm -ivh iddev-devel-2.0.0-3.x86_64.rpm
rpm -ivh magma-plugins-1.0.9-0.x86_64.rpm
rpm -ivh rgmanager-1.9.53-0.x86_64.rpm
rpm -ivh system-config-cluster-1.0.25-1.0.noarch.rpm
rpm -ivh ipvsadm-1.24-6.x86_64.rpm
rpm -ivh piranha-0.8.2-1.x86_64.rpm --nodeps
Note: some of these packages have dependencies; install those with the --nodeps switch. Also, since I installed the X Window System, the GNOME packages are not listed here.
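As a quick sanity check after the script finishes, you can list what was actually installed (a minimal sketch, not part of the original procedure; adjust the pattern as needed):
rpm -qa | grep -Ei 'ccs|cman|dlm|fence|gfs|gnbd|magma|rgmanager'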
2.3 Edit the /etc/hosts file on each node (identical on every node)
As follows:
[root@gfs-node01 etc]# cat hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
#127.0.0.1 gfs-node01 localhost.localdomain localhost
192.168.10.1 gfs-node01
192.168.10.2 gfs-node02
192.168.10.3 gfs-node03
192.168.10.1 gnbd-server
[root@gfs-node01 etc]#
Note: I put gnbd-server and node01 on the same machine; in a real deployment it is best to separate them if resources allow.
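Before going further, it is worth confirming that every node can resolve and reach the others (a quick check, not part of the original procedure):
for h in gfs-node01 gfs-node02 gfs-node03 gnbd-server; do
    ping -c 1 $h > /dev/null && echo "$h OK"
done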
3. Partitioning on node01
Run dmesg | grep scsi to check the SCSI devices, as follows:
[root@gfs-node01 ~]# dmesg|grep scsi
scsi0 : ioc0: LSI53C1030, FwRev=00000000h, Ports=1, MaxQ=128, IRQ=169
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
This shows the SCSI device was recognized. Next, partition it with fdisk. Note: only partition, do not format. Here I used fdisk to create two 3 GB partitions, /dev/sda1 and /dev/sda2, as shown below:
[root@gfs-node01 ~]# fdisk -l

Disk /dev/hda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1          13      104391   83  Linux
/dev/hda2              14        1044     8281507+  8e  Linux LVM

Disk /dev/sda: 8589 MB, 8589934592 bytes
255 heads, 63 sectors/track, 1044 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         366     2939863+  83  Linux
/dev/sda2             367         732     2939895   83  Linux
[root@gfs-node01 ~]#
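For reference, here is a sketch of an equivalent non-interactive fdisk session (the cylinder boundaries 366 and 732 are taken from the listing above and will differ on other disks; again, do not format the partitions):
# Create two ~3 GB partitions, /dev/sda1 and /dev/sda2
fdisk /dev/sda << EOF
n
p
1
1
366
n
p
2
367
732
w
EOF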
4. Configure with system-config-cluster
(Note: system-config-cluster requires X, which is why X was installed for this first test run. For a real deployment you only need to edit the file below by hand.)
Add the three nodes and set each node's vote weight to 1, i.e. one quorum vote per node.
The three node names are:
gfs-node01
gfs-node02
gfs-node03
Edit the cluster.conf file:
[root@gfs-node01 ~]# cat /etc/cluster/cluster.conf
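(The XML itself was swallowed when this post was published, apparently eaten as HTML. A minimal sketch consistent with the settings used here, i.e. cluster name alpha_cluster and three one-vote nodes, might look like the following; fence_manual is only a placeholder, since the fencing agent actually used is not recoverable from the post:)
<?xml version="1.0"?>
<cluster name="alpha_cluster" config_version="1">
  <cman/>
  <clusternodes>
    <clusternode name="gfs-node01" votes="1">
      <fence>
        <method name="single">
          <device name="manual" nodename="gfs-node01"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gfs-node02" votes="1">
      <fence>
        <method name="single">
          <device name="manual" nodename="gfs-node02"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="gfs-node03" votes="1">
      <fence>
        <method name="single">
          <device name="manual" nodename="gfs-node03"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="manual" agent="fence_manual"/>
  </fencedevices>
</cluster>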
Use the scp command to copy this configuration file to nodes 02 and 03.
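For example (assuming root SSH access between the nodes):
[root@gfs-node01 ~]# scp /etc/cluster/cluster.conf gfs-node02:/etc/cluster/
[root@gfs-node01 ~]# scp /etc/cluster/cluster.conf gfs-node03:/etc/cluster/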
5. Start dlm, ccsd, fence, and Related Services on Nodes 01/02/03
5.1 Load the dlm module on all three nodes
[root@gfs-node01 cluster]# modprobe lock_dlm
[root@gfs-node02 cluster]# modprobe lock_dlm
[root@gfs-node03 cluster]# modprobe lock_dlm
5.2 Start the ccsd service
[root@gfs-node01 cluster]# ccsd
[root@gfs-node02 cluster]# ccsd
[root@gfs-node03 cluster]# ccsd
5.3 Start the cluster manager (cman) on all three nodes
[root@gfs-node01 cluster]# /sbin/cman_tool join
[root@gfs-node02 cluster]# /sbin/cman_tool join
[root@gfs-node03 cluster]# /sbin/cman_tool join
5.4 Test the ccsd service
(Note: the ccsd test must wait until cman has finished starting; only then can the tests below be run.)
[root@gfs-node01 cluster]# ccs_test connect
[root@gfs-node02 cluster]# ccs_test connect
[root@gfs-node03 cluster]# ccs_test connect
The output of ccs_test connect on each node is as follows:
node 1:
[root@gfs-node01 cluster]# ccs_test connect
Connect successful.
Connection descriptor = 0
node 2:
[root@gfs-node02 cluster]# ccs_test connect
Connect successful.
Connection descriptor = 30
node 3:
[root@gfs-node03 cluster]# ccs_test connect
Connect successful.
Connection descriptor = 60
5.5 Check node status
Run cat /proc/cluster/nodes; it should return:
[root@gfs-node01 cluster]# cat /proc/cluster/nodes
Node  Votes  Exp  Sts  Name
   1      1    3    M  gfs-node01
   2      1    3    M  gfs-node02
   3      1    3    M  gfs-node03
[root@gfs-node01 cluster]#
6. Join the Fence Domain
[root@gfs-node01 cluster]# /sbin/fence_tool join
[root@gfs-node02 cluster]# /sbin/fence_tool join
[root@gfs-node03 cluster]# /sbin/fence_tool join
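Once all three joins return, you can confirm fence-domain membership from any node; the default fence domain should list all three nodes once they have joined (only the command is shown, since the exact output format varies by release):
[root@gfs-node01 cluster]# cat /proc/cluster/services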
7. Check Cluster Status
Node 1:
[root@gfs-node01 cluster]# cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 1
Cluster name: alpha_cluster
Cluster ID: 50356
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 1
Node name: gfs-node01
Node ID: 1
Node addresses: 192.168.10.1
Node 2:
[root@gfs-node02 cluster]# cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 1
Cluster name: alpha_cluster
Cluster ID: 50356
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 1
Node name: gfs-node02
Node ID: 2
Node addresses: 192.168.10.2
Node 3:
[root@gfs-node03 cluster]# cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 1
Cluster name: alpha_cluster
Cluster ID: 50356
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 3
Expected_votes: 3
Total_votes: 3
Quorum: 2
Active subsystems: 1
Node name: gfs-node03
Node ID: 3
Node addresses: 192.168.10.3
8. Export the Device on gnbd-server
8.1 Start the gnbd_serv process
[root@gfs-node01 cluster]# /sbin/gnbd_serv -v -n
gnbd_serv: startup succeeded
Export the device:
[root@gfs-node01 cluster]# gnbd_export -v -e gfs -d /dev/sda1 -c
gnbd_export: created GNBD gfs serving file /dev/sda1
[root@gfs-node01 cluster]#
Check the export status:
[root@gfs-node01 cluster]# gnbd_export -v -l
Server[1] : gfs
--------------------------
file : /dev/sda1
sectors : 5879727
readonly : no
cached : yes
timeout : no
[root@gfs-node01 cluster]#
8.2 Import the device on nodes 01/02/03. First load the gnbd module on all three nodes:
[root@gfs-node01 cluster] # modprobe gnbd
[root@gfs-node02 cluster] # modprobe gnbd
[root@gfs-node03 cluster] # modprobe gnbd
Import the device:
Node1:
[root@gfs-node01 cluster]# gnbd_import -v -i gnbd-server
gnbd_import: created gnbd device gfs
gnbd_recvd: gnbd_recvd started
[root@gfs-node01 cluster]#
Note: the gnbd-server here is the hostname entry you put in /etc/hosts earlier.
Node2:
[root@gfs-node02 cluster]# gnbd_import -v -i gnbd-server
gnbd_import: created directory /dev/gnbd
gnbd_import: created gnbd device gfs
gnbd_recvd: gnbd_recvd started
[root@gfs-node02 cluster]#
Node3:
[root@gfs-node03 cluster]# gnbd_import -v -i gnbd-server
gnbd_import: created directory /dev/gnbd
gnbd_import: created gnbd device gfs
gnbd_recvd: gnbd_recvd started
[root@gfs-node03 cluster]#
8.3 Check the import status (do this on all three nodes; only node01 is shown here)
[root@gfs-node01 cluster]# gnbd_import -v -l
Device name : gfs
----------------------
Minor # : 0
sysfs name : /block/gnbd0
Server : gnbd-server
Port : 14567
State : Close Connected Clear
Readonly : No
Sectors : 5879727
[root@gfs-node01 cluster]#
9. Create and Mount the GFS File System
9.1 Load the gfs module on all three nodes
[root@gfs-node01 cluster] # modprobe gfs
[root@gfs-node02 cluster] # modprobe gfs
[root@gfs-node03 cluster] # modprobe gfs
9.2 Create the GFS file system on gnbd-server (which is also node01 in my setup)
[root@gfs-node01 cluster]# gfs_mkfs -p lock_dlm -t alpha_cluster:gfs -j 3 /dev/gnbd/gfs
This will destroy any data on /dev/gnbd/gfs.
It appears to contain a GFS filesystem.
Are you sure you want to proceed? [y/n] y
Device: /dev/gnbd/gfs
Blocksize: 4096
Filesystem Size: 669344
Journals: 3
Resource Groups: 12
Locking Protocol: lock_dlm
Lock Table: alpha_cluster:gfs
Syncing...
All Done
[root@gfs-node01 cluster]#
9.3 Mount the file system on all three nodes
Create a directory named gfstest under / on each of the three nodes, then:
[root@gfs-node01 cluster] # mount -t gfs /dev/gnbd/gfs /gfstest
[root@gfs-node02 cluster] # mount -t gfs /dev/gnbd/gfs /gfstest
[root@gfs-node03 cluster] # mount -t gfs /dev/gnbd/gfs /gfstest
Note: wait until the mount on node01 has fully completed before mounting on node02, and wait for node02 to finish before mounting on node03.
Note: when mounting on node03 I hit the following error:
[root@gfs-node03 cluster]# mount -t gfs /dev/gnbd/gfs /gfstest
mount: wrong fs type, bad option, bad superblock on /dev/gnbd/gfs,
or too many mounted file systems
On investigation, I found that I had originally run gfs_mkfs -p lock_dlm -t alpha_cluster:gfs -j 2 /dev/gnbd/gfs, which creates journals for only two nodes. I redid everything from section 9, unmounted, and the mount then succeeded. Hopefully this saves you the same trouble. To restate:
gfs_mkfs -p lock_dlm -t alpha_cluster:gfs -j 3 /dev/gnbd/gfs
The 3 here is the number of journals, one per node; set it to match however many nodes you actually have.
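Once all three mounts succeed, a quick cross-node check confirms the file system is really shared (a sketch; gfs_tool ships with the GFS package installed above):
# On node01: show GFS details (block size, journal count) and write a test file
gfs_tool df /gfstest
echo "hello from node01" > /gfstest/test.txt
# On node02 and node03: the file should appear immediately
cat /gfstest/test.txt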
To have everything come back up automatically after a server reboot:
Method 1: add the following to rc.local (this method presumes the gnbd service is in use):
/sbin/modprobe gnbd
/sbin/modprobe gfs
/sbin/gnbd_serv
#/sbin/gnbd_export -d /dev/sdb1 -e gfs-1 -U -t 5
/sbin/gnbd_export -v -e gfs2 -d /dev/sdc1 -c
/sbin/gnbd_export -v -e gfs1 -d /dev/sdb1 -c
sleep 2
/sbin/gnbd_import -i 192.168.0.50
sleep 2
mount -t gfs /dev/gnbd/gfs2 /mnt/gfs2
sleep 2
mount -t gfs /dev/gnbd/gfs1 /mnt/gfs1
And on another machine, a GFS client, in rc.local:
/sbin/modprobe gnbd
/sbin/modprobe gfs
/sbin/gnbd_import -i 192.168.0.50
mount -t gfs /dev/gnbd/gfs1 /mnt/gfs1
sleep 2
mount -t gfs /dev/gnbd/gfs2 /mnt/gfs2
Method 2: attach the storage array directly, without gnbd. (With this method there is no gnbd server at all. Whether concurrent writes can fail this way still needs further study, but my tests showed no problems; it behaves the same as with gnbd, and since only one server at a time can write a given file, locking is clearly in effect.)
Add to /etc/fstab:
/dev/sdb1 /mnt/gfs1 gfs _netdev 0 0
/dev/sdc1 /mnt/gfs2 gfs _netdev 0 0
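Note that method 2 only works if the cluster stack is already up when the fstab entries are mounted. The RPMs installed earlier ship init scripts for this; a sketch of enabling them with chkconfig (service names as shipped in these packages; the gfs init script is the one that mounts the GFS entries from /etc/fstab at boot):
chkconfig ccsd on
chkconfig cman on
chkconfig fenced on
chkconfig gfs on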