Category: Servers and Storage
2011-03-08 13:25:31
This section discusses how to install and configure Red Hat Cluster Suite and Global File System on your Dell|Red Hat HA Cluster system using Conga and CLI Tools.
Conga is a configuration and management suite based on a server/agent model. You can access the management server luci using a standard web browser from anywhere on the network. Luci communicates with the client agent ricci on the nodes and installs all required packages, synchronizes the cluster configuration file, and manages the storage cluster. Though there are other possible methods, such as system-config-cluster or creating an XML configuration file by hand, it is recommended that you use Conga to configure and manage your cluster.
[] Setting Up a High-Availability Cluster
The following section provides an overview of installing your cluster using Conga. For more information on using Conga, see the section Configuring Red Hat Cluster With Conga in the Cluster Administration guide on the Red Hat website.
[] Preparing the Cluster Nodes for Conga
Run the following commands on each cluster node to install and start ricci:
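The exact package set can vary by release; a typical sequence, assuming the nodes are registered to the RHN Cluster channel, is:
[root]# yum install ricci
[root]# chkconfig ricci on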
Start the ricci service:
[root]# service ricci start
Execute the following commands on the management node to install the Conga server luci:
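The following is a sketch of a typical luci installation, assuming the management node can reach the Cluster channel on RHN; luci_admin init sets the initial admin password:
[root]# yum install luci
[root]# luci_admin init
[root]# chkconfig luci on
[root]# service luci start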
NOTE: You can configure luci on any node, but it is recommended that you install luci on a dedicated management node. If you do have a dedicated management node, that node needs access to the Cluster channel on RHN or a Satellite server. You can also log in to RHN and manually download luci for installation on your management node.
For more information on configuring your cluster with Conga, see the section Configuring Red Hat Cluster With Conga on the Red Hat website, or locally from the Cluster_Administration-en-US package.
[] Creating Your Cluster Using Conga
Conga automatically installs the software required for clustering on all cluster nodes. Ensure you have completed the steps in Preparing the Cluster Nodes for Conga on every cluster node before proceeding; otherwise luci will not be able to communicate with ricci.
1. Connect to the luci server from any browser on the same network as the management node. In your web browser, enter:
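For example, assuming luci is listening on its default HTTPS port of 8084:
https://{management_node_hostname_or_IP_address}:8084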
Where {management_node_hostname_or_IP_address} is the hostname or IP address of the management server running luci.
NOTE: If you encounter any errors, see the troubleshooting sections later in this document.
2. Enter your username and password to securely log in to the luci server.
3. Go to the cluster tab.
4. Click Create a New Cluster.
5. Enter a cluster name of 15 characters or less.
6. Add the fully qualified private hostname or IP address and root password for each cluster node.
NOTE: You may also select Check if node passwords are identical and only enter the password for the first node.
NOTE: The password is sent in an encrypted SSL session, and not saved.
7. Ensure that the option for Enable Shared Storage Support is selected and click Submit.
Conga downloads and installs all the required cluster software, creates a configuration file, and reboots each cluster node. Watch the Conga status window for details on the progress of each cluster node.
NOTE: If an error message such as An error occurred when trying to contact any of the nodes in the cluster appears on the luci server web page, wait a few minutes and refresh your browser.
[] Configuring Fencing Using Conga
Fencing ensures data integrity on the shared storage file system by removing any problematic nodes from the cluster. This is accomplished by cutting off power to the system to ensure it does not attempt to write to the storage device.
In your Dell|Red Hat HA Cluster system, network power switches provide the most reliable fencing method. Remote access controllers such as DRAC or IPMI should be used as secondary fencing methods. However, if no network power switches are available for primary fencing, a secondary method such as manual fencing can be used, but it is not supported. Use a log-watching utility to notify you if your primary fencing method is failing.
When using a Dell M1000e modular blade enclosure, the Dell CMC may be used as a primary fencing method instead, as it controls power to each individual blade. In this case, each blade's individual iDRAC or IPMI may be used as a secondary fencing method.
For more information, see the section Fencing in the Cluster Suite Overview on the Red Hat website.
Configure any network power switches and remote access controllers (DRAC or IPMI) on the same private network as the cluster nodes. For details on configuring your network power switches for remote access, see the documentation for that product.
To configure fencing, follow the sections below; depending on the specific DRAC model your systems are using, one or more of them may be applicable.
[] Configure iDRAC6 Fencing
Dell PowerEdge servers using iDRAC6 need specific parameters set in order for fencing to function properly. Check with Dell and Red Hat for the latest information on iDRAC6 support in Conga.
Set the cmd_prompt option to your_iDRAC6_prompt, where your_iDRAC6_prompt is the command prompt displayed when you log in to the iDRAC6 (for example, admin1->).
Example:
Find the line for each fence device (this example assumes a two-node cluster with DRAC fencing). Change the agent to fence_drac5 and add the option cmd_prompt="admin1->" on each line:
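A sketch of what the corrected entries might look like, assuming two fence devices named node1-drac and node2-drac with placeholder IP addresses and credentials (the original lines differ only in the agent name and the missing cmd_prompt option):
<fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="password"/>
<fencedevice agent="fence_drac5" cmd_prompt="admin1->" ipaddr="192.168.0.102" login="root" name="node2-drac" passwd="password"/>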
NOTE: You must update the cluster configuration as described in the update procedure later in this section.
[] Configure DRAC CMC Fencing
The PowerEdge M1000e Chassis Management Controller (CMC) acts as a network power switch of sorts. You configure a single IP address on the CMC and connect to that IP for management. Individual blade slots can be powered up or down as needed. At this time, Conga does not have an entry for the Dell CMC when configuring fencing. The steps in this section describe how to manually configure fencing for the Dell CMC; check with Dell and Red Hat for updates on Conga support.
NOTE: At the time of this writing, there is a bug that prevents the CMC from powering the blade back up after it is fenced. To recover from a fenced outage, manually power the blade on (or connect to the CMC and issue the command racadm serveraction -m server-# powerup). Updated fence agent code available for testing can correct this behavior; check with Dell for the beta code and further discussion of this issue.
NOTE: Using the individual iDRAC on each Dell blade is not supported at this time. Instead, use the Dell CMC as described in this section. If desired, you may configure IPMI as your secondary fencing method for individual Dell blades. Check with Dell for current information on iDRAC support.
To configure your nodes for DRAC CMC fencing:
Example:
Find the line for each fence device (this example assumes a two-node cluster with DRAC CMC fencing). Change the agent to fence_drac5 and change the option modulename= to module_name= on each line:
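A sketch of the corrected entries, assuming one fence device entry per blade pointing at a single CMC with a placeholder IP address, and the blades in slots 1 and 2 (the original lines differ only in the agent name and the modulename= spelling):
<fencedevice agent="fence_drac5" ipaddr="192.168.0.120" login="root" module_name="server-1" name="drac-cmc-1" passwd="password"/>
<fencedevice agent="fence_drac5" ipaddr="192.168.0.120" login="root" module_name="server-2" name="drac-cmc-2" passwd="password"/>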
NOTE: You must update the cluster configuration as described in the update procedure later in this section.
[] Configure DRAC SSH Fencing
By default, any DRAC5/iDRAC/iDRAC6 has SSH enabled, but telnet disabled.
To use DRAC5/iDRAC/iDRAC6 fencing over SSH, check the Use SSH option when adding a fencing device to a node.
NOTE: This SSH option in Conga is included with luci-0.12.1-7.3.el5_3 and greater.
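If you maintain /etc/cluster/cluster.conf by hand instead of through Conga, the equivalent (assuming the fence_drac5 agent) is to add secure="1" to the fence device entry, for example:
<fencedevice agent="fence_drac5" ipaddr="192.168.0.101" login="root" name="node1-drac" passwd="password" secure="1"/>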
If you make any manual edits to /etc/cluster/cluster.conf, you will need to update all nodes in the cluster with the new configuration. Perform these steps from any one node to update the cluster configuration:
1. Edit /etc/cluster/cluster.conf and increment the config_version number at the top of the file by one:
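For example, assuming a cluster named my_cluster whose configuration is currently at version 2, the first line changes from:
<cluster alias="my_cluster" config_version="2" name="my_cluster">
to:
<cluster alias="my_cluster" config_version="3" name="my_cluster">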
2. Save your changes and distribute the cluster configuration file to all nodes:
[root]# ccs_tool update /etc/cluster/cluster.conf
This section describes the procedure to set up a Global File System (GFS) that is shared between the cluster nodes. Verify that the high-availability Red Hat cluster is running before setting up the storage cluster. The Dell|Red Hat HA Cluster is comprised of a Red Hat Cluster Suite high-availability cluster and a Red Hat GFS storage cluster.
Configuring shared storage consists of two steps: creating a clustered logical volume, and creating the Global File System on that volume.
For more information, see the LVM Administrator's Guide and the Global File System guide on the Red Hat website. The procedure for configuring a storage cluster is documented using both Conga and CLI tools; you can use either method, but only one needs to be completed.
[] Configuring a Storage Cluster With Conga
[] Configuring a Storage Cluster With CLI Tools
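The CLI procedure follows the same outline as the Conga procedure. A minimal sketch, assuming a shared data partition at /dev/sdb1, a clustered volume group named clustervg, a logical volume named gfslv, a two-node cluster named my_cluster (the name must match the cluster name used when the cluster was created), and two journals:
[root@node1 ~]# pvcreate /dev/sdb1
[root@node1 ~]# vgcreate -cy clustervg /dev/sdb1
[root@node1 ~]# lvcreate -l 100%FREE -n gfslv clustervg
[root@node1 ~]# gfs_mkfs -p lock_dlm -t my_cluster:gfs -j 2 /dev/clustervg/gfslv
Then mount the file system on every node, for example:
[root]# mount -t gfs /dev/clustervg/gfslv /data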
[] Managing the Cluster Infrastructure
It may be necessary to start or stop the cluster infrastructure on one or more nodes at any time. This can be accomplished through the Conga user interface, or individually on each node from the CLI.
[] Managing the Cluster Infrastructure with Conga
The easiest way to start and stop the cluster is using the Conga management interface. This starts and stops all cluster infrastructure daemons on all nodes simultaneously.
The proper procedure for starting and stopping the cluster infrastructure from the CLI is outlined below. Note that these commands need to be executed on each node; it is best to run them as close to in parallel as possible.
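For example, assuming the standard cman/clvmd/gfs/rgmanager stack, a typical start sequence on each node is:
[root]# service cman start
[root]# service clvmd start
[root]# service gfs start
[root]# service rgmanager start
To stop the cluster infrastructure, stop the same services in the reverse order.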
Before proceeding further, make sure all of the services mentioned above are started in the order listed.
This section describes the procedure to create and test HA cluster services on your Dell|Red Hat HA Cluster system.
The following steps provide an overview for creating resources:
After clicking Add a Resource, complete the fields for the resource type you are adding.
NOTE: Among the mount options, it is critical to include debug; this option causes a cluster node to panic, and therefore be fenced, if there is a problem accessing the shared storage.
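As an illustration only, a GFS file system resource in /etc/cluster/cluster.conf with the debug mount option might look like the following (the device, mount point, and resource name are placeholders):
<clusterfs device="/dev/clustervg/gfslv" fstype="gfs" mountpoint="/data" name="gfs_data" options="debug"/>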
[] Creating a Failover Domain (Optional)
A Failover Domain is a group of nodes in the cluster. By default, all nodes can run any cluster service. To provide better administrative control over cluster services, Failover Domains limit which nodes are permitted to run a service or establish node preference. For more information, see Configuring a Failover Domain in the Cluster Administration guide on the Red Hat website.
[] Creating Services
The following steps provide an overview for creating services:
Each user that has access would also need their home directory changed to the GFS root, if desired. Each node must also reference the same users through a central authentication mechanism such as NIS or LDAP, or by creating the same usernames and passwords on each node. See man vsftpd.conf for more information.
[] Example Configuration of HTTP
Configuring the Script resource as a child of the GFS resource ensures that the file system is mounted before the Samba service attempts to start, as the Samba share will reside on the GFS file system.
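A sketch of that parent/child relationship in /etc/cluster/cluster.conf, assuming a GFS resource named gfs_data and a script resource named smb_script have already been defined in the resources section:
<service autostart="1" name="samba_svc">
  <clusterfs ref="gfs_data">
    <script ref="smb_script"/>
  </clusterfs>
</service>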
You can manage cluster services from the cluster tab in Conga.
You can also use the CLI to manage services. Use the following command:
[root]# clusvcadm
For example, to relocate a service from node 1 to node 2, enter the following command:
[root]# clusvcadm -r service_name node2
Use the command clustat to view cluster status:
[root]# clustat

Item | Verified |
---|---|
Cluster and Cluster Storage | |
Red Hat Cluster Suite installed and configured | |
Nodes participating in cluster | |
Fencing configured on nodes | |
Clustered logical volume | |
Global File System | |
Services created | |
Red Hat Clustering nodes use multicast to communicate. Your switches must be configured to enable multicast addresses and support IGMP. For more information, see section 2.6, Multicast Addresses, in the Cluster Administration guide, and the documentation that came with your switches.
[] Cluster Status
Conga allows you to monitor your cluster. Alternatively, you may run the command clustat from any node. For example:
[root]# clustat
Other utilities that may help:
[root]# cman_tool nodes
Logging: Any important messages are logged to /var/log/messages. The following is an example of loss of network connectivity on node1, which causes node2 to fence it.
Nov 28 15:37:56 node2 openais[3450]: [TOTEM] previous ring seq 24 rep 172.16.0.1
The following sections describe issues you may encounter while creating the cluster initially and the possible workarounds.
[] Running luci on a Cluster Node
If you are using a cluster node also as a management node and running luci, you have to restart luci manually after the initial configuration. For example:
[root]# service luci restart
luci can be started in debug mode by changing the settings in the /var/lib/luci/etc/zope.conf file. Change the debug-mode value to on and restart luci on the management node. After debug mode is set, the debug messages are directed to /var/log/messages.
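The relevant zope.conf setting is simply the debug-mode directive; for example:
debug-mode on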
[] Issues While Creating a Cluster Initially
If the following error appears when initially installing the cluster:
The following errors occurred:
This error occurs when the luci server cannot communicate with the ricci agent. Verify that ricci is installed and started on each node. Ensure that the firewall has been configured correctly, and that Security-Enhanced Linux (SELinux) is not the issue. Check /var/log/audit/audit.log for details on SELinux issues.
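For example, the following checks, run on each node, confirm that ricci is installed and running; ricci listens on TCP port 11111, so that port must be open between the luci server and every node:
[root]# rpm -q ricci
[root]# service ricci status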
Make sure your nodes have the latest SELinux policy with the following command:
[root]# yum update selinux-policy
If you continue to encounter errors, it may be necessary to disable SELinux. This is not recommended, and should only be used as a last resort. Disable SELinux with the command:
[root]# setenforce 0
See Security and SELinux in the Deployment Guide on the Red Hat website.
[] Configuration File Issues
Configuration errors manifest themselves as the following error in /var/log/messages:
"AIS Executive exiting (-9)"Check for syntax errors in your /etc/cluster/cluster.conf file. This is unlikely to happen if you are using Conga to manage your cluster configuration file.
[] Logical Volume Issues
It may be necessary to restart the clustered logical volume manager with the command:
[root]# service clvmd restart
Ensure all nodes have a consistent view of the shared storage with the command partprobe or by clicking reprobe storage in Conga. As a last resort, reboot all nodes, or select restart cluster in Conga.
In some cases you may need to rescan for logical volumes if you still cannot see the shared volume:
[root]# partprobe -s
If you are experiencing errors when creating the clustered logical volume, you may need to wipe any previous labels from the virtual disk.
NOTICE: This will destroy all data on the shared storage disk!
Execute the following command from one node:
[root@node1 ~]# pvremove -ff {/dev/sdXY}
Where {/dev/sdXY} is the partition intended for data. See the output of /proc/mpp to verify. For example:
[root@node1 ~]# pvremove -ff /dev/sdb1
If you are using Conga, click reprobe storage, otherwise type:
[root@node1 ~]# partprobe -s /dev/sdb
If you have imaged the nodes with a cloning method, the unique identifier (UUID) for the system logical volumes may be the same. It may be necessary to change the UUID with the commands pvchange --uuid or vgchange --uuid. For more information, see the LVM Administrator's Guide on the Red Hat website.
[] Testing Fencing Mechanisms
Fence each node to ensure that fencing is working properly.
1. Watch the logs from node 1 with the following command:
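One way to follow the log in real time is:
[root]# tail -f /var/log/messages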
2. Fence node 2 by executing the following command:
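For example, assuming node 2's cluster node name is node2.example.com:
[root]# fence_node node2.example.com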
3. View the logs on node 1 and the console of node 2. Node 1 should successfully fence node 2.
4. Continue to watch the messages file for status changes.
You can also use the Cluster Status tool to see the cluster view of a node. The parameter -i 2 refreshes the tool every two seconds. For more information on clusters, see the Red Hat Cluster documentation on the Red Hat website.
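For example, assuming the clustat utility:
[root]# clustat -i 2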