Category: LINUX

2009-04-18 19:54:13

RHEL Configuration

From CAEFF


About HyperThreading on Intel Xeon CPUs

Intel's "HyperThreading" (HT) is not useful for compute clusters because floating point or integer operations are usually completed in succession on each processor, rather than a mixture of operations concurrently. Enabling HT on compute nodes could create a situation where a virtual processor (the primary feature of HT) receives a job assignment from the resource manager, resulting in sub-optimal performance.

In setting up new cluster nodes, the BIOS is configured as follows: load setup defaults, disable HyperThreading, and disable the boot logo display.
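
A quick way to confirm the BIOS change took effect (a sketch; it assumes these are dual-socket, single-core Xeon nodes, consistent with the np=2 entries used later in the Torque nodes file) is to count the logical processors the kernel reports:

grep -c ^processor /proc/cpuinfo

The count should equal the number of physical CPUs (2 on these nodes); a larger number suggests HT is still enabled.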


Initial RHEL Installation (from CDs)

The initial installation and configuration tests were performed on node25.ribosome.cluster. RHEL4 AS was installed from CDs, using defaults unless otherwise specified.

Disk Druid: Created a 4 GB swap partition, followed by an ext3 / partition using the remaining space.

Firewall disabled, and SELinux disabled.

Package Group Selection: Deselected (removed) Text-Based Internet, Server Configuration Tools, Web Server, Windows File Server, and Printing Support. Selected (added) Development Tools, Legacy Software Development, rsh-server (under Legacy Network Server), and system-config-kickstart (under Admin Tools).

System booted RHEL AS (2.6.9-42.ELsmp) without errors.

When prompted for RHLogin, selected the "Tell me why..." option, which made the "I cannot complete..." option appear (and selected that so I could proceed without logging in).

Logged in as root and activated the RHEL software (expect a 30-45 second delay):

rhnreg_ks  --activationkey=[KEY]

Connecting to NFS Shares

As root,

mkdir /software

Edit /etc/fstab by adding the following lines:

homehost.cluster:/home      /home       nfs   defaults   0 0
apphost.cluster:/software   /software   nfs   defaults   0 0

Verify the NFS configuration:

mount homehost.cluster:/home /home
cd /home
ls
mount apphost.cluster:/software /software
cd /software
ls
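
With the fstab entries in place, the shares will also be mounted at boot; a quick way to apply every fstab entry at once and confirm the result (standard mount/df usage, not specific to this setup):

mount -a                  # mount everything in /etc/fstab that is not already mounted
df -h /home /software     # both NFS shares should appear with their exported sizes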

Copying the Hosts File

Copied this /etc/hosts file from node1.ribosome.cluster to each new node:

127.0.0.1       localhost.localdomain   localhost ribosome

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Copying this file is needed for the DHCP hostname and dnsdomainname assignments to function.
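
After the node picks up its DHCP lease, the assignments can be spot-checked (standard commands; the expected output is an assumption based on the naming used in this cluster):

hostname         # should report the node's assigned name, e.g. node25.ribosome.cluster
dnsdomainname    # should report the cluster domain, e.g. ribosome.cluster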


DNS Database Updates

Added lines in the DNS databases to prepare for the new ribosome nodes.

In fileserver:/etc/bind/db.192.168:

125.40    IN PTR    node25.ribosome.cluster.
126.40    IN PTR    node26.ribosome.cluster.
127.40    IN PTR    node27.ribosome.cluster.
128.40    IN PTR    node28.ribosome.cluster.

In fileserver:/etc/bind/db.cluster:

node25.ribosome    IN A    192.168.40.125
node26.ribosome    IN A    192.168.40.126
node27.ribosome    IN A    192.168.40.127
node28.ribosome    IN A    192.168.40.128

DHCP and Bind were restarted on fileserver as follows:

/etc/init.d/dhcp3-server restart
/etc/init.d/bind9 restart

Then, on head.ribosome.cluster, lines were added for the new nodes.

In head.ribosome.cluster:/etc/dhcp3/dhcpd.conf:

host node25.ribosome.cluster { hardware ethernet 00:15:f2:8a:33:69; fixed-address 192.168.40.125; }
host node26.ribosome.cluster { hardware ethernet 00:15:f2:80:2d:3c; fixed-address 192.168.40.126; }
host node27.ribosome.cluster { hardware ethernet 00:13:d4:99:48:d0; fixed-address 192.168.40.127; }
host node28.ribosome.cluster { hardware ethernet 00:e0:18:00:12:13; fixed-address 192.168.40.128; }

DHCP was then restarted on head.ribosome.cluster as follows:

/etc/init.d/dhcp3-server restart

At this point, each new node was rebooted.
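
Once the nodes are back up, the new DNS records can be spot-checked from any cluster host (assuming the bind-utils package is installed):

host node25.ribosome.cluster    # forward lookup, should return 192.168.40.125
host 192.168.40.125             # reverse lookup, should return node25.ribosome.cluster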



Configuring RHEL NIS Clients

As root,

authconfig

Select "Use NIS"

Domain: cluster
Server: 192.168.0.140

No reboot or daemon restart is necessary.
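
To confirm the node is bound to the NIS domain (standard NIS client checks):

ypwhich              # should print the NIS server, 192.168.0.140
ypcat passwd | head  # should list accounts served over NIS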


Configuring rsh Access

The rsh-server package from the RHEL4 EMT64 AS installation CD (disc #4) must be installed, if it's not already.

As root, under Applications > System Settings > Server Settings > Services, ensure that the rsh, rlogin, rexec services are selected to start on boot.

To start these services immediately, restart the xinetd service.

Copy the /etc/hosts.equiv file from node1.ribosome.cluster to /etc on the new node before rebooting the system or restarting the xinetd service.

Ensure that the list of hosts in /etc/hosts.equiv is complete, so that all appropriate hosts are granted access.
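
The same setup can be done from the command line (an equivalent sketch of the GUI steps above, plus a quick test from another host already listed in /etc/hosts.equiv):

chkconfig rsh on
chkconfig rlogin on
chkconfig rexec on
service xinetd restart
rsh node25.ribosome.cluster hostname   # run from another cluster host; should print the hostname with no password prompt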


Installing Ganglia (Node Configuration)

The default Ganglia installation installs files in `/usr/local/bin`, `/usr/local/man`, etc.

As root,

cd /root/ganglia-3.0.3
./configure
make
make check

All self-tests passed with `OK`.

make install

No problems reported.

The Ganglia installation folder contains an init.d script for starting/stopping the gmond Ganglia client daemon. In the current installation of RHEL, runlevel 5 is the default.

cp gmond.init /etc/init.d
ln -s /etc/init.d/gmond.init /etc/rc.d/rc5.d/S66gmond.init

Now, when the system is rebooted, the Ganglia client daemon is started during the boot process.
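
To confirm the client daemon after a reboot (gmond's stock configuration listens on TCP port 8649 and serves the cluster state as XML):

ps -C gmond               # the gmond process should be running
telnet localhost 8649     # should dump the Ganglia XML report and then disconnect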


Installing Torque (Node Configuration)

As root,

tar -xzvf torque-2.1.6.tar.gz
cd torque-2.1.6
./configure
make
make install

Copied the /sbin/start-stop-daemon from node1.ribosome.cluster to /sbin on the new node (this script was not installed with the installation steps listed above).

mv /var/spool/torque /var/spool/torque-1.2.0p6

Copied the following configuration files from /var/spool/torque-1.2.0p6 of node1.ribosome.cluster to the same locations on the new node:

/var/spool/torque-1.2.0p6/server_name
/var/spool/torque-1.2.0p6/mom_priv/config

Copied the /etc/init.d/torquemom from node1.ribosome.cluster to /etc/init.d of the new node.

chkconfig --add torquemom

Started the Torque client daemon:

/etc/rc.d/rc5.d/S50torquemom start
Starting Torque MOM.

Rebooted the node to verify that the torquemom daemon could start successfully following a reboot.
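
A quick post-reboot check on the node itself (the server-side check with pbsnodes is covered in the next section):

ps -C pbs_mom    # the Torque MOM daemon should be running after the reboot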


Torque Resource Manager Updates

Added lines to /var/spool/torque-1.2.0p6/server_priv/nodes on fileserver to add the new node to the pool:

node25.ribosome.cluster np=2 ribosome
node26.ribosome.cluster np=2 ribosome
node27.ribosome.cluster np=2 ribosome
node28.ribosome.cluster np=2 ribosome

Torque will need to be restarted following these changes.
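
A sketch of the restart and verification on the Torque server (standard Torque commands; the exact restart method depends on how pbs_server is started on this host):

qterm -t quick    # stop pbs_server while letting running jobs continue
pbs_server        # start it again so the updated nodes file is read
pbsnodes -a       # the new nodes should be listed, reaching state "free" once their MOMs report in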


Installing Intel Fortran Compiler 9.1.036

This is a test installation of the "evaluation version" of this compiler.

As root,

cd l_fc_c_9.1.036
./install.sh

Choose `1` from the menu to install the compiler.

Enter serial number: [SERIAL] (the installer will validate the serial online)

Choose `2` for a custom installation.

Choose `1` to install the compiler.

`accept` the license agreement.

Install in /usr/local/intel/fc91

The installer briefly tests the installation when finished (the test was passed successfully).

Set the environment variables to use the compiler (current session only):

PATH=$PATH:/usr/local/intel/fc91/bin;export PATH
LD_LIBRARY_PATH=/usr/local/intel/fc91/lib;export LD_LIBRARY_PATH

Installing Intel C++ Compiler 9.1.043

This is a test installation of the "evaluation version" of this compiler.

As root,

cd l_cc_c_9.1.043
./install.sh

Choose `1` from the menu to install the compiler.

Enter serial number: [SERIAL] (the installer will validate the serial online)

Choose `2` for a custom installation.

Choose `1` to install the compiler.

`accept` the license agreement.

Install in /usr/local/intel/cc91

The installer briefly tests the installation when finished (the test was passed successfully).

Set the environment variables to use the compiler (current session only):

PATH=$PATH:/usr/local/intel/cc91/bin;export PATH
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/intel/cc91/lib;export LD_LIBRARY_PATH
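
To make these compiler settings persistent across logins for all users (a sketch, not part of the original notes; the file name /etc/profile.d/intel.sh is an arbitrary choice):

cat > /etc/profile.d/intel.sh << 'EOF'
# Intel Fortran 9.1 (fc91) and Intel C++ 9.1 (cc91)
export PATH=$PATH:/usr/local/intel/fc91/bin:/usr/local/intel/cc91/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/intel/fc91/lib:/usr/local/intel/cc91/lib
EOF
chmod 644 /etc/profile.d/intel.sh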

Compiling CHARMM c32b2

As root, with the PATH and LD_LIBRARY_PATH environment variables set as described in the compiler installation procedures (above),

mv c32b2 /usr/local
cd /usr/local/c32b2
./install.com gnu xxlarge ifort

No errors reported. The CHARMM executable is /usr/local/c32b2/exec/gnu/charmm.
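
A minimal smoke test (assuming a CHARMM input file named test.inp; CHARMM reads its input from standard input):

/usr/local/c32b2/exec/gnu/charmm < test.inp > test.out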


Installing SystemImager Client

As root,

wget http://download.systemimager.org/pub/sis-install/install
chmod +x install
./install --list
./install --verbose systemimager-client

Problem:

./install --verbose systemimager-client
Using pre-existing package list: /tmp/sis-packages/stable.list
Downloading: http://install.sisuite.org/sourceforge/systemimager/systemimager-client-3.6.3-1.noarch.rpm...done!
Downloading: http://install.sisuite.org/sourceforge/systemimager/systemimager-common-3.6.3-1.noarch.rpm...done!
Downloading: !
Downloading: !
error: open of perl-AppConfig-1.52-4.noarch.rpm failed: No such file or directory
rpm -Uhv systemimager-client-3.6.3-1.noarch.rpm systemimager-common-3.6.3-1.noarch.rpm systemconfigurator-2.2.2-1.noarch.rpm perl-AppConfig-1.52-4.noarch.rpm
error: open of perl-AppConfig-1.52-4.noarch.rpm failed: No such file or directory

Solution:

Find the failed download.

updatedb
locate perl-App*
/tmp/sis-packages/perl-AppConfig-1.52-4.noarch.rpm?download&failedmirror=belnet.dl.sourceforge.net

Manually download perl-AppConfig-1.52-4.noarch.rpm from a working mirror and copy it to the location of the failed download.

As root,

cp perl-AppConfig-1.52-4.noarch.rpm /tmp/sis-packages
./install --verbose systemimager-client
Using pre-existing package list: /tmp/sis-packages/stable.list
Checking integrity of systemimager-client-3.6.3-1.noarch.rpm: md5 OK
Checking integrity of systemimager-common-3.6.3-1.noarch.rpm: md5 OK
Checking integrity of systemconfigurator-2.2.2-1.noarch.rpm: sha1 md5 OK
Checking integrity of perl-AppConfig-1.52-4.noarch.rpm: md5 OK
rpm -Uhv systemimager-client-3.6.3-1.noarch.rpm systemimager-common-3.6.3-1.noarch.rpm systemconfigurator-2.2.2-1.noarch.rpm perl-AppConfig-1.52-4.noarch.rpm
Preparing... ########################################### [100%]
1:perl-AppConfig ########################################### [ 25%]
2:systemconfigurator ########################################### [ 50%]
3:systemimager-common ########################################### [ 75%]
4:systemimager-client ########################################### [100%]
The System Installation Suite packages you've chosen are now installed!

This version of SystemImager (3.6.3) uses a different naming scheme for its scripts than previous versions, so the names provided in the older (3.1) documentation are not always correct. For example, the prepareclient script has been renamed to si_prepareclient.

slocate -u
slocate prepareclient
/usr/sbin/si_prepareclient
/usr/sbin/si_prepareclient --server head.ribosome.cluster

Answered "y" to having /etc/services and /tmp/rsyncd.conf.20761 modified and the /etc/systemimager directory created.

Error:

Using "sfdisk" to gather information about disk:
/dev/sda
Use of uninitialized value in hash element at /usr/lib/systemimager/perl/SystemImager/Common.pm line 1042, line 7.
rsync: link_stat "/usr/share/systemimager/boot/i386/standard/initrd_template/." failed: No such file or directory (2)
rsync error: some files could not be transferred (code 23) at main.c(702)
Couldn't rsync -a /usr/share/systemimager/boot/i386/standard/initrd_template/ /tmp/.systemimager.1/. at /usr/lib/systemimager/perl/SystemImager/UseYourOwnKernel.pm line 58.

As root on head.ribosome.cluster,

/usr/sbin/getimage --quiet --image node25.ribosome.2006.10.20 --golden-client node25.ribosome.cluster

Error:

rsync: failed to connect to node25.ribosome.cluster: Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(99)
Failed to retrieve /etc/systemimager/mounted_filesystems from node25.ribosome.cluster.
getimage: Have you run "prepareclient" on node25.ribosome.cluster?

Giving up for now, to revisit this issue later. Proceeding with G4U cloning instead.


Cloning Nodes Using G4U

There is no need to open the chassis of any node system to do this, but these systems do not have hot-swap hard drives, so they must be shut down before installing or removing a drive.

Target and original drives must be of same make, size, and model.

Install the target (blank/expendable) drive in the secondary drive bay (left side, on the Atipa i1002) of the node running the original image to be cloned.

Boot the original system to the G4U CD-ROM.

Copy the contents of the original drive to the target drive (here, drive order is critical). This can take about 1 hour, depending on drive size and speed.

copydisk wd0 wd1

Reinstall the target drive in its chassis and boot the machine.

On boot, the cloned machine will not recognize its MAC addresses, so the Kudzu hardware-detection tool will interrupt the boot process. Use it to delete the configurations of the 2 missing MAC addresses, then configure both "new" MAC addresses for DHCP.

Allow the system to boot normally. No other changes are required.

If there are minor hardware differences between the new node and the original node, use Kudzu to remove the "missing" hardware and configure the "new" hardware.
