2010-02-01 15:22:38



 
OSLO

[] What is it?

OSLO (short for "Operating System LOader") is the code name for the new node setup project.
OSLO is designed to be a complete replacement for the old, slow and buggy setup_nodes.sh and its related helper scripts.
It provides faster setup, a smoother setup procedure, a better command line interface, much better fault tolerance and highly modularized components.

[] What platform does it support?

Currently the supported distributions are RHEL3, RHEL4, RHEL5, CENTOS5, SLES9, SLES10, SLES11, Scientific Linux 5, Oracle Enterprise Linux 5 and a virtual vanilla distro, which is in fact SLES10.
The supported architectures are i686, x86_64 and ia64.
We are planning to , in the near future.
Note that sles9/ia64, centos5/ia64, sl5/ia64 and sles11/ia64, which are not used at all, are not supported now.

[] Where is it?

It is available on

 lts-head:/bin/oslo

and

 lts-head:/bin/setupnode

They are equivalent, so you can invoke either one on lts-head.

[] How to use it?

You can use it as a normal user, without 'sudo'.
Note: Using 'sudo' is OK but it doesn't give you any magic power.

The detailed usage can be obtained by executing

 oslo -h

or

 oslo --help


You can also see examples by executing

 oslo -e

or

 oslo --show-examples

[] But I want to learn about it here

OK. Here is the usage:

Usage: oslo [OPTION]...

Mandatory arguments:
-n, --nodelist=NODELIST List of nodes to be set up.
-d, --distro=DISTRO Distribution that the nodes will run.
-a, --arch=ARCH Architecture that the nodes will run.

Optional arguments:
-p, --package-dir=PKGDIR A directory that contains packages to be installed.
-k, --kernel-package=KERNELPKG Kernel package to install and boot into. If not provided, and if there's no kernel rpm in PKGDIR, the default kernel of the distro will be used.
-s, --postsetup-scripts=SCRIPTLIST A list of scripts that will be executed on each of the nodes after setup. Note that the scripts are executed in the order listed ("First Come, First Served"), so you are responsible for ordering the list correctly.
-f, --force-install Force package installation. This option can be used when there are dependency issues or if you are installing obsolete packages.
-m, --memory-size=MEMORYSIZE Set the specific amount of memory used by the kernel. This can be used to simulate a low-memory situation.
-l, --list-default-kernels List the default kernels that supported distributions use.
-u, --use-default-kernel Boot the nodes with the default kernel provided by the OS image.
-i, --install-source-debuginfo-packages Install source and debuginfo packages (if available).
-z, --xen-host Setup node(s) as XEN host - (NOT IMPLEMENTED CURRENTLY)
-o, --boot-options Specify boot options for the nodes
-O, --rpm-options Specify RPM installation options
-N, --rpminst-noscripts Use the '--noscripts' option when installing RPMs
-D, --disable-crashdump Disable crash dump

-h, --help This message.
-e, --show-examples Show me some examples please.
-y, --yes Skip confirmation and proceed without prompt.

For those of you who need to set up nodes with an LBATS build, the following arguments can be quite handy:
-t, --lbats-tag=TAG LBATS build tag.
-b, --branch=BRANCH Lustre branch.
-c, --patchless-client Do patchless client install.

And if you want to use lustre release packages:
-r, --lustre-release=VERSION Lustre release version

NODELIST is a list of nodes separated by commas ','. You can also specify '-n' multiple times instead of passing a list, and you can use the same glob patterns as in pdsh, for example, "node[1,3-5],othernode1[0-3]".
DISTRO can be one of rhel3 rhel4 rhel5 sles9 sles10.
ARCH can be one of i686 x86_64 ia64.
SCRIPTLIST is a list of scripts separated by commas ','.
MEMORYSIZE should be in the format of 'n[kKmMgG]' where n is an integer.
TAG should be the tag that you specified when submitting LBATS build request.
BRANCH should be a valid lustre branch name.
VERSION should be a valid lustre release version.
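
To see which hosts a NODELIST resolves to before handing it to oslo, the bracket syntax can be expanded by hand. The following is a minimal sketch and not part of OSLO: the function name expand_nodelist is made up for illustration, and it handles only a simplified subset of the pdsh syntax (one bracket group per name, plain numbers and ranges, no zero padding).

```shell
# Hypothetical helper, not part of OSLO: expand a pdsh-style NODELIST
# into one hostname per line. Simplified subset of the pdsh syntax.
expand_nodelist() {
    printf '%s\n' "$1" | awk '
    {
        depth = 0; token = ""; n = 0
        # Split on commas that are not inside brackets.
        for (i = 1; i <= length($0); i++) {
            ch = substr($0, i, 1)
            if (ch == "[") depth++
            else if (ch == "]") depth--
            if (ch == "," && depth == 0) { tokens[++n] = token; token = "" }
            else token = token ch
        }
        if (token != "") tokens[++n] = token
        for (t = 1; t <= n; t++) expand(tokens[t])
    }
    function expand(tok,    lb, rb, prefix, suffix, parts, np, p, r, lo, hi, j) {
        lb = index(tok, "[")
        rb = index(tok, "]")
        # Names without a bracket group are printed unchanged.
        if (lb == 0 || rb < lb) { print tok; return }
        prefix = substr(tok, 1, lb - 1)
        suffix = substr(tok, rb + 1)
        np = split(substr(tok, lb + 1, rb - lb - 1), parts, ",")
        for (p = 1; p <= np; p++) {
            # Each part is either a single number or a lo-hi range.
            if (split(parts[p], r, "-") == 2) { lo = r[1] + 0; hi = r[2] + 0 }
            else { lo = hi = parts[p] + 0 }
            for (j = lo; j <= hi; j++) print prefix j suffix
        }
    }'
}
```

For example, expand_nodelist 'node[1,3-5]' prints node1, node3, node4 and node5, one per line.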

[] Show me some examples

  • setup node1,node2 with rhel4/i686 running default kernel:
 oslo -n node1,node2 -d rhel4 -a i686
  • setup node1 with sles10/x86_64 running kernel /path/to/kernel/file.rpm:
 oslo -n node1 -d sles10 -a x86_64 -k /path/to/kernel/file.rpm
  • setup node1 with sles10/x86_64 running kernel /path/to/kernel/file.rpm, force installing the rpm:
 oslo -n node1 -d sles10 -a x86_64 -k /path/to/kernel/file.rpm -f
  • setup node1 with sles10/x86_64, installing rpms in /path/to/package/dir. If there is a kernel rpm in /path/to/package/dir, boot node1 using that kernel:
 oslo -n node1 -d sles10 -a x86_64 -p /path/to/package/dir
  • setup node1 with sles10/x86_64, installing rpms in /path/to/package/dir, boot node1 with the default kernel whether or not there is a kernel rpm in /path/to/package/dir:
 oslo -n node1 -d sles10 -a x86_64 -p /path/to/package/dir -u
  • setup node1 with sles10/x86_64, installing rpms in /path/to/package/dir, and kernel /path/to/kernel/file.rpm:
 oslo -n node1 -d sles10 -a x86_64 -k /path/to/kernel/file.rpm -p /path/to/package/dir
  • setup node1,node2 with sles10/i686, telling the kernel to use only 500M of memory:
 oslo -n node1 -n node2 -d sles10 -a i686 -m 500M
  • setup node1 with rhel5/i686 installing b1_6 LBATS packages tagged by 'mytag':
 oslo -n node1 -d rhel5 -a i686 -t mytag -b b1_6
  • setup node1 with rhel5/i686 installing b1_6 LBATS patchless packages tagged by 'mytag':
 oslo -n node1 -d rhel5 -a i686 -t mytag -b b1_6 -c
  • setup node1 with rhel5/i686, executing /path/to/script1,/path/to/utility1,/path/to/script2 after setup.
 oslo -n node1 -d rhel5 -a i686 -s /path/to/script1,/path/to/utility1,/path/to/script2
  • setup node1 with rhel5/i686, installing lustre 1.6.5.1 release packages.
 oslo -n node1 -d rhel5 -a i686 -r 1.6.5.1
  • setup node1 with rhel5/i686, adding "mem=1G" to the kernel boot argument list:
 oslo -n node1 -d rhel5 -a i686 -o "mem=1G"
  • setup node1 with sles10/x86_64, installing rpms in /path/to/package/dir using "--nodeps --force" options:
 oslo -n node1 -d sles10 -a x86_64 -p /path/to/package/dir -O "--nodeps --force"
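
Because a setup run takes minutes, it can be worth checking the arguments locally before invoking oslo. The sketch below is hypothetical and not part of OSLO: check_oslo_args is an invented name, the accepted DISTRO and ARCH values are simply copied from the usage text (the real tool may accept more), and the unit suffix of MEMORYSIZE is assumed to be optional because the usage text does not say either way.

```shell
# Hypothetical pre-flight check, not part of OSLO.
# Usage: check_oslo_args DISTRO ARCH [MEMORYSIZE]
check_oslo_args() {
    case $1 in
        rhel3|rhel4|rhel5|sles9|sles10) ;;
        *) echo "unknown distro: $1" >&2; return 1 ;;
    esac
    case $2 in
        i686|x86_64|ia64) ;;
        *) echo "unknown arch: $2" >&2; return 1 ;;
    esac
    # sles9/ia64 is documented as unsupported.
    if [ "$1/$2" = "sles9/ia64" ]; then
        echo "unsupported platform: $1/$2" >&2
        return 1
    fi
    # MEMORYSIZE is optional; when given it must look like n[kKmMgG].
    # Assumption: the unit suffix may be omitted.
    if [ -n "${3:-}" ]; then
        if ! printf '%s\n' "$3" | grep -Eq '^[0-9]+[kKmMgG]?$'; then
            echo "bad memory size: $3" >&2
            return 1
        fi
    fi
    return 0
}
```

A typical use would be to chain it in front of the real command, for example: check_oslo_args sles10 i686 500M && oslo -n node1 -d sles10 -a i686 -m 500M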

[] Can I keep a copy of oslo?

Sure you can, but I don't recommend it.
OSLO is subject to change at any time. If you keep your own copy, you risk using an obsolete version that may malfunction.
So please use the official OSLO whenever you can.

[] How long does it take to setup some nodes?

Normally it takes 2-5 minutes to complete the whole process. But sometimes it can take up to 8-10 minutes or even more.
The total setup time depends on the following factors:

  • the hardware

ia64 nodes boot much more slowly than ia32 nodes. Some sfire nodes take time to initialize their RAID controller kernel.

  • the number of packages to install

The more packages you need to install, the more time it takes.

  • the system load of OSLO server

If multiple OSLO clients are requesting node setups at the same time, the setup process will slow down. Setup speed is also affected while the OSLO server is doing a massive pool resync (a very rare situation).

  • the system configuration

If there are InfiniBand adapters installed on the node, it takes some time (about 1 minute each) for the InfiniBand interfaces to be initialized.

[] Do I still need to re-setup the node after rebooting it?

Absolutely NOT, unless somebody else has reserved the node and set it up after you did. No re-setup is needed after rebooting.
This is not the old setup_nodes.sh. Re-running setup after every reboot is a waste of your precious time and, of course, of server resources.

[] So what do I get after setting up the nodes?

After setup, the nodes boot into the specified (or default) kernel on the specified platform (distribution/architecture combination) with the specified packages installed.
All the nodes you just set up share a common READ ONLY NFS root.
Only /etc, /var, and /tmp are writable so that you can do normal operations on each of the nodes.
Also, the following directories are mounted R/W through NFS from lts-head: /home, /testsuite, /opt/lts, /var/cache/cfs, /notbackedup.
These directories are mounted for YALA use exclusively: /export mounted R/W and /export/yala R/O.

[] But what if I want to change something on the rootfs?

If you want to change something on the rootfs (such as installing your own packages), you can do so by chrooting into "/rw" on ONE of the nodes and making the changes there. "/rw" is the same rootfs, mounted R/W.

Keep in mind that all the nodes you have set up share the same rootfs, so anything you modify under /rw on one node takes effect on all your nodes (except, of course, /etc, /var and /tmp, which are mounted as tmpfs).

EXCEPTION: In some distributions, "/rw" is still READ ONLY due to an upstream bug described in . But you can remount it R/W when you need to. The command to remount the rootfs R/W is:

 mount -o remount,rw "$(awk '$2 == "/rw" {print $1; exit}' /proc/mounts)" /rw
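
Before remounting, it can help to check how "/rw" is currently mounted. The helper below is a small sketch, not something OSLO ships: mount_mode is an invented name, and it relies only on the standard /proc/mounts layout (device, mount point, filesystem type, options). The optional second argument exists so the function can be tried against a saved copy of the mounts table.

```shell
# Hypothetical helper: print "rw" or "ro" for a mount point.
# Usage: mount_mode MOUNTPOINT [MOUNTS_FILE]
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            mode = "unknown"
            for (i = 1; i <= n; i++)
                if (opts[i] == "rw" || opts[i] == "ro") mode = opts[i]
            # Later lines win, so the most recent mount is reported.
            found = mode
        }
        END { if (found == "") exit 1; print found }
    ' "${2:-/proc/mounts}"
}
```

On a node, mount_mode /rw reports whether a remount is needed at all; the function returns a non-zero status when the mount point is not listed.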

[] Is there any non-privileged user that can be used to run MPI programs?

YES.
mpiuser is a pre-existing non-privileged user set up for exactly this purpose.
To switch to it, log in as root and run "su - mpiuser".

[] Why does OSLO fail with a message like "nodeX is being setup by someone"?

The OSLO server tries to lock the nodes you are going to set up before it actually sets them up. When it fails to lock some of the nodes, it complains with the message you are seeing.

This happens mostly because you (or someone else) are already setting up nodeX using OSLO. If you suspended OSLO by stopping it (SIGTSTP from 'CTRL+z', SIGSTOP from kill, or whatever means you know of) and then started another OSLO instance to set up the same node, you'll probably see this message. Occasionally, but very rarely, it happens when you (or someone else) aborted OSLO (using 'CTRL+c', kill or whatever means you know of) and the server is still cleaning up before it aborts.

So when you see this message, make sure that you don't have a running or suspended OSLO that is also trying to set up the same node. If you are sure that you don't, then someone else is probably setting up the node.
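
The lock-before-setup behaviour can be illustrated with a few lines of shell. This is only a sketch of the general idea; OSLO's real locking code is not shown on this page, and the names lock_node, unlock_node and LOCK_ROOT, as well as the lock directory location, are all invented. It relies on mkdir being atomic: it either creates the lock directory or fails because someone else already did.

```shell
# Illustrative only: not OSLO's actual locking implementation.
LOCK_ROOT=${LOCK_ROOT:-/tmp/oslo-demo-locks}   # hypothetical location

# Try to take the lock for one node; fail if it is already held.
lock_node() {
    mkdir -p "$LOCK_ROOT" || return 1
    if mkdir "$LOCK_ROOT/$1.lock" 2>/dev/null; then
        return 0
    fi
    echo "$1 is being setup by someone" >&2
    return 1
}

# Release the lock once setup has finished or been cleaned up.
unlock_node() {
    rmdir "$LOCK_ROOT/$1.lock"
}
```

In this model a suspended client never reaches unlock_node, which is why a second instance targeting the same node is refused until the first one finishes or is cleaned up.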

[] Why do I get a message like "/tmp/xyz: No space left on device"?

You have probably set up your node with low memory simulation using the '-m' or '--memory-size' option.
Keep in mind that /tmp, /etc and /var are mounted as tmpfs, which uses system memory to store its data.
So when you specify a relatively low memory size for the node, you'll probably get ENOSPC while writing large files in /tmp, /etc or /var.
A workaround for this issue is to direct your application to write files to a locally mounted device or an NFS based directory such as /home/yourdir or /testsuite.
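
A test script can apply this workaround automatically by checking the free space before it writes. The function below is a sketch under stated assumptions: pick_output_dir is an invented name, and the size threshold and fallback directory are whatever the caller passes in. It uses the POSIX output format of df, where the fourth column of the second line is the available space in kilobytes.

```shell
# Hypothetical helper: print DIR if it has at least NEED_KB kilobytes
# free, otherwise print FALLBACK.
# Usage: pick_output_dir DIR NEED_KB FALLBACK
pick_output_dir() {
    dir=$1 need_kb=$2 fallback=$3
    # df -P guarantees one line per filesystem; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "${avail_kb:-0}" -ge "$need_kb" ]; then
        printf '%s\n' "$dir"
    else
        printf '%s\n' "$fallback"
    fi
}
```

For example, outdir=$(pick_output_dir /tmp 1048576 /testsuite) selects /tmp only when roughly 1 GB is free there and falls back to /testsuite otherwise.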

[] Why, sometimes, can't I ssh to the node(s) after setup or reboot?

THIS ISSUE IS NOW FIXED
Simply put, it is caused by the mkdisklessinitrd script, which is in a mess. When booting, the files needed by PAM for SSH authentication are not successfully copied into tmpfs. For a detailed description of this issue, please refer to and .

You can enable SSH access to the node(s) using this command on lts-head:

 pdsh -S -w $node 'cp -a /rw/etc/pam.d/* /etc/pam.d'

[] I find some bugs

If you feel like you've found some OSLO bug, please blocking .

[] I want to add some feature to OSLO

Treat it as a bug. :-) Please refer to the section above.

[] I have some advice and/or suggestions

Advice, suggestions and comments are always welcome!
Please write or talk to Wang Yibin, the OSLO author and maintainer, via email (yibin.wang@sun.com) or IRC (nickname wangyb).

[] View current bugs

Click please.

[] Known issues

  • sfire4 can't boot into its specified IP address due to BIOS settings. See bug .
  • sfire6 console baud rate set to 115200 rather than 9600. See bug .
  • Sometimes the node is not accessible via SSH. See bug . Workarounds are: 1) pdsh -S -w $node 'cp -a /rw/etc/pam.d/* /etc/pam.d'; or 2) reboot the node; or 3) re-setup the node.

[] Technical overview

This section is aimed at those who are interested in OSLO development.

[] Features

These are new features compared to the old setup_nodes.sh:

  • Modularized functions and hierarchical function calls;
  • Setupnode job scheduler daemon;
  • Automatic job dispatcher to corresponding server;
  • Server/Client model job handling;
  • Zombie(Suspended) client auto-detection;
  • Locking nodes before actually setting them up;
  • Supplier/Consumer model adaptive OS image pool;
  • On demand pool refreshing;
  • Spinlock support on various modules;
  • Fault-tolerant power management and PXE configuration;
  • Flexible configurations;
  • Easy setup on different distributions (Ubuntu/RedHat);
  • OS image reuse.

[] Components

[] Infrastructures

  • Pristine OS images

These are the operating system images used as 'upstream', IOW, 'pristine' images. They are the original copies that the OS image pool mirrors. They are located on the OSLO server, and the location can be configured in the server's profile under $OSLO_ROOT/common/profile/$SERVER_HOSTNAME
Current OS images include RHEL3/RHEL4/RHEL5/SLES9/SLES10 on i686/x86_64/ia64 architectures except SLES9/ia64.

  • NFS server

It runs on the OSLO server, and the OS images that client nodes use are exported through it. Note that the NFS server shipped with RHEL contains a bug that prevents it from restarting successfully. I have fixed this, and a working nfs service can be found in $OSLO_ROOT/bin/daemons/nfs.ia64.modified

  • RPM package management

Currently OSLO only supports RPM package installation, but other package managers can be added.

  • Power management device

This is a facility located on lts-head that handles node power management. OSLO uses it to power cycle the nodes.

  • TFTP boot server

This is a facility located on lts-head that enables nodes to boot diskless via PXE.

[] Modules

  • LVM management

Handles logical volume operations like create, rename, mount, delete, etc.

  • NFS management

Handles NFS operations like export, unexport, etc.

  • Adaptive OS image pool

Maintains a pool of OS images ready for use. Provides API to OSLO server for OS image renting.

  • Package management

Deals with package installation, upgrade, deletion, etc.

  • PXE boot configuration

Deals with diskless boot through PXE.

  • Node management

Deals with node power management etc.

  • Spinlock system

Provides a common mechanism for exclusive operations on shared objects.

[] Daemons

  • OS image pool service daemon

This daemon provides the following functionalities:

  1. Replaying(actually remounting and re-exporting) user OS images on startup or server reboot;
  2. Removing incomplete OS images due to service abnormal exit or server reboot;
  3. OS image reuse when unused OS image detected;
  4. Garbage collecting;
  5. Adaptive Pool refilling;
  6. On-demand pool refreshing;

  • Job scheduler service daemon

This daemon monitors OSLO client setup requests and serves them by dispatching each request to the OSLO server.

[] Utilities

  • Make diskless initrd utility
  • OSLO Job scheduler
  • OSLO server
  • OSLO client
  • OSLO installer

[] Deployment guide

The following steps are done on the box that you choose to act as the OSLO server:

  1. set up either RHEL or Ubuntu on the box;
  2. set up (optionally RAID) LVM;
  3. create pristine OS images;
  4. configure NFS/LVM to be usable;
  5. check out OSLO source from CVS(qe/oslo) to, say, /root/oslo;
  6. create a profile for your server in /root/oslo/common/profile/$(hostname). You can copy /root/oslo/common/profile/sample and edit it as needed;
  7. cd /root/oslo/bin/misc and run ./do_install_checks to check whether the server is ready to act as OSLO server. Fix any issue that is found during the check;
  8. cd /root/oslo/bin/daemons;
  9. edit setup_daemons and modify "instdir" to where you want to install oslo;
  10. execute setup_daemons and there you go!
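
Step 6 above can be scripted so that re-running it never overwrites a profile you have already edited. This is a minimal sketch, not part of the OSLO source tree: create_profile is an invented name, and it assumes only what the step states, namely that a "sample" profile sits next to the per-host profiles under common/profile. The host name defaults to the output of uname -n, which normally matches $(hostname).

```shell
# Hypothetical helper for step 6, not shipped with OSLO.
# Usage: create_profile OSLO_CHECKOUT_DIR [HOSTNAME]
create_profile() {
    profile_dir=$1/common/profile
    target=$profile_dir/${2:-$(uname -n)}
    # Never overwrite a profile that already exists.
    if [ -e "$target" ]; then
        echo "profile already exists: $target" >&2
        return 0
    fi
    # -R handles the case where the sample is a directory.
    cp -Rp "$profile_dir/sample" "$target" && printf '%s\n' "$target"
}
```

With the checkout location used in the steps above, this would be called as create_profile /root/oslo, after which the printed file can be edited as needed.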

To install an OSLO client, all you need to do is:

  1. copy oslo/bin/snclient to a directory of your choice, say /bin;
  2. make sure $JOBDIR_QUEUE_ROOT is the same as $JOBDIR_ROOT in oslo/bin/snjsd. This directory should be NFS exported.
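
The second client step is easy to get wrong, so a quick consistency check may help. The sketch below rests on assumptions that this page does not confirm: that $JOBDIR_QUEUE_ROOT is set in snclient, and that both files assign their variable with a plain NAME=value line at the start of a line. The function names get_var and check_jobdirs are invented for illustration.

```shell
# Hypothetical check, not part of OSLO. Assumes plain NAME=value
# assignments at the start of a line in both files.

# Print the value of the first NAME=value line in FILE, minus quotes.
get_var() {
    sed -n "s/^[[:space:]]*$1=//p" "$2" | head -n 1 | tr -d "\"'"
}

# Usage: check_jobdirs PATH_TO_SNCLIENT PATH_TO_SNJSD
check_jobdirs() {
    queue_root=$(get_var JOBDIR_QUEUE_ROOT "$1")
    job_root=$(get_var JOBDIR_ROOT "$2")
    if [ -n "$queue_root" ] && [ "$queue_root" = "$job_root" ]; then
        return 0
    fi
    echo "mismatch: client='$queue_root' server='$job_root'" >&2
    return 1
}
```

For example, check_jobdirs /bin/snclient oslo/bin/snjsd returns success only when both files name the same directory.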
© 2008 Sun Microsystems, Inc. All rights reserved. | This page was last modified 05:51, 12 November 2009. |