Sun N1 Grid Installation Document

Category: System Operations

2012-12-25 09:15:11

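1. Obtaining the Sun N1 Grid software

The N1 Grid software can be downloaded free of charge from Sun's website. The download comes in four parts, and the following SPARC files are extracted from part 1 through part 4:

n1ge-6_0u8-bin-solaris-sparcv9.tar.gz  (the 64-bit Solaris SPARC binaries)

n1ge6_0u8-common.tar.gz (architecture-independent common files, including the installation scripts)

n1ge6_0u8-doc.tar.zip  (the complete n1ge documentation)

The software can also be obtained from Sun on a single CD.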

 


 

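2. Installing the software

Assume a grid of three nodes named t4a, t4b and sunf220. t4a is the master host, while t4b and sunf220 are execution hosts. Because there were not enough machines, no standby master host (also called a shadow master host) was installed. All three machines can submit jobs.

First transfer the three files above to t4a by FTP, place them in the $SGE_ROOT directory, and cd into $SGE_ROOT. Then run the commands below, where SGE_ROOT is the root directory of the grid software (in the Bourne shell, set it with SGE_ROOT=/opt/n1ge;export SGE_ROOT):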

gzip -dc $SGE_ROOT/n1ge-6_0u8-bin-solaris-sparcv9.tar.gz | tar xpf -

gzip -dc $SGE_ROOT/n1ge6_0u8-common.tar.gz | tar xpf -

gzip -dc $SGE_ROOT/n1ge6_0u8-doc.tar.zip | tar xpf -
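The three extraction commands can be wrapped in a small loop. This is only a sketch under stated assumptions: it needs a POSIX shell (for example ksh), extract_archives is a hypothetical helper that is not part of the N1GE distribution, and the directory names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: unpack every N1GE archive found in one directory into a target root.
# extract_archives is a hypothetical helper, not something shipped with N1GE.
extract_archives() {
    src_dir=$1      # directory holding the downloaded archives
    dest_dir=$2     # the intended $SGE_ROOT
    mkdir -p "$dest_dir" || return 1
    for archive in "$src_dir"/n1ge*.tar.*; do
        [ -f "$archive" ] || continue    # skip when the glob matched nothing
        echo "extracting $archive"
        # gzip -dc writes the decompressed tar stream to stdout;
        # tar reads it from stdin ("-") and keeps permissions (p)
        gzip -dc "$archive" | (cd "$dest_dir" && tar xpf -) || return 1
    done
}
```

A call such as extract_archives /var/tmp/n1ge "$SGE_ROOT" would unpack all three files in one pass; /var/tmp/n1ge is an assumed download location.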

Unpacking produces many scripts and programs. At this point you must run the setfileperm.sh script, which produces the output shown below:

# $SGE_ROOT/util/setfileperm.sh $SGE_ROOT

              WARNING WARNING WARNING

                    -----------------------

We will set the the file ownership and permission to

   UserID:         0

   GroupID:        0

   In directory:   /opt/n1ge6

We will also install the following binaries as SUID-root:

   $SGE_ROOT/utilbin//rlogin

   $SGE_ROOT/utilbin//rsh

   $SGE_ROOT/utilbin//testsuidroot

   $SGE_ROOT/bin//sgepasswd

Do you want to set the file permissions (yes/no) [NO] >> yes

Verifying and setting file permissions and owner in >3rd_party<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >ckpt<

Verifying and setting file permissions and owner in >examples<

Verifying and setting file permissions and owner in >inst_sge<

Verifying and setting file permissions and owner in >install_execd<

Verifying and setting file permissions and owner in >install_qmaster<

Verifying and setting file permissions and owner in >lib<

Verifying and setting file permissions and owner in >mpi<

Verifying and setting file permissions and owner in >pvm<

Verifying and setting file permissions and owner in >qmon<

Verifying and setting file permissions and owner in >util<

Verifying and setting file permissions and owner in >utilbin<

Verifying and setting file permissions and owner in >catman<

Verifying and setting file permissions and owner in >doc<

Verifying and setting file permissions and owner in >include<

Verifying and setting file permissions and owner in >man<

 

Your file permissions were set

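Once this is done, the master host can be installed. The full installation session follows. Points that need attention are called out in parenthetical notes; for most prompts you can simply press Enter:

# $SGE_ROOT/install_qmaster (alternatively, run $SGE_ROOT/inst_sge -m; the inst_sge script also supports automated installation, provided its configuration file has been filled in properly)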

Welcome to the Grid Engine installation

---------------------------------------

Grid Engine qmaster host installation

-------------------------------------

Before you continue with the installation please read these hints:

   - Your terminal window should have a size of at least

     80x24 characters

   - The INTR character is often bound to the key Ctrl-C.

     The term >Ctrl-C< is used during the installation if you

     have the possibility to abort the installation

 

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >>

 

Choosing Grid Engine admin user account

---------------------------------------

You may install Grid Engine that all files are created with the user id of an

unprivileged user.

 

This will make it possible to install and run Grid Engine in directories

where user >root< has no permissions to create and write files and directories.

   - Grid Engine still has to be started by user >root<

 

   - this directory should be owned by the Grid Engine administrator

 

Do you want to install Grid Engine

under an user id other than >root< (y/n) [y] >> n

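(Note: you may create a dedicated N1GE administrative user here, but this is not recommended. If the installation is done as an ordinary user, N1GE is restricted so that only that user can submit and run jobs; installing as root removes the restriction.)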

Installing Grid Engine as user >root<

Hit <RETURN> to continue >>

 

Checking $SGE_ROOT directory

----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/n1ge6

If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >>

 

Your $SGE_ROOT directory: /opt/n1ge6

Hit <RETURN> to continue >>

 

Grid Engine TCP/IP service >sge_qmaster<

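(Note: you must add sge_qmaster 536/tcp to /etc/services, or add it in your NIS environment and run make on the NIS maps so that it takes effect. It is best to add sge_execd 537/tcp to the file or to NIS at the same time. N1GE communicates through RPC, so if the port numbers are unknown, the sge_qmaster and sge_execd daemons cannot start.)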

----------------------------------------

There is no service >sge_qmaster< available in your >/etc/services< file

or in your NIS/NIS+ database.

You may add this service now to your services database or choose a port number.

It is recommended to add the service now. If you are using NIS/NIS+ you should

add the service at your NIS/NIS+ server and not to the local >/etc/services<

file.

 

Please add an entry in the form  sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.

Please add the service now or press <RETURN> to go to entering a port number >>

Service >sge_qmaster< is now available.

Hit <RETURN> to continue >>

 

Grid Engine TCP/IP service >sge_execd<

--------------------------------------

Using the service

   sge_execd

for communication with Grid Engine.

Hit <RETURN> to continue >>
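The two service entries can be added idempotently before the installer is started. The following is a sketch rather than a vendor procedure: ensure_service is a hypothetical helper, the ports 536 and 537 are the ones used in this walkthrough, and on Solaris awk -v may require nawk or /usr/xpg4/bin/awk. In a NIS environment the entry belongs on the NIS server instead of the local file.

```shell
#!/bin/sh
# ensure_service FILE NAME PORT
# Append "NAME PORT/tcp" to FILE unless a service called NAME is already defined.
# Hypothetical helper for illustration only.
ensure_service() {
    file=$1 name=$2 port=$3
    # A service is "defined" when NAME is the first field of some line.
    if awk -v n="$name" '$1 == n { found = 1 } END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s\t%s/tcp\n' "$name" "$port" >> "$file"
        echo "$name added to $file"
    fi
}
```

Run as root, ensure_service /etc/services sge_qmaster 536 followed by ensure_service /etc/services sge_execd 537 would cover both daemons; running it a second time changes nothing.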

 

Grid Engine cells

-----------------

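(Note: N1 Grid generally runs in one of two cluster arrangements: a single cluster, or several loosely coupled clusters. For a single cluster there is no need to name the cell; just accept the value default. For the second arrangement you must specify a cell name. A single cluster is installed here.)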

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't

know yet what is a Grid Engine cell it is safe to keep the default cell name

   default

If you want to install multiple cells you can enter a cell name now.

The environment variable

   $SGE_CELL=

will be set for all further Grid Engine commands.

Enter cell name [default] >>

Using cell >default<.

Hit <RETURN> to continue >>

 

Grid Engine qmaster spool directory

-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores

the configuration and the state of the queuing system.

 

User >root< on this host must have read/write access to the qmaster

spool directory.

 

If you will install shadow master hosts or if you want to be able to start

the qmaster daemon on other hosts (see the corresponding section in the

Grid Engine Installation and Administration Manual for details) the account

on the shadow master hosts also needs read/write access to this directory.

 

The following directory

[/opt/n1ge6/default/spool/qmaster]

will be used as qmaster spool directory by default!

 

Do you want to select another qmaster spool directory (y/n) [n] >> n

 

Windows Execution Host Support

------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n] >> n

Verifying and setting file permissions

--------------------------------------

 

Did you install this version with >pkgadd< or did you already

verify and set the file permissions of your distribution (y/n) [y] >> y

 

We do not verify file permissions. Hit <RETURN> to continue >>

 

Select default Grid Engine hostname resolving method

----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is

the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<

is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y

Ignoring domainname when comparing hostnames.

Hit <RETURN> to continue >>

 

Making directories

------------------

creating directory: default

creating directory: default/common

creating directory: /opt/n1ge6/default/spool/qmaster

creating directory: /opt/n1ge6/default/spool/qmaster/job_scripts

Hit <RETURN> to continue >>

 

Setup spooling

--------------

Your SGE binaries are compiled to link the spooling libraries

during runtime (dynamically). So you can choose between Berkeley DB

spooling and Classic spooling method.

Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic

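(Note: spooling data can be recorded in two ways, Berkeley DB spooling or classic spooling. If you choose Berkeley DB, a database has to be created; Oracle and PostgreSQL are supported, and the latter ships with Solaris 10. Classic spooling is chosen here. For every type of host it is best to keep the spool directory on local disk, which improves performance considerably.)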

 

Dumping bootstrapping information

Initializing spooling database

 

Hit <RETURN> to continue >>

 

Grid Engine group id range

--------------------------

 

When jobs are started under the control of Grid Engine an additional group id

is set on platforms which do not support jobs. This is done to provide maximum

control for Grid Engine jobs.

 

This additional UNIX group id range must be unused group id's in your system.

Each job will be assigned a unique id during the time it is running.

Therefore you need to provide a range of id's which will be assigned

dynamically for jobs.

 

The range must be big enough to provide enough numbers for the maximum number

of Grid Engine jobs running at a single moment on a single host. E.g. a range

like >20000-20100< means, that Grid Engine will use the group ids from

20000-20100 and provides a range for 100 Grid Engine jobs at the same time

on a single host.

 

You can change at any time the group id range in your cluster configuration.

 

Please enter a range >> 20000-20100

(The range 20000-20100 can be changed freely, depending on the capacity of the machines.)

Using >20000-20100< as gid range. Hit <RETURN> to continue >>
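Because the installer requires the group id range to be unused, it can help to check the range against the group database first. The sketch below assumes a POSIX shell and a local group file in the usual name:password:gid:members layout; gid_range_conflicts and gid_range_size are hypothetical helper names, and groups served only by NIS would need a different source (for example the output of getent group).

```shell
#!/bin/sh
# gid_range_conflicts RANGE GROUPFILE
# Print the name of every group whose GID lies inside RANGE (e.g. 20000-20100).
gid_range_conflicts() {
    range=$1 groupfile=$2
    low=${range%-*}      # text before the dash
    high=${range#*-}     # text after the dash
    # group file layout: name:password:gid:members
    awk -F: -v lo="$low" -v hi="$high" '$3 >= lo && $3 <= hi { print $1 }' "$groupfile"
}

# gid_range_size RANGE
# Print how many group ids the inclusive range contains.
gid_range_size() {
    range=$1
    echo $(( ${range#*-} - ${range%-*} + 1 ))
}
```

An empty result from gid_range_conflicts 20000-20100 /etc/group means no local group collides with the range. Counted inclusively, 20000-20100 holds 101 ids.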

 

Grid Engine cluster configuration

---------------------------------

Please give the basic configuration parameters of your Grid Engine installation:

  

The pathname of the spool directory of the execution hosts. User >root<

must have the right to create this directory and to write into it.

Default: [/opt/n1ge6/default/spool] >>

 

Grid Engine cluster configuration (continued)

---------------------------------------------

The email address of the administrator to whom problem reports are sent.

It's is recommended to configure this parameter. You may use >none<

if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >>

 The following parameters for the cluster configuration were configured:

   execd_spool_dir        /opt/n1ge6/default/spool

   administrator_mail     none

Do you want to change the configuration parameters (y/n) [n] >> n

 

Creating local configuration

----------------------------

Creating >act_qmaster< file

Adding default complex attributes

Reading in complex attributes.

Adding default parallel environments (PE)

Reading in parallel environments:

        PE "make".

Adding SGE default usersets

Reading in usersets:

        Userset "defaultdepartment".

        Userset "deadlineusers".

Adding >sge_aliases< path aliases file

Adding >qtask< qtcsh sample default request file

Adding >sge_request< default submit options file

Creating >sgemaster< script

Creating >sgeexecd< script

Creating settings files for >.profile/.cshrc<

 

Hit <RETURN> to continue >>

 

qmaster/scheduler startup script

--------------------------------

We can install the startup script that will

start qmaster/scheduler at machine boot (y/n) [y] >> y

Installing startup script /etc/rc2.d/S95sgemaster and /etc/rc2.d/K03sgemaster

Hit <RETURN> to continue >>

 

Grid Engine qmaster and scheduler startup

-----------------------------------------

Starting qmaster and scheduler daemon. Please wait ...

   starting sge_qmaster

   starting sge_schedd

Hit <RETURN> to continue >>

 

Adding Grid Engine hosts

------------------------

Please now add the list of hosts, where you will later install your execution

daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may

press <RETURN> if the line is getting too long. Once you are finished

simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan

to install Grid Engine. This may be convenient if you are installing Grid

Engine on many hosts.

 

Do you want to use a file which contains the list of hosts (y/n) [n] >>

Adding admin and submit hosts

-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from Grid Engine

when the hosts are added.

 

Host(s): t4a t4b sunf220   (enter every host in the cluster here, separated by spaces)

adminhost "t4a" already exists

t4a added to submit host list

t4b added to administrative host list

t4b added to submit host list

sunf220 added to administrative host list

sunf220 added to submit host list

Hit <RETURN> to continue >>

 

Adding admin and submit hosts

-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from Grid Engine

when the hosts are added.

Host(s):            (if there are no more hosts, press Enter without typing anything)

Finished adding hosts. Hit <RETURN> to continue >>

 

If you want to use a shadow host, it is recommended to add this host

to the list of administrative hosts.

 

If you are not sure, it is also possible to add or remove hosts after the

installation with <qconf -ah> for adding and

<qconf -dh> for removing this host

 

Attention: This is not the shadow host installation procedure.

You still have to install the shadow host separately

Do you want to add your shadow host(s) now? (y/n) [y] >> n

(Answer n here; the shadow host can be installed later with inst_sge -sm.)

Creating the default queue and hostgroup

-----------------------------------------------------------

root@t4a added "@allhosts" to host group list

root@t4a added "all.q" to cluster queue list

Hit <RETURN> to continue >>

 

 

Scheduler Tuning

----------------

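(Here you choose the scheduler profile, which determines how submitted jobs are weighted when they are dispatched; Normal is the usual choice.)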

The details on the different options are described in the manual.

Configurations

--------------

1) Normal

          Fixed interval scheduling, report scheduling information,

          actual + assumed load

2) High

          Fixed interval scheduling, report limited scheduling information,

          actual load

3) Max

          Immediate Scheduling, report no scheduling information,

          actual load

Enter the number of your prefered configuration and hit <RETURN>!

Default configuration is [1] >> 1

 

We're configuring the scheduler with >Normal< settings!

Do you agree? (y/n) [y] >> y

 

changed scheduler configuration

Using Grid Engine

-----------------

You should now enter the command:

   source /opt/n1ge6/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/n1ge6/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)

   - $SGE_CELL         (if you are using a cell other than >default<)

   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)

   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)

   - $PATH/$path       (to find the Grid Engine binaries)

   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

 

Grid Engine messages

--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/n1ge6/default/spool/qmaster/messages

   Exec daemon: //messages

 

Grid Engine startup scripts

---------------------------

Grid Engine startup scripts can be found at:

   /opt/n1ge6/default/common/sgemaster (qmaster and scheduler)

   /opt/n1ge6/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

 

Your Grid Engine qmaster installation is now completed

------------------------------------------------------

Please now login to all hosts where you want to run an execution daemon

and start the execution host installation procedure.

 

If you want to run an execution daemon on this host, please do not forget

to make the execution host installation in this host as well.

 

All execution hosts must be administrative hosts during the installation.

All hosts which you added to the list of administrative hosts during this

installation procedure can now be installed.

 

You may verify your administrative hosts with the command

   # qconf -sh

and you may add new administrative hosts with the command

   # qconf -ah

Please hit <RETURN> >>

 

sge_qmaster successfully installed!

 

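The master host is now installed. You can verify the installation as follows:

Run ps -ef | grep sge. If the installation succeeded, you will see the two main processes on the master host: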

root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster

root 446 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_schedd
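The same check can be scripted so that it fails loudly when a daemon is missing. This is a minimal sketch: daemons_running is a hypothetical helper that reads a ps -ef listing on standard input and looks for each daemon name as the last path component of a command.

```shell
#!/bin/sh
# daemons_running NAME...
# Read a `ps -ef` listing on stdin; report each NAME and return non-zero
# if any of them is missing. Hypothetical helper for illustration.
daemons_running() {
    listing=$(cat)
    status=0
    for name in "$@"; do
        # match ".../NAME" followed by a space or the end of the line
        if printf '%s\n' "$listing" | grep -E "/$name( |\$)" >/dev/null; then
            echo "$name: running"
        else
            echo "$name: NOT found"
            status=1
        fi
    done
    return $status
}
```

On the master host this would be called as ps -ef | daemons_running sge_qmaster sge_schedd; on an execution host the argument would be sge_execd.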

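The next part is the critical one. Installing an execution host on the master host itself is straightforward, but to install one on a new machine you must proceed as follows:

First share the master host's SGE_ROOT directory over the network, setting access permissions with security in mind. Then mount it at the corresponding directory on the machine that will become the execution host, and set up the user environment (copy the contents of settings.csh or settings.sh, both found under $SGE_ROOT/default/common/ on the master host, into the user's .cshrc or .profile). Finally cd into $SGE_ROOT and run install_execd or inst_sge -x. Only then will the execution host install cleanly; otherwise you get the following error: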

Grid Engine cells

-----------------

Please enter cell name which you used for the qmaster  

installation or press <RETURN> to use [default] >>

Obviously there was no qmaster installation yet!

Call >install_qmaster< on the machine which shall run the Grid Engine qmaster

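In other words, the script looks for the directory of the cell name you specify, here the cell named default. If the directory is missing, the master host has not been installed; if it exists, the master host is already installed. The execution host can be installed with install_execd or inst_sge -x: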
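The share-and-mount preparation for the execution hosts can be written down as a dry run before anything is changed. The sketch below only prints commands; print_nfs_plan is a hypothetical helper, the share and mount lines use Solaris NFS syntax, and granting rw and root access to the execution hosts is an assumption that should be tightened to match your own security policy.

```shell
#!/bin/sh
# print_nfs_plan MASTER SGE_ROOT EXECHOST...
# Dry run: prints the Solaris commands for each host, executes nothing.
print_nfs_plan() {
    master=$1 root=$2
    shift 2
    # Solaris share takes a colon-separated access list
    hosts=$(echo "$*" | tr ' ' ':')
    echo "on $master: share -F nfs -o rw=$hosts,root=$hosts $root"
    for h in "$@"; do
        echo "on $h: mkdir -p $root && mount -F nfs $master:$root $root"
        # sh/ksh users load the Grid Engine environment from settings.sh
        echo "on $h: . $root/default/common/settings.sh"
    done
}
```

For the cluster in this walkthrough the call would be print_nfs_plan t4a /opt/n1ge6 t4b sunf220, which lists one share command for t4a and a mount plus an environment line for each execution host.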

Welcome to the Grid Engine execution host installation

------------------------------------------------------

If you haven't installed the Grid Engine qmaster host yet, you must execute

this step (with >install_qmaster<) prior the execution host installation.

For a successfull installation you need a running Grid Engine qmaster. It is

also necessary that this host is an administrative host.

You can verify your current list of administrative hosts with  the command:

   # qconf -sh

You can add an administrative host with the command:

   # qconf -ah

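(Before an execution host is installed, the machine must already be an administrative host. Check with qconf -sh, which lists all administrative hosts; if the machine is not in the list, add it with qconf -ah hostname.)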

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >>
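The administrative-host check can be automated for several machines at once. This is a sketch only: missing_admin_hosts is a hypothetical helper that compares host names against the output of qconf -sh as whole lines, so it assumes the names are written exactly as qmaster reports them (short name versus fully qualified name).

```shell
#!/bin/sh
# missing_admin_hosts HOST...
# Read the output of `qconf -sh` on stdin and print each HOST that is
# not yet listed as an administrative host. Hypothetical helper.
missing_admin_hosts() {
    known=$(cat)
    for host in "$@"; do
        # -F: fixed string, -x: the whole line must match
        printf '%s\n' "$known" | grep -Fx -- "$host" >/dev/null || echo "$host"
    done
}
```

On an existing administrative host, a loop such as for h in $(qconf -sh | missing_admin_hosts t4b sunf220); do qconf -ah "$h"; done would register only the hosts that are still missing.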

Checking $SGE_ROOT directory

----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/n1ge6

If this directory is not correct (e.g. it may contain an automounter prefix)

enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >>

Your $SGE_ROOT directory: /opt/n1ge6

Hit <RETURN> to continue >>

 

Grid Engine cells

-----------------

Please enter cell name which you used for the qmaster

installation or press <RETURN> to use [default] >>

Using cell: >default<

Hit <RETURN> to continue >>

 

Checking hostname resolving

---------------------------

This hostname is known at qmaster as an administrative host.

Hit <RETURN> to continue >>

Local execd spool directory configuration

-----------------------------------------

During the qmaster installation you've already entered a global

execd spool directory. This is used, if no local spool directory is configured.

Now you can enter a local spool directory for this host. Do you want to

configure a local spool directory  for this host (y/n) [n] >> y

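(You should choose a local spool directory for the execution host here. Because the SGE_ROOT directory is mounted from the master host, this avoids storing all of the spool data on the master host.)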

Please enter the local spool directory now! >> /opt/n1ge-exec/default/spool

Using local execd spool directory [/opt/n1ge-exec/default/spool]

Hit <RETURN> to continue >>

chown: unknown user id default

Creating local configuration

----------------------------

root@t4b added "t4b" to configuration list

Local configuration for host >t4b< created.

Hit <RETURN> to continue >>

execd startup script

--------------------

We can install the startup script that will

start execd at machine boot (y/n) [y] >>

Installing startup script /etc/rc2.d/S96sgeexecd and /etc/rc2.d/K02sgeexecd

Hit <RETURN> to continue >>

Grid Engine execution daemon startup

------------------------------------

Starting execution daemon. Please wait ...

   starting sge_execd

Hit <RETURN> to continue >>

Adding a queue for this host

----------------------------

We can now add a queue instance for this host:

   - it is added to the >allhosts< hostgroup

   - the queue provides 2 slot(s) for jobs in all queues

     referencing the >allhosts< hostgroup

You do not need to add this host now, but before running jobs on this host

it must be added to at least one queue.

Do you want to add a default queue instance for this host (y/n) [y] >>

root@t4b modified "@allhosts" in host group list

root@t4b modified "all.q" in cluster queue list

Hit <RETURN> to continue >>

 

Using Grid Engine

-----------------

You should now enter the command:

   source /opt/n1ge/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/n1ge/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)

   - $SGE_CELL         (if you are using a cell other than >default<)

   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)

   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)

   - $PATH/$path       (to find the Grid Engine binaries)

   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

Grid Engine messages

--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

  /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/n1ge/default/spool/qmaster/messages

   Exec daemon: //messages

 

Grid Engine startup scripts

---------------------------

Grid Engine startup scripts can be found at:

   /opt/n1ge/default/common/sgemaster (qmaster and scheduler)

   /opt/n1ge/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

Your execution daemon installation is now completed.

 

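The execution host is now installed. You can verify the installation as follows:

Run ps -ef | grep sge. If the installation succeeded, you will see the execution daemon process: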

root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd

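Install the remaining execution hosts the same way to complete the cluster. During installation the master host is registered as a submit host automatically, and any other host can be made a submit host with qconf -as hostname. Jobs can be submitted from any machine, even from a client workstation that has no software installed locally.

Besides the master host, execution hosts and submit hosts, a grid computing environment also contains a large number of user workstations. Users do their design work on their own workstation and submit a job to the execution hosts only when computation is needed; N1 Grid takes care of distributing the job to the execution hosts on the network. These clients need no locally installed software. It is enough to mount the master host's $SGE_ROOT directory, set up the user environment, and register the client as a submit host from the master host or an administrative host with qconf -as clienthostname. The client workstation can then submit jobs from anywhere on the network. Note also that these mounts are best managed with automount, so that SGE_ROOT never has to be mounted explicitly.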
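For a batch of client workstations, the mount and registration steps can be listed as a dry run first. The sketch below prints text only; client_setup_plan is a hypothetical helper, the automounter line follows the Solaris direct-map layout (key, options, location), and mounting read-only on submit clients is an assumption that may not suit every site.

```shell
#!/bin/sh
# client_setup_plan MASTER SGE_ROOT CLIENT...
# Dry run: prints an automounter direct-map entry and the qconf calls
# that would register each client as a submit host. Executes nothing.
client_setup_plan() {
    master=$1 root=$2
    shift 2
    # direct map format: key  [-options]  location
    echo "auto_direct entry: $root -ro $master:$root"
    for client in "$@"; do
        echo "qconf -as $client"
    done
}
```

For example, client_setup_plan t4a /opt/n1ge6 ws01 ws02 prints one map entry and two qconf -as lines; ws01 and ws02 are placeholder workstation names.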

 

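You can now test job submission, following the procedure in the installation documentation: submit a job with qsub and watch it with qstat. The test scripts shipped with n1ge are under $SGE_ROOT/examples/jobs, and you can also write your own. The results of a finished job are placed in the user's home directory.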
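If you prefer to write your own test job, a minimal one could look like the sketch below. write_sample_job is a hypothetical helper and hello_grid is an arbitrary job name. The lines starting with #$ are embedded qsub options: -S selects the interpreter, -N names the job, and -j y merges stderr into stdout.

```shell
#!/bin/sh
# write_sample_job PATH
# Create a minimal Grid Engine job script at PATH. Hypothetical helper.
write_sample_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -N hello_grid
#$ -j y
# Report which host actually ran the job.
echo "hello from `uname -n`"
EOF
    chmod +x "$1"
}
```

After write_sample_job "$HOME/hello.sh", the job would be submitted with qsub "$HOME/hello.sh" and monitored with qstat. Since the #$ lines are ordinary shell comments, the script can also be run directly to check it before submission.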

 

 

 

Sun N1 Grid安装文档

 

一、N1 GRID软件的获得

     N1 grid软件可以从SUN公司网站免费下载,下来后共分四个part,以下几个SPARC的程序文件在part1part 4种提取出来:

n1ge-6_0u8-bin-solaris-sparcv9.tar.gz  (此文件是solaris SPARC 64位的程序)

n1ge6_0u8-common.tar.gz (此文件是与架构无关的公共文件,里面有安装脚本等等)

n1ge6_0u8-doc.tar.zip  (此文件是n1ge的所有文档)

该软件也可以从sun获取,一张光盘既可

 

二、软件的安装

首先,假设安装三个节点的网格环境,机器分别叫t4at4bsunf220,其中t4a是主控主机,t4bsunf220位执行主机(此处由于机器数量不够,没有安装slaver主控主机,也叫做隐藏主机),三台机器都可以提交作业。

首先将以上三个文件ftpt4a机器上,放在$SGE_ROOT目录下,并cd $SGE_ROOT目录下面,然后执行以下的命令,假设SGE_ROOgrid软件的根目录(设置SGE_ROOT b shell方法为SGE_ROOT=/opt/n1ge;export SGE_ROOT)

gzipd –dc $SGE_ROOT/ n1ge-6_0u8-bin-solaris-sparcv9.tar.gz   | tar xpf –

gzipd –dc $SGE_ROOT/ n1ge6_0u8-common.tar.gz   | tar xpf –

gzipd –dc $SGE_ROOT/ n1ge6_0u8-doc.tar.zip  | tar xpf –

解开后里面包括了很多脚本和程序,这个时候必须执行一个setfileperm.sh的脚本:

#  $SGE_ROOT/util/setfileperm.sh  $SGE_ROOT,结果如下:

              WARNING WARNING WARNING

                    -----------------------

We will set the the file ownership and permission to

   UserID:         0

   GroupID:        0

   In directory:   /opt/n1ge6

We will also install the following binaries as SUID-root:

   $SGE_ROOT/utilbin//rlogin

   $SGE_ROOT/utilbin//rsh

   $SGE_ROOT/utilbin//testsuidroot

   $SGE_ROOT/bin//sgepasswd

Do you want to set the file permissions (yes/no) [NO] >> yes

Verifying and setting file permissions and owner in >3rd_party<

Verifying and setting file permissions and owner in >bin<

Verifying and setting file permissions and owner in >ckpt<

Verifying and setting file permissions and owner in >examples<

Verifying and setting file permissions and owner in >inst_sge<

Verifying and setting file permissions and owner in >install_execd<

Verifying and setting file permissions and owner in >install_qmaster<

Verifying and setting file permissions and owner in >lib<

Verifying and setting file permissions and owner in >mpi<

Verifying and setting file permissions and owner in >pvm<

Verifying and setting file permissions and owner in >qmon<

Verifying and setting file permissions and owner in >util<

Verifying and setting file permissions and owner in >utilbin<

Verifying and setting file permissions and owner in >catman<

Verifying and setting file permissions and owner in >doc<

Verifying and setting file permissions and owner in >include<

Verifying and setting file permissions and owner in >man<

 

Your file permissions were set

做完这个以后就可以安装主控主机了,以下步骤是安装的整个过程,需要注意的地方用黑体显示,一般情况下直接回车即可:

# $SGE_ROOT/install_qmaster (也可以用$SGE_ROOT/inst_sge –m来安装,inst_sge脚本能提供自动安装,只要你将配置文件写清楚就可以使用inst_sge –m来安装主控主机)

Welcome to the Grid Engine installation

---------------------------------------

Grid Engine qmaster host installation

-------------------------------------

Before you continue with the installation please read these hints:

   - Your terminal window should have a size of at least

     80x24 characters

   - The INTR character is often bound to the key Ctrl-C.

     The term >Ctrl-C< is used during the installation if you

     have the possibility to abort the installation

 

The qmaster installation procedure will take approximately 5-10 minutes.

Hit to continue >>

 

Choosing Grid Engine admin user account

---------------------------------------

You may install Grid Engine that all files are created with the user id of an

unprivileged user.

 

This will make it possible to install and run Grid Engine in directories

where user >root< has no permissions to create and write files and directories.

   - Grid Engine still has to be started by user >root<

 

   - this directory should be owned by the Grid Engine administrator

 

Do you want to install Grid Engine

under an user id other than >root< (y/n) [y] >> n

(此处需要注意的是,你可以创建一个N1GE的管理用户,建议不做,因为用普通用户来安装的话,那么N1GE就有一个限制,只能由这个用户来提交和运行作业,但是用root用户来安装就能取消这个限制)

Installing Grid Engine as user >root<

Hit to continue >>

 

Checking $SGE_ROOT directory

----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/n1ge6

If this directory is not correct (e.g. it may contain an automounter prefix) enter the correct path to this directory or hit   to use default [/opt/n1ge6] >>

 

Your $SGE_ROOT directory: /opt/n1ge6

Hit to continue >>

 

Grid Engine TCP/IP service >sge_qmaster<

(此处需要注意的是必须将sge_qmaster 536/tcp加入/etc/services文件里面,或者在nis环境里面加入,并make nis使其生效,最好此次也将sge_execd 537/tcp也加入文件或者nis里面,因为N1GE是用RPC来调用的,如果不知道端口号,sge_qmasterd, sge_execd这些守护进程是起不来的)

----------------------------------------

There is no service >sge_qmaster< available in your >/etc/services< file

or in your NIS/NIS+ database.

You may add this service now to your services database or choose a port number.

It is recommended to add the service now. If you are using NIS/NIS+ you should

add the service at your NIS/NIS+ server and not to the local >/etc/services<

file.

 

Please add an entry in the form  sge_qmaster /tcp

to your services database and make sure to use an unused port number.

Please add the service now or press to go to entering a port number >>

Service >sge_qmaster< is now available.

Hit to continue >>

 

Grid Engine TCP/IP service >sge_execd<

--------------------------------------

Using the service

   sge_execd

for communication with Grid Engine.

Hit to continue >>

 

Grid Engine cells

-----------------

(此处需要注意的是,一般来说,N1 grid工作在两种群集环境里面,一种是单一的群集,另外一种是群集的松耦合,如果你想使用单一的群集,那么无需要指定cell的名字,用default缺省值,如果想用第二种方式,那么你就需要指定 cell名字,这里安装的是单一的群集方式)

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't

know yet what is a Grid Engine cell it is safe to keep the default cell name

   default

If you want to install multiple cells you can enter a cell name now.

The environment variable

   $SGE_CELL=

will be set for all further Grid Engine commands.

Enter cell name [default] >>

Using cell >default<.

Hit to continue >>

 

Grid Engine qmaster spool directory

-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores

the configuration and the state of the queuing system.

 

User >root< on this host must have read/write accessto the qmaster

spool directory.

 

If you will install shadow master hosts or if you want to be able to start

the qmaster daemon on other hosts (see the corresponding section in the

Grid Engine Installation and Administration Manual for details) the account

on the shadow master hosts also needs read/write access to this directory.

 

The following directory

[/opt/n1ge6/default/spool/qmaster]

will be used as qmaster spool directory by default!

 

Do you want to select another qmaster spool directory (y/n) [n] >> n

 

Windows Execution Host Support

------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n] >> n

Verifying and setting file permissions

--------------------------------------

 

Did you install this version with >pkgadd< or did you already

verify and set the file permissions of your distribution (y/n) [y] >> y

 

We do not verify file permissions. Hit to continue >>

 

Select default Grid Engine hostname resolving method

----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is

the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<

is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> y

Ignoring domainname when comparing hostnames.

Hit to continue >>

 

Making directories

------------------

creating directory: default

creating directory: default/common

creating directory: /opt/n1ge6/default/spool/qmaster

creating directory: /opt/n1ge6/default/spool/qmaster/job_scripts

Hit to continue >>

 

Setup spooling

--------------

Your SGE binaries are compiled to link the spooling libraries

during runtime (dynamically). So you can choose between Berkeley DB

spooling and Classic spooling method.

Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> classic

(此处需要注意的是,spool可以用两种形式来记录,一种是Berkeley DB假脱机模式,一种典型假脱机模式,如果选择Berkeley DB那么需要创建数据库,支持oraclePostgreSQL,后一种数据soalris 10自带,这里选择典型假脱机模式,所有类型的主机,最好将spool目录设置在本地,这样可以大大地提高性能)

 

Dumping bootstrapping information

Initializing spooling database

 

Hit to continue >>

 

Grid Engine group id range

--------------------------

 

When jobs are started under the control of Grid Engine an additional group id

is set on platforms which do not support jobs. This is done to provide maximum

control for Grid Engine jobs.

 

This additional UNIX group id range must be unused group id's in your system.

Each job will be assigned a unique id during the time it is running.

Therefore you need to provide a range of id's which will be assigned

dynamically for jobs.

 

The range must be big enough to provide enough numbers for the maximum number

of Grid Engine jobs running at a single moment on a single host. E.g. a range

like >20000-20100< means, that Grid Engine will use the group ids from

20000-20100 and provides a range for 100 Grid Engine jobs at the same time

on a single host.

 

You can change at any time the group id range in your cluster configuration.

 

Please enter a range >> 20000-20100

(The 20000-20100 range can be changed freely; size it according to how many jobs the machine can run at once.)

Using >20000-20100< as gid range. Hit <RETURN> to continue >>
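Before settling on a range, it can help to check that none of its ids is already taken. The following is a minimal sketch, not part of the installer: the function names `gid_range_size` and `gids_in_use` are made up for illustration, and on Solaris `nawk` may be needed in place of `awk` for the `-v` option.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with Grid Engine).

# Print how many group ids an inclusive range such as 20000-20100 contains.
gid_range_size() {
    lo=${1%-*}
    hi=${1#*-}
    echo $((hi - lo + 1))
}

# Read group(4)-format lines on stdin and print "name:gid" for every group
# whose gid lies inside the range. No output means the range is unused.
gids_in_use() {
    lo=${1%-*}
    hi=${1#*-}
    awk -F: -v lo="$lo" -v hi="$hi" '$3 >= lo && $3 <= hi { print $1 ":" $3 }'
}

# Example:
#   gid_range_size 20000-20100
#   gids_in_use 20000-20100 < /etc/group
```

If groups are also served from NIS or LDAP, `/etc/group` alone does not show every id in use, so the check above is only a first pass.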

 

Grid Engine cluster configuration

---------------------------------

Please give the basic configuration parameters of your Grid Engine installation:

  

The pathname of the spool directory of the execution hosts. User >root<

must have the right to create this directory and to write into it.

Default: [/opt/n1ge6/default/spool] >>

 

Grid Engine cluster configuration (continued)

---------------------------------------------

The email address of the administrator to whom problem reports are sent.

It's is recommended to configure this parameter. You may use >none<

if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >>

 The following parameters for the cluster configuration were configured:

   execd_spool_dir        /opt/n1ge6/default/spool

   administrator_mail     none

Do you want to change the configuration parameters (y/n) [n] >> n

 

Creating local configuration

----------------------------

Creating >act_qmaster< file

Adding default complex attributes

Reading in complex attributes.

Adding default parallel environments (PE)

Reading in parallel environments:

        PE "make".

Adding SGE default usersets

Reading in usersets:

        Userset "defaultdepartment".

        Userset "deadlineusers".

Adding >sge_aliases< path aliases file

Adding >qtask< qtcsh sample default request file

Adding >sge_request< default submit options file

Creating >sgemaster< script

Creating >sgeexecd< script

Creating settings files for >.profile/.cshrc<

 

Hit <RETURN> to continue >>

 

qmaster/scheduler startup script

--------------------------------

We can install the startup script that will

start qmaster/scheduler at machine boot (y/n) [y] >> y

Installing startup script /etc/rc2.d/S95sgemaster and /etc/rc2.d/K03sgemaster

Hit <RETURN> to continue >>

 

Grid Engine qmaster and scheduler startup

-----------------------------------------

Starting qmaster and scheduler daemon. Please wait ...

   starting sge_qmaster

   starting sge_schedd

Hit <RETURN> to continue >>

 

Adding Grid Engine hosts

------------------------

Please now add the list of hosts, where you will later install your execution

daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may

press <RETURN> if the line is getting too long. Once you are finished

simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan

to install Grid Engine. This may be convenient if you are installing Grid

Engine on many hosts.

 

Do you want to use a file which contains the list of hosts (y/n) [n] >>

Adding admin and submit hosts

-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from Grid Engine

when the hosts are added.

 

Host(s): t4a t4b sunf220   (enter every host in the cluster here, separated by spaces)

adminhost "t4a" already exists

t4a added to submit host list

t4b added to administrative host list

t4b added to submit host list

sunf220 added to administrative host list

sunf220 added to submit host list

Hit <RETURN> to continue >>

 

Adding admin and submit hosts

-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are

entering an empty list. You will see messages from Grid Engine

when the hosts are added.

Host(s):            (if there are no more hosts, just press Return without typing anything)

Finished adding hosts. Hit <RETURN> to continue >>

 

If you want to use a shadow host, it is recommended to add this host

to the list of administrative hosts.

 

If you are not sure, it is also possible to add or remove hosts after the

installation with <qconf -ah hostname> for adding and

<qconf -dh hostname> for removing this host

 

Attention: This is not the shadow host installationprocedure.

You still have to install the shadow host separately

Do you want to add your shadow host(s) now? (y/n) [y] >> n

(Answer n here; the shadow host can be installed later with inst_sge -sm.)

Creating the default queue and hostgroup

-----------------------------------------------------------

root@t4a added "@allhosts" to host group list

root@t4a added "all.q" to cluster queue list

Hit <RETURN> to continue >>

 

 

Scheduler Tuning

----------------

(Choose the scheduler profile here, which determines how submitted jobs are weighted and dispatched; Normal is the usual choice.)

The details on the different options are described in the manual.

Configurations

--------------

1) Normal

          Fixed interval scheduling, report scheduling information,

          actual + assumed load

2) High

          Fixed interval scheduling, report limited scheduling information,

          actual load

3) Max

          Immediate Scheduling, report no scheduling information,

          actual load

Enter the number of your prefered configuration and hit <RETURN>!

Default configuration is [1] >> 1

 

We're configuring the scheduler with >Normal< settings!

Do you agree? (y/n) [y] >> y

 

changed scheduler configuration

Using Grid Engine

-----------------

You should now enter the command:

   source /opt/n1ge6/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/n1ge6/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)

   - $SGE_CELL         (if you are using a cell other than >default<)

   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)

   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)

   - $PATH/$path       (to find the Grid Engine binaries)

   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>
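Rather than typing the `settings.sh` command in every new session, the line can be added to the login profile once. The following is a minimal sketch for sh/ksh users; the helper name `add_sge_settings` is made up for illustration, and the paths in the example are the ones from this installation.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine).
# Append a line that sources the Grid Engine settings file to a profile
# file, unless exactly that line is already there.
add_sge_settings() {
    settings=$1   # e.g. /opt/n1ge6/default/common/settings.sh
    profile=$2    # e.g. $HOME/.profile
    line=". $settings"
    touch "$profile"
    grep -F -x "$line" "$profile" >/dev/null 2>&1 || echo "$line" >> "$profile"
}

# Example:
#   add_sge_settings /opt/n1ge6/default/common/settings.sh "$HOME/.profile"
```

Running the helper twice leaves a single line in the profile, so it is safe to include in a host setup script.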

 

Grid Engine messages

--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/n1ge6/default/spool/qmaster/messages

   Exec daemon: <execd_spool_dir>/<hostname>/messages

 

Grid Engine startup scripts

---------------------------

Grid Engine startup scripts can be found at:

   /opt/n1ge6/default/common/sgemaster (qmaster and scheduler)

   /opt/n1ge6/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

 

Your Grid Engine qmaster installation is now completed

------------------------------------------------------

Please now login to all hosts where you want to run an execution daemon

and start the execution host installation procedure.

 

If you want to run an execution daemon on this host, please do not forget

to make the execution host installation in this host as well.

 

All execution hosts must be administrative hosts during the installation.

All hosts which you added to the list of administrative hosts during this

installation procedure can now be installed.

 

You may verify your administrative hosts with the command

   # qconf -sh

and you may add new administrative hosts with the command

   # qconf -ah <hostname>

Please hit <RETURN> >>

 

sge_qmaster successfully installed!

 

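The master host is now installed. To verify that the installation worked, proceed as follows:

Run ps -ef | grep sge. If the installation succeeded, you will see the two main daemons on the master host: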

root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster

root 446 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_schedd
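The check above can be scripted so that it reports which daemon is missing instead of relying on reading the `ps` listing by eye. This is a minimal sketch; the function name `missing_sge_daemons` is made up for illustration, and it simply looks for each daemon name anywhere in the process list.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine).
# Read `ps -ef` output on stdin and print the name of every daemon given
# as an argument that does not appear in it. Returns 0 only when all of
# the named daemons were found.
missing_sge_daemons() {
    ps_out=$(cat)
    missing=0
    for d in "$@"; do
        if ! printf '%s\n' "$ps_out" | grep "$d" >/dev/null; then
            echo "$d"
            missing=1
        fi
    done
    return $missing
}

# Example on the master host:
#   ps -ef | missing_sge_daemons sge_qmaster sge_schedd
# Example on an execution host:
#   ps -ef | missing_sge_daemons sge_execd
```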

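The next step is the critical one. Installing an execution host on the master host itself is straightforward, but installing one on a new machine must be done as follows:

First export the master host's SGE_ROOT directory with share, restricting access for security. Then mount it at the same path on the machine that is to become the execution host and set up the user environment there: add the contents of settings.csh (for .cshrc) or settings.sh (for .profile), both found under $SGE_ROOT/default/common/ on the master host, to the user's startup file. Finally cd to $SGE_ROOT and run install_execd or inst_sge -x. Only then will the execution host install cleanly; otherwise you get the following error: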

Grid Engine cells

-----------------

Please enter cell name which you used for the qmaster  

installation or press <RETURN> to use [default] >>

Obviously there was no qmaster installation yet!

Call >install_qmaster< on the machine which shall run the Grid Engine qmaster

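In other words, the script looks for the directory belonging to the given cell name (here the cell name is default). If the directory is missing, no master host has been installed; if it is present, the master host is installed. The execution host can then be installed with install_execd or inst_sge -x: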

Welcome to the Grid Engine execution host installation

------------------------------------------------------

If you haven't installed the Grid Engine qmaster host yet, you must execute

this step (with >install_qmaster<) prior the execution host installation.

For a successfull installation you need a running Grid Engine qmaster. It is

also necessary that this host is an administrative host.

You can verify your current list of administrative hosts with  the command:

   # qconf -sh

You can add an administrative host with the command:

   # qconf -ah <hostname>

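(Before an execution host is installed, the machine must already be registered as an administrative host. qconf -sh lists all administrative hosts; if the machine is not in that list, add it with qconf -ah hostname.)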

The execution host installation will take approximately 5 minutes.

Hit <RETURN> to continue >>
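The administrative-host precondition described on this screen can be checked from a script before the installer is started. This is a minimal sketch that assumes `qconf -sh` prints one host name per line and that the name is spelled the same way as the one being tested (short name versus fully qualified name); the function name `is_listed_host` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine).
# Succeed if the host name in $1 appears as a whole line in the host list
# read from stdin.
is_listed_host() {
    grep -x -F "$1" >/dev/null
}

# Example, run on the master host with the SGE environment set up:
#   qconf -sh | is_listed_host t4b || qconf -ah t4b
```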

Checking $SGE_ROOT directory

----------------------------

The Grid Engine root directory is:

   $SGE_ROOT = /opt/n1ge6

If this directory is not correct (e.g. it may contain an automounter prefix)

enter the correct path to this directory or hit <RETURN> to use default [/opt/n1ge6] >>

Your $SGE_ROOT directory: /opt/n1ge6

Hit <RETURN> to continue >>

 

Grid Engine cells

-----------------

Please enter cell name which you used for the qmaster

installation or press <RETURN> to use [default] >>

Using cell: >default<

Hit <RETURN> to continue >>

 

Checking hostname resolving

---------------------------

This hostname is known at qmaster as an administrative host.

Hit <RETURN> to continue >>

Local execd spool directory configuration

-----------------------------------------

During the qmaster installation you've already entered a global

execd spool directory. This is used, if no local spool directory is configured.

Now you can enter a local spool directory for this host. Do you want to

configure a local spool directory  for this host (y/n) [n] >> y

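 (A local spool directory should be chosen for the execution host here. Because the SGE_ROOT directory is mounted from the master host, this keeps the host's spool data from being written back to the master.)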

Please enter the local spool directory now! >> /opt/n1ge-exec/default/spool

Using local execd spool directory [/opt/n1ge-exec/default/spool]

Hit <RETURN> to continue >>

chown: unknown user id default

Creating local configuration

----------------------------

root@t4b added "t4b" to configuration list

Local configuration for host >t4b< created.

Hit <RETURN> to continue >>

execd startup script

--------------------

We can install the startup script that will

start execd at machine boot (y/n) [y] >>

Installing startup script /etc/rc2.d/S96sgeexecd and /etc/rc2.d/K02sgeexecd

Hit <RETURN> to continue >>

Grid Engine execution daemon startup

------------------------------------

Starting execution daemon. Please wait ...

   starting sge_execd

Hit <RETURN> to continue >>

Adding a queue for this host

----------------------------

We can now add a queue instance for this host:

   - it is added to the >allhosts< hostgroup

   - the queue provides 2 slot(s) for jobs in all queues

     referencing the >allhosts< hostgroup

You do not need to add this host now, but before running jobs on this host

it must be added to at least one queue.

Do you want to add a default queue instance for this host (y/n) [y] >>

root@t4b modified "@allhosts" in host group list

root@t4b modified "all.q" in cluster queue list

Hit <RETURN> to continue >>

 

Using Grid Engine

-----------------

You should now enter the command:

   source /opt/n1ge/default/common/settings.csh

if you are a csh/tcsh user or

   # . /opt/n1ge/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)

   - $SGE_CELL         (if you are using a cell other than >default<)

   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)

   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)

   - $PATH/$path       (to find the Grid Engine binaries)

   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >>

Grid Engine messages

--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)

  /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /opt/n1ge/default/spool/qmaster/messages

   Exec daemon: <execd_spool_dir>/<hostname>/messages

 

Grid Engine startup scripts

---------------------------

Grid Engine startup scripts can be found at:

   /opt/n1ge/default/common/sgemaster (qmaster and scheduler)

   /opt/n1ge/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >>

Your execution daemon installation is now completed.

 

The execution host is now installed. To verify the installation, proceed as follows:

Run ps -ef | grep sge. If the installation succeeded, you will see the execution daemon:

root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd

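Install the remaining execution hosts the same way and the cluster is complete. During installation the master host is registered as a submit host automatically; any other machine can be made a submit host with qconf -as hostname. Jobs can then be submitted from any submit host, even from a client workstation that has no Grid Engine software installed locally.

Besides the master, execution and submit hosts, a grid environment usually contains a large number of user workstations. Users do their design work on their own workstation and only submit a job when something needs to be computed; N1 Grid Engine then dispatches the job to an execution host on the network. These clients need no software installed. It is enough to mount the master host's $SGE_ROOT directory, set up the user environment, and register the client as a submit host from the master host or any administrative host with qconf -as clienthostname. After that the workstation can submit jobs from anywhere on the network. Note also that these mounts are best handled by the automounter, so that SGE_ROOT does not have to be mounted explicitly.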
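The client setup described above can be written down as a short command list. The sketch below only prints the commands rather than running them, since they need root on two different machines. The helper name `print_client_setup` and the workstation name `ws01` are made up for illustration, and the read-only `ro=` export is an assumption about what a pure submit client needs; adjust the share options to your own security policy.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine).
# Print, without executing, the Solaris commands for turning a workstation
# into a submit client. All three arguments are placeholders.
print_client_setup() {
    master=$1      # master host that owns SGE_ROOT
    client=$2      # workstation to become a submit host
    sge_root=$3    # SGE_ROOT path, identical on both machines
    echo "# on $master (master host):"
    echo "share -F nfs -o ro=$client $sge_root"
    echo "qconf -as $client"
    echo "# on $client (client workstation):"
    echo "mount -F nfs $master:$sge_root $sge_root"
    echo ". $sge_root/default/common/settings.sh"
}

print_client_setup t4a ws01 /opt/n1ge6
```

With the automounter in place of the explicit `mount`, the client-side step reduces to sourcing the settings file.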

 

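Job submission can now be tested by following the installation documentation: submit a job with qsub and watch it with qstat. The test scripts shipped with N1GE are under $SGE_ROOT/examples/jobs, and you can also write your own. The output of a finished job is placed in the user's home directory.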
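For a first test, a job script of your own can be as small as the one below. This is a minimal sketch, not one of the shipped examples: the helper name `make_test_job` and the job name `hello_sge` are made up for illustration. Lines beginning with `#$` are read by qsub as embedded options (`-S` for the shell, `-N` for the job name, `-j y` to merge stderr into stdout).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Grid Engine).
# Write a tiny test job to the path given as $1 and make it executable.
make_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -N hello_sge
#$ -j y
echo "job running on `uname -n` at `date`"
EOF
    chmod +x "$1"
}

# Example:
#   make_test_job "$HOME/hello_sge.sh"
#   qsub "$HOME/hello_sge.sh"
#   qstat
```

Once the job has finished, its output should appear in the home directory in a file named after the job and its job id.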

 

 

 

 
 
 
 
 
 
 
 
 
 
 
 
 