Linux Learning Notes
This chapter provides a summary tutorial describing some of the high points of using LAM/MPI. It is not intended as a comprehensive guide; the finer details of some situations will not be explained. However, it is a good step-by-step guide for users who are new to MPI and/or LAM/MPI.
Using LAM/MPI is conceptually simple:
• Launch the LAM run-time environment (RTE)
• Repeat as necessary:
– Compile MPI program(s)
– Run MPI program(s)
• Shut down the LAM run-time environment
The tutorial below will describe each of these steps.
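In terms of commands, a typical session looks roughly like the following sketch; each command is described in detail in the sections below (hostfile and hello are the example file names used throughout this tutorial):
shell$ lamboot hostfile          # start the LAM RTE on the hosts listed in hostfile
shell$ mpicc hello.c -o hello    # compile an MPI program with a wrapper compiler
shell$ mpirun C hello            # run one copy of hello on every CPU in the universe
shell$ lamhalt                   # shut down the LAM RTE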
This section describes actions that usually only need to be performed once per user in order to set up LAM to function properly.
One of the main requirements for LAM/MPI to function properly is for the LAM executables to be in your path. This step may vary from site to site; for example, the LAM executables may already be in your path – consult your local administrator to see if this is the case.
NOTE: If the LAM executables are already in your path, you can skip this step and proceed to Section 4.2.
In many cases, if your system does not already provide the LAM executables in your path, you can add them by editing your “dot” files that are executed automatically by the shell upon login (both interactive and non-interactive logins). Each shell has a different file to edit and corresponding syntax, so you’ll need to know which shell you are using. Tables 4.1 and 4.2 list several common shells and the associated files that are typically used. Consult the documentation for your shell for more information.
Shell name | Interactive login startup file
sh (or Bash named "sh") | .profile
csh | .cshrc followed by .login
tcsh | .tcshrc if it exists, .cshrc if it does not, followed by .login
bash | .bash_profile if it exists, or .bash_login if it exists, or .profile if it exists (in that order). Note that some Linux distributions automatically come with .bash_profile scripts for users that automatically execute .bashrc as well. Consult the bash manual page for more information.
Table 4.1: List of common shells and the corresponding environmental setup files commonly used with each for interactive startups (e.g., normal login). All files listed are assumed to be in the $HOME directory.
Shell name | Non-interactive login startup file
sh (or Bash named "sh") | This shell does not execute any file automatically, so LAM will execute the .profile script before invoking LAM executables on remote nodes
csh | .cshrc
tcsh | .tcshrc if it exists, .cshrc if it does not
bash | .bashrc if it exists
Table 4.2: List of common shells and the corresponding environmental setup files commonly used with each for non-interactive startups (e.g., when LAM invokes commands on remote nodes via rsh/ssh). All files listed are assumed to be in the $HOME directory.
You’ll also need to know the directory where LAM was installed. For the purposes of this tutorial, we’ll assume that LAM is installed in /usr/local/lam. And to re-emphasize a critical point: these are only guidelines – the specifics may vary depending on your local setup. Consult your local system or network administrator for more details.
Once you have determined all three pieces of information (what shell you are using, what directory LAM was installed to, and which “dot” file is appropriate to edit), open the “dot” file in a text editor and follow the general directions listed below:
• For the Bash, Bourne, and Bourne-related shells, add the following lines:
PATH=/usr/local/lam/bin:$PATH
export PATH
• For the C shell and related shells (such as tcsh), add the following line:
set path = (/usr/local/lam/bin $path)
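After starting a new login shell (or re-sourcing the edited file), one quick sanity check, assuming the /usr/local/lam prefix used in this tutorial, is to ask the shell where it finds a LAM executable:
shell$ which lamboot
/usr/local/lam/bin/lamboot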
LAM includes manual pages for all supported MPI functions as well as all of the LAM executables. While this step is not necessary for correct MPI functionality, it can be helpful when looking for MPI or LAM-specific information.
Using Tables 4.1 and 4.2, find the right “dot” file to edit. Assuming again that LAM was installed to /usr/local/lam, open the appropriate “dot” file in a text editor and follow the general directions listed below:
• For the Bash, Bourne, and Bourne-related shells, add the following lines:
MANPATH=/usr/local/lam/man:$MANPATH
export MANPATH
• For the C shell and related shells (such as tcsh), add the following lines:
if ($?MANPATH == 0) then
    setenv MANPATH /usr/local/lam/man
else
    setenv MANPATH /usr/local/lam/man:$MANPATH
endif
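As a hedged check that the MANPATH change took effect, try opening one of the LAM manual pages from a new shell, for example:
shell$ man lamboot
shell$ man MPI_Init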
LAM/MPI is built around a core of System Services Interface (SSI) plugin modules. SSI allows run-time selection of different underlying services within the LAM/MPI run-time environment, including tunable parameters that can affect the performance of MPI programs.
While this tutorial won’t go into much detail about SSI, just be aware that you’ll see mention of “SSI” in the text below. In a few places, the tutorial passes parameters to various SSI modules through environment variables and/or the -ssi command line parameter to several LAM commands.
See other sections in this manual for a more complete description of SSI (Chapter 6, page 43), how it works, and what run-time parameters are available (Chapters 8 and 9, pages 65 and 75, respectively). Also, the lamssi(7), lamssi_boot(7), lamssi_coll(7), lamssi_cr(7), and lamssi_rpi(7) manual pages each provide additional information on LAM’s SSI mechanisms.
LAM/MPI can be installed with a large number of configuration options; exactly which are enabled depends on the choices your system/network administrator made when configuring and installing LAM/MPI. The laminfo command shows the end user what the installed LAM/MPI supports. Running “laminfo” (with no arguments) prints a list of LAM’s capabilities, including all of its SSI modules.
Among other things, this shows what language bindings the installed LAM/MPI supports, what underlying network transports it supports, and what directory LAM was installed to. The -parsable option prints out all the same information, but in a conveniently machine-parsable format (suitable for use in scripts).
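For example (output omitted here, since it depends entirely on the local installation):
shell$ laminfo
shell$ laminfo -parsable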
Before any MPI programs can be executed, the LAM run-time environment must be launched. This is typically called “booting LAM.” A successful boot process creates an instance of the LAM run-time environment, commonly referred to as the “LAM universe.”
LAM’s run-time environment can be executed in many different environments. For example, it can be run interactively on a cluster of workstations (even on a single workstation, perhaps to simulate parallel execution for debugging and/or development). Or LAM can be run in production batch scheduled systems.
This example will focus on a traditional rsh / ssh-style workstation cluster (i.e., not under batch systems), where rsh or ssh is used to launch executables on remote workstations.
When booting LAM with rsh or ssh, you need a text file that lists the hosts on which to launch the LAM run-time environment. This file is typically referred to as a “boot schema”, “hostfile”, or “machinefile”. For example:
# My boot schema
node1.cluster.example.com
node2.cluster.example.com
node3.cluster.example.com cpu=2
node4.cluster.example.com cpu=2
Four nodes are specified in the above example by listing their IP hostnames. Note also the “cpu=2” that follows the last two entries; this tells LAM that those machines each have two CPUs available for running MPI processes.
The location of this text file is irrelevant; for the purposes of this example, we’ll assume that it is named hostfile and is located in the current working directory.
The lamboot command is used to launch the LAM run-time environment. For each machine listed in the boot schema, the following conditions must be met for LAM’s run-time environment to be booted correctly:
• The machine must be reachable and operational.
• The user must be able to non-interactively execute arbitrary commands on the machine (e.g., without being prompted for a password).
• The LAM executables must be locatable on that machine, using the user’s shell search path.
• The user must be able to write to the LAM session directory (usually somewhere under /tmp).
• The shell’s start-up scripts must not print anything on standard error.
• All machines must be able to resolve the fully-qualified domain name (FQDN) of all the machines being booted (including itself).
Once all of these conditions are met, the lamboot command is used to launch the LAM run-time environment. For example:
shell$ lamboot -v -ssi boot rsh hostfile

LAM 7.0/MPI
n0<1234> ssi:boot:base:linear: booting n0 (node1.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n1 (node2.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n2 (node3.cluster.example.com)
n0<1234> ssi:boot:base:linear: booting n3 (node4.cluster.example.com)
n0<1234> ssi:boot:base:linear: finished
The parameters passed to lamboot in the example above are as follows:
• -v: Make lamboot be slightly verbose.
• -ssi boot rsh: Ensure that LAM uses the rsh/ssh boot module to boot the LAM universe. Typically, LAM chooses the right boot module automatically (and therefore this parameter is not typically necessary), but to ensure that this tutorial does exactly what we want it to do, we use this parameter to absolutely ensure that LAM uses rsh or ssh to boot the universe.
• hostfile: Name of the boot schema file.
Common causes of failure with the lamboot command include (but are not limited to):
• The user does not have permission to execute on the remote node. This typically involves setting up a $HOME/.rhosts file (if using rsh), or properly configured SSH keys (if using ssh). Setting up .rhosts and/or SSH keys for password-less remote logins is beyond the scope of this tutorial; consult the local documentation for rsh and ssh, and/or internet tutorials on setting up SSH keys.
• The first time a user uses ssh to execute on a remote node, ssh typically prints a warning to standard error. LAM will interpret this as a failure. If this happens, lamboot will complain that something unexpectedly appeared on stderr and abort. One solution is to manually ssh to each node in the boot schema once in order to eliminate the stderr warning, and then try lamboot again. Another is to use the boot_rsh_ignore_stderr SSI parameter. We haven’t discussed SSI parameters yet, so it is probably easiest at this point to manually ssh to a small number of nodes to get the warning out of the way, as sketched below.
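For example, using the hostnames from the boot schema above, a one-time warm-up like the following (a plain ssh sketch, not a LAM command) accepts the host keys and clears the warnings:
shell$ ssh node1.cluster.example.com true
shell$ ssh node2.cluster.example.com true
shell$ ssh node3.cluster.example.com true
shell$ ssh node4.cluster.example.com true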
If you are having problems with lamboot, try the -d option, which prints enormous amounts of debugging output that can be helpful in determining what the problem is. Additionally, check the lamboot(1) man page as well as the LAM FAQ on the main LAM web site under the section “Booting LAM” for more information.
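For example, re-running the boot from above with debugging output enabled:
shell$ lamboot -d -ssi boot rsh hostfile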
An easy way to see how many nodes and CPUs are in the current LAM universe is with the lamnodes command. For example, with the LAM universe that was created from the boot schema above:
shell$ lamnodes
n0 node1.cluster.example.com:1:origin,this_node
n1 node2.cluster.example.com:1:
n2 node3.cluster.example.com:2:
n3 node4.cluster.example.com:2:
The “n” number on the far left is the LAM node number; for example, “n3” refers to node4.cluster.example.com. The number after each hostname is how many CPUs are available on that node for running MPI processes, matching the cpu= values in the boot schema.
Finally, the “origin” notation indicates the node from which lamboot was executed, and “this_node” indicates the node on which lamnodes is running.
Note that it is not necessary to have LAM booted to compile MPI programs.
Compiling MPI programs can be a complicated process:
• The same compilers should be used to compile/link user MPI programs as were used to compile LAM itself.
• Depending on the specific installation configuration of LAM, a variety of -I, -L, and -l flags (and possibly others) may be necessary to compile and/or link a user MPI program.
LAM/MPI provides “wrapper” compilers to hide all of this complexity. These wrapper compilers simply add the correct compiler/linker flags and then invoke the underlying compiler to actually perform the compilation/link. As such, LAM’s wrapper compilers can be used just like “real” compilers.
The wrapper compilers are named mpicc (for C programs), mpiCC and mpic++ (for C++ programs), and mpif77 (for Fortran programs). For example:
shell$ mpicc -g -c foo.c
shell$ mpicc -g -c bar.c
shell$ mpicc -g foo.o bar.o -o my_mpi_program
Note that no additional compiler and linker flags are required for correct MPI compilation or linking. The resulting my_mpi_program is ready to run in the LAM run-time environment. Similarly, the other two wrapper compilers can be used to compile MPI programs for their respective languages:
shell$ mpiCC -O c++_program.cc -o my_c++_mpi_program
shell$ mpif77 -O f77_program.f -o my_f77_mpi_program
Note, too, that any other compiler/linker flags can be passed through the wrapper compilers (such as -g and -O); they will simply be passed to the back-end compiler.
Finally, note that giving the -showme option to any of the wrapper compilers will show both the name of the back-end compiler that will be invoked, and also all the command line options that would have been passed for a given compile command. For example (line breaks added to fit in the documentation):
shell$ mpiCC -O c++_program.cc -o my_c++_program -showme
g++ -I/usr/local/lam/include -pthread -O c++_program.cc -o \
my_c++_program -L/usr/local/lam/lib -llammpio -llammpi++ -lpmpi \
-llamf77mpi -lmpi -llam -lutil -pthread
Note that the wrapper compilers only add all the LAM/MPI-specific flags when a command-line argument that does not begin with a dash (“-”) is present. For example:
shell$ mpicc
gcc: no input files
shell$ mpicc --version
gcc (GCC)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
The following is a simple “hello world” program written in C.
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello, world! I am %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
Save this program in a text file and compile it with the mpicc wrapper compiler:
shell$ mpicc hello.c -o hello
The following is a simple “hello world” program written in C++.
#include <iostream>
#include <mpi.h>

using namespace std;

int main(int argc, char *argv[]) {
    int rank, size;

    MPI::Init(argc, argv);
    rank = MPI::COMM_WORLD.Get_rank();
    size = MPI::COMM_WORLD.Get_size();
    cout << "Hello, world! I am " << rank << " of " << size << endl;
    MPI::Finalize();
    return 0;
}
Save this program in a text file and compile it with the mpiCC wrapper compiler (or mpic++ if on a case-insensitive filesystem such as Mac OS X’s HFS+):
shell$ mpiCC hello.cc -o hello
The following is a simple “hello world” program written in Fortran.
      program hello
      include 'mpif.h'
      integer rank, size, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, "Hello, world! I am ", rank, " of ", size
      call MPI_FINALIZE(ierr)
      stop
      end
Save this program in a text file and compile it with the mpif77 wrapper compiler:
shell$ mpif77 hello.f -o hello
Once you have successfully established a LAM universe and compiled an MPI program, you can run the MPI program in parallel.
This section shows how to run a Single Program, Multiple Data (SPMD) program. Specifically, we will run the hello program from the previous section in parallel. The mpirun and mpiexec commands are used to launch parallel MPI programs, the mpitask command provides rudimentary debugging support, and the lamclean command cleans up after failed MPI programs (e.g., when an error occurs).
The mpirun command has many options that control how a program is run in parallel; only a few of them are explained here.
The simplest way to launch the hello program across all CPUs listed in the boot schema is:
shell$ mpirun C hello
The C option means “launch one copy of hello on every CPU that was listed in the boot schema.” The C notation is therefore a convenient shorthand for launching a set of processes across a group of SMPs.
Another way to run in parallel is:
shell$ mpirun N hello
The N option has a different meaning than C: it means “launch one copy of hello on every node in the LAM universe.” N therefore disregards the CPU count, which can be useful for multi-threaded MPI programs.
Finally, to run a fixed number of processes (regardless of how many CPUs or nodes are in the LAM universe):
shell$ mpirun -np 4 hello
This runs 4 copies of hello. LAM “schedules” the copies onto the nodes in a round-robin fashion according to how many CPUs were listed in the boot schema file. For example, on the LAM universe that we have previously shown in this tutorial, the following would be launched:
• 1 hello would be launched on n0 (named node1)
• 1 hello would be launched on n1 (named node2)
• 2 hellos would be launched on n2 (named node3)
Note that any number can be used; if the number is greater than how many CPUs are in the LAM universe, LAM will “wrap around” and start scheduling again from the first node. For example, using -np 10 would result in the following schedule:
• 2 hellos on n0 (1 from the first pass, and then a second from the “wrap around”)
• 2 hellos on n1 (1 from the first pass, and then a second from the “wrap around”)
• 4 hellos on n2 (2 from the first pass, and then 2 more from the “wrap around”)
• 2 hellos on n3
The mpirun(1) man page contains much more information about mpirun and the options available. For example, mpirun also supports Multiple Program, Multiple Data (MPMD) programs, although that is not discussed here. Also see Section 7.14 (page 60) in this document.
The MPI-2 standard recommends the use of mpiexec for portable MPI process startup. In LAM/MPI, mpiexec is functionally similar to mpirun. Some options that are available to mpirun are not available to mpiexec, and vice-versa. The end result is typically the same, however – both will launch parallel MPI programs; which you should use is likely simply a personal choice.
That said, mpiexec offers more convenient access in three cases:
• Running MPMD programs
• Running heterogeneous programs
• Running “one-shot” MPI programs (i.e., boot LAM, run the program, then halt LAM)
The general syntax of mpiexec is:
shell$ mpiexec <global options> <command1> : <command2> : ...
Running MPMD programs
For example, to run a manager/worker parallel program, where two different executables need to be launched (i.e., manager and worker), the following can be used:
shell$ mpiexec -n 1 manager : worker
This runs one copy of manager and one copy of worker for every CPU in the LAM universe.
Running heterogeneous programs
Since LAM is a heterogeneous MPI implementation, it supports running heterogeneous MPI programs. For example, this allows running a parallel job that spans a Sun SPARC machine and an IA-32 Linux machine (even though they are opposite endian machines). Although this can be somewhat complicated to set up (remember that you will first need to lamboot successfully, which essentially means that LAM must be correctly installed on both architectures), the mpiexec command can be helpful in actually running the resulting MPI job.
Note that you will need to have two MPI executables – one compiled for Solaris (e.g., hello.solaris) and one compiled for Linux (e.g., hello.linux). Assuming that these executables both reside in the same directory, and that directory is available on both nodes (or the executables can be found in the PATH on their respective machines), the following command can be used:
shell$ mpiexec -arch solaris hello.solaris : -arch linux hello.linux
This runs the hello.solaris command on all nodes in the LAM universe that have the string “solaris” anywhere in their architecture string, and hello.linux on all nodes that have “linux” in their architecture string. The architecture string of a given LAM installation can be found by running the laminfo command.
Running “one-shot” MPI programs
In some cases, it seems like extra work to boot a LAM universe, run a single MPI job, and then shut down the universe. Batch jobs are good examples of this – since only one job is going to be run, why does it take three commands? mpiexec provides a convenient way to run “one-shot” MPI jobs.
shell$ mpiexec -machinefile hostfile hello
This will invoke lamboot with the boot schema named “hostfile”, run the MPI program hello on all available CPUs in the resulting universe, and then shut down the universe with the lamhalt command (which we’ll discuss in Section 4.7, below).
The mpitask command is similar to the sequential Unix ps command. It shows the current status of the MPI program(s) being executed in the LAM universe, and displays primitive information about what MPI function each process is currently executing (if any). Note that in normal practice, the mpimsg command only gives a snapshot of what messages are flowing between MPI processes, and therefore is usually only accurate at that single point in time. To really debug message passing traffic, use a tool such as a message passing analyzer (e.g., XMPI), or a parallel debugger (e.g., TotalView).
mpitask can be run from any node in the LAM universe.
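For example, while an MPI job such as hello is running, its processes can be inspected from any node (output omitted; it depends on what is currently executing):
shell$ mpitask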
The lamclean command completely removes all running programs from the LAM universe. This can be useful if a parallel job fails and leaves state behind in the LAM run-time environment (e.g., MPI-2 published names). It is typically run with no arguments:
shell$ lamclean
lamclean is typically only necessary when developing / debugging MPI applications – i.e., programs that hang, messages that are left around, etc. Correct MPI programs should terminate properly, clean up all their messages, unpublish MPI-2 names, etc.
When finished with the LAM universe, it should be shut down with the lamhalt command:
shell$ lamhalt
In most cases, this is sufficient to kill all running MPI processes and shut down the LAM universe.
However, in some rare conditions, lamhalt may fail. For example, if any of the nodes in the LAM universe crashed before running lamhalt, lamhalt will likely timeout and potentially not kill the entire LAM universe. In this case, you will need to use the lamwipe command to guarantee that the LAM universe has shut down properly:
shell$ lamwipe -v hostfile
where hostfile is the same boot schema that was used to boot LAM (i.e., all the same nodes are listed). lamwipe will forcibly kill all LAM/MPI processes and terminate the LAM universe. This is a slower process than lamhalt, and is typically not necessary.