TORQUE/Maui Cluster
TORQUE stands for Terascale Open-Source Resource and QUEue Manager. It is an open-source distributed resource manager originally based on OpenPBS, the Portable Batch System (PBS).
Our installation is used for running parallel jobs or making use of dedicated reservations. We use a separate program called Maui for scheduling jobs in TORQUE.
TORQUE and Maui are installed at /opt/UMtorque and /opt/UMmaui respectively. Please make sure /opt/UMtorque/bin and /opt/UMmaui/bin are added to your PATH environment variable.
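For example, bash users could add something like the following to their shell startup file (a minimal sketch; adjust for your own shell and startup file):
# add the TORQUE and Maui client commands to your PATH (bash syntax;
# csh/tcsh users would use: setenv PATH /opt/UMtorque/bin:/opt/UMmaui/bin:$PATH)
export PATH=/opt/UMtorque/bin:/opt/UMmaui/bin:$PATH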
The cluster is composed of a frontend (the submit nodes) and compute nodes. The frontend is to be used for editing your code, compiling, and submitting jobs. To run any processing or testing of your code, you must submit it through the scheduler.
The scheduler takes care of assigning compute nodes to jobs. When you are assigned a node, you will be the only person on it for the duration of your job. After your time limit is up or your process ends, the node will be cleaned and locked down for the next submission.
- Logging in
To gain access to any of the nodes on a cluster, you will first need to log into a submit node using ssh. This machine acts as a gateway to the rest of the cluster. No intensive processing is to be run on submit nodes. Submit nodes are shared with everyone else using the cluster, across various research projects throughout the institute. If you run an intensive process on a submit node, it will be killed so other research is not affected.
More information about UMIACS cluster submit nodes and compute nodes is available in the UMIACS cluster documentation.
- Setting up your environment
After you are logged in, you will have to set up your account so that PBS can access it from any of the compute nodes. This is required since PBS will write stdout and stderr to files in your account. Use ssh-keygen with no password to create keypairs that grant access for your jobs. These can be generated by running the following:
cd $HOME
ssh-keygen -t rsa1 -N "" -f $HOME/.ssh/identity
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa
ssh-keygen -t dsa -N "" -f $HOME/.ssh/id_dsa
cd .ssh
touch authorized_keys authorized_keys2
cat identity.pub >> authorized_keys
cat id_rsa.pub id_dsa.pub >> authorized_keys2
chmod 640 authorized_keys authorized_keys2
To test your keys, you should be able to run 'ssh submitnode' and be returned to a prompt without being asked for a password.
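For example, the following should print the submit node's hostname and return immediately, with no password prompt (using hostname here is just a convenient check command):
# should complete without prompting for a password
ssh submitnode hostname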
- Requesting interactive usage
Sometimes you will want to test an intensive program without preparing a submission script and going through the scheduler. You can run '/opt/UMtorque/bin/qsub -I' to request interactive use of a node. After running qsub -I, your shell will block until a resource can be allocated to you. When the resource has been allocated, a new shell is opened on the allocated node. You can also ssh into that node for the duration of the allocated shell. When you log out of the initial shell, or your time limit is up, the node will again be locked down and you will have to ask the scheduler for access again.
Below is an example of getting an interactive session:
[xhe@opensub01 24] qsub -I
qsub: waiting for job 152.opensrv.umiacs.umd.edu to start
qsub: job 152.opensrv.umiacs.umd.edu ready
[xhe@openlab00 21] echo hello
hello
[xhe@openlab00 22] exit
logout
qsub: job 152.opensrv.umiacs.umd.edu completed
[xhe@opensub01 25]
- Running your first job
We will walk through a simple 'hello world' submission script to help you understand how submitting jobs works.
- Create a submission file
In your home directory on a submit node, create a file
called test.sh that contains the following:
#!/bin/bash
#PBS -lwalltime=10:00
#PBS -lnodes=3
echo hello world
hostname
echo finding each node I have access to
for node in `cat ${PBS_NODEFILE}` ; do
echo ----------
/usr/bin/ssh $node hostname
echo ----------
done
The script is a normal shell script except that it includes extra #PBS directives. These directives control how you request resources on the cluster. In this case we are requesting 10 minutes of total node time split across 3 nodes, so each node will be available to you for roughly 3 minutes and 20 seconds. People often forget to specify a walltime for jobs on more than 2 nodes. The default walltime is 48 hours per node, so requesting 3 nodes without a walltime will try to schedule 144 hours of cluster time, which exceeds the maximum allowed.
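If your job needs more than the 10 minutes used above, request a larger explicit walltime; a sketch (the two-hour figure is just an illustration):
# request 3 nodes with an explicit 2-hour walltime instead of the default
qsub -l nodes=3,walltime=2:00:00 test.sh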
- Submit the job to the scheduler using /opt/UMtorque/bin/qsub
[xhe@opensub00 28]$ /opt/UMtorque/bin/qsub test.sh
123.opensrv.umiacs.umd.edu
You can check the status of your job by running
/opt/UMtorque/bin/qstat
[xhe@opensub00 29]$ /opt/UMtorque/bin/qstat -n
opensrv.umiacs.umd.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
123.opensrv.umiacs.u xhe dque test.sh -- 3 -- -- 48:00 R --
openlab00/0+openlab01/0+openlab02/0
[opensub00 30]
This shows us that the job is running ('R') and is using nodes openlab00, openlab01 and openlab02. A status of 'Q' means that your job is waiting in line for resources to free up. If you request more resources than the cluster can provide, your job will sit in the queue indefinitely.
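When the queue is busy, a couple of ways to narrow qstat's output (both are standard TORQUE qstat options; substitute your own username and job id):
# show only your own jobs
/opt/UMtorque/bin/qstat -u xhe
# show the full status record for a single job
/opt/UMtorque/bin/qstat -f 123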
- Check output
When your job is finished, you will have two files in the directory you submitted the job from. They contain stdout (.oJOBID) and stderr (.eJOBID). The job we submitted above generated an empty error file, test.sh.e123, and the following stdout file:
[xhe@opensub00 30]$ cat test.sh.o123
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
hello world
openlab00.umiacs.umd.edu
finding each node I have access to
----------
openlab00.umiacs.umd.edu
----------
----------
openlab01.umiacs.umd.edu
----------
----------
openlab02.umiacs.umd.edu
----------
[xhe@opensub00 31]
The warning lines at the start of your output are a standard part of how we have our cluster configured and do not affect how your program runs.
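If you would rather control where this output goes, qsub's standard -o, -e, and -j options can also be given as #PBS directives; a minimal sketch (the file name is just an example):
# merge stderr into stdout and write it to a file of your choosing
#PBS -j oe
#PBS -o myjob.out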
- Running MPI programs as batch jobs
At UMIACS, we have LAM, Open MPI, MPICH1, and MPICH2 installed. LAM is installed at /usr/local/stow/lam-version, MPICH1 at /usr/local/stow/mpich1-version, MPICH2 at /usr/local/stow/mpich2-version, and Open MPI at /usr/local/stow/openmpi-version.
First, you need an MPI-based program. Here's a simple one:
alltoall.c
- LAM
- To compile this program and execute it using LAM, make sure /usr/local/stow/lam-7.1.4/bin is in your PATH environment variable.
- It can be compiled by doing:
mpicc alltoall.c -o alltoall-lam
The following submission file, lamsub.sh, can be submitted to run your program:
#!/bin/bash
#PBS -l nodes=8
#PBS -l walltime=0:10:0
cd ~/torquejobs/lamtest
mpiexec -machinefile ${PBS_NODEFILE} alltoall-lam
Here is what it looks like on your terminal after you submit the job:
[opensub00 142] qsub lamsub.sh
127.opensrv.umiacs.umd.edu
[opensub00 143] qstat -n
opensrv.umiacs.umd.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
127.opensrv.umiacs.u xhe dque lamsub.sh -- 8 -- -- 48:00 R --
openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
+openlab06/0+openlab07/0
[opensub00 144]
Output files for this job: lamsub.sh.o127 and lamsub.sh.e127 (empty).
The submission file lamsub2.sh uses mpirun instead of mpiexec; with mpirun, you need to set up the MPI environment yourself by running lamboot first and lamhalt afterwards to stop it. We recommend using mpiexec, since it sets up the MPI runtime environment for your jobs.
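For reference, a hedged sketch of what a lamboot/mpirun style script like lamsub2.sh might look like (the actual lamsub2.sh is not reproduced here):
#!/bin/bash
#PBS -l nodes=8
#PBS -l walltime=0:10:0
cd ~/torquejobs/lamtest
# start the LAM runtime on the allocated nodes, run the program, then shut LAM down
lamboot ${PBS_NODEFILE}
mpirun -np 8 ./alltoall-lam
lamhalt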
- Open MPI
-
To compile and run this program using Open MPI, you need to include the corresponding bin directory in your PATH. The following example uses /usr/local/stow/openmpi-1.2.6.
- The following script (csh/tcsh syntax) will set up your path variables.
setenv PATH /usr/local/stow/openmpi-1.2.6/bin:$PATH
if ( $?LD_LIBRARY_PATH ) then
setenv LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
else
setenv LD_LIBRARY_PATH /usr/local/stow/openmpi-1.2.6/lib
endif
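For bash users, the equivalent would be along these lines (a sketch; the conditional expansion just avoids a dangling colon when LD_LIBRARY_PATH was unset):
export PATH=/usr/local/stow/openmpi-1.2.6/bin:$PATH
# append the old value only if LD_LIBRARY_PATH was already set
export LD_LIBRARY_PATH=/usr/local/stow/openmpi-1.2.6/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}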
The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-openmpi (we changed our environment to point to Open MPI's mpicc).
The following is the submission file openmpisub.sh:
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=8,walltime=0:10:0
# Set up the path
export PATH=/usr/local/stow/openmpi-1.2.6/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/stow/openmpi-1.2.6/lib:$LD_LIBRARY_PATH
cd ~/torquejobs/openmpitest
echo starting
mpiexec -mca btl tcp,self -n 8 ./alltoall-openmpi
echo ending
Here is what it looks like when you submit the script at your prompt:
[xhe@opensub00 91] qsub openmpisub.sh
133.opensrv.umiacs.umd.edu
[xhe@opensub00 92] qstat -n
opensrv.umiacs.umd.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
133.opensrv.umiacs.u xhe dque openmpisub 16472 8 -- -- 48:00 R --
openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
+openlab06/0+openlab07/0
[xhe@opensub00 93]
Output files for this job: openmpisub.sh.o133 and openmpisub.sh.e133
- MPICH1
- To compile and run this program under MPICH1 you need to set up your environment:
- The following script (csh/tcsh syntax) will set the appropriate environment.
setenv MPI_ROOT /usr/local/stow/mpich-version
setenv MPI_LIB $MPI_ROOT/lib
setenv MPI_INC $MPI_ROOT/include
setenv MPI_BIN $MPI_ROOT/bin
# add MPICH commands to your path (includes mpirun and mpicc)
set path=($MPI_BIN $path)
# add the MPICH libraries to your LD_LIBRARY_PATH
if ( $?LD_LIBRARY_PATH ) then
setenv LD_LIBRARY_PATH $MPI_LIB:$LD_LIBRARY_PATH
else
setenv LD_LIBRARY_PATH $MPI_LIB
endif
It can be compiled by doing: mpicc alltoall.c -o alltoall-mpich1 (remember we changed our environment to point to MPICH's mpicc)
The submission file mpich1sub.sh is almost the same except you need to call mpirun instead of mpiexec.
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=8,walltime=60
# Set up the path
export PATH=/usr/local/stow/mpichgm-1.2.7p1-20/bin:$PATH
cd ~/mpich1test/
echo $PBS_NODEFILE
# Run the program
mpirun -np $( wc -l < $PBS_NODEFILE ) ./alltoall-mpich1
Here is what it looks like when you submit the job from a submit machine:
[xhe@brood00 ~/mpich1test]$ qsub mpich1sub.sh
167.queen.umiacs.umd.edu
[xhe@brood00 ~/mpich1test]$ qstat -n
queen.umiacs.umd.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
167.queen.umiacs.umd xhe dque mpich1sub. -- 8 -- -- 04:00 R --
bug00/0+bug01/0+bug02/0+bug03/0+bug04/0+bug05/0+bug06/0+bug07/0
[xhe@brood00 ~/mpich1test]$
Output files for this job: mpich1sub.sh.o167 and mpich1sub.sh.e167 (empty)
- MPICH2
- To compile using MPICH2, you need to set up your environment for it.
- You need to set up the path variables to include the MPICH2 version you want to use.
You will make two changes: one to the PATH variable and one to LD_LIBRARY_PATH.
The following example uses MPICH2-1.0.7.
For a bash shell user, append the following to your .bash_profile:
export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
export PATH=$MPICH2_HOME/bin:$PATH
export LD_LIBRARY_PATH=$MPICH2_HOME/lib:$LD_LIBRARY_PATH
For a C shell user, append the following to your .cshrc:
setenv MPICH2_HOME /usr/local/stow/mpich2-1.0.7
setenv PATH $MPICH2_HOME/bin:$PATH
setenv LD_LIBRARY_PATH $MPICH2_HOME/lib:$LD_LIBRARY_PATH
The sample C code can be compiled by doing: mpicc alltoall.c -o alltoall-mpich2 (we use MPICH2's mpicc).
Here is a sample submission file for MPICH2, mpich2sub.sh:
#!/bin/bash
#PBS -lwalltime=0:10:0
#PBS -lnodes=8
# Set up the path
export MPICH2_HOME=/usr/local/stow/mpich2-1.0.7
export PATH=$MPICH2_HOME/bin:$PATH
echo starting
mpiexec -n 8 /nfshomes/xhe/torquejobs/mpich2test/alltoall-mpich2
echo ending
Before you submit your job to the cluster, you need to do the following to start the mpd daemon, which must be running on each compute node used by your program.
Make sure you have a file .mpd.conf in your home directory, with a line like this:
secretword=your-favorite-word
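A minimal sketch of creating that file (mpd generally requires it to be readable only by you, hence the chmod):
echo "secretword=your-favorite-word" > ~/.mpd.conf
chmod 600 ~/.mpd.conf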
Create a hostfile for the mpd daemons in a directory that you can reference. It should list the compute nodes on which you want daemons to be started, one node name per line, as in the following:
openlab00
openlab01
openlab02
...
openlab07
Then start the mpd daemons by typing mpdboot -n #ofnodes -f path-to-hostfile/hostfile. (You will need to run mpdallexit later to shut the daemons down after your job has finished.)
After the daemons have started, you can submit the above mpich2sub.sh script using qsub.
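Before submitting, you can check that the daemons actually came up with mpdtrace, which lists the hosts in the mpd ring; a quick sketch:
mpdboot -n 8 -f mpd.hostfile
mpdtrace    # should list openlab00 through openlab07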
Here is what it looks like when you submit from your prompt:
[xhe@opensub01 68] mpdboot -n 8 -f mpd.hostfile
[xhe@opensub01 69] qsub mpich2sub.sh
140.opensrv.umiacs.umd.edu
[xhe@opensub01 70] qstat -n
opensrv.umiacs.umd.edu:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
140.opensrv.umiacs.u xhe dque mpich2sub. 3403 8 -- -- 48:00 R --
openlab00/0+openlab01/0+openlab02/0+openlab03/0+openlab04/0+openlab05/0
+openlab06/0+openlab07/0
[xhe@opensub01 71] qstat -n
[xhe@opensub01 72] mpdallexit
[xhe@opensub01 73]
Here are the standard output and standard error for this job: mpich2sub.sh.o140 and mpich2sub.sh.e140 (empty)
Please note that if you compile your program with mpich, lam, or openmpi, you MUST execute it in the same environment. If you compile your program using mpicc from LAM and then attempt to run it using MPICH's mpiexec, it will fail and you will get an error message similar to the following:
It seems that there is no lamd running on the host openlab02.umiacs.umd.edu.
This indicates that the LAM/MPI runtime environment is not operating.
The LAM/MPI runtime environment is necessary for MPI programs to run
(the MPI program tried to invoke the "MPI_Init" function).
Please run the "lamboot" command to start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
- Commands
Please make sure /opt/UMtorque/bin is in your PATH environment variable.
-
qsub
-
Basic usage
The qsub program is the mechanism for submitting a job. A
job is a shell script, taken either from standard input or as an
argument on the command line.
The basic syntax of qsub, that you will probably be using
most of the time, is:
qsub -l nodes=<nodes> <scriptname>
where <nodes> is the number of machines you'd like to allocate.
Then, when PBS runs your job, the name of the file with the nodes
allocated to you will be in $PBS_NODEFILE, and PBS will begin
running your job on one single node from that allocation.
When you run qsub, you will get a message like:
123.opensrv.umiacs.umd.edu
This is your job id. This is used for many things, and you should
probably keep a record of it.
When a job finishes, PBS deposits the standard output and standard
error as <jobname>.o<number> and
<jobname>.e<number>, where
<jobname> is the name of the script you submitted (or
STDIN if it came from qsub's standard in), and <number>
is the leading number in the job id.
-
-l option
The -l option is used to specify resources used by a PBS job.
Two important ones are nodes, which specifies the number of nodes
used, and walltime, which specifies the maximum amount of
wall clock time that the process will use. The following invocation
of qsub runs a job on 2 nodes for one hour:
qsub -l nodes=2,walltime=01:00:00
It is important that you specify walltime. Without it, your job may be scheduled unfavorably even when it needs less than the thirty minute default, and, even worse, it may be terminated prematurely if it runs past that default.
See pbs_resources(7) for more information.
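Resources can be combined in a single -l list; for example (the memory figure and script name are just illustrations):
# 2 nodes, one hour of walltime, and about 2 GB of memory
qsub -l nodes=2,walltime=01:00:00,mem=2gb myscript.sh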
-
The nodes resource
In addition to specifying the number of nodes in a job, you can also
use the nodes resource to specify features required for your job.
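For example, if the cluster's nodes were tagged with a property (the property name "myri" below is hypothetical and site-specific), you could combine it with a processors-per-node count:
# 4 nodes carrying the hypothetical "myri" property, 2 processors per node
qsub -l nodes=4:myri:ppn=2,walltime=01:00:00 myscript.sh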
-
Submitting to specific nodes
To submit to a specific set of nodes, you can specify those nodes,
separated by a "+" character, in the nodes resources. For
instance:
qsub -l nodes=openlab00+openlab01,walltime=60
... will submit a two node job on openlab00 and openlab01,
with a maximum time of sixty seconds.
In general, this should be avoided, since you are limited to only the nodes you specify. However, if you have files that reside only on particular nodes (for instance, in their scratch space), you might want to use this option.
-
-I option
To submit an interactive job, use the -I option:
qsub -l <resources> -I
Then, instead of enqueuing a batch job and exiting, the qsub
program will wait until your interactive job runs. When it does, PBS
will present you with a shell on one of the nodes that you have been
allocated. You can then use all nodes allocated, until your time
allocation is consumed.
-
Extended job descriptions
The qsub program will let you put information about your job in your script, by including comments that begin with '#PBS' and contain a single command line option. For instance, if I always want my job to use two nodes, I could put the following at the beginning of my script:
#PBS -l nodes=2
The "EXTENDED DESCRIPTION" heading in qsub(1) has
more information about using this feature.
-
qstat
This program tells you the status of your jobs and other people's jobs.
The basic case of running qstat is very simple: you just run
qstat, with no options. If it gives no output, it means
there are no jobs in the queue.
-
qdel
The qdel program is used to remove your job from the queue,
and cancel it if it's running. The syntax for qdel is "qdel <job id>",
but you can abbreviate the job ID with just the leading number.
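For example, to cancel the job submitted earlier in this guide:
qdel 123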
-
pbsnodes
The pbsnodes command is used to list nodes and their status. You will
probably only use this one way, with the "-a" argument:
pbsnodes -a
-
pbsdsh
Runs a shell command on all of the nodes allocated to your job.
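For example, inside a job script, the following would print each allocated node's hostname, as a hedged alternative to the ssh loop used earlier (note that pbsdsh may run the command once per allocated processor slot rather than once per node):
#!/bin/bash
#PBS -l nodes=3,walltime=10:00
# run hostname on every allocated slot via the PBS task manager
pbsdsh /bin/hostname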
For more information, see the man pages for PBS commands. If, for some reason,
these can't be viewed with the default manpath, you can use:
man -M /opt/UMtorque/man <topic>
- Compilers
The following compilers and memory analysis tools are available at UMIACS.
-
Gnu compilers
Besides the default Gnu C/C++ and Fortran compilers, UMIACS also has several versions of gcc installed under the /usr/local/stow/gcc-version directories.
-
PGI compiler
The Portland Group C and Fortran compilers are installed in the /opt/pgi directory.
-
Intel
Intel C and Fortran compilers are installed at /opt/intel.
-
NAGWare
UMIACS has the NAGWare Fortran compiler; it is installed in the /opt/NAGWare_f95 directory.
-
Insure++
Parasoft Insure++ is a runtime analysis and memory error detection tool for C and C++. It is installed in the /opt/insure directory.
To find other software installed at UMIACS, please check the /usr/local/stow and /opt directories.
UMIACS Condor pool information can be found at:
condorintro.html