Submitting
Scheduling Overview
The primary job queuing system on the Vnode cluster is the Portable Batch System (PBS). Our installation is used for running parallel jobs or making use of dedicated reservations. We use a separate program called Maui for scheduling jobs and for reserving resources.
Vnode Quick Start
The vnode cluster is heterogeneous in that nodes play different roles: some are attached to the LCD wall (vnodelcdwalls) and others are not connected to a screen (vnodes). You can use different queues in our scheduler to access these different resources.
To allocate a single machine with a GPU for interactive use, just run qsub -I. This uses the default queue, called 'single', which allocates a single node for up to 8 hours. To allocate two nodes with GPUs for interactive use, run qsub -q double -I. This uses the 'double' queue, which allocates two nodes for up to four hours. Neither of these queues leads to nodes with physical displays, so they are suitable for remote processing, visualization, and general-purpose GPU work.
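If you only need part of the time limit, you can ask for a shorter walltime when requesting an interactive node. This is a sketch using the same -l walltime syntax shown in the qsub examples later in this document:
qsub -I -l walltime=02:00:00
This requests a two-hour interactive session on the default 'single' queue.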
You can access the display wall as a whole or as any of four 2x2 arrays named lcdquad0, lcdquad1, lcdquad2, and lcdquad3. To access lcdquad0, run qsub -q lcdquad0 -I. Any of the lcdquads can be allocated with a similar invocation. To access the whole wall, run qsub -q lcdwall -I.
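You can also combine a display-wall queue with an explicit walltime. The sketch below asks for a two-hour interactive session on one of the quads (lcdquad2 here, but any of the names above works the same way):
qsub -q lcdquad2 -l walltime=02:00:00 -I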
Advanced Reservations
Much of the work on the vnode cluster involves demonstrations or collaborations, so it's important that users can reserve resources through Maui. You make reservations with the setres command, list them with the showres command, and cancel them with the releaseres command. These commands are installed in /opt/UMmaui/bin, so you may want to add that to your path if you make frequent use of advanced reservations.
For example, to reserve lcdquad0 for an hour and 15 minutes at 2pm on June 3rd, run setres -u username -s 14:00:00_6/3 -d 01:15:00 -f lcdquad0 ALL. In this example, username should be your username. You can reserve lcdquad1, lcdquad2, lcdquad3, or the lcdwall similarly by changing the feature/queue specified by the -f argument.
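For instance, to reserve the entire wall for a two-hour demo at 2pm on June 3rd, only the duration and feature change (again, replace username with your own username):
setres -u username -s 14:00:00_6/3 -d 02:00:00 -f lcdwall ALL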
You can list reservations as follows:
[fmccall@vnodesub00 ~]$ /opt/UMmaui/bin/showres
Reservations
ReservationID Type S Start End Duration N/P StartTime
fmccall.0 User - 00:00:54 00:10:54 00:10:00 2/2 Wed Apr 26 19:09:36
1 reservation located
You can delete reservations with the
releaseres command as
follows:
[fmccall@vnodesub00 ~]$ /opt/UMmaui/bin/releaseres fmccall.0
released User reservation 'fmccall.0'
To use your reservation, run qsub as qsub -q lcdquad0 -W x=FLAGS:ADVRES:fmccall.0 -I, where lcdquad0 is the feature that you requested in your reservation and fmccall.0 is your reservation id.
If your reservation has already begun, then you may need to specify a shorter runtime to qsub. For example, if only 30 minutes remains on your reservation, the command above will not work because it asks for the one-hour default runtime. Request a shorter walltime with qsub -q lcdquad0 -l walltime=00:29:00 -W x=FLAGS:ADVRES:fmccall.0 -I, which asks for 29 minutes of runtime.
The cluster is a shared resource. Only
reserve the entire wall for critical demos or
for research that requires it. Most jobs and
all academic coursework should use the lcdquads.
Don't reserve more than you need and try to
limit reservations to no more than 2 hours.
PBS Usage
There are many other options available through the cluster's scheduler. To see the current policy on the cluster, you can use the qmgr(8) command:
[bargle@brood01 ~]$ qmgr
Max open servers: 4
Qmgr: print queue dque
#
# Create queues and set their attributes.
#
#
# Create and define queue dque
#
create queue dque
set queue dque queue_type = Execution
set queue dque resources_max.cput = 192:00:00
set queue dque resources_max.walltime = 96:00:00
set queue dque resources_min.cput = 00:00:01
set queue dque resources_default.cput = 192:00:00
set queue dque resources_default.nodes = 1:ppn=1
set queue dque max_user_run = 10
set queue dque enabled = True
set queue dque started = True
Qmgr: quit
[bargle@brood01 ~]$
This command starts the queue management interface for PBS. You cannot manipulate the queue from here, but you can inspect it. Here we print out the configuration for the dque queue. The dque queue is the default -- there are other queues, but their use is outside the scope of this document. The resources_max.walltime value tells us the current maximum walltime for a job, and the max_user_run property tells us the maximum number of jobs that will run for any user at any time.
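If you just want a quick look at the policy without the interactive prompt, qmgr can also run a single command and exit. The sketch below assumes your qmgr supports the -c option for this:
qmgr -c "print queue dque"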
Aside from qmgr, which you would
only use for inspecting the current policy,
there are several commands that you will use
for submitting, inspecting, and controlling
jobs. The following is by no means a
complete reference. Unfortunately, there is
not a lot of documentation available online.
You should look at the man pages if you have
further questions.
qstat
The qstat(1B) command is used
for querying the status of the queue, as
well as the status of individual jobs.
For the most part, you will be invoking the qstat command without arguments to examine the state of the entire queue. However, you can specify one or more jobs on the command line to pick them out in particular, or give additional flags such as -n or -f to get allocated node information or full job information, respectively. The curious should consult the man page for more information.
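For instance, using the job id from the examples below, you might run either of these to see the allocated nodes or the full job record, respectively (a sketch):
qstat -n 11216.queen
qstat -f 11216.queen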
Here are some examples of the use and output of qstat. Assume that I have already submitted a job, identified by 11216.queen, and it has not run yet:
[bargle@brood01 factor]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11216.queen STDIN bargle 0 Q dque
The output of this command can be
interpreted as follows:
- Job id is the PBS identifier for the job. This is unique in the queue. In this case, 11216.queen indicates that my job is the 11216th job submitted to queen, the host where the PBS service runs.
- Name is the name of the script that
was submitted. This is not unique.
In this case,
STDIN indicates
that I piped the script directly to
the submission program instead of
using a persistent script on disk.
This is a useful but rarely used
technique.
- User is the UNIX username of the
user who submitted the job.
User
bargle is my
username.
- Time Use is the amount of CPU time
accumulated by the job. No time has
been used by this job, because it is
still queued.
- "S" is the current state
of the job. "Q" indicates
that the job is queued. State
"R" indicates that the job is
running.
- Queue is the name of the queue where the job has been submitted. This will almost always be dque.
Now the job has been scheduled to run, but the PBS service has not yet accounted for any CPU time used by the job:
[bargle@brood01 factor]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11216.queen STDIN bargle 0 R dque
Here the job has started to accumulate
CPU time:
[bargle@brood01 factor]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11216.queen STDIN bargle 00:00:13 R dque
Finally, after the job has finished
executing (note that there is no output,
since the queue is empty):
[bargle@brood01 factor]$ qstat
[bargle@brood01 factor]$
In the directory that was current when the job was submitted, PBS also left the results of output to stdout and stderr. They are called STDIN.o11216 and STDIN.e11216, respectively. We will go over PBS output in a little more detail later.
qsub
The qsub(1B) program is used for submitting jobs to PBS. It has two primary modes of use: interactive jobs and batch jobs. Interactive jobs are useful for testing your programs, but not very useful for running many jobs, since they require your input. We will look at interactive jobs first. The following command asks for two nodes and sixty seconds (-l nodes=2,walltime=60) in interactive mode (-I). Here, after I get my allocation, I look at the contents of the $PBS_NODEFILE (which lists the nodes I have allocated) and exit:
[bargle@brood01 factor]$ qsub -l nodes=2,walltime=60 -I
qsub: waiting for job 11212.queen.umiacs.umd.edu to start
qsub: job 11212.queen.umiacs.umd.edu ready
[bargle@bug60 ~]$ cat $PBS_NODEFILE
bug60
bug59
[bargle@bug60 ~]$ exit
logout
qsub: job 11212.queen.umiacs.umd.edu completed
[bargle@brood01 factor]$
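The STDIN jobs shown in the qstat examples above were created by piping a script to qsub rather than keeping it on disk. A minimal sketch of that technique, requesting one node for sixty seconds:
echo "hostname" | qsub -l nodes=1,walltime=60
The job name appears as STDIN in qstat, and the output lands in STDIN.o<jobid> as described earlier.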
Next, we submit a job from a script to use the pbsdsh program to run a process on all allocated nodes. The script, called helloworld.qsub, is as follows:
#!/bin/bash
# Set up the path
PATH=/usr/local/bin:$PATH
export PATH
# Make all hosts print out "Hello World"
pbsdsh echo Hello World
To submit the job:
[bargle@brood01 examples]$ qsub -l nodes=4 helloworld.qsub
11220.queen.umiacs.umd.edu
[bargle@brood01 examples]$
When a job finishes, PBS drops two
output files in the directory that was
current when the job was submitted.
These files are named for the script and
the job number. In this case, the files
are called
helloworld.qsub.o11220 and
helloworld.qsub.e11220 for the
standard output and standard error,
respectively. The error file is empty,
but here is the content of the output file:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Hello World
Hello World
Hello World
Hello World
The warning in the first two lines of the output is innocuous, and occurs in every output file from PBS. The next four lines show "Hello World" printed from the four nodes where the job was scheduled, as a result of the pbsdsh command. There are more examples in the next section.
qdel
The qdel(1B) program is used
for deleting jobs from the queue when
they are in the queued state. For
example:
[bargle@brood01 examples]$ qstat 11222.queen.umiacs.umd.edu
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11222.queen STDIN bargle 0 Q dque
[bargle@brood01 examples]$ qdel 11222
[bargle@brood01 examples]$ qstat
[bargle@brood01 examples]$
qsig
The qsig(1B) program can be
used to send UNIX signals to running
jobs. For instance, it can be used to
kill running jobs:
[bargle@brood01 examples]$ qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
11221.queen STDIN bargle 00:00:01 R dque
[bargle@brood01 examples]$ qsig -s TERM 11221
[bargle@brood01 examples]$ qstat
[bargle@brood01 examples]$
pbsnodes
The pbsnodes(1B) program can be used to inspect the state of the nodes, either just the offline nodes or all nodes. To list all offline nodes:
[bargle@brood01 examples]$ pbsnodes -l
bug63 offline
[bargle@brood01 examples]$
To examine all nodes:
[bargle@brood01 examples]$ pbsnodes -a
bug00
state = free
np = 2
ntype = cluster
bug01
state = free
np = 2
ntype = cluster
... deleted ...
bug62
state = free
np = 2
ntype = cluster
bug63
state = offline
np = 2
ntype = cluster
[bargle@brood01 examples]$
Condor
Condor is used for high-throughput computing.
It does not deal well with jobs that require
parallel access to more than one machine, so
it is generally only used for serial jobs.
Among other things, Condor supports I/O
redirection and automatic checkpointing to add
a level of fault tolerance to computing, as
well as letting jobs get pre-empted and move
from machine to machine. Jobs in Condor will
get pre-empted by jobs scheduled through PBS,
or if the job runs too long and there are
others waiting. We have local documentation
and examples, both introductory and for running Matlab
code under Condor. There is extensive
documentation available online.
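For a flavor of the I/O redirection mentioned above, here is a minimal sketch of a Condor submit description file for a serial job; the file and program names (serial_job.submit, myprog, and so on) are hypothetical:
# serial_job.submit -- a hypothetical serial job
universe   = vanilla
executable = myprog
output     = myprog.out
error      = myprog.err
log        = myprog.log
queue
You would submit it with condor_submit serial_job.submit and check on it with condor_q; see the Condor documentation for checkpointing and the other universes.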