MPI
The Message Passing Interface (MPI) is a
standard interface for parallel
programming. Its main advantage is that
you only have to learn one standard to
write parallel programs for many types
of machines from many vendors. It is not
perfectly portable, however, since certain
semantics are left undefined by the
standard and vary between implementations.
Performance is not portable either: code
tuned for one cluster with Myrinet may not
work well on a shared-memory machine. MPI
has bindings for C, C++, and Fortran.
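To give a flavor of the C binding, here is a minimal,
illustrative "hello world" program; every MPI program follows
the same initialize/communicate/finalize pattern:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes are there? */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI */
    return 0;
}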
The best way to get started with MPI is
by example. There are several tutorials
available online; this site has a listing
of several, and an excellent tutorial can
be found at the NCSA. More general
documentation can be found at the main
site for MPI. There are also several books
written about MPI.
LAM
The LAM implementation of MPI comes from
Indiana University. Today, it is housed at
http://www.lam-mpi.org, and LAM-specific
documentation is available online.
We will give a short example of using
LAM. The following example, as well as
the example in the next section on MPICH,
relies on factor.c, a short program that
uses MPI_Bcast and MPI_Gather calls for
its communication in order to factor a
64-bit number by brute force. When
compiling your own programs, you will
want to write a Makefile that compiles
them correctly, such as this one. In these
examples, though, we call the compiler on
factor.c explicitly. In the following
examples, you will see path names
with -gm-nagware in them. This
just means that the particular version of
LAM or MPICH has been compiled to use the
GM device (which uses Myrinet) and the
NAG Fortran compiler (when compiling MPI
Fortran code).
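To make the communication pattern concrete, here is a minimal
sketch of a brute-force factoring program with the same
broadcast/gather structure. It is not the actual factor.c; the
argument handling and output format are assumptions made for
illustration only:
/* Illustrative sketch (not the actual factor.c): rank 0 broadcasts
 * the number to factor, every rank tests a strided slice of
 * candidate divisors, and rank 0 gathers the results. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    long long n = 0, d, found = 0, best = 0, *results = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <number>\n", argv[0]);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        n = atoll(argv[1]);
    }

    /* Every process needs the number to factor. */
    MPI_Bcast(&n, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    /* Brute force: rank r tests divisors r+2, r+2+size, r+2+2*size, ... */
    for (d = rank + 2; d * d <= n; d += size)
        if (n % d == 0) {
            found = d;
            break;
        }

    /* Collect each rank's result (0 if it found nothing) on rank 0. */
    if (rank == 0)
        results = malloc(size * sizeof(long long));
    MPI_Gather(&found, 1, MPI_LONG_LONG_INT,
               results, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Report the smallest divisor any rank found. */
        for (i = 0; i < size; i++)
            if (results[i] != 0 && (best == 0 || results[i] < best))
                best = results[i];
        if (best != 0)
            printf("%lld = %lld * %lld\n", n, best, n / best);
        else
            printf("%lld is prime\n", n);
        free(results);
    }

    MPI_Finalize();
    return 0;
}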
Under normal circumstances, you would use
the compiler wrappers, named with a prefix
of mpi followed by the name of the
compiler, forming names like mpicc or
mpif95. This example is no different. To
compile factor.c with LAM:
[bargle@brood01 factor]$ /usr/local/stow/lam-7.0.6-gm-nagware/bin/mpicc -lm -o factor-lam factor.c
[bargle@brood01 factor]$
To submit the job to PBS, we need to
construct a script. We call this one
factor-lam.qsub:
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=4,walltime=60
# Set up the path
PATH=/usr/local/stow/lam-7.0.6-gm-nagware/bin:$PATH
export PATH
cd $HOME/.src/factor
# Start LAM
lamboot $PBS_NODEFILE
# Run the program
mpirun factor-lam 2305843009213693953
# Shut down LAM
lamhalt
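With the script in place, the job is submitted to PBS in the
usual way: qsub factor-lam.qsub.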
The output of the program, when finished,
is as follows:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
Prime factors of 2305843009213693953:
3
768614336404564651
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
MPICH
MPICH is another MPI implementation.
MPICH performs better on some workloads,
and LAM is better on others; for most
users, the choice is a matter of personal
taste.
Here, we repeat the factor.c
example with MPICH instead of LAM:
[bargle@brood01 factor]$ /usr/local/stow/mpich-1.2.5..12-gm-nagware/bin/mpicc -lm -o factor-mpich factor.c
[bargle@brood01 factor]$
We submit the following script. Note that
with MPICH we do not need to start up and
shut down the run-time environment
(lamboot/lamhalt) the way we did with LAM:
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=4,walltime=60
# Set up the path
PATH=/usr/local/stow/mpich-1.2.5..12-gm-nagware/bin:$PATH
export PATH
cd $HOME/.src/factor
# Run the program: -np is set to the number of entries in the node file
mpirun -machinefile $PBS_NODEFILE -np $( wc -l < $PBS_NODEFILE ) factor-mpich 2305843009213693953
The resulting output is much like what we
saw with LAM:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Prime factors of 2305843009213693953:
3
768614336404564651