MPI
The Message Passing Interface (MPI) is a
standard interface for parallel
programming. Its main advantage is that
you only have to learn one standard to
write parallel programs for many types
of machines from many vendors. It is not
perfectly portable, however, since certain
semantics are left undefined by the
standard and vary between implementations.
Performance is not portable either: code
tuned for one cluster with Myrinet may not
work well on a shared-memory machine. MPI
has bindings for C, C++, and Fortran.
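To give a flavor of the C binding, here is a minimal,
illustrative "hello world" program; every MPI program follows
the same initialize/communicate/finalize pattern:
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many processes are there? */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down MPI */
    return 0;
}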
The best way to get started with MPI is
by example. There are several tutorials
available online; this site has a listing
of several, and an excellent tutorial can
be found at the NCSA. More general
documentation can be found at the main
site for MPI. There are also several books
written about MPI.
LAM
The LAM implementation of MPI comes from
Indiana University. Today, it is housed at
http://www.lam-mpi.org, and LAM-specific
documentation is available online.
We will give a short example of using
LAM. The following example, as well as
the example in the next section on MPICH,
relies on factor.c, a short program that
uses MPI_Bcast and MPI_Gather calls for
its communication in order to factor a
64-bit number by brute force. When
compiling your own programs, you will
want to write a Makefile that compiles
them correctly, such as this one. In these
examples, though, we call the compiler on
factor.c explicitly. In the following
examples, you will see path names
with -gm-nagware in them. This
just means that the particular version of
LAM or MPICH has been compiled to use the
GM device (which uses Myrinet) and the
NAG Fortran compiler (when compiling MPI
Fortran code).
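To make the communication pattern concrete, here is a minimal
sketch of a brute-force factoring program with the same
broadcast/gather structure. It is not the actual factor.c; the
argument handling and output format are assumptions made for
illustration only:
/* Illustrative sketch (not the actual factor.c): rank 0 broadcasts
 * the number to factor, every rank tests a strided slice of
 * candidate divisors, and rank 0 gathers the results. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, size, i;
    long long n = 0, d, found = 0, best = 0, *results = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        if (argc < 2) {
            fprintf(stderr, "usage: %s <number>\n", argv[0]);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        n = atoll(argv[1]);
    }

    /* Every process needs the number to factor. */
    MPI_Bcast(&n, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    /* Brute force: rank r tests divisors r+2, r+2+size, r+2+2*size, ... */
    for (d = rank + 2; d * d <= n; d += size)
        if (n % d == 0) {
            found = d;
            break;
        }

    /* Collect each rank's result (0 if it found nothing) on rank 0. */
    if (rank == 0)
        results = malloc(size * sizeof(long long));
    MPI_Gather(&found, 1, MPI_LONG_LONG_INT,
               results, 1, MPI_LONG_LONG_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* Report the smallest divisor any rank found. */
        for (i = 0; i < size; i++)
            if (results[i] != 0 && (best == 0 || results[i] < best))
                best = results[i];
        if (best != 0)
            printf("%lld = %lld * %lld\n", n, best, n / best);
        else
            printf("%lld is prime\n", n);
        free(results);
    }

    MPI_Finalize();
    return 0;
}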
Under normal circumstances, you would use
the compiler wrappers, named with a prefix
of mpi followed by the name of the
compiler, forming names like mpicc or
mpif95. This example is no different. To
compile factor.c with LAM:
[bargle@brood01 factor]$ /usr/local/stow/lam-7.0.6-gm-nagware/bin/mpicc -lm -o factor-lam factor.c
[bargle@brood01 factor]$
To submit the job to PBS, we need to
construct a script. We call this one
factor-lam.qsub:
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=4,walltime=60
# Set up the path
PATH=/usr/local/stow/lam-7.0.6-gm-nagware/bin:$PATH
export PATH
cd $HOME/.src/factor
# Start LAM
lamboot $PBS_NODEFILE
# Run the program
mpirun factor-lam 2305843009213693953
# Shut down LAM
lamhalt
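With the script in place, the job is submitted to PBS in the
usual way: qsub factor-lam.qsub.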
The output of the program, when finished,
is as follows:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
Prime factors of 2305843009213693953:
3
768614336404564651
LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
MPICH
MPICH is another MPI implementation.
MPICH performs better on some workloads,
and LAM is better on others; for most
users, the choice is a matter of personal
taste.
Here, we repeat the factor.c
example with MPICH instead of LAM:
[bargle@brood01 factor]$ /usr/local/stow/mpich-1.2.5..12-gm-nagware/bin/mpicc -lm -o factor-mpich factor.c
[bargle@brood01 factor]$
We submit the following script. Note that
with MPICH we do not need to start up and
shut down the run-time environment
(lamboot/lamhalt) the way we did with LAM:
#!/bin/bash
# Special PBS control comments
#PBS -l nodes=4,walltime=60
# Set up the path
PATH=/usr/local/stow/mpich-1.2.5..12-gm-nagware/bin:$PATH
export PATH
cd $HOME/.src/factor
# Run the program: -np is set to the number of entries in the node file
mpirun -machinefile $PBS_NODEFILE -np $( wc -l < $PBS_NODEFILE ) factor-mpich 2305843009213693953
The resulting output is much like what we
saw with LAM:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
Prime factors of 2305843009213693953:
3
768614336404564651