Welcome to the Octo Beowulf cluster
last update: June 25, 2003: Michele
Introduction
The PMA Divisional Beowulf (Octo,
begun in March 2003 as a follow-up to Gordon, which was begun in February 2001) is
meant to give students as well as research groups access to a research-grade
cluster. The system is currently used by many
different groups within Physics, and by the students in the ph20/21/22 class. A
very special thank you goes to Intel Corporation for supplying most of
the computing and networking equipment. Without their generous
donation this project never would have been possible.
The Octo Beowulf team is Chris Mach and Michele Vallisneri.
Hardware
- Server (octo): dual processor Intel Xeon 400 (2MB Cache), with 512 MB of RAM
- Nodes (octo-1 to octo-8): 8 Intel Pentium 4 2.40 GHz
- Memory: 4 GB total (512 MB of RAM per node)
- OS: Red Hat Linux 8.0
- Switch: Intel Express 510T 10/100
Software
- Operating system: Red Hat Linux 8.0
- Compilers: GNU GCC suite 3.2
- MPI library: LAM/MPI 7.0
- Batch queuing system: OpenPBS 2.3.16
- Logging in and transferring files: use ssh and scp
Accounts and support
- To get an account, write to beowulf@alice.caltech.edu, including information about:
- Your name and research group affiliation
- The research project you want to develop on the cluster
- Your desired username
- The first four letters of a password (you will receive the last four)
- Your shell preference (if you have one)
- Any comments, questions or desires
- When you have an account, log in to octo using slogin or ssh.
- To report problems and bugs or to ask for help, write to beowulf@alice.caltech.edu.
Compiling and running parallel jobs on Octo
Compiling MPI jobs
- Be sure to include mpi.h (C) or mpif.h (Fortran).
- Instead of gcc, g++, and g77,
use the MPI wrapper versions mpicc, mpiCC, and mpif77: they will automatically link the necessary libraries.
- Sorry, Fortran 90 is not available right now.
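As a minimal sketch of the points above, the following shell fragment writes a tiny C test program and compiles it with mpicc. The file names hello.c and hello are hypothetical (they are not files provided on octo), and the compile step is skipped when mpicc is not on your PATH, so the fragment can be tried on any machine.

```shell
#!/bin/sh
# Sketch only: hello.c and hello are hypothetical names, not files shipped on octo.

# Write a minimal MPI program; note the mpi.h include required for C codes.
cat > hello.c <<'EOF'
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* index of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();                         /* shut MPI down cleanly */
    return 0;
}
EOF

# Compile with the MPI wrapper instead of gcc; it adds the MPI libraries itself.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o hello hello.c
else
    echo "mpicc not found; run this on octo" >&2
fi
```

A Fortran 77 code would be compiled the same way with mpif77 in place of mpicc.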
Running MPI jobs interactively
We discourage running parallel jobs interactively on the cluster. The golden
rule of Beowulf systems is one CPU, one process;
degraded performance, general user frustration and administrator wrath
are the results of violating this rule. So please run your jobs using the
PBS queues (see below).
This said, we have reserved the server, octo
(a dual processor machine), for running MPI test jobs in interactive mode. Maybe
you do not want to wait in the queue because you know that your undebugged
code is going to crash immediately anyway. Just remember that octo
is used by everybody to compile programs and to serve files: do not
hog it.
- Before running an MPI job, you need to build a boot
schema: a file containing all the names of the hosts that will run
the code. If you follow our advice and do tests on octo
alone, it will be enough to give the command
echo octo > lamhosts
- If you have a very good reason to include other nodes (among octo-1 to octo-8) just add them to the file lamhosts,
one per line. You should always include octo.
- Boot LAM/MPI with the command lamboot lamhosts
- Now go to the directory where you have compiled your MPI code (to be original, let's say you want to run a.out). Here give
the command mpirun -wd $PWD -np NPROCS ./a.out, where NPROCS is the number of parallel processes that you want to start:
even if you are running on octo
alone, you can have as many as you want (for testing purposes: of course
if you start eight processes on its two processors, each will run at only 25%
of its maximum speed).
- You can redirect the input and output of the parallel processes with <
and >; you can stop the parallel
processes with CTRL-c; you can run them in the background with &.
This is not all: mpirun has many
other options. Check its man page.
- Remember to shut down LAM/MPI when you are
done: wipe lamhosts
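Put together, an interactive test session might look like the sketch below. The helper make_lamhosts is hypothetical (it is not a LAM/MPI command); it only writes the boot schema with octo first and any extra nodes after it. The LAM/MPI commands themselves run only where lamboot is installed, so the fragment can be dry-run elsewhere.

```shell
#!/bin/sh
# Sketch only: make_lamhosts is a hypothetical helper, not a LAM/MPI command.

# Write a boot schema: octo always comes first, extra nodes follow one per line.
make_lamhosts() {
    echo octo > lamhosts
    for node in "$@"; do
        echo "$node" >> lamhosts
    done
}

make_lamhosts            # test on octo alone, as recommended
NPROCS=2                 # one process per CPU on the dual-processor server

# Run the LAM/MPI commands only where they are installed.
if command -v lamboot >/dev/null 2>&1; then
    lamboot lamhosts                            # start LAM on the listed hosts
    mpirun -wd "$PWD" -np "$NPROCS" ./a.out     # run a.out from the current directory
    wipe lamhosts                               # shut LAM down when done
else
    echo "LAM/MPI not found; would run $NPROCS processes on: $(tr '\n' ' ' < lamhosts)" >&2
fi
```

Calling make_lamhosts octo-1 octo-2 instead would write a three-line boot schema, with octo still on the first line. With NPROCS=8 on octo alone, each process would get 2/8 = 25% of a CPU.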
Running a batch MPI job
This is how a queuing system works: you submit a
job to a queue, requesting certain resources (typically, a certain
number of processors for a certain maximum time). When the requested number
of processors becomes available (and when fair-share policies are satisfied),
your job is started. This queuing system is designed to avoid oversubscribing
(so that each available processor will run only one process at a time,
maximising performance) and to share the computing power of the cluster
equitably between all users.
- To submit a job to the scheduling system (PBS), you have to prepare a script (say, simple_script) that contains information about the job you want to run. Here is an example script that
will submit the job test_job, and
run the executable test.exe. We are
asking for 8 processors, and we are submitting the job to the queue small:
#!/bin/sh
#
# name of the job
#PBS -N test_job
# request 8 nodes (one processor each)
#PBS -l nodes=8
# queue to submit to
#PBS -q small
# files for standard output and standard error
#PBS -o test.out
#PBS -e test.err
# send mail when the job aborts (a), begins (b), and ends (e)
#PBS -m abe
# mark the job as not rerunnable
#PBS -r n
cd $PBS_O_WORKDIR
lamboot $PBS_NODEFILE
mpirun -wd $PWD -np 8 ./test.exe
wipe $PBS_NODEFILE
Put your script in the directory where your executable lies and where
you want your output files to go.
- How to choose your queue: estimate the length of your job, and have a look
at the table below. If your job runs for longer than the limit, it will
be mercilessly killed.
PBS queues on Octo
| queue | priority of execution | max runtime |
| small | high | 20 mins (00:20:00) |
| medium | medium | 2 hrs (02:00:00) |
| long | low | 12 hrs (12:00:00) |
- Submit the job by typing qsub simple_script
- Check the status of your job with qstat;
look in particular at the "S" column: "Q" means that your job is queued
and waiting, "R" that it is running, "E" that it is exiting. qstat
-f will generate much more (useful?) information.
- To kill your job you need to know its identifier. You get that from qsub
when you submit the job, or from qstat.
Say the identifier is 20.user. Then
type qdel 20.user. These commands have many other options (though some are not currently implemented
on our system). Check their man pages.
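Choosing a queue from the runtime limits can be sketched as a small shell function. The helper pick_queue is hypothetical (it is not part of PBS); the limits of 20, 120 and 720 minutes are copied from the queue table, and the function simply prints the highest-priority queue that still covers your estimate.

```shell
#!/bin/sh
# Sketch only: pick_queue is a hypothetical helper; limits are copied from the queue table.

# Print the highest-priority queue whose max runtime covers an estimate in minutes.
pick_queue() {
    minutes=$1
    if [ "$minutes" -le 20 ]; then
        echo small
    elif [ "$minutes" -le 120 ]; then
        echo medium
    elif [ "$minutes" -le 720 ]; then
        echo long
    else
        echo "no queue allows $minutes minutes" >&2
        return 1
    fi
}

queue=$(pick_queue 90)      # a 90-minute job does not fit in small
echo "put this line in simple_script: #PBS -q $queue"
```

It is probably wise to leave some margin in the estimate, since a job that overruns its queue limit is killed.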
Feedback and documentation
Please send your feedback to beowulf@alice.caltech.edu. As for documentation, good places to start are the LAM/MPI
and OpenPBS websites.
