The cluster is a Linux-based Rocks cluster for exclusive use by the class. There are six nodes including the head node. Each node consists of two quad-core Intel Xeon E5504 chips for a total of eight cores per node. Jobs are submitted using the Sun Grid Engine (SGE). Each node will only run one job at a time, so be polite and don't hog the machine with very long runs!
The cluster head node is crocus.csuglab.cornell.edu. It should be accessible from any on-campus IP address (or you can reach it if you're logged into the campus VPN). Log into crocus using ssh with the account name and password that was sent to you via Dropbox (your account name is your NetID). On the first login, you will be prompted for passwords for ssh keys; just hit enter, as these keys are used purely for private communication inside the cluster. Once you've done this, you should be delivered to a Unix prompt. Welcome to the head node!
After your initial login, you should probably change your password to something that you can remember. Alternatively, you may want to set up password-less ssh authentication between your machine and the cluster. The details will depend on which ssh client you use. On Linux, I suggest looking into keychain. On OS X 10.5 onward, ssh-agent runs for you automatically, so you can simply add keys at the command line using ssh-add and then not worry about it. Under Windows, you may want to look into PuTTY, which apparently has support for ssh keys; see this tutorial, for example, keeping in mind as you read it that I don't typically use Windows myself.
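For example, with the OpenSSH client on Linux or OS X, key setup looks roughly like this (a sketch; the key file name is arbitrary, and "netid" is a placeholder for your own account name):

```shell
# Generate a key pair for password-less login (sketch assuming OpenSSH;
# an empty passphrase is used here for brevity -- in practice, consider
# a real passphrase plus keychain or ssh-agent).
ssh-keygen -t rsa -N "" -f ./crocus_key

# The public half goes into ~/.ssh/authorized_keys on crocus, e.g.:
#   ssh-copy-id -i ./crocus_key.pub netid@crocus.csuglab.cornell.edu
cat ./crocus_key.pub
```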
You have access to two types of storage on the cluster. Your home directory is hosted on the head node, and is mounted by NFS on all the other nodes. You can read or write your home directory files in one place and see the changes in other places (eventually). However, NFS uses a lot of bandwidth, and it is easy to swamp the server. For big files, use /state/partition1, a user-accessible local partition that exists on each node (on the head node, this is where the home directories live, but the head node is a special case).
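One common pattern is to stage large files through the local partition inside a job script. A sketch (the input file, solver name, and results directory are all hypothetical):

```shell
# Write a job script that stages data through node-local disk
# (big_input.dat, my_solver, and ~/results are hypothetical names).
cat > stage_job.sh <<'EOF'
#!/bin/sh
SCRATCH=/state/partition1/$USER
mkdir -p $SCRATCH
cp $HOME/big_input.dat $SCRATCH/            # stage input onto local disk
cd $SCRATCH
$HOME/my_solver big_input.dat > out.dat     # heavy I/O happens locally
cp out.dat $HOME/results/                   # copy results back over NFS
rm -rf $SCRATCH                             # clean up after yourself
EOF
```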
Software is provided in the usual locations (/usr/bin and /usr/local/bin), but there are also common installations in /share/apps/local. In particular, this is where GCC 4.4 (and gfortran) and ATLAS are installed. This directory is not in the default path, so you will either need to edit your path or type the fully-qualified command names to use these compilers.
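For example, assuming the usual bin/ layout under /share/apps/local, you can put it ahead of the default path (add the export line to your ~/.bashrc to make the change permanent):

```shell
# Prepend the shared installations to the search path
# (assumes a bin/ subdirectory under /share/apps/local).
export PATH=/share/apps/local/bin:$PATH

# Alternatively, use the fully-qualified name directly, e.g.:
#   /share/apps/local/bin/gfortran -O3 -o prog prog.f90
```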
Compute nodes on the cluster have two quad-core Intel Xeon E5504 chips running at 2.0 GHz. This is the "Gainestown" family, fabricated in a 45 nm process and based on the Nehalem architecture. For more details on the processor type, try cat /proc/cpuinfo (followed by Googling!). There are some slides on this architecture from another class that you can read to find out more.
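For instance, a couple of quick queries against /proc/cpuinfo (standard on any Linux machine):

```shell
# Identify the processor and count the logical cores.
grep 'model name' /proc/cpuinfo | sort -u   # processor model string
grep -c '^processor' /proc/cpuinfo          # number of logical cores
```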
The nominal peak is 8 GFlop/s per core, if one starts two SSE instructions per cycle (each of which can handle two double-precision floating point operations); with eight cores, that comes to 64 GFlop/s per node. There are 16 GB of physical RAM per node. Each core has a 4-way associative 32 KB L1 cache and an unshared 8-way 256 KB L2 cache. There is also a shared (within a processor) 16-way 4 MB L3 cache.
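The arithmetic behind the 8 GFlop/s figure, spelled out:

```shell
# 2 SSE instructions/cycle x 2 double-precision flops each x 2.0 GHz
echo $(( 2 * 2 * 2 ))   # GFlop/s
```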
The command to submit jobs to the queue is qsub; try qsub -help or man qsub at the command line to see the basic documentation. Running qsub scriptname will schedule scriptname to be run on one of the compute nodes. scriptname is usually a shell script; in addition to the normal shell operations, one can use comment lines starting with #$ to set execution options (these options can also be set via the command line).
Some good options to know are:

-cwd - run from the current working directory (where qsub was called).
-wd name - specify that the named directory should be the working directory where the job is run.
-j y - merge the standard output and standard error streams from the job.
-o name - redirect stdout. By default, this gets sent to a file called scriptname.oxxx in the current directory, where xxx is the job number.
-e name - redirect stderr.

Use qstat to see the current status of your jobs, and use qdel to delete jobs (particularly if they appear to be runaway!).
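Putting the pieces together, a minimal job script and submission cycle might look like this (my_program is a placeholder for your own executable):

```shell
# A minimal SGE job script using the options above
# (my_program is a hypothetical executable).
cat > hello_job.sh <<'EOF'
#!/bin/sh
#$ -cwd        # run from the directory where qsub was called
#$ -j y        # merge stdout and stderr into one file
echo "Running on $(hostname)"
./my_program
EOF

# Then, on the head node:
#   qsub hello_job.sh    # submit; output lands in hello_job.sh.oxxx
#   qstat                # watch the job queue
#   qdel <job_id>        # kill a runaway job
```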