Note: This page is written for students, staff and visiting workers at the BSU. This facility is not available to other academics or the general public.
The BSU owns two small clusters, hosted and maintained by the Cambridge University High Performance Computing Service.
The CPU cluster contains 28 nodes, each with 16 cores and 64GB of RAM.
The GPU cluster contains 26 nodes, each with 12 cores, 32GB of RAM and an NVIDIA Tesla K20 GPU (3 of these nodes contain a Tesla M2090 instead). The login node also contains a GPU for testing purposes.
Unable to Log In?
If you haven’t used your account for a year or two and find yourself unable to log in, you may need to reset your password. Your HPC account now uses your Raven password, but it needs to be ‘pushed’ to the HPC system first. Go to https://password.csx.cam.ac.uk and select ‘Change your password’. You may reuse your current password. You should then be able to use this password to log in to the HPC.
Applying to use the HPC
To apply for an account, users should fill in this online application form.
You will need your Cambridge CRSid (the short letters-and-numbers identifier that you have been issued, e.g. ccs36) and your Raven password. If you are a visiting worker who hasn’t been issued with these, please speak to Colin Starr (room F42, firstname.lastname@example.org).
The principal investigator is Dr Chris Wallace, email@example.com, phone (3)30389. Enter ‘MRC Biostatistics Unit’ as the department and ‘Clinical Medicine’ as the school. The research group and project title aren’t important, so don’t worry too much about these. Leave the funding section blank.
Connecting to the Cluster
The HPC only accepts connections from within the BSU or Cambridge University network, so if you are working on your personal computer you will need to connect to the BSU first, using either SSH or the F5 browser login. Then SSH to the BSU’s dedicated login node, replacing CRSid with your own identifier:
ssh CRSid@login-mrc-bsu.hpc.cam.ac.uk
To make things easier, you can set up an SSH tunnel by first running the following command (13579 is the local port number; you can use any port number that your system isn’t already using):
ssh firstname.lastname@example.org -L13579:login-mrc-bsu.hpc.cam.ac.uk:22
and logging in to the BSU system. As long as this connection is active, you can then connect directly to the HPC by connecting to ‘localhost’ on the specified port, for example:
ssh CRSid@localhost -p 13579
This tunnel can be used in any software that supports SSH or SFTP connections. For example, Emacs can open files on the HPC through the tunnel using its TRAMP remote file syntax.
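A minimal sketch of such a path, assuming Emacs’s built-in TRAMP package (remote file names of the form /ssh:user@host#port:path) and the example port 13579 from above; CRSid and path/to/file are placeholders for your own identifier and file:

```shell
#!/usr/bin/env bash
# Placeholders: replace with your CRSid and the local port you chose for the tunnel.
crsid="CRSid"
port=13579

# TRAMP remote file name: /method:user@host#port:path
# A relative path after the final colon is taken relative to your HPC home directory.
tramp_path="/ssh:${crsid}@localhost#${port}:path/to/file"
echo "$tramp_path"

# Open it from a shell with:   emacs "$tramp_path"
# or inside Emacs with C-x C-f followed by the same path.
```

The tunnel must be open for the path to work; Emacs then prompts for your password as ssh would.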
Note that if you are connecting over Eduroam, even from a laptop within the BSU, you will only have access to the general HPC login nodes, and you will get an error message if you try to connect to the BSU login node.
However, the general login nodes should be set up like the BSU login node, and you can still submit jobs to the BSU cluster from them, so you can work as normal.
To copy files from your computer to the HPC, use ‘scp’, which works much like the Unix ‘cp’ command. For example:
scp path/to/source CRSid@login-mrc-bsu.hpc.cam.ac.uk:path/to/destination
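The same command also works through the SSH tunnel if you point scp at localhost and give the port with upper-case -P (ssh uses lower-case -p). The helpers below are a hypothetical sketch for your ~/.bashrc, assuming the tunnel on port 13579 is already open: HPC_USER, HPC_PORT, to_hpc, from_hpc and the DRY_RUN switch (which prints the command instead of running it) are names invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical settings: your CRSid and the local port chosen for the tunnel.
HPC_USER="${HPC_USER:-CRSid}"
HPC_PORT="${HPC_PORT:-13579}"

# to_hpc SOURCE DEST: copy a local file or directory to DEST on the HPC.
# -r allows directories; scp takes the port as upper-case -P (ssh uses -p).
# If DRY_RUN is set, the command is printed instead of run.
to_hpc() {
  ${DRY_RUN:+echo} scp -r -P "$HPC_PORT" "$1" "${HPC_USER}@localhost:$2"
}

# from_hpc SOURCE DEST: copy SOURCE from the HPC to DEST on your computer.
from_hpc() {
  ${DRY_RUN:+echo} scp -r -P "$HPC_PORT" "${HPC_USER}@localhost:$1" "$2"
}
```

For example, ‘to_hpc analysis.R scripts/’ would copy a local file into scripts/ in your HPC home directory, and ‘from_hpc results .’ would copy results back into the current directory.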
SLURM Queueing System
The HPC uses a queueing system. This means that you don’t run your jobs directly; instead you submit them to a queue, and when sufficient resources are available, the queueing system runs your job. Jobs are run mostly on a first-come, first-served basis, but the scheduler can rearrange them to make the best use of the available resources. The queueing system on our clusters is called SLURM.
You tell the system about your job by writing a SLURM script. Your home directory should contain two sample scripts; copy the appropriate one and open the copy for editing. You need to modify two sections of the file.
The first section specifies the resources you require. The -A option is the account to charge: this should be “MRC-BSU-SL2” for the CPU cluster, and “MRC-BSU-SL2-GPU” for the GPU cluster. You also need to specify the number of nodes and cores you want.
The wallclock time is an upper limit on how long your job may run; jobs that exceed it will be killed by the system. Please make an effort to estimate it rather than just specifying a week or two, as a realistic limit aids scheduling and also allows the system to kill locked or faulty jobs.
You will also need to modify the #SBATCH -p line, despite its ‘Do not change’ comment, to specify the BSU cluster (SLURM partition): ‘mrc-bsu-sand’ for the CPU cluster, or ‘mrc-bsu-tesla’ for the GPU cluster.
There are further options that you can specify; for more information see the sbatch manual page: http://slurm.schedmd.com/sbatch.html.
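As a sketch, the resource section for a single-task job on the CPU cluster might look like the lines below. The account and partition names are the ones given above; the job name, task count and time limit are illustrative assumptions, and your sample script may spell the same options differently.

```shell
#!/bin/bash
# Job name shown in the queue (hypothetical; choose your own).
#SBATCH -J my_analysis
# Account to charge: MRC-BSU-SL2 (CPU cluster) or MRC-BSU-SL2-GPU (GPU cluster).
#SBATCH -A MRC-BSU-SL2
# BSU partition: mrc-bsu-sand (CPU cluster) or mrc-bsu-tesla (GPU cluster).
#SBATCH -p mrc-bsu-sand
# One node and one task; raise --ntasks for parallel jobs (CPU nodes have 16 cores).
#SBATCH --nodes=1
#SBATCH --ntasks=1
# Wallclock limit (hh:mm:ss); the job is killed if it runs longer than this.
#SBATCH --time=02:00:00
```

SLURM reads #SBATCH lines only until the first command in the script, so keep them together at the top.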
The second section to change is further down the file, where the command to run is specified in the variable CMD. If you are using MPI, you want one of the mpirun commands; for most other jobs, you only need to set the ‘application’ and ‘options’ variables.
For an R job, you can use Rscript as the application.
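A minimal sketch of the R settings, assuming the sample script combines the two variables as CMD="$application $options" (check this against your copy); analysis.R is a hypothetical file name:

```shell
#!/bin/bash
# Program to run: Rscript runs an R script non-interactively.
application="Rscript"
# Arguments for the program: the R script (hypothetical name) plus any arguments for it.
options="analysis.R"

# Assumed to match the sample script: the command SLURM will run is built from both.
CMD="$application $options"
echo "Command to run: $CMD"
```

If you add arguments after the script name, for example options="analysis.R 1", your R code can read them with commandArgs(trailingOnly = TRUE).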
When your script file is ready, submit it to the queue by running the following, replacing script_name with the name of your file:
sbatch script_name
Compiling, testing, and interactive nodes
There is only one login node for the BSU system. You can use this node for compiling, editing, and short tests, but remember that this node must remain available for everyone.
Please do not run long jobs on this node. If you need an interactive node or nodes, you can submit an interactive job.
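One way to do this is SLURM’s srun with --pty, which starts a shell on a compute node. The sketch below wraps it in a hypothetical ~/.bashrc function, bsu_interactive, with a DRY_RUN switch that only prints the command; the account and partition are the CPU-cluster values from above, while the one-node, one-task, one-hour defaults are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: start an interactive shell on a BSU CPU node.
# Usage: bsu_interactive [hours]   (defaults to 1 hour)
bsu_interactive() {
  local hours="${1:-1}"
  # -A account, -p partition, -N nodes, -n tasks, -t time limit (h:mm:ss);
  # --pty bash -l gives a login shell with a terminal on the allocated node.
  # If DRY_RUN is set, the command is printed instead of run.
  ${DRY_RUN:+echo} srun -A MRC-BSU-SL2 -p mrc-bsu-sand -N 1 -n 1 \
    -t "${hours}:00:00" --pty bash -l
}
```

Type ‘exit’ in the interactive shell to end the session and release the node for others.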
More documentation is available on the HPC website, http://www.hpc.cam.ac.uk/using-clusters, and it is worth browsing through it before doing serious work on the cluster.
In particular, note the section on modules, which let you easily choose between different software packages and versions.
And if you have any further questions about using the cluster, or how to parallelise and speed up your code, please come and talk to Colin Starr, room F42, email@example.com.