Best Practices on WI-HPC Cluster

The following approaches allow the WI-HPC Cluster and SLURM's advanced scheduling algorithm to be as efficient as possible. Please read below for further details on each best practice.

  1. Do not install applications on your own.

  2. Do not store large data in your home folder. This should be stored in your lab share.

  3. Do not run jobs on the head node; schedule a job instead.

  4. Request only the resources you NEED.

  5. ALWAYS request a time limit with the --time option.

Please contact the IT Help Desk helpdesk@wistar.org with any questions.

Applications and Software

Please do not install applications in your home directory. Your home directory should be used for personal code, notes, and other files that pertain only to you. Your home directory is not backed up; if a file is deleted, it is gone forever.

Applications should always be installed by IT to ensure that there are no conflicting dependencies and that the software is installed correctly, so that all users of the WI-HPC cluster are able to use it. Please contact the IT Help Desk helpdesk@wistar.org to request an installation.

Data and Lab Shares

Data and large files should always be stored in a lab share. This ensures that the data is backed up and all members of your lab are able to access it.

If you are downloading or using publicly available data, please contact the IT Help Desk helpdesk@wistar.org to create a dedicated lab share. We DO NOT back up publicly available data, since it can always be downloaded again.

Jobs on Head Node

DO NOT run jobs on the head/login nodes. Any such processes will be killed immediately and your work will not run. Instead, submit your work as a batch job, as sketched below.
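A minimal sketch of the correct workflow, assuming a hypothetical batch script submit.sh and a hypothetical workload run_analysis.sh (names for illustration only):

#!/bin/bash
#SBATCH --job-name=analysis     # hypothetical job name
#SBATCH --time=01:00:00         # always set a time limit
#SBATCH --mem=4G                # request only what you need

# The workload runs on a compute node, not the head node.
./run_analysis.sh

Save the lines above as submit.sh and hand it to the scheduler:

sbatch submit.sh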

Resources

Do not request more resources (CPUs, memory, GPUs) than you will need; more resource-intensive jobs take longer to queue. After your job completes, use the sacct command (see the Managing Jobs page) to review actual usage and better define future resource requirements.
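For example, a minimal sketch of checking a completed job's actual usage (the job ID 12345 is a placeholder):

sacct -j 12345 --format=JobID,JobName,Elapsed,AllocCPUS,MaxRSS,State

MaxRSS shows the peak memory used and Elapsed the actual runtime; compare these against what you requested and trim your next submission accordingly.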

Wall Times

ALWAYS specify a preferred maximum time limit, and a minimum time limit as well. The following specification will cause SLURM to schedule your job in the earliest available time window of 4 hours or longer, up to 12 hours:

#SBATCH --time=12:00:00
#SBATCH --time-min=04:00:00

If you have very many short jobs (each running less than about 1-5 minutes), combine them into one larger job using a simple loop in the batch script, so as to minimize the per-job overhead (startup, accounting, etc.). A sketch of this pattern follows.
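A minimal sketch, assuming hypothetical input files input_1.txt through input_100.txt and a hypothetical program process_sample (both names are for illustration only):

#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --mem=2G

# Run many short tasks inside one job so the scheduler's
# startup and accounting overhead is paid once, not 100 times.
for i in $(seq 1 100); do
    ./process_sample input_${i}.txt
done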

Memory Limits

Specify memory requirements explicitly, either as memory per node, or as memory per CPU:

#SBATCH --mem=12G
or
#SBATCH --mem-per-cpu=3G

Requesting more memory (RAM) than you need will result in longer wait times in the queue. On the other hand, if you do not request enough, the job may be killed for attempting to exceed its allocated memory.

It is recommended to request a little more memory (RAM) than your job will need, but not much more.

Please use --mem instead of --mem-per-cpu in most cases. There are a few kinds of jobs for which --mem-per-cpu is more suitable, such as MPI jobs.
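As a rough sketch of the MPI case (the program name mpi_app is hypothetical), --mem-per-cpu lets the memory request scale with the task count:

#!/bin/bash
#SBATCH --ntasks=64
#SBATCH --mem-per-cpu=3G    # 3G per task, wherever the tasks land

srun ./mpi_app

Because SLURM may spread the 64 tasks across nodes however it sees fit, a per-CPU request guarantees each task its share of memory without your needing to know the node layout in advance.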

Node Requirements

It is always good practice to ask for resources in terms of cores or tasks rather than number of nodes, and to let SLURM decide how to place them. For example, a request for 10 nodes with 48 cores each amounts to 480 tasks on 480 cores.

The wrong way to ask for resources:

#SBATCH --nodes=10

The right way to ask for resources:

#SBATCH --ntasks=480
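Putting the practices above together, a minimal sketch of a complete submission script (the job name and the program my_program are placeholders):

#!/bin/bash
#SBATCH --job-name=example      # placeholder job name
#SBATCH --ntasks=480            # ask for tasks, not nodes
#SBATCH --mem-per-cpu=3G        # explicit memory per task
#SBATCH --time=12:00:00         # preferred maximum wall time
#SBATCH --time-min=04:00:00     # minimum acceptable window

srun ./my_program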