News
2024-09-09 Slurm Upgrade
Slurm has been upgraded from 22.0.5 to 23.11.7 on the production wi-hpc
cluster.
See the full changelog from Slurm here
2024-04-29 Arbiter2
Arbiter2 has been installed and implemented on the WI-HPC Head/Login nodes (wi-hpc-hn1 and wi-hpc-hn2).
Arbiter2 is a cgroups-based mechanism designed to prevent the misuse of head/login nodes, which are scarce, shared resources.
Please see the Arbiter2 page for more information and details on statuses, limits, and penalties.
2023-11-20 Apptainer
Apptainer has been installed and configured on the WI-HPC Cluster. Apptainer is container software (similar to Docker) that has been specifically designed for use in an HPC environment.
Please see Containers for details on working with Apptainer.
NOTE: This is an ongoing project and a new application. Please proceed with caution and contact the IT Help Desk at helpdesk@wistar.org with any questions or issues. Thank you.
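For quick reference, the commands below are a minimal sketch of basic Apptainer usage; the Docker Hub image name is only an illustration, and the Containers page remains the authoritative guide.

```bash
# Pull a container image from Docker Hub and save it as a SIF file
# (python:3.12 is an illustrative image, not a site-provided one).
apptainer pull python_3.12.sif docker://python:3.12

# Run a single command inside the container.
apptainer exec python_3.12.sif python3 --version

# Start an interactive shell inside the container.
apptainer shell python_3.12.sif
```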
2023-11-06 Default Wall Times
On Monday, November 6th, 2023, all partitions/queues were configured with a DefaultTime of 1 HOUR. If no time is specified in the submission script, the job's time limit is set to 1 HOUR, and the job will be killed once it reaches 1 HOUR of running time. A DefaultTime was already set on the smq partition (which has a MAX time of 4 hours).
Users should ALWAYS specify a time when submitting jobs:
#SBATCH --time=DD-HH:MM
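For example, a minimal submission script that sets an explicit 2-hour limit might look like the sketch below (the job name, partition, and command are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=example_job   # illustrative job name
#SBATCH --partition=defq         # default partition/queue
#SBATCH --time=0-02:00           # DD-HH:MM: 0 days, 2 hours (overrides the 1 HOUR DefaultTime)

# Replace with the real workload; this just prints the node the job ran on.
srun hostname
```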
2023-10-30 Slurm Reorganization
A fix to the AutoFS mounting of the home directories and a reorganization of SLURM's accounting were applied on October 30th, 2023. This change included the following:
- New nodes node019, node020, and node063 (GPU) have been added to the cluster.
- Updated the way home directories (/home/username) were mounted to stop alerts to the IT Help Desk.
- Created new accounts for each lab (e.g. kulp).
- Users have been moved from the "wistar" account to their respective new lab accounts.
- New "smq" partition/queue for small/short running jobs. Max time of 4 hours.
| Partition/Queue | QOS | Description |
| --- | --- | --- |
| defq | general_qos | The default partition/queue |
| gpu | general_qos | The GPU partition/queue (--partition=gpu) |
| smq | smq_qos | The "small" queue for short, resource-intensive jobs (--partition=smq) |
- New QOSs for each partition/queue, plus an "ext_qos" for special use cases (e.g. users doing work for other labs):
| QOS | Tied To | Max CPUs | Max Memory (RAM) | Max GPUs | Max WallTime |
| --- | --- | --- | --- | --- | --- |
| general_qos | defq | 400 | 2400 GB | 20 | Unlimited |
| smq_qos | smq | 800 | 4800 GB | 32 | 4 Hours |
| ext_qos | per user by request | 800 | 4800 GB | 32 | Unlimited |
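As a rough sketch (the exact directives may vary depending on how each QOS is attached to its partition), a job targeting the new smq partition could include:

```bash
#SBATCH --partition=smq   # small/short queue
#SBATCH --qos=smq_qos     # may be applied automatically by the partition; include if required
#SBATCH --time=0-04:00    # smq jobs are limited to 4 hours
```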
2023-08-18 Overview Update
HPC cluster documentation is now hosted at https://hpc.apps.wistar.org/. This will be the main website for news, documentation, examples, training, and all other HPC cluster-related information. This documentation is currently A WORK IN PROGRESS, so if something is incorrect or requires clarification, please send an email to the IT Help Desk at helpdesk@wistar.org and let us know.
Current Hardware:
- 28 compute nodes
- 14 GPU nodes
- 42 total nodes
Current Resources:
- 2,784 processors
- 17,074 GB memory
- 52 GPUs
Current Partitions:
- defq (default queue): nodes[021-046]
- gpu: nodes[050-062]
- gputest: nodes[063]
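Current partition and node status can be checked directly with Slurm's sinfo command, for example:

```bash
# List all partitions and their node states.
sinfo

# Per-node detail for the gpu partition.
sinfo --partition=gpu --Node --long
```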
2022-11-03 Head Node Update
The head node was replicated to new hardware for production purposes. Next step: Failover
2022-10-31 Additional Nodes Added
Eight new nodes were delivered sooner than expected and added to the new cluster. This will allow more users to put the new cluster through its paces.
2022-10-18 Nodes Moved
An error with older nodes required moving nodes out of production for full testing: Compute 30 (a regular node) and Compute 10 (a GPU node) were moved from cluster-01 into the wi-hpc cluster.
2022-10-15 Cluster Launched in Development
The new WI-HPC cluster is being launched with a number of changes.