
News

2024-09-09 Slurm Upgrade

Slurm has been upgraded from 22.0.5 to 23.11.7 on the production wi-hpc cluster.

See the full Slurm changelog here

2024-04-29 Arbiter2

Arbiter2 has been installed and implemented on the WI-HPC Head/Login nodes (wi-hpc-hn1 and wi-hpc-hn2).

Arbiter2 is a cgroups-based mechanism designed to prevent misuse of the head/login nodes, which are scarce, shared resources.

Please see the Arbiter2 page for more information and details on statuses, limits, and penalties.

2023-11-20 Apptainer

Apptainer has been installed and configured on the WI-HPC Cluster. Apptainer is container software (similar to Docker) that has been specifically designed for use in an HPC environment.

Please see Containers for details on working with Apptainer.
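
As a quick, minimal illustration of a typical Apptainer workflow (the image name below is just an example, not a Wistar-provided image): pull an image into a local .sif file, then run a command inside it.

# Pull an image from Docker Hub into a local SIF file (example image only)
apptainer pull ubuntu_22.04.sif docker://ubuntu:22.04

# Run a command inside the container
apptainer exec ubuntu_22.04.sif cat /etc/os-release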

NOTE: This is an ongoing project and a new application. Please proceed with caution and contact the IT Help Desk at helpdesk@wistar.org with any questions or issues. Thank you.

2023-11-06 Default Wall Times

On Monday, November 6th, 2023, all partitions/queues were configured with a DefaultTime of 1 HOUR. If no time is specified in the submission script, the job's time limit will be set to 1 HOUR, and the job will be killed once it reaches 1 HOUR of running time. This DefaultTime was already set on the smq partition (which has a MAX time of 4 hours).

Users should ALWAYS specify a time when submitting jobs:

#SBATCH --time=DD-HH:MM
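
For example, a minimal submission script that requests a 2-hour wall time might look like the following (the job name, task count, and srun command are placeholders, not site requirements):

#!/bin/bash
#SBATCH --job-name=example_job
#SBATCH --ntasks=1
#SBATCH --time=0-02:00

# Replace with your actual workload
srun hostname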

2023-10-30 Slurm Reorganization

A fix to the AutoFS mounting of the home directories and a reorganization of Slurm's accounting were applied on October 30th, 2023. This change included the following:

  • New nodes node019, node020, and node063 (GPU) have been added to the cluster.
  • Updated the way home directories (/home/username) are mounted to stop alerts to the IT Help Desk.
  • Created a new account for each lab (e.g. kulp).
  • Users have been moved from the "wistar" account to their respective lab accounts.
  • New "smq" partition/queue for small/short-running jobs, with a max time of 4 hours.
Partition/Queue   QOS           Description
defq              general_qos   The default partition/queue
gpu               general_qos   The GPU partition/queue (--partition=gpu)
smq               smq_qos       The "small" queue for short, resource-intensive jobs (--partition=smq)
  • New QOSs for each partition/queue, as well as "ext_qos" for special use cases (e.g. users doing work for other labs); example job directives follow the table below.
QOS           Tied To               Max CPUs   Max Memory (RAM)   Max GPUs   Max WallTime
general_qos   defq                  400        2400 GB            20         Unlimited
smq_qos       smq                   800        4800 GB            32         4 Hours
ext_qos       per user by request   800        4800 GB            32         Unlimited
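
As a sketch of how a partition and QOS might be requested in a submission script (whether --qos must be set explicitly, or is applied automatically by the partition, depends on the cluster's Slurm configuration):

#SBATCH --partition=smq
#SBATCH --qos=smq_qos
#SBATCH --time=0-04:00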

2023-08-18 Overview Update

HPC cluster documentation is now hosted at https://hpc.apps.wistar.org/. This will be the main website for news, documentation, examples, training, and all HPC-cluster-related information. This documentation is currently A WORK IN PROGRESS, so if something is incorrect or requires clarification, please send an email to the IT Help Desk at helpdesk@wistar.org and let us know.

Current Hardware:

  • 28 compute nodes
  • 14 GPU nodes
  • 42 total nodes

Current Resources:

  • 2,784 processors
  • 17,074 GB memory
  • 52 GPUs

Current Partitions:

  • defq (default queue): nodes[021-046]
  • gpu: nodes[050-062]
  • gputest: nodes[063]
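
The current partition-to-node mapping can also be checked directly on the cluster with the standard Slurm sinfo command, for example:

# Summarized view of partitions, node counts, and states
sinfo -s

# Nodes and state for a single partition
sinfo -p gpu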

2022-11-3 Head Node Update

The head node was replicated to new hardware for production purposes. Next step: Failover

2022-10-31 Additional Nodes Added

Eight new nodes were delivered sooner than expected and added to the new cluster. This will enable more users to put the new cluster through its paces.

2022-10-18 Nodes moved

An error with older nodes required moving nodes out of production for full testing: Compute 30 (a regular node) and Compute 10 (a GPU node) were moved from cluster-01 into the wi-hpc cluster.

2022-10-15 Cluster Launched in Development

The new cluster, WI-HPC, is being launched in development with a number of changes.