Data Storage and Transfer

Wistar HPC Key Directories

  • Home - /home/username
  • Lab Share - /wistar/lab-name
  • Tempdrop - \\cifs\tempdrop\lab-name (only accessible through Windows or MacOS)
  • Linux (HPC Lab Share) - \\cifs\lab-name\linux
  • Custom Shares - /wistar/custom-share-name
  • Resources - /resources
  • Applications - /applications

Home Directories

Each user is provided with a home directory containing their own .bashrc and .bash_profile files. You can configure these files yourself, but if you are using centrally installed applications you should not need to update your PATH variable here. Loading application modules updates your PATH automatically.
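To confirm that loading a module really did put an application on your PATH, a small helper like the one below can be used. This is only a sketch: the function name path_contains is made up for illustration, and the module name and install path in the comments are placeholders, not real entries on the cluster.

```shell
# Returns success (exit status 0) when directory $1 is an entry in $PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Typical use on the cluster (assumes the `module` command is available there;
# "example-app" and its path are placeholders):
#   module load example-app
#   path_contains /applications/example-app/bin && echo "example-app is on PATH"
```

Wrapping PATH in colons before matching avoids false positives where one directory name is a prefix of another.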

Do not install applications in your home folder. If you need an application installed, please contact the IT Help Desk at helpdesk@wistar.org.

Do not store large datasets in your home directory. These should be stored in your lab share.

Home directories are NOT BACKED UP, so if the dataset or application folder is deleted, it is gone forever.
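If you are unsure whether your home directory has grown too large, something like the following can show which entries take up the most space. It is a rough sketch: the function name largest_entries is hypothetical, sizes are reported in KiB, and hidden files (names starting with a dot) are not included.

```shell
# Print the largest entries directly under a directory, biggest first.
#   $1 = directory to inspect (defaults to $HOME)
#   $2 = how many entries to show (defaults to 10)
largest_entries() {
    dir="${1:-$HOME}"
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Example: largest_entries "$HOME" 5
```

Anything large that shows up here is a candidate for moving to your lab share.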

Lab Shares

Each user is provided access to their lab shares via the HPC cluster. On the cluster, the lab share is a mount of the linux directory inside the lab share that PCs and Macs reach through SMB/CIFS. Files can be moved in and out of the linux directory from users' end devices and are then accessible from the HPC cluster.

These can be accessed via:

  • File Explorer (Windows): \\cifs\lab-name\linux

  • Finder (MacOS): smb://cifs/lab-name/linux

If you need access to another lab's lab share, please contact the IT Help Desk at helpdesk@wistar.org. PLEASE NOTE: we will require approval from the lab share owner.

Tempdrop

PLEASE NOTE: This share is not accessible through the WI-HPC Cluster, only through Windows or MacOS. The tempdrop share is intended solely as temporary storage for data and should not be used for long-term storage. Any data meant for retention and continued work should be promptly moved to the normal lab share.

All users have access to all lab folders within tempdrop. This is meant for collaboration between labs (e.g. Lab1 does some work for Lab2. Lab1 then places the data in Lab2's tempdrop. Lab2 transfers it to their lab share linux directory to continue work in the WI-HPC cluster.)

Custom Shares

If you have specific storage requirements based on dataset restrictions, IT can configure a custom share for use in the HPC cluster. Send an email to the IT Help Desk at helpdesk@wistar.org for more details.

Resources

This location is specifically designed for shared data repositories that are used as part of analysis. Any downloadable shared data meant for use by all HPC cluster users should be placed here. Examples of this type of data include AlphaFold databases and ColabFold databases.

If you need data placed here, please send an email to the IT Help Desk at helpdesk@wistar.org and we will work with you to get this data downloaded and stored.

NOTE: This is a READ-ONLY location and is NOT BACKED UP. It is designed for speed and effective data management, so there is no need to keep copies of common databases in your lab shares, which are backed up.

Applications

This location is designed for centrally installed applications that can be loaded via the Modules command. All applications here are installed centrally by IT.

Archiving Data

Archiving data ensures that important information is securely stored and easily retrievable when needed, and it frees up space in your lab share.

Send an email to the IT Help Desk at helpdesk@wistar.org and specify which directories you wish to archive.

As a best practice, lab shares should contain an ARCHIVE-YYYY directory, where YYYY represents the current year. For example, if archiving data in 2024, the directory name would be "ARCHIVE-2024". Any files/folders designated for archiving during that year should be moved into this folder, and at the end of the year IT can archive this data.
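The naming convention above could be set up on the cluster with a few lines like these. This is a sketch under assumptions: the LAB_SHARE variable and the folder name in the commented mv line are placeholders introduced here, and the script falls back to the current directory when LAB_SHARE is not set.

```shell
# Assumption: LAB_SHARE points at your lab share on the cluster (e.g. /wistar/lab-name).
# It falls back to the current directory so the sketch can be tried anywhere.
lab_share="${LAB_SHARE:-$PWD}"
archive_dir="$lab_share/ARCHIVE-$(date +%Y)"

# Create the archive folder for the current year if it does not exist yet.
mkdir -p "$archive_dir"
echo "Archive folder: $archive_dir"

# Then move finished work into it, for example (hypothetical folder name):
#   mv -i "$lab_share/finished-project" "$archive_dir/"
```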

Once archived, the directory will contain a report file named DIRECTORYNAME-YYYY-MM-DD.txt that lists out all files/folders that existed in that directory along with their permissions.

If the directory and/or data needs to be recovered from the archive, please send an email to the IT Help Desk at helpdesk@wistar.org specifying the directory you wish to recover.

PLEASE NOTE: Larger directories will take a few days to archive. Please refrain from accessing the directory during this time so as not to interrupt the archiving process.

Transferring Data

Large File Transfers

You can use the Globus service to perform larger data transfers between your local machine and the clusters. Globus provides a robust and resumable way to transfer larger files or datasets. Please see their official docs to get started.

As a best practice, the scp and rsync commands should only be used for small files/folders. Even then, the best way to access your lab share's file system is to mount your lab share on a local device and navigate to the linux directory. Please see Connect to Network Share Devices.

Graphical Transfer Tools

Cyberduck

You can also transfer files between your local computer and a cluster using an SFTP client, such as Cyberduck (macOS/Windows). You will need to configure the client with your netid as the username, the cluster transfer node as the hostname, and your private key as the authentication method. An example configuration of Cyberduck is shown below.

Cyberduck sample configuration.

Command-Line Transfer Tools

NOTE: The scp and rsync commands should only be used for small to medium data. Any large data should be transferred either by mounting your lab share on your local device and dropping the data into your lab share's linux directory, or by using Globus (see Large File Transfers).

scp and rsync

Linux and macOS users can use scp or rsync. Use the hostname of the cluster transfer node (see above) to transfer files. These transfers must be initiated from your local machine.

scp and rsync are both run from a terminal window. The basic syntax of scp is

scp [from] [to]

The from and to can each be a filename or a directory/folder on the computer you are typing the command on or a remote host (e.g. the transfer node).
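A remote from or to is written as user@host:path. The sketch below assembles such a string from its parts, using the same example netid and hostname that appear in the commands in this section. The helper name remote_path is made up for illustration.

```shell
# Build the remote form of an scp/rsync path: user@host:path
remote_path() {
    # $1 = netid, $2 = hostname, $3 = path on the remote host
    printf '%s@%s:%s\n' "$1" "$2" "$3"
}

dest=$(remote_path abc123 wi-hpc.wistar.upenn.edu /home/username/abc123/test)
echo "$dest"   # prints abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/test

# The result can then be passed to scp as the [to] argument, e.g.:
#   scp myfile.txt "$dest"
```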

Transfer a File from Your Computer to a Cluster

Using the example netid abc123, the following is run on your computer's local terminal:

scp myfile.txt abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/test

In this example, myfile.txt is copied to the directory /home/username/abc123/test on the cluster. This example assumes that myfile.txt is in your current directory. You may also specify the full path of myfile.txt.

scp /home/xyz/myfile.txt abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/test

Transfer a Directory to a Cluster

scp -r mydirectory abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/test

In this example, mydirectory and all of its contents are transferred. The -r indicates that the copy is recursive.

Transfer Files from the Cluster to Your Computer

Assuming you would like the files copied to your current directory:

scp abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/myfile.txt .

Note that . represents your current working directory. To specify the destination, simply replace the . with the full path:

scp abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/myfile.txt /path/myfolder
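The examples above all use scp. For rsync, a comparable sketch is shown below. The wrapper name sync_dir is hypothetical, and the option set (-a, -v, -P) is one common choice rather than a site requirement. Unlike scp, rerunning the same rsync command only sends files that are new or have changed.

```shell
# Sketch: copy a directory with rsync.
# -a keeps permissions and timestamps, -v lists files,
# -P shows progress and keeps partially transferred files.
sync_dir() {
    # $1 = source directory, $2 = destination (a local path or user@host:/path)
    # A trailing slash on the source is stripped, so the directory itself
    # (not just its contents) is created under the destination.
    rsync -avP "${1%/}" "$2"
}

# Example, run from your local machine (same placeholder netid and paths as above):
#   sync_dir mydirectory abc123@wi-hpc.wistar.upenn.edu:/home/username/abc123/test
```

With plain rsync, a source written as mydirectory/ (with the trailing slash) copies only the contents of the directory, while mydirectory (without it) copies the directory itself. The wrapper removes that difference by always using the second form.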