Note - Scratch cleaning policy will be changing on July 1st, 2016. The new policy will be to delete files 60 days after they are created.
User scratch space
All users have /nfsscratch/Users/HawkID available to them via our ZFS/NFS file systems. It is readable, writable, and executable by the user only, and it is available at the same path on all nodes. There is also a local scratch space (/localscratch/$USER) on all nodes if local disk scratch is needed.

Note that /localscratch is not usually the best choice. While there is no potential for network contention, it is only a single disk on a node. If you are running a single job on a node and have exclusive access to the node, then /localscratch should perform well. However, if multiple jobs on a node access /localscratch, disk I/O contention will be high and job performance will suffer. If the I/O load is light, /localscratch may still be a viable option, but a heavy I/O load could create a disk bottleneck. If you are not sure, test a job or a small set of jobs and compare the performance of /localscratch vs. /nfsscratch. If the job is a parallel job and all nodes need access to the files, /localscratch is not an option because the files are only available to a single node.
...
Info |
---|
A scratch filesystem is a place to store intermediate job data that can be destroyed when a job is finished. Performance is better than using your home directory or an LSS share, which are meant for long-term data storage. |
Note |
---|
As of February 3, 2023, files on the cluster-wide /nfsscratch filesystem are subject to deletion 40 days after they were created. Policy for node-specific /localscratch filesystems is independent of this. |
User Scratch Space
Each compute node has its own local scratch filesystem. Users may read from and write to this using their own exclusive directory at /localscratch/Users/<HawkID>.
In addition to local storage, the HPC cluster system has two large, shared filesystems mounted across all its nodes via NFS and BeeGFS. Analogously, users can read from and write to their own exclusive directory at /nfsscratch/Users/<HawkID> for the NFS system and /scratch/Users/<HawkID> for the BeeGFS filesystem.
Scratch filesystems are a shared resource available for the convenience of all users. Therefore, files on these filesystems are subject to deletion after a certain lifespan as determined by the HPC policy committee. Home account storage and purchased storage are not subject to this policy.
/localscratch
login nodes
On /localscratch, the allowed file lifespan is 30 days after the file was last accessed, where each file's age is the time elapsed since its access timestamp ("atime"). An automated cleanup process runs periodically on each node to delete files whose atime has reached the maximum lifespan.
compute nodes
Cleaning of /localscratch on compute nodes is opportunistic: it happens when no jobs are running on the node. However, if the space becomes limited, the node will go into an alarm state and will then be cleaned.
If your job writes data to /localscratch, please retrieve everything you need and remove unneeded files as the last part of the job, because it's difficult to access that same compute node after a job exits! A compute node can become unavailable if its /localscratch filesystem becomes too full. If that happens, all files will be removed from /localscratch without considering lifespan in order to restore the compute node to service. For more information on using localscratch see Advanced Job Submission#localscratch.
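The advice above, to retrieve what you need and clean up as the last part of the job, can be sketched as a small shell function. This is a minimal illustration, not a site-provided tool: the function name `save_and_clean`, the working-directory layout, and the destination path in the usage comment are all hypothetical, and it assumes GNU `cp`.

```shell
#!/bin/bash
# Sketch only: copy results out of node-local scratch, then clean up.
# The function name and the directory names passed in are hypothetical.

# save_and_clean WORKDIR DESTDIR
# Copies everything in WORKDIR (including hidden files) to DESTDIR, then
# removes WORKDIR only if the copy succeeded.
save_and_clean() {
    local workdir="$1" destdir="$2"
    mkdir -p "$destdir" || return 1
    # cp -a preserves permissions and modification times where possible.
    if cp -a "$workdir"/. "$destdir"/; then
        rm -rf "$workdir"
    else
        echo "copy failed; leaving $workdir in place" >&2
        return 1
    fi
}

# Example use inside a job script (paths are assumptions, not site policy):
#   work="/localscratch/Users/$USER/myjob.$$"
#   mkdir -p "$work"
#   trap 'save_and_clean "$work" "/nfsscratch/Users/$USER/myjob-results"' EXIT
#   ... run the computation, writing into "$work" ...
```

Registering the function with `trap ... EXIT` means it also runs when the script exits on an error, but it will not run if the job is killed outright, so it reduces rather than removes the risk of leaving data behind on the node.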
/nfsscratch
On /nfsscratch, the allowed file lifespan is 40 days after first being written, where each file's age is the time elapsed since its creation timestamp ("crtime"), which is tracked on the fileserver. Note that this is distinct from the other timestamps on a file. An automated cleanup process runs periodically on the server to delete files whose age has reached the maximum lifespan. This space is provided by a single ZFS storage server connected via NFS. It is best suited to large streaming I/O, reading or writing large data sequentially to or from a small number of files.
Note |
---|
Altering or duplicating files solely to circumvent the scratch cleanup process is against policy. Please make legitimate use of scratch filesystems, then move your intermediate and final results to stable storage in accordance with policy. |
Please contact research-computing@uiowa.edu with any questions or for assistance with this.
/scratch
On /scratch, the allowed file lifespan is 60 days after first being written. This space is provided by a BeeGFS parallel filesystem. It is best suited for small random I/O, reading or writing to many small files or specific sections of files.
Local or Shared Scratch?
- The compute node running your job might be running other jobs (belonging to you or other users), so multiple jobs can compete for local storage I/O, causing contention independent of /nfsscratch. Only a job with exclusive access to a node can expect the full performance potential of the node's local storage.
- A parallel job running on multiple nodes typically shouldn't use filesystems local to any of its nodes. Even if you're writing your own MPI code instead of using an off-the-shelf application, you can expect better performance if you collate results in memory via message passing and write your result to the shared filesystem. Consider local disk primarily as a structured alternative to swap.
- If your job places partial results on /localscratch but fails to handle them for any reason (logic error, eviction, crash, etc.), you won't have access to these anywhere else and they will be difficult to recover.
- As always, if you are unsure, please test a few jobs before submitting a large batch.
- If you are unsure whether /nfsscratch or /scratch would be better, test a smaller job using each system and compare the result.
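As a rough first comparison before running a real test job, a sketch like the following times one sequential write in a given directory. The function name `time_write` is hypothetical, and it assumes GNU coreutils (`dd` with `conv=fsync`, `date +%s.%N`). It measures only a single streaming write of zeros, which some filesystems may compress or cache, so a small representative job remains the better test, especially for workloads dominated by many small files.

```shell
#!/bin/bash
# Sketch only: time one sequential write in a directory.

# time_write DIR [SIZE_MB]
# Writes SIZE_MB mebibytes of zeros to a temporary file in DIR, flushes it
# to storage, removes it, and prints the elapsed wall-clock seconds.
time_write() {
    local dir="$1" size_mb="${2:-64}"
    local testfile start end
    testfile=$(mktemp "$dir/iotest.XXXXXX") || return 1
    start=$(date +%s.%N)
    # conv=fsync makes dd flush the data before exiting, so the timing
    # covers the actual write and not just the page cache.
    if ! dd if=/dev/zero of="$testfile" bs=1M count="$size_mb" conv=fsync 2>/dev/null; then
        rm -f "$testfile"
        return 1
    fi
    end=$(date +%s.%N)
    rm -f "$testfile"
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example (directories follow the layout described above; $USER standing in
# for your HawkID is an assumption):
#   for d in "/nfsscratch/Users/$USER" "/scratch/Users/$USER"; do
#       echo "$d: $(time_write "$d" 256) s"
#   done
```

Running it several times per filesystem, and at different times of day, gives a better picture than a single measurement because both shared filesystems are used by other jobs.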
Determining the age of files in scratch
File Timestamps
- modification time (mtime): This is the time the contents of the file were last modified, for example, by editing it. The modification time can be seen with
ls -l file
- change time (ctime): This is the time the metadata of the file was last changed. An example of this would be moving a file to a different directory. The change time can be seen with
ls -lc file
- access time (atime): This is the time the contents of the file were last accessed; for example, by viewing with 'less'. The access time can be seen with
ls -lu file
- creation time (crtime): This is the time the contents of the file were first written to the filesystem. This attribute is part of the underlying ZFS filesystem and is not accessible via NFS or standard Linux utilities.
All of these timestamps can be different for a single file. Most file and archive utilities will maintain the modification time, and in some cases the access time, either by default or optionally; the change time is always set by the system. This includes using archive mode ('-a') with either 'cp' or 'rsync'. However, note that no utility can affect a file's crtime over NFS.
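The three timestamps that are visible to standard tools can also be read in one step with `stat`. The sketch below assumes GNU `stat`; the function names `show_times` and `days_since_access` are hypothetical. It cannot show crtime, which is not visible over NFS, so it is mainly useful for checking a file's atime-based age on /localscratch.

```shell
#!/bin/bash
# Sketch only: inspect the timestamps that standard Linux tools can see.

# show_times FILE
# Prints mtime, ctime, and atime, one per line.
show_times() {
    # GNU stat format codes: %y = mtime, %z = ctime, %x = atime
    stat --printf='mtime: %y\nctime: %z\natime: %x\n' -- "$1"
}

# days_since_access FILE
# Prints the whole number of days since FILE was last accessed.
days_since_access() {
    local now atime
    now=$(date +%s)
    atime=$(stat -c %X -- "$1") || return 1   # %X = atime, seconds since epoch
    echo $(( (now - atime) / 86400 ))
}

# Example:
#   show_times results.dat
#   days_since_access results.dat
```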
Note |
---|
Duplicating files to update crtimes solely to circumvent the scratch cleanup process is against policy. |
...
Moving data to or from a scratch filesystem
If you need to stage data on a scratch filesystem for a job, or retrieve data from it after a job has finished, there are a few options.
- Incorporate the transfer into your job script before or after computation.
- Move data using our data transfer server, data.hpc.uiowa.edu. You can log in with your Argon HawkID credentials and connect to LSS and Argon infrastructure. Only /nfsscratch and /scratch are accessible on the data transfer server; the /localscratch filesystems of compute nodes are not.
- Use the Research Data Collaboration Service which provides a web-based GUI to manage data transfers.
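For the data transfer server option, one possible approach is `rsync` over SSH, run from your own workstation. This is a sketch under assumptions: the function name `pull_from_scratch`, the `RSYNC` and `TRANSFER_HOST` variables, and the example paths are hypothetical, and it assumes the server accepts rsync over SSH with your HawkID credentials, which the text above does not state explicitly.

```shell
#!/bin/bash
# Sketch only: pull a directory from shared scratch to the local machine
# through the data transfer server.
TRANSFER_HOST="${TRANSFER_HOST:-data.hpc.uiowa.edu}"   # host named in the text above
RSYNC="${RSYNC:-rsync}"                                # overridable, e.g. RSYNC=echo to preview

# pull_from_scratch HAWKID REMOTE_DIR LOCAL_DIR
pull_from_scratch() {
    local hawkid="$1" remote_dir="$2" local_dir="$3"
    # -a: archive mode (recursive, keeps permissions and modification times)
    # -v: list files as they transfer
    # --partial: keep partially transferred files so a rerun can resume
    "$RSYNC" -av --partial "${hawkid}@${TRANSFER_HOST}:${remote_dir}/" "${local_dir}/"
}

# Example (HawkID and paths are placeholders):
#   pull_from_scratch myhawkid /nfsscratch/Users/myhawkid/run1 ./run1
```

Setting `RSYNC=echo` for a single call prints the command that would be run instead of executing it, which is a convenient way to check the source and destination before moving a large amount of data.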