...

In addition to local storage, each HPC cluster has its own large, shared filesystem mounted across all of its nodes via NFS. As with local scratch, each user can read from and write to an exclusive directory at /nfsscratch/Users/<HawkID>.

Cleanup

...

Policy

Scratch filesystems are a shared resource available for the convenience of all users. Therefore, files on these filesystems are subject to deletion after a fixed lifespan set by the HPC policy committee. As of July 1, 2016, the allowed file lifespan is 60 days after a file is first written. On /nfsscratch, a file's age is the time elapsed since its creation timestamp ("crtime"), which is tracked on the fileserver. An automated cleanup process runs periodically on the server and deletes files whose age has reached the maximum lifespan.
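The lifespan rule above is simple date arithmetic: a file becomes eligible for cleanup 60 days after its crtime. The minimal sketch below shows that calculation. The helper names `expiry_time` and `is_eligible` are hypothetical, and the sketch assumes a file counts as expired once its age is exactly 60 days (the policy text does not pin down the boundary). It takes the crtime as an input because that timestamp is tracked on the fileserver; the sketch does not show how to obtain it.

```python
from datetime import datetime, timedelta

# Assumption: 60-day lifespan (policy as of July 1, 2016), counted from crtime.
LIFESPAN = timedelta(days=60)

def expiry_time(crtime: datetime) -> datetime:
    """Earliest time a file created at `crtime` becomes eligible for cleanup."""
    return crtime + LIFESPAN

def is_eligible(crtime: datetime, now: datetime) -> bool:
    """True once the file's age (now - crtime) has reached the lifespan (assumed >=)."""
    return now - crtime >= LIFESPAN

print(expiry_time(datetime(2016, 7, 1)))  # 2016-08-30 00:00:00
```

For example, a file first written on July 1, 2016 would be eligible for deletion from August 30, 2016 onward, whenever the next cleanup pass runs.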

Home account storage and purchased storage are *not* subject to this policy.

...

It is possible for all of these timestamps to be different for a single file. Most archive utilities preserve the first three timestamps, either by default or optionally; this includes archive mode ('-a') with either 'cp' or 'rsync'. However, archive utilities cannot affect a file's crtime over NFS.
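As a rough illustration of what archive-style copying does and does not carry over, the sketch below uses Python's `shutil.copy2` as a stand-in for `cp -a` (it is not identical: `copy2` copies file data plus metadata such as permission bits, access time and modification time, but not ownership). The file names are hypothetical. The source's modification time is backdated, then the copy is checked: its modification time matches the source, while its own change time reflects when the copy was made. The sketch does not touch crtime, which lives on the fileserver.

```python
import os
import shutil
import tempfile

def archive_copy(src: str, dst: str) -> os.stat_result:
    """Copy src to dst along with its metadata, then return dst's stat result."""
    shutil.copy2(src, dst)  # copies data, then metadata incl. permission bits, atime, mtime
    return os.stat(dst)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "results.dat")        # hypothetical file names
    dst = os.path.join(d, "results_copy.dat")
    with open(src, "w") as f:
        f.write("partial results\n")

    old = 1_000_000_000          # an arbitrary past timestamp (September 2001)
    os.utime(src, (old, old))    # backdate the source's atime and mtime

    st = archive_copy(src, dst)
    print(st.st_mtime == old)    # True: mtime carried over to the copy
    print(st.st_ctime > old)     # True: ctime reflects when the copy was made
```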

Local or Shared Scratch?

  • Multiple jobs might be running on your job's node. These jobs can compete for local storage I/O, causing contention independent of /nfsscratch. Only a job with exclusive access to a node can expect the full performance potential of the node's local storage.
  • A parallel job running on multiple nodes typically shouldn't use filesystems local to any of its nodes. Even if you're writing your own MPI code instead of using an off-the-shelf application, you can expect better performance if you collate results in memory via message passing and then write your result to the shared filesystem. Consider local disk primarily as a structured alternative to swap.
  • If your job places partial results on /localscratch but fails to handle them for any reason (logic error, eviction, crash, etc.), you won't have access to them anywhere else and they will be difficult to recover.
  • As always, please test a few jobs if you are unsure.