High throughput jobs, that is, computing work that consists of many small jobs, can put a strain on the cluster's home directory (NFS) server.  In some cases, this results in failed jobs for the high throughput user and slow performance for other Helium users.  To better inform high throughput users and ensure the best experience for all Helium users, the Helium systems administrators have created this best practices document.

...

By default, the job scheduler (Sun Grid Engine) places redirected output files in the user's home directory.  These are the files ending in .onnnn and .ennnn, where nnnn is the job number.  When a user starts many small jobs simultaneously on Helium, a large number of compute nodes attempt to create .o and .e files and log each job's output and error streams to them.  The many NFS operations this entails place a very high load on the NFS server.  This is the most common trigger of the problem, but not the only one.
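To make the scale of the default behavior concrete, the following is a minimal Python sketch rather than an official Helium tool. It lists the two files Sun Grid Engine creates per job by default and builds a qsub command line that merges the error stream into the output stream (-j y) and writes it somewhere other than the home directory (-o). The function names, the home path /home/username, and the log directory /tmp/job_logs are hypothetical and chosen only for illustration. Redirecting output in this way is an assumption about one possible mitigation, not a recommendation quoted from this document.

```python
def default_output_files(job_name, job_id, home="/home/username"):
    """Return the two files SGE creates by default for one job.

    The home path is a hypothetical placeholder, not a real Helium path.
    """
    return [
        f"{home}/{job_name}.o{job_id}",  # standard output stream
        f"{home}/{job_name}.e{job_id}",  # standard error stream
    ]


def qsub_command(script, out_dir):
    """Build a qsub command that keeps job output out of the home directory.

    -j y merges stderr into stdout, so each job writes one file instead of two.
    -o sets where that file is written; out_dir is an assumed location.
    """
    return ["qsub", "-j", "y", "-o", out_dir, script]


if __name__ == "__main__":
    # 1000 simultaneous jobs create 2 files each under the default behavior.
    n_jobs = 1000
    total = sum(len(default_output_files("myjob", i)) for i in range(n_jobs))
    print(total, "files created in the home directory by default")
    print(" ".join(qsub_command("myjob.sh", "/tmp/job_logs")))
```

The command is returned as a list so it can be passed directly to subprocess.run without shell quoting issues. Whether a given directory is an appropriate target on Helium depends on the cluster's storage layout, which this sketch does not know.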

...