
High-throughput jobs, that is, computing work that consists of many small jobs, can put a strain on the home directory (NFS) server. In some cases this results in failed jobs for the high-throughput user and slow performance for other users.

...

The problem can be mitigated by reducing the number of requests that compute nodes send to the home directory server. We have worked with several users to implement the following mitigation steps, with good results.

  1. Redirect standard output and standard error to either /dev/null or /localscratch. If you do not care about the standard output and standard error streams, use /dev/null. If you do care about them, use /localscratch.

    1. To redirect to /dev/null, use the option -j y -o /dev/null in your job script(s). (If you already have the -o option defined, you will need to replace it.)
    2. Send your .o and .e files to /localscratch while the job is running, then copy them back to your home directory once the job has finished.  

      To redirect to /localscratch, use the option -o /localscratch.

      If you are merging the output (with the -j y option), you only need to move one file back to your home directory at the end of the job. Add the line below to the end of your script. In the example below, I am collecting all my output into a subdirectory under my home directory called "output", but you can move it to whatever folder you like.

      Code Block: with merged output (-j y)
      mv $SGE_STDOUT_PATH ~/output

      If you are not merging output (you are not specifying the -j y option in your job script), then you will need to move both the .o and .e files back:

      Code Block: for non-merged output
      mv $SGE_STDOUT_PATH ~/output
      mv $SGE_STDERR_PATH ~/output
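
      Putting these pieces together, a minimal job script for the non-merged case might look like the sketch below. This is only an illustration: the "output" directory name is an example from above, and the computation itself is omitted.

      ```shell
      #!/bin/bash
      # Sketch of an SGE job script that spools the .o and .e files to
      # node-local /localscratch and moves them home when the job ends.
      #$ -o /localscratch
      #$ -e /localscratch

      mkdir -p ~/output            # destination for the spooled files

      # ... the actual work of the job goes here ...

      # SGE sets these variables to the paths of the spooled files.
      mv "$SGE_STDOUT_PATH" ~/output
      mv "$SGE_STDERR_PATH" ~/output
      ```

      If you merge the streams with -j y, only the first mv line is needed; if you do not need the output at all, -j y -o /dev/null avoids the mv entirely.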


  2. If you are reading or writing a large number of files during your high-throughput jobs, add code to your script to first copy them to /localscratch, work with them there, and then copy any results back to your home directory when the job completes. The details will be job-dependent. If you need help, let us know at research-computing@uiowa.edu.
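
     The staging pattern in step 2 can be sketched as below. The directory names (~/input, ~/results) and the *.out result pattern are hypothetical placeholders; adapt them to your own job.

     ```shell
     #!/bin/bash
     # Sketch: stage files through node-local scratch to reduce NFS traffic.
     # ~/input, ~/results, and the *.out pattern are example names only.

     WORKDIR=/localscratch/$USER/$JOB_ID   # $JOB_ID is set by SGE at run time
     mkdir -p "$WORKDIR"

     cp -r ~/input/. "$WORKDIR"/           # one bulk copy from home to local disk
     cd "$WORKDIR"

     # ... run the computation here, reading and writing only local files ...

     mkdir -p ~/results
     cp "$WORKDIR"/*.out ~/results/        # one bulk copy of results back home
     rm -rf "$WORKDIR"                     # tidy up the node's local scratch
     ```

     The point of this layout is that the home directory server sees two bulk copies per job instead of many small reads and writes during the computation.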

...