Basic Job Submission
The HPC cluster systems use the Sun Grid Engine (SGE) queue scheduler system. The feature of a queue scheduler system that users interact with most is job submission. The manual pages for SGE are very good and should be referred to for details. For this particular topic the qsub manual page is the authoritative source.
No Format |
---|
man qsub |
This document provides a brief introduction to the most common options that might be used to submit jobs to the SGE system. It focuses on single processor jobs, as that is the most basic case, though not necessarily the most common. Details on submission of parallel jobs are covered in Advanced Job Submission.
Job Script
While it is possible to submit programs directly to the SGE system (using the -b option flag) it is generally preferable to create a job script. The job script is just like any other script that you write in terms of specifying commands that you would like to run. The difference is that instead of a user running the script, it is submitted to SGE and SGE then runs the script. It does so after determining what nodes are available for it to run on. In its simplest form, the job script is simply a call to the program that performs the calculation.
...
That will submit the job with all of the default options. For a parallel job, a parallel environment would need to be specified. This will be covered in more detail in Advanced Job Submission, but an example would look like
...
The default queue is the UI queue, which has a limit of 25 running jobs per user on Helium and 10 running jobs per user on Neon. In addition, Neon has a high memory queue (UI-HM) with a limit of 4 running jobs per user. If you have many single processor jobs to run, it would be better to submit them to the all.q queue, which has no limit but is subordinate to the other queues.
...
An alternative to specifying a qsub option on the command line is to put it into the job script. This is accomplished with the special prefix string "#$" in the job script, which signifies that a qsub directive follows.
...
The "#$ -q all.q" line will look like a comment to the shell interpreter but will be passed as a qsub directive when the job script is interpreted by SGE. Any of the qsub options can be specified this way.
...
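The "#$" directive mechanism described above can be sketched as a small job script. This is a minimal illustration, not a site-provided template: the job name and the echo command standing in for the real calculation are hypothetical. Because the directives are ordinary comments to the shell, the script also runs unchanged outside of SGE.

```shell
#!/bin/sh
# Lines beginning with "#$" are comments to the shell but are read
# as qsub directives when the script is submitted to SGE.
#$ -q all.q
#$ -N example_job
#$ -cwd

# Hypothetical stand-in for the program that performs the calculation.
message="running on $(uname -n)"
echo "$message"
```

Saved under a name such as myjob.sh (a hypothetical file name), the script would be submitted with "qsub myjob.sh", with no queue option needed on the command line.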
The following are some common options for the qsub command. For information on other options for more complex submissions, check the man pages with the command "man qsub".
qsub option | Description | ||
---|---|---|---|
-V | This imports your current environment to the job. This is set by default. As such, it does not have to be specified but is good to know about. | ||
-N [name] | The name of the job. Make sure this makes sense to you. | ||
-l h_rt=hr:min:sec | Maximum walltime for this job. You may want to specify this if you think your job may run out of control. | ||
-l h_vmem=bytes | Maximum virtual memory for this job. You can specify a unit as well, for example -l h_vmem=2G. An appropriate value will be set for your job if an entire node is not requested. | ||
-r [y,n] | Whether this job should be re-runnable (default n). | ||
-cwd | Determines whether the job will be executed from the current working directory. If not specified, the job will be run from the user's home directory. | ||
-S | Specify the shell to use when running the job script. | ||
-e [path] | Name of a file or directory for standard error. | ||
-o [path] | Name of a file or directory for standard output. | ||
-j [y,n] | Merge the standard error stream into the standard output stream. | ||
-pe [name] [n] | Parallel environment name and number of slots (cores). | ||
-M email address | Set the email address to receive email about jobs. This will most likely be your University of Iowa email address. | ||
-m b|e|a|s|n,... | Specify when to send an email message: 'b' at the beginning of the job, 'e' at the end of the job, 'a' when the job is aborted, 's' when the job is suspended, 'n' no mail is sent. | ||
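As an illustration of how several of the options in the table combine on one command line, the sketch below assembles a qsub command and prints it for review rather than submitting it. The job name, walltime, memory limit, and script name are all hypothetical values chosen for the example.

```shell
#!/bin/sh
# Assemble a qsub command line from options listed in the table.
# All values below are hypothetical and should be adjusted per job.
job_name="my_analysis"   # -N [name]
walltime="02:00:00"      # -l h_rt=hr:min:sec
memory="2G"              # -l h_vmem with a unit
script="myjob.sh"        # the job script to submit

# -cwd runs the job from the current directory; -j y merges stderr into stdout.
qsub_cmd="qsub -N $job_name -l h_rt=$walltime -l h_vmem=$memory -cwd -j y $script"

# Print the command instead of running it, so it can be checked first.
echo "$qsub_cmd"
```

Printing the command first is only a convenience for checking the options; on the cluster the same line can be typed directly at the prompt.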
...
Panel | ||||
---|---|---|---|---|
| ||||
qsub -l mf=nG or qsub -l mf=nM
|
Output Redirection
It is often necessary or desirable to redirect the standard output and standard error streams to files. This can be accomplished with the usual shell redirection, but SGE also provides a mechanism for capturing the stdout and stderr streams. By default, the standard error is written to a file named $JOB_NAME.e$JOB_ID and the standard output is written to a file named $JOB_NAME.o$JOB_ID. These can be set via the -e and -o flags to qsub, respectively.
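The naming scheme above can be sketched in a job script that sets -o and -e as directives and also shows what the default file names would have been. The job name and output file names are hypothetical, and the fallback values for JOB_NAME and JOB_ID are an assumption added only so the script can run outside the scheduler, where SGE has not set those variables.

```shell
#!/bin/sh
#$ -N redirect_demo
#$ -o redirect_demo.out
#$ -e redirect_demo.err

# Under SGE, JOB_NAME and JOB_ID are set in the job's environment.
# The fallbacks here only apply when the script is run by hand.
JOB_NAME="${JOB_NAME:-redirect_demo}"
JOB_ID="${JOB_ID:-0}"

# File names SGE would use by default if -o and -e were not given.
default_out="$JOB_NAME.o$JOB_ID"
default_err="$JOB_NAME.e$JOB_ID"

echo "stdout would default to $default_out"
echo "stderr would default to $default_err" >&2
```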
...
Note |
---|
Shell redirection, i.e.
will override the ' |
Array Jobs
...
Each job in the array will inherit the same resource requests and attribute allocations as if they were entered independently as Batch Jobs. These jobs will be run concurrently, provided enough resources are available. Array jobs are created by adding the following option to the qsub command (or the #$ directive in the job script):
...
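A minimal sketch of an array job script follows, assuming the standard SGE -t option for requesting a range of tasks and the SGE_TASK_ID variable that SGE sets for each task. The job name, the task range, and the input file naming scheme are hypothetical, and the fallback task number is only there so the script can be run outside the scheduler.

```shell
#!/bin/sh
#$ -N array_demo
#$ -t 1-10

# SGE sets SGE_TASK_ID to the index of the current task in the array.
# Fall back to 1 when the script is run by hand outside of SGE.
task_id="${SGE_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per task.
input_file="input_${task_id}.dat"

echo "task $task_id would process $input_file"
```

Each task runs the same script with the same resource requests, so the task index is typically the only thing that distinguishes one task's work from another's.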