...
What are array jobs?
Array jobs are typically thought of as a series of identical, or nearly identical, tasks that are run multiple times as part of a job. Each array task is assigned an index number, SGE_TASK_ID. This ID is used to differentiate the tasks, and the value of the variable can be used for various purposes as part of the job. In the simplest case, the ID is just appended to the JOB_ID in the names of the stdout/stderr files.
...
The first type is what will be referred to as a natural array job, or one that does not require any special handling to submit. For example, say that you want to run 100 simulations with your program using the same input file. Assume that the program generates a random seed and you do not care what the seed is, because you only care about the distribution metrics from the population of simulation results. You could have a job script that looks like:
```
#$ -q all.q
#$ -pe smp 204
#$ -cwd

# run simulation
my_prog -i my_input
```
...
That would launch a single job with 100 array tasks, with the output for each going to $JOB_ID.$SGE_TASK_ID.
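For reference, a natural array job like this is typically submitted with qsub's -t flag, which sets the range of task IDs. A minimal sketch, assuming the script above was saved as sim_job.sh (a hypothetical name):

```
# submit the script as an array job with task IDs 1 through 100
qsub -t 1-100 sim_job.sh
```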
...
```
#$ -q all.q
#$ -pe smp 204
#$ -cwd

# run simulation
my_prog -i my_input_$SGE_TASK_ID
```
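The numbered per-task input files would need to exist before submission. A minimal sketch of setting those up, assuming a template file named my_input_template (a hypothetical name):

```
# hypothetical setup: create one numbered input file per array task index
for i in $(seq 1 100); do
    cp my_input_template "my_input_$i"
done
```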
...
That will create a job script file in the .qbatch folder of the working directory and submit it as an array job. The job will request the all.q queue and use the name of the task file as the job name. It will set the current working directory and send stdout/stderr to the logs directory. Options for the queue, the parallel environment, the number of slots, the job name, and some other settings can be specified with arguments. It is also possible to specify any other qsub options and pass those on to the eventual call to qsub.
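A minimal sketch of such an invocation, assuming the commands live in a file named my_taskfile (a placeholder name) and the defaults described below supply everything else:

```
# submit the commands in my_taskfile as an array job using the defaults
qbatch-argon my_taskfile
```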
Default settings for qbatch-argon

The relevant default settings for qbatch-argon on Argon are:
...
The first option to know about is the help option, either -h or --help. That is a good place to start, but note that some options and some text are not relevant to the Argon HPC cluster. One of the most important arguments for qbatch-argon is the 'dryrun' option, requested with either -n or --dryrun. This will generate the array job script but not submit it, giving you an opportunity to examine the contents. You could either submit the resultant script manually, or rerun qbatch-argon with the same parameters but without the --dryrun flag to submit it.
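A minimal sketch of a dry run, again with my_taskfile as a placeholder task file name:

```
# generate the array job script under .qbatch/ without submitting it
qbatch-argon --dryrun my_taskfile

# after inspecting the script, either qsub it manually or rerun
# the same command without --dryrun to submit
```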
...
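For concreteness, the task file used in this example (a 10x10 sweep over the -p1 and -p2 parameters, as seen in the generated script further down) could be built with a loop along these lines; the file name my_taskfile is a placeholder:

```
# sketch: write one command per line; by default each line
# becomes its own array task
for p1 in $(seq 1 10); do
    for p2 in $(seq 1 10); do
        echo "my_prog -i my_input -p1 $p1 -p2 $p2"
    done
done > my_taskfile
```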
Every line of the task file expresses a command to run. By default, each line will be incorporated into a separate array task. If we run qbatch-argon on the previous example,
...
The contents of the task file are copied into the script. Notice the variables CHUNK_SIZE and CORES. These correspond to arguments that can be passed to qbatch-argon. CHUNK_SIZE (-c, --chunksize) controls how many lines of the task file go into each array task. CORES (-j, --cores) determines how many of those can be run in parallel. In the current example, each line represents an independent computation, so some could be run in parallel. Say that you want to maximize use of a compute node by running many jobs on it. You could alter the command to
...
The above assumes each process is single threaded and only needs a single slot. The --ppj (processors per job) flag sets the number of slots to request. This must be equal to or greater than the number of processes to run in parallel (CORES).
...
If that is not the case, the script will exit with a message. This is slightly at odds with the help text but is what makes sense in our environment. Running the above would produce:
```
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 50
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-2
#$
#$
#$
#$
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=50
CORES=50
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10
EOF
```
Instead of running 100 array tasks, the array job will now consist of 2 array tasks, with each array task consisting of 50 sub-tasks. The advantage of doing this is that the number of array tasks to be scheduled has been substantially reduced, along with the respective setup and teardown of each scheduled task. The downsides are that it may be harder to get machines with the larger allocation, and you will have to make sure the output can be processed, as it will be multiplexed with identifier tags.
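For reference, a command along these lines would produce the two-task chunked script above; the option values are inferred from the generated script (CHUNK_SIZE=50, CORES=50, -pe smp 50), so treat this as a sketch rather than the exact command:

```
# 50 lines per array task, 50 run in parallel, 50 slots requested per task
qbatch-argon --chunksize=50 --cores=50 --ppj=50 my_taskfile
```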
What if the lines in the task file are not independent, perhaps not even the same program on each line? Using a chunksize for the task file is useful for that as well. Assume that the example program has a pre-simulation program and a post-simulation program that you wish to run. Generating the task file:
...
The computations are no longer independent, and every three lines should be incorporated into each array task. Specifying --chunksize=3 handles that, but since those commands are not independent, and should be run serially, --cores=1 will be set as well. Running the following:
...
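To make the pattern concrete, a task file of dependent triplets might look like the following; the pre- and post-processing program names are hypothetical stand-ins:

```
# every three lines form one unit that must run serially
my_pre_prog -i my_input_1
my_prog -i my_input_1
my_post_prog -i my_input_1
my_pre_prog -i my_input_2
my_prog -i my_input_2
my_post_prog -i my_input_2
```

which would then be submitted with something like:

```
# 3 lines per array task, run serially within each task
qbatch-argon --chunksize=3 --cores=1 my_taskfile
```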
Several of the important options for qsub are set with corresponding flags to qbatch-argon.
qsub parameter | qbatch-argon short option | qbatch-argon long option
---|---|---
walltime | -w | --walltime
job name | -N | --jobname
queue | -q | --queue
working directory | | --workdir
log directory | | --logdir
parallel environment (PE) | | --sge-pe
shell | | --shell
All other options can be passed with the --options flag. For example
...
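In the same spirit, a hedged illustration of passing an arbitrary qsub option through (the memory resource request shown is just an example value):

```
# pass a per-slot memory request straight through to qsub
qbatch-argon --options "-l h_vmem=2G" my_taskfile
```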
Besides the main computational commands that are listed in the task file, there are likely other commands that should be in the job script. These would be things that are common across all of the array tasks, such as loading environment modules. There may also be common commands that need to run after the main computational tasks. These are handled with the --header and --footer flags, respectively. For example, to set up modules
...
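A sketch of that, with a placeholder module name:

```
# insert a line at the top of the generated script, before the tasks run
qbatch-argon --header "module load my_module" my_taskfile
```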
It is possible to generate the entire submission script with qbatch-argon, but if the command line seems too long, you can generate just the important features, use the --dryrun flag, and then copy the resultant script to edit and submit manually.
Summary
Using natural array jobs is fairly straightforward. Using a task file is a bit more complicated, but qbatch-argon makes much of the difficult work automatic. You just have to keep in mind the number of lines of commands per array task, and make sure that each task has the same resource requirements. As long as the necessary files for a job are in the same directory, it is possible to combine many jobs into a single array job submission. However, not all jobs that might seem like a good fit for an array job can be converted with a task file. For instance, job dependencies are very coarse with array jobs, so if you have job dependencies, using a task file may not be possible unless the dependencies can be managed within each array task, in sequential order. That said, SGE does have some array task dependency capability, and it may be possible to craft a set of multiple array tasks that can make use of the --depend flag of qbatch-argon.