Array Jobs

What are array jobs

Array jobs are a series of identical, or nearly identical, tasks that are run multiple times as part of a single job. Each array task is assigned an index number, available in the environment variable SGE_TASK_ID. This ID differentiates the tasks, and its value can be used for various purposes within the job. In the simplest case, the ID is simply appended to the stdout/stderr file names.

Each task in the array inherits the same resource requests and attribute allocations as if it were submitted independently as a batch job. The tasks can run concurrently, provided enough resources are available. Array jobs are submitted by adding the following option to the qsub command (or as a #$ directive in the job script):

-t n[-m[:s]]

where n is the lowest index number, m is the highest index number, and s is the step size. m and s are optional, so you can enter a single number (n), a simple range (n-m), or a range with a step size (n-m:s).
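To get a feel for what each form expands to, the index sequences can be enumerated with seq; this is only an illustration of the arithmetic, not something run on the cluster:

```shell
#!/bin/sh
# Sketch: the SGE_TASK_ID values each form of -t would produce.
# seq n s m enumerates the same arithmetic sequence the scheduler assigns.

echo "-t 5      -> 5"                            # single task
echo "-t 1-10   -> $(seq 1 10 | tr '\n' ' ')"    # simple range
echo "-t 2-10:2 -> $(seq 2 2 10 | tr '\n' ' ')"  # range with step size
```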

Generating Array Jobs

There are essentially two ways of generating array jobs, or more accurately two types of array jobs that dictate how they are generated.

Natural Array Jobs

The first type is what will be referred to as a natural array job, or one that does not require any special handling to submit. For example, say that you want to run 100 simulations with your program using the same input file. Assume that the program generates a random seed and you do not care what the seed is, because you only care about the distribution metrics from the population of simulation results. You could have a job script that looks like:

#$ -q all.q
#$ -pe smp 4
#$ -cwd

# run simulation
my_prog -i my_input

You would launch an array job with (assuming the script is saved as my_job.sh):

qsub -t 1-100:1 my_job.sh

That would launch a single job with 100 array tasks, with the stdout/stderr for each task going to files suffixed with the job ID and task ID ($JOB_ID.$SGE_TASK_ID).

What if you decided that you wanted to run 100 simulations with 100 slightly different input files? That requires a bit more work. Presumably you could generate the input files with a script; when doing so, remember that the $SGE_TASK_ID variable can be used. So, if the input files are named in a sequence of

my_input_1
my_input_2
...
my_input_100

your job file could look like

#$ -q all.q
#$ -pe smp 4
#$ -cwd

# run simulation
my_prog -i my_input_$SGE_TASK_ID

Again, the job submission (assuming the job file is saved as my_job.sh) would be:

qsub -t 1-100:1 my_job.sh

Each array task would then use the input file whose name includes its value of $SGE_TASK_ID.
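To see the substitution in isolation, you can emulate one array task outside the queue system by setting SGE_TASK_ID by hand (on the cluster the scheduler exports it for you); my_prog and the input file names are the placeholders from above:

```shell
#!/bin/sh
# Emulate one array task: in a real job the scheduler exports SGE_TASK_ID.
SGE_TASK_ID=7
input="my_input_${SGE_TASK_ID}"
echo "would run: my_prog -i $input"   # would run: my_prog -i my_input_7
```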

Things can get complicated pretty quickly, so natural array jobs are limited. You could always maintain an index file mapping index numbers to job parameters, but there is a better way.

Array jobs from a task file

Once the limits of easy indexing are reached, another technique is needed. The question is how to use an index even when the index number does not map neatly onto a file name. The answer is a two-step process:

  1. create a task file where each line represents a command to run
  2. create a job script around the task file to process it, line by line

Each line of the task file must be incorporated into an array task of the job. The processing of the task file looks very similar for all jobs, so having a tool to handle it is useful. We provide one, qbatch, deployed with settings specific to the Argon HPC cluster. Generating the task file, however, depends on the jobs and is up to each user, but it would generally be done with a script.
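As an illustration of step 2, the line selection can be hand-rolled with sed, which is essentially what the generated script does; the file name and commands here are placeholders, and SGE_TASK_ID is given a default only so the sketch runs outside the queue system:

```shell
#!/bin/sh
# Build a tiny three-line task file, then run the line selected by the index.
printf '%s\n' "echo alpha" "echo beta" "echo gamma" > my_taskfile

SGE_TASK_ID=${SGE_TASK_ID:-2}     # the scheduler sets this in a real job
cmd=$(sed -n "${SGE_TASK_ID}p" my_taskfile)
eval "$cmd"                       # prints: beta
rm -f my_taskfile
```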

Generating a task file

Expanding on the simulation case above, consider the case where two parameters need to be set to different values. For instance, say we want to vary p1 and p2 over 10 values each and want to submit all of the jobs as an array job. A loop such as the following could be used to generate a task file.

#!/bin/bash

# clear task list
cat /dev/null > my_taskfile

for p1 in {1..10}
do
    for p2 in {1..10}
    do
        echo "my_prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
    done
done

The above will produce a task file that looks like:

my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 2 -p2 1
my_prog -i my_input -p1 2 -p2 2
my_prog -i my_input -p1 2 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10

producing 100 lines, which run the program 100 times with 100 different parameter combinations.
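A quick sanity check confirms the expected line count; the sketch below regenerates the same list into a temporary file so it is self-contained (the real file name, my_taskfile, is not touched):

```shell
#!/bin/sh
# The 10x10 nested loop should yield exactly 100 command lines.
tmp=$(mktemp)
for p1 in $(seq 1 10); do
    for p2 in $(seq 1 10); do
        echo "my_prog -i my_input -p1 $p1 -p2 $p2"
    done
done > "$tmp"
wc -l < "$tmp"
rm -f "$tmp"
```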

Creating the job script

The job script in this case will not specify the command for the computation, as that is in the task file. Instead, the job script uses the value of $SGE_TASK_ID to select the corresponding line(s) of the task file. The command line captured for a particular array task is run when the queue system launches that task. The details of generating the job script are not covered here, as the qbatch tool handles them.

qbatch 

The qbatch program is a tool to submit a list of commands in a file (a task file) to a queue. The official documentation can be found in the README of the pipitone/qbatch repository on GitHub.

Some of the options of qbatch need a bit of explanation to make the best use of it on University of Iowa HPC clusters. There are defaults set for qbatch such that in many cases you could just execute the following:

qbatch my_taskfile

That will create a job script file in the .qbatch folder of the working directory and submit it as an array job. The job will request the all.q queue, use the name of the task file as the job name, set the current working directory, and send stdout/stderr to the logs directory. Options for the queue, the parallel environment, the number of slots, the job name, and others can be specified as arguments. It is also possible to pass arbitrary qsub options through to the eventual qsub call.

Default settings for qbatch 

The relevant default settings for qbatch on Argon are:

Processors per job: QBATCH_PPJ=1
Chunksize: QBATCH_CHUNKSIZE=1
Cores: QBATCH_CORES=1
System: QBATCH_SYSTEM=sge
SGE_PE: QBATCH_SGE_PE=smp
Queue: QBATCH_QUEUE=all.q

Those, and other settings, can be changed either via environment variables or the command line. You cannot change the setting for system.

Using qbatch 

The first option to know about is the help option, either -h or --help. That is a good place to start, but note that some options and some text are not relevant to the Argon HPC cluster. One of the most important arguments for qbatch is the dry-run option, requested with either -n or --dryrun. This generates the array job script but does not submit it, giving you an opportunity to examine its contents. You can either submit the resultant script manually, or rerun qbatch with the same parameters but without the --dryrun flag to submit it.

Parsing the task file

Every line of the task file expresses a command to run. By default, each line is incorporated into a separate array task. If we run the program on the previous example,

qbatch --dryrun my_taskfile

the following batch script is produced.

.qbatch/my_taskfile.array
#!/bin/sh
#$ -S /bin/sh
#$ 
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$ 
#$ -q all.q
#$ -t 1-100
#$ 
#$ 
#$ 
#$ 


ARRAY_IND=$SGE_TASK_ID

command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=1
CORES=1
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10

EOF

The contents of the task file are copied into the script. Notice the variables CHUNK_SIZE and CORES, which correspond to arguments that can be passed to qbatch. CHUNK_SIZE (-c, --chunksize) controls how many lines of the task file go into each array task. CORES (-j, --cores) determines how many of those lines can run in parallel. In the current example, each line represents an independent computation, so some could run in parallel. Say you want to maximize use of a compute node by running many processes on it. You could alter the command to

qbatch --dryrun --chunksize 50 --cores 50 --ppj 50 my_taskfile

The above assumes each process is single threaded and only needs a single slot. The --ppj (processors per job) option sets the number of slots to request. This must be greater than or equal to the number of processes to run in parallel (CORES); if it is not, the script will exit with a message. This is slightly at odds with the help text but is what makes sense in our environment. Running the above would produce:

.qbatch/my_taskfile.array
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 50
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$ 
#$ -q all.q
#$ -t 1-2
#$ 
#$ 
#$ 
#$ 


ARRAY_IND=$SGE_TASK_ID

command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=50
CORES=50
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10

EOF

Instead of 100 array tasks, the array job now consists of 2 array tasks, each running 50 sub-tasks. The advantage is that the number of array tasks to be scheduled, along with the setup and teardown of each, is substantially reduced. The downsides are that it may be harder to get machines with the larger allocation, and you will have to make sure the output can be processed, as it will be multiplexed with identifier tags.
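The line selection performed by the sed expression in the generated script can be sketched in isolation; with a chunk size of 50, array task 2 picks up lines 51 through 100 of the task file:

```shell
#!/bin/sh
# Reproduce the chunk-selection arithmetic from the generated script.
ARRAY_IND=2        # stands in for $SGE_TASK_ID
CHUNK_SIZE=50
start=$(( (ARRAY_IND - 1) * CHUNK_SIZE + 1 ))
end=$(( start + CHUNK_SIZE - 1 ))
echo "array task $ARRAY_IND processes lines $start through $end"
```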

What if the lines in the task file are not independent, perhaps not even the same program on each line? Using a chunksize for the task file is useful for that as well. Assume that the example program has a pre-simulation program and a post-simulation program that you wish to run. Generating the task file:

#!/bin/bash

# clear task list
cat /dev/null > my_taskfile

for p1 in {1..10}
do
    for p2 in {1..10}
    do
        echo "my_pre-prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
        echo "my_prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
        echo "my_post-prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
    done
done

The resulting file looks like:

my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
my_pre-prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 2
my_post-prog -i my_input -p1 1 -p2 2
...
my_pre-prog -i my_input -p1 10 -p2 10
my_prog -i my_input -p1 10 -p2 10
my_post-prog -i my_input -p1 10 -p2 10

The computations are no longer independent, and every three lines should be incorporated into a single array task. Specifying --chunksize=3 accomplishes that; since those commands are not independent and should run serially, --cores is left at its default of 1. Running the following:

qbatch --dryrun --chunksize=3 my_taskfile

will produce

#!/bin/sh
#$ -S /bin/sh
#$ 
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$ 
#$ -q all.q
#$ -t 1-100
#$ 
#$ 
#$ 
#$ 


ARRAY_IND=$SGE_TASK_ID

command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
...
my_pre-prog -i my_input -p1 10 -p2 10
my_prog -i my_input -p1 10 -p2 10
my_post-prog -i my_input -p1 10 -p2 10

EOF

That looks very similar to before but now every 3 lines will be put into an array task (CHUNK_SIZE=3) and those three lines will be run sequentially (CORES=1). A total of 100 array tasks will be created, as indicated by the line,

#$ -t 1-100
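That count follows from simple arithmetic: the task file now has 300 lines (100 parameter combinations times 3 commands), and the number of array tasks is the line count divided by the chunk size, rounded up:

```shell
#!/bin/sh
# Number of array tasks = ceiling(task-file lines / chunk size).
lines=300
chunk=3
tasks=$(( (lines + chunk - 1) / chunk ))
echo "$tasks"   # 100
```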

The value of --ppj should be set to the number of slots needed for the command that needs the most resources. So, if the main computation needs 4 slots, then

qbatch --dryrun --chunksize=3 --ppj=4 my_taskfile

Setting other qsub options

Several of the important options for qsub are set with corresponding flags to qbatch.

All other options can be passed with the --options flag. For example

qbatch --dryrun --chunksize=3 --ppj=4 --options='-l ngpus=2' my_taskfile

Setting up other commands to run in the job script

Besides the main computational commands listed in the task file, there are likely other commands that should be in the job script. These are things common across all of the array tasks, such as loading environment modules. There may also be common commands that need to run after the main computational tasks. These cases are handled with the --header and --footer flags, respectively. For example, to set up modules:

qbatch --dryrun --chunksize=3 --ppj=4 --header='module reset' --header='module load stack/2021.1' --header='module load r' my_taskfile

which will produce

#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 4
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$ 
#$ -q all.q
#$ -t 1-100
#$ 
#$ 
#$ 
#$ 

module reset
module load stack/2021.1
module load r
ARRAY_IND=$SGE_TASK_ID

command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=4
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
...

The order of multiple --header (or --footer) flags is preserved in the script, and any value containing spaces must be quoted. You can also use this to add comments:

qbatch --dryrun --chunksize=3 --ppj=4 --header='# load environment modules' --header='module reset' --header='module load stack/2021.1' --header='module load r' my_taskfile
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 4
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$ 
#$ -q all.q
#$ -t 1-100
#$ 
#$ 
#$ 
#$ 

# load environment modules
module reset
module load stack/2021.1
module load r
ARRAY_IND=$SGE_TASK_ID

command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=4
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
...

It is possible to generate the entire submission script with qbatch, but if the command line gets unwieldy you can generate just the important features with the --dryrun flag and then copy the resultant script to edit and submit manually.

Summary

Using natural array jobs is fairly straightforward. Using a task file is a bit more complicated, but qbatch automates much of the difficult work. You just have to keep in mind the number of command lines per array task, and make sure that each task has the same resource requirements. As long as the necessary files for a job are in the same directory, it is possible to combine many jobs into a single array job submission. Not all jobs that might seem like a good fit can be converted with a task file, however. For instance, job dependencies are very coarse with array jobs, so if you have job dependencies, using a task file may not be possible unless the dependencies can be managed within each array task, in sequential order. However, SGE does have some array task dependency capability, and it may be possible to craft a set of multiple array jobs that make use of the --depend flag of qbatch.