...
Each line of the task file must be incorporated into an array task of the job. The processing of the task file will look very similar for all jobs, so having a tool to handle it is useful; we provide one, called qbatch-argon, which is a wrapper around the qbatch program with settings specific to the Argon HPC cluster. However, generating the task file depends on the job and is up to each user, but would generally be done with a script.
...
Code Block | ||
---|---|---|
| ||
#!/bin/sh
# clear task list
cat /dev/null > my_taskfile
for p1 in {1..10}
do
for p2 in {1..10}
do
echo "my_prog -i my_input -p1 $p1 -p2 $p2 >my_output_p1-${p1}_p2-${p2}" >> my_taskfile
done
done |
...
" >> my_taskfile
done
done |
The above will produce a task that looks like:
No Format |
---|
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 2 -p2 1
my_prog -i my_input -p1 2 -p2 2
my_prog -i my_input -p1 2 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10 |
producing 100 lines that run the program 100 times, once for each combination of the two parameters.
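As a quick sanity check before submitting, you can count the lines of the generated task file. A smaller sketch of the same idea (3 values of p1, 2 values of p2, with the same hypothetical program and file names as above):

```shell
#!/bin/sh
# Regenerate a smaller task file and confirm it has one line per
# parameter combination (names are illustrative, as in the example above).
cat /dev/null > my_taskfile
for p1 in 1 2 3
do
  for p2 in 1 2
  do
    echo "my_prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
  done
done
# 3 values of p1 x 2 values of p2 = 6 lines
wc -l < my_taskfile
```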
Creating the job script
The job script in this case will not specify the command for the computation, as that is in the task file. Instead, the job script will use the value of $SGE_TASK_ID and correlate it with the line numbers of the task file. The command line for that particular array task is captured and will be run when the queue system launches the array task. The details of generating the job script will not be covered here, as we provide the qbatch-argon tool to handle them.
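The idea can be sketched outside the queue system with a throwaway task file. The file name and commands here are illustrative, and the script qbatch-argon actually generates is more elaborate:

```shell
#!/bin/sh
# Minimal sketch of what an array task does: map the task index to one
# line of the task file and execute that line.
printf 'echo task-one\necho task-two\necho task-three\n' > demo_taskfile
SGE_TASK_ID=${SGE_TASK_ID:-2}   # set by the queue system inside a real array task
CMD=$(sed -n "${SGE_TASK_ID}p" demo_taskfile)
eval "$CMD"                     # runs the command on that line of the file
```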
...
qbatch-argon
The qbatch-argon program is a tool to submit a list of commands in a file (a task file) to a queue. The official documentation can be found at qbatch/README.md at master · pipitone/qbatch. Some of the options of qbatch-argon need a bit of explanation to make the best use of it on University of Iowa HPC clusters. There are defaults set for qbatch-argon such that in many cases you could just execute the following:
No Format |
---|
qbatch-argon my_taskfile |
That will create a job script file in the .qbatch folder of the working directory and submit it as an array job. The job will request the all.q queue, using the name of the task file as the job name. It will set the current working directory and write stdout/stderr to the logs directory. Options for the queue, the parallel environment, the number of slots, the job name, and some others can be specified with arguments. It is also possible to specify any other qsub option and pass it on to the eventual call to qsub.
Default settings for qbatch-argon
The relevant default settings for qbatch-argon on Argon are:
No Format |
---|
Processors per job: QBATCH_PPJ=1
Chunksize: QBATCH_CHUNKSIZE=1
Cores: QBATCH_CORES=1
System: QBATCH_SYSTEM=sge
SGE_PE: QBATCH_SGE_PE=smp
Queue: QBATCH_QUEUE=all.q |
Those, and other settings, can be changed either via environment variables or the command line. You should not change the setting for the system.
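For example, a sketch of overriding two of the defaults through their environment variables rather than command-line flags (variable names are from the table above; the values here are illustrative):

```shell
#!/bin/sh
# Override the chunk size and core count defaults via environment variables.
QBATCH_CHUNKSIZE=10
QBATCH_CORES=10
export QBATCH_CHUNKSIZE QBATCH_CORES
# qbatch-argon my_taskfile   # would now behave as if --chunksize 10 --cores 10 were given
echo "chunksize=${QBATCH_CHUNKSIZE} cores=${QBATCH_CORES}"
```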
Using qbatch-argon
...
The first option to know about is the help option, either -h or --help. One of the most important arguments for qbatch-argon is the dry-run option, -n or --dryrun. This will generate the array job script but not submit it, giving you an opportunity to examine the contents. You can either submit the resultant script manually, or rerun qbatch-argon with the same parameters but without the --dryrun flag to submit it.
Parsing the task file
Every line of the task file expresses a command to run. By default, each line becomes its own array task. If we run the program on the previous example,
No Format |
---|
qbatch-argon --dryrun my_taskfile |
the following batch script is produced.
Code Block | ||||
---|---|---|---|---|
| ||||
#!/bin/sh
#$ -S /bin/sh
#$
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-100
#$
#$
#$
#$
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=1
CORES=1
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10
EOF |
The contents of the task file are copied into the script. Notice the variables CHUNK_SIZE and CORES. These correspond to arguments that can be passed to qbatch-argon. The CHUNK_SIZE (-c, --chunksize) controls how many lines of the task file go into each array task. The CORES (-j, --cores) determines how many of those lines can be run in parallel. In the current example, each line represents an independent computation, so some could be run in parallel. Say that you want to maximize a compute node by running many jobs on it. You could alter the command to
No Format |
---|
qbatch-argon --dryrun --chunksize 50 --cores 50 --ppj 50 my_taskfile |
The above assumes each process is single threaded and only needs a single slot. The --ppj (processors per job) option sets the number of slots to request. This must be equal to or greater than the number of processes to run in parallel (CORES). Running the above would produce:
Code Block | ||||
---|---|---|---|---|
| ||||
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 50
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-2
#$
#$
#$
#$
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=50
CORES=50
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 3
...
my_prog -i my_input -p1 10 -p2 8
my_prog -i my_input -p1 10 -p2 9
my_prog -i my_input -p1 10 -p2 10
EOF |
Instead of running 100 array tasks, the array job will now consist of 2 array tasks, each consisting of 50 sub-tasks. The advantage of doing this is that the number of array tasks to be scheduled has been substantially reduced, along with the setup and teardown of each scheduled task. The downsides are that it may be harder to get machines with the larger allocation, and you will have to make sure the output can be processed.
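The chunking arithmetic can be sketched on its own. With a 6-line demo file and a chunk size of 2 there are 3 array tasks, and array task 3 should extract lines 5 and 6; this uses the same GNU sed address form as the generated script (file and command names are illustrative):

```shell
#!/bin/sh
# Sketch of the chunking logic in the generated script.
printf 'cmd1\ncmd2\ncmd3\ncmd4\ncmd5\ncmd6\n' > demo_chunks
LINES=$(wc -l < demo_chunks)
CHUNK_SIZE=2
# Number of array tasks is the ceiling of LINES / CHUNK_SIZE.
NTASKS=$(( (LINES + CHUNK_SIZE - 1) / CHUNK_SIZE ))
echo "array tasks: $NTASKS"
ARRAY_IND=3   # stand-in for $SGE_TASK_ID
# Print this task's chunk: lines (N-1)*CHUNK_SIZE+1 through N*CHUNK_SIZE.
sed -n "$(( (ARRAY_IND - 1) * CHUNK_SIZE + 1 )),+$(( CHUNK_SIZE - 1 ))p" demo_chunks
```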
What if the lines in the task file are not independent, perhaps not even the same program on each line? Using a chunksize for the task file is useful for that as well. Assume that the example program has a pre-simulation program and a post-simulation program that you wish to run. Generating the task file:
Code Block | ||
---|---|---|
| ||
#!/bin/sh
# clear task list
cat /dev/null > my_taskfile
for p1 in {1..10}
do
for p2 in {1..10}
do
echo "my_pre-prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
echo "my_prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
echo "my_post-prog -i my_input -p1 $p1 -p2 $p2" >> my_taskfile
done
done |
The resulting file looks like:
No Format |
---|
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
my_pre-prog -i my_input -p1 1 -p2 2
my_prog -i my_input -p1 1 -p2 2
my_post-prog -i my_input -p1 1 -p2 2
...
my_pre-prog -i my_input -p1 10 -p2 10
my_prog -i my_input -p1 10 -p2 10
my_post-prog -i my_input -p1 10 -p2 10 |
The computations are no longer independent, and every three lines should be incorporated into each array task. Specifying --chunksize=3 accomplishes this; since the three commands in each chunk are not independent and should be run serially, --cores=1 (the default) is kept. Running the following:
No Format |
---|
qbatch-argon --dryrun --chunksize=3 my_taskfile |
will produce
Code Block | ||
---|---|---|
| ||
#!/bin/sh
#$ -S /bin/sh
#$
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-100
#$
#$
#$
#$
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=1
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
...
my_pre-prog -i my_input -p1 10 -p2 10
my_prog -i my_input -p1 10 -p2 10
my_post-prog -i my_input -p1 10 -p2 10
EOF |
That looks very similar to before, but now every 3 lines will be put into an array task (CHUNK_SIZE=3) and those three lines will be run sequentially (CORES=1). A total of 100 array tasks will be created, as indicated by the line,
No Format |
---|
#$ -t 1-100 |
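The serial execution within a chunk can be sketched without the queue system. Here three stand-in commands play the roles of my_pre-prog, my_prog, and my_post-prog, and running them through a single shell preserves their order, which is what CORES=1 gives you inside each array task:

```shell
#!/bin/sh
# A chunk of three dependent commands executed serially, in order.
printf 'echo pre\necho main\necho post\n' > demo_chunk
sh demo_chunk   # prints pre, then main, then post
```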
The value of --ppj should be set to the number of slots needed for the command that needs the most resources. So, if the main computation needs 4 slots, then
No Format |
---|
qbatch-argon --dryrun --chunksize=3 --ppj=4 my_taskfile |
Setting other qsub options
Several of the important options for qsub are set with corresponding flags to qbatch-argon.
qsub parameter | qbatch-argon short option | qbatch-argon long option |
---|---|---|
walltime | -w | --walltime |
job name | -N | --jobname |
queue | -q | --queue |
working directory | | --workdir |
log directory | | --logdir |
parallel environment (PE) | | --sge-pe |
shell | | --shell |
All other options can be passed with the --options flag. For example
No Format |
---|
qbatch-argon --dryrun --chunksize=3 --ppj=4 --options='-l ngpus=2' |
Setting up other commands to run in the job script
Besides the main computational commands that are listed in the task file, there are likely other commands that should be in the job script. These would be things that are common across all of the array tasks, such as loading environment modules. There may also be common commands that need to run after the main computational tasks. These are handled with the --header and --footer flags, respectively. For example, to set up modules
No Format |
---|
qbatch-argon --dryrun --chunksize=3 --ppj=4 --header='module reset' --header='module load stack/2021.1' --header='module load r-champ' my_taskfile |
which will produce
Code Block | ||
---|---|---|
| ||
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 4
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-100
#$
#$
#$
#$
module reset
module load stack/2021.1
module load r-champ
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=4
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
... |
The order of the --header flags is preserved, and any value containing spaces must be quoted. You could also use this to add comments.
No Format |
---|
qbatch-argon --dryrun --chunksize=3 --ppj=4 --header='# load environment modules' --header='module reset' --header='module load stack/2021.1' --header='module load r-champ' my_taskfile |
Code Block | ||
---|---|---|
| ||
#!/bin/sh
#$ -S /bin/sh
#$ -pe smp 4
#$ -j y
#$ -o /Users/gpjohnsn/tasktest/logs
#$ -wd /Users/gpjohnsn/tasktest
#$ -N my_taskfile
#$
#$ -q all.q
#$ -t 1-100
#$
#$
#$
#$
# load environment modules
module reset
module load stack/2021.1
module load r-champ
ARRAY_IND=$SGE_TASK_ID
command -v parallel > /dev/null 2>&1 || { echo "GNU parallel not found in job environment. Exiting."; exit 1; }
CHUNK_SIZE=3
CORES=1
export THREADS_PER_COMMAND=4
sed -n "$(( (${ARRAY_IND} - 1) * ${CHUNK_SIZE} + 1 )),+$(( ${CHUNK_SIZE} - 1 ))p" << EOF | parallel -j${CORES} --tag --line-buffer --compress
my_pre-prog -i my_input -p1 1 -p2 1
my_prog -i my_input -p1 1 -p2 1
my_post-prog -i my_input -p1 1 -p2 1
... |
It is possible to generate the entire submission script with qbatch-argon, but if the command line seems too long, you can generate just the important features, use the --dryrun flag, and then copy the resultant script to edit and submit manually.