
To run an R program inside a job script, you must be able to run the program's script from the command line, without using the R prompt interactively. R provides the Rscript command for this purpose, and it is available in all R modules on Argon. For example, suppose you would normally process a single data set on your Windows or Unix workstation like this (saving the console output to a file):

```bash
cd path/to/dataSet123
Rscript my/scripts/program.R inputDataSet123.txt > output123.txt
```

You can do the same thing on Argon with a job script that first loads the R module you want but is otherwise identical:

```bash
module load R/3.5.1
cd path/to/dataSet123
Rscript my/scripts/program.R inputDataSet123.txt > output123.txt
```
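The lines above are the body of a job script. To run it, save them to a file and submit that file with SGE's qsub command. The following is a minimal sketch that assumes bash as the job shell. The file name program123.job and the job name set with the optional #$ -N directive are hypothetical:

```bash
# Save the job script to a file (the name program123.job is hypothetical)
cat > program123.job <<'EOF'
#!/bin/bash
# Optional: give the job a recognizable name in qstat output
#$ -N program123

module load R/3.5.1
cd path/to/dataSet123
Rscript my/scripts/program.R inputDataSet123.txt > output123.txt
EOF

# Submit the job to SGE (qsub is typically only available on the cluster)
if command -v qsub > /dev/null 2>&1; then
    qsub program123.job
fi
```

After submission, qstat lists the job while it is queued or running.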

You can modify this script to make better use of SGE features and, following HTC Best Practices, take advantage of Argon's scratch filesystems for better performance. For example, have SGE write all output from the entire job script to /localscratch, then move the resulting file back to the data set's directory at the end:

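```bash
# Merge stderr into stdout and write the job's output file to /localscratch
#$ -j y
#$ -o /localscratch

module load R/3.5.1
cd path/to/dataSet123
Rscript my/scripts/program.R inputDataSet123.txt
# Move the output file from /localscratch into the current (data set) directory
mv "$SGE_STDOUT_PATH" .
```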
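One weakness of this script is that its exit status comes from the final mv command. As a result, the exit status SGE records for the job (for example, in qacct output) reflects mv rather than R. One possible fix, sketched below under the assumption that the job runs under bash, is to save Rscript's exit status in a variable (status, a name chosen here), move the output file, and then exit with that saved status. The file name program123_scratch.job is hypothetical. Submit the file with qsub as usual:

```bash
# Save the job script to a file (the name program123_scratch.job is hypothetical)
cat > program123_scratch.job <<'EOF'
#!/bin/bash
#$ -j y
#$ -o /localscratch

module load R/3.5.1
cd path/to/dataSet123
Rscript my/scripts/program.R inputDataSet123.txt
# Save R's exit status before any other command overwrites $?
status=$?
# Move the job's output file out of /localscratch, as before
mv "$SGE_STDOUT_PATH" .
# End the job with R's exit status rather than mv's
exit "$status"
EOF
```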


Note

Some tutorials suggest running R scripts with the older "R CMD BATCH program.R" convention. However, it has several disadvantages compared to "Rscript program.R", particularly on an HPC system:

  • It doesn't merely run the script. Instead, it simulates an interactive session, echoing each line of the script followed by that line's output. This interleaving makes the output harder to read, or to parse later with another program.
  • It doesn't print anything to standard output (stdout), so you can't use SGE or shell redirection (">") to capture and manage the output.
  • By default, it creates and writes to a file named like "program.Rout" in the directory where you start the script. That is not necessarily where your script is, where your input or output data is, or where you would prefer. HTC Best Practices describes how this can cause performance problems, and the suggested mitigations are difficult to apply to this file.

Therefore we advise against using "R CMD BATCH".
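If you still want a log file like "program.Rout", you can get one with Rscript and shell redirection. This approach lets you choose where the file goes, and the file does not include the script's code. The sketch below wraps this in a small shell function. The name run_r_logged is hypothetical and not part of R or SGE:

```bash
# Hypothetical helper (not part of R or SGE): run an R script with Rscript,
# pass any extra arguments through to the script, and capture both stdout
# and stderr in a log file whose location you choose.
run_r_logged() {
    local script=$1
    local log=$2
    shift 2
    Rscript "$script" "$@" > "$log" 2>&1
}

# Example usage (paths are illustrative):
# run_r_logged my/scripts/program.R /localscratch/program.Rout inputDataSet123.txt
```

The function returns Rscript's exit status, so a job script can still check whether R succeeded.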