
...

If your job does not use the system openmpi, or does not use MPI at all, then any desired core binding must be set up with whatever mechanism your software provides; otherwise, there will be no core binding. Again, that may not be a major issue. If your job does not work well with hyperthreading (HT), run on a number of cores equal to half the number of slots requested and the OS scheduler will minimize contention. 
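As a sketch of that approach for a threaded (non-MPI) job, assuming a parallel environment named smp and a placeholder application my_threaded_app:

#!/bin/bash
#$ -pe smp 16
# 16 slots were requested above; run only 8 threads (half the slots) so that
# each thread can sit on its own physical core and the OS scheduler can
# minimize contention between hyperthreads.
export OMP_NUM_THREADS=8
./my_threaded_app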

New SGE utilities

While SoGE is very similar to previous versions of SGE, there are some new utilities that people may find of interest. There are manual pages for each of these.

...

Full Resource Name    Shortcut Resource Name
std_mem               sm
mid_mem               mm
high_mem              hm
gpu                   gpu
gpu_k80               k80
gpu_p100              p100
gpu_p40               p40
gpu_titanv            titanv
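
For example, a job that needs a high-memory node can request it with either the full resource name or its shortcut; the two forms below should be equivalent:

qsub -l high_mem=true
qsub -l hm=true
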
GPU resources

If you wish to use a compute node that contains a GPU, it must be explicitly requested in some form. The table above lists the Boolean resources for selecting a specific GPU type, or any type via the generic gpu resource.

For example, if you run a job in the all.q queue and want to use a node with a GPU, but do not care which type,

...

or use the shortcut,

qsub -l p100=true

In all cases, requesting any of the GPU Boolean resources will set the ngpus resource value to 1 to signify to the scheduler that one GPU device is required. If your job needs more than one GPU, that can be specified explicitly with the ngpus resource. For example,

qsub -l ngpus=2

Info: Currently, there are no Argon nodes that have more than 2 GPUs.


Note that requesting one of the *-GPU queues will automatically set ngpus=1 if that resource is not otherwise set. However, you will have to know what types of GPUs are in those queues if you need a specific type. Investor queues that contain a mix of GPU and non-GPU nodes, i.e., those without the -GPU suffix, require an explicit request for a GPU.
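
As an illustration, a submission to such a mixed investor queue might combine the queue request with an explicit Boolean GPU request (the queue name MY-INVESTOR-QUEUE is a placeholder):

qsub -q MY-INVESTOR-QUEUE -l gpu=true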

In addition to the ngpus resource, there are some other non-Boolean resources for GPU nodes that could be useful to you, particularly in a shared node scenario. With the exception of the resource for requesting free memory on a GPU device, these are informational. Note that these are host-based resources, so they are probably most useful for jobs in the all.q queue; GPU jobs in investor queues will most likely want to use the Boolean resources listed in the previous table.

Resource                        Description                                     Requestable
gpu.ncuda                       number of CUDA GPUs on the host                 NO
gpu.nopencl                     number of OpenCL GPUs on the host               NO
gpu.ndev                        total number of GPUs on the host                NO
gpu.cuda.N.mem_free             free memory on CUDA GPU N                       YES
gpu.cuda.N.procs                number of processes on CUDA GPU N               NO
gpu.cuda.N.clock                maximum clock speed of CUDA GPU N (in MHz)      NO
gpu.cuda.N.util                 compute utilization of CUDA GPU N (in %)        NO
gpu.cuda.procsum (deprecated)   total number of processes running on devices    NO
gpu.cuda.dev_free (deprecated)  number of devices with no current processes     NO
gpu.opencl.0.clock              maximum clock speed of OpenCL GPU N (in MHz)    NO
gpu.opencl.0.mem                global memory of OpenCL GPU N                   NO
gpu.names                       semi-colon-separated list of GPU model names    NO

For example, to request a node with at least 2G of memory available on the first GPU device:

qsub -l gpu.cuda.0.mem_free=2G
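
Since the remaining gpu.* resources are informational, they are mainly useful for checking the state of GPU hosts rather than for job requests. One way to list them, assuming these complexes are reported by qhost on the cluster, is:

qhost -F gpu.ndev,gpu.names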

...

When there is more than one GPU device on a node, your job will only be presented with unused devices. For example, if a node has two GPU devices and your job requests one (ngpus=1), the job will only see a single free device. If the node is shared, a second job requesting a single GPU will only see the device that is left available. As a result, you should not have to specify which GPU device to use for your job.
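
If you want to confirm from inside a job which device it was given, a minimal check is to print any device selection exposed in the environment before starting your GPU code (a sketch; whether the scheduler exports a variable such as CUDA_VISIBLE_DEVICES is an assumption to verify, and my_gpu_app is a placeholder):

#!/bin/bash
#$ -l ngpus=1
# Print the device selection if the scheduler exposes one via the environment;
# this may be empty if a different mechanism (e.g., device isolation) is used.
echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-not set}"
./my_gpu_app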