...
If your job does not use the system openmpi, or does not use MPI at all, then any desired core binding will need to be set up with whatever mechanism your software uses; otherwise, there will be no core binding. Again, that may not be a major issue. If your job does not work well with HT, then run on a number of cores equal to half the number of slots requested and the OS scheduler will minimize contention.
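As a sketch of what that setup might look like when not using the system openmpi, here are two common mechanisms; the executable names are placeholders, not part of the cluster documentation:

```shell
# If launching with your own Open MPI install, request core binding explicitly:
mpirun --bind-to core -np 8 ./my_mpi_app      # my_mpi_app is a placeholder

# For a threaded (OpenMP) program, let the OpenMP runtime pin the threads instead:
export OMP_PLACES=cores
export OMP_PROC_BIND=close
./my_openmp_app                               # placeholder executable
```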
New SGE utilities
While SoGE is very similar to previous versions of SGE, there are some new utilities that people may find of interest. There are manual pages for each of these.
...
Full Resource Name | Shortcut Resource Name |
---|---|
std_mem | sm |
mid_mem | mm |
high_mem | hm |
gpu | gpu |
gpu_k80 | k80 |
gpu_p100 | p100 |
gpu_p40 | p40 |
gpu_titanv | titanv |
GPU resources
If you wish to use a compute node that contains a GPU then it must be explicitly requested in some form. The table above lists the Boolean resources for selecting a specific GPU type, or any of the types via the generic gpu resource.
For example, if you run a job in the all.q queue and want to use a node with a GPU, but do not care which type,
...
or use the shortcut,
qsub -l p100=true
There are some non-Boolean resources for GPU nodes that could be useful in a shared-node scenario. Some of these are requestable but most are informational. Note that these are host-based resources, so they are probably most useful for jobs in the all.q queue; GPU jobs in investor queues will most likely want to use the Boolean resources listed in the previous table. In all cases, requesting any of the GPU Boolean resources will set the ngpus resource value to 1 to signify to the scheduler that one GPU device is required. If your job needs more than one GPU then that can be specified explicitly with the ngpus
resource. For example,
qsub -l ngpus=2
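For instance, a minimal batch script making the same request might look like the following; the queue choice and executable name are assumptions for illustration only:

```shell
#!/bin/bash
# Hypothetical job script: request two GPU devices on one node in the all.q queue
#$ -q all.q
#$ -l ngpus=2
#$ -cwd
./my_gpu_app    # placeholder for your GPU-enabled executable
```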
Info |
---|
Currently, there are no Argon nodes that have more than 2 GPUs. |
Note that requesting one of the *-GPU queues will automatically set ngpus=1 if that resource is not otherwise set. However, you will have to know what types of GPUs are in those queues if you need a specific type. Investor queues that have a mix of GPU and non-GPU nodes, i.e., those without the -GPU suffix, will need an explicit request for a GPU.
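A sketch of such an explicit request follows; the queue name and script name here are hypothetical placeholders:

```shell
# Request any GPU node within a mixed investor queue (queue name is a placeholder)
qsub -q INVESTOR_QUEUE -l gpu=true myjob.sh
```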
In addition to the ngpus resource there are some other non-Boolean resources for GPU nodes that could be useful to you. With the exception of requesting free memory on a GPU device, these are informational.
Resource | Description | Requestable |
---|---|---|
gpu.ncuda | number of CUDA GPUs on the host | NO |
gpu.nopencl | number of OpenCL GPUs on the host | NO |
gpu.ndev | total number of GPUs on the host | NO |
gpu.cuda.N.mem_free | free memory on CUDA GPU N | YES |
gpu.cuda.N.procs | number of processes on CUDA GPU N | NO |
gpu.cuda.N.clock | maximum clock speed of CUDA GPU N (in MHz) | NO |
gpu.cuda.N.util | compute utilization of CUDA GPU N (in %) | NO |
gpu.cuda.procsum | total number of processes running on devices | NO |
gpu.cuda.dev_free | number of devices with no current processes | NO |
gpu.opencl.N.clock | maximum clock speed of OpenCL GPU N (in MHz) | NO |
gpu.opencl.N.mem | global memory of OpenCL GPU N | NO |
gpu.names | semi-colon-separated list of GPU model names | NO |
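If you want to see the current values of these resources on the compute hosts, they should be visible with qhost's -F option; the resource names below are taken from the table above:

```shell
# Report selected GPU resource values for each execution host
qhost -F gpu.ncuda,gpu.cuda.0.mem_free,gpu.names
```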
For example, to request a node with at least 2G of memory available on the first GPU device:
qsub -l gpu.cuda.0.mem_free=2G
...
When there is more than one GPU device on a node, your job will only be presented with unused devices. Thus, if a node has two GPU devices and your job requests one, ngpus=1, the job will only see a single free device. If the node is shared, a second job requesting a single GPU will only see the device that is left available. Thus, you should not have to specify which GPU device to use for your job.