Queues and Policies

Queues

Nodes on Argon are separated into 3 types of queues:

  • Investor queues: nodes purchased by investors. Access to these queues is managed by the investors and their delegates.

  • UI queues: centrally funded nodes which are available to everyone who has an HPC account.

  • all.q queue: cluster-wide queue

Investor Queues

To request access to an investor queue, please contact the queue manager listed below.

The University of Iowa (UI) queues

A significant portion of the HPC cluster systems at UI was funded centrally. These nodes are placed in queues named UI or prefixed with UI-.

  • UI → Default queue

  • UI-HM → request only for jobs that need more memory than the standard nodes provide.

  • UI-MPI → MPI jobs; request only for jobs that can take advantage of multiple nodes.

  • UI-GPU → Contains nodes with GPU accelerators; request only if the job can use a GPU accelerator.

  • UI-DEVELOP → Meant for small, short running job prototypes and debugging.

These queues are available to everyone who has an account on an HPC system. Because that is a fairly large user base, limits are placed on these shared queues. Note also that there is a limit of 50,000 active (running and pending) jobs per user on the system.

Note that the number of slots available in the UI queue can vary depending on whether anyone has purchased a reservation of nodes. The UI queue is the default queue and is used if no queue is specified.

Please use the UI-DEVELOP queue for testing new jobs at a smaller scale before committing many nodes to your job.
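As a sketch of how a queue is chosen at submission time, the Grid Engine job script below requests UI-DEVELOP with an embedded "#$ -q" directive. The job name and the echo line are hypothetical placeholders; leaving out the "-q" line would send the job to the default UI queue.

```shell
#!/bin/bash
# Sketch of a Grid Engine job script for a small test run in UI-DEVELOP.
# Lines starting with "#$" are read by qsub as options; bash treats them as comments.
# Request the development queue (remove this line to use the default UI queue):
#$ -q UI-DEVELOP
# Hypothetical job name:
#$ -N devtest
# Run from the submission directory and merge stderr into stdout:
#$ -cwd
#$ -j y

# Grid Engine sets JOB_ID and QUEUE for running jobs; the fallbacks apply when run by hand.
msg="job ${JOB_ID:-none} on $(uname -n) in queue ${QUEUE:-none}"
echo "$msg"
```

Save it under a name such as devtest.sh (hypothetical) and submit it with "qsub devtest.sh". The queue can also be requested on the command line instead, as in "qsub -q UI-DEVELOP devtest.sh".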

The all.q queue

This queue encompasses all of the nodes and contains all of the available job slots. It is available to everyone with an account, and there are no running-job limits. However, it is a low-priority queue instance on the same nodes as the higher-priority investor and UI queue instances. The all.q queue is subordinate to these other queues: jobs running in it give up the nodes they are running on when jobs in the higher-priority queues need them. We call this "job eviction", and only jobs running in the all.q queue are subject to it.

In addition to the above, there are some nodes that are not part of any investor queue. These are only available in the all.q queue and are used for node rentals and future purchases. The number of nodes for this purpose varies.
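Because jobs in all.q can be evicted, work submitted there is easier to manage if a later run can skip what already finished. The script below is a generic sketch of that checkpoint pattern, not an Argon-specific feature: the progress file name, the step count, and the PROGRESS_FILE and TOTAL_STEPS variables are hypothetical, and whether an evicted job is rerun automatically or must be resubmitted depends on how the job and the cluster are configured.

```shell
#!/bin/bash
# Sketch of restartable work for all.q, where a job may be evicted.
# Assumption: if the job is started again, it should skip steps that already finished.
# The progress file and the loop body are hypothetical placeholders.
progress_file=${PROGRESS_FILE:-progress.txt}
total_steps=${TOTAL_STEPS:-10}

# Resume after the last completed step, or start from the beginning.
last_done=0
if [[ -f $progress_file ]]; then
  last_done=$(<"$progress_file")
fi

for (( step = last_done + 1; step <= total_steps; step++ )); do
  # Replace this with one unit of real work.
  echo "processing step $step"
  # Record progress only after the step has completed.
  echo "$step" > "$progress_file"
done
```

If the job is evicted partway through and then run again (for example with "qsub -q all.q restartable.sh", a hypothetical file name), it continues from the step after the last one recorded instead of starting over.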

Guidelines for selecting a queue

It may not always be obvious which queue is the best one to submit a job to, particularly if you are a member of an investor group. As a guideline, if you are in an investor group and your queue has enough free slots for your job(s), use the investor queue. If you are not in an investor group, or your investor queue does not have enough free slots, submit parallel jobs to the UI queue. Serial jobs that are not going to an investor queue should generally be submitted to the all.q queue, unless you have only a small number of jobs or cannot risk them being evicted, in which case use the UI queue.
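The guideline above can be sketched as a small shell function. The name choose_queue and its five arguments are hypothetical, invented only to make the order of the decisions explicit; in practice the free-slot count would come from a command such as "qstat -g c".

```shell
#!/bin/bash
# Hypothetical helper encoding the queue-selection guideline.
# Arguments (all assumptions for this sketch):
#   $1 investor queue name, or "" if you are not in an investor group
#   $2 free slots currently available in that investor queue
#   $3 slots your job(s) need
#   $4 job type: "serial" or "parallel"
#   $5 "yes" if you have only a few jobs or cannot risk eviction, else "no"
choose_queue() {
  local investor_q=$1 free=$2 needed=$3 kind=$4 avoid_eviction=$5

  # Prefer the investor queue when it has room for the job(s).
  if [[ -n $investor_q && $free -ge $needed ]]; then
    echo "$investor_q"
    return
  fi

  # Parallel jobs go to UI, where they cannot be evicted.
  if [[ $kind == parallel ]]; then
    echo "UI"
    return
  fi

  # Serial jobs: all.q by default, UI when eviction is unacceptable or the job count is small.
  if [[ $avoid_eviction == yes ]]; then
    echo "UI"
  else
    echo "all.q"
  fi
}

choose_queue CGRER 464 8 parallel no   # prints CGRER
choose_queue "" 0 1 serial no          # prints all.q
choose_queue "" 0 1 serial yes         # prints UI
```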

To see which investor group you are associated with (if any) use the following command:

whichq

It is anticipated that members of an investment group will have their own system for deciding who runs what on their dedicated resources.

As an example, if you are a member of the CGRER investment group and want to determine how many slots are currently available, the following command can be used:

qstat -g c -q CGRER

This will generate output like the following, which indicates that 96 slots are in use and 464 are available out of the 560 total slots allocated to the CGRER queue:

CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
CGRER                             0.77     96      0    464    560      0      0
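To use the AVAIL figure in a script, the sketch below pulls the fifth column from the queue's data row. It assumes the column layout shown in the sample output (CLUSTER QUEUE, CQLOAD, USED, RES, AVAIL, TOTAL, ...) and that the queue name is the first field of its row; avail_slots is a hypothetical helper, not a cluster command.

```shell
#!/bin/bash
# Hypothetical helper: read `qstat -g c` output on stdin and print the
# AVAIL column (5th whitespace-separated field) for the named queue.
avail_slots() {
  awk -v q="$1" '$1 == q { print $5 }'
}

# On the cluster: qstat -g c -q CGRER | avail_slots CGRER
# Offline check using the sample output:
sample='CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
CGRER                             0.77     96      0    464    560      0      0'
printf '%s\n' "$sample" | avail_slots CGRER   # prints 464
```

The header and separator rows are skipped automatically because their first field never equals the queue name.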

 

[Figure: Queue decision chart]

Although it is not shown in the chart above, a parallel job can also be submitted to the all.q queue. Because a parallel job is likely to run on more than one node, it is more likely to be evicted. Parallel jobs should therefore be submitted to the UI queue in preference to the all.q queue.
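A sketch of a parallel job script aimed at the UI queue follows. The parallel environment name smp and the slot count of 16 are assumptions: list the parallel environments that actually exist with "qconf -spl" and inspect one with "qconf -sp <name>", since multi-node MPI jobs often need a different environment than single-node threaded jobs. Grid Engine sets NSLOTS to the number of slots granted to the job.

```shell
#!/bin/bash
# Sketch of a parallel job for the UI queue (Grid Engine directives).
# Assumption: "smp" is a valid parallel environment name; check `qconf -spl`.
#$ -q UI
#$ -pe smp 16
#$ -cwd
#$ -j y

# NSLOTS is set by Grid Engine; fall back to 1 when the script is run by hand.
slots=${NSLOTS:-1}
export OMP_NUM_THREADS=$slots
echo "running with $slots slot(s)"
# ./my_parallel_program   # hypothetical executable
```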

GPU selection policy

For queues in which every node contains a GPU, and which are split out into a separate QUEUE-GPU queue, the policy is to set the ngpus resource to 1 if it is not explicitly set. For other queues that contain GPU nodes, the queue owner has set the policy either to request a GPU by default or not to.
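The job script below requests a GPU explicitly through the ngpus resource, so the job does not rely on a queue's default. The queue name UI-GPU comes from the list of UI queues; using "nvidia-smi -L" to count visible GPUs is an illustrative assumption about the software installed on the nodes.

```shell
#!/bin/bash
# Sketch of a GPU job that sets the ngpus resource explicitly.
#$ -q UI-GPU
#$ -l ngpus=1
#$ -cwd
#$ -j y

# Count GPUs listed by nvidia-smi; prints 0 if the tool is missing or finds none.
gpus=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU')
echo "GPUs visible: $gpus"
```

In a mixed queue whose owner requests a GPU by default, a CPU-only job could presumably opt out with "-l ngpus=0"; confirm the behavior with that queue's manager.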