General Description
Argon is the latest high-performance computing (HPC) system at the University of Iowa. It consists of 612 compute nodes running CentOS-7.4 Linux. There are several compute node configurations,
...
While on the backend Argon is a completely new architecture, the frontend should be very familiar to those who have used previous-generation HPC systems at the University of Iowa. There are, however, a few key differences that will be discussed on this page.
Heterogeneity
While previous HPC cluster systems at UI have been very homogeneous, the Argon HPC system has a heterogeneous mix of compute node types. In addition to the variability in the GPU accelerator types listed above, there are also differences in CPU architecture. We generally follow Intel marketing names, with the most important distinction being the AVX (Advanced Vector Extensions) unit on the processor. The following table lists the processors in increasing generational order.
...
Note that code must be optimized during compilation to take advantage of AVX instructions. The CPU architecture is important to keep in mind both in terms of potential performance and compatibility. For instance, code optimized for AVX2 instructions will not run on the Sandybridge/Ivybridge architecture because it only supports AVX, not AVX2. However, each successive generation is backward compatible, so code optimized with AVX instructions will run on Haswell/Broadwell systems.
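For example, with GCC the target architecture is chosen at compile time. This is a minimal sketch; the source file name, output names, and optimization level are illustrative:

```
# Build for the oldest architecture in the pool (AVX only); the resulting
# binary runs on every Argon node type:
gcc -O2 -march=sandybridge -o myapp_avx myapp.c

# Build for Haswell/Broadwell (AVX2); this binary would fail with
# "Illegal instruction" if it ran on a Sandybridge/Ivybridge node:
gcc -O2 -march=haswell -o myapp_avx2 myapp.c
```

Building for the oldest architecture present gives a binary that runs everywhere, at the cost of forgoing AVX2 on the newer nodes.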
Hyperthreaded Cores (HT)
One important difference between Argon and previous systems is that Argon has Hyperthreaded processor cores turned on. Hyperthreaded cores can be thought of as splitting a single physical core into two virtual cores, much as a Linux process can be split into threads. That is an oversimplification, but if your application is multithreaded then hyperthreaded cores can potentially run it more efficiently. For non-threaded applications, you can think of a pair of hyperthreaded cores as roughly equivalent to two cores at half the speed when both cores of the pair are in use. This can help keep the physical processor busy for processes that do not always use the full capacity of a core. HT was enabled on Argon to increase system efficiency on the workloads that we have observed. There are some things to keep in mind as you are developing your workflows.
...
Info: After the merger of Argon and Neon, there are a few of the older nodes that are not HT capable. These are the High Memory nodes with cpu_arch=sandybridge/ivybridge.
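If a job needs a particular architecture, or needs to avoid one, the cpu_arch value mentioned in the note can be requested at submission time. This is a hedged sketch that assumes cpu_arch is a requestable resource; the script name and architecture values are illustrative:

```
# Request AVX2-capable (Haswell/Broadwell-class) processors:
qsub -l cpu_arch=broadwell myjob.sh

# Or target the older non-HT High Memory nodes:
qsub -l cpu_arch=sandybridge myjob.sh
```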
Job Scheduler/Resource Manager
Like previous UI HPC systems, Argon uses SGE, although this version is based on a slightly different code base. If anyone is interested in the history of SGE, there is an interesting write-up at History of Grid Engine Development. The version of SGE that Argon uses is from the Son of Grid Engine project. For the most part this will be very familiar to people who have used previous generations of UI HPC systems. One thing that will look a little different is the output of the qhost command, which will show the CPU topology.
...
If your job does not use the system openmpi, or does not use MPI at all, then any desired core binding will need to be set up with whatever mechanism the software uses; otherwise, there will be no core binding. Again, that may not be a major issue. If your job does not work well with HT, run on a number of cores equal to half of the number of slots requested and the OS scheduler will minimize contention.
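As a minimal sketch of that half-the-slots approach for a threaded, non-MPI application (the parallel environment name smp, the slot count, and the application name are assumptions for illustration):

```
#!/bin/bash
#$ -cwd
#$ -pe smp 56        # request all slots (HT cores) on a node; PE name and count are illustrative

# The application does not benefit from HT, so use half of the requested
# slots and let the OS scheduler spread the threads across physical cores.
export OMP_NUM_THREADS=$(( NSLOTS / 2 ))
./my_threaded_app
```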
New SGE utilities
While SoGE is very similar to previous versions of SGE, there are some new utilities that people may find of interest. There are manual pages for each of these.
- qstatus: Reformats output of qstat and can calculate job statistics.
- dead-nodes: This will tell you what nodes are not physically participating in the cluster.
- idle-nodes: This will tell you what nodes do not have any activity on them.
- busy-nodes: This will tell you what nodes are running jobs.
- nodes-in-job: This is probably the most useful. Given a job ID it will list the nodes that are in use for that particular job, as sketched below.
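A quick sketch of how these utilities might be invoked, based on the descriptions above (the job ID is illustrative; consult the manual pages for exact options and output):

```
# Show nodes that are down, idle, or busy:
dead-nodes
idle-nodes
busy-nodes

# List the nodes in use by a particular job:
nodes-in-job 123456
```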
SSH to compute nodes
On previous UI HPC systems it was possible to briefly ssh to any compute node before getting booted off if a registered job was not found. That brief window was sufficient to run a quick command over ssh on any node. This is not the case for Argon: SSH connections to compute nodes will only be allowed if you have a registered job on that host. Of course, qlogin sessions will allow you to log in to a node directly as well. Again, if you have a job running on a node you can ssh to that node in order to check status, etc. You can find the nodes of a job with the nodes-in-job command mentioned above. We ask that you not do more than observe things while logged into the node, as it may have shared jobs on it.
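For example, a typical check-in on a running job might look like the following (the job ID and node name are placeholders):

```
# Find one of your running jobs, locate its node(s), and connect to observe it:
qstat -u "$USER"       # note the job ID of a running job
nodes-in-job 123456    # illustrative job ID; prints the node(s) for that job
ssh <node-name>        # only permitted while your job is registered on that node
top                    # observe only; other users' jobs may share the node
```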
...