Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

...

  1. For high throughput jobs the use of HT can increase overall throughput by keeping cores active as jobs come and go. These jobs can treat each HT core as a processor.
  2. For multithreaded applications, HT will provide more efficient handling of threads. You must make sure to request the appropriate number of job slots. Generally, the number of job slots requested should equal the number of cores that will be running.
  3. For non-threaded CPU bound processes that can keep a core busy all of the time, you probably want to only run one process per core, and not run processes on HT cores. This can be accomplished by taking advantage of the Linux kernel's ability to bind processes to cores. In order to minimize processes running on the HT cores of a machine make sure that only half of the total number of cores are used. See below for more details but requesting twice the number of job slots as the number of cores that will be used will accomplish this. A good example of this type of job is non-threaded MPI jobs, but really any non-threaded job.
Info

After the merger of Argon and Neon, there are a few of the older nodes that are not HT capable. These are the High Memory nodes with cpu_arch=sandybridge/ivybridge.


Job Scheduler/Resource Manager

Like previous UI HPC systems, Argon uses SGE, although this version is based off of a slightly different code-base. If anyone is interested in the history of SGE there is an interesting writeup write up at History of Grid Engine Development. The version of SGE that Argon uses is from the Son of Grid Engine project. For the most part this will be very familiar to people who have used previous generations of UI HPC systems. One thing that will look a little different is the output of the qhost command. This will show the CPU topology.

...

Table plus
columnTypesI,H,I


92
Node memory (GB)Job slotsMemory (GB) per slot
64322
96402
128562
192404
192643
256328
256564
51224 (no HT)20
51232 (no HT)16
512568


Using the Basic Job Submission and Advanced Job Submission pages as a reference, how would one submit jobs taking HT into account? For single process high throughput type jobs it probably does not matter, just request one slot per job. For multithreaded or MPI jobs, request one job slot per thread or process. So if your application runs best with 4 threads then request something like the following.

...

For MPI jobs, the system provided openmpi will not bind processes to cores by default, as would be the normal default for openmpi. This is set this way to avoid inadvertently oversubcribing oversubscribing processes on cores. In addition, the system openmpi settings will map processes by socket. This should give a good process distribution in all cases. However, if you wish to use less than 28 processes per node in an MPI job then you may want to map by node to get the most even distribution of processes across nodes. You can do that with the --map-by node option flag to mpirun.

...

If your job does not use the system openmpi, or does not use MPI, then any desired core binding will need to be set up with whatever mechanism the software uses. Otherwise, there will be no core binding. Again, that may not be a major issue. If your job does not work well with HT then run on a number of cores equal to half of the number of slots requested and the OS scheduler will minimize contention. 

new SGE utilities

While SoGE is very similar to previous versions of SGE there are some new utilities that people may find of interest. There are manual pages for each of these.

...