...
Note that code must be optimized during compilation to take advantage of AVX instructions. The CPU architecture is important to keep in mind both in terms of potential performance and compatibility. For instance, code optimized for AVX2 instructions will not run on the Sandybridge/Ivybridge architecture because it only supports AVX, not AVX2. However, each successive generation is backward compatible so code optimized with AVX instructions will run on Haswell/Broadwell systems.
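One way to check which AVX level a node supports before choosing compiler optimizations is to inspect the CPU flags that Linux reports. The following is a minimal sketch, assuming a Linux node where `/proc/cpuinfo` is readable; the function name `highest_avx` is made up for this example and is not part of any Argon tooling.

```bash
#!/bin/bash
# Report the highest AVX level found in a space-separated list of CPU flags.
# highest_avx is a hypothetical helper name used only for this example.
highest_avx() {
    # Pad with spaces so that "avx" does not match inside "avx2".
    local flags=" $1 "
    case "$flags" in
        *" avx2 "*) echo "avx2" ;;
        *" avx "*)  echo "avx" ;;
        *)          echo "none" ;;
    esac
}

# On Linux x86 systems, the "flags" line of /proc/cpuinfo lists the
# instruction set extensions that the CPU supports.
if [ -r /proc/cpuinfo ]; then
    cpu_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Highest AVX level on this node: $(highest_avx "$cpu_flags")"
fi
```

Running this on a compute node, rather than only on a login node, shows what the hardware that executes the job actually supports.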
...
Hyper Threaded Cores (HT)
One important difference between Argon and previous systems is that Argon has Hyper Threaded processor cores turned on. Hyper Threaded cores can be thought of as splitting a single processor core into two virtual cores, much as a Linux process can be split into threads. That oversimplifies it, but if your application is multithreaded then Hyper Threaded cores can potentially run the application more efficiently. For non-threaded applications you can think of any pair of Hyper Threaded cores as roughly equivalent to two cores at half the speed if both cores of the pair are in use. Again, that is an oversimplification, but the main point is that CPU bound processes perform better when not sharing a CPU core. Hyper Threaded cores can help ensure that the physical processor is kept busy for processes that do not always use the full capacity of a core. The reason for enabling HT for Argon is to try to increase system efficiency on the workloads that we have observed. There are some things to keep in mind as you are developing your workflows.
- For high throughput jobs the use of HT can increase overall throughput by keeping cores active as jobs come and go. These jobs can treat each HT core as a processor.
- For multithreaded applications, HT will provide more efficient handling of threads. You must make sure to request the appropriate number of job slots. Generally, the number of job slots requested should equal the number of cores that will be running.
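For a threaded application, one way to keep the slot request and the thread count in sync is to derive the thread count from `$NSLOTS` inside the job script. This is a minimal sketch, assuming an OpenMP program that honors the standard `OMP_NUM_THREADS` variable; `./my_threaded_app` is a placeholder, and the fallback value of 1 is only there so the script can be tried outside of SGE.

```bash
#!/bin/bash
# NSLOTS is set by SGE inside a job; fall back to 1 when run outside a job.
threads="${NSLOTS:-1}"

# Tell the OpenMP runtime to start one thread per requested slot.
export OMP_NUM_THREADS="$threads"
echo "Running with $OMP_NUM_THREADS threads"

# ./my_threaded_app   # placeholder for the real program
```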
For non-threaded CPU bound processes that can keep a core busy all of the time, you probably want to run only one process per core, and not run processes on HT cores. This can be accomplished by taking advantage of the Linux kernel's ability to bind processes to cores. To minimize processes running on the HT cores of a machine, make sure that only half of the total number of cores are used. See below for more details, but requesting twice as many job slots as the number of cores that will be used will accomplish this. Non-threaded MPI jobs are a good example of this type of job, but the same applies to any non-threaded job. If your job script is written in bash syntax then you can use the $NSLOTS SGE variable as follows, using mpirun as an example:
    mpirun -np $(($NSLOTS/2)) ...
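The halving of `$NSLOTS` can also be wrapped in a small helper so that an odd or very small slot request still yields a usable process count. This is a sketch only: `ranks_for_slots` is a hypothetical helper name, the default of 2 slots is an assumption for running outside of SGE, and the `mpirun` line is printed rather than executed.

```bash
#!/bin/bash
# Compute how many processes to start so that only half of the requested
# slots are used. ranks_for_slots is a hypothetical helper name.
ranks_for_slots() {
    local slots="$1"
    local ranks=$(( slots / 2 ))   # integer division rounds down
    # Always start at least one process, even for a 1-slot request.
    if [ "$ranks" -lt 1 ]; then
        ranks=1
    fi
    echo "$ranks"
}

# NSLOTS is set by SGE inside a job; default to 2 for testing outside a job.
np=$(ranks_for_slots "${NSLOTS:-2}")
echo "mpirun -np $np ..."   # printed rather than executed in this sketch
```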
Info |
---|
After the merger of Argon and Neon, there are a few of the older nodes that are not HT capable. These are the High Memory nodes with cpu_arch=sandybridge/ivybridge. |
Job Scheduler/Resource Manager
Like previous UI HPC systems, Argon uses SGE, although this version is based on a slightly different code base. If you are interested in the history of SGE, there is an interesting write-up at History of Grid Engine Development. The version of SGE that Argon uses is from the Son of Grid Engine project. For the most part this will be very familiar to people who have used previous generations of UI HPC systems. One thing that will look a little different is the output of the qhost command, which shows the CPU topology.
...
- each node has 2 processor sockets
- each processor socket has 14 processor cores
- each processor core has 2 hardware threads (HT)
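The per-node totals implied by this topology can be worked out with simple shell arithmetic. The sketch below uses only the three values from the list above; the variable names are illustrative, and the result applies only to nodes with this particular topology.

```bash
#!/bin/bash
# Topology values taken from the list above.
sockets=2
cores_per_socket=14
threads_per_core=2

# Physical cores per node, and hardware threads (HT cores) per node.
physical_cores=$(( sockets * cores_per_socket ))
hardware_threads=$(( physical_cores * threads_per_core ))

echo "physical cores:   $physical_cores"
echo "hardware threads: $hardware_threads"
```

Assuming each job slot corresponds to one hardware thread, a non-threaded job that wants one process per physical core on a full node of this type would request 56 slots and start 28 processes.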
...