...
- For high throughput jobs the use of HT can increase overall throughput by keeping cores active as jobs come and go. These jobs can treat each HT core as a processor.
- For multithreaded applications, HT will provide more efficient handling of threads. You must make sure to request the appropriate number of job slots. Generally, the number of job slots requested should equal the number of cores that will be running.
- For non-threaded, CPU-bound processes that can keep a core busy all of the time, you probably want to run only one process per core and not run processes on HT cores. This can be accomplished by taking advantage of the Linux kernel's ability to bind processes to cores. The Argon resource manager launches jobs with core binding by default, so that processes will land on cores. To prevent processes from landing on the HT cores of a machine, make sure that the number of cores used is only half of the number of job slots requested. See below for more details, but requesting twice as many job slots as the number of cores that will be used will accomplish this. Non-threaded MPI jobs are a good example of this type of job, but this applies to any non-threaded job.
...
Like previous UI HPC systems, Argon uses SGE, although this version is based on a slightly different code base. If you are interested in the history of SGE, there is an interesting writeup at History of Grid Engine Development. The version of SGE that Argon uses is from the Son of Grid Engine (SoGE) project. For the most part, this will be very familiar to people who have used previous generations of UI HPC systems. One thing that will look a little different is the output of the qhost command, which now shows the CPU topology.
...
That would run the 4 MPI ranks on physical cores and not HT cores. That works because we have set the default SGE core binding strategy to linear. Unless overridden by the user, this strategy binds each newly launched process to the next available core. Since HT cores are mapped after all physical cores, this fills the physical cores first. Once the slots are used, as they will be because the number of slots is 2x the number of cores, the HT cores are effectively blocked. Note that this works for non-MPI jobs as well. If you have a non-threaded process that you want to ensure runs on a physical core, you could use the same 2x slot request.
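The 2x slot arithmetic described above can be sketched as a small shell helper. This is only an illustration: `slots_for_cores` is a hypothetical function, not an SGE utility, and the parallel environment name `smp` and the script name `myjob.sh` are placeholders that may not match your setup. The script prints a submission line rather than submitting anything.

```shell
#!/bin/sh
# Sketch only: compute the slot request for a job that should run one
# process per physical core, following the 2x rule described above.

# slots_for_cores is a hypothetical helper, not part of SGE.
slots_for_cores() {
    cores=$1
    echo $((cores * 2))
}

ranks=4                              # processes to run, one per physical core
slots=$(slots_for_cores "$ranks")    # twice as many slots as processes

# Print the submission line instead of running it. The parallel environment
# name "smp" and the script name "myjob.sh" are assumed placeholders.
echo "qsub -pe $slots_pe_placeholder smp $slots myjob.sh" | sed 's/ -pe  smp / -pe smp /'
```

For 4 processes the helper yields a request of 8 slots, so that the job uses only half of the slots it holds.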
...
Note that if you do not use the above strategy, it is possible that your job processes will share cores with other job processes. That may be okay, and is preferred for high throughput jobs, but is something to keep in mind. It is especially important to keep this in mind when using the orte parallel environment. There is more discussion of the orte parallel environment on the Advanced Job Submission page. In short, that parallel environment is used in node sharing scenarios, which implies potential core sharing as well. For MPI jobs, that is probably not what you want. As on previous systems, there is a parallel environment for requesting entire nodes. This is especially useful for MPI jobs to ensure the best performance.
New SGE utilities
While SoGE is very similar to previous versions of SGE, there are some new utilities that people may find of interest. There are manual pages for each of these.
...