The Argon HPC system is the latest HPC system of the University of Iowa. It consists of 229 compute nodes, each of which contains 28 2.4 GHz Intel Broadwell processor cores and runs CentOS 7.3 Linux. There are several compute node configurations,
...
Note that if you do not use the above strategy then it is possible that your job processes will share cores with other job processes. That may be acceptable, and even preferred for high-throughput jobs, but it is something to keep in mind. It is especially important to keep in mind when using the orte parallel environment; there is more discussion of the orte parallel environment on the Advanced Job Submission page. In short, that parallel environment is used in node-sharing scenarios, which implies potential core sharing as well. For MPI jobs, that is probably not what you want. As on previous systems, there is a parallel environment (56cpn) for requesting entire nodes. This is especially useful for MPI jobs to ensure the best performance.
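For an MPI job that should have whole nodes to itself, a submission script along the following lines can be used. This is a minimal sketch: the job name, the openmpi module name, and the executable are placeholders, and the slot counts assume the 56-slots-per-node layout described above.

```bash
#!/bin/bash
#$ -N mpi_job            # job name (placeholder)
#$ -cwd                  # run from the submission directory
#$ -pe 56cpn 112         # request two whole nodes (56 slots per node)

module load openmpi      # module name is an assumption for this site
# Launch one rank per physical core (28 per node, 56 total),
# leaving the hyperthread slots free to minimize contention.
mpirun -np 56 ./my_mpi_program
```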
Note that core binding is a soft request. If the binding cannot be done, the job will still run, provided it otherwise has the resources. This is particularly true on nodes shared by multiple jobs, where all of the physical cores may already be bound while slots remain available. The only way to assure binding is with dedicated nodes. However, core binding in and of itself may not boost performance much. Generally speaking, if you want to minimize contention with hardware threads, simply request twice as many slots as the number of cores your job will use. Even if the processes are not bound to cores, the OS scheduler will do a good job of minimizing contention.
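As a concrete illustration of that strategy, a threaded (smp) job that wants 16 cores to itself could request 32 slots, as in this sketch (the executable and thread-count mechanism are placeholders and will depend on your software):

```bash
#!/bin/bash
#$ -cwd
#$ -pe smp 32            # request twice as many slots as cores the job will use

# Run 16 threads; with 32 slots reserved, the OS scheduler can keep
# each thread on its own physical core even without explicit binding.
export OMP_NUM_THREADS=16
./my_threaded_program    # placeholder executable
```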
Core binding is handled differently depending on whether jobs can span multiple nodes, as in MPI, or not. For serial jobs, SGE will bind a process to the next available core. For parallel jobs within a node, i.e., the smp PE, SGE will bind to a number of cores equal to half the number of slots requested.
Info: You can modify binding attributes if you wish with the qsub -binding flag, or clear all default requested parameters with the qsub -clear flag.
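For example, the binding request can be adjusted at submission time along these lines (the job script name is a placeholder; see the qsub manual page for the full binding syntax):

```bash
# Bind the job's processes linearly to 4 consecutive cores on the execution host
qsub -binding linear:4 -pe smp 8 myjob.sh

# Clear all default requested parameters before applying your own
qsub -clear -pe smp 8 myjob.sh
```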
SGE is not currently able to properly set core binding for multi-node jobs. For jobs that span nodes with MPI, the system-provided openmpi will bind processes to cores by default. The binding parameters in that case can be overridden with parameters to mpirun. If your job does not use the system openmpi, or does not use MPI, then any desired core binding will need to be set up with whatever mechanism the software uses; otherwise, there will be no core binding. Again, that may not be a major issue. If your job does not work well with hyperthreading (HT), run on a number of cores equal to half the number of slots requested and the OS scheduler will minimize contention.
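If you do use the system openmpi, its default binding can be overridden with standard Open MPI options to mpirun, for example (exact option names can vary between Open MPI versions, so check mpirun's manual page on the system; the rank count and executable are placeholders):

```bash
# Disable Open MPI's core binding and let the OS scheduler place the ranks
mpirun --bind-to none -np 28 ./my_mpi_program

# Or bind each rank to a core and report the resulting binding map
mpirun --bind-to core --report-bindings -np 28 ./my_mpi_program
```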
New SGE utilities
While SoGE is very similar to previous versions of SGE, there are some new utilities that people may find of interest. There are manual pages for each of these.
...