...
Note that if you do not use the above strategy, it is possible that your job processes will share cores with other job processes. That may be acceptable, and even preferred for high-throughput jobs, but it is something to keep in mind. It is especially important when using the orte parallel environment, which is discussed further on the Advanced Job Submission page. In short, that parallel environment is used in node-sharing scenarios, which implies potential core sharing as well; for MPI jobs, that is probably not what you want. As on previous systems, there is a parallel environment (Xcpn, where X is the number of cores per node) for requesting entire nodes. This is especially useful for MPI jobs to ensure the best performance. Note that there are some additional parallel environments that are akin to the smp and Xcpn ones but are specialized for certain software packages. These are:
- gaussian-sm and gaussian_lindaX, where X is the number of cores per node
- turbomole_mpiX, where X is the number of cores per node
- wien2k-sm and wien2k_mpiX, where X is the number of cores per node
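As a sketch of how a whole-node request might look, the following Grid Engine job script asks for the Xcpn parallel environment on a 28-core node. The script name, module name, and application (`my_mpi_app`) are placeholders, and `28cpn` assumes 28-core nodes; substitute the values appropriate for your cluster.

```shell
#!/bin/bash
#$ -N mpi_job            # job name (placeholder)
#$ -cwd                  # run from the submission directory
#$ -pe 28cpn 56          # request entire 28-core nodes: 56 slots = 2 full nodes
#$ -l h_rt=04:00:00      # wall-clock limit (adjust as needed)

# Load the system-provided Open MPI (module name is an assumption)
module load openmpi

# $NSLOTS is set by Grid Engine to the number of slots granted (56 here)
mpirun -np $NSLOTS ./my_mpi_app
```

Because the Xcpn environment allocates nodes exclusively, no other jobs can land on those nodes, which avoids the core-sharing concern described above.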
For MPI jobs, the system-provided Open MPI does not bind processes to cores by default, even though binding is the normal Open MPI default. It is configured this way to avoid inadvertently oversubscribing cores. In addition, the system Open MPI settings map processes by socket, which should give a good process distribution in most cases. However, if you wish to use fewer than 28 processes per node in an MPI job, you may want to map by node instead to get the most even distribution of processes across nodes. You can do that with the --map-by node option to mpirun.
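To illustrate the difference, suppose a job holds two 28-core nodes but launches only 8 processes (the application name is a placeholder). With the system default of mapping by socket, consecutive ranks are spread across the sockets of the first node before the second node is used; mapping by node alternates ranks between the nodes:

```shell
# Default system behavior: ranks distributed round-robin across sockets,
# filling the first node's sockets before moving to the second node.
mpirun -np 8 ./my_mpi_app

# Map by node instead: ranks alternate between nodes
# (ranks 0,2,4,6 on node 1 and ranks 1,3,5,7 on node 2),
# giving each rank the most memory bandwidth and cores per node.
mpirun -np 8 --map-by node ./my_mpi_app
```

You can verify the resulting placement by adding --report-bindings to the mpirun command line, which prints each rank's host and binding at startup.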
...