...

  1. 40-core 96GB

  2. 40-core 192GB

  3. 56-core 128GB

  4. 56-core 256GB

  5. 56-core 512GB

  6. 64-core 192GB

  7. 64-core 384GB

  8. 64-core 768GB

  9. 80-core 96GB

  10. 80-core 192GB

  11. 80-core 384GB

  12. 80-core 768GB

  13. 80-core 1.5TB

  14. 112-core 256GB

  15. 112-core 512GB

  16. 112-core 1024GB

  17. 112-core 1.5TB

  18. 128-core 256GB

  19. 128-core 512GB

  20. 128-core 1TB

  21. 128-core 1.5TB

The Argon cluster is split between two data centers,

...

Most of the nodes in the LC data center are connected with the OmniPath high-speed interconnect fabric. The nodes in the ITF data center are connected with the InfiniPath fabric, with the latest nodes having a Mellanox Infiniband EDR fabric. There are two separate fabrics at ITF which do not interconnect; we refer to each of these fabrics as an island.

There are many machines with varying types of GPU accelerators:

  1. 21 machines with Nvidia P100 accelerators

  2. 2 machines with Nvidia K80 accelerators

  3. 2 machines with Nvidia P40 accelerators

  4. 17 machines with 1080Ti accelerators

  5. 19 machines with Titan V accelerators

  6. 14 machines with V100 accelerators

  7. 38 machines with 2080Ti accelerators

  8. 1 machine with RTX8000 accelerators

  9. 7 machines with A100 accelerators

  10. 5 machines with 4 A40 accelerators each

  11. 2 machines with 4 L40S accelerators each

  12. 1 machine with 4 L4 accelerators


Heterogeneity

While previous HPC cluster systems at UI have been very homogeneous, the Argon HPC system has a heterogeneous mix of compute node types. In addition to the variability in the GPU accelerator types listed above, there are also differences in CPU architecture. We generally follow Intel marketing names, with the most important distinction being the AVX (Advanced Vector Extensions) unit on the processor. The following table lists the processors in increasing generational order.

Architecture           AVX level   Floating Point Operations per cycle
Haswell/Broadwell      AVX2        16
Skylake Silver         AVX512      16 (1 AVX unit per processor core)
Skylake Gold           AVX512      32 (2 AVX units per processor core)
Cascadelake Gold       AVX512      32
Sapphire Rapids Gold   AVX512

Note that code must be optimized during compilation to take advantage of AVX instructions. The CPU architecture is important to keep in mind both in terms of potential performance and compatibility. For instance, code optimized for AVX512 instructions will not run on the Haswell/Broadwell architecture, because that architecture only supports AVX2, not AVX512. However, each successive generation is backward compatible, so code optimized with AVX2 instructions will run on Skylake/Cascadelake systems.
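For example, with a GCC toolchain (the compiler invocations and file names below are illustrative, not Argon-specific build instructions), the AVX level a binary targets is chosen at compile time:

    # Target AVX2: the resulting binary runs on Haswell/Broadwell and all newer nodes
    gcc -O3 -mavx2 -o myapp_avx2 myapp.c

    # Target AVX512: the resulting binary requires Skylake or newer nodes
    gcc -O3 -march=skylake-avx512 -o myapp_avx512 myapp.c

The AVX512 build will abort with an illegal-instruction error if it lands on a Haswell/Broadwell node, while the AVX2 build runs on every architecture listed above.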

...


Node memory (GB)   Job slots   Memory (GB) per slot
96                 40          2
96                 80          1
128                56          2
192                40          5
192                64          3
192                80          2
256                56          4
256                112         2
256                128         2
384                64          6
384                80          5
512                56          9
512                112         4
512                128         4
768                64          12
768                80          9
1024               112         9
1024               128         8
1536               80          19
1536               112         13
1536               128         12
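The last column is approximately the node memory divided by the number of job slots; for example, a 512 GB node scheduled with 128 job slots works out to 512 / 128 = 4 GB per slot.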


Using the Basic Job Submission and Advanced Job Submission pages as a reference, how would one submit jobs taking HT (hyperthreading) into account? For single-process, high-throughput jobs it probably does not matter; just request one slot per job. For multithreaded or MPI jobs, request one job slot per thread or process. So if your application runs best with 4 threads, then request something like the following.
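As a minimal sketch, assuming the SGE-style job scripts and the smp parallel environment described in those pages (the script contents and application name here are placeholders):

    #!/bin/bash
    # Request 4 job slots, one per thread of the application
    #$ -pe smp 4
    #$ -cwd
    export OMP_NUM_THREADS=4   # match the thread count to the requested slots
    ./my_threaded_app          # placeholder for the multithreaded executable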

...