Compiling Software

The HPC system contains the standard GNU compilers that come stock with the respective versions of CentOS. In addition, newer versions of the GNU compiler suite are installed in order to make use of advanced features. The newer versions of the GNU compilers are loaded via environment modules.

General information can be found in the GCC manual: http://gcc.gnu.org/onlinedocs/gcc/.

Intel compilers

The Intel compiler suite is installed on the ITS-RS HPC system.

The Intel compilers are accessed via environment modules as well. There are some idiosyncrasies with Intel's tools to be aware of.

  1. Intel provides scripts to source to set up the environment. The environment modules do the same thing, with the added benefit of allowing one to load, unload, and switch environments. Intel's scripts do not support that, but some people prefer not to use environment modules. Whichever mechanism you choose, do not use both: if using environment modules, do not source the Intel environment scripts.
  2. Loading the “intel” environment will load the MKL environment distributed with the compilers. Intel’s naming scheme prevents this association from being obvious. The reason for having a separate mkl module, and Intel’s separate mkl script, is because MKL can be used with other compilers as well.

A guide to using Fortran can be downloaded here: A Fortran Tutorial

See the Intel site for information on their software. The links change frequently so are not listed here. There are also manual pages available once the respective environment module is loaded.

Optimization

To get the best performance out of your software you can pass optimization flags during compilation. Note that some projects already set optimization flags in their build systems; those may or may not be the best choices for the hardware. Also note that the Argon system has multiple types of hardware, so that needs to be factored in when making decisions about optimization. The compiler documentation is the best source of information on optimization options. Very fine-grained control is possible, but the -O flag, with a level specifier, is a good way to turn on sets of optimization features. The levels are typically -O1, -O2, and -O3, with an increasing level of aggressiveness; there is also a -O0 level to turn off optimization at certain points in the build. Using the -O flag will usually make a substantial difference in the performance of the code at run time. Note that it is possible to over-optimize: a higher optimization level can lead to slower performance and/or instability, and in some cases higher levels can prevent the code from compiling at all.
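As a concrete sketch of the -O levels (assuming gcc is on the PATH; the file and binary names here are just examples, and the same flags are accepted by the Intel compilers):

```shell
# Build the same small program at several optimization levels.
cat > sum.c <<'EOF'
#include <stdio.h>

int main(void)
{
    double s = 0.0;
    for (int i = 1; i <= 1000000; i++)
        s += 1.0 / i;            /* harmonic sum, to give the optimizer work */
    printf("%.4f\n", s);
    return 0;
}
EOF
gcc -O0 sum.c -o sum_O0    # optimization off, easiest to debug
gcc -O2 sum.c -o sum_O2    # good general-purpose level
gcc -O3 sum.c -o sum_O3    # most aggressive standard level
./sum_O2
```

The numerical result is the same at every level here; what changes is the generated machine code and run time, which you can compare with a tool such as time(1) on a larger workload.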

SIMD (Single Instruction Multiple Data)

Modern CPUs provide vector units that execute SIMD (Single Instruction Multiple Data) instructions, which compilers can generate from ordinary code. The processor that the resulting code runs on must have the SIMD unit capability that was targeted during compilation. If that is not the case then the code will fail when it executes a SIMD instruction on a processor that does not have that SIMD unit in its feature set. This means that the highest levels of optimization can be obtained for specific hardware, with the tradeoff that the code will not be portable across architectures. There has been a steady progression of enhanced SIMD units through the years:

  1. SSE
  2. SSE2
  3. SSE3
  4. SSSE3
  5. SSE4.1
  6. SSE4.2
  7. AVX
  8. AVX2
  9. AVX512

Fortunately, those are backwards compatible so code compiled with AVX instructions will run on a machine that contains an AVX2 unit. Often, the easiest way to compile code to use SIMD extensions is to use a compiler flag that turns on hardware specific optimizations. This is sometimes specified in the build system in software projects, which will effectively optimize the code for the specific hardware that is detected at build time. For the Intel compilers, that option is -xHost and for the GNU compilers it is -march=native. Doing this is fine if all of the machines that you will run the code on are identical. However, on a system like Argon, with multiple CPU architectures, use of those flags is discouraged.
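One way to see which of those SIMD units a given node supports is to inspect the CPU flags the kernel reports. This is Linux-specific; the flag names are lowercase, and AVX512 support shows up as avx512f, avx512dq, and so on:

```shell
# List the SIMD-related feature flags advertised by the CPU on this node.
grep -o -w -E 'sse2|ssse3|sse4_1|sse4_2|avx|avx2|avx512f' /proc/cpuinfo | sort -u
```

Running this on each type of node you intend to use tells you which common target is safe to compile for.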

There are a few strategies to deal with this issue.

  1. If you are compiling your code with the Intel compiler then you can build a multi-dispatch binary, which contains code for multiple SIMD units and picks the correct one at run time. For example, to compile a multi-dispatch binary with the Intel compiler for use on systems that have an AVX2 unit as well as on systems that only have an AVX unit, you could use the following flag. 

    -axCORE-AVX2,AVX

    The GNU GCC compilers do not have an equivalent flag. Newer versions of GCC do provide a function multiversioning capability, which can likewise build multiple SIMD variants of a routine, but the targets have to be made explicit in the source code via function attributes, as opposed to having the compiler find them.

  2. Compile for the least common denominator.
    If there are a set of machines that your code will need to run on, some with AVX2 and some with only AVX, then compile the code for the AVX target. That can be accomplished with the -mavx optimization flag. This would be particularly useful with the GNU compilers.

  3. Maintain multiple compiled versions of the code.
    In this case, compile one version with the AVX target and one version with the AVX2 target. Then the appropriate binary would need to be manually selected depending on the host CPU architecture.

MPI

There are several instances of OpenMPI that are built with different compilers. To compile MPI programs, use the OpenMPI compiler wrappers, such as mpicc, mpicxx, and mpif90, as described here.

Valgrind

Valgrind is a program for debugging and profiling Linux executables. It consists of a core, which provides a synthetic CPU in software, and a series of debugging and profiling tools. The architecture is modular, so that new tools can be created easily and without disturbing the existing structure. One of the most popular uses of Valgrind is to check for memory leaks using the Memcheck tool, as it can detect memory-related issues in C and C++ programs.

If you normally run your program like this:

myprog arg1 arg2

Using Valgrind would look like this:

valgrind --leak-check=yes myprog arg1 arg2

This gets you started with the most popular Valgrind tool, Memcheck. More information about this and other tools is available via an extensive man page (man valgrind) and the Valgrind Quick Start Guide.

OCaml

OCaml is an implementation of the Caml language, and emphasizes error checking, memory management, and error recovery. Programs are verified by the compiler before they can be executed, which rules out many errors. OCaml's toolset includes an interactive toplevel interpreter, a bytecode compiler, and an optimizing native code compiler. It has a large standard library that makes it useful for many of the same applications as Python or Perl, as well as robust modular and object-oriented programming constructs that make it applicable for large-scale software engineering. OCaml provides a command line utility, "ocaml", which permits the interactive use of the OCaml system through a read-eval-print loop. Use "man ocaml" to learn more about this tool.

In addition to the OCaml reference guide, the ocaml.org Web site has several useful tutorials and also a convenient OCaml Cheat Sheet.