Python

Available Python Environments

The HPC systems offer a variety of Python installs for users to choose from. Use the following command to see what is available:

module spider python

Note that the stacks do not all have the same extensions installed, although stacks of the same Python version should have the same extensions. Generally, we provide stacks built with the Intel compiler and MKL.

Also note that you should generally load a Python environment module before using Python in a job script. Otherwise you will get the system Python installation, which is part of the operating system but lacks many features because it is not maintained as research software.
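If you are unsure which interpreter a job actually picked up, Python can report it directly. The following is a minimal sketch. The helper names are hypothetical, and the assumption that the system Python lives under /usr/bin is typical for Linux but not guaranteed on Argon.

```python
import sys

def describe_interpreter():
    """Return the path and version of the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
    }

def looks_like_system_python(executable):
    # Assumption: the OS-provided Python is installed under /usr/bin.
    return executable.startswith("/usr/bin/")

if __name__ == "__main__":
    info = describe_interpreter()
    print("Interpreter:", info["executable"])
    print("Version:", info["version"])
    if looks_like_system_python(info["executable"]):
        print("Warning: this looks like the system Python; load a python module first.")
```

Running this at the top of a job script's Python code makes it easy to spot a missing "module load" in the job's log.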

Using MKL linked Python

If you are using a Python linked against MKL, you have the option of using OpenMP threads. This is controlled with the OMP_NUM_THREADS environment variable, which is set to 1 to avoid inadvertently overloading compute nodes. If your code performs better with threads, set that variable to the optimal number, and be sure to request a matching number of job slots from SGE. So, if your code performs best with 4 threads using MKL, set

export OMP_NUM_THREADS=4

and in the job script

#$ -pe smp 4
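To catch a mismatch between the thread count and the slots you requested, a script can compare OMP_NUM_THREADS with NSLOTS, the variable SGE sets to the number of granted slots inside a job. This is a minimal sketch; the helper name thread_slot_mismatch is hypothetical, and treating an unset OMP_NUM_THREADS as 1 mirrors the default described above.

```python
import os

def thread_slot_mismatch(env=None):
    """Return a warning string if OMP_NUM_THREADS exceeds the SGE slot count, else None."""
    env = os.environ if env is None else env
    threads = int(env.get("OMP_NUM_THREADS", "1"))  # default of 1, as on Argon
    slots = env.get("NSLOTS")  # set by SGE inside a job; absent on login nodes
    if slots is None:
        return None  # not running under SGE, nothing to compare against
    if threads > int(slots):
        return f"OMP_NUM_THREADS={threads} exceeds the {slots} slot(s) requested"
    return None

if __name__ == "__main__":
    warning = thread_slot_mismatch()
    if warning:
        print("Warning:", warning)
```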

Adding Python Modules

We have attempted to include the most commonly used scientific and data analysis modules in the centrally available Python environments. If there is a Python module you are interested in using, you can check whether it is available in one of our Python environments: load the desired Python environment, enter the Python help system, and list the installed modules (for example, by running help('modules') in the interpreter). Alternatively, look at the stack contents on the Wiki page Argon Software List - Python.
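Listing every module through the help system can be slow. If you only want to know whether a few specific modules are importable in the currently loaded environment, importlib can answer directly. A minimal sketch; the helper name is_available is hypothetical.

```python
import importlib.util

def is_available(module_name):
    """Return True if module_name can be imported by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None

if __name__ == "__main__":
    for name in ("numpy", "scipy", "mpi4py"):
        print(f"{name}: {'available' if is_available(name) else 'not found'}")
```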

If the module that you want is not available, then you may install it into your home directory. A few popular methods for doing this are described below.

Install the module into the $HOME/.local directory.

Using mpi4py as an example, the steps to do this are as follows:

1. First, determine whether there are any prerequisites. In this case, mpi4py requires a working MPI implementation, so choose the one that best fits your needs (hint: run the "module avail" command to see which MPI implementations are available).

2. Next, load the appropriate supporting modules: 

$ module load <choice-of-python>
$ module load <choice-of-mpi>

3. Untar the downloaded package source:

$ tar xzf mpi4py-1.3.1.tar.gz

4. Move into the package directory:

$ cd mpi4py-1.3.1

5. Build and install the package into your home directory:

$ python setup.py install --user

By default, the above steps install your Python module into $HOME/.local/lib/python<version>/site-packages. With Python 2.6 or later, you do not need to modify Python's search path unless you installed somewhere other than the default user location shown above. If you did use a custom install location, you will also need to add it to your PYTHONPATH.

Once this is complete, you may launch the python interpreter, enter the help system, and see if your module is now available. If the module is listed, then you may import and use it. 
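To see where --user installs go for the Python you have loaded, and whether that interpreter will search there, the standard site module can report it. A minimal sketch; the helper name user_site_info is hypothetical. Note that Python only adds the user directory to sys.path if it already exists, so on_sys_path can be False before your first --user install.

```python
import site
import sys

def user_site_info():
    """Report the per-user install location and whether Python will search it."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        "enabled": bool(site.ENABLE_USER_SITE),  # False inside an isolated virtual environment
        "on_sys_path": user_site in sys.path,    # False if the directory does not exist yet
    }

if __name__ == "__main__":
    for key, value in user_site_info().items():
        print(f"{key}: {value}")
```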

Using pip

Most Python packages are available on PyPI, the Python Package Index, and can be installed using pip. Pip mostly takes care of dependencies for you, although it has some weaknesses in this regard; see the pip User Guide (pip 8.1.2 documentation) for more information. For example, to install mpi4py into your home directory, run

pip install --user mpi4py

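That will download, build, and install the latest version into $HOME/.local. As with the manual method above, load your chosen Python and MPI modules first, since mpi4py needs an MPI implementation to build.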
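Invoking pip as "python -m pip" ensures the package is installed for the interpreter you currently have loaded, rather than for whichever pip happens to be first on your PATH. The following sketch builds that command and optionally runs it; the helper name pip_install_user is hypothetical.

```python
import subprocess
import sys

def pip_install_user(package, run=False):
    """Build (and optionally run) a --user pip install for the current interpreter."""
    cmd = [sys.executable, "-m", "pip", "install", "--user", package]
    if run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if pip fails
    return cmd

if __name__ == "__main__":
    print(" ".join(pip_install_user("mpi4py")))
```

Calling pip_install_user("mpi4py", run=True) performs the same install as the pip command above.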

Virtual Environment

This requires version 2.7.10 or later of the Python environment modules on Argon. For Python versions in stacks newer than stack/legacy, the separate py-pip and py-virtualenv modules are also required.

Beginning with version 3.5, Python recommends invoking the venv module to create a virtual environment. To adapt the example shown below, use "python -m venv" instead of the virtualenv command (and you don't need to load Argon's py-virtualenv environment module at all).

A better alternative to installing with Python's --user scheme ($HOME/.local on Linux) is to use Python virtual environments. Each virtual environment is a complete Python environment with its own dedicated python interpreter, pip (the installer), and package installation which you can easily modify independent of the system or any other environment. This is useful to test multiple versions independently as you work on projects; in general, it lets you isolate codebases with differing or conflicting requirements. To create a virtual environment and install packages:

  1. Load whichever python module version you prefer:

    module load python/2.7.10


  2. You'll probably make a few environments for testing and unrelated tasks, and it's common convention to create a directory to keep them organized:

    mkdir virtenvs

  3. Create the virtual environment. By default, the following command creates an environment that is isolated from any packages present outside itself, containing only the packages you install into it later:

    virtualenv virtenvs/someProject

    Note that the name you choose will be a directory which contains the configuration of your virtual environment, so name it as you would name a directory.


  4. Alternatively, if you want the new environment to include whatever packages are present outside itself (for example, those in the environment module you loaded), use the following:

    virtualenv --system-site-packages virtenvs/someProject

    This is useful if you want to use the MKL-linked numpy/scipy packages in your virtual environment. Note that if you use this option, some packages you try to install later will be unable to meet their dependency requirements unless you also use the "--upgrade" flag with pip.

  5. Activate the virtual environment:

    source virtenvs/someProject/bin/activate

    This modifies your shell session's environment to contain the Python environment described by this virtual environment configuration.

  6. At this point you can install whatever Python software you need without explicitly specifying your virtual environment, using either setup.py or pip as above. Omit the --user option, since the goal is to install into the environment rather than into $HOME/.local:

    pip install mpi4py

  7. After you are finished using or modifying the virtual environment, deactivate it with the following command:

    deactivate

    This modifies your shell session's environment to remove the active Python environment.
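With Python 3.5 or later, the creation step can also be done from Python itself using the standard-library venv module mentioned above, which is roughly what "python -m venv" does. This is a minimal sketch: the helper name make_env is hypothetical, the environment uses whichever Python runs the script (so load your preferred python module first), and the returned activation path assumes the Linux layout.

```python
import os
import venv

def make_env(path, inherit_site_packages=False, with_pip=True):
    """Create a virtual environment at path with the standard-library venv module.

    inherit_site_packages=True corresponds to the --system-site-packages option.
    """
    venv.create(
        path,
        system_site_packages=inherit_site_packages,
        with_pip=with_pip,  # bootstrap pip into the environment via ensurepip
    )
    # Activation script to source from your shell or job script (Linux layout).
    return os.path.join(path, "bin", "activate")

if __name__ == "__main__":
    print("source", make_env(os.path.join("virtenvs", "someProject"), inherit_site_packages=True))
```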

When you want to use the environment again later, for example to use it in a job script or modify its contents, simply source it again the same way to restore its previous state:

source virtenvs/someProject/bin/activate

With that, the environment is ready to use again. Packages can be added, removed, run, etc.

To use the virtual environment in a cluster job script, simply activate it there the same way.
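To confirm from inside a job that the activation took effect, your Python code can compare sys.prefix with sys.base_prefix: venv (and recent virtualenv releases) point sys.prefix at the environment while sys.base_prefix keeps the original installation, whereas older virtualenv releases set sys.real_prefix instead. A minimal sketch; the helper name in_virtual_environment is hypothetical.

```python
import sys

def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Python 2 has no sys.base_prefix, so fall back to sys.prefix there.
    base = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases mark environments with sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base

if __name__ == "__main__":
    print("virtual environment:", in_virtual_environment(), "prefix:", sys.prefix)
```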

Conda

Conda is a tool commonly used to install Python software along with dependencies and manage virtual environments on a laptop or workstation. You typically install Conda by downloading and running either the Miniconda installer, which installs the base Conda system so that you can then install any packages you need, or the Anaconda installer, which installs the base Conda system plus a large selection of popular software. For reference, see the Conda User Guide.

Note that Conda duplicates much of the functionality already present in the HPC environment, namely environment modules that provide Python and other software such as pip and virtualenv. Although it's possible to install Conda in your home account or even on a group shared drive, it works quite differently from the other tools in the HPC environment. If you need specific Python packages not currently available in the HPC Python environment modules, it's usually possible for the HPC staff to install them upon request.

A default installation of Anaconda includes certain software used for graphical logins. By default, Anaconda makes its own version of that software active at login, masking Argon's version and interfering with graphical logins using FastX; further information, including a workaround, is available here.

The following examples illustrate common Conda tasks.

You can create a Conda environment for Python 3 named "py3" like so:

conda create -n py3 python=3

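Once the Conda environment is created, you can activate it like so (newer Conda versions use "conda activate py3" instead):

source activate py3

Once you are finished working with the Conda environment, you can deactivate it like so (or "conda deactivate" in newer versions):

source deactivate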

Conda lets you install packages containing programs that are not otherwise available in your HPC home directory, for example ffmpeg and opencv. You can search for packages available via Conda like so:

conda search opencv

List the packages already installed in one of your Conda environments like so:

conda list -n py3

Remove a particular package from a particular environment like so:

conda remove --name py3 opencv

Remove an entire Conda environment like so:

conda remove --name py3 --all

To verify the environment has been removed, list all remaining environments like so:

conda info --envs

As with Python virtual environments, jobs that run within the Conda environment can access anything that has been installed in it.
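To check from Python which Conda environment (if any) a job is running in, you can read the environment variables that Conda's activation sets: CONDA_PREFIX holds the environment's directory and CONDA_DEFAULT_ENV its name in recent Conda versions. A minimal sketch; the helper name active_conda_env is hypothetical.

```python
import os

def active_conda_env(env=None):
    """Return (name, prefix) of the active Conda environment, or None if none is active."""
    env = os.environ if env is None else env
    prefix = env.get("CONDA_PREFIX")
    if not prefix:
        return None
    # CONDA_DEFAULT_ENV holds the name (e.g. "py3"); fall back to the prefix's last component.
    name = env.get("CONDA_DEFAULT_ENV") or os.path.basename(prefix.rstrip("/"))
    return name, prefix

if __name__ == "__main__":
    print("Active Conda environment:", active_conda_env())
```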

Matlab engine for Python

Each version of Matlab is supplied with a version of the Matlab engine for Python compatible only with certain versions of Python. Therefore you'll need to find a compatible pairing of Matlab and Python module versions. Mutually compatible combinations should normally occur within each version of the stack. The following is an example method for installing the engine into a virtual environment using versions available as of stack/2022.1:

module purge
module load stack/2022.1-base_arch
module load py-virtualenv
module load matlab/R2021b
virtualenv someproject
source someproject/bin/activate
rm -rf /localscratch/Users/$(whoami)/buildmatlabengine
mkdir /localscratch/Users/$(whoami)/buildmatlabengine
cd ${ROOT_MATLAB}/extern/engines/python
python setup.py build -b /localscratch/Users/$(whoami)/buildmatlabengine install
rm -rf /localscratch/Users/$(whoami)/buildmatlabengine
deactivate
module purge

Note that the standard installation method provided by Matlab through (at least) version R2021b uses an older way of installing Python software that does not involve pip. Some versions of pip made extra efforts to accommodate installations using this method, but this functionality was problematic and was eventually removed from newer versions of pip. Unless a later version of Matlab explicitly uses pip, the most reliable (and likely the only effective) installation method will remain "python setup.py ...", and you should install into a virtual environment so the engine does not interfere with other sets of Python software you install for later projects.

In your job script (or your qlogin session, if you're exploring a dataset interactively), activate the required modules and virtual environment the normal way before invoking your Python code (or the python interpreter in your qlogin session):

module purge
module load stack/2022.1
module load py-virtualenv
module load matlab/R2021b
source someproject/bin/activate
# Do Python stuff, use matlab.engine
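Once the modules and virtual environment above are active, the engine is used through the matlab.engine package. The following minimal sketch, assuming the engine was installed into the environment as shown earlier, starts a MATLAB session, calls MATLAB's sqrt, and always shuts the session down; the helper name matlab_sqrt is hypothetical, and it returns None when the engine is not installed.

```python
def matlab_sqrt(x):
    """Start a MATLAB session via the engine, compute sqrt(x), and shut it down."""
    try:
        import matlab.engine  # provided by the engine installed above
    except ImportError:
        return None  # engine not installed in this environment
    eng = matlab.engine.start_matlab()
    try:
        return eng.sqrt(float(x))  # MATLAB's sqrt, returned as a Python float
    finally:
        eng.quit()  # always shut down the MATLAB process, even on error

if __name__ == "__main__":
    print(matlab_sqrt(4.0))
```

Starting MATLAB is slow, so in real code you would typically start one session and reuse it for many calls rather than starting one per call as this sketch does.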

Building your own complete Python environment

Note that it is also possible to build your own Python environment from scratch, and modify it exactly to your needs. In addition to the packages available from Python.org, there are also pre-packaged versions available which include many oft-used scientific and analytic modules already built in. Two of the most popular of these are Enthought's Canopy, and Anaconda from Continuum Analytics. Both offer free versions that you may download and install. They offer the additional value of modifying your path for you, and also offer feature-rich packaging systems which make adding additional modules painless.