Python
Available Python Environments
The HPC systems offer a variety of Python installs for users to choose from. Use the following command to see what is available:
module spider python
Note that the stacks are not equal in terms of extensions installed, although stacks of the same python version should have the same extensions. Generally, we provide stacks built with the Intel compiler and MKL.
Also note that you should generally load a Python environment module before using Python in a job script. Otherwise you will get the system Python installation, which is maintained as part of the operating system rather than as research software and therefore lacks many features.
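As a quick check of which installation you are actually getting, the following minimal sketch (assuming Python 3; the function name describe_interpreter is only illustrative) reports the interpreter path and version from inside Python:

```python
import sys


def describe_interpreter():
    """Return the path, version, and prefix of the running Python interpreter."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    return {
        "executable": sys.executable,
        "version": version,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    # A path under /usr/bin usually indicates the system Python rather than
    # one provided by an environment module (an assumption about the layout).
    for key in ("executable", "version", "prefix"):
        print("{:>10}: {}".format(key, info[key]))
```

Running this before and after "module load" of a Python environment should show the executable path change if the module took effect.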
Using MKL linked Python
If you are using a Python linked against MKL, you have the option of using OpenMP threads. This is controlled with the OMP_NUM_THREADS environment variable. To avoid inadvertently overloading compute nodes, it is set to 1 by default. If your code performs better with threads, set that variable to the optimal number. Be sure to request a matching number of job slots from SGE when you change it. So, if your code performs best with 4 threads and uses MKL, set
OMP_NUM_THREADS=4
and in the job script
#$ -pe smp 4
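If you want your Python code to confirm that the thread count matches the slots the job was granted, something like the following sketch may help. It assumes Python 3 and that SGE exports the granted slot count as NSLOTS inside the job; the helper name resolve_thread_count is hypothetical.

```python
import os


def resolve_thread_count(env=None):
    """Return the thread count to use, capped by the number of SGE slots.

    Reads OMP_NUM_THREADS (default 1) and NSLOTS (default 1, which is
    assumed to be what applies outside of a job) from the environment.
    """
    env = os.environ if env is None else env
    requested = int(env.get("OMP_NUM_THREADS", "1"))
    slots = int(env.get("NSLOTS", "1"))
    # Never use more threads than slots, and never fewer than one.
    return max(1, min(requested, slots))


if __name__ == "__main__":
    print("Using", resolve_thread_count(), "thread(s)")
```

Capping at the slot count is a design choice for this sketch: it keeps a job from oversubscribing a node if OMP_NUM_THREADS was raised without a matching "-pe smp" request.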
Adding Python Modules
We have attempted to add the most commonly known scientific and data analysis modules to the centrally available Python environments. If there is a Python module that you are interested in using, you can check whether it is available in one of our Python environments. To do so, load the desired Python environment, enter the Python help system, and list the installed modules. Alternatively, look at the stack contents on the wiki page "Argon Software List - Python".
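In the interactive interpreter, help("modules") prints the installed modules. For a scripted check, a minimal sketch along the following lines could be used instead (assuming Python 3; the function names are illustrative, not part of any Argon tooling):

```python
import importlib.util
import pkgutil


def is_module_available(name):
    """Return True if a top-level module or package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def list_top_level_modules():
    """Return the sorted names of top-level modules found on sys.path.

    Modules compiled into the interpreter itself (such as sys) are not
    listed by pkgutil.iter_modules().
    """
    return sorted({module.name for module in pkgutil.iter_modules()})


if __name__ == "__main__":
    for candidate in ("numpy", "scipy", "mpi4py"):
        state = "available" if is_module_available(candidate) else "missing"
        print("{}: {}".format(candidate, state))
```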
If the module that you want is not available, then you may install it into your home directory. A few popular methods for doing this are described below.
Install the module into the $HOME/.local directory.
Using mpi4py as an example, the steps to do this are as follows:
1. First, figure out if there are any prerequisites that need to be met. In this case, mpi4py requires a working MPI, so you must choose the one that best fits your needs (hint: you can invoke the "module avail" command to see which MPI implementations are available).
2. Next, load the appropriate supporting modules:
$ module load <choice-of-python>
$ module load <choice-of-mpi>
3. Untar the package:
$ tar xzf mpi4py-1.3.1.tar.gz
4. Move into the package directory:
$ cd mpi4py-1.3.1
5. Build the package locally:
$ python setup.py install --user
By default, the above steps will install your local Python module into $HOME/.local/lib/<python-version>. If you are using a Python version later than 2.6, you do not need to modify Python's search path unless you used something other than the default user install path illustrated above. If you did change the install path, you will also need to add your custom location to PYTHONPATH.
Once this is complete, you may launch the python interpreter, enter the help system, and see if your module is now available. If the module is listed, then you may import and use it.
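To see where Python expects user-installed packages and whether that location is on the search path, a short check like this sketch can be run (assuming Python 3; user_site_status is a hypothetical helper name):

```python
import site
import sys


def user_site_status():
    """Report the per-user site-packages directory and whether it is in use."""
    user_site = site.getusersitepackages()
    return {
        "user_site": user_site,
        # ENABLE_USER_SITE may be True, False, or None; coerce it to a bool.
        "enabled": bool(site.ENABLE_USER_SITE),
        # The directory is only added to sys.path if it exists at startup.
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print("{:>12}: {}".format(key, value))
```

If "on_sys_path" is False after an install with --user, it may be that the directory did not exist when the interpreter started, or that a custom install path was used and needs to be added to PYTHONPATH.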
Using pip
Most Python packages are available on PyPI, the Python Package Index, and can be installed using pip. This will mostly take care of dependencies for you, but pip has some weaknesses in this regard. See the pip User Guide (pip 8.1.2 documentation) for more information. To install mpi4py this way, you would run
pip install --user mpi4py
That will download, build, and install the latest version.
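To confirm which version pip actually installed, you can query the package metadata from Python. This is a minimal sketch assuming Python 3.8 or later (where importlib.metadata is in the standard library); installed_version is an illustrative name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("mpi4py")
    if version is None:
        print("mpi4py is not installed in this environment")
    else:
        print("mpi4py", version)
```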
Virtual Environment
This requires version 2.7.10 or later of the python environment modules on Argon. For python versions in stack versions newer than stack/legacy, the separate py-pip and py-virtualenv modules are also required.
Beginning with version 3.5, Python recommends invoking the venv module to create a virtual environment. To adapt the example shown below, use "python -m venv" instead of the virtualenv command (and you don't need to load Argon's py-virtualenv environment module at all).
A better alternative to installing with Python's --user scheme ($HOME/.local on Linux) is to use Python virtual environments. Each virtual environment is a complete Python environment with its own dedicated python interpreter, pip (the installer), and package installation which you can easily modify independent of the system or any other environment. This is useful to test multiple versions independently as you work on projects; in general, it lets you isolate codebases with differing or conflicting requirements. To create a virtual environment and install packages:
Load whichever python module version you prefer:
module load python/2.7.10
You'll probably make a few environments for testing and unrelated tasks, and it's common convention to create a directory to keep them organized:
mkdir virtenvs
The following command creates a virtual environment that is isolated from any packages present outside itself and contains only the packages you install into it later:
virtualenv virtenvs/someProject
Note that the name you choose will be a directory which contains the configuration of your virtual environment, so name it as you would name a directory.
Alternately, if you want the new environment to include whatever packages are present outside itself (for example, those in the environment module you loaded), use the following:
virtualenv --system-site-packages virtenvs/someProject
This is useful if you want to use the MKL linked numpy/scipy packages in your virtual environment. Note that if you use this option, some packages you try to install later will be unable to meet their dependency requirements unless you also use the "--upgrade" flag with pip.
Activate the virtual environment:
source virtenvs/someProject/bin/activate
This modifies your shell session's environment to contain the Python environment described by this virtual environment configuration.
At this point you can install whatever Python software you need without explicitly specifying your virtual environment. This can be done with either setup.py or pip, just as above:
pip install mpi4py
After you are finished using or modifying the virtual environment, deactivate it with the following command:
deactivate
This modifies your shell session's environment to remove the active Python environment.
When you want to use the environment again later, for example to use it in a job script or modify its contents, simply source it again the same way to restore its previous state:
source virtenvs/someProject/bin/activate
With that, the environment is ready to use again. Packages can be added, removed, run, etc.
To use the virtual environment in a cluster job script, simply activate it there the same way.
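For Python 3, the same workflow can be sketched with the standard library's venv module, which is what "python -m venv" runs. The example below is only an illustration under that assumption: create_environment and in_virtual_environment are hypothetical helper names, and the path virtenvs/someProject simply mirrors the one used above.

```python
import os
import sys
import venv


def create_environment(path, system_site_packages=False, with_pip=True):
    """Create a virtual environment at `path` and return its pyvenv.cfg path.

    system_site_packages=True corresponds to the --system-site-packages
    option described above.
    """
    builder = venv.EnvBuilder(
        system_site_packages=system_site_packages,
        with_pip=with_pip,
        symlinks=(os.name != "nt"),  # symlink the interpreter on POSIX systems
    )
    builder.create(path)
    return os.path.join(path, "pyvenv.cfg")


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases set sys.real_prefix instead of base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_environment())
```

Calling create_environment("virtenvs/someProject") would build the environment; calling in_virtual_environment() at the top of a job's Python script is one way to fail early if the activate step was forgotten in the job script.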
Conda
Conda is a tool commonly used to install Python software along with dependencies and manage virtual environments on a laptop or workstation. You typically install Conda by downloading and running either the Miniconda installer, which installs the base Conda system so that you can then install any packages you need, or the Anaconda installer, which installs the base Conda system plus a large selection of popular software. For reference, see the Conda User Guide.
Note that Conda provides much of the same functionality already present in the HPC environment, namely environment modules which provide Python and other software such as pip and virtualenv. Therefore, although it's possible to install Conda in your home account or even a group shared drive, it's fundamentally a different tool compared to others present in the HPC environment. If you need specific Python packages not currently available in the HPC Python environment modules, it's usually possible for the HPC staff to install them upon request.
A default installation of Anaconda includes certain software used for graphical logins. By default it makes the Anaconda version active during login, masking Argon's own version and interfering with graphical logins using FastX; further info here, including workaround.
The following examples illustrate common Conda tasks.
You can create a Conda environment for Python 3 named "py3" like so:
conda create -n py3 python=3
Once the Conda environment is created, you can activate it like so (Conda 4.4 and later also accept "conda activate py3"):
source activate py3
Once you are finished working with the Conda environment, you can deactivate it like so (Conda 4.4 and later also accept "conda deactivate"):
source deactivate
Conda lets you install various packages containing programs which are not otherwise available in your HPC home directory, for example ffmpeg and opencv. You can search for packages available via Conda like so:
conda search opencv
List the packages already installed in one of your Conda environments like so:
conda list -n py3
Remove a particular package from a particular environment like so:
conda remove --name py3 opencv
Remove an entire Conda environment like so:
conda remove --name py3 --all
To verify the environment has been removed, list all remaining environments like so:
conda info --envs
Similar to Python virtual environments, jobs submitted within the Conda environment can access anything that has been installed within the environment.
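To verify from inside a job that the expected Conda environment is active, a script can inspect the variables that Conda's activation sets. This is a sketch assuming Python 3 and that activation exports CONDA_DEFAULT_ENV and CONDA_PREFIX; the function name is illustrative.

```python
import os


def active_conda_environment(env=None):
    """Return the active Conda environment's name and prefix, or None."""
    env = os.environ if env is None else env
    name = env.get("CONDA_DEFAULT_ENV")
    prefix = env.get("CONDA_PREFIX")
    if not name and not prefix:
        return None
    return {"name": name, "prefix": prefix}


if __name__ == "__main__":
    active = active_conda_environment()
    if active is None:
        print("No Conda environment appears to be active")
    else:
        print("Active Conda environment:", active["name"], "at", active["prefix"])
```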
Matlab engine for Python
Each version of Matlab is supplied with a version of the Matlab engine for Python compatible only with certain versions of Python. Therefore you'll need to find a compatible pairing of Matlab and Python module versions. Mutually compatible combinations should normally occur within each version of the stack. The following is an example method for installing the engine into a virtual environment using versions available as of stack/2022.1:
module purge
module load stack/2022.1-base_arch
module load py-virtualenv
module load matlab/R2021b
virtualenv someproject
source someproject/bin/activate
rm -rf /localscratch/Users/$(whoami)/buildmatlabengine
mkdir /localscratch/Users/$(whoami)/buildmatlabengine
cd ${ROOT_MATLAB}/extern/engines/python
python setup.py build -b /localscratch/Users/$(whoami)/buildmatlabengine install
rm -rf /localscratch/Users/$(whoami)/buildmatlabengine
deactivate
module purge
In your job script (or your qlogin session, if you're exploring a dataset interactively), activate the required modules and virtual environment the normal way before invoking your Python code (or the python interpreter in your qlogin session):
module purge
module load stack/2022.1
module load py-virtualenv
module load matlab/R2021b
source someproject/bin/activate
# Do Python stuff, use matlab.engine
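Once the environment is active, the Python side might look like the following sketch. It assumes the engine was installed as described above and uses matlab.engine.start_matlab() and quit() from the MATLAB Engine API for Python; the helper names matlab_engine_installed and matlab_sqrt are hypothetical.

```python
import importlib.util


def matlab_engine_installed():
    """Return True if the top-level `matlab` package can be found."""
    return importlib.util.find_spec("matlab") is not None


def matlab_sqrt(value):
    """Start a MATLAB session, compute sqrt(value) there, and shut it down."""
    import matlab.engine  # imported lazily so this file loads without MATLAB

    engine = matlab.engine.start_matlab()
    try:
        return engine.sqrt(float(value))
    finally:
        engine.quit()


if __name__ == "__main__":
    if matlab_engine_installed():
        print(matlab_sqrt(16.0))
    else:
        print("matlab.engine not found; activate the virtual environment first")
```

Starting MATLAB can take a while, so in real code it is usually better to start one engine and reuse it for many calls rather than starting a session per call as this sketch does.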
Building your own complete Python environment
Note that it is also possible to build your own Python environment from scratch, and modify it exactly to your needs. In addition to the packages available from Python.org, there are also pre-packaged versions available which include many oft-used scientific and analytic modules already built in. Two of the most popular of these are Enthought's Canopy, and Anaconda from Continuum Analytics. Both offer free versions that you may download and install. They offer the additional value of modifying your path for you, and also offer feature-rich packaging systems which make adding additional modules painless.