Python
Available Python Environments
The HPC systems offer a variety of Python installs from which users can choose. Use the following command to see what is available:
$ module spider python
Note that the stacks are not equal in terms of extensions installed, although stacks of the same Python version should have the same extensions. Generally, we provide stacks built with the Intel compiler and MKL.
Also, note that you should generally load a Python environment module before using Python in a job script. Otherwise, you will get the system Python installation, which lacks many features because it is maintained as part of the operating system rather than as research software.
Using MKL-linked Python
If you are using a Python linked against MKL, you have the option of using OpenMP threads. This is controlled with the OMP_NUM_THREADS environment variable. To avoid inadvertently overloading compute nodes, this is set to 1 by default. If your code performs better with threads, set that variable to the optimal number. Be sure to request a matching number of job slots from SGE when adjusting it. So, if your code performs best with 4 threads and uses MKL, set
OMP_NUM_THREADS=4
and in the job script
#$ -pe smp 4
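As a minimal sketch of keeping the thread count in step with the slot request, a Python script can read the slot count that SGE exports as NSLOTS and set OMP_NUM_THREADS from it before importing any MKL-linked package. The helper name choose_thread_count is hypothetical, and the fallback of 1 simply mirrors the default described above.

```python
import os


def choose_thread_count(environ, default=1):
    """Return the thread count to use, based on SGE's NSLOTS variable.

    Falls back to `default` when NSLOTS is missing or not a positive integer.
    """
    try:
        slots = int(environ.get("NSLOTS", ""))
    except ValueError:
        return default
    return slots if slots > 0 else default


# Set the variable before importing NumPy/SciPy so that the threaded
# libraries see it when they initialize.
threads = choose_thread_count(os.environ)
os.environ["OMP_NUM_THREADS"] = str(threads)
print("Using", threads, "OpenMP thread(s)")
```

With this approach, changing the "-pe smp" request in the job script is the only adjustment needed; outside of a job, where NSLOTS is unset, the script stays at one thread.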
Adding Python Modules
We have attempted to add the most commonly used scientific and data analysis modules to the centrally available Python environments. If there is a Python module that you are interested in using, you can check whether it is available in one of our Python environments. To do so, load the desired Python environment, enter the Python help system, and list the installed modules. Alternatively, look at the stack contents on the wiki page "Argon Software List - Python".
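As an alternative to reading through the help system's module list, a short script can test for specific packages directly. This is a minimal sketch using the standard library's importlib.util.find_spec; the helper name is_installed and the package names checked are illustrative only.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing,
        # or for modules with a malformed spec.
        return False


# Example package names; replace them with the ones you care about.
for name in ("numpy", "scipy", "mpi4py"):
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

Run it after loading the Python environment module you intend to use, since the result depends on which interpreter executes the script.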
If the module that you want is not available, then you may install it into your home directory. A few popular methods for doing this are described below.
Install the module into the $HOME/.local directory
Using mpi4py as an example, the steps to do this are as follows:
First, figure out if there are any prerequisites that need to be met. In this case, mpi4py requires a working MPI, so you must choose the one that best fits your needs (hint: you can invoke the "module avail" command to see which MPI implementations are available).
Next, load the appropriate supporting modules:
$ module load <choice-of-python>
$ module load <choice-of-mpi>
Untar the package:
$ tar xzf mpi4py-1.3.1.tar.gz
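Move into the package directory:
$ cd mpi4py-1.3.1
Build the package locally and install it with the user scheme:
$ python setup.py build
$ python setup.py install --user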
By default, the above steps will install your local Python module into $HOME/.local/lib/<python-version>. If you are using a Python version later than 2.6, you do not need to modify Python's search path unless you used something other than the default user install path illustrated above. If you did modify the install path, you will also need to add your custom location to PYTHONPATH, for example by setting it in your .bash_profile. (If your default shell is not bash, you will need to modify the corresponding rc script, such as .kshrc, .tcshrc, or .zshrc.)
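To confirm where the user scheme installs packages for the Python you have loaded, and whether that location is on the search path, the standard site module can be queried. This is a small sketch; the exact path printed depends on the Python version and your account.

```python
import site
import sys

# Directory used by the per-user install scheme for this interpreter,
# typically somewhere under $HOME/.local on Linux.
user_site = site.getusersitepackages()
print("User site-packages:", user_site)

# The directory is only added to sys.path if it exists and user-site
# handling is enabled (it is disabled in isolated virtual environments).
print("User site enabled:", site.ENABLE_USER_SITE)
print("On sys.path:", user_site in sys.path)
```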
Once complete, you may launch the Python interpreter, enter the help system, and see if your module is now available. If the module is listed, then you may import and use it.
Using pip
Most Python packages are available on PyPI, the Python Package Index, and can be installed using pip. Pip will mostly take care of dependencies for you, but it has some weaknesses in this regard. See the pip User Guide for more information. To install a package into your home directory, you would run
$ pip install --user <package-name>
That will download, build, and install the latest version.
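When pip is driven from a script, one common way to make sure it belongs to the same interpreter as the loaded Python module is to invoke it as "python -m pip". The sketch below only builds and prints such a command; the helper names are hypothetical, and "mpi4py" is used purely as an example package.

```python
import subprocess
import sys


def pip_install_command(package, user=True):
    """Build a pip command that runs under the current interpreter."""
    command = [sys.executable, "-m", "pip", "install"]
    if user:
        command.append("--user")  # install with the per-user scheme
    command.append(package)
    return command


def pip_install(package, user=True):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package, user), check=True)


# Show the command without running it.
print(" ".join(pip_install_command("mpi4py")))
```

Calling pip_install("mpi4py") would actually perform the installation, so it is left out of the example run.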
Virtual Environment
This requires version 2.7.10 or later of the Python environment modules on Argon. For Python versions in stack versions newer than stack/legacy, the separate py-pip and py-virtualenv modules are also required.
Beginning with version 3.5, Python recommends invoking the venv module to create a virtual environment. To adapt the example shown below, use "python -m venv" instead of the virtualenv command (and you don't need to load Argon's py-virtualenv environment module at all).
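For Python 3 stacks, the venv module can also be driven from Python rather than from the command line. The sketch below creates a throwaway environment in a temporary directory and reads back its configuration file. The directory name demo-env and the helper names are hypothetical, pip is left out (with_pip=False) only to keep the example quick, and the use_module_packages parameter corresponds to venv's --system-site-packages option.

```python
import os
import tempfile
import venv


def create_env(env_dir, use_module_packages=False):
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    venv.create(env_dir, system_site_packages=use_module_packages, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


def read_config(cfg_path):
    """Parse the simple 'key = value' lines of pyvenv.cfg into a dict."""
    config = {}
    with open(cfg_path) as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep:
                config[key.strip()] = value.strip()
    return config


with tempfile.TemporaryDirectory() as scratch:
    cfg = read_config(create_env(os.path.join(scratch, "demo-env")))
    print("include-system-site-packages =", cfg["include-system-site-packages"])
```

For real work you would create the environment in a persistent directory and normally keep pip enabled, which is what "python -m venv" does by default.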
A better alternative to installing with Python's --user scheme ($HOME/.local on Linux) is to use Python virtual environments. Each virtual environment is a complete Python environment with its own dedicated Python interpreter, pip (the installer), and package installation, which you can modify independently of the system or any other environment. This is useful for testing multiple versions as you work on projects; in general, it lets you isolate codebases with differing or conflicting requirements. To create a virtual environment and install packages:
Check the Python module in the stack environment
Check which stack provides your preferred Python version, then load that stack and the Python module:
You'll probably make a few environments for testing and unrelated tasks, and it's common convention to create a directory to keep them organized:
By default, this will create a virtual environment that will be isolated from any packages present outside itself, including only the packages you install into it later:
Note that the name you choose will be a directory containing the configuration of your virtual environment, so name it as you would a directory.
Alternatively, if you want the new environment to include whatever packages are present outside itself (for example, those in the environment module you loaded), pass the --system-site-packages option when creating it. This is useful if you want to use the MKL-linked numpy/scipy packages in your virtual environment. Note that if you use this option, some packages you try to install later will be unable to meet their dependency requirements unless you also use the "--upgrade" flag with pip (see the further comments on this option in Python Virtual Environments).
Activate the virtual environment by sourcing the activate script inside its directory:
$ source <environment-directory>/bin/activate
This modifies your shell session's environment to contain the Python environment described by this virtual environment configuration.
The command prompt in your Terminal will change to indicate the active environment. It will look like the following:
At this point, you can install whatever Python software you need without explicitly specifying your virtual environment, using either setup.py or pip, just as above. Before installing packages in this virtual environment, it's helpful to ensure pip, setuptools, and wheel are up to date:
$ pip install -U pip setuptools wheel
At this step, the environment is ready for you to install the packages that you need.
The -U option upgrades all specified packages to the newest available version. Omit this option if you don't want to upgrade packages. For more options with pip install, see the pip documentation.
After you are finished using or modifying the virtual environment, deactivate it with the following command:
$ deactivate
This modifies your shell session's environment to remove the active Python environment, and the “(someProject)” label disappears from your prompt.
When you want to use the environment again later, for example, to use it in a job script or modify its contents, simply source its activate script again, the same way as in the activation step (step 5), to restore its previous state. With that, the environment is ready to use again. Packages can be added, removed, run, etc.
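Because a job script only benefits from the environment if the activation step actually ran, it can help to have the Python code confirm which interpreter it is running under. The following is a minimal sketch; the helper name in_virtualenv is hypothetical, and the check relies on the sys.prefix and sys.base_prefix attributes that differ inside a virtual environment.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running from a virtual environment."""
    # venv and recent virtualenv releases set base_prefix; very old
    # virtualenv releases set real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix or hasattr(sys, "real_prefix")


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtualenv())
```

Printing this at the top of a job's output makes it easy to spot runs that silently fell back to the environment module's or the system's Python.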
Conda
Conda is a tool commonly used to install Python software along with dependencies and manage virtual environments on a laptop or workstation. You typically install Conda by downloading and running either the Miniconda installer, which installs the base Conda system so that you can then install any packages you need, or the Anaconda installer, which installs the base Conda system plus a large selection of popular software. For reference, see the Conda User Guide.
Note that Conda provides much of the same functionality already present in the HPC environment, namely environment modules, which provide Python and other software such as pip and virtualenv. Therefore, although it's possible to install Conda in your home account or even a group shared drive, it's fundamentally a different tool compared to others present in the HPC environment. If you need specific Python packages not currently available in the HPC Python environment modules, it's usually possible for the HPC staff to install them upon request.
To install Anaconda from the command line, download one of the installers from the Anaconda repo. The files are named following the pattern “Anaconda3-<release date>-<OS>-<architecture>.<extension>”. For example, the Anaconda installer for Linux on AMD or Intel CPU architectures that was released in October 2024 is named Anaconda3-2024.10-1-Linux-x86_64.sh.
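To illustrate the naming scheme, the sketch below splits an installer file name into its parts. The regular expression and the helper name parse_installer_name are assumptions made for illustration: the pattern treats a trailing build number such as "-1" as part of the release field and has only been checked against the example name given above.

```python
import re

INSTALLER_PATTERN = re.compile(
    r"^Anaconda3-(?P<release>\d{4}\.\d{2}(?:-\d+)?)"
    r"-(?P<os>[A-Za-z]+)-(?P<arch>[A-Za-z0-9_]+)\.(?P<ext>\w+)$"
)


def parse_installer_name(filename):
    """Split an installer file name into its parts, or return None."""
    match = INSTALLER_PATTERN.match(filename)
    return match.groupdict() if match else None


parts = parse_installer_name("Anaconda3-2024.10-1-Linux-x86_64.sh")
print(parts)
# {'release': '2024.10-1', 'os': 'Linux', 'arch': 'x86_64', 'ext': 'sh'}
```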
On Argon, you can download the installer by the command below:
Execute the installer
During installation, you will be asked whether the installer should modify your shell startup script to initialize Conda whenever you log in. This may slow the login process slightly, so the choice is up to you. If you choose to modify your shell script, the initialization lines (see below) will be added to your .bashrc file. You can comment them out afterward if you do not want to initialize the Conda environment at every login.
If you did not choose the shell modification during installation but want to set up automatic Conda initialization afterward, you will need to activate Conda first:
This will add the configuration lines above to your .bashrc.
Once Conda is activated, you will see the “(base)” indicator in your prompt:
Then, you can create a Conda virtual environment:
$ conda create --name <environment-name>
Once the Conda environment is created, you can activate it like so:
$ conda activate <environment-name>
The command prompt in your Terminal will change to indicate the active environment. It will look like the following:
You can search and install the packages you need within the Conda environment. For more information, visit Conda documentation: https://docs.conda.io/projects/conda/en/stable/user-guide/cheatsheet.html
Once you are finished working with the Conda environment, you can deactivate it like so:
$ conda deactivate
Additional tips:
List the packages already installed in one of your Conda environments like so:
$ conda list -n <environment-name>
Remove a particular package from a specific environment like so:
$ conda remove -n <environment-name> <package-name>
Remove an entire Conda environment like so:
$ conda env remove -n <environment-name>
To verify the environment has been removed, list all remaining environments like so:
$ conda env list
Similar to Python virtual environments, jobs submitted within the Conda environment can access anything that has been installed within the environment.
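A job can verify that it inherited the intended Conda environment by inspecting the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that Conda sets on activation. This is a minimal sketch; the helper name active_conda_env is hypothetical.

```python
import os
import sys


def active_conda_env(environ):
    """Return (name, prefix) of the active Conda environment, or None."""
    name = environ.get("CONDA_DEFAULT_ENV")
    prefix = environ.get("CONDA_PREFIX")
    if not name or not prefix:
        return None
    return name, prefix


env = active_conda_env(os.environ)
if env is None:
    print("No Conda environment is active.")
else:
    name, prefix = env
    # The running interpreter should live under the environment's prefix.
    inside = os.path.realpath(sys.executable).startswith(os.path.realpath(prefix))
    print(f"Conda environment {name!r} at {prefix}; python inside it: {inside}")
```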
Matlab engine for Python
Each version of Matlab is supplied with a version of the Matlab engine for Python compatible only with certain versions of Python. Therefore you'll need to find a compatible pairing of Matlab and Python module versions. Mutually compatible combinations should normally occur within each version of the stack. The following is an example method for installing the engine into a virtual environment using versions available as of stack/2022.1:
In your job script (or your qlogin session, if you're exploring a dataset interactively), activate the required modules and virtual environment the normal way before invoking your Python code (or the python interpreter in your qlogin session):
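Once the modules and virtual environment are active, the engine can be exercised from Python. The sketch below assumes the engine has been installed into the active environment; it uses the start_matlab and quit calls from MathWorks' matlab.engine package and degrades gracefully when the package is absent. The helper names are hypothetical.

```python
import importlib.util


def matlab_engine_available():
    """Return True if the matlab.engine package can be found."""
    try:
        return importlib.util.find_spec("matlab.engine") is not None
    except ImportError:
        # The parent "matlab" package is not installed at all.
        return False


def matlab_sqrt(value):
    """Compute a square root in MATLAB, or return None without the engine."""
    if not matlab_engine_available():
        return None
    import matlab.engine

    engine = matlab.engine.start_matlab()  # starts a MATLAB process
    try:
        return engine.sqrt(float(value))
    finally:
        engine.quit()


print("MATLAB engine available:", matlab_engine_available())
```

Starting the engine launches a full MATLAB session, so in practice it is worth starting it once per job and reusing it rather than once per call as in this sketch.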
Building your own complete Python environment
Note that it is also possible to build your own Python environment from scratch and tailor it exactly to your needs. In addition to the packages available from Python.org, there are also pre-packaged distributions that include many frequently used scientific and analytic modules. Two of the most popular of these have been Enthought's Canopy and Anaconda (from Anaconda, Inc., formerly Continuum Analytics). Both offer free versions that you may download and install. They offer the additional value of modifying your path for you, and they also provide feature-rich packaging systems that make adding additional modules painless.