This page presents a list of Frequently Asked Questions (and answers) about using the HPC cluster system at the University of Iowa.
Why is my home account full even after I delete data from it?
This occurs because the cluster utilizes snapshots of your home account to provide protection against accidental deletion or corruption, but this means that data isn't actually removed when you delete a file.
Info: Note that snapshots are not backups, as they will expire.
The snapshots are not an issue for most users, but they can become problematic if you are running an application that writes and deletes a lot of data in your home account. There are a few different potential solutions to this problem.
...
Another option, if a scratch volume does not work for your job for some reason, is to use /localscratch, which is the local hard drive of a compute node. This will likely also be faster than your home account and does not have snapshot backups. Note: this will only work if the data does not need to be available to multiple nodes at once.
...
If none of the above are viable the sysadmin team can turn off snapshots completely on your home account.
Warning: This means that accidental deletion or corruption of files WILL NOT be recoverable by the sysadmin team!
...
How do I access snapshots of my home account?
Snapshots are available as read-only directories in ~/.zfs/snapshot, each representing the state of your home directory at the time the snapshot was created. To restore a file or directory, use the cp command to copy it from a snapshot directory to a writable location in your home account (or elsewhere if you prefer). Here is an example of what you might expect to see during the restore process:
[brogers@login-0-1:~]$ cd ~/.zfs/snapshot
[brogers@login-0-1:snapshot]$ ls
zfs-auto-snap.daily-2011-02-10-00.00   zfs-auto-snap.monthly-2010-10-25-08.58  zfs-auto-snap.weekly-2011-01-22-00.00
zfs-auto-snap.daily-2011-02-11-00.00   zfs-auto-snap.monthly-2010-11-01-00.00  zfs-auto-snap.weekly-2011-01-29-00.00
zfs-auto-snap.daily-2011-02-12-00.00   zfs-auto-snap.monthly-2010-12-01-00.00  zfs-auto-snap.weekly-2011-02-01-00.00
zfs-auto-snap.hourly-2011-02-11-16.00  zfs-auto-snap.monthly-2011-01-01-00.00  zfs-auto-snap.weekly-2011-02-08-00.00
zfs-auto-snap.hourly-2011-02-11-20.00  zfs-auto-snap.monthly-2011-02-01-00.00
zfs-auto-snap.hourly-2011-02-12-00.00  zfs-auto-snap.weekly-2011-01-15-00.00
[brogers@login-0-1:snapshot]$ cd zfs-auto-snap.monthly-2011-01-01-00.00/
[brogers@login-0-1:zfs-auto-snap.monthly-2011-01-01-00.00]$ cp computeilos ~/
...
The qstat command defaults to showing the status of all jobs. To view the status of just your own or another user's jobs, pass the -u flag to qstat. For example, to see the status of jobs submitted by user jdoe:
qstat -u jdoe
...
"Could not connect to session bus" when connecting to Argon using FastX version 2
When connecting to Argon with FastX version 2, you may see an error saying "Could not connect to session bus: Failed to connect to socket" while starting a graphical desktop such as MATE. The most common cause of this issue on Argon is that you have installed Anaconda using its default settings. Anaconda's installer configures your ~/.bashrc file to automatically activate Anaconda during the login process. But the installer also gives priority to Anaconda software, and because Anaconda includes software which interferes with graphical logins, its presence causes them to fail with this error.
In older versions of Anaconda, the installer simply adds its own path at the start of the PATH variable, so you can work around the problem by moving its path to the end, thus giving Anaconda software lower priority. That is, edit your ~/.bashrc to change the definition like so:
FROM:
export PATH="/Users/YOURHAWKID/anaconda2/bin:$PATH"
...
export PATH="$PATH:/Users/YOURHAWKID/anaconda2/bin"
More recent versions of Anaconda configure activation in your ~/.bashrc using a very different mechanism, and a fix analogous to the above is less convenient. In this case, you can leave that configuration in place so that Anaconda itself becomes active during login, but reconfigure Anaconda so that the default "base" environment is not automatically activated during login. You can use the standard conda commands in your shell session or job script to activate any environment when you need to use it.
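As a sketch of that approach (the exact option name may vary with your conda version, and "myenv" below is a placeholder environment name), the standard conda setting for disabling automatic base activation looks like this:

```shell
# Keep Anaconda's login hook in ~/.bashrc, but stop it from
# auto-activating the "base" environment at login:
conda config --set auto_activate_base false

# Later, in an interactive session or job script, activate an
# environment only when you actually need it:
conda activate myenv
```

After the config change, new login sessions should start with a clean PATH, so graphical desktops can launch normally, while conda remains available on demand.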
I see jobs pending in my queue from people who do not have access.
...
This is the default behavior for FileZilla. Set the timeout to 0 in the settings.
System based programs fail to load after loading environment modules
The environment modules set up the environment for their respective applications. While most library paths are baked in, it is still necessary to provide a hint to the location of libraries in many cases. For this reason, the LD_LIBRARY_PATH variable is set to ensure that module packages find the correct libraries. Unfortunately, that can cause issues when launching non-module programs, i.e., programs that are available on the system without using environment modules. If you see error messages related to libraries when launching a system program, you will have to unset LD_LIBRARY_PATH. There are two options:
Launch programs from a session without any modules loaded. The environment can be reset with
module reset
Unset LD_LIBRARY_PATH as part of the command. For example
LD_LIBRARY_PATH='' gedit