...
- Will you need to recompile your code to run on our cluster?
If you are bringing code over from another system, you may need to recompile it to work on our systems, especially if you are using MPI (of which we offer a few different varieties). We have some additional notes on compiling here: Compiling Software
- What software will your job need, and is it available centrally, or could it be installed in your home directory?
Our list of installed software is here: Software Installations. If you don't see a package you need, please let us know. If it is broadly applicable to a number of users, we may install it centrally; otherwise we will help you install it into your home directory.
- Can you estimate how much memory your job will need?
Knowing approximately how many processes you will need or how much memory to request will help ensure you request enough resources for your job to complete. One way to estimate this is to run a small version of the job, see how much memory it uses, and then calculate how much it would use if you were to double or triple it in size. We also offer a small development queue on the HPC cluster where you can submit small jobs to see how things go, and then tweak your resource requests accordingly.
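The scaling estimate described above can be sketched as a small shell helper. This is only an illustration: the function name `estimate_mem_gb`, the linear-scaling assumption, and the default 1.2 safety margin are hypothetical choices, not site policy, and many codes do not scale linearly with problem size. The peak memory of the small run could come, for example, from the `maxvmem` field that SGE's `qacct -j <jobid>` reports for a finished job, assuming accounting is enabled.

```shell
#!/bin/bash
# Hypothetical helper: estimate the memory request for a larger run by
# scaling the peak memory observed in a small test run.
# ASSUMPTION: memory grows linearly with problem size; verify for your code.
estimate_mem_gb() {
    local small_mem_gb=$1    # peak memory of the small test run, in GB
    local scale=$2           # how many times larger the real job is (e.g. 2 or 3)
    local margin=${3:-1.2}   # safety margin; 1.2 is an arbitrary illustrative default
    awk -v m="$small_mem_gb" -v s="$scale" -v k="$margin" \
        'BEGIN { printf "%.1f\n", m * s * k }'
}

# A 4 GB test run, tripled in size, with the default margin:
estimate_mem_gb 4 3
```

If a second small run at a different size does not match the prediction, the linear assumption likely does not hold and the estimate should be adjusted before submitting the full job.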
...
    qstat -u aarenas
    job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
    -----------------------------------------------------------------------------------------------------------------
      44348 1.00692 BC80_9     aarenas      r     01/10/2019 11:21:50 IWA@argon-lc-h21-16.hpc           56
Note that the "job-ID" is in the leftmost column. Other columns of note include the "state" column, which tells you what state your job is in. In this case the job is in the "r" or "running" state, which means it has been assigned a node and is running. The "queue" column indicates which queue instance (queue+host) the job is running on. The "slots" column tells how many slots the job has requested.

The "prior" field shows the scheduling priority of the job. For a job in the "running" state the priority is not very meaningful; the value is more useful for pending jobs. However, syncing the scheduling priority values from the scheduler to the main queue process is a very expensive operation. Due to the size of Argon and the varied types of jobs that run on it, synchronizing the scheduling priorities can cause severe slowdowns, and possibly timeouts, of SGE commands, because SGE spends most of its time updating job priorities for display. For this reason, the synchronization of job priorities from the scheduler to the primary queue process has been turned off. The scheduler still uses relative priority as one factor when scheduling jobs, but the priority values are not available to qstat and display as 0.00000.
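To pull out just the columns discussed here, the qstat listing can be filtered with awk. The following is a minimal sketch, not a supported tool: the function name `summarize_qstat` is hypothetical, and it assumes the default column layout shown in the sample output, with the queue column empty for jobs that are not yet running.

```shell
#!/bin/bash
# Hypothetical helper: reduce qstat output (read from stdin) to
# "job-ID state slots queue", one line per job.
# ASSUMPTION: default qstat column layout; pending jobs have no queue instance,
# so their slot count appears one field earlier.
summarize_qstat() {
    awk 'NR > 2 && NF >= 8 {
        if ($8 ~ /@/) { queue = $8; slots = $9 }   # running: queue instance present
        else          { queue = "-"; slots = $8 }  # pending: queue column is empty
        print $1, $5, slots, queue
    }'
}

# On the cluster this would be:  qstat -u "$USER" | summarize_qstat
# Here the sample listing is fed in directly so the sketch runs anywhere.
summarize_qstat <<'EOF'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  44348 1.00692 BC80_9     aarenas      r     01/10/2019 11:21:50 IWA@argon-lc-h21-16.hpc           56
EOF
```

The first two lines of the listing (header and separator) are skipped, and the "prior" column is deliberately left out since its value is not informative on this system.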
Use the qstat -j <jobid> command to view additional details about your job (note the below is abbreviated output):
...
This was a high-level introduction to HPC, and there are many topics not covered by this wiki page. Our other wiki pages offer more detail on various aspects of the system and how to use it. We also offer consulting services, so if you have questions about using our resources or HPC in general, or would simply like additional assistance getting started, please do not hesitate to contact our staff at research-computing@uiowa.edu, and one of us will be happy to help you.