Job Management and Monitoring
A user can find the job ID, the assigned node(s), and other useful information using the squeue command.
Specifically, the following command displays all running and queued jobs for a specific user:
$ squeue -u <user>
The SQUEUE_FORMAT environment variable controls which columns squeue prints and how wide they are. It can be set, for example, with the following command:
$ export SQUEUE_FORMAT="%.9i %9P %35j %.8u %.2t %.12M %.12L %.5C %.7m %.4D %R"
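The format string above can be read field by field. The sketch below is only an illustration: it lists the field codes used in the example and shows how a shorter format could be passed with -o for a single call instead of exporting SQUEUE_FORMAT. The function name my_jobs is hypothetical and not part of Slurm.

```shell
# Field codes used in the SQUEUE_FORMAT example:
#   %i job ID          %P partition     %j job name    %u user
#   %t state (short)   %M time used     %L time left   %C CPUs
#   %m min. memory     %D node count    %R pending reason or node list
# A number sets the field width; a leading "." right-justifies the field.

# my_jobs (hypothetical helper): compact listing of one user's jobs.
# Falls back to the current user when no argument is given.
my_jobs() {
    squeue -u "${1:-${USER:-$(id -un)}}" -o "%.9i %.2t %.12M %R"
}
```

Calling my_jobs with no argument lists the caller's own jobs; my_jobs <user> lists another user's jobs.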
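Further details on the usage of this variable are available on Slurm’s squeue documentation page.
Another useful job monitoring command is scontrol show job, which prints the full record of a single job, including its state, requested resources, and node list: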
$ scontrol show job <jobid>
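The record printed by scontrol show job consists of Key=Value pairs, so a single field can be picked out with standard text tools. The following is a minimal sketch under that assumption; job_field is a hypothetical helper name, and the approach only suits values without spaces (such as JobState or NodeList).

```shell
# job_field (hypothetical helper): print one field of a job record.
# Usage: job_field <jobid> <FieldName>
job_field() {
    # Put each Key=Value pair on its own line, then keep the value of
    # the first pair whose key matches exactly.
    scontrol show job "$1" | tr -s ' ' '\n' | sed -n "s/^$2=//p" | head -n 1
}
```

For example, job_field <jobid> JobState prints only the job's state.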
A job can be cancelled with:
$ scancel <jobid>
Valuable information can be obtained by monitoring a job on the compute node(s) while it runs. Connect to the compute node of a running job with the ssh command. Note that a compute node can only be reached if the user has a resource reservation on that specific node. After connecting to the compute node, the top and ps commands are useful tools:
$ ssh <comp-node-id>
$ top -Hu <user>
$ ps aux | grep <user>
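The steps above (look up the nodes of a job, connect, inspect processes) could be combined roughly as follows. This is a sketch, not a site-provided tool: job_nodes and job_top are hypothetical helper names, and it assumes the job is running and that ssh to its nodes is permitted.

```shell
# job_nodes (hypothetical helper): one hostname per line for a job.
job_nodes() {
    # -h drops the header; %N prints the (possibly compressed) node list.
    nodelist="$(squeue -h -j "$1" -o %N)"
    [ -n "$nodelist" ] || return 1   # no nodes assigned (e.g. job pending)
    # Expand a compressed list such as node[01-02] into node01, node02.
    scontrol show hostnames "$nodelist"
}

# job_top (hypothetical helper): one batch-mode top snapshot per node.
job_top() {
    for node in $(job_nodes "$1"); do
        echo "== $node =="
        # -b -n 1: non-interactive, single iteration; -H: show threads.
        ssh "$node" top -b -n 1 -H -u "${USER:-$(id -un)}" | head -n 20
    done
}
```

Running job_top <jobid> prints the first lines of top for each node of the job without opening an interactive session.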