
Job Management and Monitoring

The squeue command reports the job ID, the assigned node(s), and other useful information about a job. The following command displays all running and queued jobs for a specific user:

 

$ squeue -u <user>

 

The SQUEUE_FORMAT environment variable controls which fields squeue prints and how wide each column is. It can be set, for example, with the following command:

 

$ export SQUEUE_FORMAT="%.9i %9P %35j %.8u %.2t %.12M %.12L %.5C %.7m %.4D %R"
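
To make the format string above easier to read and adjust, the sketch below builds the same value one specifier at a time. The field descriptions follow the squeue manual; a leading "." in a specifier right-justifies the column, and the number sets its width. Which columns are worth keeping is a matter of preference, so treat this as a starting point rather than a recommended setting.

```shell
# Build the same SQUEUE_FORMAT value field by field, so each specifier is documented.
SQUEUE_FORMAT="%.9i"                    # job ID, right-justified, width 9
SQUEUE_FORMAT="$SQUEUE_FORMAT %9P"      # partition, width 9
SQUEUE_FORMAT="$SQUEUE_FORMAT %35j"     # job name, width 35
SQUEUE_FORMAT="$SQUEUE_FORMAT %.8u"     # user name
SQUEUE_FORMAT="$SQUEUE_FORMAT %.2t"     # job state, compact form (e.g. R, PD)
SQUEUE_FORMAT="$SQUEUE_FORMAT %.12M"    # time used by the job
SQUEUE_FORMAT="$SQUEUE_FORMAT %.12L"    # time left before the job's time limit
SQUEUE_FORMAT="$SQUEUE_FORMAT %.5C"     # number of CPUs requested or allocated
SQUEUE_FORMAT="$SQUEUE_FORMAT %.7m"     # minimum memory requested
SQUEUE_FORMAT="$SQUEUE_FORMAT %.4D"     # number of nodes
SQUEUE_FORMAT="$SQUEUE_FORMAT %R"       # allocated node list, or reason a pending job is waiting
export SQUEUE_FORMAT
```

Placing these lines in a shell startup file such as ~/.bashrc would apply the format to every new session; the -o option of squeue overrides it for a single invocation.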

 

Further details on the usage of this variable are available on Slurm’s squeue documentation page. Another useful job monitoring command is scontrol show job, which prints the detailed record for a single job:

 

$ scontrol show job <jobid>
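
The output of scontrol show job is a series of Key=Value pairs, so a single field can be pulled out with a small filter. The helper below is a minimal sketch: job_field is a hypothetical name, not part of Slurm, and it assumes the value of interest contains no spaces (true for fields such as JobState or NodeList, but not necessarily for Command or JobName).

```shell
# Print the value of one Key=Value field from `scontrol show job` output read on stdin.
# Usage: scontrol show job <jobid> | job_field NodeList
# job_field is a hypothetical helper; it assumes the value contains no whitespace.
job_field() {
    awk -v k="$1" '{
        for (i = 1; i <= NF; i++) {
            # Match only tokens that start with "<key>=", so NodeList does not match ReqNodeList.
            if (index($i, k "=") == 1) {
                print substr($i, length(k) + 2)
                exit
            }
        }
    }'
}
```

For example, piping the job record through job_field JobState prints only the job's current state.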

 

A job can be cancelled with:

$ scancel <jobid>

 

Monitoring a job on its compute node(s) while it runs can provide valuable information. Connect to a compute node of the running job with the ssh command; note that a compute node can only be reached if the user has a resource reservation on that specific node. After connecting, the top and ps commands show the job's processes and their resource usage.

 

$ ssh <comp-node-id>
$ top -Hu <user>
$ ps aux | grep <user>
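
The lookup-then-connect steps above can be combined into one helper. The sketch below is an illustration, not a site-provided tool: job_top is a hypothetical name, and it assumes ssh to the job's nodes works without an interactive prompt and that the user name is the same on the login and compute nodes. It asks squeue for the job's node list, expands it with scontrol show hostnames, and runs a single batch-mode top on each node.

```shell
# Run a one-shot, batch-mode `top` on every node allocated to a job.
# Usage: job_top <jobid>
# job_top is a hypothetical helper; it assumes non-interactive ssh access to the nodes.
job_top() {
    jobid="$1"
    # -h suppresses the header; %N prints the allocated nodes in compressed form.
    nodelist=$(squeue -h -j "$jobid" -o "%N")
    if [ -z "$nodelist" ]; then
        echo "no nodes allocated for job $jobid" >&2
        return 1
    fi
    # `scontrol show hostnames` expands the compressed list to one hostname per line.
    for node in $(scontrol show hostnames "$nodelist"); do
        echo "== $node =="
        # -b: batch mode, -n 1: one iteration, -H: show threads, -u: filter by user
        ssh "$node" top -b -n 1 -H -u "$USER"
    done
}
```

Because top runs in batch mode for a single iteration, the output is a snapshot; for continuous monitoring, connect to the node interactively as shown above.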