Best Practices for Roar Collab Users

Roar Collab (RC) is shared by many users, and one user’s activity can inadvertently degrade the system for everyone else. All users must follow the best practices below, which limit activities that may affect other users. Exercise good citizenship so that your activity does not adversely impact the system or the RC research community.

Do Not Run Jobs on the Submit Nodes

RC has a few submit (login) nodes that are shared among all users. Dozens, and sometimes hundreds, of users may be logged on at any one time accessing the file systems. Think of the submit nodes as a prep area, where users may edit and manage files, initiate file transfers, submit new jobs, and track existing batch jobs. The submit nodes provide an interface to the system and to the computational resources.

The compute nodes are where intensive computations may be performed and where research software may be utilized. All batch jobs and executables, as well as development and debugging sessions, must be run on the compute nodes. To access compute nodes on RC, either submit a batch job or request an interactive session. The Submitting Jobs section of the RC User Guide provides further details on requesting computational resources.

A single user running computationally expensive or disk-intensive tasks on a submit node degrades performance for other users. Additionally, since the submit nodes are not configured for intensive computations, such processes perform poorly there. Habitually running jobs on the submit nodes can lead to account suspension.
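One way to avoid accidentally starting heavy work on a submit node is to have a script check that it is running inside a scheduled job before it does anything expensive. The sketch below is a minimal illustration, not an official RC tool. It assumes the scheduler is Slurm, which sets the `SLURM_JOB_ID` environment variable inside batch jobs and interactive sessions; the text above does not name the scheduler, so treat that as an assumption and check the Submitting Jobs section. The function names are hypothetical.

```python
import os


def inside_scheduler_job(environ=None, job_id_var="SLURM_JOB_ID"):
    """Return True if the environment looks like a scheduled job.

    Assumption: the scheduler is Slurm, which exports SLURM_JOB_ID
    inside batch jobs and interactive sessions.
    """
    env = os.environ if environ is None else environ
    return bool(env.get(job_id_var))


def require_compute_node(environ=None):
    """Raise RuntimeError when no scheduler job is detected."""
    if not inside_scheduler_job(environ):
        raise RuntimeError(
            "No scheduler job detected. Submit a batch job or request an "
            "interactive session instead of running on a submit node."
        )
```

Calling `require_compute_node()` at the top of an analysis script makes the script stop early with a clear message if it is launched directly on a submit node.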

Do Not Use Scratch as a Primary Storage Location

Scratch serves as a temporary repository for compute output and is explicitly designed for short-term use. Unlike other storage locations, scratch is not backed up. Files that have not been accessed within 30 days are subject to automatic removal. The Handling Data section of the RC User Guide provides further details on storage options.
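To see which scratch files are at risk, it can help to list those whose last access time is older than the 30-day window. The following is a minimal sketch, assuming that the file system records access times and that the removal policy is based on them; the function name `stale_files` and its parameters are hypothetical, and this is not the mechanism RC itself uses to purge files.

```python
import os
import time
from pathlib import Path


def stale_files(root, max_age_days=30, now=None):
    """Return files under root not accessed within max_age_days.

    Uses the last access time (st_atime) reported by the file system.
    Assumption: the purge policy is based on access time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                atime = path.lstat().st_atime
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

Passing the path of your own scratch directory to `stale_files` returns the files to copy to longer-term storage before they are removed. Run a scan like this inside a job rather than on a submit node if the directory tree is large.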

Make an Effort to Minimize Resource Requests

Queue time grows with the amount of resources requested. To keep queue time short, request only the resources a job actually needs. It is best to run small test cases to verify that the computational workflow runs successfully before scaling up the process to a large dataset. The Submitting Jobs section of the RC User Guide provides further details on requesting computational resources.

Remain Cognizant of Storage Quotas

All available storage locations on RC have associated quotas. If the usage of a storage location approaches its quota, software may not function normally and may produce cryptic error messages. The Handling Data section of the RC User Guide provides further details on checking storage usage relative to the quotas.
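As a rough self-check between official usage reports, a directory's size can be summed and compared against a quota you supply. This is a minimal sketch under stated assumptions: the quota value and the 90% warning threshold are placeholders rather than RC's actual limits, the function names are hypothetical, and apparent file sizes may differ from what the storage system counts toward a quota. The method described in the Handling Data section remains the authoritative one.

```python
import os


def directory_size_bytes(root):
    """Sum the apparent sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # File vanished or is unreadable; skip it.
                continue
    return total


def quota_status(used_bytes, quota_bytes, warn_fraction=0.9):
    """Classify usage as 'ok', 'warning', or 'over'.

    quota_bytes and warn_fraction are caller-supplied placeholders,
    not values taken from RC policy.
    """
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    fraction = used_bytes / quota_bytes
    if fraction >= 1.0:
        return "over"
    if fraction >= warn_fraction:
        return "warning"
    return "ok"
```

A "warning" result is a cue to clean up or move data before software starts failing with unrelated-looking errors.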

Policies

The policies regarding the use of RC can be found on the ICDS Policies page.