The Triton Shared Computing Cluster (TSCC) supports five queues for job submission: hotel, condo, pdafm, home, and glean. Users must have an account so that the accounting system can charge for execution time.
The base unit for all billable TSCC queues is the Service Unit, or SU. Allocations are defined in terms of the number of SUs available to each account, and accounts may be shared among users.
Charges on condo, hotel, and PDAF nodes (i.e. through the condo, hotel, pdafm, and home queues) are calculated in SUs, which are measured per processing core per hour. All core-hours are equivalent: 1 core-hour = 1 Service Unit.
Jobs are allocated on a per-core basis, and only allocated cores are charged. Hotel and condo nodes have 16 cores, while PDAF nodes have 32. Jobs can request fewer cores than the node maximum and are charged accordingly, rather than for the entire node. The scheduler may place independent jobs from different users on the same node simultaneously in order to reduce queue waits.
The glean queue has no SU charges associated with it. However, it has a low priority and jobs submitted to it are subject to termination at any time.
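For example, a low-priority job could be directed to glean at submission time (the script name here is a placeholder):

```shell
qsub -q glean myjob.sh   # runs free of SU charges, but may be terminated at any time
```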
The following general policies are in effect for scheduling jobs on TSCC:
User accounts are set up to charge jobs against a user-specific default account. If preferred, the default can be changed to charge against another account, such as a project or shared account based on a TSCC purchase or allocation.
If a job is submitted while the account balance is below the estimated SUs needed to run it, the job will be deferred until the account is replenished. The qstat -f command will report a message similar to:
cannot debit job account - no funds
After the account balance is adjusted, the job will be able to run without being resubmitted. It will go into the idle state when the scheduler rechecks balances, and then get scheduled normally.
You can check your account balance and status by running gbalance -u <username>. This will show the balance of all accounts to which you can charge jobs.
To specify the account to be charged, use the -A option. It is recommended to include this option in all job submission scripts and qsub commands, to make clear which account should be charged for the job:
#PBS -A <account name>
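For instance, a minimal submission script might combine the account directive with queue and resource requests (a sketch; the account name, resource values, and program are placeholders):

```shell
#!/bin/bash
#PBS -q hotel                # submit to the hotel queue
#PBS -A <account name>       # charge this account instead of the default
#PBS -l nodes=2:ppn=16       # 2 nodes, 16 cores per node
#PBS -l walltime=04:00:00    # 4 hours of wall time

cd "$PBS_O_WORKDIR"          # run from the directory the job was submitted from
./my_program                 # placeholder for the actual workload
```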
Memory requests are for all nodes combined. Node and core requests are per-node.
The maximum processors value for a queue is the maximum total processors that any single job can request. So these commands would be allowed since the hotel resources_max.proc value is 128:
qsub -q hotel -l nodes=128:ppn=1
qsub -q hotel -l nodes=8:ppn=16
but these would block:
qsub -q hotel -l nodes=129:ppn=1
qsub -q hotel -l nodes=9:ppn=16
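The arithmetic behind these examples is simply the product of the two resource values (a sketch using shell arithmetic):

```shell
# A job's total processor request is nodes * cores per node.
# hotel's resources_max.proc value is 128, so:
echo $(( 8 * 16 ))   # 128 -> within the limit, allowed
echo $(( 9 * 16 ))   # 144 -> exceeds the limit, blocked
```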
Requests for resources exceeding the available maximums will be deferred and retried by the scheduler. After a limited number of retries, they will be put on hold and require administrator intervention.
Requests for more than the maximum number of nodes will not be rejected, as the scheduler makes no assumptions regarding future node availability. Requests that do not specify a memory size will be given the default amount of memory per node (approximately 4GB/core for hotel and small condo nodes, 8GB/core for large condo nodes, and 16GB for PDAF nodes).
Before a job can be scheduled, the system verifies available credits in the user account. It does not actually charge the account at this time, but SUs (CPU-hour credits) equal to the estimated charges must be available. The system uses values from the job script to estimate these charges according to the following formula:
The formula for hotel queue requests is:
#cores per node x #nodes x wall time (in hours)
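As a sketch, the estimate can be computed directly from the job's resource request (the values below are hypothetical):

```shell
# Estimated SUs = cores per node * nodes * wall-clock hours
ppn=16            # cores requested per node
nodes=2           # nodes requested
walltime_hrs=4    # requested wall time, in hours
echo $(( ppn * nodes * walltime_hrs ))   # 128 SUs must be available before the job is scheduled
```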
If a job runs more than five minutes beyond its requested wall time, it will be canceled by the system. A job whose actual charges exceed the SUs available in the account will not be canceled, but will leave a negative balance that can be credited later.