To run interactive parallel batch jobs on TSCC, use the command:
qsub -I
This provides a login shell on the launch node and sets the PBS_NODEFILE environment variable to the path of a file listing all nodes assigned to the interactive job.
Other qsub options can be used as well, such as those described by the man qsub command.
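For example, an interactive request can also name the job and target a specific queue (the queue name hotel below is an assumed example; available queues depend on your allocation):
qsub -I -q hotel -N myjob -l walltime=01:00:00 -l nodes=1:ppn=4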
As with any job, the interactive job will wait in the queue until the specified number of nodes becomes available. Requesting fewer nodes and shorter wall clock times may reduce the wait time because the job can more easily backfill among larger jobs.
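For instance, a minimal request such as the following is more likely to backfill and start quickly than a large multi-node request:
qsub -I -l walltime=00:15:00 -l nodes=1:ppn=1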
The showbf command gives information on available time slots:
Partition Tasks Nodes     Duration  StartOffset       StartDate
--------- ----- ----- ------------ ------------ --------------
ALL           8     8     INFINITY     00:00:00  13:45:30_04/03
Comparing the available duration and node count against a job's request gives a good estimate of when the submitted job will be allowed to run.
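showbf can also be queried for a specific resource shape. The -d (duration) and -n (node count) flags shown below are from the Maui scheduler's showbf and may differ under other scheduler versions:
showbf -d 00:30:00 -n 2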
The exit command will end the interactive job.
To run an interactive job with a wall clock limit of 30 minutes using two nodes and two processors per node:
$ qsub -I -V -l walltime=00:30:00 -l nodes=2:ppn=2
qsub: waiting for job 75.tscc-login.sdsc.edu to start
qsub: job 75.tscc-login.sdsc.edu ready
$ echo $PBS_NODEFILE
/opt/torque/aux/75.tscc-login.sdsc.edu
$ more /opt/torque/aux/75.tscc-login.sdsc.edu
compute-0-31
compute-0-31
compute-0-25
compute-0-25
$ mpirun -machinefile /opt/torque/aux/75.tscc-login.sdsc.edu -np 4 hostname
compute-0-25.local
compute-0-25.local
compute-0-31.local
compute-0-31.local
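Because the PBS_NODEFILE variable already holds the path to the machine file, it can be passed to mpirun directly, which is equivalent to spelling out the path as above:
$ mpirun -machinefile $PBS_NODEFILE -np 4 hostname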
To run an interactive job with a wall clock limit of 30 minutes using two PDAF nodes and 32 processors per node:
qsub -I -q pdafm -l walltime=00:30:00 -l nodes=2:ppn=32
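Once the job starts, the allocation can be verified from inside the session; the node names below are hypothetical output for two 32-core PDAF nodes:
$ sort $PBS_NODEFILE | uniq -c
     32 pdaf-0-1
     32 pdaf-0-2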