UC San Diego

Bundling Serial Jobs on TSCC

How to submit multiple serial jobs with a single script

Occasionally, a group of serial jobs needs to be run on TSCC. Rather than submitting each job individually, you can bundle the jobs together and submit them with a single batch script, using a procedure such as the one described below.

Overview

Although it's preferable to run parallel codes whenever possible, sometimes that is not cost-effective, or the tasks are simply not parallelizable. In that case, using a procedure like this can save time and effort by organizing multiple serial jobs into a single input file and submitting them all in one step.

The code for this process is given below as a very simple example that uses basic shell commands as the serial tasks. Your own serial tasks can be substituted for those commands; modify the scripts accordingly and run them from your own home directory.

Note that the /home filesystem on TSCC uses autofs. Under autofs, filesystems are not always visible to the ls command. If you cd to the /home/beta directory, for example, it will be mounted and become accessible. You can also reference it explicitly, e.g. ls /home/beta, to verify its availability.

Autofs is used to minimize the number of mounts visible to active nodes. All users have their own filesystem for their home directory.

The procedure uses the following four files:

  • submit.qsub (batch script to run to submit the job)
  • my_script.pl (perl script invoked by batch job)
  • jobs-list (input to perl script with names of serial jobs)
  • getid (executable to obtain the processor number, called by perl script)

Example Batch File

The following is an example script that can be modified to suit users with similar needs. This file is named submit.qsub.

#!/bin/sh
#
#PBS -q hotel
#PBS -m e
#PBS -o outfile
#PBS -e errfile
#PBS -V

###################################################################
### Update the below variables with correct values

### Name your job here again
#PBS -N jobname

### Put your node count, processors per node, and time here
#PBS -l nodes=1:ppn=5
#PBS -l walltime=00:10:00

### Put your notification E-mail ID here
#PBS -M username@some.domain

### Set this to the working directory
cd /home/beta/scripts/bundling

####################################################################

## Run the bundled serial tasks; -np should match the processor count requested above
/opt/openmpi_pgimx/bin/mpirun -machinefile $PBS_NODEFILE -np 5 \
                ./my_script.pl  jobs-list

Example Script and Input Files

The mpirun command in the batch script above invokes this Perl script, named my_script.pl.

#!/usr/bin/perl
#
# This script executes one command from a list file,
# selected by the current MPI rank.
#
# Last modified: Mar/11/2005
#
use strict;
use warnings;

# Call getid to get the MPI rank and the number of processes.
my ($myid, $numprocs) = split(/\s+/, `./getid`);

# Rank N executes line N+1 of the input file.
my $file_id = $myid + 1;

# Open the list file and read down to the selected line.
my $file_to_use = $ARGV[0];
defined $file_to_use or showhelp();
open(my $input_file, '<', $file_to_use) or showhelp();

my $buf;
for (1 .. $file_id) {
    $buf = <$input_file>;
}
close $input_file;

# Execute the selected command, if the file has that many lines.
if (defined $buf) {
    system($buf);
} else {
    showhelp();
}

sub showhelp {
    print "\nUsage: my_script.pl <filename>\n\n";
    print "<filename> should contain a list of executables,";
    print " one per line, including the path.\n\n";
    exit 1;
}

The Perl script reads its commands from this input file, named jobs-list, which the batch script passes as an argument. Rank N executes line N+1, so with -np 5 all five lines run concurrently.

hostname; date
hostname; ls
hostname; uptime
uptime
uptime > line-5

Sample Output

Running the above script writes output like this to the file outfile. Because all five tasks write to the same file concurrently, their output lines are interleaved rather than sequential.

 12:20:53 up 3 days,  5:41,  0 users,  load average: 0.92, 1.00, 0.99
compute-0-51.local
compute-0-51.local
Wed Aug 19 12:20:53 PDT 2009
 12:20:53 up 3 days,  5:41,  0 users,  load average: 0.92, 1.00, 0.99
compute-0-51.local
getid  getid.c  jobs-list  line-5  my_script.pl  submit.qsub

Line 5 of jobs-list redirects its output to the file line-5, so that output does not appear in the shared output file.

 12:20:53 up 3 days,  5:41,  0 users,  load average: 0.92, 1.00, 0.99

Summary and Potential Other Uses

A modified version of this procedure, which distributes the scripts across the available processors when more scripts are being run than processors are available, can be obtained from TSCC User Support (member-only list).

It should also be possible to modify this script to run parallel jobs. Feel free to try it or ask support for help through the Discussion List.