8. Slurm Jobs

8.1. Jobs and Job Steps

  • A compute job can consist of several steps. For example: you download the files, you downsample them, and then you do an alignment. These steps are job steps and are invoked with srun from within the batch script. Each srun command in a batch script can ask for its own set of resources, as long as these fit within the allocation. In other words, the srun commands in a batch script are bounded by the allocation for the batch script: an srun cannot, for example, ask for more CPUs than were requested in the sbatch file. A minimal sketch is given below.

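A minimal sketch of such a multi-step batch script is shown below; the program names download_data, downsample and align are placeholders for your own tools, not software that is assumed to be installed:

#!/bin/bash
#SBATCH --job-name=steps_demo
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#
# every srun below starts one job step; each step may use at most
# the 4 CPUs of this allocation, never more
srun -c 1 ./download_data   # step 1: one CPU is enough
srun -c 2 ./downsample      # step 2: uses 2 of the 4 CPUs
srun -c 4 ./align           # step 3: uses the full allocation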
  • Some useful srun flags are:
    • -c, --cpus-per-task=ncpus

      number of cpus required per task

    • -n, --ntasks=ntasks

      number of tasks to run

    • -N, --nodes=N

      number of nodes on which to run (N = min[-max])

    • -o, --output=out

      location of stdout redirection

    • -w, --nodelist=hosts

      request a specific list of hosts

  • Print the name of the node with the command hostname

srun hostname
#omics-cn004  this job was allocated to cn004
  • Carry out this task four times:

srun -n4 hostname
#omics-cn004
#omics-cn004
#omics-cn004
#omics-cn004  again allocation on cn004

Note that the effect of the -n4 flag is that the program (hostname) is automatically started 4 times. This is useful for software that runs on multiple nodes and communicates between its instances through MPI. For a multithreaded program it is usually not what you want; see the sketch below.

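A quick way to see the difference is the sketch below; my_threaded_tool stands for any program of your own and is not installed software:

# start 4 independent copies of the program (4 tasks), as for MPI-style software
srun -n 4 ./my_threaded_tool
# start ONE copy of the program and give it 4 CPUs to run its threads on
srun -n 1 -c 4 ./my_threaded_tool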
  • Now, ask for four nodes

srun -N4 hostname
#omics-cn002
#omics-cn003
#omics-cn001
#omics-cn004  now, cn001, cn002, cn003 and cn004 are allocated
  • Ask for a specific host:

srun -n2 -w omics-cn002 hostname
#omics-cn002
#omics-cn002
  • Output to a file (here: hn.txt, stored in your current directory):

srun -N3 -n5 -o hn.txt hostname
cat hn.txt
#omics-cn001
#omics-cn001
#omics-cn002
#omics-cn002
#omics-cn003  #001 and 002 are used twice, 003 is used once
  • A job consists of two parts: resource requests and job steps. Resource requests specify the number of CPUs, the expected computing duration, the amounts of RAM or disk space, etc. Job steps describe the tasks that must be done, i.e. the software that must be run.

  • The typical way of creating a job is to write a submission script. A submission script is a shell script, e.g. a Bash script, whose comments, if they are prefixed with SBATCH, are understood by Slurm as parameters describing resource requests and other submission options. You can get the complete list of parameters from the sbatch manpage (man sbatch).

  • Get hints for writing a job script from the CÉCI Script Generator Wizard. Ignore the cluster names there and replace #SBATCH --partition=defq with #SBATCH --partition=all.

8.2. Batch jobs: sbatch

  • Make a text file with the content as in the box below and save it as batch1.sh:
    • it writes the output to the file res.txt

    • it consists of one task

    • it allocates 10 minutes of compute time and 10 MB memory

    • I expect this script to run at least 15 seconds and to print 3 hostnames and 3 dates.

    • I can execute the script with sbatch batch1.sh and monitor it with squeue

#!/bin/bash
#
#SBATCH --job-name=batch1
#SBATCH --output=res.txt
#
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=10
#
srun hostname
srun sleep 5
srun date
srun hostname
srun sleep 5
srun date
srun hostname
srun sleep 5
srun date
  • execute the script and monitor it

sbatch batch1.sh
squeue
  • This is the output:

    _images/slurm3.png
    • The job goes through the PENDING state (PD), then enters the RUNNING state (R), and finally ends in the COMPLETED or FAILED state.

    • Indeed, hostname and date are each printed 3 times, some 7 seconds apart

    • The job id issued was 4963
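After completion the job no longer appears in the squeue listing. Assuming job accounting is enabled on the cluster, you can still look up a finished job with sacct, e.g. for the job id above:

sacct -j 4963 --format=JobID,JobName,State,Elapsed,MaxRSS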

8.3. Parallel Jobs

  • Here, we will only discuss parallel jobs
    • by running several instances of a single-threaded program (so-called embarrassingly parallel paradigm or a job array)

    • by running a multithreaded program (shared memory paradigm, e.g. with OpenMP or pthreads)

  • Other types of parallel jobs: see the CÉCI documentation, section Going parallel.

  • From this same website: tasks are requested/created with the --ntasks option, while CPUs, for multithreaded programs, are requested with the --cpus-per-task option. Tasks cannot be split across several compute nodes, so requesting several CPUs with the --cpus-per-task option will ensure all CPUs are allocated on the same compute node. By contrast, requesting the same amount of CPUs with the --ntasks option may lead to several CPUs being allocated on several, distinct compute nodes.
    • Multithreaded programs run on one specific compute node: use the --cpus-per-task flag with these programs (a minimal header sketch is given below).

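In sbatch terms a multithreaded run therefore typically looks like the sketch below; my_threaded_tool and its --threads option are placeholders, not real software. The bowtie2 example in the next section follows this same pattern:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
srun ./my_threaded_tool --threads $SLURM_CPUS_PER_TASK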
8.3.1. Multithreaded bowtie2 Example

  • Much genomics software uses a multithreaded approach. We start with a bowtie2 example:
    • We want to align 2 fastq files from the European Nucleotide Archive to the Mycoplasma G37 genome.

    • Workflow:
      • download the G37 genome to the /scratch directory of the node

      • build the genome index on this /scratch directory

      • download the fastq files to scratch

      • do the alignment

      • store the resulting sam files in your personal directory.

    • Our batch script is below (save it as align_Mycoplasma and run it with sbatch align_Mycoplasma and monitor it with squeue):

#!/bin/bash
#
#SBATCH --job-name=align_Mycoplasma
#SBATCH --output=res_alignjob.txt
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=2000
#
cd /scratch
srun wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/027/325/GCF_000027325.1_ASM2732v1/GCF_000027325.1_ASM2732v1_genomic.fna.gz -P ./
srun bowtie2-build GCF_000027325.1_ASM2732v1_genomic.fna.gz MG37
srun wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR486/ERR486827/ERR486827_1.fastq.gz -P ./
srun wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR486/ERR486827/ERR486827_2.fastq.gz -P ./
srun bowtie2 -x MG37 -1 ERR486827_1.fastq.gz -2 ERR486827_2.fastq.gz --very-fast -p $SLURM_CPUS_PER_TASK -S /zfs/omics/personal/${USER}/result.sam
  • the output of the job (res_alignjob.txt) is stored in the directory from which you executed sbatch. It contains the information that is normally written to your standard output (your screen): in this case, the progress of the downloads, the progress of the indexing, and the alignment summary.

  • the actual result of the alignment (the sam file) is written to your personal directory.

  • the number of threads in the bowtie2 command (job step) is taken from the Slurm variable $SLURM_CPUS_PER_TASK that was set at the start of the job. You could have given any number up to 8 directly with the -p flag; if you give a number larger than 8, the job will still be executed with the number of threads defined by $SLURM_CPUS_PER_TASK (in this case 8).
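If you want to be able to run the same command outside of Slurm (where $SLURM_CPUS_PER_TASK is not set), a common bash idiom is to fall back to a default of one thread, for example:

bowtie2 -x MG37 -1 ERR486827_1.fastq.gz -2 ERR486827_2.fastq.gz --very-fast -p "${SLURM_CPUS_PER_TASK:-1}" -S result.sam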

8.4. Multithreaded Example in C

  • The example below is a C code illustration of how threads are forked from a master thread. You can use this idea when you write your own parallelized code.

  • Save the file below as omp_hoi.c:

/******************************************************************************
* FILE: omp_hoi.c
* DESCRIPTION:
*   OpenMP Example - Hello World - C/C++ Version
*   In this simple example, the master thread forks a parallel region.
*   All threads in the team obtain their unique thread number and print it.
*   The master thread only prints the total number of threads.  Two OpenMP
*   library routines are used to obtain the number of threads and each
*   thread's number.
* AUTHOR: Blaise Barney  5/99
* LAST REVISED: 04/06/05
******************************************************************************/
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[])
{
int nthreads, tid;
/* Fork a team of threads giving them their own copies of variables */
#pragma omp parallel private(nthreads, tid)
{
        /* Obtain thread number */
        tid = omp_get_thread_num();
        printf("Hello World from Crunchomics thread = %d\n", tid);
        /* Only master thread does this */
        if (tid == 0)
        {
                nthreads = omp_get_num_threads();
                printf("Master thread says: number of threads = %d\n", nthreads);
        }
  }  /* All threads join master thread and disband */
}
  • and compile it:

gcc -fopenmp omp_hoi.c -o hoi.omp
  • run it through the following sbatch script:

#!/bin/bash
#SBATCH --job-name=test_omp
#SBATCH --output=res_omp.txt
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./hoi.omp
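To try it, save the sbatch script above under a name of your choice (test_omp.sh is used here as an arbitrary example name) and submit it; the output ends up in res_omp.txt:

sbatch test_omp.sh
cat res_omp.txt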
_images/Slurm4.png

8.4.1. Job Arrays: Parallel example with Rscript

  • You can use the sbatch --array=<indexes> parameter to submit a job array, i.e., multiple jobs to be executed with identical parameters. The indexes specification identifies what array index values should be used.

  • Suppose we study 9 chromosomes of a certain organism (for which we have 9 files, 9 R-objects or something of the kind that we want to process).

  • Below, an sbatch script is shown that spawns 9 sub-tasks (the array index running from 0 to 8). Each sub-task runs an R script on the element of the ${CHROMS} array selected by the array index. Thus, 9 Rscript processes are run, one per chromosome:

  • copy the code below in a file called parJobBatch.sh

#!/bin/bash
#
#SBATCH --job-name=ParTest
#SBATCH --ntasks=1
#SBATCH --array=0-8
#
CHROMS=("chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chrX" "chrY" "Mit")
#
srun Rscript parJob.R ${CHROMS[$SLURM_ARRAY_TASK_ID]}
  • copy the code below in a file called parJob.R

args=commandArgs(trailingOnly=TRUE)
chromosome=args[1]
#
cat("Start work on",chromosome,"\n")
cat("working ....\n")
# put the code for whatever you want to do with each chromosome here
Sys.sleep(20)
cat("Done work on",chromosome,"\n")
  • The job is started with sbatch parJobBatch.sh

  • The 9 sub-tasks of the array job each have an entry in the queue:

_images/slurm6.png
  • Result:
    • 9 files, each written by a different R process, with content such as:

cat slurm_[jobid]_0.txt

Where the number after the underscore is the array task id:

Start work on chr1
working ....
Done work on chr1
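By default Slurm chooses these output file names itself. If you prefer to set them yourself, you can add an --output line to parJobBatch.sh using the array filename placeholders %A (the job id) and %a (the array task id), for example:

#SBATCH --output=ParTest_%A_%a.txt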
  • Rscript was used here to illustrate the --array functionality; any program that has to be run for a number of inputs can be set up in this way.

  • A list of files can be used instead of chromosomes. If there are 20 files to be processed, the relevant part of the batch script would look something like:

#SBATCH --array=0-19
FILES=(/zfs/omics/personal/*)
srun program ${FILES[$SLURM_ARRAY_TASK_ID]}
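Instead of a directory glob you can also read the input names from a plain text file, one name per line; a sketch, assuming a (hypothetical) file files.txt with 20 lines:

#SBATCH --array=0-19
mapfile -t FILES < files.txt
srun program "${FILES[$SLURM_ARRAY_TASK_ID]}"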

8.4.2. Use Conda Environments on the Compute Nodes

  • Run the flye assembly (see 6.2.1) on a compute node using sbatch.

  • Remark: before you execute the sbatch command, activate the proper conda environment. In this case it is necessary to activate nptools, because flye was installed in that environment: conda activate nptools. The activation will be passed to the compute nodes.

#!/bin/bash
#
#SBATCH --job-name=ecoli_assemble
#SBATCH --output=res_ecoli_assembly.txt
#
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --time=20:00
#SBATCH --mem-per-cpu=32000
#
SCRATCH=/scratch/$USER/ecoli
mkdir -m 700 -p $SCRATCH
cd $SCRATCH
srun wget https://zenodo.org/record/1172816/files/Loman_E.coli_MAP006-1_2D_50x.fasta
date
srun flye --nano-raw Loman_E.coli_MAP006-1_2D_50x.fasta --out-dir /zfs/omics/personal/${USER}/ecoli-batch --threads $SLURM_CPUS_PER_TASK
date
  • Save this file as assemble_ecoli.sh and run it from the head node with sbatch assemble_ecoli.sh.
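Putting the conda remark and the submission together, a typical session on the head node looks like this:

conda activate nptools
sbatch assemble_ecoli.sh
squeue -u $USER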

8.5. Interactive Shells Continued

  • For interactive work: use the head node.

  • In some cases it might be convenient to have shell access on the compute node, for instance to look at the memory and cpu allocation of a specific process.
    • limit the duration of this shell by issuing the -t <minutes> flag

    • use the -w flag to select the node on which you want your shell to run

    • have a shell on cn001 for 1 minute:

hostname
#omics-h0.science.uva.nl
srun -n 1  -t 1 --pty -w omics-cn001 bash -i
hostname
#omics-cn001

8.5.1. Example using an interactive shell

  • Re-run the multithreaded bowtie2 example, configured to use 2 threads. Use squeue to find out on which node the job runs. Then, from the head node, open a shell for a minute on (in this case) cn001:

sbatch align_Mycoplasma
squeue
srun -n 1 -t 1 --pty -w omics-cn001 bash -i
_images/slurm7.png
  • I see with squeue that my alignment script is running as Slurm job 7094 on compute node cn001. Hence, I start an interactive shell on compute node cn001 (for a minute) and verify with top that there is indeed a processor load of 200% (2 threads) for the bowtie2-align-s program.
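Inside such an interactive shell the usual Linux tools are available, for example (press q to quit top, and type exit to leave the shell before the minute runs out):

top -u $USER
exit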