====== Parallel Bilby Analysis ======
==== The Basic Idea ====
Once everything is set up, parallel_bilby is ready to go and you might simply run ''parallel_bilby_analysis'' to finish the job.
However, in a realistic setting of gravitational-wave inference, the run will hardly converge in an acceptable amount of time on your local machine. Instead, you will need to work on a computer cluster that handles expensive computations. To allocate its resources efficiently, most modern clusters use a workload manager such as Slurm.
Think of this as visiting a not-so-nice restaurant: there is no chance for you to cook anything yourself, and you will get exactly what you asked for, but only at a time that suits the kitchen's workflow. And changing your order can result in further delays, no matter how hungry you are.
===The batch script===
You specify your demands by submitting a batch script to Slurm. Let's call it ''analysis.sh''. Instead of you typing all your commands into the command line, the computer will consecutively execute each line of your batch file. See below for an example script:
<code bash>
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --time=24:00:00
#SBATCH --nodes=30
#SBATCH --ntasks-per-node=68
#SBATCH --constraint=knl
#SBATCH -o [PATH_TO_YOUR_OUTDIR]/log_data_analysis/log
#SBATCH -e [PATH_TO_YOUR_OUTDIR]/log_data_analysis/err
#SBATCH -D ./
#SBATCH --export=ALL
#SBATCH --get-user-env
#SBATCH --no-requeue
#SBATCH --account=[YOUR_COST_CENTRE]
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=mail@me.com
#SBATCH --job-name=my_fancy_jobname

module load openmpi/4.0.2
module load python
conda activate [YOUR_PBILBY_ENVIRONMENT]

export MKL_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_NUM_THREADS=1
export MPI_PER_NODE=68
export PMI_MMAP_SYNC_WAIT_TIME=600

srun -n $SLURM_NTASKS parallel_bilby_analysis [PATH_TO_YOUR_OUTDIR]/data/inj_data_dump.pickle --nlive 2048 --nact 30 --maxmcmc 10000 --sampling-seed 10130134 --no-plot --check-point-deltaT 36000 --outdir outdir_TPE/result
</code>
The core command is given in the last line: we ask Slurm to run (''srun'') ''parallel_bilby_analysis'' with ''$SLURM_NTASKS'' parallel tasks on all the information stored in the specified ''.pickle'' file, with all the following arguments being passed on to ''parallel_bilby_analysis''. But instead of executing this command directly, we first execute the additional commands specified in the preceding lines of our batch file.
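Once the batch file is complete, you hand it over to Slurm with ''sbatch'' and monitor its progress with ''squeue''. A typical session might look like the following sketch (the job ID is illustrative):

```shell
# Submit the batch script to the Slurm queue;
# Slurm replies with a job ID, e.g. "Submitted batch job 1234567"
sbatch analysis.sh

# Check the state of all your own jobs (PD = pending, R = running)
squeue -u $USER

# Cancel the job if something went wrong (1234567 is the ID from above)
scancel 1234567
```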
===Required and Useful slurm-options===
''bash'' is a very common shell on Linux systems. Essentially, it allows for communication with the operating system's kernel. The initial ''#!/bin/bash'' is a shebang, telling the computer that it is supposed to run all subsequent commands using ''bash''. An alternative would be to simply execute the script as ''bash analysis.sh''.
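As a minimal illustration of how a shebang works, independent of Slurm, consider a two-line script (the filename ''hello.sh'' is just an example):

```shell
# Create a small script whose first line names the interpreter
cat > hello.sh <<'EOF'
#!/bin/bash
echo "hello from bash"
EOF

# Either run it through bash explicitly ...
bash hello.sh

# ... or make it executable and let the shebang pick the interpreter
chmod +x hello.sh
./hello.sh
```

Both invocations produce the same result; the shebang only matters in the second case, where the operating system has to figure out which interpreter to use.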
But Slurm takes a slightly different approach to handling batch scripts: to allow for efficient management, it comes with its own ''sbatch'' command. All lines immediately after the shebang that start with ''#SBATCH'' will be taken as additional arguments, until the first ''bash''-executable line is found.
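A useful consequence of this design is that every ''#SBATCH'' line is merely a default: the same option passed on the command line overrides the value in the script. For example (values are illustrative):

```shell
# Submit with the options exactly as written in the script
sbatch analysis.sh

# Override the wall time and job name for a quick test run,
# without editing the batch file itself
sbatch --time=12:00:00 --job-name=quick_test analysis.sh
```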
Necessary options are:
  * ''--account=[YOUR_COST_CENTRE]'' – specifies the cluster account to be charged
  * ''--qos=regular'' – the quality of service is specific to options offered by your cluster and will affect your account's balance
  * ''--time=24:00:00'' – the maximum wall time; the job is terminated once it is exceeded
  * ''--nodes=30'' – the number of compute nodes to allocate
  * ''--ntasks-per-node=68'' – the number of (MPI) tasks to start on each node
  * ''-o [PATH_TO_YOUR_OUTDIR]/log_data_analysis/log'' – the file to which standard output is written
  * ''-e [PATH_TO_YOUR_OUTDIR]/log_data_analysis/err'' – the file to which error messages are written
Useful options include:
  * ''--mail-type=BEGIN,END,FAIL'' – automatic emails are sent when your run begins, ends or fails
  * ''--mail-user=mail@me.com'' – the address to which these are sent
  * ''--job-name=my_fancy_jobname'' – a unique job name that assists you in monitoring your jobs
  * ''--no-requeue'' – keep control over your balance by disallowing administrators to restart jobs, e.g. after node failure
  * ''-D [YOUR_DIR]'' – specifies the directory relative to which other calls are made; otherwise your current directory is assumed
  * ''--constraint=knl'' – restricts the job to nodes with a given feature, here Knights-Landing nodes
  * ''--export=ALL'' – exports all environment variables from your submission shell to the job
  * ''--get-user-env'' – makes the job start with your login environment
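The ''--job-name'' option pays off when monitoring: you can filter the queue by name instead of remembering numeric job IDs. For example:

```shell
# List only the jobs matching the given name
squeue --name=my_fancy_jobname

# Show job ID, name, state, elapsed time and scheduled start time
squeue --name=my_fancy_jobname -o "%.10i %.20j %.8T %.10M %.20S"
```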
==== Fixing your Batch ====
Every once in a while, you are likely to find yourself in a situation where you need to make changes to your submission. While sometimes there is no way to avoid cancelling it entirely
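For some changes, cancelling is not necessary: as long as a job is still pending, ''scontrol update'' can modify many of its settings in place. A sketch (the job ID is illustrative):

```shell
# Reduce the requested wall time of a pending job
scontrol update JobId=1234567 TimeLimit=12:00:00

# Put a pending job on hold, e.g. while you fix its input files,
# and release it again once you are done
scontrol hold 1234567
scontrol release 1234567
```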