pbilby_ana [2022/08/30 18:45] (current) theoastro
#SBATCH -o [PATH_TO_YOUR_OUTDIR]/log_data_analysis/log
#SBATCH -e [PATH_TO_YOUR_OUTDIR]/log_data_analysis/err
#SBATCH --no-requeue
#SBATCH --account=[YOUR_COST_CENTRE]
export PMI_MMAP_SYNC_WAIT_TIME=600

srun -n $SLURM_NTASKS parallel_bilby_analysis [PATH_TO_YOUR_OUTDIR]/data/inj_data_dump.pickle --nlive 2048 --nact 30 --maxmcmc 10000 --sampling-seed 10130134 --no-plot --check-point-deltaT 36000 --outdir [PATH_TO_YOUR_OUTDIR]/result

The core command is given in the last line: we ask slurm to run (''srun''), using ''$SLURM_NTASKS'' parallel tasks, a ''parallel_bilby_analysis'' on all the information stored in the specified ''.pickle'' [[https://docs.python.org/3/library/pickle.html|file]], with all following arguments being passed on to ''parallel_bilby_analysis''.

Necessary options are:
* ''#SBATCH --account=[YOUR_COST_CENTRE]'' -- specifies the cluster account to be charged
* ''#SBATCH --qos=regular'' -- the //quality of service// is specific to the options offered by your cluster and will affect your account's balance
* ''#SBATCH --time=24:00:00'' -- the maximum time your job may run
* ''#SBATCH --nodes=30'' -- the number of nodes
* ''#SBATCH --ntasks-per-node=68'' -- the number of tasks allocated to each node
Note that the latter options are subject to cluster rules, and a poor choice can result in a job pending indefinitely. Generally, ''slurm'' prefers to have more tasks running for a shorter duration. This needs to be balanced against ''parallel_bilby'', which discourages letting the number of tasks (the product of ''nodes'' and ''ntasks-per-node'') significantly exceed the number of live points.

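As a quick sanity check on this balance, you can compare the total task count implied by your header against the ''--nlive'' value from the ''srun'' line. A minimal sketch, using the example values from above (adjust them to your own job):

```shell
# Sanity check: total MPI tasks (nodes * ntasks-per-node) vs. live points.
# Values taken from the example job above.
NODES=30
NTASKS_PER_NODE=68
NLIVE=2048
TOTAL_TASKS=$((NODES * NTASKS_PER_NODE))
echo "total tasks: $TOTAL_TASKS (nlive: $NLIVE)"
# → total tasks: 2040 (nlive: 2048)
```

Here the total task count (2040) stays just below the number of live points (2048), which is the regime ''parallel_bilby'' recommends.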
Useful options include:
* ''#SBATCH --mail-type=BEGIN,END,FAIL'' -- automatic emails are sent when your run begins, ends, or fails
* ''#SBATCH --no-requeue'' -- keep control over your balance by disallowing administrators from restarting jobs, e.g. after node failure
* ''#SBATCH -D [YOUR_DIR]'' -- specify the directory relative to which other calls are made; otherwise your current directory is assumed
Many more options are [[https://slurm.schedmd.com/sbatch.html|available]].

=== Batch commands ===
Your batch commands should include these:
module load openmpi
module load python
conda activate [YOUR_PBILBY_ENVIRONMENT]
These commands make sure that your environment is prepared to handle ''parallel_bilby''.

export MKL_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_NUM_THREADS=1
export MPI_PER_NODE=[YOUR_ntasks-per-node]
export PMI_MMAP_SYNC_WAIT_TIME=600
These are [[https://en.wikipedia.org/wiki/Environment_variable#Assignment:_Unix|environment variables]] that affect computation efficiency. They should be edited only with care.
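Putting the header, environment setup, and run command together, a complete submission script might look like the following sketch. All bracketed values are placeholders, the ''--job-name'' line is an added convenience not required by the recipe above, and the exact module names depend on your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=pbilby_analysis
#SBATCH -o [PATH_TO_YOUR_OUTDIR]/log_data_analysis/log
#SBATCH -e [PATH_TO_YOUR_OUTDIR]/log_data_analysis/err
#SBATCH --account=[YOUR_COST_CENTRE]
#SBATCH --qos=regular
#SBATCH --time=24:00:00
#SBATCH --nodes=30
#SBATCH --ntasks-per-node=68
#SBATCH --no-requeue

# Prepare the software environment.
module load openmpi
module load python
conda activate [YOUR_PBILBY_ENVIRONMENT]

# Keep each MPI task single-threaded and allow slow MPI start-up.
export MKL_NUM_THREADS="1"
export MKL_DYNAMIC="FALSE"
export OMP_NUM_THREADS=1
export MPI_PER_NODE=68
export PMI_MMAP_SYNC_WAIT_TIME=600

srun -n $SLURM_NTASKS parallel_bilby_analysis [PATH_TO_YOUR_OUTDIR]/data/inj_data_dump.pickle \
    --nlive 2048 --nact 30 --maxmcmc 10000 --sampling-seed 10130134 \
    --no-plot --check-point-deltaT 36000 --outdir [PATH_TO_YOUR_OUTDIR]/result
```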
=== Options to Analysis ===
There are [[https://lscsoft.docs.ligo.org/parallel_bilby/data_analysis.html|many options]] available to ''parallel_bilby_analysis''.
Some important ones used in the above example include:
* ''--nlive 2048'' -- the number of live points governs the overall precision of your inference. It should significantly exceed the expected number of modes in the posterior distribution, typically being on the order of 1000.
* ''--nact 30 --maxmcmc 10000'' -- these quantities affect [[https://github.com/lscsoft/bilby/blob/b1e02f1dfae03d4939cae9c95eff300c22919689/bilby/core/sampler/dynesty.py#L715|the algorithm]] used to generate new live points. Increasing them will result in better convergence at the cost of higher runtime. Regard these values as convenient defaults.
* ''--sampling-seed 10130134'' -- a sampling seed guarantees the reproducibility of stochastic results.
* ''--check-point-deltaT 36000'' -- clusters can be subject to failures, so you will prefer to have your data saved at convenient checkpoints. Since bilby is very computationally expensive, even saving can cost significant runtime. Therefore, checkpoints should be taken on the order of hours (but below your time limit).
* ''--outdir [PATH_TO_YOUR_OUTDIR]/result'' -- the directory to which results are written
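For instance, the checkpoint interval used in the example works out to ten hours, comfortably within the 24-hour time limit:

```shell
# Checkpoint interval from the example, converted from seconds to hours.
DELTA_T=36000
echo "$((DELTA_T / 3600)) hours"
# → 10 hours
```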