Containerized Installation (HPC with Apptainer / Singularity)
This section describes how to install and run CBIcall on High-Performance Computing (HPC) systems where Docker is not available, using Apptainer (formerly Singularity).
Apptainer can execute Docker images directly, but container images are read-only by design. As a result, databases, configuration files, and workflows must be stored outside the container and bind-mounted at runtime.
Downloading Required Databases and Software
Note: this process can be lengthy.
Begin by downloading the required databases and software. Save the data outside the container; this preserves it across container runs and avoids repeated downloads.
Install the Python dependency (on the host):
pip3 install --user gdown
Choose a location on the host filesystem where the databases will be stored:
CBICALL_DATA=/software/biomed/cbicall-data
mkdir -p "$CBICALL_DATA"
cd "$CBICALL_DATA"
Download the data preparation script and execute it:
wget https://raw.githubusercontent.com/mrueda/cbicall/refs/heads/main/scripts/download_cbicall_bundle.py
python3 ./download_cbicall_bundle.py --outdir "$CBICALL_DATA"
To verify only the catalog-to-Google-Drive bundle identity before starting the large archive download:
python3 ./download_cbicall_bundle.py \
--outdir "$CBICALL_DATA" \
--verify-resource-id-only
Google Drive may occasionally block or stall automated downloads. If this happens, print the manual download list:
python3 ./download_cbicall_bundle.py \
--outdir "$CBICALL_DATA" \
--print-manual-download
Download every listed file into $CBICALL_DATA, then let the script continue from those files:
python3 ./download_cbicall_bundle.py \
--outdir "$CBICALL_DATA" \
--skip-download
The script will:
- download missing split files when possible
- reassemble data.tar.gz
- verify the split parts or assembled archive with data.tar.gz.md5
- load the CBIcall resource catalog, locally or from the catalog URL
- optionally verify a small GDrive resource identifier file such as cbicall-resource-id.json
- rename the verified archive using the bundle identity, for example cbicall-germline-resources-v1.tar.gz
- extract the archive into DATADIR
- write cbicall-resource-installation.json with the installed bundle provenance
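If you fetched the parts manually, you can also check the archive against the checksum file by hand before re-running the script. A minimal sketch, assuming data.tar.gz and data.tar.gz.md5 both sit in $CBICALL_DATA:

```shell
# Manual integrity check: verify the archive against the bundled .md5 file.
# md5sum -c exits non-zero if any checksum does not match.
cd "$CBICALL_DATA"
md5sum -c data.tar.gz.md5
```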
If disk space is tight and the checksum has passed, add --remove-parts to remove data.tar.gz.part-* after assembly.
CBIcall keeps the rich resource registry in resources/cbicall-resource-catalog.json. The GDrive bundle only needs a small identifier file, for example cbicall-resource-id.json containing {"resource_key": "cbicall-germline-resources-v1"}. When that identifier file is available, the registry can store its Google Drive file ID and SHA-256 so the downloader can verify that the remote bundle matches the local CBIcall catalog entry.
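As a quick sanity check, the identifier file can be read with a Python one-liner. This is only a sketch; it assumes the identifier file was extracted into $CBICALL_DATA, so adjust the path if yours differs:

```shell
# Print the resource_key stored in the identifier file.
python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["resource_key"])' \
  "$CBICALL_DATA/cbicall-resource-id.json"
```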
Installing the Container Image
Loading Apptainer
If your HPC system uses environment modules, load Apptainer first:
module load apptainer
Note: On some HPC systems, apptainer is already available by default and does not need to be loaded explicitly.
Obtaining the CBIcall Container Image
There are two ways to run the CBIcall container with Apptainer.
For most users and all production workflows, explicitly pulling the image is recommended.
Option A: Pull the image explicitly (apptainer pull) — recommended
Pull the latest CBIcall image from Docker Hub and save it as a Singularity Image File (.sif):
apptainer pull cbicall_latest.sif docker://manuelrueda/cbicall:latest
This command:
- downloads the image once
- creates a reusable .sif file
- avoids reliance on the user cache
- is suitable for batch scheduling and offline execution
This step only needs to be performed once.
Option B: Run directly from Docker Hub (apptainer run)
Apptainer can execute the CBIcall container directly from Docker Hub without explicitly pulling the image first:
apptainer run docker://manuelrueda/cbicall:latest
On first use, the image is automatically downloaded, converted to a Singularity Image File (.sif), and cached locally. Subsequent runs reuse the cached image.
This method does not create a persistent .sif file in the working directory. The cached image is stored in Apptainer’s user cache directory.
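On clusters where the home directory has a small quota, the cache can fill up quickly. A sketch of redirecting it to scratch storage, using Apptainer's standard APPTAINER_CACHEDIR environment variable (the scratch path below is an example; adapt it to your site):

```shell
# Apptainer caches converted images under ~/.apptainer/cache by default.
# Point the cache at scratch storage instead (example path):
export APPTAINER_CACHEDIR="$HOME/scratch/apptainer-cache"
mkdir -p "$APPTAINER_CACHEDIR"

# To inspect or clear the cache later:
#   apptainer cache list
#   apptainer cache clean
```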
This option is suitable for:
- quick testing
- exploratory or interactive use
For reproducible workflows, offline execution, or batch jobs, explicitly pulling the image (Option A) is strongly recommended.
Preparing a Writable CBIcall Directory
The CBIcall installation inside the container (/usr/share/cbicall) is read-only.
To allow configuration changes and execution of tests, create a writable copy in your home directory:
apptainer exec cbicall_latest.sif \
bash -lc 'mkdir -p $HOME/cbicall && cp -a /usr/share/cbicall/. $HOME/cbicall/'
This step only needs to be done once.
Running and Interacting with the Container
Start an interactive shell inside the container, overlaying the writable CBIcall copy and binding the external data directory:
apptainer shell \
--pwd /usr/share/cbicall \
--bind "$HOME/cbicall":/usr/share/cbicall \
--bind "$CBICALL_DATA":/cbicall-data \
cbicall_latest.sif
You will start directly in the CBIcall working directory.
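For batch use, the same binds work with apptainer exec to run a single command non-interactively instead of opening a shell. A sketch, reusing the validate-resources check described in this section:

```shell
# One-shot, non-interactive equivalent of the interactive shell above.
apptainer exec \
  --pwd /usr/share/cbicall \
  --bind "$HOME/cbicall":/usr/share/cbicall \
  --bind "$CBICALL_DATA":/cbicall-data \
  cbicall_latest.sif \
  bin/cbicall validate-resources
```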
Point CBIcall to /cbicall-data
CBIcall workflows read resource paths from Bash env.sh files and the
Snakemake config.yaml. In Apptainer, bind your external resource directory as
/cbicall-data and point CBIcall to that container path:
sed -i 's|^DATADIR=.*|DATADIR=/cbicall-data|' workflows/bash/gatk-4.6/env.sh
sed -i 's|^DATADIR=.*|DATADIR=/cbicall-data|' workflows/bash/gatk-3.5/env.sh
sed -i 's|^datadir:.*|datadir: "/cbicall-data"|' workflows/snakemake/gatk-4.6/config.yaml
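To confirm the substitutions took effect, grep the edited lines before running the validation commands:

```shell
# Each command should print lines pointing at /cbicall-data.
grep -H '^DATADIR=' workflows/bash/gatk-4.6/env.sh workflows/bash/gatk-3.5/env.sh
grep -H '^datadir:' workflows/snakemake/gatk-4.6/config.yaml
```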
Confirm that CBIcall sees the mounted resources:
bin/cbicall validate-resources
bin/cbicall doctor -p examples/input/param.yaml
Performing Integration Tests
Inside the container, from the CBIcall repository root:
WES
bin/cbicall test --wes -t 1
mtDNA
bin/cbicall test --mit -t 1
Notes
- $HOME inside the container corresponds to your host home directory.
- All configuration changes are performed on the writable copy in $HOME/cbicall.
- The container image itself remains immutable, ensuring reproducibility and HPC safety.
How to run a job in Slurm
You can find an example here:
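A minimal batch script might look like the following. This is a sketch only: the job name, resource values, and the .sif and data paths are placeholders to adapt to your site; the cbicall command itself is the WES test shown earlier.

```shell
#!/bin/bash
#SBATCH --job-name=cbicall-wes
#SBATCH --cpus-per-task=1
#SBATCH --mem=16G
#SBATCH --time=24:00:00

# Load Apptainer if your site uses environment modules
module load apptainer

# Same external data path as used above (adapt to your site)
CBICALL_DATA=/software/biomed/cbicall-data

apptainer exec \
  --pwd /usr/share/cbicall \
  --bind "$HOME/cbicall":/usr/share/cbicall \
  --bind "$CBICALL_DATA":/cbicall-data \
  "$HOME/cbicall_latest.sif" \
  bin/cbicall test --wes -t 1
```

Submit it with sbatch from the directory containing the script.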
Cleaning Up (Apptainer / Singularity)
Unlike Docker, Apptainer does not manage running containers or image caches. A container image is simply a regular file on disk.
Removing the Container Image
To delete a CBIcall container image (.sif file), simply remove it:
cd # home directory
rm cbicall_latest.sif
If the image is currently in use by a running job, wait for the job to finish before removing it.
Removing the Writable CBIcall Copy (Optional)
If you created a writable copy of the CBIcall installation in your home directory and want to reset it:
rm -rf $HOME/cbicall
This does not affect the container image itself.
Removing Downloaded Databases (Optional)
To remove the externally downloaded CBIcall databases:
rm -rf $CBICALL_DATA
⚠️ This operation is irreversible and will require re-downloading the data.
Summary
- Apptainer images (.sif) are removed with rm
- There is no equivalent to docker stop, docker rm, or docker rmi
- User data and configuration live outside the container and must be cleaned separately