Skip to main content

Docker

Containerized Installation

Install CBIcall with Docker

CBIcall installation and CBIcall resource installation are separate steps. Pull or build the CBIcall image first; download the native CBIcall resource bundle only if the workflow path you choose needs it.

Method 1: Installing from Docker Hub (fast)

Pull the latest Docker image from Docker Hub:

docker pull manuelrueda/cbicall:latest
docker image tag manuelrueda/cbicall:latest cnag/cbicall:latest

Method 2: Installing from Dockerfile (slow)

Download the Dockerfile from GitHub:

wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/cbicall/main/docker/Dockerfile

Then build the container:

The Dockerfile installs the CBIcall Python dependencies, including Snakemake, plus pinned Nextflow, Cromwell, and WOMtool launchers. Bash workflows do not need those engines, but the image can run the packaged Snakemake, Nextflow, and Cromwell WES/WGS workflows without extra engine installation.

  • For Docker version 19.03 and above (supports buildx):

    docker buildx build --no-cache -t cnag/cbicall:latest .
  • For Docker versions older than 19.03 (no buildx support):

    docker build --no-cache -t cnag/cbicall:latest .

Choose a Workflow Path

Workflow pathNeeds /cbicall-data?Notes
workflow_provider: nf-coreNo CBIcall bundle requiredNextflow/nf-core manage their own test data, references, and containers. The selected runtime profile must still work in your environment.
Native CBIcall Bash/Snakemake/Nextflow/Cromwell WES/WGS/mtDNAYesMount the CBIcall resource bundle as /cbicall-data and configure DATADIR.

For Docker-based nf-core tests, make sure the selected nf-core profile can run from your container setup. Many users run nf-core directly on the host with Docker or on HPC with Singularity/Apptainer.

Download the Resource Bundle for Native Workflows

Note: this process can be lengthy.

Begin by downloading the required databases and software for native CBIcall workflows. Save the data outside the container; this preserves it across container restarts and lets you update the software without downloading the data again. Skip this section when you only want to validate or run registered nf-core provider workflows.

Install dependencies for Python 3:

pip3 install gdown

Finally, navigate to a directory where you want the databases stored and execute:

mkdir -p /absolute/path/to/cbicall-data
wget https://raw.githubusercontent.com/mrueda/cbicall/refs/heads/main/scripts/download_cbicall_bundle.py
python3 ./download_cbicall_bundle.py --outdir /absolute/path/to/cbicall-data

To verify only the catalog-to-Google-Drive bundle identity before starting the large archive download:

python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--verify-resource-id-only

Google Drive can be restrictive with large files. If the Python download stalls or fails, print the manual download list:

python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--print-manual-download

Download every listed file into /absolute/path/to/cbicall-data, then let the script continue from those files. This step can take time because it assembles, verifies, and extracts the full resource bundle. On a typical VM or workstation disk, expect roughly 20-50 minutes after all parts are present; faster disks may be shorter.

python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--skip-download

The script will:

  • download missing split files when possible
  • reassemble data.tar.gz
  • verify the split parts or assembled archive with data.tar.gz.md5
  • load the CBIcall resource catalog, locally or from the catalog URL
  • optionally verify a small GDrive resource identifier file such as cbicall-resource-id.json
  • rename the verified archive using the bundle identity, for example cbicall-germline-resources-v1.tar.gz
  • extract the archive into DATADIR
  • write cbicall-resource-installation.json with the installed bundle provenance

If disk space is tight and the checksum has passed, add --remove-parts to remove data.tar.gz.part-* after assembly.

CBIcall keeps the rich resource registry in resources/cbicall-resource-catalog.json. The GDrive bundle only needs a small identifier file, for example cbicall-resource-id.json containing {"resource_key": "cbicall-germline-resources-v1"}. When that identifier file is available, the registry can store its Google Drive file ID and SHA-256 so the downloader can verify that the remote bundle matches the local CBIcall catalog entry.

Running and Interacting with the Container

For native CBIcall workflows:

# Please update '/absolute/path/to/cbicall-data' with your actual local data path
docker run -tid --volume /absolute/path/to/cbicall-data:/cbicall-data -e USERNAME=root --name cbicall cnag/cbicall:latest

For nf-core-only validation, the CBIcall bundle mount is not required:

docker run -tid -e USERNAME=root --name cbicall cnag/cbicall:latest

The container/runtime setup must still support the nf-core profile you select.

To connect to the container:

docker exec -ti cbicall bash

Point Native Workflows to /cbicall-data

Run these commands inside the container after docker exec. Native CBIcall workflows read resource paths from Bash env.sh files and from Snakemake/Nextflow/Cromwell config.yaml files stored in the container's CBIcall checkout. Because the host resource directory was mounted as /cbicall-data, point the workflow configuration files to that container path:

sed -i 's|^DATADIR=.*|DATADIR=/cbicall-data|' workflows/bash/gatk-3.5/env.sh
sed -i 's|^datadir:.*|datadir: "/cbicall-data"|' workflows/snakemake/gatk-4.6/config.yaml

The GATK 4.6 Bash env.sh is a symlink to the GATK 3.5 Bash env.sh, so one Bash edit is enough. The native Nextflow and Cromwell configs are symlinks to this shared GATK 4.6 backend config, so one config edit updates Snakemake, native Nextflow, and Cromwell workflows.

Then confirm that CBIcall sees the mounted resources:

bin/cbicall validate-resources
bin/cbicall validate-parameters -p examples/input/param.yaml

Performing integration tests

Inside the container, from the CBIcall repository root:

WES

bin/cbicall test --wes-bash -t 1

mtDNA

bin/cbicall test --mit-bash -t 1

System requirements

  • OS/ARCH supported: linux/amd64 and linux/arm64.
  • Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should do as well.
  • 16GB of RAM
  • >= 1 core (ideally i7 or Xeon).
  • At least 100GB HDD.

Common errors: Symptoms and treatment