Docker
Containerized Installation
Downloading Required Databases and Software
Note: this process can be lenghty.
Begin by downloading the required databases and software. Save the data outside the container; this preserves it across container restarts and lets you update the software without downloading the data again.
Install dependencies for Python 3:
pip3 install gdown
Finally, navigate to a directory where you want the databases stored and execute:
mkdir -p /absolute/path/to/cbicall-data
wget https://raw.githubusercontent.com/mrueda/cbicall/refs/heads/main/scripts/download_cbicall_bundle.py
python3 ./download_cbicall_bundle.py --outdir /absolute/path/to/cbicall-data
To verify only the catalog-to-Google-Drive bundle identity before starting the large archive download:
python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--verify-resource-id-only
Google Drive can be restrictive with large files. If the Python download stalls or fails, print the manual download list:
python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--print-manual-download
Download every listed file into /absolute/path/to/cbicall-data, then let the script continue from those files:
python3 ./download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--skip-download
The script will:
- download missing split files when possible
- reassemble
data.tar.gz - verify the split parts or assembled archive with
data.tar.gz.md5 - load the CBIcall resource catalog, locally or from the catalog URL
- optionally verify a small GDrive resource identifier file such as
cbicall-resource-id.json - rename the verified archive using the bundle identity, for example
cbicall-germline-resources-v1.tar.gz - extract the archive into
DATADIR - write
cbicall-resource-installation.jsonwith the installed bundle provenance
If disk space is tight and the checksum has passed, add --remove-parts to remove data.tar.gz.part-* after assembly.
CBIcall keeps the rich resource registry in resources/cbicall-resource-catalog.json. The GDrive bundle only needs a small identifier file, for example cbicall-resource-id.json containing {"resource_key": "cbicall-germline-resources-v1"}. When that identifier file is available, the registry can store its Google Drive file ID and SHA-256 so the downloader can verify that the remote bundle matches the local CBIcall catalog entry.
Point CBIcall to your resource directory
CBIcall workflows read resource paths from Bash env.sh files and the
Snakemake config.yaml. In Docker, mount your host resource directory as
/cbicall-data and point CBIcall to that container path:
sed -i 's|^DATADIR=.*|DATADIR=/cbicall-data|' workflows/bash/gatk-4.6/env.sh
sed -i 's|^DATADIR=.*|DATADIR=/cbicall-data|' workflows/bash/gatk-3.5/env.sh
sed -i 's|^datadir:.*|datadir: "/cbicall-data"|' workflows/snakemake/gatk-4.6/config.yaml
Method 1: Installing from Docker Hub (fast)
Pull the latest Docker image from Docker Hub:
docker pull manuelrueda/cbicall:latest
docker image tag manuelrueda/cbicall:latest cnag/cbicall:latest
Method 2: Installing from Dockerfile (slow)
Download the Dockerfile from GitHub:
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/cbicall/main/docker/Dockerfile
Then build the container:
-
For Docker version 19.03 and above (supports buildx):
docker buildx build --no-cache -t cnag/cbicall:latest . -
For Docker versions older than 19.03 (no buildx support):
docker build --no-cache -t cnag/cbicall:latest .
Running and Interacting with the Container
# Please update '/absolute/path/to/cbicall-data' with your actual local data path
#docker run -tid --volume /absolute/path/to/cbicall-data:/cbicall-data -e USERNAME=root --name cbicall cnag/cbicall:latest
# Real example
#docker run -tid --volume /media/mrueda/4TBB/cbicall-data:/cbicall-data -e USERNAME=root --name cbicall cnag/cbicall:latest
To connect to the container:
docker exec -ti cbicall bash
Inside the container, confirm that CBIcall sees the mounted resources:
bin/cbicall validate-resources
bin/cbicall doctor -p examples/input/param.yaml
Performing integration tests
Inside the container, from the CBIcall repository root:
WES
bin/cbicall test --wes -t 1
mtDNA
bin/cbicall test --mit -t 1
System requirements
- OS/ARCH supported: linux/amd64 and linux/arm64.
- Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should do as well.
- 16GB of RAM
- >= 1 core (ideally i7 or Xeon).
- At least 100GB HDD.
Common errors: Symptoms and treatment
-
Dockerfile:
-
DNS errors
-
Error: Temporary failure resolving 'foo'
-
-