Non-containerized installation
Feel free to use your preferred virtual environment (for example, venv or conda). This guide goes straight to the setup steps.
Method 1: Download from GitHub
Use git clone to get the latest (stable) version:
git clone https://github.com/CNAG-Biomedical-Informatics/cbicall.git
cd cbicall
If you already have a clone and only need to update it to the latest version, run:
git pull
Install dependencies for Python 3:
python3 -m pip install --upgrade -r requirements.txt
Note: If you are installing cbicall in an HPC environment for shared use, we recommend installing the required Python 3 modules in a central location. Users can then simply run:
# Load Python + modules
module load Python/3.10.8-GCCcore-12.2.0
export PYTHONPATH="/software/biomed/cbi_py3/lib/python3.10/site-packages:${PYTHONPATH}"
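If you want to confirm that every package listed in requirements.txt is visible to the interpreter (whether installed with pip or exposed through a shared PYTHONPATH), the following is a minimal standard-library sketch. The function missing_requirements is hypothetical and not part of CBIcall. It assumes simple lines such as name>=1.0 or name[extra]==2.0, assumes requirement names match the installed distribution names, and skips pip options and URL-based entries.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

# A PEP 508 project name starts with a letter or digit and ends at the first
# character that cannot belong to it (e.g. "[", "=", ">", ";" or a space).
_NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def missing_requirements(requirements_file):
    """Return the distribution names in requirements_file that are not installed.

    Hypothetical helper: handles plain "name[extras]<specifier>" lines only.
    """
    missing = []
    for raw in Path(requirements_file).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        if "://" in line:  # skip URL/VCS requirements; their name is not at the start
            continue
        match = _NAME_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        try:
            version(name)  # raises if no installed distribution has this name
        except PackageNotFoundError:
            missing.append(name)
    return missing
```

Run from the repository root, missing_requirements("requirements.txt") should return an empty list once the installation (or the shared module setup) is complete.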
Testing the deployment:
pytest
Choose a workflow path
CBIcall can be installed and used before you download the large CBIcall germline resource bundle. Whether you need the bundle depends on the workflow path:
| Workflow path | Resource bundle required? | Extra runtime requirements |
|---|---|---|
| workflow_provider: nf-core | No | Nextflow plus the selected nf-core runtime profile, such as Docker or Singularity/Apptainer. |
| Native CBIcall Bash/Snakemake/Nextflow WES/WGS/mtDNA | Yes | The CBIcall resource bundle installed as DATADIR. |
For a quick nf-core test from a source checkout:
cd examples/input
../../bin/cbicall validate-parameters -p nf-core-demo.yaml --no-color
../../bin/cbicall run -p nf-core-demo.yaml -t 4 --no-color
This uses nf-core's own test data and does not require the CBIcall-provided bundle. Install or load Nextflow and the selected container/runtime profile before running it.
Download the Resource Bundle for Native Workflows
Note: this process can be lengthy.
This section is required for the native CBIcall WES/WGS/mtDNA workflows. It is not required for nf-core provider workflows where resources are managed by Nextflow/nf-core.
Choose a directory where the databases and bundled external tools should be installed. This directory will become your DATADIR.
mkdir -p /absolute/path/to/cbicall-data
python3 $path_to_cbicall/scripts/download_cbicall_bundle.py --outdir /absolute/path/to/cbicall-data
Replace $path_to_cbicall with your CBIcall installation path.
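Before starting the download, it can help to check that the chosen DATADIR is an absolute path on a disk with enough free space. The sketch below is hypothetical (check_datadir is not a CBIcall command). Its 100 GB default is taken from the System requirements list; the space actually needed while split parts, the assembled archive, and the extracted files coexist may be higher.

```python
import shutil
from pathlib import Path


def check_datadir(datadir, min_free_gb=100):
    """Return a list of problems with a candidate DATADIR (an empty list means OK).

    Hypothetical helper; min_free_gb uses decimal gigabytes (1 GB = 1e9 bytes).
    """
    problems = []
    path = Path(datadir)
    if not path.is_absolute():
        problems.append(f"{datadir} is not an absolute path")
        return problems
    if not path.is_dir():
        problems.append(f"{datadir} does not exist yet (create it with mkdir -p first)")
        return problems
    free_gb = shutil.disk_usage(path).free / 1e9  # free space on the filesystem holding path
    if free_gb < min_free_gb:
        problems.append(f"only {free_gb:.1f} GB free, want at least {min_free_gb} GB")
    return problems
```

For example, check_datadir("/absolute/path/to/cbicall-data") returns an empty list when the directory exists and has at least 100 GB free.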
To verify only that the Google Drive bundle identity matches the CBIcall catalog, without starting the large archive download, run:
python3 $path_to_cbicall/scripts/download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--verify-resource-id-only
Google Drive may throttle or interrupt downloads of large files. If the Python download stalls or fails, print the list of files to download manually:
python3 $path_to_cbicall/scripts/download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--print-manual-download
Download every listed file into /absolute/path/to/cbicall-data, then rerun the script with --skip-download so that it works from the local files. This step can take a while because the script assembles, verifies, and extracts the full resource bundle: on a typical VM or workstation disk, expect roughly 20-50 minutes once all parts are present, and less on faster disks.
python3 $path_to_cbicall/scripts/download_cbicall_bundle.py \
--outdir /absolute/path/to/cbicall-data \
--skip-download
The script will:
- download missing split files when possible
- reassemble data.tar.gz
- verify the split parts or assembled archive with data.tar.gz.md5
- load the CBIcall resource catalog, locally or from the catalog URL
- optionally verify a small GDrive resource identifier file such as cbicall-resource-id.json
- rename the verified archive using the bundle identity, for example cbicall-germline-resources-v1.tar.gz
- extract the archive into DATADIR
- write cbicall-resource-installation.json with the installed bundle provenance
If disk space is tight and the checksum has passed, add --remove-parts to remove data.tar.gz.part-* after assembly.
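The reassembly and checksum steps can be sketched in Python to show what happens to the split files. This is an illustration, not the code in download_cbicall_bundle.py: the helper assemble_and_verify is hypothetical, and it assumes that the parts sort correctly by file name and that data.tar.gz.md5 holds the digest of the assembled archive in md5sum format (digest first, then the file name).

```python
import hashlib
from functools import partial
from pathlib import Path


def assemble_and_verify(outdir, name="data.tar.gz", chunk_size=1 << 20):
    """Concatenate <name>.part-* (sorted by file name) into <name>.

    Returns True if the MD5 of the assembled file matches the digest in <name>.md5.
    Hypothetical helper; assumes an md5sum-style checksum file.
    """
    outdir = Path(outdir)
    parts = sorted(outdir.glob(f"{name}.part-*"))
    if not parts:
        raise FileNotFoundError(f"no {name}.part-* files in {outdir}")
    md5 = hashlib.md5()
    with open(outdir / name, "wb") as archive:
        for part in parts:
            with open(part, "rb") as fh:
                # Stream in chunks so multi-GB parts never sit fully in memory.
                for chunk in iter(partial(fh.read, chunk_size), b""):
                    md5.update(chunk)
                    archive.write(chunk)
    # md5sum-style line: "<hex digest>  <file name>"; only the digest is compared.
    expected = (outdir / f"{name}.md5").read_text().split()[0].lower()
    return md5.hexdigest() == expected
```

Because the digest is computed over exactly the bytes being written, it equals the digest of the assembled data.tar.gz, so the archive does not need to be read a second time for verification.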
CBIcall keeps the rich resource registry in resources/cbicall-resource-catalog.json. The GDrive bundle only needs a small identifier file, for example cbicall-resource-id.json containing {"resource_key": "cbicall-germline-resources-v1"}. When that identifier file is available, the registry can store its Google Drive file ID and SHA-256 so the downloader can verify that the remote bundle matches the local CBIcall catalog entry.
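The identifier check described above can be sketched as follows. The function verify_resource_id is hypothetical, and passing in the expected SHA-256 and resource key from the catalog entry is an assumption about usage: this sketch does not read resources/cbicall-resource-catalog.json and makes no assumption about its schema.

```python
import hashlib
import json
from pathlib import Path


def verify_resource_id(id_file, expected_sha256, expected_key):
    """Check a downloaded identifier file against values recorded in the catalog.

    Hypothetical helper: returns True only if the file's SHA-256 matches
    expected_sha256 and its "resource_key" field equals expected_key.
    """
    data = Path(id_file).read_bytes()
    if hashlib.sha256(data).hexdigest() != expected_sha256.lower():
        return False  # wrong or corrupted file; do not trust its contents
    return json.loads(data).get("resource_key") == expected_key
```

Checking the hash before parsing means a mismatched file is rejected without its contents being interpreted.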
Point Native Workflows to your resource directory
Native CBIcall workflows read resource paths from Bash env.sh files and from Snakemake/Nextflow/Cromwell config.yaml files. In a non-containerized installation, point those files to the host directory where you installed the CBIcall-provided resource bundle:
export CBICALL_DATA="/absolute/path/to/cbicall-data"
sed -i "s|^DATADIR=.*|DATADIR=${CBICALL_DATA}|" workflows/bash/gatk-3.5/env.sh
sed -i "s|^datadir:.*|datadir: \"${CBICALL_DATA}\"|" workflows/snakemake/gatk-4.6/config.yaml
The GATK 4.6 Bash env.sh is a symlink to the GATK 3.5 Bash env.sh, so one Bash edit is enough. The native Nextflow and Cromwell configs are symlinks to this shared GATK 4.6 backend config, so one config edit updates Snakemake, native Nextflow, and Cromwell workflows.
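The sed commands above use GNU sed syntax; BSD sed on macOS requires sed -i '' instead. As a portable alternative, here is a minimal Python sketch of the same two substitutions. The helper set_datadir is hypothetical; like the sed commands, it only rewrites lines that start with DATADIR= or datadir:. It writes through a normal open(), so if a path is a symlink, the link's target file is updated.

```python
import re
from pathlib import Path


def set_datadir(env_sh, config_yaml, datadir):
    """Rewrite DATADIR= in env.sh and datadir: in config.yaml, mirroring the sed edits.

    Hypothetical helper. Returns {path: number of lines changed}.
    """
    edits = {
        str(env_sh): (r"^DATADIR=.*$", f"DATADIR={datadir}"),
        str(config_yaml): (r"^datadir:.*$", f'datadir: "{datadir}"'),
    }
    counts = {}
    for path, (pattern, replacement) in edits.items():
        text = Path(path).read_text()
        # A function replacement stops re from interpreting backslashes in datadir.
        new_text, n = re.subn(pattern, lambda _m, r=replacement: r, text, flags=re.MULTILINE)
        Path(path).write_text(new_text)
        counts[path] = n
    return counts
```

For example, set_datadir("workflows/bash/gatk-3.5/env.sh", "workflows/snakemake/gatk-4.6/config.yaml", "/absolute/path/to/cbicall-data") returns the number of changed lines per file; a count of 0 means the expected key was not found.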
Confirm that CBIcall sees the configured resources:
bin/cbicall validate-resources
bin/cbicall validate-parameters -p examples/input/param.yaml
Install Java 17 and the compatibility libraries required by the bundled native tools (the command below is for Debian/Ubuntu):
sudo apt install openjdk-17-jdk libncurses5 libtinfo5
GATK 4.6 requires Java 17. The bundled legacy samtools-0.1.19 links against libncurses.so.5.
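To confirm these prerequisites from Python, the sketch below parses the major version from java -version (which prints to standard error) and asks the dynamic loader, through ctypes, whether a shared library can be opened by name. All three functions are hypothetical helpers, not part of CBIcall, and the library check is meaningful only on Linux.

```python
import ctypes
import re
import subprocess


def java_major_version(version_output):
    """Parse the major version (e.g. 8, 17, 21) from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report "1.x", so the real major version is the second field.
    if major == 1 and match.group(2) is not None:
        major = int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return its major version, or None if java is missing."""
    try:
        result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return java_major_version(result.stderr + result.stdout)


def can_load_library(soname):
    """Return True if the dynamic loader can open soname (e.g. "libncurses.so.5")."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True
```

On a correctly prepared host, installed_java_major() should report 17 and can_load_library("libncurses.so.5") should return True.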
Performing integration tests
From the root directory of the repository, run:
WES:
bin/cbicall test --wes-bash -t 1
mtDNA:
bin/cbicall test --mit-bash -t 1
System requirements
- OS/ARCH supported: linux/amd64 and linux/arm64.
- Ideally a Debian-based distribution (Ubuntu or Mint); other distributions (e.g., CentOS, openSUSE) should also work but are untested.
- Python >= 3.8
- Java 17 for current GATK 4.6 workflows
- 16 GB of RAM
- >= 1 core (ideally i7 or Xeon)
- At least 100 GB of disk space
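The requirements above can be checked roughly with the standard library. This is a sketch: check_system is a hypothetical helper, it uses decimal gigabytes, and the memory query relies on os.sysconf page counts, which are available on Linux; where they are unavailable the memory check reports None. The recommended CPU type (i7 or Xeon) is not checked.

```python
import os
import platform
import shutil
import sys


def check_system(datadir="/", min_ram_gb=16, min_disk_gb=100):
    """Compare this host with the listed requirements.

    Hypothetical helper. Each value is True/False, or None if it could not be determined.
    """
    results = {
        "python>=3.8": sys.version_info >= (3, 8),
        "linux": sys.platform.startswith("linux"),
        "arch amd64/arm64": platform.machine().lower() in {"x86_64", "amd64", "aarch64", "arm64"},
        "cpu cores>=1": (os.cpu_count() or 0) >= 1,
    }
    ram_key = f"ram>={min_ram_gb}GB"
    try:
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        results[ram_key] = ram_bytes / 1e9 >= min_ram_gb
    except (AttributeError, ValueError, OSError):
        results[ram_key] = None  # os.sysconf missing or name unsupported on this platform
    # Free space on the filesystem that will hold DATADIR.
    results[f"disk>={min_disk_gb}GB"] = shutil.disk_usage(datadir).free / 1e9 >= min_disk_gb
    return results
```

Passing your DATADIR as datadir makes the disk check refer to the filesystem where the resource bundle will be installed.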
Platform Compatibility
This distribution is written in Python 3 and is intended to run on any platform supported by Python 3. It has been tested on Debian Linux and macOS. Please report any issues.