CBIcall

Reproducible germline variant calling for Illumina DNA sequencing

What is CBIcall?

CBIcall (CNAG Biomedical Informatics framework for variant calling) is a lightweight, reproducible framework for running Illumina DNA-seq germline variant calling workflows using curated Bash and Snakemake pipelines 🧬.

CBIcall provides a stable CLI, a strictly validated YAML configuration, and a registry-driven dispatcher that ensures only supported and executable workflows are launched.


What CBIcall does

CBIcall is an orchestrator (it runs workflows; it does not re-implement bioinformatics tools). It:

  • Validates user parameters and compatibility (engine, GATK version, genome, mode).
  • Loads a versioned workflow registry (YAML) validated with JSON Schema.
  • Resolves workflow scripts and fails fast if referenced files are missing or not executable.
  • Creates a per-run project directory with a unique run ID.
  • Writes structured run metadata (log.json) with CLI args, resolved config, and user parameters.
  • Executes the selected workflow and captures stdout/stderr into a single log file ✅.

All biological processing is performed by external workflows and tools (e.g. BWA/GATK/MToolBox), invoked in a controlled and reproducible manner.


Supported pipelines

CBIcall currently supports these workflows:

WES (Whole-Exome Sequencing)

  • Modes: single, cohort
  • GATK: gatk-3.5, gatk-4.6
  • Genome: b37 (default)

WGS (Whole-Genome Sequencing)

  • Modes: single, cohort
  • GATK: gatk-4.6 only
  • Genomes: b37 (default), hg38

MIT (mtDNA / mitochondrial)

  • Modes: single, cohort
  • Genome: fixed to rsrs
  • Not supported on ARM/aarch64 systems

Workflow engines

CBIcall dispatches workflows declared in an external registry:

  • Bash (fully supported)
  • Snakemake (supported with GATK ≥ 4.6)
  • Nextflow (declared but not implemented yet)

Selection follows:

engine → GATK version → pipeline → mode

Only workflows declared in the registry and present on disk with executable permissions can be executed.
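
The selection order and the on-disk checks can be illustrated with a short Python sketch. This is not CBIcall's actual code: the nested registry layout, the script paths, and the function name resolve_workflow are all assumptions made for illustration. It only shows the idea of walking engine, GATK version, pipeline, and mode in order, then refusing to continue if the script is missing or not executable.

```python
import os
from pathlib import Path

# Hypothetical registry: engine -> GATK version -> pipeline -> mode -> script path.
# The real CBIcall registry is a YAML file whose keys and paths are not shown here.
REGISTRY = {
    "bash": {
        "gatk-4.6": {
            "wes": {
                "single": "bash/gatk-4.6/wes_single.sh",
                "cohort": "bash/gatk-4.6/wes_cohort.sh",
            },
        },
    },
}


def resolve_workflow(registry, root, engine, gatk, pipeline, mode):
    """Walk the registry in selection order and return a runnable script path."""
    node = registry
    for level, key in (("engine", engine), ("GATK version", gatk),
                       ("pipeline", pipeline), ("mode", mode)):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"{level} '{key}' is not declared in the registry")
        node = node[key]
    if not isinstance(node, str):
        raise KeyError("registry entry does not point to a script path")

    # Fail fast: the script must exist on disk and carry execute permission.
    script = Path(root) / node
    if not script.is_file():
        raise FileNotFoundError(f"workflow script is missing: {script}")
    if not os.access(script, os.X_OK):
        raise PermissionError(f"workflow script is not executable: {script}")
    return script
```

Calling resolve_workflow(REGISTRY, "/path/to/workflows", "bash", "gatk-4.6", "wes", "single") returns the script path only when all three conditions hold (declared, present, executable); any other case raises before a workflow is launched.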


Configuration philosophy

CBIcall uses a single YAML parameter file with:

  • Explicit defaults
  • Strict enum validation
  • Fail-fast semantic checks (e.g. invalid genome/pipeline/engine combinations)
  • Safe inference where appropriate (e.g. default genome selection)

This helps catch misconfigurations before any compute-heavy work begins.
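
As a rough illustration of these checks, the sketch below validates a parameter dictionary (as it might look after loading the YAML file) against the support matrix listed under "Supported pipelines". The key names (pipeline, mode, engine, gatk, genome), the DEFAULTS values, and the validate function are assumptions for this example, not CBIcall's actual schema or defaults. Only the per-pipeline default genome is taken from this page.

```python
import platform

# Support matrix transcribed from the "Supported pipelines" section.
# The first genome in each list is treated as the default.
PIPELINES = {
    "wes": {"gatk": {"gatk-3.5", "gatk-4.6"}, "genomes": ["b37"]},
    "wgs": {"gatk": {"gatk-4.6"}, "genomes": ["b37", "hg38"]},
    "mit": {"gatk": None, "genomes": ["rsrs"]},  # no GATK constraint listed
}
MODES = {"single", "cohort"}
ENGINES = {"bash", "snakemake"}  # nextflow is declared but not implemented
GATK_VERSIONS = {"gatk-3.5", "gatk-4.6"}
# Illustrative defaults only; this page does not state the real ones.
DEFAULTS = {"engine": "bash", "mode": "single", "gatk": "gatk-4.6"}


def _check_enum(name, value, allowed):
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


def validate(params, machine=None):
    """Return a fully resolved config or raise ValueError before any compute."""
    cfg = {**DEFAULTS, **params}

    # Strict enum validation.
    _check_enum("pipeline", cfg.get("pipeline"), PIPELINES)
    _check_enum("mode", cfg["mode"], MODES)
    _check_enum("engine", cfg["engine"], ENGINES)
    _check_enum("gatk", cfg["gatk"], GATK_VERSIONS)

    # Safe inference: pick the pipeline's default genome when none is given.
    spec = PIPELINES[cfg["pipeline"]]
    cfg.setdefault("genome", spec["genomes"][0])

    # Fail-fast semantic checks on combinations.
    if cfg["genome"] not in spec["genomes"]:
        raise ValueError(f"genome {cfg['genome']!r} is not valid for {cfg['pipeline']}")
    if spec["gatk"] is not None and cfg["gatk"] not in spec["gatk"]:
        raise ValueError(f"{cfg['gatk']} is not supported for {cfg['pipeline']}")
    version = tuple(int(p) for p in cfg["gatk"].split("-", 1)[1].split("."))
    if cfg["engine"] == "snakemake" and version < (4, 6):
        raise ValueError("snakemake requires GATK >= 4.6")
    machine = (machine or platform.machine()).lower()
    if cfg["pipeline"] == "mit" and machine in {"aarch64", "arm64"}:
        raise ValueError("the MIT pipeline is not supported on ARM/aarch64")
    return cfg
```

With these assumptions, validate({"pipeline": "wes"}) resolves the genome to b37, while validate({"pipeline": "wgs", "gatk": "gatk-3.5"}) raises immediately because WGS is listed with gatk-4.6 only.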


Reproducibility and traceability

For every run, CBIcall automatically:

  • Generates a unique run identifier
  • Creates a dedicated project directory
  • Writes a structured log.json containing:
      • CLI arguments
      • Final resolved configuration
      • User parameters
  • Captures the full workflow stdout/stderr into a single log file

This makes runs easier to audit, reproduce, and debug 🔍.
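
The per-run bookkeeping described above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not CBIcall's implementation: the run ID format, the cbicall_ directory prefix, the workflow.log file name, the JSON key names, and the run_workflow function are all hypothetical. Only the log.json file name comes from this page.

```python
import json
import subprocess
import uuid
from datetime import datetime, timezone
from pathlib import Path


def run_workflow(command, cli_args, user_params, resolved_config, base_dir="."):
    """Run `command` inside a fresh project directory and record what was run."""
    # Unique run identifier: UTC timestamp plus a short random suffix (assumed format).
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    run_id = f"{stamp}_{uuid.uuid4().hex[:8]}"

    # Dedicated project directory; exist_ok=False guards against reusing a run ID.
    project_dir = Path(base_dir) / f"cbicall_{run_id}"
    project_dir.mkdir(parents=True, exist_ok=False)

    # Structured metadata, written before the workflow starts.
    metadata = {
        "run_id": run_id,
        "args": cli_args,
        "config": resolved_config,
        "params": user_params,
    }
    (project_dir / "log.json").write_text(json.dumps(metadata, indent=2, sort_keys=True))

    # Merge stderr into stdout so both streams land in a single log file.
    with open(project_dir / "workflow.log", "w") as log:
        result = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return project_dir, result.returncode
```

Writing log.json before launching the workflow means the metadata survives even if the workflow itself crashes, which is usually when it is needed most.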


Getting started

➡️ Get Started