CBIcall
Reproducible germline variant calling for Illumina DNA sequencing
What is CBIcall?
CBIcall (CNAG Biomedical Informatics framework for variant calling) is a lightweight, reproducible framework for running Illumina DNA-seq germline variant calling workflows using curated Bash and Snakemake pipelines 🧬.
CBIcall provides a stable CLI, a strictly validated YAML configuration, and a registry-driven dispatcher that ensures only supported and executable workflows are launched.
What CBIcall does
CBIcall is an orchestrator (it runs workflows; it does not re-implement bioinformatics tools). It:
- Validates user parameters and compatibility (engine, GATK version, genome, mode).
- Loads a versioned workflow registry (YAML) validated with JSON Schema.
- Resolves workflow scripts and fails fast if referenced files are missing or not executable.
- Creates a per-run project directory with a unique run ID.
- Writes seeing-is-believing metadata (`log.json`) with args, resolved config, and parameters.
- Executes the selected workflow and captures stdout/stderr into a single log file.
All biological processing is performed by external workflows and tools (e.g. BWA/GATK/MToolBox), invoked in a controlled and reproducible manner.
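The fail-fast check on workflow scripts can be illustrated with a minimal Python sketch. It is not CBIcall's own code: the function name `resolve_script` is hypothetical, and the sketch only assumes that a registry entry eventually resolves to a file path.

```python
import os


def resolve_script(path: str) -> str:
    """Return the absolute path of a workflow script, or raise if unusable.

    Hypothetical helper: it mirrors the documented behaviour (fail fast when
    a referenced file is missing or not executable), not CBIcall's internals.
    """
    full = os.path.abspath(path)
    if not os.path.isfile(full):
        raise FileNotFoundError(f"workflow script not found: {full}")
    if not os.access(full, os.X_OK):
        raise PermissionError(f"workflow script is not executable: {full}")
    return full
```

Running a check like this before creating the project directory means a broken installation is reported immediately, rather than partway through a run.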
Supported pipelines
CBIcall currently supports these workflows:
WES (Whole-Exome Sequencing)
- Modes: `single`, `cohort`
- GATK: `gatk-3.5`, `gatk-4.6`
- Genome: `b37` (default)
WGS (Whole-Genome Sequencing)
- Modes: `single`, `cohort`
- GATK: `gatk-4.6` only
- Genomes: `b37` (default), `hg38`
MIT (mtDNA / mitochondrial)
- Modes: `single`, `cohort`
- Genome: fixed to `rsrs`
- Not supported on ARM/aarch64 systems
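The support matrix above can be written down as data and checked mechanically. The following Python sketch transcribes the documented combinations. The dictionary layout, the lowercase pipeline keys (`wes`, `wgs`, `mit`), and the function name `check_combination` are assumptions made for illustration and do not reflect CBIcall's registry format. No GATK version is listed for MIT, so the sketch does not check one.

```python
import platform

# Transcribed from the documented lists; layout and key names are illustrative.
SUPPORTED = {
    "wes": {"modes": {"single", "cohort"},
            "gatk": {"gatk-3.5", "gatk-4.6"},
            "genomes": {"b37"}},
    "wgs": {"modes": {"single", "cohort"},
            "gatk": {"gatk-4.6"},
            "genomes": {"b37", "hg38"}},
    "mit": {"modes": {"single", "cohort"},
            "gatk": None,  # no GATK version is documented for MIT
            "genomes": {"rsrs"}},
}


def check_combination(pipeline, mode, genome, gatk=None, machine=None):
    """Raise ValueError if the request falls outside the documented matrix."""
    machine = machine or platform.machine()
    if pipeline not in SUPPORTED:
        raise ValueError(f"unknown pipeline: {pipeline}")
    spec = SUPPORTED[pipeline]
    if mode not in spec["modes"]:
        raise ValueError(f"{pipeline}: unsupported mode {mode}")
    if genome not in spec["genomes"]:
        raise ValueError(f"{pipeline}: unsupported genome {genome}")
    if spec["gatk"] is not None and gatk not in spec["gatk"]:
        raise ValueError(f"{pipeline}: unsupported GATK version {gatk}")
    if pipeline == "mit" and machine.lower().startswith(("arm", "aarch64")):
        raise ValueError("mit: not supported on ARM/aarch64 systems")
```

For example, `check_combination("wgs", "single", "hg38", "gatk-4.6")` passes, while the same call with `"gatk-3.5"` raises `ValueError`.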
Workflow engines
CBIcall dispatches workflows declared in an external registry:
- Bash (fully supported)
- Snakemake (supported with GATK ≥ 4.6)
- Nextflow (declared but not implemented yet)
Selection follows:
Only workflows declared in the registry and present on disk with executable permissions can be executed.
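The engine rules listed above (Bash always, Snakemake only with GATK 4.6 or later, Nextflow not yet implemented) could be enforced roughly as in this Python sketch. The function names and the lowercase engine strings are hypothetical. The version parsing assumes tags of the form `gatk-X.Y`, as used elsewhere on this page.

```python
def gatk_version(tag: str) -> tuple:
    """Turn a tag such as 'gatk-4.6' into a comparable tuple such as (4, 6)."""
    number = tag.split("-", 1)[1]
    return tuple(int(part) for part in number.split("."))


def check_engine(engine: str, gatk: str) -> None:
    """Raise if the engine cannot run a workflow for the given GATK tag."""
    if engine == "bash":
        return  # fully supported
    if engine == "snakemake":
        if gatk_version(gatk) < (4, 6):
            raise ValueError(f"snakemake requires GATK >= 4.6, got {gatk}")
        return
    if engine == "nextflow":
        raise NotImplementedError("nextflow is declared but not implemented")
    raise ValueError(f"unknown engine: {engine}")
```

Comparing versions as integer tuples avoids the string-comparison pitfall where `"4.10"` would sort before `"4.6"`.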
Configuration philosophy
CBIcall uses a single YAML parameter file with:
- Explicit defaults
- Strict enum validation
- Fail-fast semantic checks (e.g. invalid genome/pipeline/engine combinations)
- Safe inference where appropriate (e.g. default genome selection)
This helps catch misconfigurations before any compute-heavy work begins.
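As a rough illustration of strict enum validation combined with safe inference, the Python sketch below validates a parameter dictionary (as it might look after loading the YAML file) and fills in the default genome when none is given. The key names (`pipeline`, `mode`, `engine`, `genome`), the lowercase values, and the rule that the first three keys are required are all assumptions. Only the default genomes themselves come from the pipeline descriptions on this page.

```python
# Allowed values per key; names and spellings are illustrative assumptions.
ALLOWED = {
    "pipeline": {"wes", "wgs", "mit"},
    "mode": {"single", "cohort"},
    "engine": {"bash", "snakemake", "nextflow"},
}

# Default genome per pipeline, as documented: b37 for WES/WGS, rsrs for MIT.
DEFAULT_GENOME = {"wes": "b37", "wgs": "b37", "mit": "rsrs"}


def resolve_params(user: dict) -> dict:
    """Validate enum-valued keys, then infer the genome if it was omitted."""
    params = dict(user)  # never mutate the caller's dictionary
    for key, allowed in ALLOWED.items():
        if key not in params:
            raise ValueError(f"missing required parameter: {key}")
        if params[key] not in allowed:
            raise ValueError(
                f"invalid {key}: {params[key]!r} (expected one of {sorted(allowed)})"
            )
    params.setdefault("genome", DEFAULT_GENOME[params["pipeline"]])
    return params
```

Keeping the user's input and the resolved result as separate dictionaries makes it possible to record both, which matters for the traceability described in the next section.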
Reproducibility and traceability
For every run, CBIcall automatically:
- Generates a unique run identifier
- Creates a dedicated project directory
- Writes a structured `log.json` containing:
  - CLI arguments
  - Final resolved configuration
  - User parameters
- Captures the full workflow stdout/stderr into a single log file
This makes runs easier to audit, reproduce, and debug.
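The bookkeeping described above can be sketched in a few lines of Python. This is an illustration under assumptions, not CBIcall's implementation: the run ID format, the `run_` directory prefix, the JSON key names, and the `workflow.log` file name are all hypothetical. Only the `log.json` file name and its three kinds of content come from this page.

```python
import json
import subprocess
import sys
import uuid
from pathlib import Path


def start_run(base_dir, user_params, resolved_config, argv=None):
    """Create a per-run directory and write log.json; return (run_id, run_dir)."""
    run_id = uuid.uuid4().hex[:12]  # assumed ID format
    run_dir = Path(base_dir) / f"run_{run_id}"
    run_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse a directory
    record = {
        "run_id": run_id,
        "args": list(sys.argv if argv is None else argv),
        "config": resolved_config,
        "params": user_params,
    }
    (run_dir / "log.json").write_text(json.dumps(record, indent=2, sort_keys=True))
    return run_id, run_dir


def run_workflow(cmd, run_dir, log_name="workflow.log"):
    """Run cmd, sending stdout and stderr to one log file in run_dir."""
    log_path = Path(run_dir) / log_name
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode, log_path
```

Redirecting stderr into stdout with `subprocess.STDOUT` keeps both streams in one file, so error messages appear in context with the workflow's regular output.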