
Architecture

CBIcall is a thin orchestration layer around one or more concrete pipelines. Its main responsibilities are:

  • Reading a parameters YAML file
  • Validating required parameters and paths
  • Resolving the selected pipeline and workflow engine
  • Preparing the project directory structure
  • Calling the appropriate workflow scripts (Bash or Snakemake)
  • Managing logs and collecting results in a standard layout
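The validation step above can be sketched as follows. This is a minimal illustration, not CBIcall's actual code: the parameter names come from this page, but treating all four as mandatory is an assumption, and `validate_params` and the `path_keys` argument are hypothetical helpers that operate on an already-parsed YAML mapping.

```python
from pathlib import Path

# Parameter names mentioned on this page; treating all of them as
# mandatory is an assumption made for this sketch.
REQUIRED_KEYS = ("pipeline", "mode", "workflow_engine", "gatk_version")
ALLOWED_ENGINES = {"bash", "snakemake"}
ALLOWED_MODES = {"single", "cohort"}


def validate_params(params, path_keys=()):
    """Return a list of problems; an empty list means the parameters pass.

    `params` is the mapping obtained from the parameters YAML.
    `path_keys` names the (hypothetical) entries that must point to
    existing files or directories.
    """
    problems = []
    for key in REQUIRED_KEYS:
        if key not in params:
            problems.append(f"missing required parameter: {key}")

    engine = params.get("workflow_engine")
    if engine is not None and engine not in ALLOWED_ENGINES:
        problems.append(f"unknown workflow_engine: {engine}")

    mode = params.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {mode}")

    for key in path_keys:
        value = params.get(key)
        if value is not None and not Path(value).exists():
            problems.append(f"path for {key} does not exist: {value}")
    return problems
```

Collecting every problem before reporting, rather than failing on the first one, lets a user fix a parameters file in a single pass.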

The actual bioinformatics work (alignment, variant calling, QC) is implemented in modular pipelines that can be extended or replaced.


Architecture diagram

[Figure: CBIcall architecture diagram]

CBIcall resolves a validated parameters YAML into a concrete workflow engine and pipeline implementation.


Main components

At a high level, CBIcall consists of:

  • Python (execution driver): Reads YAML, validates parameters, builds the run directory, and dispatches the selected workflow.
  • Registry (workflow resolver): Maps engine, GATK version, pipeline, and mode to a concrete script or Snakefile.
  • Workflows (analysis layer): Runs alignment, variant calling, mtDNA analysis, QC, and report generation.
  • Run folder (outputs): Stores BAMs, VCFs, stats, browser reports, logs, and the resolved run metadata.

| Component | Role | Main files or directories |
| --- | --- | --- |
| Python execution driver | Reads the YAML configuration, validates parameters, resolves paths, and dispatches execution to the selected workflow. | src/cbicall/config.py, src/cbicall/dnaseq.py |
| Workflow registry | Developer-facing map that connects parameters YAML choices (workflow_engine, gatk_version, pipeline, mode, and pipeline implementation version) to concrete workflow scripts. Validate it with bin/cbicall validate-registry after editing. | workflows/registry/workflows.yaml, src/cbicall/workflow_registry.py |
| Pipelines | Implement WES, WGS, and mtDNA analyses. A pipeline may provide Bash workflows, Snakemake workflows, or both. | workflows/bash/, workflows/snakemake/ |
| Workflow engines | Execute the resolved workflow. Bash is transparent and direct; Snakemake adds rule-based orchestration and partial targets. | BashRunner, SnakemakeRunner in src/cbicall/dnaseq.py |
| Run directory | Stores outputs, logs, and log.json for one execution. | 01_bam/, 02_varcall/, 03_stats/, logs/ |
| External data | Provides third-party tools, reference genomes, known-sites resources, and accessory databases. | DATADIR, NGSUTILS, Databases |
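The registry's role can be illustrated with a small lookup keyed on the YAML choices. This is a sketch under stated assumptions: the real mapping lives in workflows/registry/workflows.yaml and also accounts for the pipeline implementation version, which is omitted here. The script file names and the `resolve_workflow` function are hypothetical.

```python
# Hypothetical registry entries. Real entries are defined in
# workflows/registry/workflows.yaml; these file names are made up.
REGISTRY = {
    ("bash", "gatk-4.6", "wes", "single"): "workflows/bash/wes_single.sh",
    ("snakemake", "gatk-4.6", "wes", "single"): "workflows/snakemake/wes_single.smk",
}


def resolve_workflow(params, registry=REGISTRY):
    """Map the YAML choices to one concrete script or Snakefile path."""
    key = (
        params["workflow_engine"],
        params["gatk_version"],
        params["pipeline"],
        params["mode"],
    )
    try:
        return registry[key]
    except KeyError:
        # Fail loudly instead of guessing a fallback workflow.
        raise ValueError(f"no workflow registered for {key}") from None
```

Keeping the mapping in data rather than in conditionals is what allows a new pipeline to be added by editing the registry alone.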

Directory structure

<project_dir>/
  01_bam/
  02_varcall/
  03_stats/
  logs/

Typical usage:

  • Intermediate alignment and BAM files are stored under 01_bam/.
  • Variant-calling outputs (gVCFs, VCFs and related files) are stored under 02_varcall/.
  • Summary statistics and QC metrics are collected under 03_stats/.
  • Log files for all steps are stored under logs/.
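Creating this layout takes only a few lines. The sketch below assumes nothing beyond the four directory names listed above; `prepare_project_dir` is a hypothetical name, not a function from the CBIcall code base.

```python
from pathlib import Path

# Standard subdirectories of a run, as listed above.
SUBDIRS = ("01_bam", "02_varcall", "03_stats", "logs")


def prepare_project_dir(project_dir):
    """Create the standard layout under project_dir and return its Path.

    Safe to call again on an existing run directory: directories that
    already exist are left untouched.
    """
    root = Path(project_dir)
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return root
```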

Execution model

CBIcall supports two execution modes:

  • Single mode
    Each sample is processed independently.

  • Cohort mode
    Samples are analyzed jointly, using the per-sample gVCFs produced by earlier single-mode runs.

The workflow engine is selected in the YAML:

  • workflow_engine: bash
  • workflow_engine: snakemake

Rule of thumb: Bash workflows are direct and transparent. Snakemake workflows are better when rule-based orchestration or partial targets matter.
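To make the difference between the two engines concrete, the sketch below builds the command line each one would need. It is not the implementation of BashRunner or SnakemakeRunner: `build_command` and its arguments are hypothetical. The only external details it relies on are the `--snakefile` and `--cores` options of the Snakemake command line and the fact that Snakemake accepts target names as positional arguments.

```python
def build_command(engine, workflow_path, cores=1, targets=()):
    """Return the argv list for running a resolved workflow.

    `targets` only applies to Snakemake, where naming specific rules or
    output files limits the run to that part of the workflow.
    """
    if engine == "bash":
        # A Bash workflow is a single script executed from top to bottom.
        return ["bash", workflow_path]
    if engine == "snakemake":
        return [
            "snakemake",
            "--snakefile", workflow_path,
            "--cores", str(cores),
            *targets,
        ]
    raise ValueError(f"unknown workflow_engine: {engine}")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, with standard output and error redirected to a file under logs/.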


Supported pipelines

| Pipeline | Mode | Genome | GATK version | Status / Notes |
| --- | --- | --- | --- | --- |
| WES | single | b37 (default) | gatk-3.5, gatk-4.6 | ✓ Supported |
| WES | cohort | b37 (default) | gatk-3.5, gatk-4.6 | ✓ Supported |
| WGS | single | b37 (default), hg38 | gatk-4.6 | ✓ Supported |
| WGS | cohort | b37 (default), hg38 | gatk-4.6 | ✓ Supported |
| MIT (mtDNA) | single | rsrs (fixed) | gatk-3.5 | ⚠ Not supported on ARM / aarch64 |
| MIT (mtDNA) | cohort | rsrs (fixed) | gatk-3.5 | ⚠ Not supported on ARM / aarch64 |

Legend: ✓ fully supported configuration; ⚠ platform limitation.

Date: March-2026
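The support matrix above can be encoded as data and checked before a run starts. This is an illustrative sketch only: the lowercase pipeline identifiers (`wes`, `wgs`, `mit`) and the `check_supported` function are assumptions, and CBIcall's own validation may work differently.

```python
import platform

# Transcription of the table above. Pipeline identifiers are assumed
# to be lowercase; the real YAML values may differ.
SUPPORTED = {
    "wes": {"genomes": {"b37"}, "gatk": {"gatk-3.5", "gatk-4.6"}},
    "wgs": {"genomes": {"b37", "hg38"}, "gatk": {"gatk-4.6"}},
    "mit": {"genomes": {"rsrs"}, "gatk": {"gatk-3.5"}},
}
MODES = {"single", "cohort"}
ARM_MACHINES = {"aarch64", "arm64"}
NOT_ON_ARM = {"mit"}  # platform limitation flagged in the table


def check_supported(pipeline, mode, genome, gatk_version, machine=None):
    """Return (ok, reason) for one combination of choices."""
    machine = machine or platform.machine()
    entry = SUPPORTED.get(pipeline)
    if entry is None:
        return False, f"unknown pipeline: {pipeline}"
    if mode not in MODES:
        return False, f"unknown mode: {mode}"
    if genome not in entry["genomes"]:
        return False, f"{pipeline} does not support genome {genome}"
    if gatk_version not in entry["gatk"]:
        return False, f"{pipeline} does not support {gatk_version}"
    if pipeline in NOT_ON_ARM and machine.lower() in ARM_MACHINES:
        return False, f"{pipeline} is not supported on {machine}"
    return True, "supported"
```

Passing `machine` explicitly makes the platform check testable on any host; by default it falls back to `platform.machine()`.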


Extensibility

New pipelines can be added without modifying the core system:

  • Each pipeline lives under workflows/<engine>/<name>
  • The execution driver resolves the pipeline implementation through the workflow registry
  • Pipelines reuse the same directory layout and logging conventions
  • Pipelines may support single and/or cohort mode

See: Adding a pipeline