Architecture
Overview
CBIcall is a thin orchestration layer around one or more concrete pipelines. Its main responsibilities are:
- Reading a YAML configuration file
- Validating required parameters and paths
- Resolving the selected pipeline and workflow engine
- Preparing the project directory structure
- Calling the appropriate workflow scripts (Bash or Snakemake)
- Managing logs and collecting results in a standard layout
The actual bioinformatics work (alignment, variant calling, QC) is implemented in modular pipelines that can be extended or replaced.
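The validation step listed above can be sketched as follows. This is a minimal illustration that assumes the YAML file has already been parsed into a dictionary; the key names (`pipeline`, `mode`, `project_dir`) and the function `validate_config` are hypothetical, and only `workflow_engine` with its two values comes from this page.

```python
from pathlib import Path

# Hypothetical key names; the real CBIcall configuration schema may differ.
REQUIRED_KEYS = ("pipeline", "mode", "workflow_engine", "project_dir")
# The two engines named in this document.
KNOWN_ENGINES = ("bash", "snakemake")


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passed."""
    problems = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config
    ]

    # Check the engine only if the key is present (absence is reported above).
    engine = config.get("workflow_engine")
    if engine is not None and engine not in KNOWN_ENGINES:
        problems.append(f"unknown workflow_engine: {engine}")

    # Check that the configured path points at an existing directory.
    project_dir = config.get("project_dir")
    if project_dir is not None and not Path(project_dir).is_dir():
        problems.append(f"project_dir does not exist: {project_dir}")

    return problems
```

Collecting every problem before reporting, rather than stopping at the first one, lets a user fix a configuration file in a single pass.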
Main components
At a high level, CBIcall consists of:
- Python execution driver
  Parses the YAML configuration, validates parameters, resolves paths and dispatches execution to the selected pipeline and workflow engine.
- Pipelines
  Implement the variant-calling workflows for WES, WGS and mtDNA analyses. Each pipeline lives in its own directory and can provide Bash and/or Snakemake workflows. Common parameters are loaded via `env.sh`.
- Workflow engines
  Control how the workflow is executed:
  - Bash scripts for simple, transparent runs
  - Snakemake workflows for dependency tracking and parallelization
- Project layout and logs
  A standard output structure with separate `01_bam/`, `02_varcall/`, `03_stats/` and `logs/` directories, reused across all pipelines.
- External data
  Executables and databases for third-party tools, reference genomes and accessory data.
Architecture diagram

Directory structure
Typical usage:
- Intermediate alignment and BAM files are stored under `01_bam/`.
- Variant-calling outputs (gVCFs, VCFs and related files) are stored under `02_varcall/`.
- Summary statistics and QC metrics are collected under `03_stats/`.
- Log files for all steps are stored under `logs/`.
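Preparing this layout could look like the sketch below. The four directory names come from the list above; the function name `prepare_project_dir` and its return shape are illustrative assumptions, not CBIcall's actual code.

```python
from pathlib import Path

# Directory names taken from the layout described in this section.
LAYOUT = ("01_bam", "02_varcall", "03_stats", "logs")


def prepare_project_dir(project_root: Path) -> dict[str, Path]:
    """Create the standard subdirectories and return their paths keyed by name."""
    paths = {}
    for name in LAYOUT:
        path = project_root / name
        # exist_ok=True makes the call idempotent, so reruns do not fail.
        path.mkdir(parents=True, exist_ok=True)
        paths[name] = path
    return paths
```

For example, `prepare_project_dir(Path("my_project"))` would create `my_project/01_bam/`, `my_project/02_varcall/`, `my_project/03_stats/` and `my_project/logs/` if they do not already exist.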
Execution model
CBIcall supports two execution modes:
- Single mode
  Each sample is processed independently.
- Cohort mode
  Joint analysis using per-sample gVCFs from previous single runs.
The workflow engine is selected in the YAML:
- `workflow_engine: bash`
- `workflow_engine: snakemake`
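As a rough sketch of how a driver might turn the `workflow_engine` value into a command line, consider the function below. The file names `run.sh` and `Snakefile`, the `cores` parameter and the function `build_command` are placeholders chosen for illustration; `--snakefile` and `--cores` are standard Snakemake command-line options.

```python
from pathlib import Path


def build_command(engine: str, workflow_dir: Path, cores: int = 1) -> list[str]:
    """Translate the workflow_engine value into an argv list for subprocess.run."""
    if engine == "bash":
        # "run.sh" is a placeholder script name.
        return ["bash", str(workflow_dir / "run.sh")]
    if engine == "snakemake":
        # "Snakefile" is a placeholder file name.
        return [
            "snakemake",
            "--snakefile", str(workflow_dir / "Snakefile"),
            "--cores", str(cores),
        ]
    raise ValueError(f"unsupported workflow_engine: {engine!r}")
```

Returning an argument list instead of a shell string means the result can be passed to `subprocess.run` without `shell=True`, which avoids quoting problems in paths.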
Supported pipelines
| Pipeline | Mode | Genome | GATK version | Status / Notes |
|---|---|---|---|---|
| WES | single | b37 (default) | gatk-3.5, gatk-4.6 | Supported |
| WES | cohort | b37 (default) | gatk-3.5, gatk-4.6 | Supported |
| WGS | single | b37 (default), hg38 | gatk-4.6 | Supported |
| WGS | cohort | b37 (default), hg38 | gatk-4.6 | Supported |
| MIT (mtDNA) | single | rsrs (fixed) | gatk-3.5 | Not supported on ARM / aarch64 |
| MIT (mtDNA) | cohort | rsrs (fixed) | gatk-3.5 | Not supported on ARM / aarch64 |
Status legend: "Supported" marks a fully supported configuration; "Not supported on ARM / aarch64" marks a platform limitation.
Date: March-2026
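The support matrix above can be encoded as a lookup table so that unsupported combinations are rejected before any workflow starts. The sketch below transcribes the table rows; the lowercase keys, the function `is_supported` and the reading that MIT runs are supported on non-ARM machines are assumptions made for illustration. The `arm64` alias is included because `platform.machine()` reports that name on Apple silicon.

```python
# Rows transcribed from the table: (pipeline, mode) -> allowed genomes and GATK versions.
SUPPORTED = {
    ("wes", "single"): {"genomes": {"b37"}, "gatk": {"gatk-3.5", "gatk-4.6"}},
    ("wes", "cohort"): {"genomes": {"b37"}, "gatk": {"gatk-3.5", "gatk-4.6"}},
    ("wgs", "single"): {"genomes": {"b37", "hg38"}, "gatk": {"gatk-4.6"}},
    ("wgs", "cohort"): {"genomes": {"b37", "hg38"}, "gatk": {"gatk-4.6"}},
    ("mit", "single"): {"genomes": {"rsrs"}, "gatk": {"gatk-3.5"}},
    ("mit", "cohort"): {"genomes": {"rsrs"}, "gatk": {"gatk-3.5"}},
}

# Machine names treated as ARM; "arm64" is an assumed alias for aarch64.
ARM_MACHINES = {"aarch64", "arm64"}


def is_supported(pipeline: str, mode: str, genome: str, gatk: str,
                 machine: str = "x86_64") -> bool:
    """Check one configuration against the support matrix."""
    row = SUPPORTED.get((pipeline.lower(), mode.lower()))
    if row is None:
        return False
    # The table lists the MIT pipeline as unavailable on ARM / aarch64.
    if pipeline.lower() == "mit" and machine.lower() in ARM_MACHINES:
        return False
    return genome in row["genomes"] and gatk in row["gatk"]
```

In a real driver the `machine` argument would typically come from `platform.machine()`.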
Extensibility
New pipelines can be added without modifying the core system:
- Each pipeline lives under `workflows/<engine>/<name>`
- The execution driver resolves the pipeline implementation through the workflow registry
- Pipelines reuse the same directory layout and logging conventions
- Pipelines may support single and/or cohort mode
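To illustrate why a new pipeline needs no change to the core, the sketch below builds a registry by scanning the `workflows/<engine>/<name>` layout. The format of CBIcall's actual workflow registry is not described here, so the filesystem scan and the functions `discover_pipelines` and `resolve_pipeline` are stand-in assumptions.

```python
from pathlib import Path


def discover_pipelines(workflows_root: Path) -> dict[tuple[str, str], Path]:
    """Map (engine, name) to each directory found at workflows/<engine>/<name>."""
    registry = {}
    for engine_dir in sorted(p for p in workflows_root.iterdir() if p.is_dir()):
        for pipeline_dir in sorted(p for p in engine_dir.iterdir() if p.is_dir()):
            registry[(engine_dir.name, pipeline_dir.name)] = pipeline_dir
    return registry


def resolve_pipeline(registry: dict[tuple[str, str], Path],
                     engine: str, name: str) -> Path:
    """Look up a pipeline directory, failing with a clear message if absent."""
    try:
        return registry[(engine, name)]
    except KeyError:
        raise LookupError(
            f"no pipeline {name!r} registered for engine {engine!r}"
        ) from None
```

Under this scheme, adding a directory such as `workflows/snakemake/<name>` would be enough for the driver to find the new pipeline on its next run.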
See: