Architecture
Overview
CBIcall is a thin orchestration layer around one or more concrete pipelines. Its main responsibilities are:
- Reading a YAML configuration file
- Validating required parameters and paths
- Selecting the pipeline and workflow engine
- Preparing the project directory structure
- Calling the appropriate workflow scripts (Bash or Snakemake)
- Managing logs and collecting results in a standard layout
The actual bioinformatics work (alignment, variant calling, QC) is implemented in modular pipelines that can be extended or replaced.
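Here is a minimal Python sketch of the validation step, assuming the YAML has already been loaded into a dictionary (for example, by a YAML parser). It is not CBIcall's actual code. `validate_config`, `REQUIRED_KEYS`, and the `mode` and `input_dir` keys are hypothetical. Only `pipeline` and `workflow_engine` appear elsewhere on this page.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical schema: only `pipeline` and `workflow_engine` are documented keys.
REQUIRED_KEYS = ("pipeline", "mode", "workflow_engine", "input_dir")
ALLOWED_ENGINES = {"bash", "snakemake"}
ALLOWED_MODES = {"single", "cohort"}


def validate_config(cfg: dict) -> list[str]:
    """Return every problem found in `cfg`; an empty list means it passed."""
    errors = []
    # 1. Required parameters must be present.
    for key in REQUIRED_KEYS:
        if key not in cfg:
            errors.append(f"missing required parameter: {key}")
    # 2. Enumerated values must be among the known choices.
    engine = cfg.get("workflow_engine")
    if engine is not None and engine not in ALLOWED_ENGINES:
        errors.append(f"unknown workflow_engine: {engine!r}")
    mode = cfg.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        errors.append(f"unknown mode: {mode!r}")
    # 3. Paths must exist on disk.
    input_dir = cfg.get("input_dir")
    if input_dir is not None and not Path(input_dir).is_dir():
        errors.append(f"input_dir is not a directory: {input_dir}")
    return errors
```

The function collects every error instead of stopping at the first one, so the user can fix the whole configuration in a single pass.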
Main components
At a high level, CBIcall consists of:
- Python wrapper: Parses the YAML configuration, validates parameters, resolves paths and dispatches to the selected pipeline and engine.
- Pipelines: Implement the domain logic for WES, WGS and mtDNA runs. Each pipeline lives in its own directory and can provide Bash and/or Snakemake workflows. Common parameters are loaded via `parameters.sh`.
- Workflow engines: Control execution:
  - Bash scripts for simple, transparent runs
  - Snakemake workflows for dependency tracking and parallelization
- Project layout and logs: A standard output structure with separate `01_bam/`, `02_varcall/`, `03_stats/` and `logs/` directories, reused across all pipelines.
- External data: Executables and databases for third-party tools, reference genomes and accessory data.
Architecture diagram
Directory structure
Typical usage:
- Intermediate alignment and BAM files are stored under `01_bam/`.
- Variant-calling outputs (gVCFs, VCFs and related files) are stored under `02_varcall/`.
- Summary statistics and QC metrics are collected under `03_stats/`.
- Log files for all steps are stored under `logs/`.
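Here is a minimal Python sketch of how this standard layout could be prepared before a run. It assumes only the four subdirectory names listed above. `prepare_project_dir` is a hypothetical helper, not part of CBIcall's documented interface, and the real wrapper may create additional files or directories.

```python
from __future__ import annotations

from pathlib import Path

# Subdirectory names from the standard layout described above.
STANDARD_SUBDIRS = ("01_bam", "02_varcall", "03_stats", "logs")


def prepare_project_dir(root: str | Path) -> dict[str, Path]:
    """Create the standard output layout under `root` and return name -> path.

    exist_ok=True makes the call idempotent, so re-running a project is safe.
    """
    root = Path(root)
    layout: dict[str, Path] = {}
    for name in STANDARD_SUBDIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # also creates `root` if needed
        layout[name] = path
    return layout
```

Because every pipeline reuses the same layout, downstream steps can locate their inputs by name (for example, `layout["02_varcall"]`) rather than by pipeline-specific paths.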
Execution model
CBIcall supports two main execution modes:
- Single mode: Each sample is processed independently.
- Cohort mode: Multiple samples are analysed jointly, using the per-sample gVCFs produced by previous single-mode runs.
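Cohort mode depends on the outputs of earlier single runs. The following Python sketch shows one way to gather those inputs. It assumes that each single run keeps its gVCFs directly under `02_varcall/` and that the files follow the common GATK `*.g.vcf.gz` naming. The file pattern and the helper `collect_gvcfs` are both assumptions for illustration.

```python
from __future__ import annotations

from pathlib import Path


def collect_gvcfs(single_run_dirs: list[Path], pattern: str = "*.g.vcf.gz") -> list[Path]:
    """Gather per-sample gVCFs from the 02_varcall/ folder of each single-mode run.

    Assumes one project directory per sample; raises if any run has no gVCF.
    """
    gvcfs: list[Path] = []
    for run_dir in single_run_dirs:
        varcall = Path(run_dir) / "02_varcall"
        matches = sorted(varcall.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no file matching {pattern!r} in {varcall}")
        gvcfs.extend(matches)
    return gvcfs
```

Failing early when a sample has no gVCF avoids starting a joint analysis with an incomplete cohort.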
The workflow engine is selected in the YAML configuration file:
```yaml
workflow_engine: bash
```

or

```yaml
workflow_engine: snakemake
```
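Here is a Python sketch of how the `workflow_engine` value could be turned into a concrete command. The `workflows/<name>/` location comes from the Extensibility section. The entry-point names `run.sh` and `Snakefile` inside it, and the helper `build_command`, are hypothetical. `--snakefile` and `--cores` are standard Snakemake command-line options.

```python
from __future__ import annotations

from pathlib import Path


def build_command(engine: str, workflow_dir: Path, cores: int = 1) -> list[str]:
    """Translate the YAML `workflow_engine` value into an argv list.

    Entry-point names (run.sh, Snakefile) are hypothetical placeholders.
    """
    workflow_dir = Path(workflow_dir)
    if engine == "bash":
        # Plain Bash: run the pipeline's driver script directly.
        return ["bash", str(workflow_dir / "run.sh")]
    if engine == "snakemake":
        # Snakemake: --snakefile selects the workflow, --cores limits CPU cores used.
        return [
            "snakemake",
            "--snakefile", str(workflow_dir / "Snakefile"),
            "--cores", str(cores),
        ]
    raise ValueError(f"unsupported workflow_engine: {engine!r}")
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Building an argument list rather than a shell string avoids quoting problems with paths that contain spaces.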
Supported pipelines
The following table shows valid pipeline and mode combinations for each GATK version (`+` = supported, `-` = not supported):
| GATK Version | wes_single | wes_cohort | wgs_single | wgs_cohort | mit_single | mit_cohort |
|---|---|---|---|---|---|---|
| gatk-3.5 | + | + | - | - | + | + |
| gatk-4.6 | + | + | + | + | - | - |
Support status as of Oct-2025.
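The table can be encoded as a simple lookup, so that invalid combinations are rejected before any workflow starts. This Python sketch transcribes the Oct-2025 matrix above. The function names `is_supported` and `check_combination`, and the split into separate `pipeline` and `mode` arguments, are assumptions. They are not CBIcall's documented configuration keys.

```python
# Valid pipeline/mode combinations per GATK version, transcribed from the
# support table above (status as of Oct-2025).
SUPPORTED = {
    "gatk-3.5": {"wes_single", "wes_cohort", "mit_single", "mit_cohort"},
    "gatk-4.6": {"wes_single", "wes_cohort", "wgs_single", "wgs_cohort"},
}


def is_supported(gatk_version: str, pipeline: str, mode: str) -> bool:
    """True if `<pipeline>_<mode>` is marked '+' for this GATK version."""
    return f"{pipeline}_{mode}" in SUPPORTED.get(gatk_version, set())


def check_combination(gatk_version: str, pipeline: str, mode: str) -> None:
    """Raise a descriptive error for combinations the table marks '-'."""
    if not is_supported(gatk_version, pipeline, mode):
        valid = ", ".join(sorted(SUPPORTED.get(gatk_version, ()))) or "none"
        raise ValueError(
            f"{pipeline}_{mode} is not supported with {gatk_version} (valid: {valid})"
        )
```

Listing the valid alternatives in the error message tells the user which combinations they can switch to.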
Extensibility
New pipelines can be added without modifying the core system:
- Each pipeline lives under `workflows/<name>/`.
- The wrapper maps `pipeline: <name>` to the correct implementation.
- Pipelines reuse the same directory layout and logging conventions.
- Pipelines may support single and/or cohort mode.
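One way to add pipelines without touching the core is to discover them from the directory tree rather than hard-coding them. The Python sketch below maps every subdirectory of `workflows/` to a pipeline name. `discover_pipelines` and `resolve_pipeline` are hypothetical, and CBIcall's wrapper may use an explicit mapping instead.

```python
from __future__ import annotations

from pathlib import Path


def discover_pipelines(workflows_root: Path) -> dict[str, Path]:
    """Map each subdirectory name of workflows/ to its path (files are ignored)."""
    root = Path(workflows_root)
    return {p.name: p for p in sorted(root.iterdir()) if p.is_dir()}


def resolve_pipeline(name: str, workflows_root: Path) -> Path:
    """Resolve the YAML value `pipeline: <name>` to workflows/<name>/."""
    registry = discover_pipelines(workflows_root)
    if name not in registry:
        available = ", ".join(registry) or "none"
        raise KeyError(f"unknown pipeline {name!r}; available: {available}")
    return registry[name]
```

With directory-based discovery, creating `workflows/<name>/` is enough for `pipeline: <name>` to resolve. The trade-off is that any stray subdirectory is also treated as a pipeline.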
See: