Configuration Reference

CBIcall runs from a YAML parameters file plus the CLI thread setting.

bin/cbicall run -p parameters.yaml -t 4

Unknown YAML keys are rejected, so misspellings fail early instead of being silently ignored. All analysis settings live in the YAML file. The CLI supplies only runtime controls (the parameters file, thread count, color output, and the validation and test commands) and does not override YAML analysis keys.
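
The early-failure behaviour can be pictured with a short Python sketch. It is an illustration under stated assumptions, not CBIcall's actual validator: the function name `reject_unknown_keys` is hypothetical, the allowed set contains only the keys documented on this page, and the parameters are passed as an already-parsed dictionary.

```python
# Illustrative sketch only; not CBIcall's actual validation code.
# ALLOWED_KEYS lists the keys documented on this page (assumption:
# the real schema may contain more or different keys).
ALLOWED_KEYS = {
    "mode", "pipeline", "workflow_engine", "profile", "gatk_version",
    "resource", "genome", "input_dir", "sample_map", "project_dir",
    "cleanup_bam", "pipeline_version", "workflow_rule",
    "allow_partial_run", "organism", "technology",
}


def reject_unknown_keys(params):
    """Raise ValueError if params contains a key outside ALLOWED_KEYS."""
    unknown = sorted(set(params) - ALLOWED_KEYS)
    if unknown:
        raise ValueError("Unknown parameter key(s): " + ", ".join(unknown))
    return params


# A misspelled key ("pipline") is reported before any work starts.
try:
    reject_unknown_keys({"mode": "single", "pipline": "wes"})
except ValueError as err:
    print(err)
```

Rejecting the whole file, rather than dropping the unknown key, means a typo cannot silently fall back to a default value.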

Minimal WES single-sample run:

mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37

Core Keys

| Key | Default | Values | Use |
|---|---|---|---|
| mode | single | single, cohort | Selects one-sample processing or cohort-level processing. |
| pipeline | wes | wes, wgs, mit | Selects the analysis type. |
| workflow_engine | bash | bash, snakemake | Selects the execution backend supported by the current workflows. |
| profile | local | local, cnag-hpc | Selects the runtime environment file. cnag-hpc uses cnag-hpc-env.sh instead of the default env.sh for Bash workflows. |
| gatk_version | gatk-3.5 | gatk-3.5, gatk-4.6 | Selects the workflow version. Use gatk-4.6 for current WES/WGS workflows. |
| resource | cbicall-germline-resources-v1 | resource key | Selects one bundle entry from resources/cbicall-resource-catalog.json. |
| genome | inferred | b37, hg38, rsrs | Reference genome. If omitted, CBIcall uses b37 for WES/WGS and rsrs for mtDNA. |
| input_dir | null | path | Input sample or project directory. Relative paths are resolved from the YAML file location. |
| sample_map | null | path | Cohort-mode TSV containing sample IDs and gVCF paths. Relative paths are resolved from the YAML file location. |
| project_dir | cbicall | path or prefix | Prefix for the generated run directory. |
| cleanup_bam | false | true, false | Deletes intermediate BAM and BAI files after successful WES/WGS single-sample runs. |
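
The rule that relative input_dir and sample_map paths are resolved from the YAML file location, not from the current working directory, can be sketched as follows. This is a minimal illustration, not CBIcall's code: `resolve_from_yaml` and the `/data/project` paths are hypothetical, and leaving absolute paths unchanged is an assumption.

```python
from pathlib import PurePosixPath


def resolve_from_yaml(yaml_file, value):
    """Resolve a path value relative to the directory holding the YAML file."""
    path = PurePosixPath(value)
    if path.is_absolute():
        # Assumption: absolute paths are used as given.
        return path
    return PurePosixPath(yaml_file).parent / path


# Hypothetical locations, for illustration only.
print(resolve_from_yaml("/data/project/parameters.yaml", "./sample_map.tsv"))
# prints /data/project/sample_map.tsv
```

Under this rule the same parameters file gives the same result regardless of the directory from which bin/cbicall is launched.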

Compatibility Matrix

| Workflow | Supported |
|---|---|
| gatk-4.6 + bash + wes single/cohort | Yes |
| gatk-4.6 + bash + wgs single/cohort | Yes |
| gatk-4.6 + snakemake + wes single/cohort | Yes |
| gatk-4.6 + snakemake + wgs single/cohort | Yes |
| gatk-3.5 + bash + wes single/cohort | Legacy |
| gatk-3.5 + bash + mit single/cohort | Yes, x86_64 only |
| mit + snakemake | No |
| gatk-3.5 + snakemake | No |

Genome rules

  • pipeline: mit always uses genome: rsrs.
  • genome: hg38 is supported only with pipeline: wgs.
  • pipeline: wes currently uses b37.
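
The compatibility matrix and the genome rules above can be read as a lookup plus three checks. The sketch below encodes them in Python for illustration. The function names `support_status` and `resolve_genome` are hypothetical, and raising an error on a rule violation is an assumption; CBIcall may report or handle these cases differently.

```python
# Illustrative sketch only; table rows and rules are copied from this page.
SUPPORT = {
    ("gatk-4.6", "bash", "wes"): "Yes",
    ("gatk-4.6", "bash", "wgs"): "Yes",
    ("gatk-4.6", "snakemake", "wes"): "Yes",
    ("gatk-4.6", "snakemake", "wgs"): "Yes",
    ("gatk-3.5", "bash", "wes"): "Legacy",
    ("gatk-3.5", "bash", "mit"): "Yes, x86_64 only",
}


def support_status(gatk_version, workflow_engine, pipeline):
    """Look up a combination in the compatibility matrix."""
    # The two explicit "No" rows: mit + snakemake and gatk-3.5 + snakemake.
    if workflow_engine == "snakemake" and (
        pipeline == "mit" or gatk_version == "gatk-3.5"
    ):
        return "No"
    # Combinations the matrix does not mention are reported as such.
    return SUPPORT.get((gatk_version, workflow_engine, pipeline), "Not listed")


def resolve_genome(pipeline, genome=None):
    """Apply the documented genome default and the three genome rules."""
    if genome is None:
        # Default: rsrs for mtDNA, b37 for WES/WGS.
        return "rsrs" if pipeline == "mit" else "b37"
    if pipeline == "mit" and genome != "rsrs":
        raise ValueError("pipeline: mit always uses genome: rsrs")
    if genome == "hg38" and pipeline != "wgs":
        raise ValueError("genome: hg38 is supported only with pipeline: wgs")
    if pipeline == "wes" and genome != "b37":
        raise ValueError("pipeline: wes currently uses b37")
    # Combinations the rules do not mention are passed through unchanged.
    return genome


print(support_status("gatk-3.5", "bash", "mit"))
print(resolve_genome("wgs", "hg38"))
```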

Input Rules

Single-Sample WES/WGS

Use input_dir pointing to the sample directory containing paired FASTQ files.

mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37

Cohort WES/WGS

Use sample_map pointing to a TSV with sample identifiers and gVCF paths.

mode: cohort
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
genome: b37
sample_map: ./sample_map.tsv
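
The two WES/WGS input rules above pair each mode with one input key. A minimal Python sketch of that pairing follows. The function names are hypothetical, and treating a missing key as an error is an assumption about how the rule is enforced.

```python
def required_input_key(mode):
    """Return the input key a WES/WGS run uses for the given mode."""
    if mode == "single":
        return "input_dir"   # sample directory with paired FASTQ files
    if mode == "cohort":
        return "sample_map"  # TSV with sample identifiers and gVCF paths
    raise ValueError(f"Unknown mode: {mode}")


def check_inputs(params):
    """Check that the key matching the mode is present and non-empty."""
    mode = params.get("mode", "single")  # documented default is single
    key = required_input_key(mode)
    if not params.get(key):
        raise ValueError(f"mode: {mode} requires {key}")
    return key


print(check_inputs({"mode": "cohort", "sample_map": "./sample_map.tsv"}))
```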

mtDNA

mtDNA workflows consume BAMs from previous WES/WGS runs. They do not start from FASTQ files.

mode: single
pipeline: mit
workflow_engine: bash
gatk_version: gatk-3.5
input_dir: CNAG999_exome/CNAG99901P_ex

Bundle Provenance

resource selects the external tools and reference data expected for the run. CBIcall checks that the selected resource is compatible with the resolved workflow and records resource provenance in log.json and run-report.json.

Use Resource Validation for resource checks and Run Comparison to compare repeated runs.

Pipeline Implementation Version

Each workflow registry entry has a CBIcall pipeline implementation version, currently v1 for the bundled workflows. Normal YAML files do not need to set this; the registry provides the default.

Set pipeline_version only when a registry entry exposes more than one implementation and a run must pin a non-default one.

Runtime Profiles

Profiles select the environment mapping used by a workflow. The default profile is local; additional profiles can be declared in the workflow registry when the same workflow needs more than one env.sh layout, for example on a shared HPC system.

Select a non-default profile in YAML:

profile: cnag-hpc

Validate the parameters YAML and resolved setup without starting the workflow:

bin/cbicall validate-param -p parameters.yaml

During a real run, the resolved profile and selected environment file are written to log.json. validate-param prints the same resolved values without creating a run directory or log file.
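
Because validate-param checks one concrete run without starting it, a wrapper script can use it as a gate before the real run. The Python sketch below shows one way to do that. It uses only the two commands shown on this page; the function name `validate_then_run` is hypothetical, and it assumes that validate-param exits with a non-zero status when validation fails.

```python
import subprocess


def validate_then_run(params_file, threads, runner=subprocess.run):
    """Run the preflight check, then the analysis, stopping on the first failure."""
    validate_cmd = ["bin/cbicall", "validate-param", "-p", params_file]
    run_cmd = ["bin/cbicall", "run", "-p", params_file, "-t", str(threads)]
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumption: validate-param signals failure that way).
    runner(validate_cmd, check=True)
    runner(run_cmd, check=True)
    return [validate_cmd, run_cmd]


# Example call, to be made from the CBIcall directory:
# validate_then_run("parameters.yaml", 4)
```

The `runner` argument exists only so the command sequence can be inspected or tested without launching anything.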

Command Utilities

| Command | Use |
|---|---|
| bin/cbicall run -p parameters.yaml -t 4 | Execute a normal analysis run. |
| bin/cbicall validate-param -p parameters.yaml | Dry-run preflight for one concrete run. It validates the parameters YAML, workflow, profile env file, and selected resource without launching the workflow. |
| bin/cbicall validate-resources | Check the resource catalog and, optionally, one resource key. |
| bin/cbicall compare-runs RUN_A RUN_B [RUN_C ...] | Compare two or more run directories or run-report.json files. |
| bin/cbicall test --wes, --mit, or --all | Run the bundled integration examples without having to remember the script path. |

Advanced Keys

| Key | Default | Use |
|---|---|---|
| pipeline_version | Registry default, currently v1 | Advanced pin for a specific CBIcall pipeline implementation. Leave unset for normal runs. |
| workflow_rule | null | Snakemake target for a partial run. Leave unset for normal full runs. |
| allow_partial_run | false | Must be true when workflow_rule is set. This prevents accidental partial starts. |
| organism | Homo sapiens | Metadata field. |
| technology | Illumina HiSeq | Metadata field. |

Partial runs

Partial runs are intended for targeted Snakemake execution and restarts. If workflow_rule is set without allow_partial_run: true, CBIcall refuses to start.
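
The guard described above amounts to a two-key check. A minimal Python sketch, assuming the parameters are available as a dictionary; the function name `check_partial_run` and the rule name `call_variants` are hypothetical and not taken from CBIcall.

```python
def check_partial_run(params):
    """Refuse a partial run unless it was explicitly allowed."""
    rule = params.get("workflow_rule")                # documented default: null
    allowed = params.get("allow_partial_run", False)  # documented default: false
    if rule is not None and allowed is not True:
        raise ValueError(
            "workflow_rule is set but allow_partial_run is not true; "
            "refusing to start"
        )
    return rule


# "call_variants" is a made-up rule name, used only for illustration.
print(check_partial_run({"workflow_rule": "call_variants",
                         "allow_partial_run": True}))
```

Requiring a second, explicit key means a workflow_rule left over from an earlier restart cannot silently turn a full run into a partial one.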

Output Directory Naming

Every run gets a generated directory:

<project_dir>_<workflow_engine>_<pipeline>_<mode>_<genome>_<gatk_version>_<run-id>/

When input_dir is set, this directory is created inside input_dir. See Outputs for the files produced by each workflow.
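
The naming pattern can be reproduced with a short Python sketch, which may help when scripting around run directories. The function names are hypothetical, and the run-id value `RUNID` is a placeholder because the format of the generated run id is not specified here.

```python
from pathlib import PurePosixPath


def run_dir_name(project_dir, workflow_engine, pipeline, mode,
                 genome, gatk_version, run_id):
    """Join the seven fields of the run directory name with underscores."""
    return "_".join([project_dir, workflow_engine, pipeline, mode,
                     genome, gatk_version, run_id])


def run_dir_path(input_dir, name):
    """Place the run directory inside input_dir, as described above."""
    return PurePosixPath(input_dir) / name


# "RUNID" is a placeholder, not a real run id.
name = run_dir_name("cbicall", "bash", "wes", "single", "b37", "gatk-4.6", "RUNID")
print(run_dir_path("CNAG999_exome/CNAG99901P_ex", name))
```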