Configuration Reference
CBIcall runs from a YAML parameters file plus the CLI thread setting.
bin/cbicall run -p parameters.yaml -t 4
Unknown YAML keys are rejected, so misspellings fail early instead of being silently ignored. The run configuration lives entirely in YAML: the CLI supplies runtime controls such as the parameter file, thread count, color output, and validation/test commands, and it does not override YAML analysis keys.
mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
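The strict-key behavior described above can be sketched as follows. This is a minimal illustration, not CBIcall's implementation: the function name `check_keys` is hypothetical, the allowed set is simply copied from the key tables on this page, and the "did you mean" hint is an added convenience that the tool itself may not provide.

```python
import difflib

# Assumption: this set mirrors the Core Keys and Advanced Keys tables on this page.
ALLOWED_KEYS = {
    "mode", "pipeline", "workflow_engine", "profile", "gatk_version",
    "resource", "genome", "input_dir", "sample_map", "project_dir",
    "cleanup_bam", "pipeline_version", "workflow_rule",
    "allow_partial_run", "organism", "technology",
}


def check_keys(params: dict) -> None:
    """Raise ValueError if params contains a key outside ALLOWED_KEYS."""
    unknown = sorted(set(params) - ALLOWED_KEYS)
    if not unknown:
        return
    messages = []
    for key in unknown:
        # Suggest the closest known key, if any is similar enough.
        close = difflib.get_close_matches(key, sorted(ALLOWED_KEYS), n=1)
        hint = f" (did you mean '{close[0]}'?)" if close else ""
        messages.append(f"unknown key '{key}'{hint}")
    raise ValueError("; ".join(messages))
```

Calling `check_keys({"pipline": "wes"})` raises immediately, which is the point of rejecting unknown keys: a typo never turns into a silently applied default.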
Core Keys
| Key | Default | Values | Use |
|---|---|---|---|
| mode | single | single, cohort | Selects one-sample processing or cohort-level processing. |
| pipeline | wes | wes, wgs, mit | Selects the analysis type. |
| workflow_engine | bash | bash, snakemake | Selects the execution backend supported by the current workflows. |
| profile | local | local, cnag-hpc | Selects the runtime environment file. cnag-hpc uses cnag-hpc-env.sh instead of the default env.sh for Bash workflows. |
| gatk_version | gatk-3.5 | gatk-3.5, gatk-4.6 | Selects the workflow version. Use gatk-4.6 for current WES/WGS workflows. |
| resource | cbicall-germline-resources-v1 | resource key | Selects one bundle entry from resources/cbicall-resource-catalog.json. |
| genome | inferred | b37, hg38, rsrs | Reference genome. If omitted, CBIcall uses b37 for WES/WGS and rsrs for mtDNA. |
| input_dir | null | path | Input sample or project directory. Relative paths are resolved from the YAML file location. |
| sample_map | null | path | Cohort-mode TSV containing sample IDs and gVCF paths. Relative paths are resolved from the YAML file location. |
| project_dir | cbicall | path or prefix | Prefix for the generated run directory. |
| cleanup_bam | false | true, false | Deletes intermediate BAM and BAI files after successful WES/WGS single-sample runs. |
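The rule that relative input_dir and sample_map paths are resolved from the YAML file location, not from the shell's working directory, can be illustrated with a short sketch. The helper name `resolve_from_yaml` is hypothetical and the example paths are made up.

```python
from pathlib import Path


def resolve_from_yaml(yaml_path: str, value: str) -> Path:
    """Resolve value against the directory that contains the YAML file."""
    candidate = Path(value)
    if candidate.is_absolute():
        # Absolute paths are taken as given.
        return candidate
    return Path(yaml_path).parent / candidate
```

Under this rule, a parameters file at `/data/runs/parameters.yaml` with `sample_map: ./sample_map.tsv` points at `/data/runs/sample_map.tsv` regardless of where the command is launched.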
Compatibility Matrix
| Workflow | Supported |
|---|---|
| gatk-4.6 + bash + wes single/cohort | Yes |
| gatk-4.6 + bash + wgs single/cohort | Yes |
| gatk-4.6 + snakemake + wes single/cohort | Yes |
| gatk-4.6 + snakemake + wgs single/cohort | Yes |
| gatk-3.5 + bash + wes single/cohort | Legacy |
| gatk-3.5 + bash + mit single/cohort | Yes, x86_64 only |
| mit + snakemake | No |
| gatk-3.5 + snakemake | No |
pipeline: mit always uses genome: rsrs. genome: hg38 is supported only with pipeline: wgs. pipeline: wes currently uses b37.
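The matrix and genome rules above can be expressed as a small lookup, sketched here under stated assumptions. The names `is_supported` and `resolve_genome` are hypothetical. Combinations that the matrix does not list are treated as unsupported. This page does not say whether CBIcall rejects or overrides a conflicting genome value, so this sketch rejects it.

```python
from typing import Optional

# Rows of the matrix marked Yes or Legacy: (gatk_version, workflow_engine, pipeline).
SUPPORTED = {
    ("gatk-4.6", "bash", "wes"),
    ("gatk-4.6", "bash", "wgs"),
    ("gatk-4.6", "snakemake", "wes"),
    ("gatk-4.6", "snakemake", "wgs"),
    ("gatk-3.5", "bash", "wes"),  # legacy
    ("gatk-3.5", "bash", "mit"),  # x86_64 only
}


def is_supported(gatk_version: str, engine: str, pipeline: str) -> bool:
    """Return True when the combination appears as Yes or Legacy in the matrix."""
    return (gatk_version, engine, pipeline) in SUPPORTED


def resolve_genome(pipeline: str, genome: Optional[str] = None) -> str:
    """Apply the documented defaults and genome restrictions per pipeline."""
    if pipeline == "mit":
        default, allowed = "rsrs", {"rsrs"}
    elif pipeline == "wgs":
        default, allowed = "b37", {"b37", "hg38"}
    elif pipeline == "wes":
        default, allowed = "b37", {"b37"}
    else:
        raise ValueError(f"unknown pipeline '{pipeline}'")
    chosen = genome or default
    if chosen not in allowed:
        raise ValueError(f"genome '{chosen}' is not supported with pipeline '{pipeline}'")
    return chosen
```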
Input Rules
Single-Sample WES/WGS
Use input_dir pointing to the sample directory containing paired FASTQ files.
mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
Cohort WES/WGS
Use sample_map pointing to a TSV with sample identifiers and gVCF paths.
mode: cohort
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
genome: b37
sample_map: ./sample_map.tsv
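A sample map can be checked before a cohort run with a sketch like the one below. The exact layout is an assumption: this page only says the TSV holds sample identifiers and gVCF paths, so the code assumes two tab-separated columns (identifier, then path) with no header row. The function name `read_sample_map` is hypothetical.

```python
import csv


def read_sample_map(tsv_path: str) -> dict[str, str]:
    """Return {sample_id: gvcf_path} from a two-column, header-less TSV."""
    samples: dict[str, str] = {}
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        for line_no, row in enumerate(reader, start=1):
            if not row:
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(f"line {line_no}: expected 2 columns, got {len(row)}")
            sample_id, gvcf_path = (field.strip() for field in row)
            if sample_id in samples:
                raise ValueError(f"line {line_no}: duplicate sample '{sample_id}'")
            samples[sample_id] = gvcf_path
    return samples
```

Catching a duplicated identifier or a missing column at this stage is cheaper than discovering it partway through joint genotyping.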
mtDNA
mtDNA workflows consume BAMs from previous WES/WGS runs. They do not start from FASTQ files.
mode: single
pipeline: mit
workflow_engine: bash
gatk_version: gatk-3.5
input_dir: CNAG999_exome/CNAG99901P_ex
Bundle Provenance
resource selects the external tools and reference data expected for the run.
CBIcall checks that the selected resource is compatible with the resolved
workflow and records resource provenance in log.json and run-report.json.
Use Resource Validation for resource checks and Run Comparison to compare repeated runs.
Pipeline Implementation Version
Each workflow registry entry has a CBIcall pipeline implementation version,
currently v1 for the bundled workflows. Normal YAML files do not need to set
this; the registry provides the default.
Set pipeline_version only when a registry entry exposes more than one
implementation and a run must pin a non-default one.
Runtime Profiles
Profiles select the environment mapping used by a workflow. The default profile is local; additional profiles can be declared in the workflow registry when the same workflow needs more than one env.sh layout, for example on a shared HPC system.
Select a non-default profile in YAML:
profile: cnag-hpc
Validate the parameters YAML and resolved setup without starting the workflow:
bin/cbicall validate-param -p parameters.yaml
During a real run, the resolved profile and selected environment file are written to log.json. validate-param prints the same resolved values without creating a run directory or log file.
Command Utilities
| Command | Use |
|---|---|
| bin/cbicall run -p parameters.yaml -t 4 | Execute a normal analysis run. |
| bin/cbicall validate-param -p parameters.yaml | Dry-run preflight for one concrete run. It validates the parameters YAML, workflow, profile env file, and selected resource without launching the workflow. |
| bin/cbicall validate-resources | Check the resource catalog and, optionally, one resource key. |
| bin/cbicall compare-runs RUN_A RUN_B [RUN_C ...] | Compare two or more run directories or run-report.json files. |
| bin/cbicall test --wes, --mit, or --all | Run the bundled integration examples without having to remember the script path. |
Advanced Keys
| Key | Default | Use |
|---|---|---|
| pipeline_version | Registry default, currently v1 | Advanced pin for a specific CBIcall pipeline implementation. Leave unset for normal runs. |
| workflow_rule | null | Snakemake target for a partial run. Leave unset for normal full runs. |
| allow_partial_run | false | Must be true when workflow_rule is set. This prevents accidental partial starts. |
| organism | Homo sapiens | Metadata field. |
| technology | Illumina HiSeq | Metadata field. |
Partial runs are intended for targeted Snakemake execution and restarts. If workflow_rule is set without allow_partial_run: true, CBIcall refuses to start.
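The guard described above amounts to a two-key check, sketched here. The function name `check_partial_run` and the rule name in the usage line are hypothetical; only the relationship between workflow_rule and allow_partial_run comes from this page.

```python
def check_partial_run(params: dict) -> None:
    """Refuse a partial run unless it is explicitly allowed."""
    rule = params.get("workflow_rule")
    if rule is None:
        return  # normal full run, nothing to check
    if params.get("allow_partial_run") is not True:
        raise ValueError(
            f"workflow_rule '{rule}' requires allow_partial_run: true"
        )
```

For example, `check_partial_run({"workflow_rule": "some_rule"})` raises, while adding `"allow_partial_run": True` lets the same parameters through.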
Output Directory Naming
Every run gets a generated directory:
<project_dir>_<workflow_engine>_<pipeline>_<mode>_<genome>_<gatk_version>_<run-id>/
When input_dir is set, this directory is created inside input_dir.
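The naming pattern can be reproduced with a short sketch, which is handy when a script needs to predict or locate a run directory. The function name `run_directory` is hypothetical and the run-id format is not specified on this page, so it is passed in as an opaque string. The location used when input_dir is unset is also not stated here; in that case this sketch returns the bare directory name.

```python
from pathlib import Path


def run_directory(params: dict, run_id: str) -> Path:
    """Build the run directory path from resolved parameters and a run id."""
    name = "_".join([
        params["project_dir"],
        params["workflow_engine"],
        params["pipeline"],
        params["mode"],
        params["genome"],
        params["gatk_version"],
        run_id,
    ])
    input_dir = params.get("input_dir")
    # Assumption: without input_dir, only the directory name is returned.
    return Path(input_dir) / name if input_dir else Path(name)
```

With the single-sample WES example from this page and a made-up run id of `123`, the result is `CNAG999_exome/CNAG99901P_ex/cbicall_bash_wes_single_b37_gatk-4.6_123`.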
See Outputs for the files produced by each workflow.