Configuration Reference
CBIcall runs from a YAML parameters file plus CLI runtime settings.
bin/cbicall run -p parameters.yaml -t 4
Unknown YAML keys are rejected, so misspellings fail early instead of being ignored. Analysis configuration is defined in YAML. Runtime controls such as thread count, color output, validation commands, and the CBIcall native runtime profile are selected on the CLI.
A YAML contract is the parameters YAML after CBIcall has validated and
resolved it against the workflow registry and resource catalog. run and
validate-parameters use the same validation and resolution path;
validate-parameters stops before launching the workflow.
mode: single
pipeline: wes
workflow_backend: bash
software_stack: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
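The strict key handling described above can be approximated before launch with a small pre-flight check. The sketch below is not CBIcall's validator: the function name unknown_keys is hypothetical, and KNOWN_KEYS is simply the list of keys documented in the Core Keys and Advanced Keys tables on this page, which may lag behind the real schema. bin/cbicall validate-parameters remains the authoritative check.

```shell
# Top-level keys copied from the Core Keys and Advanced Keys tables on this page.
# Assumption: this list may be incomplete compared with CBIcall's own schema.
KNOWN_KEYS="mode pipeline workflow_backend software_stack workflow_provider resource genome input_dir sample_map input_vcf project_dir output_basename cohort_stage interval_shard cleanup_bam qc_coverage_region registry_version snakemake_parameters nextflow_parameters cromwell_parameters nfcore_profile nfcore_parameters nfcore_singularity_cache_dir organism technology"

# Print every top-level key in a YAML file that is not in KNOWN_KEYS.
# Only unindented "key:" lines are inspected, so nested maps such as
# nfcore_parameters are not descended into.
unknown_keys() {
  sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):.*/\1/p' "$1" |
  while read -r key; do
    case " $KNOWN_KEYS " in
      *" $key "*) ;;                    # documented key, nothing to report
      *) printf '%s\n' "$key" ;;        # likely a misspelling
    esac
  done
}
```

Calling unknown_keys parameters.yaml prints nothing for a clean file and one line per unrecognized key otherwise, which makes a misspelling such as pipline visible before a job is queued.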
Core Keys
| Key | Default | Values | Use |
|---|---|---|---|
| mode | single | single, cohort | Selects one-sample processing or cohort-level processing. |
| pipeline | wes | wes, wgs, mit; external names are registry-defined | Selects the analysis type. For workflow_provider: nf-core, the value is resolved through the workflow registry. |
| workflow_backend | bash | bash, snakemake, nextflow, cromwell | Selects the execution backend supported by the current workflows. |
| software_stack | gatk-3.5 | gatk-3.5, gatk-4.6 | Selects the GATK release for CBIcall-native workflows. Use gatk-4.6 for current bundled WES/WGS workflows. |
| workflow_provider | cbicall | cbicall, nf-core | Selects whether the workflow is a CBIcall-maintained implementation or an external nf-core workflow. Use workflow_provider: nf-core for external nf-core workflows. |
| resource | cbicall-germline-resources-v1 | resource key | Selects one entry from resources/cbicall-resource-catalog.json. |
| genome | inferred | b37, hg38, rsrs, external | Reference genome. If omitted, CBIcall uses b37 for WES/WGS, rsrs for mtDNA, and external for nf-core/Sarek. |
| input_dir | null | path | Input sample or project directory. Relative paths are resolved from the YAML file location. |
| sample_map | null | path | Cohort-mode TSV containing sample IDs and gVCF paths. Relative paths are resolved from the YAML file location. |
| input_vcf | null | path | Gathered raw VCF used by native GATK 4.6 cohort cohort_stage: finalize. Relative paths are resolved from the YAML file location. |
| project_dir | cbicall | path or prefix | Prefix for the generated run directory. |
| output_basename | null | filename stem | Optional basename for generated VCFs. In staged cohort runs this is useful for names such as cohort.chr1. |
| cohort_stage | all | all, shard, finalize | Native GATK 4.6 cohort staging mode. all keeps the standard one-job behavior. |
| interval_shard | null | contig label | Required for cohort_stage: shard; selects the contig or interval-list shard to joint-genotype. |
| cleanup_bam | false | true, false | Deletes intermediate BAM and BAI files after successful WES/WGS single-sample runs. |
| qc_coverage_region | chr1 | contig name | Contig used only for the lightweight coverage summary. It does not change variant-calling intervals. |
The resource catalog is the inventory of selectable resource entries and their workflow compatibility metadata.
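Several path keys in the table above (input_dir, sample_map, input_vcf) resolve relative values from the YAML file location rather than from the current working directory. A minimal sketch of that rule follows. The helper name resolve_from_yaml is hypothetical, and leaving absolute paths unchanged is an assumption, since this page only states the behavior for relative paths.

```shell
# Resolve a path value as described for input_dir, sample_map and input_vcf:
# relative values are anchored at the directory that contains the YAML file.
# Assumption: absolute values are returned unchanged.
resolve_from_yaml() (
  yaml_file=$1
  value=${2#./}                         # drop a leading "./" for a tidier result
  case "$value" in
    /*) printf '%s\n' "$value" ;;
    *)  yaml_dir=$(cd "$(dirname "$yaml_file")" && pwd -P) || exit 1
        printf '%s/%s\n' "$yaml_dir" "$value" ;;
  esac
)
```

For example, resolve_from_yaml /projects/run1/parameters.yaml ./sample_map.tsv would point at sample_map.tsv next to the YAML file (provided that directory exists), no matter where the command is launched from.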
Compatibility Matrix
Native CBIcall workflows
| Pipeline | Mode | Genome | Software stack | Bash | Snakemake | Nextflow | Cromwell |
|---|---|---|---|---|---|---|---|
| WES | single | b37 | gatk-3.5, gatk-4.6 | V | gatk-4.6 | gatk-4.6 | gatk-4.6 |
| WES | cohort | b37 | gatk-3.5, gatk-4.6 | V | gatk-4.6 | gatk-4.6 | gatk-4.6 |
| WGS | single | b37, hg38 | gatk-4.6 | V | V | V | gatk-4.6 |
| WGS | cohort | b37, hg38 | gatk-4.6 | V | V | V | gatk-4.6 |
| mtDNA | single | rsrs | gatk-3.5 | V | X | X | X |
| mtDNA | cohort | rsrs | gatk-3.5 | V | X | X | X |
CBIcall does not ship a validated gatk-3.5 WGS workflow; native WGS support is provided through the gatk-4.6 stack.
The bundled mtDNA workflow is not supported on ARM / aarch64 because of legacy third-party dependencies.
Registered external workflows
| Pipeline | Mode | Registry source | Release | Resource model |
|---|---|---|---|---|
| demo | single | nf-core/demo | 1.1.0 | Nextflow/nf-core managed |
| sarek | cohort | nf-core/sarek | 3.8.1 | Nextflow/nf-core managed |
- pipeline: mit always uses genome: rsrs.
- External nf-core pipelines use genome: external; the reference is selected by the nf-core parameters in nfcore_parameters.
- genome: hg38 is supported only with pipeline: wgs. pipeline: wes currently uses b37.
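The genome rules in this section can be summarized as two small lookups. This is only a sketch of the documented defaults and native pipeline/genome pairs; the function names default_genome and genome_allowed are hypothetical and do not correspond to CBIcall internals.

```shell
# Default genome when the YAML omits the genome key, following the Core Keys
# table: b37 for WES/WGS, rsrs for mtDNA, external for nf-core workflows.
default_genome() (
  pipeline=$1
  provider=${2:-cbicall}
  if [ "$provider" = "nf-core" ]; then
    echo external
    exit 0
  fi
  case "$pipeline" in
    wes|wgs) echo b37 ;;
    mit)     echo rsrs ;;
    *)       echo "no default genome for pipeline '$pipeline'" >&2; exit 1 ;;
  esac
)

# Succeed only for native pipeline/genome pairs listed in the compatibility matrix.
genome_allowed() {
  case "$1:$2" in
    wes:b37|wgs:b37|wgs:hg38|mit:rsrs) return 0 ;;
    *) return 1 ;;
  esac
}
```

With these helpers, genome_allowed wes hg38 fails while genome_allowed wgs hg38 succeeds, matching the note that hg38 is supported only with pipeline: wgs.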
Input Rules
Single-Sample WES/WGS
Use input_dir pointing to the sample directory containing paired FASTQ files.
mode: single
pipeline: wes
workflow_backend: bash
software_stack: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
Cohort WES/WGS
Use sample_map pointing to a TSV with sample identifiers and gVCF paths.
mode: cohort
pipeline: wes
workflow_backend: bash
software_stack: gatk-4.6
genome: b37
sample_map: ./sample_map.tsv
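Before launching a cohort run it can help to sanity-check the sample map. The sketch below assumes a two-column layout, sample ID then gVCF path, separated by a single tab. This page does not spell out the exact layout (for example whether a header line is expected), so treat the column assumption and the helper name check_sample_map as illustrative and confirm the format on the WES/WGS cohort page.

```shell
# Report lines that do not have exactly two non-empty tab-separated fields.
# Assumption: the sample map is <sample_id><TAB><gvcf_path>, one sample per line.
# The exit status is non-zero if any malformed line is found.
check_sample_map() {
  awk -F '\t' '
    NF != 2 || $1 == "" || $2 == "" {
      printf "line %d: expected <sample_id><TAB><gvcf_path>\n", NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

A common failure this catches is a map written with spaces instead of tabs, which otherwise tends to surface only once the workflow has started.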
Staged Cohort Runs
Native GATK 4.6 cohort runs can be split into shard jobs and one finalize job. This is useful when running chromosomes in parallel on a scheduler.
Shard one contig:
mode: cohort
pipeline: wgs
workflow_backend: bash
software_stack: gatk-4.6
genome: hg38
sample_map: ./sample_map.tsv
cohort_stage: shard
interval_shard: chr1
output_basename: cohort.chr1
There is no user-facing workspace key in the parameters YAML. CBIcall controls
GenomicsDB workspace names and creates one workspace per run under
01_genomicsdb/cohort.genomicsdb.<run-id>. Use output_basename for shard-specific
VCF names.
After all raw shard VCFs have been concatenated and indexed, run final filtering:
mode: cohort
pipeline: wgs
workflow_backend: bash
software_stack: gatk-4.6
genome: hg38
cohort_stage: finalize
input_vcf: ./cohort.gathered.gv.raw.vcf.gz
output_basename: cohort
Staged cohort keys are currently supported only with CBIcall-native
software_stack: gatk-4.6, mode: cohort, and workflow_backend set to
bash, snakemake, nextflow, or cromwell. See the WES/WGS cohort page for
a GNU parallel chromosome-sharding example.
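For several shards, the per-contig parameters files differ only in interval_shard and output_basename, so they can be generated in a loop. The sketch below reuses the keys from the shard example above. The helper name write_shard_yaml, the shard.<contig>.yaml file names, and the three-contig list are illustrative choices, not CBIcall conventions. The launch command is only echoed, not executed.

```shell
# Write one shard parameters file per contig, reusing the keys from the
# "Shard one contig" example. File names are hypothetical.
write_shard_yaml() (
  contig=$1
  out_dir=$2
  cat > "$out_dir/shard.$contig.yaml" <<EOF
mode: cohort
pipeline: wgs
workflow_backend: bash
software_stack: gatk-4.6
genome: hg38
sample_map: ./sample_map.tsv
cohort_stage: shard
interval_shard: $contig
output_basename: cohort.$contig
EOF
)

shard_dir=$(mktemp -d)
for contig in chr1 chr2 chr3; do
  write_shard_yaml "$contig" "$shard_dir"
  # Each file would then be submitted as its own job, for example:
  echo "bin/cbicall run -p $shard_dir/shard.$contig.yaml -t 4"
done
```

Because sample_map is resolved relative to each YAML file, sample_map.tsv needs to sit next to the generated files, or the key should hold an absolute path.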
mtDNA
mtDNA workflows consume BAMs from previous WES/WGS runs. They do not start from FASTQ files.
mode: single
pipeline: mit
workflow_backend: bash
software_stack: gatk-3.5
input_dir: CNAG999_exome/CNAG99901P_ex
nf-core/demo
nf-core/demo is useful for testing CBIcall's external nf-core support without
modeling a full biological workflow. It uses the nf-core test profile.
mode: single
pipeline: demo
workflow_backend: nextflow
workflow_provider: nf-core
resource: nf-core-demo-managed-resources-v1
nfcore_profile: test,singularity
Use the checked-in test,singularity profile on HPC. Here, test supplies
nf-core's built-in demo inputs and smoke-test settings, while singularity
selects the Singularity/Apptainer runtime. On an x86_64 Docker workstation,
test,docker is also possible. If nfcore_parameters.input is set, it overrides
the input supplied by the test profile. For workstation and cluster runs, see
nf-core External Workflows.
nf-core/Sarek
Sarek is launched as an external nf-core Nextflow workflow. CBIcall validates the
YAML, pins the registered nf-core release, writes a small params file in the run
directory, and leaves Sarek outputs in their native layout under sarek/.
mode: cohort
pipeline: sarek
workflow_backend: nextflow
workflow_provider: nf-core
resource: nf-core-sarek-managed-resources-v1
nfcore_profile: singularity
# nfcore_singularity_cache_dir: nxf-singularity-cache
nfcore_parameters:
  input: sarek_samplesheet.csv
  genome: GATK.GRCh38
  tools: haplotypecaller
  skip_tools: haplotypecaller_filter
  wes: true
  intervals: ../../workflows/nextflow/nf-core/sarek/grch38_chr22_test.bed
  max_memory: 30.GB
CBIcall does not interpret Sarek-specific parameters. Values under
nfcore_parameters are passed to the generated nf-core parameters file. Use the
samplesheet format and parameter names expected by the selected Sarek release.
For nf-core/Sarek, the CLI thread value is written to the generated params file
as max_cpus. For example, bin/cbicall run -p nf-core-sarek.yaml -t 6 passes
max_cpus: 6 to Sarek and writes a small Nextflow config with
process.resourceLimits so individual processes do not request more than six
CPUs. Memory caps stay in nfcore_parameters, for example max_memory: 30.GB;
CBIcall writes the same value to Nextflow process.resourceLimits.memory.
On HPC, set nfcore_singularity_cache_dir to a user- or project-owned
directory so the generated Nextflow config points away from unreadable
site-level container libraries. If the HPC module exports NXF_* variables,
keep those exports in the shell or SLURM bootstrap before invoking CBIcall.
For the tiny chr22 smoke test, skip_tools: haplotypecaller_filter avoids a
GATK FilterVariantTranches failure caused by too few overlapping resource
variants. Remove it for production Sarek runs if you want Sarek's default
HaplotypeCaller filtering.
Bundle Provenance
resource selects the external tools and reference data expected for the run.
CBIcall checks that the selected resource is compatible with the resolved
workflow and records resource key, version, and fingerprint provenance in
log.json and run-report.json.
Use Resource Validation for resource checks and Run Comparison to compare repeated runs.
Registry Version
Each workflow registry entry has a CBIcall registry version, currently v1 for
the bundled workflows. Normal YAML files do not need to set this; the registry
provides default_registry_version.
Set registry_version only when a registry entry exposes more than one
registry version and a run must pin a non-default one.
Runtime Profiles
CBIcall runtime profiles are currently a native Bash environment-file feature.
The default profile is local; additional profiles can be declared in the
workflow registry when the same Bash workflow needs more than one env.sh
layout, for example on a shared HPC system. At launch, CBIcall passes the
selected Bash env file through CBICALL_ENV_FILE, and the Bash script sources it
instead of its colocated $BINDIR/env.sh fallback.
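The env-file selection described above amounts to a one-line fallback. The following is a sketch of how a native Bash workflow script might express it, not a copy of the shipped scripts; the helper name select_env_file is hypothetical, while CBICALL_ENV_FILE and BINDIR are the names used on this page.

```shell
# Pick the env file a native Bash workflow would source: the file passed in
# CBICALL_ENV_FILE when it is set and non-empty, otherwise env.sh next to the
# script (BINDIR).
select_env_file() {
  printf '%s\n' "${CBICALL_ENV_FILE:-$BINDIR/env.sh}"
}

# Inside a workflow script this would typically be followed by:
#   . "$(select_env_file)"
```

Keeping the fallback in the script means the workflow still runs standalone with its colocated env.sh when no runtime profile is selected.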
Snakemake, Nextflow, Cromwell, and nf-core workflows do not use this Bash env-file
switch. They use their own backend-specific configuration mechanisms, such as
Snakemake config.yaml, Nextflow params/config/profiles, Cromwell WDL inputs,
or nf-core profiles.
Select a non-default CBIcall profile on the CLI:
bin/cbicall run -p parameters.yaml -t 4 --runtime-profile cnag-hpc
Validate the parameters YAML and resolved setup without starting the workflow:
bin/cbicall validate-parameters -p parameters.yaml --runtime-profile cnag-hpc
The profile key is not accepted in the parameters YAML. During a real run, the
resolved profile and selected environment file are written to log.json.
validate-parameters prints the same resolved values without creating a run
directory or log file.
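Since validate-parameters accepts the same parameters file and runtime profile as run, the two commands combine naturally into a validate-then-launch wrapper. The sketch below assumes that validate-parameters exits with a non-zero status when validation fails, which this page does not state explicitly. The function name validate_then_run and the CBICALL override variable are hypothetical.

```shell
# Validate a parameters file and launch it only if validation succeeds.
# Assumption: validate-parameters returns a non-zero exit status on failure.
# CBICALL can point at a different launcher; it defaults to bin/cbicall.
validate_then_run() (
  params=$1
  threads=$2
  profile=${3:-}                        # optional runtime profile name
  cbicall=${CBICALL:-bin/cbicall}
  if [ -n "$profile" ]; then
    "$cbicall" validate-parameters -p "$params" --runtime-profile "$profile" &&
      "$cbicall" run -p "$params" -t "$threads" --runtime-profile "$profile"
  else
    "$cbicall" validate-parameters -p "$params" &&
      "$cbicall" run -p "$params" -t "$threads"
  fi
)
```

A call such as validate_then_run parameters.yaml 4 cnag-hpc passes the same profile to both steps, so the setup that was validated is the one that runs.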
Command Utilities
Most users need only these commands:
| Command | Use |
|---|---|
| bin/cbicall run -p parameters.yaml -t 4 | Execute one analysis. |
| bin/cbicall validate-parameters -p parameters.yaml | Check one parameters YAML before launch. |
| bin/cbicall validate-resources | Check the configured resource catalog and installed bundle. |
| bin/cbicall compare-runs RUN_A RUN_B [RUN_C ...] | Compare completed runs. Three or more runs automatically include all-to-all evidence. |
| bin/cbicall report RUN_DIR | Summarize one completed run. |
| bin/cbicall test --wes-bash -t 1 | Run the minimal shipped WES contract test. |
Advanced flags
| Flag | Use when |
|---|---|
| --runtime-profile cnag-hpc | Running a native workflow with a site-specific environment profile. |
| --alias A B C | Comparing runs whose directory names are long or opaque. |
| --multiqc | Exporting CBIcall summaries as MultiQC custom content. |
| --html | Rendering a browser report from an existing run report. |
| --refresh | Updating output-derived metadata in run-report.json after files changed. |
| Backend test flags | Optional integration checks; see Integration Tests. |
For a higher-level explanation of pipelines, providers, and execution backends, see Workflows.
Advanced Keys
| Key | Default | Use |
|---|---|---|
| registry_version | Registry default, currently v1 | Advanced pin for a specific CBIcall registry version. Leave unset for normal runs. |
| snakemake_parameters | {} | Snakemake-specific options. target selects a Snakemake target instead of the default all; other keys are passed through as extra --config key=value entries after CBIcall-managed config values. |
| nextflow_parameters | {} | Native CBIcall Nextflow parameters passed as --key value. CBIcall blocks keys it owns, such as pipeline, genome, threads, qc_coverage_region, helper scripts, and cohort workspace settings. |
| cromwell_parameters | {} | Native CBIcall Cromwell/WDL inputs for advanced workflow-specific values. CBIcall blocks overrides of inputs it owns, including tool paths, reference paths, sample identity, genome, pipeline, qc_coverage_region, and thread count. |
| nfcore_profile | null | nf-core profile passed to external nf-core workflows, for example docker, singularity, or test,singularity. |
| nfcore_parameters | {} | Pass-through nf-core parameters written to the generated params file. CBIcall controls outdir and max_cpus. |
| nfcore_singularity_cache_dir | null | Optional Singularity/Apptainer image cache directory for external nf-core workflows. CBIcall writes it to the generated Nextflow config as cache and library directories. |
| organism | Homo sapiens | Metadata field. |
| technology | Illumina HiSeq | Metadata field. |
Use snakemake_parameters, nextflow_parameters, cromwell_parameters, and nfcore_parameters only for parameters owned by that backend or external workflow. CBIcall still owns the compatibility contract and blocks overrides of core values it resolves itself.
Output Directory Naming
Every run gets a generated directory:
<project_dir>_<workflow_backend>_<software_stack>_<pipeline>_<mode>_<genome>_<run-id>/
External nf-core workflows use software_stack: nf-core; the displayed genome
label is inferred from nfcore_parameters.genome when present:
<project_dir>_nextflow_nf-core_<pipeline>_<mode>_<display-genome>_<run-id>/
When input_dir is set, this directory is created inside input_dir.
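The naming pattern above is a plain underscore join of the resolved values, which makes it easy to predict or glob for run directories in scripts. A minimal sketch follows. The helper name run_dir_name is hypothetical, and the run ID is generated by CBIcall, so it appears here only as an opaque placeholder argument.

```shell
# Compose a run directory name from its resolved parts, following the pattern
# <project_dir>_<workflow_backend>_<software_stack>_<pipeline>_<mode>_<genome>_<run-id>.
# args: project_dir backend software_stack pipeline mode genome run_id
run_dir_name() {
  printf '%s_%s_%s_%s_%s_%s_%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# RUNID stands in for the identifier CBIcall generates at launch.
run_dir_name cbicall bash gatk-4.6 wes single b37 RUNID
# prints: cbicall_bash_gatk-4.6_wes_single_b37_RUNID
```

In practice the run ID is not known in advance, so a glob such as cbicall_bash_gatk-4.6_wes_single_b37_* is the more useful form when locating finished runs.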
See Outputs for the files produced by each workflow.