Adding a new pipeline¶
CBIcall is designed to be extensible: you can add new analysis pipelines without changing the core framework as long as the workflow wiring is declared in the workflow registry.
In CBIcall, the source of truth for available workflows is:
- workflows/config/cbicall.workflows.yaml (registry; what exists)
- workflows/schema/workflows.schema.json (schema; what a valid registry looks like)
The Python code validates user parameters (enums, defaults, compatibility rules) and then resolves the correct workflow script from the YAML registry.
1. How CBIcall discovers workflows¶
At runtime, CBIcall:
- Reads the user parameter YAML (e.g. params.yaml)
- Applies defaults and validates semantic rules (pipeline/mode/genome/engine compatibility)
- Loads and schema-validates the workflow registry:
  - workflows/config/cbicall.workflows.yaml
  - workflows/schema/workflows.schema.json
- Resolves workflow scripts from the registry and checks:
  - referenced files exist
  - Bash scripts are executable (+x)
- Dispatches the workflow through the selected engine (Bash or Snakemake)
Key point: adding a pipeline is mostly YAML + scripts, not Python.
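The registry lookup described above can be sketched in a few lines of Python. This is only an illustration, not CBIcall's actual code: the function name resolve_workflow and the in-memory REGISTRY dictionary are hypothetical, and the sketch assumes that scripts live at <base_dir>/<version>/<script>, in line with the workflows/{bash|snakemake}/{gatk-version}/ layout used in this guide.

```python
from pathlib import Path

# Hypothetical in-memory copy of a registry, mirroring the YAML layout
# used in this guide (a real run would load it from the YAML file).
REGISTRY = {
    "workflows": {
        "bash": {
            "base_dir": "workflows/bash",
            "versions": {
                "gatk-4.6": {
                    "pipelines": {"wes": {"single": "wes_single.sh"}},
                }
            },
        }
    }
}


def resolve_workflow(registry, engine, version, pipeline, mode):
    """Return the script path for one engine/version/pipeline/mode choice.

    Assumption: scripts are stored at <base_dir>/<version>/<script>.
    """
    engine_cfg = registry["workflows"][engine]
    script = engine_cfg["versions"][version]["pipelines"][pipeline][mode]
    return Path(engine_cfg["base_dir"]) / version / script


script = resolve_workflow(REGISTRY, "bash", "gatk-4.6", "wes", "single")
print(script.as_posix())
```

Because the lookup is driven entirely by the registry contents, a new pipeline becomes resolvable as soon as its entry is added, with no change to the lookup code.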
2. Create workflow scripts¶
Create one or more workflow entrypoints under the engine directory and GATK version:
Bash¶
workflows/bash/{gatk-version}/{pipeline}_{mode}.sh (for example, workflows/bash/gatk-4.6/mypipe_single.sh)
Snakemake¶
workflows/snakemake/{gatk-version}/{pipeline}_{mode}.smk (for example, workflows/snakemake/gatk-4.6/mypipe_single.smk)
Naming conventions¶
CBIcall follows the pattern:
{pipeline}_{mode}.{sh|smk}
- pipeline: e.g. wes, wgs, mit, mypipe
- mode: single or cohort
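As a quick illustration of the naming pattern, the helper below builds an entrypoint filename from its parts. The function entrypoint_name and the EXTENSIONS mapping are hypothetical names introduced for this sketch; they are not part of CBIcall.

```python
# Assumed mapping from engine name to script extension, taken from the
# {pipeline}_{mode}.{sh|smk} pattern described in this guide.
EXTENSIONS = {"bash": "sh", "snakemake": "smk"}
MODES = ("single", "cohort")


def entrypoint_name(pipeline, mode, engine):
    """Build the expected script filename for a pipeline/mode/engine."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if engine not in EXTENSIONS:
        raise ValueError(f"engine must be one of {sorted(EXTENSIONS)}, got {engine!r}")
    return f"{pipeline}_{mode}.{EXTENSIONS[engine]}"


print(entrypoint_name("mypipe", "single", "bash"))
print(entrypoint_name("mypipe", "cohort", "snakemake"))
```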
3. Register the pipeline in the workflow registry¶
Edit workflows/config/cbicall.workflows.yaml and add your pipeline under the appropriate engine + version.
Minimal example (Bash, single mode)¶
workflows:
  bash:
    base_dir: "workflows/bash"
    versions:
      gatk-4.6:
        common:
          parameters: "parameters.sh"
          coverage: "coverage.sh"
          jaccard: "jaccard.sh"
          vcf2sex: "vcf2sex.sh"
        pipelines:
          mypipe:
            single: "mypipe_single.sh"
Adding cohort mode¶
To support cohort mode, add a cohort key next to single under the pipeline entry and point it at the cohort entrypoint (for example, cohort: "mypipe_cohort.sh").
Snakemake example¶
workflows:
  bash:
    base_dir: "workflows/bash"
    versions:
      gatk-4.6:
        common:
          parameters: "parameters.sh"
          coverage: "coverage.sh"
          jaccard: "jaccard.sh"
          vcf2sex: "vcf2sex.sh"
        pipelines:
          wes:
            single: "wes_single.sh"
  snakemake:
    base_dir: "workflows/snakemake"
    versions:
      gatk-4.6:
        common:
          config: "config.yaml"
        pipelines:
          mypipe:
            single: "mypipe_single.smk"
The schema currently requires workflows.bash to exist, even if you primarily use Snakemake. Keep a minimal bash block if needed.
4. Expose user parameters via the parameter YAML¶
Users select the workflow by setting pipeline, mode, workflow_engine, and gatk_version in the parameter YAML.
Then add pipeline-specific fields as needed (reference, targets, etc.). The recommended approach is:
- keep “policy” options in YAML (things the user should control)
- keep script-specific internal wiring inside the workflow scripts
Important: Note on the sample parameter
CBIcall uses the value of sample to determine where the runtime directory is created and where input FASTQ files are expected.
- In single mode, the runtime directory is created inside the sample directory.
- In cohort mode, the runtime directory is also created inside the directory passed as sample (which, in this case, corresponds to the cohort directory).
In all cases, CBIcall expects the input FASTQ files to be located directly under each sample directory (e.g., *_ex/).
This behavior reflects how the current pipelines are implemented. For a concrete example of the expected directory layout, please take a look at one of our pipelines.
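To make "directly under each sample directory" concrete, the sketch below lists FASTQ files at the top level of a directory without descending into subdirectories. It is an illustration only: the function name fastqs_directly_under and the accepted suffixes (.fastq.gz, .fq.gz) are assumptions, and the actual pipelines may match input files differently.

```python
from pathlib import Path

# Assumed FASTQ suffixes; the real pipelines may use a different pattern.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def fastqs_directly_under(sample_dir):
    """Return FASTQ files at the top level of sample_dir, sorted by name.

    Files in nested subdirectories are deliberately ignored.
    """
    return sorted(
        p
        for p in Path(sample_dir).iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )
```

A check like this can be run on a sample directory before launching a workflow to confirm that the inputs sit where they are expected.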
5. Single vs cohort mode¶
If you support both modes, register both in the YAML and provide both entrypoints.
If cohort mode does not make sense, omit it from the registry. CBIcall will raise a clear error if a user requests an unavailable mode.
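The kind of error a user might see for an unregistered mode can be sketched as follows. The function require_mode, the PIPELINES mapping, and the message wording are hypothetical and only illustrate the idea; CBIcall's own messages may differ.

```python
# Hypothetical "pipelines" mapping for one engine/version, as it would
# appear in the registry when only single mode is registered.
PIPELINES = {"mypipe": {"single": "mypipe_single.sh"}}


def require_mode(pipelines, pipeline, mode):
    """Return the script registered for pipeline/mode or raise ValueError."""
    modes = pipelines.get(pipeline)
    if modes is None:
        raise ValueError(
            f"Unknown pipeline '{pipeline}'. Registered: {sorted(pipelines)}"
        )
    if mode not in modes:
        raise ValueError(
            f"Pipeline '{pipeline}' does not provide mode '{mode}'. "
            f"Available modes: {sorted(modes)}"
        )
    return modes[mode]


try:
    require_mode(PIPELINES, "mypipe", "cohort")
except ValueError as exc:
    print(exc)
```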
6. Validation and guardrails¶
CBIcall will fail fast when:
- any of pipeline, mode, workflow_engine, or gatk_version is invalid
- the pipeline/mode combination is not allowed for the selected GATK version
- incompatible combinations are selected (example: snakemake with gatk-3.5 if restricted)
- a registry entry points to a missing file
- a Bash workflow script is not executable (+x)
This keeps “what is allowed” stable and “what is wired” configurable.
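The two file-level checks (missing file, non-executable Bash script) can be approximated with the standard library. This is a minimal sketch under the assumption that only Bash entrypoints need the executable bit; check_script is a hypothetical name, not a CBIcall function.

```python
import os
from pathlib import Path


def check_script(path, engine):
    """Fail fast if a registered workflow script cannot be used.

    Assumption: only Bash scripts must be executable; Snakemake files
    are read by Snakemake and do not need the +x bit.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Workflow script not found: {path}")
    if engine == "bash" and not os.access(path, os.X_OK):
        raise PermissionError(
            f"Bash workflow script is not executable (run chmod +x): {path}"
        )
    return path
```

Running a check like this over every path in the registry before dispatch surfaces wiring mistakes at startup rather than partway through an analysis.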
7. Document the pipeline page¶
Add a new page:
Include:
- requirements (tools, references)
- supported modes
- required YAML parameters
- example command(s)
- output layout
Then add it to mkdocs.yml:
8. Optional: Container support¶
If the pipeline requires extra software, pin versions for reproducibility:
- extend your Dockerfile, or
- publish a new container image
Make sure the workflow scripts remain runnable in the container environment.
Quick checklist¶
- First, take a look at the included workflows in workflows/{bash|snakemake}/{gatk-version}/ and understand how they work
- Add workflow script(s) under workflows/{bash|snakemake}/{gatk-version}/
- Register the pipeline in workflows/config/cbicall.workflows.yaml
- Ensure the registry passes workflows/schema/workflows.schema.json
- Make Bash scripts executable (chmod +x)
- Add an example params.yaml
- Add a docs page and link it in mkdocs.yml