Adding a new pipeline
What you will learn
This page explains how to integrate a new workflow into CBIcall by registering it in the workflow registry and adding the corresponding workflow scripts.
CBIcall is designed to be extensible. New pipelines are added by providing workflow scripts and registering them in the workflow registry.
The components involved are:
- workflows/config/cbicall.workflows.yaml: workflow registry describing available pipelines
- workflows/schema/workflows.schema.json: schema used to validate the registry
The execution driver reads the user configuration, resolves the requested workflow from the registry, creates a runtime directory, and executes the workflow script.
1. How CBIcall runs workflows
At runtime, CBIcall:
- Reads the user configuration file (e.g. params.yaml)
- Validates parameters and compatibility rules
- Loads the workflow registry
- Resolves the workflow script corresponding to the selected pipeline
- Creates a runtime directory and launches the workflow
Because the workflow registry defines the available pipelines, adding a new workflow typically involves only a script and a registry entry.
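The resolution step can be illustrated with a minimal sketch. It is not CBIcall's actual code: the helper name resolve_entrypoint is hypothetical, the registry is written as a plain dict mirroring the example entry in section 4 so that no YAML parser is needed, and the layout base_dir/version/script is an assumption.

```python
from pathlib import PurePosixPath

# Registry content mirroring the example entry in section 4, written as a
# plain dict so the sketch needs no YAML parser.
REGISTRY = {
    "workflows": {
        "bash": {
            "base_dir": "workflows/bash",
            "versions": {
                "gatk-4.6": {
                    "common": {"env": "env.sh"},
                    "pipelines": {"mypipe": {"single": "mypipe_single.sh"}},
                }
            },
        }
    }
}


def resolve_entrypoint(registry, engine, version, pipeline, mode):
    """Return the script path for one engine/version/pipeline/mode combination.

    Hypothetical helper: assumes scripts live at <base_dir>/<version>/<script>.
    """
    try:
        engine_cfg = registry["workflows"][engine]
        script = engine_cfg["versions"][version]["pipelines"][pipeline][mode]
    except KeyError as missing:
        # Any missing level means the combination is not registered.
        raise ValueError(f"not registered: {missing}") from None
    return PurePosixPath(engine_cfg["base_dir"]) / version / script


print(resolve_entrypoint(REGISTRY, "bash", "gatk-4.6", "mypipe", "single"))
# prints workflows/bash/gatk-4.6/mypipe_single.sh
```

Requesting a combination that is not in the registry (for example cohort mode for mypipe) raises a ValueError in this sketch, which reflects why a new pipeline needs a registry entry before it can be run.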
Execution layers
The runtime path is split across three modules, each with its own responsibility:
- src/cbicall/config.py validates parameters, applies semantic rules, and assembles resolved runtime metadata
- src/cbicall/workflow_registry.py loads the workflow registry, resolves the workflow entrypoint, and validates referenced files
- src/cbicall/dnaseq.py dispatches execution to an engine-specific runner
The current execution runners are:
- BashRunner
- SnakemakeRunner
This separation is intentional: adding a new workflow usually means adding scripts plus a registry entry, while adding a new engine should be done by introducing a new runner class rather than growing a single dispatcher.
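The idea of one runner class per engine can be sketched as follows. Only the class names BashRunner and SnakemakeRunner come from this page; the base class, the command method, the RUNNERS mapping, and get_runner are assumptions made for illustration and may not match the code in src/cbicall/dnaseq.py.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Hypothetical base class: one subclass per execution engine."""

    @abstractmethod
    def command(self, entrypoint):
        """Return the argument list used to launch the workflow."""


class BashRunner(Runner):
    def command(self, entrypoint):
        # Bash entrypoints are executed directly as a command.
        return [entrypoint]


class SnakemakeRunner(Runner):
    def command(self, entrypoint):
        # Snakemake receives the resolved Snakefile via -s.
        return ["snakemake", "-s", entrypoint]


# Supporting a new engine means registering one more class here,
# not adding another branch to a growing dispatcher.
RUNNERS = {"bash": BashRunner, "snakemake": SnakemakeRunner}


def get_runner(engine):
    """Instantiate the runner registered for the given engine name."""
    try:
        return RUNNERS[engine]()
    except KeyError:
        raise ValueError(f"unknown engine: {engine}") from None
```

With this shape, the dispatcher only looks up a runner by engine name and never needs to know how a particular engine builds its command line.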
2. Create workflow scripts
Workflow entrypoints are stored under the engine directory and tool version.
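Example structure, following the directory layout named in the quick checklist:
workflows/bash/{gatk-version}/
or
workflows/snakemake/{gatk-version}/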
Engine-specific expectations
For Bash workflows:
- the registry must define the common helper scripts required by the selected version
- the pipeline entrypoint is executed directly as a command
- GENOME is exported in the runtime environment
For Snakemake workflows:
- the registry must define a version-specific config.yaml
- CBIcall launches snakemake with the resolved Snakefile via -s and the shared config file via --configfile
- genome is always passed to Snakemake via --config
- pipeline, sample_map, and workspace are added when required by the selected mode/version
- if workflow_rule is set, CBIcall uses that rule as the Snakemake target instead of all
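The Snakemake expectations above can be summarized as a command-building sketch. The function name snakemake_command, its parameters, and the genome value b37 are hypothetical. Only the -s, --configfile, and --config options and the target handling come from the description above. Further options a real invocation may need (such as --cores) are left out.

```python
def snakemake_command(snakefile, configfile, genome, extra_config=None,
                      workflow_rule=None):
    """Assemble a Snakemake argument list (hypothetical helper)."""
    # Use the configured rule as the target, otherwise the conventional "all".
    target = workflow_rule or "all"
    # genome is always passed; other keys (pipeline, sample_map, workspace)
    # are appended only when the caller supplies them.
    config_items = [f"genome={genome}"]
    for key, value in (extra_config or {}).items():
        config_items.append(f"{key}={value}")
    # The target is placed before the options because --config accepts
    # several KEY=VALUE items, so a trailing target name could be read as
    # one more config item.
    return ["snakemake", target, "-s", snakefile,
            "--configfile", configfile, "--config", *config_items]


# Default target, genome only (b37 is an illustrative value).
print(snakemake_command("Snakefile", "config.yaml", "b37"))
```

Returning a list rather than a single string keeps each argument separate, which avoids shell quoting issues if the list is later handed to subprocess.run.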
3. Naming conventions
Workflow filenames follow the pattern:
Where:
- pipeline is the pipeline name (e.g. wes, wgs, mit)
- mode is either single or cohort
Examples:
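A short sketch that generates such names, assuming the form <pipeline>_<mode>.sh suggested by the registry entry mypipe_single.sh in section 4. The helper name and the extension parameter are illustrative, and other engines may use a different extension.

```python
VALID_MODES = ("single", "cohort")


def workflow_filename(pipeline, mode, extension=".sh"):
    """Build a filename of the assumed form <pipeline>_<mode><extension>."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{pipeline}_{mode}{extension}"


for pipeline in ("wes", "wgs", "mit"):
    print(workflow_filename(pipeline, "single"))
# prints wes_single.sh, wgs_single.sh and mit_single.sh, one per line
```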
4. Register the pipeline
Edit the workflow registry (workflows/config/cbicall.workflows.yaml).
Example entry:
workflows:
  bash:
    base_dir: "workflows/bash"
    versions:
      gatk-4.6:
        common:
          env: "env.sh"
        pipelines:
          mypipe:
            single: "mypipe_single.sh"
If cohort mode is supported:
5. Minimal pipeline example
The following script illustrates a minimal workflow.
Execution context
Workflow scripts are executed inside the runtime directory created by CBIcall.
Input FASTQ files are typically located in the parent sample directory.
Example script:
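#!/usr/bin/env bash
# Minimal example workflow: lists the FASTQ files it would process.
set -euo pipefail
shopt -s nullglob  # an unmatched glob expands to nothing, not to the literal pattern

# Directory containing this script (useful for locating helper scripts)
BINDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# CBIcall starts the script inside the runtime directory
RUNDIR=$(pwd)
mkdir -p logs
echo "Running example pipeline in $RUNDIR"
# Input FASTQ files are expected in the parent sample directory
for R1 in ../*_R1_*fastq.gz; do
    echo "Found FASTQ: $R1"
done
echo "Pipeline finished"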
Make it executable with chmod +x.
6. Select the pipeline in params.yaml
Example configuration:
Run:
CBIcall validates the configuration, creates the runtime directory, and executes the workflow script.
If input_dir is defined, the runtime directory is created below that input directory with a generated run identifier appended to the configured project_dir prefix.
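How the runtime directory path is composed can be sketched as follows. The helper name runtime_dir is hypothetical, and so are the underscore separator, the example paths, and the identifier format. The run identifier is passed in rather than generated, because the way CBIcall creates it is not described here.

```python
from pathlib import PurePosixPath


def runtime_dir(input_dir, project_dir, run_id):
    """Compose the runtime directory path (hypothetical helper).

    Assumes the run identifier is joined to the project_dir prefix with an
    underscore; the real separator and identifier format may differ.
    """
    return PurePosixPath(input_dir) / f"{project_dir}_{run_id}"


print(runtime_dir("/data/sample01", "cbicall_run", "0001"))
# prints /data/sample01/cbicall_run_0001
```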
7. Validation
CBIcall checks that:
- the pipeline is registered in the workflow registry
- the selected execution backend exists
- the selected backend/version/pipeline/mode combination is allowed by the Python validation layer
- referenced workflow scripts are present
- Bash scripts are executable
This ensures that workflows are always executed from registered and validated configurations.
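The last two checks in the list can be approximated with the standard library. This is a sketch under assumptions, not the validation code in src/cbicall/workflow_registry.py: the function name check_entrypoint and its return convention (a list of problem messages) are hypothetical.

```python
import os


def check_entrypoint(path, engine):
    """Return a list of problems found for a workflow script (empty if none)."""
    problems = []
    if not os.path.isfile(path):
        problems.append(f"missing workflow script: {path}")
    elif engine == "bash" and not os.access(path, os.X_OK):
        # Bash entrypoints are run directly, so they need the execute bit.
        problems.append(f"script is not executable: {path}")
    return problems
```

Collecting messages instead of raising on the first failure lets a caller report every problem with a new pipeline in one pass.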
8. Optional Python configuration
Most pipelines only require workflow scripts and registry entries.
If additional parameters or configuration options are needed, the Python changes may span several modules depending on the type of change:
- src/cbicall/config.py for parameter defaults, semantic validation, and resolved runtime metadata
- src/cbicall/workflow_registry.py for workflow resolution and registry-related validation
- src/cbicall/models.py if a new option must become part of the typed internal runtime/config model
- src/cbicall/dnaseq.py if the change affects execution behavior or engine-specific command building
For most workflow additions, no Python code changes are required beyond scripts and registry entries.
If you are unsure where a change belongs, inspect the existing implementations in the repository first and follow the closest matching pattern.
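To give a sense of what such a change can look like, the sketch below adds an illustrative option with a default and a semantic check to a typed configuration object. Everything in it is hypothetical: RunConfig, build_config, ALLOWED_MODES, and my_new_option are invented for illustration and do not describe the actual contents of config.py or models.py.

```python
from dataclasses import dataclass

# Hypothetical compatibility table; here mypipe only registers single mode,
# as in the registry example in section 4.
ALLOWED_MODES = {"mypipe": {"single"}}


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical typed view of the resolved user configuration."""
    pipeline: str
    mode: str = "single"
    engine: str = "bash"
    my_new_option: int = 1  # illustrative new parameter with a default


def build_config(params):
    """Apply defaults, then semantic checks, and return a RunConfig."""
    cfg = RunConfig(**params)
    if cfg.mode not in ALLOWED_MODES.get(cfg.pipeline, set()):
        raise ValueError(f"{cfg.pipeline!r} does not support mode {cfg.mode!r}")
    if cfg.my_new_option < 1:
        raise ValueError("my_new_option must be a positive integer")
    return cfg


print(build_config({"pipeline": "mypipe"}))
```

Keeping defaults in the typed model and the cross-field rules in a separate function mirrors the split between a model module and a validation module described above.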
Quick checklist
- Inspect existing pipelines in workflows/{bash|snakemake}/{gatk-version}/
- Check the existing Python implementation paths in src/cbicall/ when adding new parameters or execution behavior
- Add workflow script(s)
- Register the pipeline in workflows/config/cbicall.workflows.yaml
- Validate the registry against the JSON schema
- Make Bash scripts executable (chmod +x)
- Provide an example params.yaml