Adding a Pipeline

CBIcall is built so most new workflows can be added with two things:

Workflow script(s)
A registry entry in workflows/registry/workflows.yaml

Most additions do not need Python changes

If the new pipeline fits the existing concepts of pipeline, mode, workflow_engine, gatk_version, input_dir, and sample_map, start with workflow scripts and the registry only.

What You Are Adding

Before editing files, define the shape of the workflow.

Decision	Options	Why it matters
Engine	`bash`, `snakemake`	Determines where the entrypoint lives and how CBIcall launches it.
Version	`gatk-3.5`, `gatk-4.6`	Determines the workflow subdirectory and shared helper files.
Pipeline name	e.g. `mypipe`	Becomes the value users set as `pipeline: mypipe`.
Mode	`single`, `cohort`, or both	Determines which entrypoint filenames are needed.
Inputs	`input_dir`, `sample_map`, or both	Determines how users configure the run.

For example, a Bash single-sample pipeline named mypipe for GATK 4.6 would use:

workflows/bash/gatk-4.6/mypipe_single.sh

and users would select it with:

pipeline: mypipe
mode: single
workflow_engine: bash
gatk_version: gatk-4.6

How Execution Works

At runtime, CBIcall:

Reads the YAML parameters file
Validates values and compatibility rules
Loads the workflow registry
Resolves the requested workflow script
Creates a run directory
Launches the workflow from inside that run directory

The main implementation layers are:

File	Responsibility
`src/cbicall/config.py`	Parameter defaults, semantic validation, runtime metadata.
`src/cbicall/workflow_registry.py`	Registry loading, workflow resolution, referenced-file validation.
`src/cbicall/dnaseq.py`	Engine-specific execution through `BashRunner` and `SnakemakeRunner`.
`workflows/registry/workflows.yaml`	Declares available workflow scripts.
`workflows/schema/workflows.schema.json`	Validates the registry structure.

What is the registry?

The workflow registry is CBIcall's developer-facing workflow map: workflows/registry/workflows.yaml. It tells CBIcall which Bash or Snakemake file to launch when a parameters YAML selects values such as workflow_engine, gatk_version, pipeline, mode, and pipeline_version.

Normal users do not edit this file. Pipeline maintainers edit it when adding or changing workflow implementations. After editing it, run:

bin/cbicall validate-registry

This checks that the workflow registry is structurally valid against workflows/schema/workflows.schema.json; it does not run a workflow.

1. Add the Workflow Entrypoint

Workflow filenames follow:

{pipeline}_{mode}.{sh|smk}
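The naming rule can be sketched in Python to make the mapping from parameters to a file path explicit. This is an illustration only, not CBIcall's own code: `entrypoint_path` and `ENGINE_EXTENSIONS` are hypothetical names, and the `workflows/<engine>/<gatk_version>/` prefix is taken from the example layouts in this guide.

```python
from pathlib import PurePosixPath

# Assumed mapping from engine name to file extension, based on the
# {sh|smk} pattern in this guide. Not CBIcall's actual data structure.
ENGINE_EXTENSIONS = {"bash": "sh", "snakemake": "smk"}


def entrypoint_path(engine: str, gatk_version: str, pipeline: str, mode: str) -> PurePosixPath:
    """Return the expected entrypoint path relative to the repository root."""
    try:
        ext = ENGINE_EXTENSIONS[engine]
    except KeyError:
        raise ValueError(f"unknown engine: {engine!r}") from None
    return PurePosixPath("workflows") / engine / gatk_version / f"{pipeline}_{mode}.{ext}"


print(entrypoint_path("bash", "gatk-4.6", "mypipe", "single"))
# workflows/bash/gatk-4.6/mypipe_single.sh
```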

Bash Layout

workflows/bash/gatk-4.6/
  env.sh
  mypipe_single.sh
  mypipe_cohort.sh

Bash workflows are executed directly. CBIcall sets GENOME in the environment and launches the script from inside the generated run directory.

Entrypoint location

CBIcall does not copy Bash workflow scripts into the run directory. It launches the registered script from workflows/bash/... while setting the process working directory to the generated run directory.

This keeps workflow code centralized and keeps helper paths such as env.sh stable through BASH_SOURCE[0]. The tradeoff is that workflow .sh files should not be edited while jobs are running.
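The launch behavior described above (script stays in place, working directory is the run directory, GENOME is exported) could look roughly like the following. It is a minimal sketch under those stated assumptions; `launch_bash_workflow` is a hypothetical helper, and the real `BashRunner` may pass additional environment variables or arguments.

```python
import os
import subprocess
from pathlib import Path


def launch_bash_workflow(script: Path, run_dir: Path, genome: str) -> subprocess.CompletedProcess:
    """Run a registered Bash entrypoint in place, using run_dir as the working directory."""
    run_dir.mkdir(parents=True, exist_ok=True)
    # Extend the current environment so the script can read $GENOME.
    env = {**os.environ, "GENOME": genome}
    return subprocess.run(
        [str(script.resolve())],  # absolute path: the script is not copied into run_dir
        cwd=run_dir,              # relative paths in the script resolve against run_dir
        env=env,
        check=True,               # raise CalledProcessError on a non-zero exit status
        capture_output=True,
        text=True,
    )
```

Because the script is executed by path rather than through `bash script.sh`, it needs the executable bit and a valid shebang line.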

Minimal Bash example:

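#!/usr/bin/env bash
# Stop on errors, unset variables, and failures inside pipelines.
set -euo pipefail
# Let the FASTQ glob expand to nothing instead of a literal pattern.
shopt -s nullglob

# Directory that holds this script and its helpers (not the run directory).
BINDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# CBIcall starts the script inside the generated run directory.
RUNDIR="$(pwd)"

source "$BINDIR/env.sh"

mkdir -p logs results

echo "Running mypipe in $RUNDIR" | tee logs/mypipe.log

# The glob looks one level above the run directory for paired FASTQ files.
for R1 in ../*_R1_*fastq.gz; do
  R2="${R1/_R1_/_R2_}"
  echo "Pair: $R1 $R2" >> logs/mypipe.log
done

echo "done" > results/mypipe.done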

Make it executable:

chmod +x workflows/bash/gatk-4.6/mypipe_single.sh

Snakemake Layout

workflows/snakemake/gatk-4.6/
  config.yaml
  mypipe_single.smk
  mypipe_cohort.smk

CBIcall launches Snakemake with:

the resolved Snakefile through -s
the shared config file through --configfile
genome through --config
sample_map and workspace for cohort workflows when needed
workflow_rule as the Snakemake target when partial execution is requested
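The argument list above can be assembled as follows. This is a sketch, not the actual `SnakemakeRunner`: `build_snakemake_command` is a hypothetical function, the config key names `genome`, `sample_map`, and `workspace` are taken from the list above, and passing `--cores` is an assumption about how the thread count reaches Snakemake.

```python
from typing import Optional


def build_snakemake_command(
    snakefile: str,
    configfile: str,
    genome: str,
    cores: int,
    sample_map: Optional[str] = None,
    workspace: Optional[str] = None,
    workflow_rule: Optional[str] = None,
) -> list[str]:
    """Assemble a Snakemake argv list from resolved workflow parameters."""
    cmd = ["snakemake"]
    if workflow_rule:
        # Targets are positional. Placing them first keeps them from being
        # read as extra values of --config, which accepts several KEY=VALUE items.
        cmd.append(workflow_rule)
    cmd += ["-s", snakefile, "--configfile", configfile, "--cores", str(cores)]

    config_items = [f"genome={genome}"]
    if sample_map:
        config_items.append(f"sample_map={sample_map}")
    if workspace:
        config_items.append(f"workspace={workspace}")
    # --config goes last so all KEY=VALUE overrides stay together.
    cmd += ["--config", *config_items]
    return cmd
```

Returning a list rather than a single string lets the caller pass it to `subprocess.run` without shell quoting.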

2. Register the Pipeline

Edit:

workflows/registry/workflows.yaml

Add the pipeline under the correct engine and version.

workflows:
  bash:
    base_dir: "workflows/bash"
    versions:
      gatk-4.6:
        helpers:
          env: "env.sh"
        pipelines:
          mypipe:
            single:
              default: "v1"
              versions:
                v1:
                  script: "mypipe_single.sh"

If cohort mode is supported:

pipelines:
  mypipe:
    single:
      default: "v1"
      versions:
        v1:
          script: "mypipe_single.sh"
    cohort:
      default: "v1"
      versions:
        v1:
          script: "mypipe_cohort.sh"

Registry paths

Registry paths are relative to the engine/version directory. For Bash GATK 4.6, mypipe_single.sh resolves below workflows/bash/gatk-4.6/.

Pipeline implementation versions

The version key in the registry entry above, for example gatk-4.6, is the tool family version. The nested v1 is the CBIcall pipeline implementation version. Normal run YAML files do not need to set pipeline_version because the registry default is used. If a pipeline script changes in a way that may affect outputs, add a new implementation such as v2, point it to the new script, and choose whether to move default to that version.
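To make the lookup concrete, the sketch below walks a registry shaped like the entry shown in this section and applies the default implementation version when none is requested. It is illustrative only: `resolve_script` is a hypothetical function, the registry is written as a Python dict instead of being loaded from YAML, and the real logic in `workflow_registry.py` may perform more validation.

```python
from pathlib import PurePosixPath
from typing import Optional

# Python mirror of the registry entry shown in this section.
REGISTRY = {
    "workflows": {
        "bash": {
            "base_dir": "workflows/bash",
            "versions": {
                "gatk-4.6": {
                    "helpers": {"env": "env.sh"},
                    "pipelines": {
                        "mypipe": {
                            "single": {
                                "default": "v1",
                                "versions": {"v1": {"script": "mypipe_single.sh"}},
                            }
                        }
                    },
                }
            },
        }
    }
}


def resolve_script(
    registry: dict,
    engine: str,
    gatk_version: str,
    pipeline: str,
    mode: str,
    pipeline_version: Optional[str] = None,
) -> PurePosixPath:
    """Return the script path for a workflow selection; raises KeyError if unregistered."""
    engine_entry = registry["workflows"][engine]
    mode_entry = engine_entry["versions"][gatk_version]["pipelines"][pipeline][mode]
    # Fall back to the registry default when the run YAML does not pin a version.
    chosen = pipeline_version or mode_entry["default"]
    script = mode_entry["versions"][chosen]["script"]
    # Script names are relative to the engine/version directory.
    return PurePosixPath(engine_entry["base_dir"]) / gatk_version / script


print(resolve_script(REGISTRY, "bash", "gatk-4.6", "mypipe", "single"))
# workflows/bash/gatk-4.6/mypipe_single.sh
```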

3. Make the Pipeline Selectable

The registry makes the script discoverable, but Python validation still controls which pipeline, mode, and gatk_version combinations are allowed.

If mypipe is a new pipeline name, update the validation sets in src/cbicall/config.py:

PIPELINE_VALUES = {"wes", "wgs", "mit", "mypipe"}

_ALLOWED_COMBOS = {
    "gatk-4.6": {
        "wes": ["single", "cohort"],
        "wgs": ["single", "cohort"],
        "mypipe": ["single"],
    },
}

If you are only adding a missing mode or script for an existing pipeline and version, the registry entry may be enough.
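A check built on these two structures might look like the sketch below. The names `PIPELINE_VALUES` and `_ALLOWED_COMBOS` come from the snippet above; `check_combo` and its error messages are hypothetical, and the actual validation in `config.py` may be organized differently.

```python
PIPELINE_VALUES = {"wes", "wgs", "mit", "mypipe"}

_ALLOWED_COMBOS = {
    "gatk-4.6": {
        "wes": ["single", "cohort"],
        "wgs": ["single", "cohort"],
        "mypipe": ["single"],
    },
}


def check_combo(pipeline: str, mode: str, gatk_version: str) -> None:
    """Raise ValueError if the pipeline/mode/gatk_version combination is not allowed."""
    if pipeline not in PIPELINE_VALUES:
        raise ValueError(f"unknown pipeline: {pipeline!r}")
    # A missing version or pipeline yields an empty list, so every mode is rejected.
    allowed_modes = _ALLOWED_COMBOS.get(gatk_version, {}).get(pipeline, [])
    if mode not in allowed_modes:
        raise ValueError(
            f"pipeline {pipeline!r} does not support mode {mode!r} with {gatk_version!r}"
        )


check_combo("mypipe", "single", "gatk-4.6")  # passes silently
```

With the table above, `check_combo("mypipe", "cohort", "gatk-4.6")` raises `ValueError` until `"cohort"` is added to the `mypipe` entry.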

4. Add a User-Facing YAML Example

Create a minimal parameters file that exercises the new workflow:

mode:            single
pipeline:        mypipe
workflow_engine: bash
gatk_version:    gatk-4.6
input_dir:       SAMPLE01
genome:          b37

Run it:

bin/cbicall run -p mypipe.yaml -t 4

CBIcall should create a run directory similar to:

SAMPLE01/cbicall_bash_mypipe_single_b37_gatk-4.6_<run-id>/
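The directory name in this example appears to be composed from the run parameters. The sketch below reproduces that pattern; the field order is inferred from the single example above, `run_dir_path` is a hypothetical helper, and the format of the run ID is not specified here, so it is passed in as an opaque string.

```python
from pathlib import PurePosixPath


def run_dir_path(
    input_dir: str,
    engine: str,
    pipeline: str,
    mode: str,
    genome: str,
    gatk_version: str,
    run_id: str,
) -> PurePosixPath:
    """Compose a run directory path following the pattern in the example above."""
    # Field order is an inference from one example, not a documented contract.
    name = "_".join(["cbicall", engine, pipeline, mode, genome, gatk_version, run_id])
    return PurePosixPath(input_dir) / name


print(run_dir_path("SAMPLE01", "bash", "mypipe", "single", "b37", "gatk-4.6", "<run-id>"))
# SAMPLE01/cbicall_bash_mypipe_single_b37_gatk-4.6_<run-id>
```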

5. Validate the Addition

Check the pipeline at four levels.

Level	What to verify
Registry	The workflow appears under the correct engine/version/pipeline/mode.
Resource catalog	Compatible bundle entries point to real registry workflow keys.
Files	Referenced scripts exist; Bash scripts are executable.
Runtime	CBIcall creates a run directory, writes `log.json`, and produces expected outputs.
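The "Files" level can also be checked by hand before running anything. The following is a small sketch of such a check, assuming only what the table states (scripts must exist, Bash scripts must be executable); `check_entrypoint` is a hypothetical helper and not part of the CBIcall CLI.

```python
import os
from pathlib import Path


def check_entrypoint(path: Path, engine: str) -> list[str]:
    """Return a list of problems found for one registered entrypoint file."""
    problems = []
    if not path.is_file():
        problems.append(f"missing: {path}")
    elif engine == "bash" and not os.access(path, os.X_OK):
        # Bash entrypoints are launched directly, so they need the executable bit.
        problems.append(f"not executable: {path}")
    return problems


for problem in check_entrypoint(Path("workflows/bash/gatk-4.6/mypipe_single.sh"), "bash"):
    print(problem)
```

An empty result means the file passed both checks; a "not executable" message is fixed with the `chmod +x` step from section 1.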

Good first checks:

bin/cbicall validate-registry
bin/cbicall validate-resources
bin/cbicall validate-param -p mypipe.yaml
bin/cbicall run -p mypipe.yaml -t 2

Then inspect:

<run-dir>/log.json
<run-dir>/logs/

When Python Changes Are Needed

Most workflow additions need Python changes only in the validation sets of config.py, and only when the pipeline name is new. Broader Python changes are needed when the execution model changes.

Change	Likely file
New YAML key or default	`src/cbicall/config.py`
New value in the typed runtime model	`src/cbicall/models.py`
New registry resolution behavior	`src/cbicall/workflow_registry.py`
New execution engine	`src/cbicall/dnaseq.py`
Different command-line launch behavior	`src/cbicall/dnaseq.py`

New engine

Adding an engine is different from adding a pipeline. Prefer adding a new runner class rather than expanding conditional logic inside an existing runner.
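The runner-per-engine idea can be sketched as below. Every name in this block is hypothetical: it does not reproduce the interface of CBIcall's `BashRunner` or `SnakemakeRunner`, and `newengine` and `newengine-cli` are placeholder names for whatever engine is being added.

```python
from abc import ABC, abstractmethod


class Runner(ABC):
    """Hypothetical common interface; CBIcall's real runners may differ."""

    @abstractmethod
    def build_command(self, entrypoint: str) -> list[str]:
        """Return the argv used to launch the resolved entrypoint."""


class DirectScriptRunner(Runner):
    """Launches an executable script directly, as described for Bash workflows."""

    def build_command(self, entrypoint: str) -> list[str]:
        return [entrypoint]


class WrapperRunner(Runner):
    """Placeholder for a new engine that is started through a launcher program."""

    def __init__(self, launcher: str) -> None:
        self.launcher = launcher

    def build_command(self, entrypoint: str) -> list[str]:
        return [self.launcher, entrypoint]


# Engine name -> runner instance. "newengine" and "newengine-cli" are made-up names.
RUNNERS: dict[str, Runner] = {
    "bash": DirectScriptRunner(),
    "newengine": WrapperRunner("newengine-cli"),
}


def command_for(engine: str, entrypoint: str) -> list[str]:
    """Dispatch to the runner registered for the engine, with no per-engine branching."""
    try:
        runner = RUNNERS[engine]
    except KeyError:
        raise ValueError(f"unsupported workflow_engine: {engine!r}") from None
    return runner.build_command(entrypoint)
```

With this shape, supporting another engine means adding one class and one dictionary entry, while the existing runners stay untouched.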

Contributor Checklist

Pick engine, version, pipeline name, and mode.
Inspect the closest existing workflow in workflows/{bash|snakemake}/{gatk-version}/.
Add workflow entrypoint scripts.
Make Bash scripts executable.
Register scripts in workflows/registry/workflows.yaml.
Update Python validation if the pipeline name or compatibility matrix changes.
Add a minimal YAML example.
Run CBIcall and inspect log.json, logs/, and expected outputs.
Add or update tests when validation or execution behavior changes.

Next Steps

Review the execution model in Architecture.
Check YAML behavior in Configuration Reference.
Document generated files in Outputs if the new pipeline produces user-facing outputs.
