Skip to main content

Bundle v1

CBIcall production workflows require a CBIcall-provided bundle containing third-party executables, reference genomes, known-sites files, interval lists, and auxiliary databases.

The current bundle is selected in the run YAML with:

resource: "cbicall-germline-resources-v1"

CBIcall resolves this key against the resource catalog:

resources/cbicall-resource-catalog.json

The resource catalog can contain different resource types. This page documents the current bundle type and the CBIcall-provided bundle used by the packaged workflows.

The scripts/download_cbicall_bundle.py utility is intentionally scoped to CBIcall-provided bundle entries in the resource catalog. It is not a general-purpose installer for arbitrary local, HPC module, or third-party layouts. The downloader uses the local catalog when it is present. If the script is used standalone and the local catalog is absent, it fetches the canonical catalog URL before selecting the bundle entry.

Bundle Identity

FieldValue
Resource keycbicall-germline-resources-v1
Why this is explicit

The resource key is the identifier users select with resource. log.json records this key and a catalog fingerprint, so two runs can be checked for the same declared external dependency set.

Supported Workflows

EnginePipelineModeGATK version
bashwessinglegatk-3.5
bashwescohortgatk-3.5
bashwessinglegatk-4.6
bashwescohortgatk-4.6
bashwgssinglegatk-4.6
bashwgscohortgatk-4.6
bashmitsinglegatk-3.5
bashmitcohortgatk-3.5
snakemakewessinglegatk-4.6
snakemakewescohortgatk-4.6

Downloaded Files

The production bundle is distributed as a small identifier JSON, split archive parts, and a checksum file.

FilePurpose
cbicall-resource-id.jsonDeclares the resource key and is pinned by SHA-256 in the catalog.
data.tar.gz.md5MD5 checksum file. The current bundle records the split archive parts.
data.tar.gz.part-00Split archive part.
data.tar.gz.part-01Split archive part.
data.tar.gz.part-02Split archive part.
data.tar.gz.part-03Split archive part.
data.tar.gz.part-04Split archive part.
data.tar.gz.part-05Split archive part.

The setup utility verifies the files covered by data.tar.gz.md5, reassembles the parts into an archive, and verifies the downloaded CBIcall-provided bundle payload before extraction.

An optional small remote identifier file can also be used:

{"resource_key": "cbicall-germline-resources-v1"}

When available, this file is named cbicall-resource-id.json. Its SHA-256 can be pinned in the local catalog to confirm that the remote bundle declares the expected resource key.

Expected Layout

After extraction, DATADIR should contain:

DATADIR/
Databases/
NGSutils/

The bundle layout uses these conventional top-level names:

VariableMeaning
DATADIRRoot of the installed CBIcall-provided bundle.
DBDIRDATADIR/Databases
NGSUTILSDATADIR/NGSutils

Workflow-specific files such as Bash env.sh and Snakemake config.yaml resolve the installed bundle layout into concrete executable and reference paths.

Tools

ToolVersionPath hint
GATK 44.6.2.0NGSutils/gatk/gatk-4.6.2.0/gatk
BWA0.7.18NGSutils/bwa-0.7.18/bwa
Samtools0.1.19NGSutils/samtools-0.1.19/samtools
note

Some workflow branches may use architecture-specific executable paths or legacy tool paths. The catalog records the intended bundle identity; the workflow logs and log.json record the concrete paths resolved during a run.

Reference Resources

b37

ResourcePath hint
Reference FASTADatabases/GATK_bundle/b37/references_b37_Homo_sapiens_assembly19.fasta
dbSNPDatabases/dbSNP/human_9606_b144_GRCh37p13/All_20160408.vcf.gz
Mills / 1000G INDELsDatabases/GATK_bundle/b37/b37_Mills_and_1000G_gold_standard.indels.b37.vcf.gz

hg38

ResourcePath hint
Reference FASTADatabases/GATK_bundle/hg38/resources_broad_hg38_v0_Homo_sapiens_assembly38.fasta

rsrs / mtDNA

The mtDNA workflows use MToolBox-related mitochondrial files from the bundle.

Provenance in Runs

The selected resource bundle is stored in log.json under config.resources.bundle.

Example:

{
"key": "cbicall-germline-resources-v1",
"compatible": true,
"fingerprint": "..."
}

Two runs used the same declared external dependency set when their config.resources.bundle.fingerprint values match.

Installation Manifest

The setup utility also writes:

cbicall-resource-installation.json

This local manifest records the installed resource key, archive checksum result, source files, extraction status, and optional remote identifier provenance.

Runtime Check

Before launching a workflow, CBIcall resolves DATADIR from the selected Bash env.sh or Snakemake config.yaml.

If bundle metadata exists beside DATADIR, CBIcall validates it:

FileRuntime check
cbicall-resource-id.jsonResource key must match the selected resource; SHA-256 must match the catalog when pinned.
cbicall-resource-installation.jsonInstalled resource key must match the selected resource; the manifest catalog entry must match the local catalog fingerprint.

This check is intentionally small. It validates the installed bundle identity without hashing the full resource archive on every run.