Cross-Environment Reproducibility

Cross-environment reproducibility checks whether the same CBIcall analysis can recover the same final variant calls across machines and runtime environments. It is separate from truth-set benchmarking, which is covered in GIAB Benchmarking.

To evaluate CBIcall portability and reproducibility, a 1000 Genomes Project WES sample, HG00103 (SRR1596639), was processed with identical workflow definitions across four computational environments. Every run used the native Bash GATK 4.6 single-sample WES workflow on b37 with the cbicall-germline-resources-v1 resource bundle.

Installation smoke test

CBIcall also includes a small repository reference sample used by the installation and backend-equivalence checks. That sample is covered in Integration Tests; it is useful for confirming that an installation works before running larger external data.

Compared Runs

| Run label | Environment | Notes |
| --- | --- | --- |
| gcloud | Google Cloud Platform VM, Ubuntu 22.04 | Cloud VM run. |
| ws5 | x86_64 workstation, Linux Mint 20.3 | Local workstation baseline. |
| ws1 | macOS ARM64 workstation running Ubuntu 24.04 in a VM | Heterogeneous CPU architecture. |
| hpc | SLURM-managed HPC cluster, x86_64 CentOS Linux 7.9.2009 | Production-style HPC environment. |

All runs were compared with cbicall compare-runs using the final QC VCF fingerprints recorded in each run-report.json. The primary evidence is the CBIcall text and HTML comparison report, not the optional MultiQC export.

Result

The final VCF contained 23,562 variant records: 19,578 PASS records and 3,984 non-PASS records. All four runs produced identical final variant calls under the CBIcall call-level VCF fingerprint. The ws5, hpc, and gcloud runs also matched under the stricter full-record fingerprint. The ARM ws1 run differed only under the strict-record fingerprint. Manual inspection localized this to a minor numeric difference in non-call fields for one record (3:196281208 T>C), consistent with environment-level numerical drift rather than a changed variant call.
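The kind of manual inspection described above can be approximated with a short script. The following is a minimal sketch, not part of CBIcall: it assumes two uncompressed, tab-delimited VCFs read as lists of lines, keys each record by (CHROM, POS, REF, ALT), and reports which columns differ for sites present in both. The function names are hypothetical, and the two toy records reuse illustrative QUAL and PL values rather than the actual fields of the record that drifted.

```python
# Fixed VCF columns in file order (per the VCF specification).
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def index_records(vcf_lines):
    """Map (CHROM, POS, REF, ALT) to the tab-split columns of each data line."""
    records = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        cols = line.split("\t")
        records[(cols[0], cols[1], cols[3], cols[4])] = cols
    return records


def column_name(i):
    """Name column i; columns past FORMAT are samples, numbered from 1."""
    if i < len(VCF_COLUMNS):
        return VCF_COLUMNS[i]
    return f"SAMPLE{i - len(VCF_COLUMNS) + 1}"


def differing_columns(vcf_a, vcf_b):
    """Return {site: [names of columns that differ]} for sites in both inputs."""
    a, b = index_records(vcf_a), index_records(vcf_b)
    report = {}
    for site in sorted(a.keys() & b.keys()):
        names = [column_name(i)
                 for i, (x, y) in enumerate(zip(a[site], b[site])) if x != y]
        if names:
            report[site] = names
    return report


# Toy single-sample records (hypothetical values; INFO left empty).
run_a = ["3\t196281208\t.\tT\tC\t2267.64\tPASS\t.\tGT:PL\t0/1:2275,0,3196"]
run_b = ["3\t196281208\t.\tT\tC\t2266.64\tPASS\t.\tGT:PL\t0/1:2274,0,3196"]

drift = differing_columns(run_a, run_b)
print(drift)
```

For the toy input, only QUAL and the sample column are reported. FILTER is absent from the report, and GT is unchanged inside the sample column, which is the pattern expected when a call is preserved and only numeric annotations move.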

Final VCF calls NxN heatmap from compare-runs.html; all 23,562 call-level records, including PASS and non-PASS records, match across every environment pair.
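The pairwise logic behind an NxN view like this can be sketched in a few lines. This is an illustrative assumption about how such a matrix could be derived, not the compare-runs implementation: each run label maps to a fingerprint string, and every pair of runs is marked as matching or not. The input strings are the abbreviated fingerprints reported for the four runs in this section; a real comparison would use the full digests.

```python
from itertools import product


def match_matrix(fingerprints):
    """Return {(run_a, run_b): True/False} for every ordered pair of runs."""
    return {(a, b): fingerprints[a] == fingerprints[b]
            for a, b in product(fingerprints, repeat=2)}


def render(fingerprints):
    """Render the matrix as text: '=' for a match, 'x' for a mismatch."""
    labels = list(fingerprints)
    matrix = match_matrix(fingerprints)
    width = max(len(label) for label in labels)
    rows = [" ".join([" " * width] + [label.rjust(width) for label in labels])]
    for a in labels:
        cells = [("=" if matrix[(a, b)] else "x").rjust(width) for b in labels]
        rows.append(" ".join([a.rjust(width)] + cells))
    return "\n".join(rows)


# Abbreviated fingerprints as displayed in the comparison table.
calls = {run: "727f877de6ec...0a26a976" for run in ("ws5", "hpc", "gcloud", "ws1")}
strict = {
    "ws5": "a659e0105d8b...4a5539c0",
    "hpc": "a659e0105d8b...4a5539c0",
    "gcloud": "a659e0105d8b...4a5539c0",
    "ws1": "5c03b2a73c25...7407487e",
}

print(render(calls))   # every cell matches
print(render(strict))  # only pairs involving ws1 mismatch
```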

| Comparison target | VCF calls fingerprint | VCF strict records fingerprint | Interpretation |
| --- | --- | --- | --- |
| ws5 baseline | 727f877de6ec...0a26a976 | a659e0105d8b...4a5539c0 | Baseline. |
| hpc | 727f877de6ec...0a26a976 | a659e0105d8b...4a5539c0 | Matches baseline at call and strict-record level. |
| gcloud | 727f877de6ec...0a26a976 | a659e0105d8b...4a5539c0 | Matches baseline at call and strict-record level. |
| ws1 | 727f877de6ec...0a26a976 | 5c03b2a73c25...7407487e | Call-equivalent to baseline; strict-record-only numeric drift. |

Call-level versus strict VCF hashes

CBIcall records two VCF fingerprints because different questions need different levels of sensitivity.

| Fingerprint | Fields hashed | Example of change detected | Typical interpretation |
| --- | --- | --- | --- |
| calls | CHROM, POS, REF, ALT, FILTER, and each sample GT in VCF sample order, across all final VCF records. | A variant becomes filtered instead of PASS, a genotype changes from 0/1 to 0/0, or a site appears/disappears. | Final reported calls changed. This is the primary reproducibility check for WES/WGS outputs. Because FILTER is hashed, PASS and non-PASS records are both audited. |
| strict records | Complete sorted non-header VCF records, including QUAL, INFO, all FORMAT fields, PL, annotations, and other numeric fields. | Same PASS and GT, but QUAL shifts from 2267.64 to 2266.64, or PL shifts from 2275,0,3196 to 2274,0,3196. | The full VCF record changed. This can detect small numerical drift even when the final call is unchanged. |

In this validation, ws1 and ws5 reported the same FILTER (PASS) and GT (0/1) for 3:196281208 T>C, so that record contributed identically to the calls hash and the two calls hashes matched. The strict hash changed because non-call numeric fields in that record shifted.
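The difference between the two fingerprints can be illustrated with a minimal sketch. It follows the field lists given above, but the hash function (SHA-256), the tab-joined serialization, and the use of file order for the calls fingerprint are assumptions made for illustration, so it will not reproduce the digests that CBIcall writes to run-report.json. The function names and the toy records are hypothetical.

```python
import hashlib


def data_lines(vcf_lines):
    """Yield non-empty, non-header VCF lines without the trailing newline."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            yield line


def calls_fingerprint(vcf_lines):
    """Hash CHROM, POS, REF, ALT, FILTER and each sample GT, in file order."""
    digest = hashlib.sha256()
    for line in data_lines(vcf_lines):
        cols = line.split("\t")
        fields = [cols[0], cols[1], cols[3], cols[4], cols[6]]
        if len(cols) > 9:
            keys = cols[8].split(":")
            gt_idx = keys.index("GT") if "GT" in keys else None
            for sample in cols[9:]:
                values = sample.split(":")
                has_gt = gt_idx is not None and gt_idx < len(values)
                fields.append(values[gt_idx] if has_gt else ".")
        digest.update("\t".join(fields).encode() + b"\n")
    return digest.hexdigest()


def strict_fingerprint(vcf_lines):
    """Hash the complete non-header records after sorting them."""
    digest = hashlib.sha256()
    for line in sorted(data_lines(vcf_lines)):
        digest.update(line.encode() + b"\n")
    return digest.hexdigest()


# Toy records: same call, different QUAL and PL (hypothetical values).
ws5_like = ["3\t196281208\t.\tT\tC\t2267.64\tPASS\t.\tGT:PL\t0/1:2275,0,3196"]
ws1_like = ["3\t196281208\t.\tT\tC\t2266.64\tPASS\t.\tGT:PL\t0/1:2274,0,3196"]
# Toy record with a changed genotype, which should alter the calls hash.
regenotyped = ["3\t196281208\t.\tT\tC\t2267.64\tPASS\t.\tGT:PL\t0/0:2275,0,3196"]

print(calls_fingerprint(ws5_like) == calls_fingerprint(ws1_like))    # same call
print(strict_fingerprint(ws5_like) == strict_fingerprint(ws1_like))  # drifted record
print(calls_fingerprint(ws5_like) == calls_fingerprint(regenotyped)) # changed GT
```

Hashing a reduced projection of each record is what makes the calls fingerprint insensitive to numeric drift while still sensitive to FILTER and genotype changes. The strict fingerprint hashes every byte of every record, so any field change is visible.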

Evidence Artifacts

| Artifact | Use |
| --- | --- |
| compare-runs.txt | Canonical text audit report for archiving and diffing. |
| compare-runs.html | Static browser report; this validation uses the Final VCF calls heatmap plus the VCF calls / strict records rows. |
| compare-runs_mqc/ | Optional MultiQC custom-content export for projects that already aggregate QC with MultiQC. It is not the primary evidence artifact for this validation. |

The evidence bundle can be regenerated from collected run directories:

```bash
bin/cbicall compare-runs \
  ws5/HG00103/SRR1596639/cbicall_bash_gatk-4.6_wes_single_b37_178178909641565/ \
  hpc/HG00103/SRR1596639/cbicall_bash_gatk-4.6_wes_single_b37_178179425143788/ \
  gcloud/HG00103/SRR1596639/cbicall_bash_gatk-4.6_wes_single_b37_178185765566264/ \
  ws1/HG00103/SRR1596639/cbicall_bash_gatk-4.6_wes_single_b37_178153803413337/ \
  --alias ws5 hpc gcloud ws1 \
  --output compare-runs.txt \
  --html compare-runs.html
```

Optional MultiQC export

Add --multiqc compare-runs_mqc only if the comparison needs to be included in a larger MultiQC project report. For interpreting this reproducibility check, use compare-runs.txt and compare-runs.html.