Run Comparison

cbicall compare-runs compares completed CBIcall run directories or run-report.json files. Use it to audit whether repeated local, HPC, container, cloud, or backend runs used the same framework, workflow, resources, and execution contract, and whether they produced comparable outputs.

Audit, not biological validation

compare-runs does not prove that two biological analyses are equivalent. It checks the CBIcall execution and output evidence recorded for completed runs. For variant-output reproducibility, start with the normalized VCF fingerprint.

Run It

bin/cbicall compare-runs run_a/ run_b/ --output compare-report.txt

This prints a direct pairwise comparison, saves the text report to compare-report.txt, and also writes compare-report.html by default.

[Screenshot: two-run CBIcall compare-runs HTML report, showing the overview for a Bash versus Snakemake comparison.]

Keep These Files

For a concise methods or reviewer audit, archive the comparison report plus the run reports that were compared.

| File | Why it matters |
| --- | --- |
| compare-report.txt | Canonical text audit artifact; easy to diff and archive. |
| compare-report.html | Static browser view with overview, matrices, evidence, and raw text. |
| run-report.json | Compact provenance report used by compare-runs. |
| log.json | Full resolved configuration, runtime parameters, and resource details. |
| cbicall-execution-contract.json | Backend-ready command, generated launch files, and normalized execution fingerprint. |
| Workflow log | Execution log for Bash, Snakemake, Nextflow, or Cromwell. |
| 03_stats/*.vcf.sha256.txt | Normalized VCF fingerprint report when produced by the workflow. |

What Is Compared

| Layer | Main evidence |
| --- | --- |
| Framework | CBIcall, Python, Java, configured native Java, and backend versions. |
| Pipeline | Workflow key, registry version, entrypoint, external release, and workflow fingerprint. |
| Execution | Task count and peak RSS/VMEM when the backend provides an execution trace. |
| Execution contract | Normalized execution-contract fingerprint, command fingerprint, and generated launch-file hashes. |
| Software | Software-version fingerprint from the resource catalog or workflow-reported version table. |
| Workflow files | Entrypoint and helper/config file paths plus SHA-256 values. |
| Resources | Resource key, version, and fingerprint from the selected resource catalog entry. |
| Outputs | File-inventory fingerprint, inventory size, and normalized VCF fingerprints. |

Most important output check

For output reproducibility, prioritize the normalized VCF fingerprint. It is computed from VCF records, not raw compressed bytes, so header timestamps, command lines, and compression metadata do not create false differences.
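To illustrate why a record-based hash ignores header noise, here is a minimal Python sketch. It is not CBIcall's implementation: the function name normalized_vcf_fingerprint is hypothetical, and CBIcall's actual normalization rules may be stricter or cover more cases. The sketch only assumes that every header line starts with "#" and that the remaining lines are the variant records.

```python
import gzip
import hashlib
from pathlib import Path


def normalized_vcf_fingerprint(vcf_path):
    """Hash only the variant records of a VCF, skipping all header lines.

    Hypothetical helper for illustration; not part of CBIcall.
    """
    path = Path(vcf_path)
    # Read gzip/bgzip-compressed or plain-text VCF files the same way.
    opener = gzip.open if path.suffix == ".gz" else open
    digest = hashlib.sha256()
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#"):
                # Header lines carry timestamps and command lines.
                continue
            # Hash the decompressed record text, never the compressed bytes.
            digest.update(line.rstrip("\r\n").encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()
```

Two files with different header dates or compression settings but identical records produce the same digest under this sketch, while any change to a record line changes it.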

Read The HTML

The text report is the canonical artifact, but the HTML report is easier to scan:

| Tab | Use it for |
| --- | --- |
| Overview | Quick run count, status summary, and high-level differences. |
| Baseline Matrix | Field-by-field comparison against the first run, when the report includes a baseline view. |
| Pairwise Audit | One NxN matrix per audit layer, shown for multi-run all-to-all reports; each cell shows a derived category plus Jaccard similarity. |
| Evidence | Baseline values and compact fingerprints behind the visual summaries. |
| Raw Text | Exact terminal-style report embedded in the HTML. |

Pairwise audit

The Pairwise Audit tab shows two comparison signals in one matrix. Each cell pairs CBIcall's strict pair status with a Jaccard similarity score computed over the normalized report facts for the same run pair and audit layer. The visible category is derived for readability: same, near, partial, diverged, missing, or n/a; the hover text keeps the exact strict status. This helps triage and cluster comparable runs, but it does not replace exact hash comparisons or biological concordance analyses.
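As a rough illustration of how a cell could be derived, the Python sketch below computes a Jaccard score over two sets of (key, value) facts and maps it to one of the visible categories. The function names, the representation of facts as tuples, and the 0.9 and 0.5 thresholds are all assumptions made for this example; the cut-offs CBIcall actually uses are not documented here.

```python
def jaccard_similarity(facts_a, facts_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of hashable facts."""
    a, b = set(facts_a), set(facts_b)
    if not a and not b:
        return None  # nothing recorded on either side
    return len(a & b) / len(a | b)


def derived_category(strict_status, similarity, near=0.9, partial=0.5):
    """Map a strict status plus a Jaccard score to a display category.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if strict_status == "not available" or similarity is None:
        return "n/a"
    if strict_status == "missing":
        return "missing"
    if strict_status == "same":
        return "same"
    if similarity >= near:
        return "near"
    if similarity >= partial:
        return "partial"
    return "diverged"


# Two runs that agree on two of three framework facts.
run_a = {("cbicall", "1.0"), ("python", "3.11"), ("java", "17")}
run_b = {("cbicall", "1.0"), ("python", "3.11"), ("java", "21")}
score = jaccard_similarity(run_a, run_b)  # 2 shared facts / 4 distinct facts
category = derived_category("different", score)
```

Because the two Java facts differ, the union holds four facts and only two are shared, so the score is 0.5, which falls into "partial" under the assumed thresholds.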

Interpret Differences

Use this order when reading a comparison:

  1. Check Framework and Software to see whether the driver, runtime, or tool table changed.
  2. Check Execution Contract to confirm CBIcall launched the same backend-ready plan.
  3. Check Pipeline and Workflow files to locate changed workflow code or config.
  4. Check Resources to confirm the external dependency bundle matches.
  5. Check Outputs, especially the normalized VCF fingerprint.

If the workflow fingerprint changed but the normalized VCF fingerprint is the same, the compared VCF records match under CBIcall's deterministic comparison rules. The workflow change should still be inspected before claiming full execution identity.

Advanced view override

The default report shape is usually the right one: two runs get a direct comparison, and three or more runs get baseline plus all-to-all views. Use --comparison-view baseline, --comparison-view all-to-all, or --comparison-view both only when you need to force a specific report shape for an automated audit or manuscript figure.

Status Vocabulary

| Status | Meaning |
| --- | --- |
| same | Values or fingerprints match. |
| different | Values or fingerprints exist in all compared runs but differ. |
| missing | Evidence is present in only some runs. |
| note | Audit hint; not treated as a failed reproducibility check. |
| not available | Evidence is not recorded in any compared run. |
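The vocabulary above can be read as a small decision rule. The Python sketch below is one way to express it, assuming each run contributes either a recorded value or None for a given field; the function name strict_status is hypothetical, and the "note" status is left out because it marks audit hints rather than a value comparison.

```python
def strict_status(values):
    """Classify one field across runs.

    `values` holds one entry per compared run; None means the run
    recorded no evidence for this field (an assumption of this sketch).
    """
    present = [v for v in values if v is not None]
    if not present:
        return "not available"  # no run recorded the evidence
    if len(present) < len(values):
        return "missing"  # only some runs recorded it
    if all(v == present[0] for v in present):
        return "same"
    return "different"  # recorded everywhere, but values differ
```

For example, strict_status(["abc", "abc"]) yields "same", while strict_status(["abc", None]) yields "missing".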

Fingerprint Notes

Runtime fingerprints

Workflow and resource fingerprints are computed at runtime from the files and catalog entries actually resolved for the run. CBIcall deliberately does not store expected workflow hashes in the registry or catalog, because harmless comment or formatting edits would otherwise require metadata churn.

The execution-contract fingerprint is normalized by replacing the run directory and run ID with placeholders. This lets repeated runs compare as the same contract even when their output directories differ.

For external nf-core workflows, CBIcall uses canonical output patterns declared in the workflow registry. For example, the Sarek entry points to the HaplotypeCaller VCF under sarek/variant_calling/haplotypecaller/, so repeated Sarek runs can be audited without hard-coding Sarek paths in compare-runs.

The file-inventory fingerprint is path-based, not content-based. It hashes the sorted list of relative file paths in the run directory, excluding generated report files and backend work directories. Use it to audit run-directory layout; use normalized VCF hashes to audit compared variant records.
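A path-based inventory hash can be sketched in a few lines of Python. The exclusion lists below are assumptions chosen for illustration (the exact report files and work directories that CBIcall skips are not listed here), and file_inventory_fingerprint is a hypothetical name.

```python
import hashlib
from pathlib import Path

# Assumed exclusions; CBIcall's real lists may differ.
EXCLUDED_FILES = {"run-report.json", "run-report.html",
                  "compare-report.txt", "compare-report.html"}
EXCLUDED_DIRS = {"work", ".nextflow", ".snakemake"}


def file_inventory_fingerprint(run_dir):
    """Return (sha256 hex digest, file count) over sorted relative paths.

    File contents are never read, so only the layout affects the hash.
    """
    root = Path(run_dir)
    rel_paths = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.name in EXCLUDED_FILES:
            continue
        if any(part in EXCLUDED_DIRS for part in rel.parts[:-1]):
            continue
        rel_paths.append(rel.as_posix())
    rel_paths.sort()
    digest = hashlib.sha256("\n".join(rel_paths).encode("utf-8"))
    return digest.hexdigest(), len(rel_paths)
```

Two run directories with the same file names but different file contents give the same result under this sketch, which is why the layout check needs to be paired with normalized VCF hashes.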

Inspect One Run

To inspect one completed run without rerunning the workflow:

bin/cbicall report completed_run/

This is read-only by default. Add explicit flags when you want artifacts to be generated or refreshed:

bin/cbicall report completed_run/ --html
bin/cbicall report completed_run/ --refresh -O
bin/cbicall report completed_run/ --refresh --html -O

--html writes run-report.html; --refresh updates output-derived metadata such as the file inventory and VCF hash sidecars in run-report.json. Existing files are not replaced unless -O/--overwrite is supplied.