Frequently Asked Questions

WES / WGS

What reference genomes are used?

CBIcall supports:

  • GRCh37 / b37: GATK-compatible reference genome
  • GRCh38 / hg38: GATK-compatible reference genome

For native WES/WGS workflows, support depends on the selected software stack. This table describes CBIcall-shipped workflows, not every possible use of GATK upstream:

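| Stack    | CBIcall WES reference | CBIcall WGS support/reference | Main use                                          |
|----------|-----------------------|-------------------------------|---------------------------------------------------|
| gatk-3.5 | b37                   | No CBIcall WGS workflow       | Legacy Bash WES workflows and mtDNA prerequisites |
| gatk-4.6 | b37                   | b37 or hg38                   | Current native WES/WGS workflows                  |

Which exome regions are used for WES?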

Native WES interval resources differ between the legacy and current stacks:

| Stack    | WES interval resource | Used for |
|----------|-----------------------|----------|
| gatk-3.5 | Agilent SureSelect hg19 BED files (hg19.chr*.bed and flanked hg19.chr*.flank100bp.bed) | BQSR, recalibrated BAM generation, UnifiedGenotyper variant calling, and coverage summaries |
| gatk-4.6 | GATK bundle / Broad b37 exome interval list (b37_Broad.human.exome.b37.interval_list) | BQSR, HaplotypeCaller, GenotypeGVCFs, and coverage summaries for WES mode |

WGS mode does not use WES capture intervals.
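
The stack-specific support described in this section can be read as a small lookup. The following is a minimal sketch only: the dictionary layout, the mode keys ("wes", "wgs"), and the function name check_support are hypothetical and are not part of CBIcall's configuration or API. Only the stack names, references, and interval resources come from this FAQ.

```python
# Hypothetical lookup mirroring the stack information in this FAQ.
# The structure and names are illustrative, not CBIcall's actual API.
STACK_SUPPORT = {
    "gatk-3.5": {
        "wes": {
            "references": ["b37"],
            "intervals": "Agilent SureSelect hg19 BED files",
        },
        "wgs": None,  # no CBIcall WGS workflow for this stack
    },
    "gatk-4.6": {
        "wes": {
            "references": ["b37"],
            "intervals": "b37_Broad.human.exome.b37.interval_list",
        },
        # WGS mode does not use WES capture intervals
        "wgs": {"references": ["b37", "hg38"], "intervals": None},
    },
}


def check_support(stack, mode, reference):
    """Return True if the stack/mode/reference combination is listed above."""
    modes = STACK_SUPPORT.get(stack)
    if modes is None:
        raise ValueError(f"unknown stack: {stack}")
    entry = modes.get(mode)
    if entry is None:
        # Either the mode is unknown or the stack ships no workflow for it.
        return False
    return reference in entry["references"]


if __name__ == "__main__":
    print(check_support("gatk-4.6", "wgs", "hg38"))
    print(check_support("gatk-3.5", "wgs", "b37"))
```

A check like this could run before a job is submitted, so that an unsupported combination fails early rather than partway through a workflow.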

mtDNA (MToolBox)

What reference genome is used?

mtDNA workflows use RSRS (rsrs), the Reconstructed Sapiens Reference Sequence.

VCF vs. prioritized variants allele notation
Note: In rare cases, the allele reported in prioritized_variants.txt may differ from the ALT allele reported in the VCF.

The Variant_Allele column is generated during annotation and prioritization and does not always follow VCF semantics, where ALT is defined relative to the mapping reference, such as RSRS.

What does GT=1 mean in results?

In variant reports, the Genotype (GT) field shows the observed allele using VCF allele indices:

  • 0 = reference allele
  • 1 = first alternate allele
  • 2, 3, ... = additional alternate alleles in multiallelic records

For CBIcall mtDNA reports, GT=1 means an ALT allele was detected in that sample.
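
To make the index convention concrete, the sketch below resolves a single GT index against the REF and ALT columns of a VCF record. It assumes a haploid-style GT holding one integer, as in the GT=1 case above. The function name allele_for_gt and the example alleles are illustrative and not taken from CBIcall.

```python
def allele_for_gt(ref, alt_field, gt_index):
    """Resolve a single VCF GT index to the allele string it refers to.

    ref       -- REF column of the record
    alt_field -- ALT column, comma-separated for multiallelic records
    gt_index  -- integer allele index (0 = REF, 1 = first ALT, ...)
    """
    # "." in the ALT column means no alternate allele is listed.
    alts = [] if alt_field == "." else alt_field.split(",")
    alleles = [ref] + alts
    if not 0 <= gt_index < len(alleles):
        raise ValueError(f"GT index {gt_index} has no matching allele")
    return alleles[gt_index]


if __name__ == "__main__":
    # Illustrative multiallelic record: REF=A, ALT=G,T
    print(allele_for_gt("A", "G,T", 0))  # A
    print(allele_for_gt("A", "G,T", 1))  # G
    print(allele_for_gt("A", "G,T", 2))  # T
```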

Biological interpretation should use:

  • HF: fraction of reads supporting the ALT allele
  • DP: total read depth at the variant position

Tip: For mtDNA, GT tells you which allele was detected, not how much of it was detected. Use HF and DP to interpret heteroplasmy or homoplasmy.
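
As a rough illustration of reading HF and DP together, the sketch below labels a call from those two values. The cutoffs are deliberately left as required arguments: CBIcall does not define them in this FAQ, and the numbers in the usage example are placeholders, not recommendations. The function name describe_mtdna_call and its labels are hypothetical.

```python
def describe_mtdna_call(hf, dp, *, min_depth, homoplasmy_hf):
    """Label an mtDNA ALT call from its heteroplasmic fraction and depth.

    hf            -- fraction of reads supporting the ALT allele (0.0 to 1.0)
    dp            -- total read depth at the variant position
    min_depth     -- caller-chosen depth below which HF is not trusted
    homoplasmy_hf -- caller-chosen HF at or above which the call is
                     treated as homoplasmic
    """
    if not 0.0 <= hf <= 1.0:
        raise ValueError("hf must be between 0 and 1")
    if dp < 0:
        raise ValueError("dp must be non-negative")
    if dp < min_depth:
        # Too few reads for the fraction to be reliable.
        return "low depth"
    if hf >= homoplasmy_hf:
        return "homoplasmic"
    return "heteroplasmic"


if __name__ == "__main__":
    # Placeholder thresholds for illustration only.
    print(describe_mtdna_call(0.30, 500, min_depth=100, homoplasmy_hf=0.95))
    print(describe_mtdna_call(0.99, 500, min_depth=100, homoplasmy_hf=0.95))
    print(describe_mtdna_call(0.99, 20, min_depth=100, homoplasmy_hf=0.95))
```

Two calls with the same GT=1 can therefore get different labels, which is why GT alone is not enough for interpretation.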

General

What is the difference between native and external workflows?

Native CBIcall workflows are maintained in this repository and launched through a supported workflow backend: Bash, Snakemake, Nextflow, or Cromwell. They use the CBIcall project layout, provenance files, run reports, and usually the CBIcall resource bundle.

External workflows are third-party workflows registered in CBIcall. Today this means selected nf-core workflows launched through Nextflow. CBIcall validates the YAML contract, pins the registered workflow, and writes provenance and run reports, while nf-core keeps its own output layout, profiles, containers, and reference-resource assumptions.

See Workflows, External nf-core, and Resource Validation.

How do I set up CBIcall on an HPC system?

Docker is not available on most HPC systems. CBIcall is designed to run with Apptainer (formerly Singularity), which is the recommended approach for HPC environments.

Apptainer can execute Docker images directly, requires no root privileges, and integrates cleanly with batch schedulers.

In this setup:

  • the container image is read-only
  • configuration files and workflows are stored in a writable host directory
  • native CBIcall resources, when needed, are downloaded outside the container and bind-mounted at runtime

Recommended workflow:

  1. Pull the CBIcall container image using Apptainer.
  2. Create a writable copy of the CBIcall workflow directory.
  3. Choose workflow_provider: nf-core for a no-bundle first run, or download the CBIcall resource bundle for native WES/WGS/mtDNA workflows.
  4. Run the pipeline with the writable copy, adding the data bind only for native workflows.
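
The pull and run steps above can be sketched as command construction. This is only an outline under stated assumptions: `apptainer pull`, `apptainer exec`, and `--bind` are standard Apptainer options, but the image reference, every path, and the inner command below are hypothetical placeholders. The real values are in the HPC guide referenced below. The sketch builds the commands without running them.

```python
import shlex


def pull_command(sif_path, docker_ref):
    """Build an `apptainer pull` command that converts a Docker image to a SIF file."""
    return ["apptainer", "pull", sif_path, f"docker://{docker_ref}"]


def exec_command(sif_path, inner_command, binds=()):
    """Build an `apptainer exec` command with optional host:container bind mounts."""
    cmd = ["apptainer", "exec"]
    for host_path, container_path in binds:
        cmd += ["--bind", f"{host_path}:{container_path}"]
    cmd.append(sif_path)
    cmd += list(inner_command)
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not real CBIcall names or paths.
    image = "cbicall.sif"
    print(shlex.join(pull_command(image, "example-org/example-image:latest")))

    # Bind the writable workflow copy; add the resource bundle bind
    # only when running native workflows.
    binds = [
        ("/scratch/me/cbicall-workflows", "/workflows"),
        ("/scratch/me/cbicall-data", "/data"),
    ]
    inner = ["echo", "replace with the CBIcall command from the HPC guide"]
    print(shlex.join(exec_command(image, inner, binds)))
```

Building the commands as lists rather than shell strings avoids quoting problems with paths that contain spaces.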

See HPC with Apptainer / Singularity.

Do you have a Slurm + Apptainer example?

Yes. See the example script:

run_cbicall_apptainer_slurm.sh

How do I cite CBIcall?

Please cite:

CBIcall: a configuration-driven framework for variant calling in large sequencing cohorts. Preprint DOI.