Frequently Asked Questions

WES / WGS

What are the reference genomes used?

GRCh37 (b37) - GATK-compatible reference genome

GRCh38 (hg38) - GATK-compatible reference genome

last change 2025-10-15 by Manuel Rueda

What are the capture kits for WES?

For GATK version 3.5: Exome capture is based on Agilent SureSelect.
For GATK version 4.6: Exome and WGS references are based on the GATK bundle (b37).

last change 2025-10-15 by Manuel Rueda

mtDNA (MToolBox)

What is the reference genome used?

RSRS (rsrs) - Reconstructed Sapiens Reference Sequence

last change 2025-10-15 by Manuel Rueda

VCF vs. prioritized variants allele notation

In rare cases, the allele reported in prioritized_variants.txt may differ from the ALT allele reported in the VCF. The Variant_Allele column is generated during annotation and prioritization and does not always follow VCF semantics, where ALT is defined relative to the mapping reference (e.g. RSRS).

last change 2025-10-15 by Manuel Rueda

What does GT=1 mean in results?

In variant reports, the Genotype (GT) field shows the observed allele using VCF allele indices:

0 = reference allele
1 = first alternate (ALT) allele
2, 3, ... = additional ALT alleles (multiallelic)

For chrM/MT (mtDNA), callers typically encode genotypes as haploid (not allele pairs).
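
The index-to-allele mapping above can be sketched with a small helper. This is an illustration only: `allele_for_gt` is a hypothetical function name, and the REF/ALT values in the example calls are made up rather than taken from real cbicall output.

```shell
# allele_for_gt REF ALT GT
# Prints the allele that a haploid GT index points to.
#   REF = reference allele
#   ALT = comma-separated ALT column of the VCF record
#   GT  = single allele index (0 = REF, 1 = first ALT, 2 = second ALT, ...)
allele_for_gt() {
  gt_ref=$1
  gt_alt=$2
  gt_idx=$3
  # Accept only a plain non-negative integer (haploid encoding).
  case $gt_idx in
    ''|*[!0-9]*) echo "unsupported GT: $gt_idx" >&2; return 1 ;;
  esac
  if [ "$gt_idx" -eq 0 ]; then
    printf '%s\n' "$gt_ref"
    return 0
  fi
  # Pick the GT-th entry of the ALT list; print nothing if it does not exist.
  gt_allele=$(printf '%s\n' "$gt_alt" | awk -F, -v i="$gt_idx" '{ if (i <= NF) print $i }')
  if [ -z "$gt_allele" ]; then
    echo "GT index $gt_idx has no matching ALT allele" >&2
    return 1
  fi
  printf '%s\n' "$gt_allele"
}

allele_for_gt A G 1      # hypothetical site, REF=A ALT=G     -> G
allele_for_gt A G,T 2    # hypothetical multiallelic site     -> T
```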

Meaning

GT = 1 → ALT allele detected in that sample
No / or | separator because only one allele index is stored
Biological interpretation relies on:
- HF → heteroplasmy fraction (fraction of molecules supporting ALT)
- DP → read depth (total read support at the site)
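
As a minimal sketch of reading these values, the helper below looks up one key in a VCF sample column. It assumes HF and DP are stored as per-sample FORMAT fields next to GT; `format_value` is a hypothetical helper name and the record shown is invented for illustration.

```shell
# format_value FORMAT SAMPLE KEY
# Prints the value stored under KEY in a VCF sample column, using the
# colon-separated FORMAT column to find the position of KEY.
# Returns non-zero if KEY is not listed in FORMAT.
format_value() {
  printf '%s\t%s\n' "$1" "$2" |
    awk -F'\t' -v key="$3" '{
      n = split($1, keys, ":")
      split($2, vals, ":")
      for (i = 1; i <= n; i++) {
        if (keys[i] == key) { print vals[i]; found = 1 }
      }
      exit(found ? 0 : 1)
    }'
}

# Hypothetical mtDNA record (FORMAT and sample columns only).
fmt='GT:DP:HF'
sample='1:1500:0.35'
echo "GT=$(format_value "$fmt" "$sample" GT) HF=$(format_value "$fmt" "$sample" HF) DP=$(format_value "$fmt" "$sample" DP)"
```

With the invented record above, the last line reports the ALT index together with the heteroplasmy fraction and depth that give it biological meaning.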

Examples

| GT | Interpretation (mtDNA) |
|---|---|
| `0` | Only reference allele observed |
| `1` | ALT allele present (homoplasmic or heteroplasmic; check `HF` + `DP`) |
| `0/1`, `1/2` (rare) | Multiallelic call, still haploid encoding, not diploid zygosity |

TL;DR: GT = 1 = ALT detected. Check HF and DP for biology.

Tip

For mtDNA, GT tells you which allele, not how much.

Use HF + DP to interpret heteroplasmy or homoplasmy.
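
One possible way to combine the three fields is sketched below. The cutoffs are NOT cbicall or MToolBox defaults: the minimum depth and the homoplasmy threshold are arguments you must choose yourself, and both the function name `classify_mt_call` and the output labels are hypothetical.

```shell
# classify_mt_call GT HF DP MIN_DP HOMOPLASMY_HF
# Labels a haploid mtDNA call from its GT index, heteroplasmy fraction (HF)
# and read depth (DP). MIN_DP and HOMOPLASMY_HF are user-chosen cutoffs.
classify_mt_call() {
  awk -v gt="$1" -v hf="$2" -v dp="$3" -v min_dp="$4" -v homo_hf="$5" 'BEGIN {
    if (dp + 0 < min_dp + 0) {
      print "low_depth"            # too few reads to interpret HF
    } else if (gt == "0") {
      print "reference"            # only the reference allele observed
    } else if (hf + 0 >= homo_hf + 0) {
      print "homoplasmic_alt"      # ALT supported by (nearly) all molecules
    } else {
      print "heteroplasmic_alt"    # ALT supported by a fraction of molecules
    }
  }'
}

# Illustrative cutoffs only: MIN_DP=100, HOMOPLASMY_HF=0.95
classify_mt_call 1 0.35 1500 100 0.95
classify_mt_call 1 0.99 1500 100 0.95
```

The same GT value (`1`) ends up with two different labels here, which is the point of the tip: the genotype index alone does not distinguish the two cases.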

last change 2025-10-15 by Manuel Rueda

General

How do I set up cbicall to work on an HPC system?

On most HPC systems, Docker is not available. Instead, cbicall is designed to run using Apptainer (formerly Singularity), which is the recommended approach.

Apptainer can execute Docker images directly and is well suited for HPC environments because it requires no root privileges and integrates cleanly with batch schedulers.

In this setup:

the container image is read-only
all configuration files and workflows are stored in a writable host directory
external databases are downloaded outside the container and bind-mounted at runtime

The recommended workflow is:

Pull the CBIcall container image using Apptainer
Download the required databases on the host filesystem
Create a writable copy of the CBIcall workflow directory
Run the pipeline by bind-mounting the writable copy and data directory

This approach avoids manual dependency installation, improves reproducibility, and works on both interactive and batch-based HPC systems.
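
As a rough sketch, the workflow above maps to commands like the ones printed by the helper below. It only prints the commands; it does not run them. The container paths (`/usr/share/cbicall`, `/cbicall-data`) and the `-p`/`-t` options are taken from the Slurm example in this FAQ. The image URI, the `param.yaml` file name, and the function name `print_cbicall_setup` are placeholders; use the values given in the installation guide.

```shell
# print_cbicall_setup SIF IMAGE_URI WRITABLE_DIR DATA_DIR
# Prints one command per workflow step that involves Apptainer.
print_cbicall_setup() {
  setup_sif=$1
  setup_uri=$2
  setup_writable=$3
  setup_data=$4
  # Step 1: pull the container image into a local SIF file.
  printf 'apptainer pull %s %s\n' "$setup_sif" "$setup_uri"
  # Step 2 (database download) runs on the host; see the installation guide.
  # Step 3: copy the workflow tree out of the read-only image.
  printf "apptainer exec %s bash -lc 'mkdir -p %s && cp -a /usr/share/cbicall/. %s/'\n" \
    "$setup_sif" "$setup_writable" "$setup_writable"
  # Step 4: run with the writable copy and the databases bind-mounted.
  printf 'apptainer exec --bind %s:/usr/share/cbicall --bind %s:/cbicall-data %s /usr/share/cbicall/bin/cbicall -p param.yaml -t 4\n' \
    "$setup_writable" "$setup_data" "$setup_sif"
}

# Placeholders: replace the URI and paths with your site's values.
print_cbicall_setup cbicall_latest.sif 'docker://<registry>/<image>:<tag>' "$HOME/cbicall" /path/to/cbicall-data
```

Printing the commands first makes it easy to review paths and bind mounts before running anything on a shared cluster.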

Step-by-step instructions are provided in:

⬇️ Installation → HPC (Apptainer / Singularity)

Do you have an example of how to run cbicall on a Slurm HPC cluster with Apptainer?

#!/bin/bash
#
# run_cbicall_slurm_apptainer.sh
# usage: ./run_cbicall_slurm_apptainer.sh <sample_id> <pipeline: wes|wgs>

# Require exactly two arguments; send usage and errors to stderr
if [ "$#" -ne 2 ]; then
  echo "Usage: $0 <sample_id> <pipeline: wes|wgs>" >&2
  exit 1
fi

SAMPLE_ID="$1"
PIPELINE="$2"

if [[ "$PIPELINE" != "wes" && "$PIPELINE" != "wgs" ]]; then
  echo "Error: pipeline must be 'wes' or 'wgs'" >&2
  exit 1
fi

# choose SLURM settings based on pipeline
if [ "$PIPELINE" = "wes" ]; then
  QUEUE="normal"
  TIME="10:00:00"
elif [ "$PIPELINE" = "wgs" ]; then
  QUEUE="vlong"
  TIME="2-00:00:00"
fi

# Uppercase version of pipeline
PIPELINE_UC=${PIPELINE^^}

# where your data and logs live
WORKDIR="/scratch_isilon/projects/0012-hereditary/dbgap/fastq/phs001585/${PIPELINE_UC}/${SAMPLE_ID}"

# name the generated job script
JOB_SCRIPT="job_${SAMPLE_ID}_${PIPELINE}.slurm"

# Number of threads
THREADS=4

# RAM (x1.5 to help prevent oom-kills)
MEM="24G"

# Apptainer settings (edit as needed)
SIF_IMAGE="/software/biomed/containers/cbicall_latest.sif"
CBICALL_DATA="/software/biomed/cbicall-data"
CBICALL_WRITABLE="\$HOME/cbicall"   # writable copy of /usr/share/cbicall (per-user)

cat > "${JOB_SCRIPT}" <<EOF
#!/bin/bash
#SBATCH --job-name=cbicall
#SBATCH -q ${QUEUE}
#SBATCH -D ${WORKDIR}
#SBATCH -e ${WORKDIR}/slurm-%N.%j.err
#SBATCH -o ${WORKDIR}/slurm-%N.%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${THREADS}
#SBATCH --mem=${MEM}
#SBATCH -t ${TIME}
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=manuel.rueda@cnag.eu

set -euo pipefail

module load apptainer 2>/dev/null || true

# Sanity checks
if [ ! -f "${SIF_IMAGE}" ]; then
  echo "ERROR: SIF image not found: ${SIF_IMAGE}"
  exit 2
fi

if [ ! -d "${CBICALL_DATA}" ]; then
  echo "ERROR: CBICALL_DATA directory not found: ${CBICALL_DATA}"
  exit 2
fi

if [ ! -d "${CBICALL_WRITABLE}" ]; then
  echo "ERROR: Writable cbicall copy not found: ${CBICALL_WRITABLE}"
  echo "Create it once with:"
  echo "  apptainer exec ${SIF_IMAGE} bash -lc 'mkdir -p \$HOME/cbicall && cp -a /usr/share/cbicall/. \$HOME/cbicall/'"
  exit 2
fi

cd \$SLURM_SUBMIT_DIR

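# write a pipeline-specific yaml
# Use an absolute path under WORKDIR: that directory is bind-mounted below, whereas
# a relative path may not resolve because apptainer starts in /usr/share/cbicall (--pwd)
YAML_FILE="${WORKDIR}/${SAMPLE_ID}_${PIPELINE}_param.yaml"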
cat <<YAML > "\${YAML_FILE}"
mode: single
pipeline: ${PIPELINE}
workflow_engine: bash
gatk_version: gatk-4.6
sample: ${WORKDIR}
projectdir: ${SAMPLE_ID}_cbicall
cleanup_bam: false
YAML

# Run cbicall inside the container
# - Bind writable workflow tree over /usr/share/cbicall (container install is read-only)
# - Bind databases to /cbicall-data
# - Bind WORKDIR so paths referenced in the YAML exist inside the container
CBICALL_IN_CONTAINER="/usr/share/cbicall/bin/cbicall"

srun apptainer exec \\
  --pwd /usr/share/cbicall \\
  --bind "${CBICALL_WRITABLE}":/usr/share/cbicall \\
  --bind "${CBICALL_DATA}":/cbicall-data \\
  --bind "${WORKDIR}":"${WORKDIR}" \\
  "${SIF_IMAGE}" \\
  "\${CBICALL_IN_CONTAINER}" \\
    -p "\${YAML_FILE}" \\
    -t ${THREADS} \\
    --no-color \\
    --no-emoji
EOF

# submit it
sbatch "${JOB_SCRIPT}"
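
To submit many samples, the script above can be wrapped in a small loop. The sketch below assumes a plain-text list with one sample ID per line; the list file name and the function name `submit_samples` are hypothetical. The optional third argument replaces the submitter, so passing `echo` gives a dry run.

```shell
# submit_samples LIST PIPELINE [SUBMITTER]
# Calls SUBMITTER once per sample ID found in LIST (one ID per line).
# Blank lines and lines starting with '#' are skipped.
# SUBMITTER defaults to the script shown above.
submit_samples() {
  sub_list=$1
  sub_pipeline=$2
  sub_cmd=${3:-./run_cbicall_slurm_apptainer.sh}
  # '|| [ -n ... ]' also processes a last line without a trailing newline.
  while IFS= read -r sub_id || [ -n "$sub_id" ]; do
    case $sub_id in
      ''|'#'*) continue ;;
    esac
    # </dev/null keeps the submitter from consuming the rest of the list.
    "$sub_cmd" "$sub_id" "$sub_pipeline" </dev/null || return 1
  done < "$sub_list"
}

# Usage (samples.txt is a hypothetical file name):
#   submit_samples samples.txt wes echo   # dry run: print the arguments only
#   submit_samples samples.txt wes        # generate and submit one job per sample
```

The loop stops at the first failed submission, so a typo in the pipeline name does not generate a job script per sample.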

last change 2026-01-14 by Manuel Rueda

How do I cite CBIcall?

Please cite the CBIcall paper. Thanks!

Citation

CBIcall: a configuration-driven framework for variant calling in large sequencing cohorts. Manuscript in preparation.