Frequently Asked Questions¶
WES / WGS¶
What are the reference genomes used?
GRCh37 (b37) - GATK-compatible reference genome
GRCh38 (hg38) - GATK-compatible reference genome
last change 2025-10-15 by Manuel Rueda ¶
What are the capture kits for WES?
- For GATK version 3.5: Exome capture is based on Agilent SureSelect.
- For GATK version 4.6: Exome and WGS reference is based on the GATK bundle (b37).
last change 2025-10-15 by Manuel Rueda ¶
mtDNA (MToolBox)¶
What is the reference genome used?
RSRS (rsrs) - Reconstructed Sapiens Reference Sequence
last change 2025-10-15 by Manuel Rueda ¶
VCF vs prioritized_variants allele notation
The allele shown in prioritized_variants.txt does not always follow VCF REF/ALT semantics. The Variant_allele column represents the variant allele observed at that position for annotation purposes, while the VCF reports variants relative to the mapping reference (e.g., RSRS). The VCF should be considered the authoritative source for REF/ALT.
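If you want to cross-check the two files, a minimal sketch is shown below. It assumes bcftools is available, that the VCF is bgzipped and indexed, and that prioritized_variants.txt is tab-delimited with a header row containing the Variant_allele column; the file names and column positions are illustrative, so adjust them to your output.

```bash
#!/bin/bash
# Sketch: compare VCF REF/ALT (authoritative) with the Variant_allele column.
# Assumptions: bcftools installed; file names and field numbers are placeholders.

VCF="sample.mtDNA.vcf.gz"
PRIORITIZED="prioritized_variants.txt"

# Authoritative REF/ALT per position, straight from the VCF
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' "${VCF}" > vcf_alleles.tsv

# List the header fields of prioritized_variants.txt to locate Variant_allele
head -n 1 "${PRIORITIZED}" | tr '\t' '\n' | nl

# Example: print position and Variant_allele, here assumed to be columns 2 and 3
awk -F'\t' 'NR > 1 { print $2 "\t" $3 }' "${PRIORITIZED}" | head
```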
last change 2025-10-15 by Manuel Rueda ¶
What does GT=1 mean in results?
In variant reports, the Genotype (GT) field shows the observed allele using VCF allele indices:
- 0 = reference allele
- 1 = first alternate (ALT) allele
- 2, 3, ... = additional ALT alleles (multiallelic)
For chrM/MT (mtDNA), callers typically encode genotypes as haploid (not allele pairs).
Meaning¶
- GT = 1 → ALT allele detected in that sample
- No / or | separator because only one allele index is stored
- Biological interpretation relies on:
    - HF → heteroplasmy fraction (molecules supporting ALT)
    - DP → read depth (total support)
Examples¶
| GT | Interpretation (mtDNA) |
|---|---|
| 0 | Only reference allele observed |
| 1 | ALT allele present (homoplasmic or heteroplasmic, check HF + DP) |
| 0/1, 1/2 (rare) | Multiallelic call, still haploid encoding → not diploid zygosity |
TL;DR:
GT = 1 = ALT detected. Check HF and DP for biology.
Tip
For mtDNA, GT tells you which allele, not how much.
Use HF + DP to interpret heteroplasmy or homoplasmy.
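As a practical illustration, the sketch below extracts GT together with HF and DP for the mitochondrial contig. It assumes bcftools is installed, that the VCF is bgzipped and indexed, and that HF and DP are per-sample FORMAT fields in the MToolBox output; the contig name (MT vs chrM) depends on the reference used.

```bash
#!/bin/bash
# Sketch: inspect GT, HF and DP for mtDNA calls with bcftools.
# Assumption: HF (heteroplasmy fraction) and DP (depth) are FORMAT fields.

VCF="sample.mtDNA.vcf.gz"

# One row per variant: position, REF/ALT, then per-sample GT, HF and DP
bcftools query \
    -r MT \
    -f '%POS\t%REF\t%ALT\t[%GT\t%HF\t%DP]\n' \
    "${VCF}" | head
```

A homoplasmic site typically shows GT = 1 with HF close to 1, while a heteroplasmic site shows GT = 1 with an intermediate HF; DP tells you how many reads support the call overall.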
last change 2025-10-15 by Manuel Rueda ¶
General¶
How do I set up cbicall to work on an HPC system?
On most HPC systems, Docker is not available. Instead, cbicall is designed to run
using Apptainer (formerly Singularity), which is the recommended approach.
Apptainer can execute Docker images directly and is well suited for HPC environments because it requires no root privileges and integrates cleanly with batch schedulers.
In this setup:
- the container image is read-only
- all configuration files and workflows are stored in a writable host directory
- external databases are downloaded outside the container and bind-mounted at runtime
The recommended workflow is:
- Pull the CBIcall container image using Apptainer
- Download the required databases on the host filesystem
- Create a writable copy of the CBIcall workflow directory
- Run the pipeline by bind-mounting the writable copy and data directory
This approach avoids manual dependency installation, improves reproducibility, and works on both interactive and batch-based HPC systems.
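A minimal interactive sketch of these four steps is shown below. The image URI, database location, and YAML file name are placeholders (the exact values are given in the installation guide); the container paths /usr/share/cbicall and /cbicall-data match the Slurm example further down.

```bash
#!/bin/bash
# Sketch of the recommended Apptainer workflow (paths and image URI are illustrative).

# 1) Pull the CBIcall container image (Apptainer converts the Docker image)
apptainer pull cbicall_latest.sif docker://ORG/cbicall:latest   # replace ORG/tag with the published image

# 2) Download the required databases on the host filesystem
mkdir -p "$HOME/cbicall-data"   # populate as described in the installation guide

# 3) Create a writable per-user copy of the CBIcall workflow directory
apptainer exec cbicall_latest.sif \
    bash -lc 'mkdir -p "$HOME/cbicall" && cp -a /usr/share/cbicall/. "$HOME/cbicall/"'

# 4) Run the pipeline, bind-mounting the writable copy and the data directory
apptainer exec \
    --pwd /usr/share/cbicall \
    --bind "$HOME/cbicall":/usr/share/cbicall \
    --bind "$HOME/cbicall-data":/cbicall-data \
    cbicall_latest.sif \
    /usr/share/cbicall/bin/cbicall -p param.yaml -t 4
```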
Step-by-step instructions are provided in:
⬇️ Installation → HPC (Apptainer / Singularity)
Do you have an example of how to run cbicall on a Slurm HPC cluster with Apptainer?
#!/bin/bash
#
# run_cbicall_slurm_apptainer.sh
# usage: ./run_cbicall_slurm_apptainer.sh <sample_id> <pipeline: wes|wgs>
if [ "$#" -ne 2 ]; then
echo "Usage: $0 <sample_id> <pipeline: wes|wgs>"
exit 1
fi
SAMPLE_ID=$1
PIPELINE=$2
if [[ "$PIPELINE" != "wes" && "$PIPELINE" != "wgs" ]]; then
echo "Error: pipeline must be 'wes' or 'wgs'"
exit 1
fi
# choose SLURM settings based on pipeline
if [ "$PIPELINE" = "wes" ]; then
QUEUE="normal"
TIME="10:00:00"
elif [ "$PIPELINE" = "wgs" ]; then
QUEUE="vlong"
TIME="2-00:00:00"
fi
# Uppercase version of pipeline
PIPELINE_UC=${PIPELINE^^}
# where your data and logs live
WORKDIR="/scratch_isilon/projects/0012-hereditary/dbgap/fastq/phs001585/${PIPELINE_UC}/${SAMPLE_ID}"
# name the generated job script
JOB_SCRIPT="job_${SAMPLE_ID}_${PIPELINE}.slurm"
# Number of threads
THREADS=4
# RAM (x1.5 to help prevent oom-kills)
MEM="24G"
# Apptainer settings (edit as needed)
SIF_IMAGE="/software/biomed/containers/cbicall_latest.sif"
CBICALL_DATA="/software/biomed/cbicall-data"
CBICALL_WRITABLE="\$HOME/cbicall" # writable copy of /usr/share/cbicall (per-user)
cat > "${JOB_SCRIPT}" <<EOF
#!/bin/bash
#SBATCH --job-name=cbicall
#SBATCH -q ${QUEUE}
#SBATCH -D ${WORKDIR}
#SBATCH -e ${WORKDIR}/slurm-%N.%j.err
#SBATCH -o ${WORKDIR}/slurm-%N.%j.out
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=${THREADS}
#SBATCH --mem=${MEM}
#SBATCH -t ${TIME}
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=manuel.rueda@cnag.eu
set -euo pipefail
module load apptainer 2>/dev/null || true
# Sanity checks
if [ ! -f "${SIF_IMAGE}" ]; then
echo "ERROR: SIF image not found: ${SIF_IMAGE}"
exit 2
fi
if [ ! -d "${CBICALL_DATA}" ]; then
echo "ERROR: CBICALL_DATA directory not found: ${CBICALL_DATA}"
exit 2
fi
if [ ! -d "${CBICALL_WRITABLE}" ]; then
echo "ERROR: Writable cbicall copy not found: ${CBICALL_WRITABLE}"
echo "Create it once with:"
echo " apptainer exec ${SIF_IMAGE} bash -lc 'mkdir -p \$HOME/cbicall && cp -a /usr/share/cbicall/. \$HOME/cbicall/'"
exit 2
fi
cd \$SLURM_SUBMIT_DIR
# write a pipeline-specific yaml
YAML_FILE="${SAMPLE_ID}_${PIPELINE}_param.yaml"
cat <<YAML > "\${YAML_FILE}"
mode: single
pipeline: ${PIPELINE}
workflow_engine: bash
gatk_version: gatk-4.6
sample: ${WORKDIR}
projectdir: ${SAMPLE_ID}_cbicall
cleanup_bam: false
YAML
# Run cbicall inside the container
# - Bind writable workflow tree over /usr/share/cbicall (container install is read-only)
# - Bind databases to /cbicall-data
# - Bind WORKDIR so paths referenced in the YAML exist inside the container
CBICALL_IN_CONTAINER="/usr/share/cbicall/bin/cbicall"
srun apptainer exec \\
--pwd /usr/share/cbicall \\
--bind "${CBICALL_WRITABLE}":/usr/share/cbicall \\
--bind "${CBICALL_DATA}":/cbicall-data \\
--bind "${WORKDIR}":"${WORKDIR}" \\
"${SIF_IMAGE}" \\
"\${CBICALL_IN_CONTAINER}" \\
-p "\${YAML_FILE}" \\
-t ${THREADS} \\
--no-color \\
--no-emoji
EOF
# submit it
sbatch "${JOB_SCRIPT}"
last change 2026-01-14 by Manuel Rueda ¶
How do I cite CBIcall?
You can cite the CBIcall paper. Thanks!
Citation
CBIcall: a configuration-driven framework for variant calling in large DNA-seq cohorts. Manuscript in preparation.