
End-to-end examples (GATK 4.6)

Prerequisites: Installation, reference bundles, and all dependencies must be set up beforehand. See Installation.


This example demonstrates how to run CBIcall on a real WES sample from FASTQ files through final VCF and QC outputs.

1. Prepare your FASTQ files

CBIcall expects paired-end FASTQ files with a shared prefix, for example:

# Project / Sample (Proband WES)
CNAG999_exome/CNAG99901P_ex/
CNAG99901P_ex_S1_L001_R1_001.fastq.gz
CNAG99901P_ex_S1_L001_R2_001.fastq.gz
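Before launching a run, it can help to confirm that every R1 file has a matching R2. The following is a minimal sketch, not part of CBIcall: the regular expression is an assumption inferred from the two example file names above, and `find_unpaired` and `check_input_dir` are hypothetical helper names.

```python
import re
from pathlib import Path

# Assumed pattern, inferred from the example names above:
# <prefix>_R1_<3 digits>.fastq.gz and <prefix>_R2_<3 digits>.fastq.gz
FASTQ_RE = re.compile(r"^(?P<prefix>.+)_R(?P<read>[12])_(?P<chunk>\d{3})\.fastq\.gz$")


def find_unpaired(names):
    """Return sorted (prefix, chunk) keys that lack either R1 or R2."""
    seen = {}
    for name in names:
        m = FASTQ_RE.match(name)
        if m is None:
            continue  # skip files that do not follow the assumed pattern
        key = (m["prefix"], m["chunk"])
        seen.setdefault(key, set()).add(m["read"])
    return sorted(key for key, reads in seen.items() if reads != {"1", "2"})


def check_input_dir(input_dir):
    """Apply find_unpaired to the file names found in input_dir."""
    names = [p.name for p in Path(input_dir).iterdir() if p.is_file()]
    return find_unpaired(names)
```

An empty list from `check_input_dir("CNAG999_exome/CNAG99901P_ex")` would mean every file matching the assumed pattern has its mate.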

Note on nomenclature: please see this page.


2. Create a parameters file

Create a YAML file, e.g. wes_single.yaml:

mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
cleanup_bam: false

Notes:

  • mode selects single-sample or cohort (joint genotyping).
  • pipeline switches between WES, WGS or mtDNA.
  • workflow_engine chooses the backend (bash or snakemake).
  • See Configuration Reference for all YAML keys and supported combinations.

How can I perform WGS? Set the pipeline parameter to wgs, for example:

mode: single
pipeline: wgs
workflow_engine: bash
gatk_version: gatk-4.6
input_dir: CNAG999_exome/CNAG99901P_ex
genome: b37
cleanup_bam: false
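The two parameter files above differ only in the pipeline key, so they can be generated from one dictionary. This is a rough sketch rather than a CBIcall feature: `render_params` and `write_params` are hypothetical helpers, and the renderer only handles the flat string and boolean values used on this page, not general YAML.

```python
from pathlib import Path

# Keys and values copied from the wes_single.yaml example above.
WES_SINGLE = {
    "mode": "single",
    "pipeline": "wes",
    "workflow_engine": "bash",
    "gatk_version": "gatk-4.6",
    "input_dir": "CNAG999_exome/CNAG99901P_ex",
    "genome": "b37",
    "cleanup_bam": False,
}

# WGS variant: same settings with only the pipeline key changed.
WGS_SINGLE = dict(WES_SINGLE, pipeline="wgs")


def render_params(params):
    """Render a flat dict as 'key: value' lines (booleans as true/false)."""
    lines = []
    for key, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def write_params(path, params):
    """Write the rendered parameters to path."""
    Path(path).write_text(render_params(params), encoding="utf-8")
```

Calling `write_params("wes_single.yaml", WES_SINGLE)` would reproduce the first file shown above. For anything beyond flat keys, a real YAML library is the safer choice.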

3. Run CBIcall

bin/cbicall run -p wes_single.yaml -t 4

  • -p selects the YAML parameters file
  • -t sets the number of threads
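The same command can be launched from a script. The sketch below uses only the `run` subcommand and the `-p` and `-t` options shown above; `build_command` and `run_cbicall` are hypothetical wrapper names, and the default executable path assumes you are in the CBIcall checkout directory.

```python
import subprocess


def build_command(params_file, threads=4, executable="bin/cbicall"):
    """Build the argument list for 'cbicall run -p <file> -t <threads>'."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return [executable, "run", "-p", str(params_file), "-t", str(threads)]


def run_cbicall(params_file, threads=4, executable="bin/cbicall"):
    """Run CBIcall and return its standard output.

    Raises subprocess.CalledProcessError if the exit code is non-zero.
    """
    cmd = build_command(params_file, threads, executable)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the parameters file path contains spaces.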

You should see output similar to the following:

CBIcall 1.0.0
Executable => .../cbicall/bin/cbicall
Workflow => bash -> wes -> single
Genome => b37
Threads => 4
Project => .../CNAG999_exome/CNAG99901P_ex/cbicall_bash_wes_single_b37_gatk-4.6_177447031761843
Run ID => 177447031761843

Inputs
Param file => wes_single.yaml
Input dir => .../input/CNAG999_exome/CNAG99901P_ex
Sample map => (undef)
GATK => gatk-4.6
Pipeline ver => v1

Resolved
Entrypoint => .../bash/gatk-4.6/wes_single.sh
Env file => .../bash/gatk-4.6/env.sh
Log => /media/mrueda/2TBS/CNAG/Project_CBI_Call/cbicall/examples/input/CNAG999_exome/CNAG99901P_ex/cbicall_bash_wes_single_b37_gatk-4.6_177447031761843/bash_wes_single_b37_gatk-4.6.log

Running
Workflow => bash -> wes -> single
This workflow may take a while depending on input size and pipeline.

Completed
Status => Finished successfully
Elapsed => 1m 30s
Log => /media/mrueda/2TBS/CNAG/Project_CBI_Call/cbicall/examples/input/CNAG999_exome/CNAG99901P_ex/cbicall_bash_wes_single_b37_gatk-4.6_177447031761843/bash_wes_single_b37_gatk-4.6.log
Do Widzenia
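If you capture this console output in a script, the `Key => value` lines can be turned into a dictionary. This is an illustrative sketch only: it assumes the layout shown above stays the same, which the page does not promise, and `parse_summary` is a hypothetical helper. When a key appears more than once (for example Workflow or Log), the last value wins.

```python
def parse_summary(text):
    """Collect 'Key => value' lines from CBIcall console output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ' => ' only, so values such as
        # 'bash -> wes -> single' are kept whole.
        key, sep, value = line.partition(" => ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields
```

For example, `parse_summary(output).get("Status")` could be compared with the "Finished successfully" message shown above. Lines without ` => `, such as section titles, are ignored.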

4. Inspect outputs

After completion, you will find:

CNAG999_exome/CNAG99901P_ex/cbicall_bash_wes_single_b37_gatk-4.6_*/
01_bam/
02_varcall/
03_stats/
logs/

Where:

  • VCF files are stored in 02_varcall/
  • QC metrics (coverage, sample stats, sex prediction) are in 03_stats/
  • Logs for all pipeline steps are under logs/

What you get
  • Final VCF for interpretation: 02_varcall/<id>.hc.QC.vcf.gz
  • gVCF for cohort joint genotyping: 02_varcall/<id>.hc.g.vcf.gz
  • Run metadata: log.json

See Outputs for the full file reference.
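To pick up the two main deliverables from a script, you can glob for the file suffixes listed above. The sketch below is an assumption-laden helper, not part of CBIcall: `find_outputs` is a hypothetical name, the default run-directory pattern is copied from this example, and "most recent run" is decided by directory modification time.

```python
from pathlib import Path


def find_outputs(sample_dir, run_glob="cbicall_bash_wes_single_b37_gatk-4.6_*"):
    """Locate the final VCF and gVCF in the most recently modified run directory."""
    runs = [p for p in Path(sample_dir).glob(run_glob) if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no run directory matching {run_glob} in {sample_dir}")
    run_dir = max(runs, key=lambda p: p.stat().st_mtime)
    varcall = run_dir / "02_varcall"
    return {
        "run_dir": run_dir,
        "vcf": sorted(varcall.glob("*.hc.QC.vcf.gz")),   # final VCF for interpretation
        "gvcf": sorted(varcall.glob("*.hc.g.vcf.gz")),   # gVCF for joint genotyping
    }
```

Calling `find_outputs("CNAG999_exome/CNAG99901P_ex")` would return the paths as lists, so an empty list signals that the expected file is missing.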


For advanced parameters, multi-sample analyses, mtDNA workflows and troubleshooting, see the Usage and FAQ sections.

Do you have examples of how to run CBIcall programmatically? Yes, you can find examples at https://github.com/CNAG-Biomedical-Informatics/cbicall/tree/main/examples/scripts.

Any suggestions for performing annotation? We recommend using beacon2-cbi-tools. This tool allows you not only to annotate data, but also to convert it into a data exchange format compatible with the Beacon v2 API.