End-to-end examples (GATK 4.6)
Prerequisites
Complete the installation and set up the reference bundles and all dependencies before starting.
This example demonstrates how to run CBIcall on a real WES sample from FASTQ files through final VCF and QC outputs.
1. Prepare your FASTQ files
CBIcall expects paired-end FASTQ files with a shared prefix, for example:
# Project / Sample (Proband WES)
CNAG999_exome/CNAG99901P_ex/
CNAG99901P_ex_S1_L001_R1_001.fastq.gz
CNAG99901P_ex_S1_L001_R2_001.fastq.gz
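Before launching a run, it can help to confirm that every R1 file has its R2 mate. The following is a minimal sketch, not part of CBIcall: the function name find_unpaired_fastqs is hypothetical, and it assumes that mates differ only by the _R1_ / _R2_ token, as in the listing above.

```python
from pathlib import Path


def find_unpaired_fastqs(sample_dir):
    """Return names of *.fastq.gz files in sample_dir whose mate is missing.

    Assumption: mates differ only by the first "_R1_" / "_R2_" token,
    as in CNAG99901P_ex_S1_L001_R1_001.fastq.gz.
    """
    names = {p.name for p in Path(sample_dir).glob("*.fastq.gz")}
    unpaired = []
    for name in sorted(names):
        if "_R1_" in name:
            mate = name.replace("_R1_", "_R2_", 1)
        elif "_R2_" in name:
            mate = name.replace("_R2_", "_R1_", 1)
        else:
            # Files without a read token are ignored by this sketch.
            continue
        if mate not in names:
            unpaired.append(name)
    return unpaired
```

Calling find_unpaired_fastqs("CNAG999_exome/CNAG99901P_ex") returns an empty list when every file in the sample directory has its mate.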
Note on nomenclature
Please see this page.
2. Create a parameters file
Create a YAML file, e.g. wes_single.yaml:
mode: single
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
sample: CNAG999_exome/CNAG99901P_ex
genome: b37
cleanup_bam: false
Notes:
- mode selects single-sample or cohort (joint genotyping).
- pipeline switches between WES, WGS or mtDNA.
- workflow_engine chooses the backend (bash or snakemake).
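A quick sanity check of the parameters file can catch typos before a long run. The sketch below is an illustration only and not CBIcall's own validation: parse_flat_yaml and check_params are hypothetical helpers, the parser handles only flat key: value lines such as those shown above, and the allowed values are limited to the ones named in this page. Treating mode and workflow_engine as required keys is an assumption.

```python
# Values taken from the notes above; other keys are not checked here.
ALLOWED = {
    "mode": {"single", "cohort"},
    "workflow_engine": {"bash", "snakemake"},
}


def parse_flat_yaml(text):
    """Parse flat 'key: value' lines (no nesting, no lists) into a dict."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        params[key.strip()] = value.strip()
    return params


def check_params(params):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key not in params:
            problems.append(f"missing key: {key}")
        elif params[key] not in allowed:
            problems.append(f"{key}: unexpected value {params[key]!r}")
    return problems
```

For real files with nested structure, a full YAML parser would be the safer choice; this sketch stays dependency-free on purpose.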
3. Run CBIcall
- -p selects the YAML parameters file
- -t sets the number of threads
4. Inspect outputs
After completion, you will find:
Where:
- VCF files are stored in 02_varcall/
- QC metrics (coverage, sample stats, sex prediction) are in 03_stats/
- Logs for all pipeline steps are under logs/
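The layout described above can be checked programmatically after a run. This is a minimal sketch under stated assumptions: summarize_run is a hypothetical helper, the run directory path is supplied by the user, and only the three subdirectory names mentioned in this section are checked.

```python
from pathlib import Path

# Subdirectory names as described in this section.
EXPECTED_SUBDIRS = ("02_varcall", "03_stats", "logs")


def summarize_run(run_dir):
    """Report missing output subdirectories and list VCFs in 02_varcall/."""
    run = Path(run_dir)
    missing = [d for d in EXPECTED_SUBDIRS if not (run / d).is_dir()]
    varcall = run / "02_varcall"
    vcfs = []
    if varcall.is_dir():
        vcfs = sorted(p.name for p in varcall.glob("*.vcf.gz"))
    return {"missing": missing, "vcfs": vcfs}
```

An empty "missing" list together with at least one entry in "vcfs" suggests the run produced the expected structure; it does not by itself confirm that every pipeline step succeeded, so the files under logs/ remain the place to look for errors.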
These files are ready for downstream analysis, annotation or integration with cohort-level studies.
For advanced parameters, multi-sample analyses, mtDNA workflows and troubleshooting, see the Usage and FAQ sections.
Important
To run a cohort-based analysis (joint genotyping), you first need a GVCF for each sample. Generate these by running the wes pipeline in single mode for every sample.
1. Create a sample map file like the one below:
CNAG99901P_ex /media/mrueda/2TBS/CNAG/Project_CBI_Call/cbicall/examples/input/CNAG999_exome/CNAG99901P_ex/ref_cbicall_bash_wes_single_b37_gatk-4.6_765963065360466/02_varcall/CNAG99901P.hc.QC.vcf.gz
CNAG99902P_ex /media/mrueda/2TBS/CNAG/Project_CBI_Call/cbicall/examples/input/CNAG999_exome/CNAG99901P_ex/ref_cbicall_bash_wes_single_b37_gatk-4.6_765963065360466/02_varcall/CNAG99901P.hc.QC.vcf.gz
GATK requires absolute paths to these files.
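Because relative paths in the sample map are an easy mistake to make, a small check before running can save time. The sketch below is not part of CBIcall: check_sample_map is a hypothetical helper, and it assumes each non-empty line holds a sample name followed by whitespace (tab or spaces) and then the file path.

```python
import os


def check_sample_map(text):
    """Return a list of problems found in sample map text; empty if none.

    Assumption: each non-empty line is '<sample><whitespace><path>'.
    """
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split(None, 1)  # split on the first run of whitespace
        if len(fields) != 2:
            problems.append(f"line {lineno}: expected '<sample> <path>'")
            continue
        sample, path = fields[0], fields[1].strip()
        if not os.path.isabs(path):
            problems.append(
                f"line {lineno}: path for {sample} is not absolute: {path}"
            )
    return problems
```

Reading the file with open("sample_map.tsv").read() and passing the text to check_sample_map gives an empty list when every entry has an absolute path. The check does not verify that the files exist.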
2. Create a parameters file
Create a YAML file, e.g. wes_cohort.yaml:
mode: cohort
pipeline: wes
workflow_engine: bash
gatk_version: gatk-4.6
genome: b37
sample_map: ./sample_map.tsv
3. Run CBIcall
- -p selects the YAML parameters file
- -t sets the number of threads
4. Inspect outputs
After completion, you will find:
Where:
- Final VCF files are stored in 02_varcall/
- Logs for all pipeline steps are under logs/