
Troubleshooting

Use the error text from the terminal or workflow log to find the matching section. Most failures fall into three groups: missing external data, GATK/Picard input problems, or mtDNA-specific MToolBox issues.

Where to look first
  • Check the main run log in the run directory.
  • For Snakemake/GATK 4.6 runs, also check logs/*.log.
  • Check log.json to confirm the resolved input_dir, sample_map, genome, workflow, and run directory.
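A quick way to find the relevant error text is to search every log in the run directory at once. This is a minimal sketch: the function name is hypothetical, and it assumes the logs end in `.log` and sit either in the run directory itself or one level below it (for example `logs/*.log` in Snakemake runs).

```shell
# Hypothetical helper: print lines that look like errors from *.log files
# in the run directory and one level below it (e.g. logs/*.log).
# Each hit is shown as file:line:text.
scan_run_logs() {
  run_dir=$1
  find "$run_dir" -maxdepth 2 -type f -name '*.log' \
    -exec grep -H -n -i -E 'error|exception|no such file' {} +
}
```

Usage: `scan_run_logs /path/to/run_dir`, using the run directory recorded in log.json.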

Installation and External Data

External data or tool path not found

Symptom

/usr/bin/bash: line 9: /media/mrueda/2TBS/NGSutils/gatk/gatk-4.6.2.0/gatk: No such file or directory

Likely cause

DATADIR does not point to the directory where databases and external tools are installed or mounted.

Fix

Update the data directory in the configuration file for the workflow you are running:
  • Bash: workflows/bash/gatk-4.6/env.sh
  • Snakemake: workflows/snakemake/gatk-4.6/config.yaml

For containers, make sure the host data directory is bind-mounted at the same path used by the workflow configuration.
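Before starting a long run, a preflight check can confirm that the data directory exists and that the GATK launcher is executable. This is a sketch under assumptions: the function name is hypothetical, and how the error path above splits into DATADIR and a tool sub-path depends on your installation, so the sub-path is passed in explicitly. In a container, run it inside the container so it sees the bind-mounted paths.

```shell
# Hypothetical preflight check. $1 = data directory, $2 = path of the GATK
# launcher relative to it (depends on your layout; not fixed by CBIcall here).
check_datadir() {
  datadir=$1
  gatk_bin="$datadir/$2"
  if [ ! -d "$datadir" ]; then
    echo "Data directory not found: $datadir" >&2
    return 1
  fi
  if [ ! -x "$gatk_bin" ]; then
    echo "GATK launcher missing or not executable: $gatk_bin" >&2
    return 1
  fi
  echo "OK: $gatk_bin"
}
```

Usage, assuming for illustration that the data directory is /media/mrueda/2TBS: `check_datadir /media/mrueda/2TBS NGSutils/gatk/gatk-4.6.2.0/gatk`.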

Relative input paths resolve somewhere unexpected

Symptom

CBIcall cannot find FASTQ files, BAMs, or sample_map.tsv, even though the path looks correct from your current shell.

Likely cause

Relative input_dir and sample_map paths are resolved relative to the directory that contains the YAML file, not the directory you launch CBIcall from.

Fix

Use absolute paths, or keep the YAML file next to the relative paths it references. Confirm the resolved paths in log.json.
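To predict where a relative path will end up, you can apply the same rule by hand. The function below is a hypothetical illustration of the rule stated above, not CBIcall's own code: absolute paths are returned unchanged, and relative ones are joined to the YAML file's directory.

```shell
# Hypothetical illustration: resolve $2 relative to the directory holding
# the YAML file $1. Absolute paths are returned as-is.
resolve_from_yaml() {
  yaml=$1
  path=${2#./}                      # drop a leading "./" for readability
  case $path in
    /*) printf '%s\n' "$path" ;;
    *)
      yaml_dir=$(cd "$(dirname "$yaml")" && pwd) || return 1
      printf '%s/%s\n' "$yaml_dir" "$path"
      ;;
  esac
}
```

Usage: `resolve_from_yaml project/param.yaml fastq` prints the absolute path CBIcall would look in, which you can compare with log.json.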

GATK and Picard

NaN LOD value during recalibration

Symptom

NaN LOD value assigned

Likely cause

There are too few variants, most often too few INDELs, to train a reliable VQSR model.

Fix

Use the existing thresholds that skip VQSR when the variant count is too small, or increase the minimum threshold before rerunning. The final *.QC.vcf.gz is still produced by hard filtering when VQSR is skipped.
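To judge whether a sample is near those thresholds, count SNP and INDEL records in the raw VCF. This is a hypothetical sketch: it treats any record with an ALT allele whose length differs from REF as an INDEL, counts everything else (including MNPs) as a SNP, and does not handle symbolic alleles. Compare the counts with the thresholds configured in your workflow.

```shell
# Hypothetical pre-check: count SNP vs INDEL records in a VCF (plain or
# gzip/bgzip-compressed). A record is an INDEL if any ALT allele differs in
# length from REF; all other records are counted as SNPs.
count_variant_types() {
  vcf=$1
  gzip -cdf "$vcf" | awk -F '\t' '
    /^#/ { next }
    {
      n = split($5, alts, ",")
      indel = 0
      for (i = 1; i <= n; i++) if (length(alts[i]) != length($4)) indel = 1
      if (indel) indels++; else snps++
    }
    END { printf "SNPs=%d INDELs=%d\n", snps, indels }'
}
```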

Not enough columns in dbSNP line

Symptom

there aren't enough columns for line ... dbsnp_137.hg19.vcf

Likely cause

The dbSNP VCF contains malformed or truncated records.

Fix

Inspect the reported line in the dbSNP VCF, replace the database file if possible, or correct the malformed record locally and document the change.
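The VCF specification requires eight tab-separated columns (CHROM through INFO) on every data line, so malformed records can be located by counting fields. A minimal sketch; the function name is hypothetical.

```shell
# Hypothetical check: report non-header lines of a VCF (plain or gzipped)
# with fewer than the 8 mandatory tab-separated columns.
find_short_vcf_lines() {
  vcf=$1
  gzip -cdf "$vcf" | awk -F '\t' '!/^#/ && NF < 8 { printf "line %d: %d columns\n", NR, NF }'
}
```

Line numbers count from the top of the file, including headers, so `sed -n '<N>p' dbsnp_137.hg19.vcf` shows the offending record.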

Error parsing text SAM file

Symptom

Error parsing text SAM file. Not enough fields; File /dev/stdin; Line ...

Likely cause

Secondary or supplementary alignments can introduce records that Picard/GATK rejects when the alignment stream is passed directly into read-group assignment.

Fix

Filter secondary and supplementary alignments before adding read groups:

bwa mem -M -t "$THREADS" "$REFGZ" "$R1" "$R2" \
| samtools view -bSh -F 0x900 - \
| gatk AddOrReplaceReadGroups ...
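The `-F 0x900` value combines two SAM FLAG bits: 0x100 (secondary alignment) and 0x800 (supplementary alignment). The hypothetical function below spells out that test for a single FLAG value, which can help when checking why a given read was dropped.

```shell
# Illustration of -F 0x900: a read is kept only if neither the secondary
# (0x100) nor the supplementary (0x800) bit is set in its FLAG.
keep_primary() {
  flag=$1
  [ $(( flag & 0x900 )) -eq 0 ]
}
```

For example, FLAG 99 (paired, properly paired, mate reverse, first in pair) passes, while 355 (99 plus the secondary bit) is filtered.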

mtDNA and MToolBox

MToolBox fails on ARM / aarch64

Symptom

mit_single cannot be performed with: aarch64

or:

mit_cohort cannot be performed with: aarch64

Likely cause

The bundled MToolBox workflow is x86_64-only.

Fix

Run mtDNA workflows on an x86_64 Linux host. WES/WGS GATK 4.6 workflows can still run on supported ARM systems.
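A wrapper script can stop early on unsupported hosts instead of failing partway through. This is a hypothetical guard; it accepts `amd64` as well because some systems report x86_64 under that name.

```shell
# Hypothetical guard before launching mit_single or mit_cohort.
# With no argument it checks the current host via uname -m.
mtdna_arch_supported() {
  arch=${1:-$(uname -m)}
  case $arch in
    x86_64|amd64) return 0 ;;
    *) echo "mtDNA workflows need x86_64; this host is $arch" >&2; return 1 ;;
  esac
}
```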

No usable BAM found for mtDNA

Symptom

ERROR: Could not find BAM for ID ...

or:

ERROR: No usable sample BAMs found. Nothing to do.

Likely cause

The mtDNA workflow expects BAMs from previous WES/WGS single-sample runs in the expected project layout.

Fix

Run WES/WGS single-sample processing first, keep the 01_bam outputs, and then rerun the mtDNA workflow from the sample or project directory described in the mtDNA example.
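To confirm that the single-sample BAMs are still in place, list the BAM files inside any 01_bam directory below the project. This sketch assumes only the 01_bam directory name from this page; it does not assume how deep that directory sits in the project layout. The function name is hypothetical.

```shell
# Hypothetical helper: list BAMs found in */01_bam/ below a project
# directory; fail with a message if there are none.
list_sample_bams() {
  project_dir=$1
  bams=$(find "$project_dir" -type f -path '*/01_bam/*.bam' | sort)
  if [ -z "$bams" ]; then
    echo "No BAMs found under */01_bam/ in $project_dir" >&2
    return 1
  fi
  printf '%s\n' "$bams"
}
```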

Unsupported N CIGAR operations

Symptom

MToolBox fails due to unsupported N operations in CIGAR strings.

Likely cause

Some reads contain skipped-region CIGAR operations that MToolBox cannot process.

Fix

Add this flag in the relevant MToolBox alignment or SAM-processing step:

--filter_reads_with_N_cigar
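Before rerunning, you can check how many mtDNA reads carry an N operation. This hypothetical helper reads SAM text on standard input and counts records whose CIGAR (column 6) contains N.

```shell
# Hypothetical check: count SAM records whose CIGAR contains an N
# (skipped region) operation. Header lines (@...) are ignored.
count_n_cigar_reads() {
  awk -F '\t' '!/^@/ && $6 ~ /[0-9]+N/ { n++ } END { print n + 0 }'
}
```

Usage: `samtools view sample.bam chrM | count_n_cigar_reads` (the BAM must be indexed; the contig may be named chrM or MT depending on the reference).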

Low coverage and unreliable heteroplasmy fractions

Symptom

mtDNA coverage is low, or heteroplasmy fraction estimates look unstable.

Likely cause

Below roughly 10x mtDNA coverage, heteroplasmy fraction estimates are unreliable.

Fix

Flag samples below 10x median mtDNA coverage, then:
  • Interpret their HF values cautiously.
  • Exclude them from HF-based analyses when needed.
  • Consider resequencing if mtDNA interpretation is critical.
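The median mtDNA coverage can be computed from `samtools depth -a` output (contig, position, depth). This is a hypothetical sketch: the function names are illustrative, the 10x default follows the threshold above, and a sample with no depth data is treated as low coverage.

```shell
# Hypothetical: median of column 3 of samtools depth output (stdin).
median_depth() {
  cut -f3 | sort -n | awk '
    { d[NR] = $1 }
    END {
      if (NR == 0) { print "NA"; exit }
      if (NR % 2) print d[(NR + 1) / 2]
      else print (d[NR / 2] + d[NR / 2 + 1]) / 2
    }'
}

# Succeeds (exit 0) when the median is below the threshold (default 10x).
is_low_mt_coverage() {
  med=$1
  threshold=${2:-10}
  case $med in
    ''|NA) return 0 ;;   # no depth data: treat as low coverage
  esac
  awk -v m="$med" -v t="$threshold" 'BEGIN { exit !(m + 0 < t + 0) }'
}
```

Usage: `med=$(samtools depth -a -r chrM sample.bam | median_depth)`, then `is_low_mt_coverage "$med" && echo "low mtDNA coverage: ${med}x"`.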

Variant Interpretation

Unexpected de novo rates in trios

Symptom

Observed de novo rates differ strongly from expectations.

Likely cause

Large deviations can indicate sample, data-quality, annotation, or pipeline issues.

Reference values

Sample type    Typical de novo rate
Proband        ~1%
Parent         ~10%

Fix

Check sample identity, pedigree labels, coverage, variant filters, and annotation assumptions before interpreting the result biologically.
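As a rough cross-check of the proband value, you can compute the share of the child's variant sites at which both parents are homozygous reference. This is a simplified, hypothetical definition and may not match how CBIcall reports de novo rates. It assumes one site per line with tab-separated child, father and mother genotypes in that order, for example extracted with `bcftools query -f '[%GT\t]\n'` after checking the sample order in the VCF header.

```shell
# Hypothetical sketch: apparent de novo fraction for the proband.
# Input columns: child GT, father GT, mother GT (tab-separated).
de_novo_fraction() {
  awk -F '\t' '
    function has_alt(gt) { return gt ~ /[1-9]/ }
    function hom_ref(gt) { return gt == "0/0" || gt == "0|0" }
    has_alt($1) {
      carried++
      if (hom_ref($2) && hom_ref($3)) dn++
    }
    END {
      if (carried == 0) { print "NA"; exit }
      printf "%.2f%%\n", 100 * dn / carried
    }'
}
```

A value far above ~1% for a proband points first to the sample, pedigree and filtering checks listed above.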

Next Steps