Common errors and troubleshooting¶
Expected de novo rates in trios
For trio analyses, approximate de novo variant rates are:
- Probands: ~1%
- Parents: ~10%
Large deviations from these ranges may indicate technical or pipeline issues that warrant investigation.
GATK and Picard errors¶
(wes_single.sh or wes_cohort.sh)
NaN LOD value assigned during recalibration¶
Error message example
NaN LOD value assigned during VariantRecalibrator or ApplyVQSR.
Cause
This typically occurs when there are too few INDEL variants (for example, fewer than about 8000) to train a robust negative model. The default minimum INDEL count threshold is 8000 in the VQSR step.
Solution
Increase the minimum INDEL count threshold in the relevant pipeline script so that VQSR is skipped for samples with low INDEL counts. Only rerun the affected samples.
This prevents VariantRecalibrator from trying to build a model on too few variants.
Not enough columns in dbSNP line¶
Error message example
there aren't enough columns for line ... dbsnp_137.hg19.vcf
Cause
One or more lines in the dbSNP VCF file do not conform to the expected VCF column structure (for example, a truncated or malformed record).
Solution
- Identify the problematic line in the dbSNP VCF.
- Remove or fix that line.
- Document the change in a local README or change log so the modification is traceable.
Error parsing text SAM file¶
Error message example
Error parsing text SAM file. Not enough fields; File /dev/stdin; Line 105120626...
Cause
Some SRA or dbGaP datasets include duplicate or problematic reads. When piping BWA output directly into AddOrReplaceReadGroups, secondary and supplementary alignments can cause issues and lead to collisions or invalid lines as seen by Picard or GATK.
Solution
Remove secondary (0x100) and supplementary (0x800) alignments from the BWA stream before adding read groups.
In wes_single.sh, uncomment the filtering step in the alignment pipe, for example:
bwa mem -M -t "$THREADS" "$REFGZ" "$R1" "$R2" | samtools view -bSh -F 0x900 - | gatk AddOrReplaceReadGroups ...
This filtering prevents problematic alignments from reaching Picard or GATK and avoids the parsing error.
MToolBox errors and mtDNA specific issues¶
Unsupported N CIGAR operations¶
Symptom
MToolBox fails with an error related to unsupported N operations in CIGAR strings.
Solution
Add the --filter_reads_with_N_cigar flag in MToolBox.sh (around the main bwa mem or SAM tools invocations, typically near line 386 in your local copy).
This discards reads with N operations in the CIGAR string before downstream processing, avoiding MToolBox failures.
Low coverage and unreliable heteroplasmy fractions¶
Symptom
- Very low coverage mtDNA samples.
- Heteroplasmic fraction (HF) estimates appear noisy or unreliable.
Guideline
- Below about 10x mtDNA coverage, HF estimates become unreliable and may be biologically meaningless.
- At higher coverage, HF values tend to be robust, even when coverage varies across samples.
Recommended actions
- Flag samples with less than 10x median mtDNA coverage for review.
- Interpret HF with caution in low coverage samples, or exclude them from HF based analyses.
- Consider resequencing or deeper coverage if mtDNA heteroplasmy is critical to the study.