
End-to-end examples (MToolBox)

Prerequisites

Complete the installation, including reference bundles and all dependencies, before you start.

➡️ Installation

Architecture

MToolBox supports x86_64 only. ARM-based systems, including Apple Silicon (M1/M2/M3), are not supported.


1. Before running mtDNA calling you must have a BAM file from a previous WES/WGS run

Does it matter if I ran WES/WGS with GATK 3.5 or GATK 4.6?

No. CBIcall will detect and use the BAM files produced by either version.
Just make sure that BAM files are available; FASTQ input is not supported.

CBIcall expects a BAM file from a previous WES/WGS run:

CNAG999_exome
└── CNAG99901P_ex  <--- ID taken from here
    └── *cbicall_bash_w?s_single_gatk-* <- The script expects that you have a BAM file inside this directory
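The layout above can be checked before launching the mtDNA pipeline. The following is a minimal sketch, not part of CBIcall: the function name `find_bams` is hypothetical, and it assumes the BAM file carries a `.bam` extension and may sit at any depth inside the run directory.

```python
from pathlib import Path

# Same glob as in the layout above; '?' matches both 'wes' and 'wgs' runs.
RUN_GLOB = "*cbicall_bash_w?s_single_gatk-*"

def find_bams(sample_dir):
    """Return BAM files found under any WES/WGS run directory of one sample."""
    bams = []
    for run_dir in sorted(Path(sample_dir).glob(RUN_GLOB)):
        if run_dir.is_dir():
            # Assumption: the BAM may be nested in a subdirectory of the run.
            bams.extend(sorted(run_dir.rglob("*.bam")))
    return bams
```

For example, `find_bams("CNAG999_exome/CNAG99901P_ex")` returning an empty list would suggest that the WES/WGS step has not produced a BAM file yet.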

Note on nomenclature

Please see this page.


2. Create a parameters file

Create a YAML file, e.g. mit_single.yaml.

Important

Make sure the key sample has the same value that you used for WES/WGS.

Example:

mode:            single
pipeline:        mit
workflow_engine: bash
input_dir:       CNAG999_exome/CNAG99901P_ex
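When many samples need a parameters file, generating it from a script can avoid copy-and-paste mistakes. This is a minimal sketch using only the keys shown in the example above; `write_params` is a hypothetical helper, and it only handles flat `key: value` pairs rather than general YAML.

```python
from pathlib import Path

def write_params(path, params):
    """Write a flat dict as aligned 'key: value' lines and return those lines."""
    width = max(len(key) for key in params) + 1  # +1 leaves room for the colon
    lines = [f"{(key + ':').ljust(width)} {value}" for key, value in params.items()]
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

# Values copied from the example above.
single_params = {
    "mode": "single",
    "pipeline": "mit",
    "workflow_engine": "bash",
    "input_dir": "CNAG999_exome/CNAG99901P_ex",
}
```

Calling `write_params("mit_single.yaml", single_params)` would produce a file equivalent to the example.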

3. Run CBIcall

bin/cbicall -p mit_single.yaml -t 4
  • -p selects the YAML parameters file
  • -t sets the number of threads
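If the run is launched from a script rather than a terminal, the same invocation can be wrapped as below. This is a sketch under the assumption that `bin/cbicall` is reachable from the current working directory; only the `-p` and `-t` options documented above are used, and the function names are hypothetical.

```python
import subprocess

def cbicall_command(params_file, threads=4, executable="bin/cbicall"):
    """Build the argument list for the invocation shown above."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return [executable, "-p", str(params_file), "-t", str(threads)]

def run_cbicall(params_file, threads=4):
    """Run CBIcall; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cbicall_command(params_file, threads), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the parameters file path contains spaces.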

4. Inspect outputs

After completion, you will find:

CNAG999_exome/CNAG99901P_ex/cbicall_bash_mit_single_rsrs_gatk-3.5_*/
  01_mtoolbox/
  02_browser/
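A quick way to confirm that a run produced both subdirectories is sketched below. The helper `find_mit_outputs` is hypothetical, and the glob is deliberately looser than the directory name shown above (it does not pin the reference or the GATK version), which is an assumption about how run directories are named.

```python
from pathlib import Path

EXPECTED_SUBDIRS = ("01_mtoolbox", "02_browser")

def find_mit_outputs(input_dir, mode="single"):
    """Map each mit run directory name to the expected subdirectories it lacks."""
    report = {}
    for run_dir in sorted(Path(input_dir).glob(f"cbicall_bash_mit_{mode}_*")):
        if run_dir.is_dir():
            report[run_dir.name] = [
                name for name in EXPECTED_SUBDIRS if not (run_dir / name).is_dir()
            ]
    return report
```

An empty list for a run directory means both `01_mtoolbox/` and `02_browser/` are present.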

5. Visualize variants in the browser

Please see:

02_browser/README.txt

The results are reported both as an HTML table and as downloadable files.

Snapshot: browser

Downloadable files:

  • mtDNA JSON: A JSON file with the results from mit_prioritized_variants.txt.
  • Report: A TSV file with all the annotations for each variant. File name: mit_prioritized_variants.txt.
  • Haplog: A TSV file with the predicted haplogroup for each sample. File name: mt_classification_best_results.csv.
  • VCF: A text file with all the variants in VCF format. File name: VCF_file.vcf.
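The report can also be loaded outside the browser for further processing. The sketch below assumes that mit_prioritized_variants.txt is tab-delimited with a single header row, as the description above suggests; `read_report` is a hypothetical helper name.

```python
import csv

def read_report(path):
    """Read a tab-delimited report into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        # Assumption: the first line holds the column names.
        return list(csv.DictReader(handle, delimiter="\t"))
```

All values are returned as strings, so numeric columns need an explicit conversion before comparison.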

HTML table:

In this tab, SG-ADVISER mtDNA displays a browsable table with the most relevant variant annotation fields:

  • Sample: The full name of each sample.
  • Locus: The location on the mitochondrial chromosome.
  • Variant_Allele: The position in the mitochondrial chromosome + the alternative allele format.
  • Ref: The reference allele (mitochondrial reference genome: RSRS).
  • Alt: The alternative allele(s).
  • Aa_change: The amino acid change if the variant falls in a coding region.
  • GT: Genotype. 0:Ref, ≥1:Alt(s).
  • Depth: The number of times this position is covered by reads.
  • Heterop_Frac: The heteroplasmic fraction. Note that the confidence interval can be retrieved from the downloadable VCF file.
  • Other: For other fields please consult MToolBox's manual.

Filtered variants

The table shows pre-filtered variants. Variants were excluded if:

  • HF ≤ 0.30 (maximum HF observed in any sample)
  • 1000 Genomes frequency ≥ 0.01
  • Not present in the input VCF

By default, variants with missing HF values (NA, N/A, .) are excluded. Use the --keep-missing-hf option to retain them.
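The exclusion rules above can be written as a single predicate. This is an illustrative sketch, not the pipeline's own code: the function names are hypothetical, each criterion is treated as sufficient for exclusion on its own, and a missing 1000 Genomes frequency is assumed not to exclude a variant.

```python
MISSING = {"NA", "N/A", "."}

def parse_hf(value):
    """Return HF as a float, or None for one of the missing markers."""
    value = value.strip()
    return None if value in MISSING else float(value)

def keep_variant(hf_values, freq_1000g, in_input_vcf,
                 keep_missing_hf=False, hf_threshold=0.30, freq_threshold=0.01):
    """Return True when a variant survives the filters described above."""
    if not in_input_vcf:
        return False
    # Assumption: an unknown population frequency (None) does not exclude.
    if freq_1000g is not None and freq_1000g >= freq_threshold:
        return False
    observed = [hf for hf in hf_values if hf is not None]
    if not observed:
        # No usable HF at all: excluded unless the keep-missing behaviour is on.
        return keep_missing_hf
    # Excluded when the maximum HF over all samples is <= the threshold.
    return max(observed) > hf_threshold
```

Note that the HF comparison is strict: a variant whose maximum HF is exactly 0.30 is excluded.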


For advanced parameters, multi-sample analyses, mtDNA workflows and troubleshooting, see the Usage and FAQ sections.

1. Before running mtDNA calling you must have BAM files from previous WES/WGS runs

Does it matter if I ran WES/WGS with GATK 3.5 or GATK 4.6?

No. CBIcall will detect and use the BAM files produced by either version.
Just make sure that BAM files are available; FASTQ input is not supported.

CBIcall expects BAM files from previous WES/WGS runs:

CNAG999_exome
├── CNAG99901P_ex  <--- ID taken from here
│   └── *cbicall_bash_w?s_single_gatk-* <- The script expects that you have a BAM file inside this directory
└── CNAG99902M_ex  <--- ID taken from here
    └── *cbicall_bash_w?s_single_gatk-* <- The script expects that you have a BAM file inside this directory
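For a cohort, it can help to list which samples already have a BAM file before starting. The sketch below is hypothetical helper code, not part of CBIcall: it treats every subdirectory of the cohort directory as a sample ID, skips directories whose name starts with `cbicall_` (cohort-level run outputs), and assumes BAM files end in `.bam`.

```python
from pathlib import Path

RUN_GLOB = "*cbicall_bash_w?s_single_gatk-*"

def has_bam(sample_dir):
    """True when any WES/WGS run directory of the sample contains a BAM file."""
    for run_dir in sample_dir.glob(RUN_GLOB):
        if run_dir.is_dir() and any(run_dir.rglob("*.bam")):
            return True
    return False

def cohort_bam_status(cohort_dir):
    """Map each sample ID (subdirectory name) to whether a BAM file was found."""
    status = {}
    for sample_dir in sorted(Path(cohort_dir).iterdir()):
        # Skip plain files and cohort-level output directories.
        if sample_dir.is_dir() and not sample_dir.name.startswith("cbicall_"):
            status[sample_dir.name] = has_bam(sample_dir)
    return status
```

Samples mapped to `False` would need their WES/WGS run completed first.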

Note on nomenclature

Please see this page.


2. Create a parameters file

Create a YAML file, e.g. mit_cohort.yaml:

mode:            cohort
pipeline:        mit
workflow_engine: bash
gatk_version:    gatk-3.5
input_dir:       CNAG999_exome

3. Run CBIcall

bin/cbicall -p mit_cohort.yaml -t 4
  • -p selects the YAML parameters file
  • -t sets the number of threads

4. Inspect outputs

After completion, you will find:

CNAG999_exome/cbicall_bash_mit_cohort_rsrs_gatk-3.5*
  01_mtoolbox/
  02_browser/

5. Visualize variants in the browser

Please see:

02_browser/README.txt

The results are reported both as an HTML table and as downloadable files.

Snapshot: browser

Downloadable files:

  • mtDNA JSON: A JSON file with the results from mit_prioritized_variants.txt.
  • Report: A TSV file with all the annotations for each variant. File name: mit_prioritized_variants.txt.
  • Haplog: A TSV file with the predicted haplogroup for each sample. File name: mt_classification_best_results.csv.
  • VCF: A text file with all the variants in VCF format. File name: VCF_file.vcf.

HTML table:

In this tab, SG-ADVISER mtDNA displays a browsable table with the most relevant variant annotation fields:

  • Sample: The full name of each sample.
  • Locus: The location on the mitochondrial chromosome.
  • Variant_Allele: The position in the mitochondrial chromosome + the alternative allele format.
  • Ref: The reference allele (mitochondrial reference genome: RSRS).
  • Alt: The alternative allele(s).
  • Aa_change: The amino acid change if the variant falls in a coding region.
  • GT: Genotype. 0:Ref, ≥1:Alt(s).
  • Depth: The number of times this position is covered by reads.
  • Heterop_Frac: The heteroplasmic fraction. Note that the confidence interval can be retrieved from the downloadable VCF file.
  • Other: For other fields please consult MToolBox's manual.
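In a cohort run every row carries a Sample value, so a per-sample tally is a simple sanity check on the table. The sketch below assumes rows are dicts keyed by column name and that the column is literally called `Sample`, as in the field list above; whether the downloadable report uses the same header is an assumption.

```python
from collections import Counter

def variants_per_sample(rows, sample_column="Sample"):
    """Count how many table rows belong to each sample."""
    return Counter(row[sample_column] for row in rows)
```

A sample that is missing from the result has no rows in the table at all, which may be worth checking against the cohort's sample list.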

Filtered variants

The table shows pre-filtered variants. Variants were excluded if:

  • HF ≤ 0.30 (maximum HF observed in any sample)
  • 1000 Genomes frequency ≥ 0.01
  • Not present in the input VCF

By default, variants with missing HF values (NA, N/A, .) are excluded. Use the --keep-missing-hf option to retain them.
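Because the HF criterion uses the maximum HF observed in any sample, a cohort table can be summarised per variant as sketched below. The column names `Variant_Allele` and `Heterop_Frac` are taken from the field list above, but their use as keys in a parsed report is an assumption, and `max_hf_per_variant` is a hypothetical helper.

```python
MISSING_HF = {"NA", "N/A", "."}

def max_hf_per_variant(rows, key_column="Variant_Allele", hf_column="Heterop_Frac"):
    """Return the largest HF seen for each variant, or None if all are missing."""
    best = {}
    for row in rows:
        key = row[key_column]
        raw = row[hf_column].strip()
        hf = None if raw in MISSING_HF else float(raw)
        previous = best.get(key)
        # Store the first value seen, then only replace it with a larger HF.
        if key not in best or (hf is not None and (previous is None or hf > previous)):
            best[key] = hf
    return best
```

Variants mapped to `None` are the ones that the missing-HF rule above would exclude by default.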

Genetic Data Interpretation Disclaimer

This tool provides research-based annotations of mtDNA genetic data. It is intended for research use only and is not a medical device. It does not provide medical or clinical advice.

  • 🩺 Do not use results for medical decisions. Always consult a qualified healthcare professional.
  • 😰 Results may cause emotional or psychological distress. You may learn about increased risks for serious health conditions.
  • 🔬 Genetic data and interpretations have limitations. Not all variants are covered, and scientific understanding continues to evolve.
  • 🔐 You are responsible for safeguarding your genetic data. Use caution when storing or sharing results; privacy or legal implications may apply.
  • ⚡ Use at your own risk. The authors assume no responsibility for how the results are interpreted or used.

By using this tool, you confirm that you understand and accept these terms.