🏷️ Naming Conventions
Scope and Tool Compatibility
This document defines naming conventions required to run analyses in cohort mode with legacy tools:
- GATK 3.5
- MTOOLBox
Legacy tools only
The directory and subdirectory naming conventions described in this document are mandatory for GATK 3.5 and MTOOLBox. They are not required for GATK ≥ 4.6.
With GATK 4.6, the pipeline ignores sample directory names, so they may be any string (e.g. NEUMA640ZBR).
Directory Naming
Format
- ProjectCode: exactly 7 characters, [a-zA-Z0-9] (example: CN99999)
- SampleType: one of exome, wgs
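The convention above does not say how the two components are joined into one directory name. For that reason, this Python sketch only checks each component on its own. The helper check_directory_components is hypothetical and is not part of GATK or MTOOLBox. It shows the kind of pre-flight check you could run before starting a legacy cohort analysis.

```python
import re
from typing import List

# Rules taken from the directory convention above.
PROJECT_CODE_RE = re.compile(r"[A-Za-z0-9]{7}")  # ASCII letters and digits only
SAMPLE_TYPES = {"exome", "wgs"}


def check_directory_components(project_code: str, sample_type: str) -> List[str]:
    """Return a list of problems; an empty list means both components conform.

    Hypothetical helper: it validates components separately because the
    separator (if any) between them is not specified here.
    """
    problems = []
    # fullmatch enforces "exactly 7" characters rather than "at least 7".
    if not PROJECT_CODE_RE.fullmatch(project_code):
        problems.append(
            f"ProjectCode {project_code!r} must be exactly 7 characters [a-zA-Z0-9]"
        )
    if sample_type not in SAMPLE_TYPES:
        problems.append(
            f"SampleType {sample_type!r} must be one of {sorted(SAMPLE_TYPES)}"
        )
    return problems


print(check_directory_components("CN99999", "exome"))  # []
print(check_directory_components("CN9999", "wes"))      # two problems reported
```

Returning a list of messages, rather than raising on the first error, lets a batch check report every bad component in a cohort at once.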
Examples
Subdirectory Naming
Format
- ProjectCode: 7 characters ([a-zA-Z0-9])
- SampleID: 2 characters (e.g. 01)
- Role: 1 character (P, F or M)
- SampleTypeShort: must be ex
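This Python sketch parses a subdirectory name into its four components. It rests on two assumptions that the convention does not confirm:

- The components are concatenated in the order listed, with no separator, which gives 12 characters in total.
- SampleID consists of two ASCII letters or digits. The convention only says "2 characters" and gives 01 as an example.

The function parse_subdir_name and the sample name CN9999901Pex are hypothetical. The sample name is assembled from the example values above.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern: assumes plain concatenation in the listed order.
SUBDIR_RE = re.compile(
    r"(?P<project_code>[A-Za-z0-9]{7})"  # ProjectCode: 7 characters
    r"(?P<sample_id>[A-Za-z0-9]{2})"     # SampleID: 2 characters (assumed alphanumeric)
    r"(?P<role>[PFM])"                   # Role: P, F or M
    r"(?P<sample_type_short>ex)"         # SampleTypeShort: must be "ex"
)


def parse_subdir_name(name: str) -> Optional[Dict[str, str]]:
    """Split a subdirectory name into its components, or return None if it does not conform."""
    match = SUBDIR_RE.fullmatch(name)
    return match.groupdict() if match else None


print(parse_subdir_name("CN9999901Pex"))
# {'project_code': 'CN99999', 'sample_id': '01', 'role': 'P', 'sample_type_short': 'ex'}
print(parse_subdir_name("CN9999901Xex"))  # None (Role must be P, F or M)
```

Every component has a fixed width, so the regex cannot split a name ambiguously. If the real names use a separator such as "_", only SUBDIR_RE would need to change.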
Example
FASTQ Naming Convention
This FASTQ naming convention applies to all tools, including GATK 3.5 and GATK 4.6.
It is based on the Illumina specification: support.illumina.com/help/BaseSpace_Sequence_Hub_OLH_009008_2/Source/Informatics/BS/NamingConvention_FASTQ-files-swBS.htm
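The Illumina FASTQ file names this convention builds on follow the pattern {SampleName}_S{SampleNumber}_L{Lane}_R{Read}_001.fastq.gz, for example SampleName_S1_L001_R1_001.fastq.gz. This Python sketch splits such a name into its fields. It assumes the files are gzip-compressed and that the read field is R1/R2, or I1/I2 for index reads. The helper parse_illumina_fastq is hypothetical. It treats the sample name as opaque, because this section does not show where the pipeline's own suffix goes.

```python
import re
from typing import Dict, Optional

# {SampleName}_S{SampleNumber}_L{Lane}_{Read}_001.fastq.gz
# The greedy sample_name group backtracks until the fixed-format tail matches,
# so sample names that themselves contain underscores are still split correctly.
ILLUMINA_FASTQ_RE = re.compile(
    r"(?P<sample_name>.+)"
    r"_S(?P<sample_number>\d+)"
    r"_L(?P<lane>\d{3})"
    r"_(?P<read>[RI][12])"
    r"_001\.fastq\.gz"
)


def parse_illumina_fastq(filename: str) -> Optional[Dict[str, str]]:
    """Return the fields of an Illumina-style FASTQ file name, or None if it does not match."""
    match = ILLUMINA_FASTQ_RE.fullmatch(filename)
    return match.groupdict() if match else None


print(parse_illumina_fastq("NEUMA640ZBR_S1_L001_R1_001.fastq.gz"))
# {'sample_name': 'NEUMA640ZBR', 'sample_number': '1', 'lane': '001', 'read': 'R1'}
```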
Sequencing-type suffix
- _ex: exome sequencing
- _wg: whole genome sequencing
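This Python sketch maps the two suffixes to their meanings. It assumes the suffix is the last part of the sample-name string it receives, for example the SampleName field of an Illumina file name. The convention lists the suffixes but does not show where they sit, so that placement is an assumption. The function sequencing_type and the name NEUMA640ZBR_ex are hypothetical.

```python
from typing import Optional

# Suffix meanings as listed in the convention.
SEQUENCING_SUFFIXES = {
    "_ex": "exome sequencing",
    "_wg": "whole genome sequencing",
}


def sequencing_type(sample_name: str) -> Optional[str]:
    """Return the sequencing type encoded by a trailing suffix, or None if there is none.

    Assumption: the suffix is at the very end of sample_name.
    """
    for suffix, meaning in SEQUENCING_SUFFIXES.items():
        if sample_name.endswith(suffix):
            return meaning
    return None


print(sequencing_type("NEUMA640ZBR_ex"))  # exome sequencing
```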
Example
Expected Directory Layout
Required for cohort mode (legacy)
This directory layout is required when running GATK 3.5 or MTOOLBox in cohort mode.