
ClarID Specification Tables

Biosample

Human format

Delimiter: -

| # | Component | Source field | Type | Pattern / Format | Built from |
|---|---|---|---|---|---|
| 1 | Project | project.code | string | free string | codebook value (if present) |
| 2 | Species | species.code | string | 6-letter binomial acronym (e.g. HomSap) | codebook value |
| 3 | Subject ID | subject_id | integer→str | zero-pad to 5 digits (default) | sprintf("%0${pad}d",$sid) |
| 4 | Tissue | tissue.code | string | exactly 3 letters [A-Z]{3} | codebook value |
| 5 | Sample Type | sample_type.code | string | exactly 3 letters [A-Z]{3} | codebook value |
| 6 | Assay | assay.code | string | exactly 3 letters [A-Z]{3} | codebook value |
| 7 | Condition | condition | string | ICD-10 diagnosis code(s) [A-Z]\d{2}(?:\.\d+)? (≤10) | used verbatim (multiple codes concatenated with +) |
| 8 | Timepoint | timepoint | string | alphanumeric event label (e.g. Baseline) | codebook value |
| 9 | Duration | duration | string | ISO 8601, 3 chars (P1D, P7W, P3M, P1Y), or P0N (Not Available) | duration_pattern |
| 10 | Batch (opt) | batch | integer→str | B%02d (e.g. B01) | batch_pattern |
| 11 | Replicate (opt) | replicate | integer→str | R%02d (e.g. R05) | replicate_pattern |
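The following is a minimal sketch of how the human-format components in the table could be assembled, not the ClarID-Tools implementation. The function name biosample_human_id and all example codes (Demo, BLD, DNA, WGS) are hypothetical placeholders rather than real codebook values. It assumes that "(≤10)" means at most 10 ICD-10 codes, and that an absent project is simply left out.

```python
import re

# Patterns transcribed from the human-format table.
THREE_LETTERS = re.compile(r"[A-Z]{3}")
ICD10 = re.compile(r"[A-Z]\d{2}(?:\.\d+)?")


def biosample_human_id(project, species, subject_id, tissue, sample_type, assay,
                       conditions, timepoint, duration,
                       batch=None, replicate=None, pad=5):
    """Join biosample components with '-' in table order (hypothetical helper)."""
    for name, code in (("tissue", tissue), ("sample_type", sample_type), ("assay", assay)):
        if not THREE_LETTERS.fullmatch(code):
            raise ValueError(f"{name} must be exactly 3 letters A-Z: {code!r}")
    # Assumption: "(≤10)" is read as "at most 10 ICD-10 codes".
    if not 1 <= len(conditions) <= 10:
        raise ValueError("expected between 1 and 10 condition codes")
    for code in conditions:
        if not ICD10.fullmatch(code):
            raise ValueError(f"not an ICD-10 style code: {code!r}")

    parts = [project] if project else []  # project is optional
    parts += [
        species,
        f"{subject_id:0{pad}d}",  # same effect as sprintf("%0${pad}d", $sid)
        tissue,
        sample_type,
        assay,
        "+".join(conditions),     # multiple codes joined with '+'
        timepoint,
        duration,
    ]
    if batch is not None:
        parts.append(f"B{batch:02d}")
    if replicate is not None:
        parts.append(f"R{replicate:02d}")
    return "-".join(parts)


print(biosample_human_id("Demo", "HomSap", 42, "BLD", "DNA", "WGS", ["C50.9"],
                         "Baseline", "P0N", batch=1, replicate=5))
# prints Demo-HomSap-00042-BLD-DNA-WGS-C50.9-Baseline-P0N-B01-R05
```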

Note (duration): the 3-character limit (PnU: the letter P, one digit n, and one unit letter U) is intentional. It keeps timepoint durations compact and consistent instead of attempting to represent every possible duration in full detail; see duration binning.
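A small validation sketch for the 3-character duration shape follows. The regex and the name is_valid_duration are illustrative assumptions, not part of the specification, and whether a zero digit is allowed with D, W, M, or Y (for example P0D) is not stated above, so the pattern simply permits it.

```python
import re

# P + one digit + unit letter (D, W, M, Y), or the "Not Available" sentinel P0N.
# Assumption: a zero digit is accepted with any unit, e.g. P0D.
DURATION_RE = re.compile(r"P[0-9][DWMY]|P0N")


def is_valid_duration(value: str) -> bool:
    """Return True if value fits the 3-character PnU duration shape (hypothetical helper)."""
    return DURATION_RE.fullmatch(value) is not None


for candidate in ("P7W", "P0N", "P10D", "P1H"):
    print(candidate, is_valid_duration(candidate))
```

A value such as P10D fails this check because it needs two digits; under the note above it would have to be binned into a 3-character form first.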

Stub format

Delimiter: (none)

About Base62 in stub fields

Base62 is used here as a compact encoding alphabet, not as a semantic abbreviation. Stub fields are optimized for compactness and stable parsing, so they are not expected to resemble the human-readable format.
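The sketch below shows fixed-width Base62 encoding of a non-negative integer. The alphabet order (digits, then uppercase, then lowercase) is an assumption and may not match the reference implementation; the function names are hypothetical. It does confirm the width-3 ceiling quoted in the tables: 62³ − 1 = 238,327.

```python
import string

# Assumed alphabet order (0-9, A-Z, a-z); the reference implementation may differ.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int, width: int = 3) -> str:
    """Encode a non-negative integer as a fixed-width, zero-padded Base62 string."""
    max_value = 62 ** width - 1
    if not 0 <= n <= max_value:
        raise ValueError(f"{n} does not fit in {width} Base62 chars (0..{max_value})")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def base62_decode(s: str) -> int:
    """Inverse of base62_encode; raises ValueError on characters outside ALPHABET."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


print(base62_encode(42), base62_encode(238327))  # prints 00g zzz
```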

| # | Component | Source field | Type | Pattern / Format | Built from |
|---|---|---|---|---|---|
| 1 | Project stub | project.stub_code | string | free string | codebook value |
| 2 | Species stub | species.stub_code | string | codebook-defined stub width (Base62 alphabet recommended) | codebook value |
| 3 | Subject stub | subject_id | integer→Base62 | width 3 (default), max 238,327 | 3-char Base62 from integer |
| 4 | Tissue stub | tissue.stub_code | string | 1–3 chars | codebook value |
| 5 | Sample Type stub | sample_type.stub_code | string | 1–3 chars | codebook value |
| 6 | Assay stub | assay.stub_code | string | 1–3 chars | codebook value |
| 7 | Condition stub | condition | code→Base62 | N × 3-char Base62 stubs + 2-digit count (%02d) | packaged ICD-10 order map + 3-char Base62 from integer |
| 8 | Timepoint stub | timepoint.stub_code | string | 1–2 chars | codebook value |
| 9 | Duration stub | duration | string | digits + unit (e.g. 7W) | duration_pattern |
| 10 | Batch stub (opt) | batch | integer | B%02d (e.g. B01) | batch_pattern |
| 11 | Replicate stub (opt) | replicate | integer | R%02d (e.g. R05) | replicate_pattern |
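As a rough illustration of the stub table, this sketch concatenates stub components with no delimiter. Everything here is assumed for illustration: the function name biosample_stub_id, the Base62 alphabet order, and all stub values (D, HS, B, and so on) are hypothetical. The condition stub is passed in as an already encoded placeholder string because its real encoding depends on the packaged ICD-10 order map. The sketch also assumes the stub duration is the human value without its leading "P" (P7W becomes 7W), based only on the "digits + unit" example.

```python
import string

# Assumed Base62 alphabet order (0-9, A-Z, a-z).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62(n: int, width: int = 3) -> str:
    """Fixed-width Base62 encoding of a non-negative integer."""
    if not 0 <= n < 62 ** width:
        raise ValueError(f"{n} does not fit in {width} Base62 chars")
    out = ""
    for _ in range(width):
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
    return out


def biosample_stub_id(project_stub, species_stub, subject_id, tissue_stub,
                      sample_type_stub, assay_stub, condition_stub,
                      timepoint_stub, duration, batch=None, replicate=None):
    """Concatenate stub components without a delimiter (hypothetical helper)."""
    for name, stub, lo, hi in (("tissue", tissue_stub, 1, 3),
                               ("sample_type", sample_type_stub, 1, 3),
                               ("assay", assay_stub, 1, 3),
                               ("timepoint", timepoint_stub, 1, 2)):
        if not lo <= len(stub) <= hi:
            raise ValueError(f"{name} stub must be {lo}-{hi} chars: {stub!r}")
    # Assumption: stub duration = human duration minus the leading "P".
    if not duration.startswith("P"):
        raise ValueError(f"expected an ISO 8601 style duration: {duration!r}")
    parts = [project_stub or "", species_stub, base62(subject_id), tissue_stub,
             sample_type_stub, assay_stub, condition_stub, timepoint_stub,
             duration[1:]]
    if batch is not None:
        parts.append(f"B{batch:02d}")
    if replicate is not None:
        parts.append(f"R{replicate:02d}")
    return "".join(parts)


print(biosample_stub_id("D", "HS", 42, "B", "D", "W", "00A01", "BL", "P7W",
                        batch=1, replicate=5))
# prints DHS00gBDW00A01BL7WB01R05
```

Because nothing separates the fields, reading a stub back presumably requires knowing each field's width from the codebook in use.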

Note (species + subject_id): in stub format these two fields are encoded independently. species uses a static codebook stub_code, whereas subject_id, the numeric subject identifier, is converted to fixed-width Base62. The stub values are therefore compact counterparts of the same metadata, not character-by-character transformations of the human-readable fields.

Note (species width): species stub width is defined by the codebook and should be consistent within a given codebook. The reference codebook uses width 2.
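One way to enforce the consistency requirement is to check that every species stub in a codebook has the same length. The dictionary layout (human code mapped to stub code), the function name, and the stub values HS and MM are hypothetical; only the width-2 convention of the reference codebook comes from the text.

```python
def species_stub_width(codebook: dict) -> int:
    """Return the single stub width used in a codebook, or raise if widths differ."""
    widths = {len(stub) for stub in codebook.values()}
    if len(widths) != 1:
        raise ValueError(f"expected one stub width, found {sorted(widths)}")
    (width,) = widths
    return width


# Hypothetical codebook entries: human species code -> stub code.
print(species_stub_width({"HomSap": "HS", "MusMus": "MM"}))  # prints 2
```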

Note (condition mapping): the numeric mapping used for stub condition values depends on the packaged ICD-10 order map distributed with the reference implementation. In practice, condition stubs should be interpreted together with the ClarID-Tools release and associated resources used for encoding. Future revisions may revisit this mapping strategy if broader interoperability needs emerge.
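The following sketch encodes a condition stub as the table describes: one 3-character Base62 stub per ICD-10 code, plus a 2-digit count. The order map here is a made-up stand-in (HYPOTHETICAL_ORDER_MAP) for the packaged ICD-10 order map, so its output will not match real ClarID-Tools stubs. The Base62 alphabet order, placing the count after the stubs, and the 10-code limit (borrowed from the human format) are also assumptions.

```python
import string

# Assumed Base62 alphabet order (0-9, A-Z, a-z).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Hypothetical stand-in for the packaged ICD-10 order map (code -> integer).
HYPOTHETICAL_ORDER_MAP = {"C50.9": 10, "E11": 200, "I10": 4000}


def base62(n: int, width: int = 3) -> str:
    """Fixed-width Base62 encoding of a non-negative integer."""
    if not 0 <= n < 62 ** width:
        raise ValueError(f"{n} does not fit in {width} Base62 chars")
    out = ""
    for _ in range(width):
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
    return out


def condition_stub(codes, order_map):
    """N x 3-char Base62 stubs followed by a 2-digit count (hypothetical helper)."""
    if not 1 <= len(codes) <= 10:
        raise ValueError("expected between 1 and 10 condition codes")
    # Raises KeyError for a code missing from the order map.
    stubs = "".join(base62(order_map[code]) for code in codes)
    return f"{stubs}{len(codes):02d}"


print(condition_stub(["E11", "I10"], HYPOTHETICAL_ORDER_MAP))  # prints 03E12W02
```

Since the integers come from the order map, the same code can encode differently under another map, which is why the note ties interpretation to a specific ClarID-Tools release.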


Subject

Human format

Delimiter: -

| # | Component | Source field | Type | Pattern / Format | Built from |
|---|---|---|---|---|---|
| 1 | Study | study | string | free string | codebook value (if present) |
| 2 | Subject ID | subject_id | integer→str | zero-pad to 5 digits (default) | sprintf("%0${pad}d",$sid) |
| 3 | Type | type.code | string | codebook codes | codebook value |
| 4 | Condition | condition | string | ICD-10 diagnosis code(s) [A-Z]\d{2}(?:\.\d+)? (≤10) | used verbatim (multiple codes concatenated with +) |
| 5 | Sex | sex.code | string | codebook codes | codebook value |
| 6 | Age Group | age_group.code | string | codebook codes | codebook value |
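A subject identifier can be sketched the same way, joining the six components with "-". The function name subject_human_id and the example codes (Demo, Case, F, Adult) are hypothetical placeholders, not values from a real codebook; as in the biosample case, the sketch assumes "(≤10)" limits the number of ICD-10 codes and that an absent study is omitted.

```python
import re

ICD10 = re.compile(r"[A-Z]\d{2}(?:\.\d+)?")


def subject_human_id(study, subject_id, type_code, conditions, sex, age_group, pad=5):
    """Join subject components with '-' in table order (hypothetical helper)."""
    # Assumption: "(≤10)" is read as "at most 10 ICD-10 codes".
    if not 1 <= len(conditions) <= 10:
        raise ValueError("expected between 1 and 10 condition codes")
    for code in conditions:
        if not ICD10.fullmatch(code):
            raise ValueError(f"not an ICD-10 style code: {code!r}")
    parts = [study] if study else []  # study is optional
    parts += [f"{subject_id:0{pad}d}", type_code, "+".join(conditions), sex, age_group]
    return "-".join(parts)


print(subject_human_id("Demo", 42, "Case", ["C50.9"], "F", "Adult"))
# prints Demo-00042-Case-C50.9-F-Adult
```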

Stub format

Delimiter: (none)

| # | Component | Source field | Type | Pattern / Format | Built from |
|---|---|---|---|---|---|
| 1 | Study stub | study.stub_code | string | free string | codebook value (if present) |
| 2 | Subject ID stub | subject_id | integer→Base62 | width 3 (default), max 238,327 | 3-char Base62 from integer |
| 3 | Type stub | type.stub_code | string | 1 char | codebook value |
| 4 | Condition stub | condition | code→Base62 | N × 3-char Base62 stubs + 2-digit count (%02d) | packaged ICD-10 order map + 3-char Base62 from integer |
| 5 | Sex stub | sex.stub_code | string | 1 char | codebook value |
| 6 | Age Group stub | age_group.stub_code | string | 2 chars | codebook value |
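This sketch assembles a subject stub by concatenation, enforcing the fixed widths the table gives for type (1 char), sex (1 char), and age group (2 chars). The function name, the Base62 alphabet order, and every stub value are hypothetical; the condition stub is supplied as a pre-encoded placeholder because the real value depends on the packaged ICD-10 order map.

```python
import string

# Assumed Base62 alphabet order (0-9, A-Z, a-z).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62(n: int, width: int = 3) -> str:
    """Fixed-width Base62 encoding of a non-negative integer."""
    if not 0 <= n < 62 ** width:
        raise ValueError(f"{n} does not fit in {width} Base62 chars")
    out = ""
    for _ in range(width):
        n, rem = divmod(n, 62)
        out = ALPHABET[rem] + out
    return out


def subject_stub_id(study_stub, subject_id, type_stub, condition_stub,
                    sex_stub, age_group_stub):
    """Concatenate subject stub components without a delimiter (hypothetical helper)."""
    for name, stub, width in (("type", type_stub, 1),
                              ("sex", sex_stub, 1),
                              ("age_group", age_group_stub, 2)):
        if len(stub) != width:
            raise ValueError(f"{name} stub must be {width} char(s): {stub!r}")
    return "".join([study_stub or "", base62(subject_id), type_stub,
                    condition_stub, sex_stub, age_group_stub])


print(subject_stub_id("D", 42, "C", "00A01", "F", "AD"))  # prints D00gC00A01FAD
```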

Note (condition mapping): as in biosample stub format, the numeric mapping for condition depends on the packaged ICD-10 order map distributed with the reference implementation. Encoded values should therefore be interpreted together with the corresponding ClarID-Tools release and associated resources.