Output Validation
Output validation in Convert-Pheno is not a single switch. During development, converted files are checked against the target schemas or table definitions, and validation errors are used to improve the conversion code. For users, the same idea shows up as preserved source values, ontology search audit files, and documented mapping tables.
The goal is practical: converted files should be structurally valid, and users should still be able to inspect how source values became target fields.
Development loop: generate output, validate it, inspect schema or table errors, update mappings/defaults/type coercions in the runtime code, and repeat until the generated files validate for the tested route.
Source Provenance in info
When Convert-Pheno creates BFF from OMOP-CDM, CSV, REDCap, or CDISC-ODM, it preserves raw source values in info by default.
This is deliberate:
- Users can cross-check converted records against the original input.
- Beacon-style APIs can still expose or query source-specific values.
- Conversion bugs are easier to diagnose because the source context is retained.
Use --no-source-info only when you need smaller payloads or do not want to carry raw source values forward.
convert-pheno -iomop PERSON.csv CONCEPT.csv \
-obff individuals.json \
--no-source-info
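To confirm what a given run carried forward, the output can be inspected directly. The following is a minimal Python sketch, assuming that individuals.json is a JSON array of individual objects and that preserved source values sit under a top-level info key. The helper names count_source_info and count_source_info_in_file are hypothetical and not part of Convert-Pheno, and the sample record contents are made up for illustration.

```python
import json


def count_source_info(individuals):
    """Return (records_with_info, total_records) for a list of BFF individuals.

    Assumption: each individual is a dict, and preserved source values,
    when present, live under a non-empty top-level "info" key.
    """
    with_info = sum(1 for record in individuals if record.get("info"))
    return with_info, len(individuals)


def count_source_info_in_file(path):
    """Load a BFF individuals file (assumed to be a JSON array) and count."""
    with open(path, encoding="utf-8") as handle:
        return count_source_info(json.load(handle))


if __name__ == "__main__":
    # In-memory example; the nested keys are illustrative, not a real layout.
    sample = [
        {"id": "1", "info": {"PERSON": {"person_id": 1}}},
        {"id": "2"},
    ]
    print(count_source_info(sample))
```

Running count_source_info_in_file on a file produced with and without --no-source-info is one quick way to see whether raw source values were retained.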
Ontology Search Audit
For mapping-file conversions, use --search-audit-tsv to write a user-readable TSV of ontology lookups.
convert-pheno -icsv clinical.csv \
--mapping-file mapping.yaml \
--search-audit-tsv search-audit.tsv \
-obff individuals.json
The audit file is useful for checking:
- the original label from the input
- the converted label
- the converted ontology identifier
- the ontology source
- whether the result came from an exact match, a fuzzy/mixed search, or a fallback
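The checks listed above can be partly automated by tallying one column of the audit file. This is a minimal sketch using only the Python standard library; the column names in the example (source_label, match_type) are assumptions made for illustration, so check the header row of your own audit TSV and pass the real column name.

```python
import csv
import io
from collections import Counter


def summarize_audit(tsv_text, column):
    """Count how often each value occurs in one column of an audit TSV.

    The column name is supplied by the caller because the actual header
    names are not documented here.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in {reader.fieldnames}")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Hypothetical header and rows, used only to exercise the function.
    example = (
        "source_label\tmatch_type\n"
        "asthma\texact\n"
        "astma\tfuzzy\n"
        "wheeze\texact\n"
    )
    print(summarize_audit(example, "match_type"))
```

A high share of fuzzy or fallback results is a hint that the mapping file or the source labels deserve a closer look before the converted data are used.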
Development Validators
Convert-Pheno does not validate source files as clinical truth. Input validation and cleaning remain the user's responsibility.
During development, generated outputs are checked with external validators where practical.
- BFF: Beacon v2 JSON entities are checked with bff-tools validate from beacon2-cbi-tools. Validator failures are used to update runtime mapping logic, defaults, and type coercions until generated entity files validate against the Beacon v2 schemas.
- PXF: Phenopackets output is checked in the extended xt/protobuff.t test. The test uses Inline Python to parse generated PXF JSON into the Phenopackets protobuf model with google.protobuf.json_format.Parse and phenopackets.Phenopacket.
- OMOP-CDM: Emitted OMOP CSV tables are checked with omop-csv-validator, which validates table files against the OMOP-CDM DDL.
BFF validators usually infer the entity from the file name. Use standard names such as individuals.json, biosamples.json, datasets.json, and cohorts.json.
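Because the entity is usually inferred from the file name, it can help to check names before running a validator. The sketch below only illustrates the naming convention; it is not the logic used by bff-tools, the function name infer_entity is hypothetical, and the set covers only the four names mentioned above.

```python
from pathlib import Path

# The four standard file stems named in the text; other entities are omitted.
STANDARD_ENTITIES = {"individuals", "biosamples", "datasets", "cohorts"}


def infer_entity(path):
    """Return the entity name implied by a file name, or None if non-standard.

    Everything after the first dot is ignored, so "individuals.json.gz"
    is treated the same as "individuals.json".
    """
    stem = Path(path).name.split(".")[0]
    return stem if stem in STANDARD_ENTITIES else None


if __name__ == "__main__":
    for name in ("out/individuals.json", "my_individuals.json"):
        print(name, "->", infer_entity(name))
```

A None result suggests renaming the file to one of the standard names before validation.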
OMOP-CDM to Beacon Validation
OMOP-CDM v5.4 is a relational SQL model, while Beacon v2 Models are hierarchical JSON schemas. For example, OMOP stores clinical facts across tables such as PERSON, CONDITION_OCCURRENCE, MEASUREMENT, OBSERVATION, PROCEDURE_OCCURRENCE, and SPECIMEN; Beacon individuals and biosamples represent related information as nested JSON objects.
The OMOP-to-BFF mappings were developed to bridge that difference while keeping the converted JSON structurally valid against Beacon v2 schemas. During development, generated BFF files were iteratively validated with bff-tools validate; schema errors were then addressed in the runtime conversion code by refining mappings, adding required defaults, and correcting data types. This is why some apparently artificial defaults exist: they are there to satisfy required Beacon structure when the source model has no direct equivalent.
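The relational-to-nested step and the role of defaults can be sketched in a few lines. This is an illustration only, not Convert-Pheno's mapping: the function name nest_individuals, the OMOP: identifier prefix, and the choice of placeholder term are assumptions. It uses the OMOP columns person_id, gender_concept_id, and condition_concept_id, and it assumes that a Beacon individual needs a sex term even when the source row has no usable value.

```python
from collections import defaultdict

# OMOP standard gender concept_ids mapped to NCIT terms (illustrative subset).
SEX_BY_GENDER_CONCEPT = {
    8507: {"id": "NCIT:C20197", "label": "Male"},
    8532: {"id": "NCIT:C16576", "label": "Female"},
}
# Assumed placeholder, used when the source has no direct equivalent.
DEFAULT_SEX = {"id": "NCIT:C17998", "label": "Unknown"}


def nest_individuals(person_rows, condition_rows):
    """Fold flat PERSON and CONDITION_OCCURRENCE rows into nested records."""
    # Group condition concept ids by the person they belong to.
    conditions_by_person = defaultdict(list)
    for row in condition_rows:
        conditions_by_person[row["person_id"]].append(row["condition_concept_id"])

    individuals = []
    for person in person_rows:
        person_id = person["person_id"]
        individual = {
            "id": str(person_id),
            # Fall back to the placeholder so the field is never missing.
            "sex": SEX_BY_GENDER_CONCEPT.get(
                person.get("gender_concept_id"), DEFAULT_SEX
            ),
        }
        diseases = [
            {"diseaseCode": {"id": f"OMOP:{concept_id}"}}  # hypothetical prefix
            for concept_id in conditions_by_person.get(person_id, [])
        ]
        if diseases:
            individual["diseases"] = diseases
        individuals.append(individual)
    return individuals
```

The second branch of the sex lookup is the kind of default the paragraph above describes: it carries no clinical information and exists only so the nested record has the shape the target schema expects.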
Validation was also supported by dataset-specific checks:
- Synthetic EUNOMIA data were used where expected behavior can be checked under controlled conditions.
- Representative mappings were reviewed manually for semantic consistency.
- Larger OMOP datasets exposed edge cases that were used to refine the mapping with feedback from data owners.
The current OMOP-to-Beacon mapping tables are documented in OMOP to BFF.
OMOP Mapping Considerations
Two choices are important when reviewing OMOP-derived BFF:
- Source preservation: Original OMOP row values are retained under info or _info provenance blocks by default. This helps domain experts cross-check converted records and allows source-specific OMOP values to remain queryable when BFF is loaded into downstream systems. Use --no-source-info if you do not want to carry those raw values forward.
- Exposure selection: Beacon exposures are populated from a curated set of OMOP concept_id values. The candidate list is maintained in share/db/concepts_candidates_2_exposure.csv.
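Selection against a curated list amounts to a set-membership filter. The sketch below shows that idea under stated assumptions: the candidate CSV is assumed to have a concept_id column, the rows being filtered are assumed to carry their concept in observation_concept_id, and both function names are hypothetical. The actual file layout and the OMOP tables Convert-Pheno reads for exposures may differ.

```python
import csv
import io


def load_candidate_ids(csv_text, column="concept_id"):
    """Read candidate concept ids from CSV text into a set of ints.

    Assumption: the file has a header row containing `column`.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row[column]) for row in reader if row.get(column)}


def select_exposure_rows(rows, candidate_ids, column="observation_concept_id"):
    """Keep only rows whose concept id appears in the curated candidate set."""
    return [row for row in rows if int(row[column]) in candidate_ids]


if __name__ == "__main__":
    # Made-up ids and names, used only to exercise the functions.
    candidates = load_candidate_ids("concept_id,concept_name\n1001,a\n1002,b\n")
    rows = [
        {"person_id": "1", "observation_concept_id": "1001"},
        {"person_id": "1", "observation_concept_id": "9999"},
    ]
    print(select_exposure_rows(rows, candidates))
```

Keeping the list in a file rather than in code means the set of concepts treated as exposures can be reviewed and extended without touching the conversion logic.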
Conversion Status
| Route | Status | Notes |
|---|---|---|
| PXF -> BFF individuals | Mature | Core pathway |
| BFF individuals -> PXF | Mature | Used for round-trip conversion |
| OMOP-CDM -> BFF individuals | Mature | Depends on available OMOP tables and concept lookup |
| PXF -> BFF biosamples | Beta | Uses Phenopackets biosample content when present |
| OMOP SPECIMEN -> BFF biosamples | Beta | Supports structured biosample measurements for specimen quantity |
| CSV, REDCap, CDISC-ODM -> BFF | Beta | Depends on mapping-file quality |
| openEHR -> BFF/PXF | Experimental | Canonical composition support is still evolving |