Mapping Steps

At a Glance

Most conversions use BFF as the center model:

| Step | What Happens | Why It Matters |
|------|--------------|----------------|
| 1 | Source data is normalized into Beacon v2 Models / BFF | This gives the software one consistent internal target |
| 2 | BFF is optionally converted into the requested final model | This keeps source-specific parsing separate from final-output serialization |
| 3 | Unmapped source values are preserved when useful | Users can audit the conversion and query source-specific fields later |

Practical shortcut

If you only need commands, use Conversion Recipes. Use this page when you want to understand why mappings are structured the way they are.

Step 1: Conversion to the target model

For most routes, Convert-Pheno first maps the input data to BFF, the Beacon v2 Models-based format that acts as the internal target or center model. From there, the data can remain as BFF or continue to other outputs such as PXF or OMOP CDM.
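
The two-step flow can be sketched as a small dispatch around a central BFF-like record. This is only an illustration in Python, not Convert-Pheno's implementation (which lives in Perl submodules); the names `csv_to_bff`, `bff_to_pxf`, `convert`, and the toy field names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical step-1 parser: normalize one source row into a BFF-like record.
def csv_to_bff(row: dict) -> dict:
    return {"id": str(row["patient_id"]), "sex": {"label": row["sex"]}}

# Hypothetical step-2 serializer: turn the BFF-like record into a PXF-like one.
def bff_to_pxf(bff: dict) -> dict:
    return {"subject": {"id": bff["id"], "sex": bff["sex"]["label"].upper()}}

# One registry per step; source parsing never needs to know the final format.
TO_BFF: Dict[str, Callable[[dict], dict]] = {"csv": csv_to_bff}
FROM_BFF: Dict[str, Callable[[dict], dict]] = {"pxf": bff_to_pxf}

def convert(record: dict, source: str, target: str = "bff") -> dict:
    """Route every conversion through the center model."""
    bff = record if source == "bff" else TO_BFF[source](record)
    if target == "bff":
        return bff  # step 2 is optional
    return FROM_BFF[target](bff)

row = {"patient_id": 7, "sex": "female"}
print(convert(row, "csv"))         # stays as the BFF-like record
print(convert(row, "csv", "pxf"))  # continues to the PXF-like record
```

With a center model, supporting a new input or output in this sketch means adding one function to one registry rather than one function per source/target pair.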

Convert-Pheno internal mapping steps
Why use Beacon v2 Models as the target model?
  • JSON Schema: Beacon v2 Models are defined with JSON Schema, which is useful for validation and inspection.
  • Additional properties: The Beacon v2 Models schema allows additional properties, which helps preserve source values that do not have a first-class target field.
  • Beacon v2 API alignment: BFF follows the data shape expected by Beacon v2-oriented deployments.
  • Multi-entity output: Beacon v2 Models provide entities beyond individuals, including biosamples, datasets, and cohorts.
  • Overlap with Phenopackets v2: Several clinical and phenotypic concepts are shared or closely aligned between the models.
Advanced mapping details, ontology preservation, and search behavior

Schema mapping

When starting a new conversion between two data models, the first step is to map variables between the two data schemas.

Starting with version 0.31, some mapping-table drafts may use LLM assistance when the source model is especially dense or ambiguous. These mappings still require human review before being treated as project documentation.

Mapping strategy: External or hardcoded?

In the early stages of development, we considered configuration files for schema-to-schema mapping. Deeply nested JSON structures made that impractical for most routes. The exception is mapping-file input such as REDCap, CSV, and CDISC-ODM, where source fields are project-specific and user configuration is necessary.

In the Mapping tables section (accessible via the 'Technical Details' tab on the left navigation bar), we outline the equivalencies between different schemas. These tables fulfill several purposes:

  1. They summarize the intended mapping without requiring readers to inspect the code first.
  2. Domain experts can review the mapping assumptions and suggest corrections.
  3. Contributors can use them as a starting point for new conversion routes.
Notice

Accurately mapping between clinical data standards is a substantial task. Some mappings may need revision as new source examples and domain feedback become available.

Contributing

While writing the code for a new format can be challenging, modifying properties in an existing one is much easier. Feel free to reach out to us if you plan to contribute.

From table mappings to code

These tables serve as a reference for implementing Convert-Pheno's source code. Each format conversion has a dedicated Perl submodule, and during implementation we verify that the converted output conforms to the final target data schema.

Lossless or lossy conversion?

When converting data from one data standard to another, it is important to consider the possibility of losing information due to differences in schema and field mapping. To mitigate this, we aimed for a lossless conversion by incorporating non-mappable variables as additionalProperties within the Beacon v2 Models schema. This allows users to access the original variables and their values through database queries, especially when using non-relational databases like MongoDB.

During the conversion process, handling variables that cannot be directly mapped can result in one of two scenarios:

Often, the input data model has variables that do not directly map to the target but are still useful to retain in the output format. If the target format allows for extra properties in a given term (as BFF does), these original variables are stored under the _info property (or _ + ‘property name’). This commonly happens in conversions from OMOP CDM to BFF.
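
A minimal sketch of this preservation idea follows, in Python for illustration only. The function `map_row` and the one-entry `FIELD_MAP` are hypothetical; the nesting under `_info` (table name, then `OMOP_columns`) mirrors the shape seen in omop2bff output, and the whole source row is copied so that nothing is dropped.

```python
import copy

# Hypothetical mapping: source column -> first-class target property.
FIELD_MAP = {"procedure_date": "dateOfProcedure"}

def map_row(row: dict, field_map: dict, table: str) -> dict:
    """Map known columns and keep the original row under `_info`."""
    term = {
        target: row[source]
        for source, target in field_map.items()
        if source in row
    }
    # Keep an untouched copy of the source row for later auditing or querying.
    term["_info"] = {table: {"OMOP_columns": copy.deepcopy(row)}}
    return term

row = {"procedure_date": "1955-10-22", "provider_id": "\\N", "quantity": "\\N"}
term = map_row(row, FIELD_MAP, "PROCEDURE_OCCURRENCE")
print(term["dateOfProcedure"])
print(term["_info"]["PROCEDURE_OCCURRENCE"]["OMOP_columns"]["quantity"])
```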

Example extracted from omop2bff conversion:

"interventionsOrProcedures" : [
  {
    "_info" : {
      "PROCEDURE_OCCURRENCE" : {
        "OMOP_columns" : {
          "modifier_concept_id" : 0,
          "modifier_source_value" : null,
          "person_id" : 2,
          "procedure_concept_id" : 4163872,
          "procedure_date" : "1955-10-22",
          "procedure_datetime" : "1955-10-22 00:00:00",
          "procedure_occurrence_id" : 6,
          "procedure_source_concept_id" : 4163872,
          "procedure_source_value" : 399208008,
          "procedure_type_concept_id" : 38000275,
          "provider_id" : "\\N",
          "quantity" : "\\N",
          "visit_detail_id" : 0,
          "visit_occurrence_id" : 103
        }
      }
    },
    "ageAtProcedure" : {
      "age" : {
        "iso8601duration" : "35Y"
      }
    },
    "dateOfProcedure" : "1955-10-22",
    "procedureCode" : {
      "id" : "SNOMED:399208008",
      "label" : "Plain chest X-ray"
    }
  }
]
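
Preserved values like these can be reached with a dotted path, in the same spirit as dot notation in document databases such as MongoDB. The helper below is a hedged, pure-Python sketch for inspecting converted JSON offline; `values_at` is a hypothetical name and not part of Convert-Pheno.

```python
def values_at(doc, dotted_path):
    """Return every value found at a dotted path, fanning out over lists."""
    nodes = [doc]
    for key in dotted_path.split("."):
        next_nodes = []
        for node in nodes:
            # A list contributes each of its elements as a candidate.
            candidates = node if isinstance(node, list) else [node]
            for item in candidates:
                if isinstance(item, dict) and key in item:
                    next_nodes.append(item[key])
        nodes = next_nodes
    return nodes

# Trimmed-down record using values from the omop2bff excerpt.
individual = {
    "interventionsOrProcedures": [
        {
            "_info": {
                "PROCEDURE_OCCURRENCE": {
                    "OMOP_columns": {"visit_occurrence_id": 103}
                }
            },
            "procedureCode": {"id": "SNOMED:399208008"},
        }
    ]
}

path = "interventionsOrProcedures._info.PROCEDURE_OCCURRENCE.OMOP_columns.visit_occurrence_id"
print(values_at(individual, path))  # [103]
```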

Example extracted from redcap2bff conversion:

"treatments" : [
  {
    "_info" : {
      "dose" : null,
      "drug" : "budesonide",
      "drug_name" : "budesonide",
      "duration" : null,
      "field" : "budesonide_oral_status",
      "route" : "oral",
      "start" : null,
      "status" : "never treated",
      "value" : 1
    },
    "doseIntervals" : [],
    "routeOfAdministration" : {
      "id" : "NCIT:C38288",
      "label" : "Oral Route of Administration"
    },
    "treatmentCode" : {
      "id" : "NCIT:C1027",
      "label" : "Budesonide"
    }
  }
]

Example of longitudinal data stored under _visit in an omop2bff conversion:

{
  "_visit" : {
    "_info" : {
      "VISIT_OCCURRENCE" : {
        "OMOP_columns" : {
          "admitting_source_concept_id" : 0,
          "admitting_source_value" : null,
          "care_site_id" : "\\N",
          "discharge_to_concept_id" : 0,
          "discharge_to_source_value" : null,
          "person_id" : 3,
          "preceding_visit_occurrence_id" : 347,
          "provider_id" : "\\N",
          "visit_concept_id" : 9201,
          "visit_end_date" : "1972-12-21",
          "visit_end_datetime" : "1972-12-21 00:00:00",
          "visit_occurrence_id" : 312,
          "visit_source_concept_id" : 0,
          "visit_source_value" : "5d035dd1-30d9-4389-b94c-64947bf1f18c",
          "visit_start_date" : "1972-12-20",
          "visit_start_datetime" : "1972-12-20 00:00:00",
          "visit_type_concept_id" : 44818517
        }
      }
    },
    "concept" : {
      "id" : "Visit:IP",
      "label" : "Inpatient Visit"
    },
    "end_date" : "1972-12-21T00:00:00Z",
    "id" : "312",
    "occurrence_id" : 312,
    "start_date" : "1972-12-20T00:00:00Z",
    "type" : {
      "id" : "Visit_Type:OMOP4822465",
      "label" : "Visit derived from encounter on claim"
    }
  },
  "featureType" : {
    "id" : "SNOMED:428251008",
    "label" : "History of appendectomy"
  },
  "onset" : {
    "iso8601duration" : "56Y"
  }
}
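
Because each term carries its own `_visit` block, longitudinal views can be rebuilt after conversion. The sketch below groups feature labels by visit and orders visits by start date. It is an illustration only: `features_by_visit` is a hypothetical helper, and the second record (visit "400") is made up for the example.

```python
from collections import defaultdict

def features_by_visit(features):
    """Group feature labels by (start_date, visit id), oldest visit first."""
    visits = defaultdict(list)
    for feature in features:
        visit = feature.get("_visit")
        if visit is None:
            continue  # terms without visit information are skipped
        key = (visit["start_date"], visit["id"])
        visits[key].append(feature["featureType"]["label"])
    # Timestamps in one uniform ISO 8601 format sort chronologically as strings.
    return [
        {"visit_id": visit_id, "start_date": start, "features": labels}
        for (start, visit_id), labels in sorted(visits.items())
    ]

features = [
    {   # made-up record, for illustration only
        "_visit": {"id": "400", "start_date": "1980-01-05T00:00:00Z"},
        "featureType": {"label": "Example feature"},
    },
    {   # values taken from the omop2bff excerpt
        "_visit": {"id": "312", "start_date": "1972-12-20T00:00:00Z"},
        "featureType": {"label": "History of appendectomy"},
    },
]

for visit in features_by_visit(features):
    print(visit["start_date"], visit["visit_id"], visit["features"])
```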

Preservation and augmentation of ontologies

One advantage of Beacon v2 Models and Phenopackets v2 is that neither prescribes specific ontologies. This allows Convert-Pheno to retain the ontology terms present in the source data, adding terms only where a required field would otherwise be missing one.

Which ontologies/terminologies are supported?

If the input files contain ontology terms, the ontologies will be preserved and remain intact after the conversion process, except for:

  • Beacon v2 Models and Phenopackets v2: the property sex is converted to NCI Thesaurus via database search.
  • OMOP CDM: the properties sex, ethnicity, and geographicOrigin are converted to NCI Thesaurus via database search.

| | CSV | REDCap | CDISC-ODM | OMOP-CDM | Phenopackets v2 | Beacon v2 Models |
|---|---|---|---|---|---|---|
| Data mapping | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Add ontologies | ✓ | ✓ | ✓ | --ohdsi-db | | |

Database Search Feature

For input types that do not contain ontologies, such as CSV, REDCap, and CDISC-ODM, we perform a database search to fetch ontologies from a variety of trusted databases. Supported databases include:

  • Athena-OHDSI standardized vocabulary, which includes multiple terminologies, such as SNOMED, RxNorm or LOINC
  • NCI Thesaurus
  • ICD-10 terminology
  • CDISC (Study Data Tabulation Model Terminology)
  • OMIM Online Mendelian Inheritance in Man
  • HPO Human Phenotype Ontology (Note that prefixes are HP:, without the O)

Mapping-file routes can resolve source labels with --search exact, --search mixed, or --search fuzzy. The detailed scoring behavior, examples, and threshold guidance are documented separately in DB Search.
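
To give a feel for how the three modes might differ, here is a toy lookup in Python. It is not Convert-Pheno's search: the scoring uses `difflib` from the standard library, the 0.8 cutoff is arbitrary, and treating "mixed" as "exact first, then fuzzy" is an assumption made for this sketch. The two vocabulary entries reuse the NCIT terms from the redcap2bff example; the actual behavior is described in DB Search.

```python
import difflib

# Tiny in-memory vocabulary (lower-cased label -> ontology id).
VOCAB = {
    "budesonide": "NCIT:C1027",
    "oral route of administration": "NCIT:C38288",
}

def search(label, mode="exact", cutoff=0.8):
    """Resolve a source label to an ontology id, or None if nothing matches."""
    key = label.strip().lower()
    if mode in ("exact", "mixed") and key in VOCAB:
        return VOCAB[key]
    if mode in ("fuzzy", "mixed"):
        # Assumption: similarity is difflib's ratio; the real tool may differ.
        close = difflib.get_close_matches(key, list(VOCAB), n=1, cutoff=cutoff)
        if close:
            return VOCAB[close[0]]
    return None

print(search("Budesonide"))                # exact hit after normalization
print(search("budesonid", mode="exact"))   # typo: no exact match
print(search("budesonid", mode="mixed"))   # typo recovered by the fuzzy fallback
```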

Step 2: Conversion to the final model

Data validation

Output validation is described in Output Validation, including the Beacon/BFF, Phenopackets/PXF, and OMOP CSV validators used during development.

To Phenopackets

If the output is set to Phenopackets v2, a second step (bff2pxf) is performed (see diagram above).

BFF and PXF community alignment

At present, we have prioritized mapping the terms that we deem most critical in facilitating basic semantic interoperability. We anticipate that Beacon v2 Models will become more aligned with Phenopackets v2, which will simplify the conversion process in future updates. We aim to refine the mappings in future iterations, with the community providing a wider range of case studies.

To OMOP CDM

If the output is set to OMOP CDM, a second step (bff2omop) is performed (see diagram above).