OMOP to BFF

Information

The Beacon v2 schema requires certain properties to be present for a document to validate. When no suitable source value is found for a required property, a DEFAULT value is used so that the output still conforms.

OMOP SPECIMEN rows can now be emitted as first-class Beacon biosamples, but only in entity-aware BFF mode such as -obff --entities biosamples --out-dir out/ or -obff --entities individuals biosamples --out-dir out/.

OMOP SPECIMEN to Beacon biosamples support should still be considered experimental. The mapping is implemented, covered by local tests, and checked with schema validation, but it has not yet been reviewed and validated with external collaborators.

With --stream, OMOP BFF output is written as line-delimited JSON suitable for MongoDB-style ingestion. Stream mode supports individuals, biosamples, or both together, each written to its own file in --out-dir. Aggregate entities such as datasets and cohorts are not available in stream mode.
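
In stream mode every line of an output file is one complete JSON document, so a consumer can process records one at a time instead of parsing a whole array. The minimal Python sketch below reads such line-delimited output. The function name iter_ndjson is hypothetical, and the in-memory sample stands in for a file written to --out-dir (the actual output file names are not assumed here).

```python
import io
import json
from typing import Iterator, TextIO


def iter_ndjson(handle: TextIO) -> Iterator[dict]:
    """Yield one JSON document per non-blank line (line-delimited JSON)."""
    for lineno, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno} is not valid JSON") from exc


# Stand-in for a stream-mode file; replace with open(path, encoding="utf-8").
sample = io.StringIO('{"id": "1"}\n{"id": "2"}\n')
records = list(iter_ndjson(sample))
```

Because the reader is a generator, memory use stays flat regardless of how many individuals or biosamples a file holds.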

If biosamples are explicitly requested and the OMOP input does not contain the SPECIMEN table, the conversion fails with a focused error. If SPECIMEN exists but is empty, the conversion succeeds and emits an empty biosamples collection.
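
These two SPECIMEN cases can be expressed as a small guard. The sketch assumes the OMOP input has already been loaded into a mapping from table name to rows. The function specimen_rows and its error text are hypothetical and only mirror the documented behavior: a missing table is an error, while an empty table produces an empty collection.

```python
from typing import Mapping, Optional, Sequence


def specimen_rows(
    tables: Mapping[str, Sequence[dict]], entities: Sequence[str]
) -> Optional[list]:
    """Return SPECIMEN rows to convert, or None when biosamples were not requested."""
    if "biosamples" not in entities:
        return None  # SPECIMEN is irrelevant for this run
    if "SPECIMEN" not in tables:
        # Missing table: fail early with a focused message (wording is illustrative).
        raise ValueError("biosamples requested but the OMOP input has no SPECIMEN table")
    # Present but possibly empty: an empty list yields an empty biosamples collection.
    return list(tables["SPECIMEN"])
```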

Version 0.31

Target model: BFF

Entity: individuals, biosamples

By default, raw OMOP rows are preserved under OMOP_columns provenance blocks, so that the converted BFF can be audited against the source data and source-specific OMOP values remain queryable. Use --no-source-info to omit these raw provenance payloads.
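
The provenance blocks sit at predictable paths, for example info.PERSON.OMOP_columns on the individual and _info.CONDITION_OCCURRENCE.OMOP_columns inside each disease. To get a rough picture of what --no-source-info leaves out, the hypothetical helper below removes every OMOP_columns key from an already converted document. This is only an approximation for inspection and not how Convert-Pheno implements the option. In particular, it leaves the now-empty parent objects in place, which the tool may handle differently. The example values are placeholders.

```python
from typing import Any


def drop_omop_columns(node: Any) -> Any:
    """Return a copy of a BFF document with every "OMOP_columns" key removed."""
    if isinstance(node, dict):
        return {
            key: drop_omop_columns(value)
            for key, value in node.items()
            if key != "OMOP_columns"
        }
    if isinstance(node, list):
        return [drop_omop_columns(item) for item in node]
    return node  # scalars are returned unchanged


individual = {
    "id": "1",
    "info": {"PERSON": {"OMOP_columns": {"person_id": 1}}},
    "diseases": [
        {
            "diseaseCode": {"id": "EXAMPLE:1", "label": "example"},
            "_info": {"CONDITION_OCCURRENCE": {"OMOP_columns": {"condition_occurrence_id": 7}}},
        }
    ],
}
slim = drop_omop_columns(individual)  # the original dict is not modified
```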

For the validation approach used during development, including Beacon/BFF schema validation, OMOP CSV validation, EUNOMIA checks, and manual/domain review, see Output Validation.

Terms

diseases

| Source field | Target field | Notes |
| --- | --- | --- |
| CONDITION_OCCURRENCE.condition_concept_id | diseases.diseaseCode | Mapped through OHDSI concepts |
| CONDITION_OCCURRENCE.condition_start_date + PERSON.birth_datetime | diseases.ageOfOnset | Derived age |
| CONDITION_OCCURRENCE.condition_status_concept_id | diseases.stage | Defaulted when absent |
| CONDITION_OCCURRENCE.* | diseases._info.CONDITION_OCCURRENCE.OMOP_columns | Provenance payload |
| VISIT_OCCURRENCE context | diseases._visit | Added when visit context is available |
| missing CONDITION_OCCURRENCE.condition_status_concept_id | diseases.stage | Defaults to NCIT:C126101 / Not Available |
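
Several fields in this and the following tables are "Derived age" values computed from an event date and PERSON.birth_datetime. The following is a minimal sketch that assumes the age is expressed in completed years as an ISO 8601 duration and wrapped in a Beacon age object. The real converter may use finer granularity (months, days) or a different rounding rule. Here birth_datetime is reduced to its date part, and iso8601_age is a hypothetical name.

```python
from datetime import date


def iso8601_age(birth: date, event: date) -> str:
    """Completed years between birth and event, as an ISO 8601 duration."""
    years = event.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the event year.
    if (event.month, event.day) < (birth.month, birth.day):
        years -= 1
    if years < 0:
        raise ValueError("event date precedes birth date")
    return f"P{years}Y"


# Beacon-style age wrapper for diseases.ageOfOnset (shape assumed for illustration).
age_of_onset = {"age": {"iso8601duration": iso8601_age(date(1980, 5, 10), date(2020, 5, 9))}}
```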

ethnicity

| Source field | Target field | Notes |
| --- | --- | --- |
| PERSON.race_source_value | ethnicity | Normalized through ontology lookup |

exposures

| Source field | Target field | Notes |
| --- | --- | --- |
| OBSERVATION.observation_concept_id | exposures.exposureCode | Only observations classified as exposures are used |
| OBSERVATION.observation_date + PERSON.birth_datetime | exposures.ageAtExposure | Derived age |
| OBSERVATION.observation_date | exposures.date | Direct |
| OBSERVATION.unit_concept_id | exposures.unit | Defaulted when absent |
| OBSERVATION.value_as_number | exposures.value | \N is converted to -1 |
| DEFAULT | exposures.duration | Added for Beacon completeness |
| OBSERVATION.* | exposures._info.OBSERVATION.OMOP_columns | Provenance payload |
| missing OBSERVATION.unit_concept_id | exposures.unit | Defaults to NCIT:C126101 / Not Available |
| DEFAULT | exposures.duration | Defaults to P0Y in the OMOP-specific path |
| OBSERVATION.value_as_number = \N | exposures.value | Defaults to -1 |
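
The three exposure defaults in the table can be applied in one place, as in the sketch below. It assumes the unit has already been mapped to an ontology term (the OHDSI lookup is outside the sketch). Treating an absent value (None) the same way as \N is also an assumption, because the table only documents \N. The name exposure_defaults is hypothetical.

```python
from typing import Optional

NOT_AVAILABLE = {"id": "NCIT:C126101", "label": "Not Available"}
OMOP_NULL = r"\N"  # null marker as written in the source CSV


def exposure_defaults(value_as_number: Optional[str], unit: Optional[dict]) -> dict:
    """Apply the documented exposure defaults to one OBSERVATION row's values."""
    if value_as_number in (None, OMOP_NULL):  # None handling is an assumption
        value = -1
    else:
        value = float(value_as_number)
    return {
        "unit": unit if unit else NOT_AVAILABLE,  # missing unit -> Not Available
        "value": value,
        "duration": "P0Y",  # DEFAULT added for Beacon completeness
    }
```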

geographicOrigin

| Source field | Target field | Notes |
| --- | --- | --- |
| OBSERVATION.value_as_concept_id | geographicOrigin | Preferred when the observation represents Country of birth; normalized through ontology lookup |
| OBSERVATION.value_as_string | geographicOrigin | Preferred string fallback when the observation represents Country of birth |
| OBSERVATION.value_source_value | geographicOrigin | Preferred string fallback when the observation represents Country of birth |
| PERSON.ethnicity_source_value | geographicOrigin | Fallback when no Country of birth observation can be resolved |
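
The geographicOrigin rows describe a fallback chain. The sketch below encodes it as "first present value wins". It assumes the row order of the table is the precedence order, since the table labels both string columns "preferred string fallback" without ranking them. Treating \N and empty strings as absent is also an assumption, and the ontology normalization step is left out. The function name pick_geographic_origin is hypothetical.

```python
from typing import Mapping, Optional, Tuple

OMOP_NULL = r"\N"


def _present(value: object) -> bool:
    return value not in (None, "", OMOP_NULL)


def pick_geographic_origin(
    country_of_birth: Optional[Mapping[str, object]],
    person: Mapping[str, object],
) -> Optional[Tuple[str, object]]:
    """Return (source field, raw value) for geographicOrigin, or None."""
    candidates = []
    if country_of_birth is not None:  # an OBSERVATION row for Country of birth
        for field in ("value_as_concept_id", "value_as_string", "value_source_value"):
            candidates.append((f"OBSERVATION.{field}", country_of_birth.get(field)))
    candidates.append(("PERSON.ethnicity_source_value", person.get("ethnicity_source_value")))
    for source, value in candidates:
        if _present(value):
            return source, value  # ontology normalization would follow here
    return None
```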

id

| Source field | Target field | Notes |
| --- | --- | --- |
| PERSON.person_id | id | Stringified in Beacon output |

info

| Source field | Target field | Notes |
| --- | --- | --- |
| PERSON.* | info.PERSON.OMOP_columns | Raw OMOP row is preserved |
| PERSON.birth_datetime | info.dateOfBirth | Timestamp form |
| convertPheno | info.convertPheno | Emitted outside --test mode |
| missing PERSON.gender_concept_id | none | The participant is skipped entirely in this direction |

interventionsOrProcedures

| Source field | Target field | Notes |
| --- | --- | --- |
| PROCEDURE_OCCURRENCE.procedure_concept_id | interventionsOrProcedures.procedureCode | Mapped through OHDSI concepts |
| PROCEDURE_OCCURRENCE.procedure_date + PERSON.birth_datetime | interventionsOrProcedures.ageAtProcedure | Derived age |
| PROCEDURE_OCCURRENCE.procedure_date | interventionsOrProcedures.dateOfProcedure | Direct |
| DEFAULT | interventionsOrProcedures.bodySite | Added for Beacon completeness |
| PROCEDURE_OCCURRENCE.* | interventionsOrProcedures._info.PROCEDURE_OCCURRENCE.OMOP_columns | Provenance payload |
| VISIT_OCCURRENCE context | interventionsOrProcedures._visit | Added when visit context is available |
| DEFAULT | interventionsOrProcedures.bodySite | Defaults to NCIT:C126101 / Not Available |

karyotypicSex

NA

measures

| Source field | Target field | Notes |
| --- | --- | --- |
| MEASUREMENT.measurement_concept_id | measures.assayCode | Mapped through OHDSI concepts |
| MEASUREMENT.measurement_date | measures.date | Direct |
| MEASUREMENT.value_as_concept_id | measures.measurementValue | Used for ontology-valued measurements |
| MEASUREMENT.value_as_number | measures.measurementValue.quantity.value | Used for numeric measurements |
| MEASUREMENT.unit_concept_id | measures.measurementValue.quantity.unit | Defaulted when absent |
| MEASUREMENT.operator_concept_id + numeric value + unit | measures.measurementValue.quantity.referenceRange | Derived range payload |
| MEASUREMENT.measurement_date + PERSON.birth_datetime | measures.observationMoment | Derived age |
| MEASUREMENT.measurement_date + PERSON.birth_datetime | measures.procedure.ageAtProcedure | Mirrors observationMoment |
| MEASUREMENT.measurement_date | measures.procedure.dateOfProcedure | Direct |
| MEASUREMENT.measurement_type_concept_id | measures.procedure.procedureCode | Mapped through OHDSI concepts |
| DEFAULT | measures.procedure.bodySite | Added for Beacon completeness |
| MEASUREMENT.* | measures._info.MEASUREMENT.OMOP_columns | Provenance payload |
| VISIT_OCCURRENCE context | measures._visit | Added when visit context is available |
| missing MEASUREMENT.unit_concept_id | measures.measurementValue.quantity.unit | Defaults to NCIT:C126101 / Not Available |
| MEASUREMENT.value_as_number = \N and no value_as_concept_id | measures.measurementValue.quantity | Defaults to quantity -1 with Not Available unit and -1/-1 reference range |
| missing MEASUREMENT.measurement_concept_id | none | The row is skipped rather than emitting a default measure |
| DEFAULT | measures.procedure.bodySite | Defaults to NCIT:C126101 / Not Available |
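
The value-related rows of the measures table reduce to a short decision sequence: skip, ontology value, placeholder quantity, or numeric quantity. The following is a sketch under stated assumptions. lookup is a hypothetical stand-in for the OHDSI concept mapping, the referenceRange keys (low, high, unit) follow Beacon's reference range object, the operator-derived range for real numbers is omitted, and \N, None, or "" count as missing.

```python
from typing import Callable, Mapping, Optional

NOT_AVAILABLE = {"id": "NCIT:C126101", "label": "Not Available"}
OMOP_NULL = r"\N"

Lookup = Callable[[object], dict]  # concept_id -> ontology term {"id", "label"}


def _missing(value: object) -> bool:
    return value in (None, "", OMOP_NULL)


def measurement_value(row: Mapping[str, object], lookup: Lookup) -> Optional[dict]:
    """Build measures.measurementValue for one MEASUREMENT row, or None to skip it."""
    if _missing(row.get("measurement_concept_id")):
        return None  # no default measure is emitted
    if not _missing(row.get("value_as_concept_id")):
        return lookup(row["value_as_concept_id"])  # ontology-valued measurement
    if _missing(row.get("value_as_number")):
        # Neither a concept nor a number: documented -1 placeholder quantity.
        return {
            "quantity": {
                "value": -1,
                "unit": NOT_AVAILABLE,
                "referenceRange": {"low": -1, "high": -1, "unit": NOT_AVAILABLE},
            }
        }
    unit_id = row.get("unit_concept_id")
    unit = NOT_AVAILABLE if _missing(unit_id) else lookup(unit_id)
    # The operator-derived referenceRange is omitted from this sketch.
    return {"quantity": {"value": float(row["value_as_number"]), "unit": unit}}
```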

pedigrees

NA

phenotypicFeatures

| Source field | Target field | Notes |
| --- | --- | --- |
| OBSERVATION.observation_concept_id | phenotypicFeatures.featureType | Only non-exposure observations are used |
| OBSERVATION.observation_date + PERSON.birth_datetime | phenotypicFeatures.onset | Derived age |
| OBSERVATION.* | phenotypicFeatures._info.OBSERVATION.OMOP_columns | Provenance payload |
| VISIT_OCCURRENCE context | phenotypicFeatures._visit | Added when visit context is available |

sex

| Source field | Target field | Notes |
| --- | --- | --- |
| PERSON.gender_concept_id | sex | Mapped through OHDSI concepts and then normalized to Beacon terms |
| missing PERSON.gender_concept_id | none | The participant is skipped before an individual is emitted |
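
Together with the id table, this means an individual is only produced for persons that have a gender concept, and its id is the stringified person_id. The hypothetical generator below sketches that filter. lookup stands in for the OHDSI mapping and Beacon normalization of sex, and treating None, "", or \N as "missing" is an assumption.

```python
from typing import Callable, Iterable, Iterator, Mapping

OMOP_NULL = r"\N"


def individuals_from_persons(
    persons: Iterable[Mapping[str, object]],
    lookup: Callable[[object], dict],
) -> Iterator[dict]:
    """Yield minimal individuals, skipping persons without gender_concept_id."""
    for person in persons:
        gender = person.get("gender_concept_id")
        if gender in (None, "", OMOP_NULL):
            continue  # skipped: no individual is emitted for this person
        yield {
            "id": str(person["person_id"]),  # stringified, as in the id table
            "sex": lookup(gender),  # stand-in for OHDSI lookup + Beacon normalization
        }
```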

treatments

| Source field | Target field | Notes |
| --- | --- | --- |
| DRUG_EXPOSURE.drug_concept_id | treatments.treatmentCode | Mapped through OHDSI concepts |
| DRUG_EXPOSURE.drug_exposure_start_date + PERSON.birth_datetime | treatments.ageAtOnset | Derived age |
| DEFAULT | treatments.routeOfAdministration | Placeholder |
| DEFAULT | treatments.doseIntervals | Initialized as an empty list |
| DRUG_EXPOSURE.* | treatments._info.DRUG_EXPOSURE.OMOP_columns | Provenance payload |
| VISIT_OCCURRENCE context | treatments._visit | Added when visit context is available |
| DEFAULT | treatments.routeOfAdministration | Defaults to NCIT:C126101 / Not Available |
| DEFAULT | treatments.doseIntervals | Defaults to an empty list |

Biosamples

biosamples

| Source field | Target field | Notes |
| --- | --- | --- |
| SPECIMEN.specimen_id | biosamples.id | Stringified in Beacon output |
| SPECIMEN.person_id | biosamples.individualId | Stringified in Beacon output |
| SPECIMEN.specimen_concept_id | biosamples.sampleOriginType | Mapped through OHDSI concepts; defaulted when absent |
| SPECIMEN.anatomic_site_concept_id | biosamples.sampleOriginDetail | Mapped through OHDSI concepts when present |
| SPECIMEN.specimen_type_concept_id | biosamples.obtentionProcedure.procedureCode | Mapped through OHDSI concepts when present |
| SPECIMEN.specimen_date | biosamples.collectionDate | Direct |
| SPECIMEN.specimen_date + PERSON.birth_datetime | biosamples.collectionMoment | Derived age |
| SPECIMEN.disease_status_concept_id | biosamples.histologicalDiagnosis | Mapped through OHDSI concepts when present |
| SPECIMEN.quantity | biosamples.measurements.measurementValue.quantity.value | Emitted as a sample-level measurement when numeric |
| SPECIMEN.unit_concept_id | biosamples.measurements.measurementValue.quantity.unit | Mapped through OHDSI concepts when present |
| SPECIMEN.unit_source_value | biosamples.measurements.measurementValue.quantity.unit.label | Used as fallback unit label when no unit concept is available |
| OMOP:SPECIMEN.quantity | biosamples.measurements.assayCode | Local valid CURIE identifying the OMOP source field; OMOP SPECIMEN has no measurement_concept_id equivalent |
| SPECIMEN.specimen_source_id / SPECIMEN.specimen_source_value | none | Kept only in provenance; not promoted to Beacon schema fields by default |
| DEFAULT | biosamples.biosampleStatus | Defaulted for Beacon completeness |
| convertPheno | biosamples.info.convertPheno | Emitted outside --test mode |
| SPECIMEN.* | biosamples.info.SPECIMEN.OMOP_columns | Provenance payload |
| missing SPECIMEN.specimen_concept_id | biosamples.sampleOriginType | Defaults to NCIT:C126101 / Not Available |
| DEFAULT | biosamples.biosampleStatus | Defaults to NCIT:C126101 / Not Available |

SPECIMEN.quantity is promoted conservatively. OMOP provides the value and unit, but the SPECIMEN table does not include a measurement_concept_id equivalent for the measured sample property. For this reason, Convert-Pheno uses the valid local CURIE OMOP:SPECIMEN.quantity with label Specimen quantity as the Beacon assayCode, while the original OMOP columns remain available under biosamples.info.SPECIMEN.OMOP_columns.
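
The promotion rule can be sketched as follows. The assay code CURIE and its label come from the paragraph above, and the unit precedence (unit_concept_id, then unit_source_value) follows the table. The id used for a label-only unit is not documented, so reusing NCIT:C126101 there is an assumption. The function name and the lookup callable (a stand-in for the OHDSI mapping) are hypothetical.

```python
import math
from typing import Callable, Mapping, Optional

NOT_AVAILABLE = {"id": "NCIT:C126101", "label": "Not Available"}
OMOP_NULL = r"\N"
SPECIMEN_QUANTITY = {"id": "OMOP:SPECIMEN.quantity", "label": "Specimen quantity"}


def specimen_quantity_measurement(
    row: Mapping[str, object], lookup: Callable[[object], dict]
) -> Optional[dict]:
    """Promote SPECIMEN.quantity to one biosample measurement, or return None."""
    try:
        value = float(row.get("quantity"))
    except (TypeError, ValueError):
        return None  # absent, \N or non-numeric: not promoted
    if not math.isfinite(value):
        return None  # reject nan/inf as well
    unit_id = row.get("unit_concept_id")
    unit_label = row.get("unit_source_value")
    if unit_id not in (None, "", OMOP_NULL):
        unit = lookup(unit_id)
    elif unit_label not in (None, "", OMOP_NULL):
        # Assumption: keep the placeholder id and use the source text as label.
        unit = {"id": NOT_AVAILABLE["id"], "label": str(unit_label)}
    else:
        unit = NOT_AVAILABLE
    return {
        "assayCode": SPECIMEN_QUANTITY,
        "measurementValue": {"quantity": {"value": value, "unit": unit}},
    }
```

The raw quantity, unit, and source values stay in biosamples.info.SPECIMEN.OMOP_columns regardless of whether this promotion succeeds.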

When ontology ids are generated from OMOP CONCEPT.vocabulary_id and CONCEPT.concept_code, whitespace in the vocabulary prefix is replaced with underscores. For example, Type Concept becomes Type_Concept, producing ids such as Type_Concept:OMOP4976929.
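
A minimal sketch of this id construction follows. The helper name ontology_id is hypothetical. The text does not say whether runs of whitespace are collapsed, so this version replaces each whitespace character individually, which is an assumption.

```python
import re


def ontology_id(vocabulary_id: str, concept_code: str) -> str:
    """Build "<prefix>:<code>", replacing whitespace in the prefix with underscores."""
    prefix = re.sub(r"\s", "_", vocabulary_id)  # each whitespace character -> "_"
    return f"{prefix}:{concept_code}"


print(ontology_id("Type Concept", "OMOP4976929"))  # Type_Concept:OMOP4976929
```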

About exposures

Exposure terms are obtained from this CSV file. To use a different CSV file, pass it with the --exposures-file option.