
Command-Line Interface

Convert-Pheno includes a command-line utility for file-based conversions. This is the primary way most users work with the project.

The CLI is organized around one input format, one output format, and optional controls for BFF entities, mapping files, ontology search, streaming, and source provenance.

Start with One of These

PXF to BFF Command

Convert a Phenopacket to BFF individuals:

convert-pheno -ipxf phenopacket.json -obff individuals.json

Multi-Entity BFF Command

Convert a Phenopacket to multiple BFF entities:

convert-pheno -ipxf phenopacket.json -obff \
--entities individuals biosamples datasets cohorts \
--out-dir bff_out/

OMOP to BFF Command

Convert OMOP-CDM CSV tables to BFF:

convert-pheno -iomop PERSON.csv CONCEPT.csv CONDITION_OCCURRENCE.csv \
-obff individuals.json

Mapping-File Command

Convert a mapped CSV file and audit ontology searches:

convert-pheno -icsv clinical.csv \
--mapping-file mapping.yaml \
--search-audit-tsv search-audit.tsv \
-obff individuals.json

BFF to PXF Command

Convert BFF individuals to Phenopackets:

convert-pheno -ibff individuals.json -opxf phenopacket.json
Need a command for a specific route?

Use Conversion Recipes for short examples. Use this page when you want to understand the CLI model and option groups.

Command Model

Every command has three parts:

Part     Meaning                        Example
Input    What format is being read      -ipxf phenopacket.json
Output   What format is being written   -obff individuals.json
Options  Extra behavior                 --entities individuals biosamples --out-dir out/

The most important distinction is BFF output mode:

  • -obff FILE writes one individuals file.
  • -obff --entities ... --out-dir DIR writes one file per requested BFF entity.
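
The two modes can be sketched as a small shell helper. This is only an illustration: `bff_output_args` is a hypothetical function, not part of convert-pheno, and it prints the output arguments instead of running a conversion. The accepted entity names are the four listed in the Notes section of this page.

```shell
# Hypothetical helper (not part of convert-pheno): print the output arguments
# for one of the two BFF modes instead of running a conversion.
#   bff_output_args FILE             -> individuals-only mode
#   bff_output_args DIR ENTITY...    -> entity mode
bff_output_args() {
    if [ "$#" -eq 0 ]; then
        echo "usage: bff_output_args FILE | bff_output_args DIR ENTITY..." >&2
        return 1
    fi
    if [ "$#" -eq 1 ]; then
        # A single argument is treated as the individuals file.
        printf '%s\n' "-obff $1"
        return 0
    fi
    out_dir=$1
    shift
    # Reject anything other than the four documented BFF entities.
    for entity in "$@"; do
        case $entity in
            individuals|biosamples|datasets|cohorts) ;;
            *) echo "unsupported entity: $entity" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "-obff --entities $* --out-dir $out_dir"
}

bff_output_args individuals.json
bff_output_args bff_out/ individuals biosamples
```

The first call prints the individuals-only arguments and the second prints the entity-mode arguments, which can then be appended to an input such as `convert-pheno -ipxf phenopacket.json`.
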
Detailed CLI reference

Basic pattern

The command is organized around one input format and one output format:

convert-pheno -i <input-type> <infile> -o <output-type> <outfile> [options]

Both CLI styles are supported:

  • Generic form: -i pxf ... -o bff ...
  • Compact form: -ipxf ... -obff ...

The compact flags are still the ones most users rely on:

  • -ipxf, -ibff, -iomop, -iopenehr, -iredcap, -icdisc, -icsv
  • -obff, -opxf, -oomop, -ocsv, -ojsonf, -ojsonld
Note

openEHR input support is experimental. The CLI path targets EHRbase-style canonical JSON compositions and currently supports BFF and PXF output.

You can always check the current built-in help with:

convert-pheno --help

BFF output modes

BFF output has two explicit CLI forms:

  • Individuals-only output: -obff FILE
  • Entity-aware output: -obff --entities ... --out-dir DIR

In other words, --entities does not replace -obff. It refines which BFF entities are written after you have already selected BFF as the output format.

Common examples

Convert Phenopackets to the individuals-only BFF output:

convert-pheno -ipxf pxf.json -obff individuals.json

The same conversion with the generic form:

convert-pheno -i pxf pxf.json -o bff individuals.json

Convert Phenopackets to entity-aware BFF output:

convert-pheno -ipxf pxf.json -obff --entities individuals biosamples datasets cohorts --out-dir out/

Convert mapping-file input to individuals, datasets, and cohorts:

convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff --entities individuals datasets cohorts --out-dir out/

Convert both individuals and biosamples, while overriding the biosample filename:

convert-pheno -ipxf pxf.json -obff --entities individuals biosamples --out-dir out/ --out-name biosamples=samples.json

Convert OMOP SPECIMEN rows to Beacon biosamples:

convert-pheno -iomop PERSON.csv CONCEPT.csv SPECIMEN.csv -obff --entities biosamples --out-dir out/

Create a smaller BFF export without copied raw source payloads:

convert-pheno -iomop omop.sql -obff individuals.json --no-source-info

Convert a large OMOP SQL dump incrementally:

convert-pheno -iomop omop.sql.gz -obff individuals.json.gz --stream --ohdsi-db

Convert an openEHR patient envelope to BFF:

convert-pheno -i openehr patient-set.json -o bff individual.json

Convert an openEHR patient envelope to PXF:

convert-pheno -i openehr patient-set.json -o pxf phenopacket.json

Write a TSV audit of ontology lookups during a mapping-file conversion:

convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff individuals.json --search-audit-tsv search-audit.tsv

Convert BFF individuals to Phenopackets:

convert-pheno -ibff individuals.json -opxf pxf.json

Convert BFF individuals to Phenopackets while changing the fallback subject.vitalStatus.status used when no source value is available:

convert-pheno -ibff individuals.json -opxf pxf.json --default-vital-status UNKNOWN_STATUS

Notes

  • -obff FILE keeps the individuals-only BFF behavior.
  • BFF entity mode is explicit: use -obff --entities ... --out-dir DIR.
  • When PXF input contains biosamples, the individuals-only -obff FILE path still writes only individuals. In that mode, convert-pheno warns and preserves the biosamples under info.phenopacket.biosamples.
  • --entities is valid only with BFF output. The supported output entities are individuals, biosamples, datasets, and cohorts.
  • biosamples are currently emitted as first-class output from -ipxf input when biosample data is present, and from -iomop input that includes SPECIMEN rows (see the OMOP SPECIMEN example above).
  • datasets and cohorts are synthesized from the normalized individuals collection.
  • In mapping-file conversions, the top-level beacon section can override metadata for synthesized datasets and cohorts.
  • This mapping-based augmentation is currently available only for csv2bff, redcap2bff, and cdisc2bff, which are the routes that use a mapping file.
  • --entities narrows BFF output. It must be combined with -obff and --out-dir.
  • --out-name key=file lets you override one multi-file output name. Use entity keys for BFF entity mode and table keys for OMOP output.
  • --no-source-info omits raw source provenance copied into BFF info, such as OMOP_columns, CSV_columns, and REDCap_columns. Mapped fields and info.convertPheno are kept.
  • --search-audit-tsv FILE writes a tab-separated audit of ontology search results for mapping-file-driven conversions such as csv2bff, redcap2bff, and cdisc2bff. The audit records the effective configured search mode, whether each lookup matched the DB or fell back to NA, and the per-row lookup resolution (exact, similarity, or fallback_na).
  • --stream is mainly relevant for large OMOP inputs. Use --no-stream to force the default in-memory mode when a wrapper or previous option may have enabled streaming.

Important options

Mapping-file conversions

  • --mapping-file FILE supplies the YAML or JSON mapping file used by csv2bff, redcap2bff, cdisc2bff, and related conversions.
  • --redcap-dictionary FILE or -rcd FILE supplies the REDCap data dictionary required by REDCap and CDISC input conversions.
  • --schema-file FILE lets you validate mapping files against an alternative JSON Schema.
  • --self-validate-schema or -svs validates the mapping schema itself. This is mainly a check for schema authors and developers, and it may require SSL support in the Perl environment.
  • --search-audit-tsv FILE writes a TSV report of ontology lookups performed during mapping-file-driven conversions. The audit includes both row-level results and the effective search settings used for the run.
  • --print-hidden-labels or -phl preserves the original text labels, as they were before ontology mapping, in _label fields.
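
One way to get a quick overview of an audit written by --search-audit-tsv is to tally the values in a single column. The sketch below is an assumption-laden illustration: `tally_column` is a hypothetical helper, and the header names in the sample file (`term`, `resolution`) are invented placeholders, so check the header row of your own audit file for the real column names.

```shell
# Hypothetical helper: tally the values in one named column of a TSV file.
#   $1 = path to the TSV, $2 = column name exactly as it appears in the header
tally_column() {
    awk -F '\t' -v col="$2" '
        NR == 1 {
            # Find the position of the requested column in the header row.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        { count[$idx]++ }
        END { for (value in count) print value "\t" count[value] }
    ' "$1" | sort
}

# Tiny invented file so the helper can be tried without running a conversion.
# The header names are placeholders, not the real audit columns.
sample=$(mktemp)
printf 'term\tresolution\n'    >  "$sample"
printf 'asthma\texact\n'       >> "$sample"
printf 'astma\tsimilarity\n'   >> "$sample"
printf 'xyz\tfallback_na\n'    >> "$sample"
printf 'diabetes\texact\n'     >> "$sample"
tally_column "$sample" resolution
rm -f "$sample"
```

The helper prints one line per distinct value with its count, which makes a high share of fallback_na rows easy to spot before adjusting the search settings.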

Ontology search tuning

  • --search exact|mixed|fuzzy selects the ontology lookup strategy. Default: exact.
  • --text-similarity-method cosine|dice selects the token-similarity method used by mixed and fuzzy. Default: cosine.
  • --min-text-similarity-score FLOAT sets the minimum score accepted by mixed and fuzzy. Default: 0.8.
  • --levenshtein-weight FLOAT sets the normalized Levenshtein weight used by fuzzy. Default: 0.1.

For the search behavior itself, including examples and threshold tradeoffs, see the DB search explainer.
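
Because each tuning option applies only to some search modes, a wrapper script may want to pass just the flags that matter. The following is a minimal sketch under that assumption: `search_opts` is a hypothetical helper, and the values it prints are simply the defaults listed above, written out explicitly.

```shell
# Hypothetical helper: print only the tuning flags that apply to a search mode.
# The values are the defaults listed on this page, made explicit.
search_opts() {
    case $1 in
        exact)
            # exact uses neither the similarity options nor the Levenshtein weight.
            printf '%s\n' "--search exact" ;;
        mixed)
            printf '%s\n' "--search mixed --text-similarity-method cosine --min-text-similarity-score 0.8" ;;
        fuzzy)
            # fuzzy is the only mode that also uses --levenshtein-weight.
            printf '%s\n' "--search fuzzy --text-similarity-method cosine --min-text-similarity-score 0.8 --levenshtein-weight 0.1" ;;
        *)
            echo "unknown search mode: $1" >&2
            return 1 ;;
    esac
}

# Show the full command rather than running it.
opts=$(search_opts fuzzy) || exit 1
echo "convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff individuals.json $opts"
```

Writing the defaults out keeps a run reproducible even if the built-in defaults change later; adjust the numbers once you have inspected the results for your own data.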

OMOP-specific options

  • --ohdsi-db enables Athena-OHDSI lookup when OMOP data needs concepts not already present in the local export.
  • --path-to-ohdsi-db DIR points to the directory containing ohdsi.db.
  • --omop-tables TABLE ... restricts which OMOP-CDM tables are processed. CONCEPT and PERSON are always included.
  • --exposures-file FILE provides a CSV list of OMOP concept_id values to be treated as exposures.
  • --stream enables incremental OMOP processing for -iomop ... -obff output. Use --no-stream to explicitly keep the default non-streaming mode.
  • --sql2csv prints SQL tables instead of converting them.
  • --max-lines-sql N limits how many lines are read per SQL table. Default: 500.
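
Since --path-to-ohdsi-db must point to a directory that contains ohdsi.db, a script can check for the file before starting a long conversion. This is a sketch only: `ohdsi_opts` and the `OHDSI_DIR` variable are hypothetical names, and the command is printed rather than executed.

```shell
# Hypothetical preflight: confirm DIR contains ohdsi.db before passing
# --ohdsi-db --path-to-ohdsi-db DIR to convert-pheno.
ohdsi_opts() {
    if [ -f "$1/ohdsi.db" ]; then
        printf '%s\n' "--ohdsi-db --path-to-ohdsi-db $1"
    else
        echo "ohdsi.db not found in $1" >&2
        return 1
    fi
}

# OHDSI_DIR is a placeholder; point it at your own download location.
OHDSI_DIR=${OHDSI_DIR:-./ohdsi}
if opts=$(ohdsi_opts "$OHDSI_DIR" 2>/dev/null); then
    echo "convert-pheno -iomop omop.sql -obff individuals.json $opts"
else
    echo "skipping --ohdsi-db: ohdsi.db not found in $OHDSI_DIR"
fi
```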

openEHR-specific options

  • -iopenehr FILE ... or -i openehr FILE ... accepts openEHR JSON or YAML input as patient-bearing envelopes or composition sets.
  • openEHR input must carry a resolvable patient identifier in the payload or envelope; otherwise the conversion fails.
  • Multiple openEHR files are supported when patient identity can be resolved; multi-patient input is grouped automatically before mapping.
  • The current openEHR CLI path is experimental and currently supports BFF and PXF output.

General options

  • --separator CHAR or --sep CHAR overrides the CSV delimiter. For .csv files the default delimiter is a semicolon (;).
  • --username NAME or -u NAME overrides the username stored in conversion metadata.
  • --default-vital-status ALIVE|DECEASED|UNKNOWN_STATUS sets the fallback subject.vitalStatus.status used for PXF output when no source-derived value is available. Default: ALIVE.
  • --source-info / --no-source-info controls whether raw source payloads are preserved in BFF info. Default: --source-info.
  • --log [FILE] writes the resolved request/configuration JSON. If no filename is provided, the default is convert-pheno-log.json in --out-dir.
  • --color / --no-color controls colored terminal output. Default: --color.
  • --test suppresses time-varying metadata so generated files are stable for comparisons.
  • --verbose or -v prints progress information.
  • --debug LEVEL prints the resolved internal request and extra debugging output. With LEVEL >= 2, it also prints a compact SQLite lookup summary (requests, cache hits, DB lookups, search resolution, and SQL timings).
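
The --test option lends itself to simple regression checks. The sketch below assumes that --test output is byte-stable between runs, which this page implies but which you should confirm for your own routes. `same_output` is a hypothetical helper, and pxf.json, actual.json, and expected.json are placeholder file names.

```shell
# Hypothetical regression check built on --test. File names are placeholders.
same_output() {
    # cmp -s is silent and returns 0 only when both files are byte-identical.
    cmp -s "$1" "$2"
}

if command -v convert-pheno >/dev/null 2>&1 && [ -f pxf.json ] && [ -f expected.json ]; then
    convert-pheno -ipxf pxf.json -obff actual.json --test
    if same_output actual.json expected.json; then
        echo "output unchanged"
    else
        echo "output differs from expected.json"
    fi
else
    echo "convert-pheno or the sample files are missing; nothing to compare"
fi
```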

More help