Command-Line Interface
Convert-Pheno includes a command-line utility for file-based conversions. This is the primary way most users work with the project.
The CLI is organized around one input format, one output format, and optional controls for BFF entities, mapping files, ontology search, streaming, and source provenance.
Start with One of These
PXF to BFF Command
Convert a Phenopacket to BFF individuals:
convert-pheno -ipxf phenopacket.json -obff individuals.json
Multi-Entity BFF Command
Convert a Phenopacket to multiple BFF entities:
convert-pheno -ipxf phenopacket.json -obff \
--entities individuals biosamples datasets cohorts \
--out-dir bff_out/
OMOP to BFF Command
Convert OMOP-CDM CSV tables to BFF:
convert-pheno -iomop PERSON.csv CONCEPT.csv CONDITION_OCCURRENCE.csv \
-obff individuals.json
Mapping-File Command
Convert a mapped CSV file and audit ontology searches:
convert-pheno -icsv clinical.csv \
--mapping-file mapping.yaml \
--search-audit-tsv search-audit.tsv \
-obff individuals.json
BFF to PXF Command
Convert BFF individuals to Phenopackets:
convert-pheno -ibff individuals.json -opxf phenopacket.json
Use Conversion Recipes for short examples. Use this page when you want to understand the CLI model and option groups.
Command Model
Every command has three parts:
| Part | Meaning | Example |
|---|---|---|
| Input | What format is being read | -ipxf phenopacket.json |
| Output | What format is being written | -obff individuals.json |
| Options | Extra behavior | --entities individuals biosamples --out-dir out/ |
The most important distinction is BFF output mode:
- With -obff FILE, Convert-Pheno writes a single individuals file.
- With -obff --entities ... --out-dir DIR, it writes one file per requested BFF entity.
Detailed CLI reference
Basic pattern
The command is organized around one input format and one output format:
convert-pheno -i <input-type> <infile> -o <output-type> <outfile> [options]
Both CLI styles are supported:
- Generic form: -i pxf ... -o bff ...
- Compact form: -ipxf ... -obff ...
The compact flags are still the ones most users rely on:
- Input: -ipxf, -ibff, -iomop, -iopenehr, -iredcap, -icdisc, -icsv
- Output: -obff, -opxf, -oomop, -ocsv, -ojsonf, -ojsonld
openEHR input support is currently experimental. The CLI path targets EHRbase-style canonical JSON compositions and supports BFF and PXF output.
You can always check the current built-in help with:
convert-pheno --help
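Each invocation reads one input and writes one output. Because of that, converting a folder of Phenopackets comes down to a shell loop around the compact form. The sketch below is a minimal example and rests on some assumptions. The function name convert_pxf_dir, the .bff.json output suffix, and the CONVERT_PHENO override variable are all hypothetical conveniences and are not part of Convert-Pheno.

```bash
# Hypothetical helper (not part of Convert-Pheno): convert every Phenopacket
# JSON file in a directory to its own individuals-only BFF file.
# CONVERT_PHENO is an assumed override; set CONVERT_PHENO=echo for a dry run.
# Usage: convert_pxf_dir IN_DIR OUT_DIR
convert_pxf_dir() {
  local in_dir=$1 out_dir=$2 f
  local cmd=${CONVERT_PHENO:-convert-pheno}
  mkdir -p "$out_dir" || return 1
  for f in "$in_dir"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    # One input format and one output format per call.
    "$cmd" -ipxf "$f" -obff "$out_dir/$(basename "$f" .json).bff.json" || return 1
  done
}
```

For example, CONVERT_PHENO=echo convert_pxf_dir pxf/ bff/ prints the commands without running them. Dropping the variable runs the real conversions.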
BFF output modes
BFF output has two explicit CLI forms:
- Individuals-only output: -obff FILE
- Entity-aware output: -obff --entities ... --out-dir DIR
In other words, --entities does not replace -obff. It refines which BFF entities are written after you have already selected BFF as the output format.
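--entities only refines -obff and also needs --out-dir. A wrapper can therefore check the entity names before it calls the CLI. This is a hypothetical sketch. The function name pxf_to_bff_entities and the CONVERT_PHENO dry-run variable are assumptions. The accepted names are the four entities listed in the Notes section.

```bash
# Entity names accepted by BFF entity mode, as listed in the Notes section.
BFF_ENTITIES="individuals biosamples datasets cohorts"

# Hypothetical wrapper (not part of Convert-Pheno): validate entity names,
# then run a PXF -> BFF entity-mode conversion.
# Usage: pxf_to_bff_entities INPUT.json OUT_DIR ENTITY...
pxf_to_bff_entities() {
  local pxf=$1 out_dir=$2 e
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "at least one entity is required" >&2
    return 2
  fi
  for e in "$@"; do
    case " $BFF_ENTITIES " in
      *" $e "*) ;;                                      # supported entity
      *) echo "unsupported entity: $e" >&2; return 2 ;;
    esac
  done
  # --entities refines -obff and must be paired with --out-dir.
  "${CONVERT_PHENO:-convert-pheno}" -ipxf "$pxf" -obff --entities "$@" --out-dir "$out_dir"
}
```

Failing early on a misspelled entity name gives a clear message before any conversion work starts.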
Common examples
Convert Phenopackets to the individuals-only BFF output:
convert-pheno -ipxf pxf.json -obff individuals.json
The same conversion with the generic form:
convert-pheno -i pxf pxf.json -o bff individuals.json
Convert Phenopackets to entity-aware BFF output:
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples datasets cohorts --out-dir out/
Convert mapping-file input to individuals, datasets, and cohorts:
convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff --entities individuals datasets cohorts --out-dir out/
Convert both individuals and biosamples, while overriding the biosample filename:
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples --out-dir out/ --out-name biosamples=samples.json
Convert OMOP SPECIMEN rows to Beacon biosamples:
convert-pheno -iomop PERSON.csv CONCEPT.csv SPECIMEN.csv -obff --entities biosamples --out-dir out/
Create a smaller BFF export without copied raw source payloads:
convert-pheno -iomop omop.sql -obff individuals.json --no-source-info
Convert a large OMOP SQL dump incrementally:
convert-pheno -iomop omop.sql.gz -obff individuals.json.gz --stream --ohdsi-db
Convert an openEHR patient envelope to BFF:
convert-pheno -i openehr patient-set.json -o bff individual.json
Convert an openEHR patient envelope to PXF:
convert-pheno -i openehr patient-set.json -o pxf phenopacket.json
Write a TSV audit of ontology lookups during a mapping-file conversion:
convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff individuals.json --search-audit-tsv search-audit.tsv
Convert BFF individuals to Phenopackets:
convert-pheno -ibff individuals.json -opxf pxf.json
Convert BFF individuals to Phenopackets while changing the fallback subject.vitalStatus used when no source value is available:
convert-pheno -ibff individuals.json -opxf pxf.json --default-vital-status UNKNOWN_STATUS
Notes
- Plain -obff FILE keeps the individuals-only BFF behavior. BFF entity mode is also explicit: use -obff --entities ... --out-dir DIR.
- When PXF input contains biosamples, the individuals-only -obff FILE path still writes only individuals. In that mode, convert-pheno warns and preserves the biosamples under info.phenopacket.biosamples.
- --entities can be used with BFF output. The supported output entities are individuals, biosamples, datasets, and cohorts.
- The biosamples entity is currently emitted as first-class output from -ipxf input when biosample data is present.
- The datasets and cohorts entities are synthesized from the normalized individuals collection.
- In mapping-file conversions, the top-level beacon section can override metadata for synthesized datasets and cohorts.
- This mapping-based augmentation is currently available only for csv2bff, redcap2bff, and cdisc2bff, which are the routes that use a mapping file.
- --entities narrows BFF output. It must be combined with -obff and --out-dir.
- --out-name key=file lets you override one multi-file output name. Use entity keys for BFF entity mode and table keys for OMOP output.
- --no-source-info omits raw source provenance copied into BFF info, such as OMOP_columns, CSV_columns, and REDCap_columns. Mapped fields and info.convertPheno are kept.
- --search-audit-tsv FILE writes a tab-separated audit of ontology search results for mapping-file-driven conversions such as csv2bff, redcap2bff, and cdisc2bff. The audit includes the effective configured search mode, whether each lookup matched the DB or fell back to NA, and the per-row lookup resolution (exact, similarity, or fallback_na).
- --stream is mainly relevant for large OMOP inputs. Use --no-stream to force the default in-memory mode when a wrapper or previous option may have enabled streaming.
Important options
Mapping-file conversions
- --mapping-file FILE supplies the YAML or JSON mapping file used by csv2bff, redcap2bff, cdisc2bff, and related conversions.
- --redcap-dictionary FILE or -rcd FILE supplies the REDCap data dictionary required by REDCap and CDISC input conversions.
- --schema-file FILE lets you validate mapping files against an alternative JSON Schema.
- --self-validate-schema or -svs performs a self-validation of the mapping schema itself. This is mainly an author or development check and may require SSL support in the Perl environment.
- --search-audit-tsv FILE writes a TSV report of ontology lookups performed during mapping-file-driven conversions. The audit includes both row-level results and the effective search settings used for the run.
- --print-hidden-labels or -phl preserves original text labels before ontology mapping in _label fields.
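The search audit lists a resolution value for each row: exact, similarity, or fallback_na. That makes a quick check after a run possible. The helper below is a minimal sketch that counts rows whose lookup fell back to NA. Because the exact column layout is not described here, it assumes only that fallback_na shows up as a whole tab-separated field on those rows. The name count_fallbacks is hypothetical.

```bash
# Hypothetical helper: count audit rows whose ontology lookup fell back to NA.
# Assumption: the resolution appears as a whole tab-separated field. The
# column position is not assumed, so every field on each row is checked.
# Usage: count_fallbacks search-audit.tsv
count_fallbacks() {
  awk -F'\t' '
    { for (i = 1; i <= NF; i++) if ($i == "fallback_na") { n++; next } }
    END { print n + 0 }
  ' "$1"
}
```

A non-zero count suggests terms worth reviewing in the mapping file or retrying with a different --search mode.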
Ontology search tuning
- --search exact|mixed|fuzzy selects the ontology lookup strategy. Default: exact.
- --text-similarity-method cosine|dice selects the token-similarity method used by mixed and fuzzy. Default: cosine.
- --min-text-similarity-score FLOAT sets the minimum score accepted by mixed and fuzzy. Default: 0.8.
- --levenshtein-weight FLOAT sets the normalized Levenshtein weight used by fuzzy. Default: 0.1.
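--search changes only the lookup strategy. To see its effect, you can run the same mapping-file conversion once per mode and keep a separate audit for each run. This sketch assumes CSV input with a mapping file. The function name, the output file names, and the CONVERT_PHENO dry-run variable are hypothetical.

```bash
# Hypothetical helper: run one mapping-file conversion per search mode,
# writing a separate BFF file and search audit for each mode.
# Usage: compare_search_modes DATA.csv MAPPING.yaml OUT_DIR
compare_search_modes() {
  local csv=$1 mapping=$2 out_dir=$3 mode
  local cmd=${CONVERT_PHENO:-convert-pheno}
  mkdir -p "$out_dir" || return 1
  for mode in exact mixed fuzzy; do
    "$cmd" -icsv "$csv" --mapping-file "$mapping" \
      --search "$mode" \
      --search-audit-tsv "$out_dir/search-audit-$mode.tsv" \
      -obff "$out_dir/individuals-$mode.json" || return 1
  done
}
```

Unless you add --text-similarity-method or --min-text-similarity-score, the mixed and fuzzy runs use the documented defaults (cosine, 0.8).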
For the search behavior itself, including examples and threshold tradeoffs, see the DB search explainer.
OMOP-specific options
- --ohdsi-db enables Athena-OHDSI lookup when OMOP data needs concepts not already present in the local export.
- --path-to-ohdsi-db DIR points to the directory containing ohdsi.db.
- --omop-tables TABLE ... restricts which OMOP-CDM tables are processed. CONCEPT and PERSON are always included.
- --exposures-file FILE provides a CSV list of OMOP concept_id values to be treated as exposures.
- --stream enables incremental OMOP processing for -iomop ... -obff output. Use --no-stream to explicitly keep the default non-streaming mode.
- --sql2csv prints SQL tables instead of converting them.
- --max-lines-sql N limits how many lines are read per SQL table. Default: 500.
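--stream is opt-in, so a wrapper can turn it on only for inputs too large to process comfortably in memory. The sketch below assumes a single OMOP SQL dump as input. The 1 GB cutoff, the STREAM_THRESHOLD_BYTES variable, and the function name are hypothetical choices; adjust them for your machine.

```bash
# Hypothetical cutoff in bytes; tune it to the memory you have available.
STREAM_THRESHOLD_BYTES=${STREAM_THRESHOLD_BYTES:-1000000000}

# Hypothetical wrapper: OMOP SQL dump -> BFF individuals, streaming only
# when the input file is larger than the cutoff.
# Usage: omop_dump_to_bff DUMP.sql[.gz] OUT.json[.gz]
omop_dump_to_bff() {
  local dump=$1 out=$2 size mode
  [ -r "$dump" ] || { echo "cannot read $dump" >&2; return 1; }
  size=$(( $(wc -c < "$dump") ))   # arithmetic strips any padding from wc
  if [ "$size" -gt "$STREAM_THRESHOLD_BYTES" ]; then
    mode=--stream
  else
    mode=--no-stream               # explicit default in-memory mode
  fi
  "${CONVERT_PHENO:-convert-pheno}" -iomop "$dump" -obff "$out" "$mode"
}
```

For .gz dumps, the measured size is the compressed size. Because of that, a lower cutoff may suit compressed input.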
openEHR-specific options
- -iopenehr FILE ... or -i openehr FILE ... accepts openEHR JSON or YAML input as patient-bearing envelopes or composition sets.
- openEHR input must carry a resolvable patient identifier in the payload or envelope; otherwise the conversion fails.
- Multiple openEHR files are supported when patient identity can be resolved; multi-patient input is grouped automatically before mapping.
- The openEHR CLI path is experimental and currently supports BFF and PXF output.
General options
- --separator CHAR or --sep CHAR overrides the CSV delimiter. For .csv files, the default is ; (semicolon).
- --username NAME or -u NAME overrides the username stored in conversion metadata.
- --default-vital-status ALIVE|DECEASED|UNKNOWN_STATUS sets the fallback subject.vitalStatus.status used for PXF output when no source-derived value is available. Default: ALIVE.
- --source-info / --no-source-info controls whether raw source payloads are preserved in BFF info. Default: --source-info.
- --log [FILE] writes the resolved request/configuration JSON. If no filename is provided, the default is convert-pheno-log.json in --out-dir.
- --color / --no-color controls colored terminal output. Default: --color.
- --test suppresses time-varying metadata so generated files are stable for comparisons.
- --verbose or -v prints progress information.
- --debug LEVEL prints the resolved internal request and extra debugging output. With LEVEL >= 2, it also prints a compact SQLite lookup summary (requests, cache hits, DB lookups, search resolution, and SQL timings).
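The purpose of --test is to make repeated outputs comparable. As a minimal sketch of that comparison, the helper runs the same conversion twice and uses cmp to check that the files are byte-identical. The function name and the CONVERT_PHENO override are hypothetical. PXF input with individuals-only BFF output serves only as an example route.

```bash
# Hypothetical check: two --test runs of the same conversion should produce
# byte-identical output, because --test suppresses time-varying metadata.
# Usage: check_reproducible INPUT.json
check_reproducible() {
  local pxf=$1 tmp rc
  local cmd=${CONVERT_PHENO:-convert-pheno}
  tmp=$(mktemp -d) || return 1
  if "$cmd" -ipxf "$pxf" -obff "$tmp/run1.json" --test &&
     "$cmd" -ipxf "$pxf" -obff "$tmp/run2.json" --test &&
     cmp -s "$tmp/run1.json" "$tmp/run2.json"; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$tmp"   # clean up both runs
  return "$rc"
}
```

A non-zero return means either that a run failed or that the two outputs differ.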
More help
- Usage for more examples
- Download & Installation for setup
- Google Colab tutorial if you want a disposable environment