Command-line interface
Convert-Pheno includes a command-line utility for file-based conversions. This is the primary way most users work with the project.
Basic pattern
The command is organized around one input format and one output format.
Both CLI styles are supported:
- Generic form: -i pxf ... -o bff ...
- Compact form: -ipxf ... -obff ...
The compact flags are still the ones most users rely on:
- -ipxf, -ibff, -iomop, -iopenehr, -iredcap, -icdisc, -icsv
- -obff, -opxf, -oomop, -ocsv, -ojsonf, -ojsonld
Note
openEHR input support is experimental. The CLI path currently targets EHRbase-style canonical JSON compositions and supports BFF and PXF output.
You can always check the current built-in help with:
BFF output modes
BFF output has two explicit CLI forms:
- Individuals-only BFF output: -obff FILE
- Entity-aware output: -obff --entities ... --out-dir DIR
In other words, --entities does not replace -obff. It refines which BFF entities are written after you have already selected BFF as the output format.
Common examples
Convert Phenopackets to the individuals-only BFF output:
The same conversion with the generic form:
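As a sketch of both forms of this conversion, assuming the placeholder file names pxf.json and individuals.json: the `run` helper below is hypothetical and not part of Convert-Pheno. It only prints each command unless RUN=1 is set, so the example can be tried without touching any data.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Compact form: the format is part of the flag name.
run convert-pheno -ipxf pxf.json -obff individuals.json

# Generic form: the format is a separate argument after -i / -o.
run convert-pheno -i pxf pxf.json -o bff individuals.json
```

Setting RUN=1 in the environment executes the commands instead of printing them, which requires convert-pheno on the PATH.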
Convert Phenopackets to entity-aware BFF output:
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples datasets cohorts --out-dir out/
Convert a mapping-file workflow to individuals, datasets, and cohorts:
convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff --entities individuals datasets cohorts --out-dir out/
Convert both individuals and biosamples, while overriding the biosample filename:
convert-pheno -ipxf pxf.json -obff --entities individuals biosamples --out-dir out/ --out-name biosamples=samples.json
Convert OMOP SPECIMEN rows to Beacon biosamples:
Convert a large OMOP SQL dump incrementally:
Convert an openEHR patient envelope to BFF:
Convert an openEHR patient envelope to PXF:
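The two openEHR conversions might look like the sketch below. The file names (patient.json, individuals.json, pxf.json) are placeholders, and it is an assumption that -opxf takes an output file in the same way as -obff FILE. The `run` helper is hypothetical and prints each command unless RUN=1 is set. Because openEHR input is experimental, the built-in help remains the authority on the exact invocation.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# openEHR patient envelope to individuals-only BFF (placeholder file names).
run convert-pheno -iopenehr patient.json -obff individuals.json

# The same envelope to PXF (assumes -opxf accepts an output file).
run convert-pheno -iopenehr patient.json -opxf pxf.json
```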
Write a TSV audit of ontology lookups during a mapping-file conversion:
convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff individuals.json --search-audit-tsv search-audit.tsv
Convert BFF individuals to Phenopackets:
Convert BFF individuals to Phenopackets while changing the fallback subject.vitalStatus used when no source value is available:
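A sketch of the BFF to Phenopackets conversion, with and without the fallback override. The file names are placeholders, UNKNOWN_STATUS is one of the values the --default-vital-status option accepts, and the `run` helper is a hypothetical dry-run wrapper that prints each command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# BFF individuals to Phenopackets with the default fallback (ALIVE).
run convert-pheno -ibff individuals.json -opxf pxf.json

# The same conversion, falling back to UNKNOWN_STATUS when no source value exists.
run convert-pheno -ibff individuals.json -opxf pxf.json \
  --default-vital-status UNKNOWN_STATUS
```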
Notes
- -obff keeps the individuals-only BFF behavior.
- BFF entity mode is also explicit: use -obff --entities ... --out-dir DIR.
- When PXF input contains biosamples, the individuals-only -obff FILE path still writes only individuals. In that mode, convert-pheno warns and preserves the biosamples under info.phenopacket.biosamples.
- --entities can be used with BFF output. The supported output entities are individuals, biosamples, datasets, and cohorts.
- biosamples are currently emitted as first-class output from -ipxf input when biosample data is present.
- datasets and cohorts are synthesized from the normalized individuals collection.
- In mapping-file workflows, the top-level beacon section can override metadata for synthesized datasets and cohorts.
- This mapping-based augmentation is currently available only for csv2bff, redcap2bff, and cdisc2bff, which are the routes that use a mapping file.
- --entities narrows BFF output. It must be combined with -obff and --out-dir.
- --out-name key=file lets you override one multi-file output name. Use entity keys for BFF entity mode and table keys for OMOP output.
- --search-audit-tsv FILE writes a tab-separated audit of ontology search results for mapping-file-driven conversions such as csv2bff, redcap2bff, and cdisc2bff, including the effective configured search mode, whether each lookup matched the DB or fell back to NA, and the per-row lookup resolution (exact, similarity, or fallback_na).
- --stream is mainly relevant for large OMOP inputs.
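The rule that --entities must be combined with -obff and --out-dir can be checked in a wrapper script before the tool is called. The function below is a hypothetical helper, not part of Convert-Pheno, and it is deliberately simplified: it recognizes -obff and -o bff, but not forms such as --out-dir=DIR.

```shell
# Hypothetical pre-flight check (not part of convert-pheno).
# Returns 0 if the argument list respects the --entities rule, 1 otherwise.
check_entities_args() {
  has_entities=0; has_obff=0; has_outdir=0; prev=""
  for arg in "$@"; do
    case "$arg" in
      --entities) has_entities=1 ;;
      --out-dir)  has_outdir=1 ;;
      -obff)      has_obff=1 ;;
      bff)        if [ "$prev" = "-o" ]; then has_obff=1; fi ;;
    esac
    prev=$arg
  done
  if [ "$has_entities" -eq 1 ]; then
    if [ "$has_obff" -eq 0 ] || [ "$has_outdir" -eq 0 ]; then
      echo "error: --entities must be combined with -obff and --out-dir" >&2
      return 1
    fi
  fi
  return 0
}

# Valid: entity mode with BFF output and an output directory.
check_entities_args -ipxf pxf.json -obff --entities individuals biosamples --out-dir out/ \
  && echo "ok"

# Invalid: --entities without -obff or --out-dir.
check_entities_args -ipxf pxf.json --entities individuals 2>/dev/null \
  || echo "rejected"
```

An individuals-only call such as -obff FILE passes the check unchanged, because the rule only applies when --entities is present.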
Important options
Mapping-file conversions
- --mapping-file FILE supplies the YAML or JSON mapping file used by csv2bff, redcap2bff, cdisc2bff, and related conversions.
- --redcap-dictionary FILE or -rcd FILE supplies the REDCap data dictionary required by REDCap and CDISC input workflows.
- --schema-file FILE lets you validate mapping files against an alternative JSON Schema.
- --self-validate-schema or -svs performs a self-validation of the mapping schema itself. This is mainly an author or development check and may require SSL support in the Perl environment.
- --search-audit-tsv FILE writes a TSV report of ontology lookups performed during mapping-file-driven conversions. The audit includes both row-level results and the effective search settings used for the run.
- --print-hidden-labels or -phl preserves original text labels before ontology mapping in _label fields.
Ontology search tuning
- --search exact|mixed|fuzzy selects the ontology lookup strategy. Default: exact.
- --text-similarity-method cosine|dice selects the token-similarity method used by mixed and fuzzy. Default: cosine.
- --min-text-similarity-score FLOAT sets the minimum score accepted by mixed and fuzzy. Default: 0.8.
- --levenshtein-weight FLOAT sets the normalized Levenshtein weight used by fuzzy. Default: 0.1.
For the search behavior itself, including examples and threshold tradeoffs, see the DB search explainer.
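The tuning options can be combined in a single mapping-file run, as in the sketch below. The file names and the numeric values (0.9 and 0.2) are illustrative assumptions, not recommendations, and the `run` helper is a hypothetical dry-run wrapper that prints the command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Fuzzy lookup with dice similarity, a stricter threshold than the 0.8 default,
# and a Levenshtein weight above the 0.1 default (illustrative values).
run convert-pheno -icsv data.csv --mapping-file mapping.yaml -obff individuals.json \
  --search fuzzy \
  --text-similarity-method dice \
  --min-text-similarity-score 0.9 \
  --levenshtein-weight 0.2
```

Adding --search-audit-tsv FILE to such a run records how each lookup was resolved, which makes it easier to compare threshold choices.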
OMOP-specific options
- --ohdsi-db enables Athena-OHDSI lookup when OMOP data needs concepts not already present in the local export.
- --path-to-ohdsi-db DIR points to the directory containing ohdsi.db.
- --omop-tables TABLE ... restricts which OMOP-CDM tables are processed, while CONCEPT and PERSON stay included.
- --exposures-file FILE provides a CSV list of OMOP concept_id values to be treated as exposures.
- --stream enables incremental OMOP processing for individuals-only -obff output.
- --sql2csv prints SQL tables instead of converting them.
- --max-lines-sql N limits how many lines are read per SQL table. Default: 500.
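A streaming run over a large SQL dump might be sketched as follows. The file names are placeholders, DRUG_EXPOSURE and MEASUREMENT are standard OMOP-CDM table names chosen only for illustration, and it is an assumption that --stream and --omop-tables can be combined in one call. The `run` helper is hypothetical and prints the command unless RUN=1 is set.

```shell
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Incremental processing into individuals-only BFF output,
# restricted to two OMOP-CDM tables (CONCEPT and PERSON stay included).
run convert-pheno -iomop omop_dump.sql -obff individuals.json \
  --stream \
  --omop-tables DRUG_EXPOSURE MEASUREMENT
```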
openEHR-specific options
- -iopenehr FILE ... or -i openehr FILE ... accepts openEHR JSON or YAML input as patient-bearing envelopes or composition sets.
- openEHR input must carry a resolvable patient identifier in the payload or envelope; otherwise the conversion fails.
- Multiple openEHR files are supported when patient identity can be resolved; multi-patient input is grouped automatically before mapping.
- The current openEHR CLI path is experimental and currently supports BFF and PXF output.
General options
- --separator CHAR or --sep CHAR overrides the CSV delimiter. For .csv files the default remains ; (semicolon).
- --username NAME or -u NAME overrides the username stored in conversion metadata.
- --default-vital-status ALIVE|DECEASED|UNKNOWN_STATUS sets the fallback subject.vitalStatus.status used for PXF output when no source-derived value is available. Default: ALIVE.
- --test suppresses time-varying metadata so generated files are stable for comparisons.
- --verbose or -v prints progress information.
- --debug LEVEL prints the resolved internal request and extra debugging output. With LEVEL >= 2, it also prints a compact SQLite lookup summary (requests, cache hits, DB lookups, search resolution, and SQL timings).
More help
- Usage for more examples
- Download & Installation for setup
- Google Colab tutorial if you want a disposable environment