Phenopackets v2
PXF stands for Phenotype eXchange Format. See the Phenopackets v2 documentation for the full specification.

Phenopackets organize information into top-level elements. Pheno-Ranker specifically processes data from the Phenopacket element, serialized as PXF.
PXF As Input
The examples below show the minimal command-line patterns. For the complete CLI reference, see Usage.
What happens with deeply nested arrays such as interpretations.diagnosis.genomicInterpretations?
The genomicInterpretations property needs special handling for several reasons. It can contain multiple nested levels of arrays, the key "id" may refer to a given patient, and the key subjectOrBiosampleId refers to that same patient as well. Users are usually interested in the variants, but because these patient identifiers end up in the flattened keys, such keys will never match those of another patient.
Pheno-Ranker handles this automatically for the interpretations term through a dedicated PXF-specific transformation: it converts these arrays into objects keyed by identifiers taken from the record content (in the example below, the disease id and the HGNC gene id), so the patient-specific identifiers no longer appear in the flattened keys.
Imagine you have PXF data that looks like this:
{
  "id": "Sample_1",
  "interpretations": [
    {
      "id": "Interpretation_1",
      "progressStatus": "SOLVED",
      "diagnosis": {
        "disease": {
          "id": "OMIM:148600",
          "label": "Disease 1"
        },
        "genomicInterpretations": [
          {
            "subjectOrBiosampleId": "Subject_1",
            "interpretationStatus": "CAUSATIVE",
            "variantInterpretation": {
              "variationDescriptor": {
                "geneContext": {
                  "valueId": "HGNC:25662",
                  "symbol": "AAGAB"
                }
              }
            }
          }
        ]
      }
    }
  ],
  "subject": {
    "id": "Subject_1"
  }
}
The processed JSON will look like this:
{
  "id": "Sample_1",
  "interpretations": {
    "OMIM:148600": {
      "genomicInterpretations": {
        "HGNC:25662": {
          "interpretationStatus": "CAUSATIVE",
          "variantInterpretation": {
            "variationDescriptor": {
              "geneContext": {
                "symbol": "AAGAB",
                "valueId": "HGNC:25662"
              }
            }
          }
        }
      },
      "progressStatus": "SOLVED"
    }
  },
  "subject": {
    "id": "Subject_1"
  }
}
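The reshaping shown above can be sketched in Python. This is a hypothetical reconstruction based only on the example, not Pheno-Ranker's own code: the function name `reshape_interpretations` is made up, and the keying rules (disease id for each interpretation, `geneContext.valueId` for each genomic interpretation, array position as a fallback when an id is missing) are assumptions inferred from the input and output shown.

```python
import copy


def reshape_interpretations(pxf):
    """Return a copy of a phenopacket whose `interpretations` array is re-keyed
    as nested objects (hypothetical sketch inferred from the example)."""
    out = copy.deepcopy(pxf)
    reshaped = {}
    for i, interp in enumerate(pxf.get("interpretations", [])):
        diagnosis = interp.get("diagnosis", {})
        # Assumption: key each interpretation by its disease id, else its position.
        disease_id = diagnosis.get("disease", {}).get("id") or str(i)
        entry = {}
        if "progressStatus" in interp:
            entry["progressStatus"] = interp["progressStatus"]
        genomic = {}
        for j, gi in enumerate(diagnosis.get("genomicInterpretations", [])):
            # Assumption: key each genomic interpretation by its gene id.
            gene_id = (
                gi.get("variantInterpretation", {})
                .get("variationDescriptor", {})
                .get("geneContext", {})
                .get("valueId")
            ) or str(j)
            # Drop the patient-specific identifier so keys can match across patients.
            genomic[gene_id] = {
                k: copy.deepcopy(v) for k, v in gi.items() if k != "subjectOrBiosampleId"
            }
        if genomic:
            entry["genomicInterpretations"] = genomic
        reshaped[disease_id] = entry
    if "interpretations" in pxf:
        out["interpretations"] = reshaped
    return out
```

Like the example, the sketch discards the interpretation `id`, the `diagnosis` wrapper and the disease label; the input dictionary is left unmodified.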
From here, Pheno-Ranker proceeds as usual. The resulting flattened keys look like this:
"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.interpretationStatus.CAUSATIVE" : 1,
"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.variantInterpretation.variationDescriptor.geneContext.symbol.AAGAB" : 1,
"interpretations.OMIM:148600.progressStatus.SOLVED" : 1,
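A minimal sketch of the flattening step, assuming each leaf becomes a dot-joined path that ends with its value and is set to 1. The function name `flatten` is hypothetical. Applied to the processed JSON it also yields a `...geneContext.valueId.HGNC:25662` key that the list above does not show, so Pheno-Ranker presumably applies its own key exclusions on top of this; treat the sketch as illustrative only.

```python
import json


def flatten(node, prefix=""):
    """Flatten nested JSON into {"path.to.key.value": 1} entries.

    Dict keys and list positions become path segments; each scalar leaf is
    appended to its path as the final segment.
    """
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}.{index}" if prefix else str(index)))
    else:
        # Strings are used as-is; other scalars use their JSON spelling (true, 1.5, null).
        leaf = node if isinstance(node, str) else json.dumps(node)
        flat[f"{prefix}.{leaf}"] = 1
    return flat
```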
Other examples of PXF nested array properties
From v1.08 onward, users no longer need to transpose or manually rewrite nested arrays before comparison. Pheno-Ranker automatically canonicalizes the remaining nested arrays based on their meaningful content, which avoids differences caused only by array order in complex PXF properties such as:
"biosamples.diagnosticMarkers",
"biosamples.pathologicalTnmFinding",
"biosamples.phenotypicFeatures",
"diseases.clinicalTnmFinding",
"diseases.diseaseStage",
"measurements.complexValue.typedQuantities",
"medicalActions.treatment.doseIntervals"
If a nested object has no usable content after filtering, Pheno-Ranker keeps its numeric position instead of guessing an identity. You can still filter out noisy variables with the configuration file when they are not useful for similarity.
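One way to canonicalize an array by its content is sketched below, under stated assumptions: empty values are stripped, each remaining element is keyed by its sorted JSON serialization, and elements with no usable content keep their numeric position. The names `usable` and `canonicalize` are hypothetical, and Pheno-Ranker's actual identity rules may differ.

```python
import json

EMPTY = (None, "", [], {})


def usable(value):
    """Recursively drop nulls, empty strings, empty lists and empty dicts."""
    if isinstance(value, dict):
        cleaned = {k: usable(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY}
    if isinstance(value, list):
        cleaned = [usable(v) for v in value]
        return [v for v in cleaned if v not in EMPTY]
    return value


def canonicalize(items):
    """Key each array element by its content so that array order stops mattering."""
    keyed = {}
    for position, item in enumerate(items):
        content = usable(item)
        if content in EMPTY:
            key = str(position)  # nothing usable: keep the numeric position
        else:
            key = json.dumps(content, sort_keys=True)
        keyed[key] = item
    # Sorting the keys gives the same order however the input was ordered.
    return dict(sorted(keyed.items()))
```

In this sketch two identical elements collapse into a single key; a real implementation has to decide how to treat such duplicates.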
Cohort mode
Basic run:
pheno-ranker -r pxf.json
The default output is named matrix.txt. It is an N x N matrix with pairwise comparisons for all individuals.
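To make the N x N output concrete, the sketch below assumes each individual has already been reduced to the set of its flattened keys and uses the Hamming distance, which for binary presence/absence vectors equals the size of the symmetric difference of the two key sets. The function name is hypothetical, and the sketch does not reproduce Pheno-Ranker's own options or the layout of matrix.txt.

```python
def distance_matrix(individuals):
    """individuals maps an id to the set of that individual's flattened keys.

    Returns the ids and an N x N list of Hamming distances, where the distance
    between two key sets is the number of keys present in only one of them.
    """
    ids = list(individuals)
    matrix = [
        [len(individuals[a] ^ individuals[b]) for b in ids]
        for a in ids
    ]
    return ids, matrix
```

The matrix is symmetric with zeros on the diagonal, since every individual is at distance 0 from itself.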
Test dataset
We are going to use data from the phenopacket-store repository:
wget https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.zip
unzip all_phenopackets.zip
Instead of using all of the more than 5K phenopackets, we will work with a random subset of 50, combined into a single JSON array:
# sudo apt install jq
jq -s '.' $(ls -1 */*json | shuf -n 50) > combined.json
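If jq or shuf is not available, the same random subset can be assembled with the Python standard library. This is an illustrative alternative, not part of Pheno-Ranker; the helper names `pick_files`, `combine` and `write_subset` are hypothetical, and the glob pattern simply mirrors `*/*json` from the command above.

```python
import glob
import json
import random


def pick_files(paths, n, seed=None):
    """Pick up to n distinct paths at random, like `shuf -n`."""
    paths = list(paths)
    rng = random.Random(seed)
    return rng.sample(paths, min(n, len(paths)))


def combine(paths):
    """Load each JSON file into one list, like `jq -s '.'`."""
    documents = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            documents.append(json.load(handle))
    return documents


def write_subset(pattern="*/*json", n=50, out="combined.json", seed=None):
    """Write a random subset of matching files to `out` as a JSON array."""
    chosen = pick_files(sorted(glob.glob(pattern)), n, seed)
    with open(out, "w", encoding="utf-8") as handle:
        json.dump(combine(chosen), handle)
    return chosen
```

For example, calling `write_subset(n=50, seed=1)` from the unzipped directory writes combined.json; passing a seed makes the selection reproducible, unlike the plain shuf call above.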
And now we perform the calculation:
pheno-ranker -r combined.json -include-terms interpretations
For more information visit the cohort mode page.
Patient mode
Basic run:
pheno-ranker -r pxf.json -t patient.json
The output is printed to STDOUT and saved to a file named rank.txt. The individuals in the reference cohort (-r) are sorted by their Hamming distance to the target patient (-t).
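The ranking step can be sketched in the same spirit: each reference individual is scored by the Hamming distance (the size of the symmetric difference of flattened key sets) to the target patient, then the list is sorted in ascending order. The function name `rank` and the tie-breaking by id are assumptions for illustration, not Pheno-Ranker's documented behaviour.

```python
def rank(reference, target):
    """Sort reference individuals by Hamming distance to the target patient.

    reference maps an id to a set of flattened keys; target is a set of keys.
    Returns (distance, id) pairs, closest first, with ties broken by id.
    """
    scored = [(len(keys ^ target), ident) for ident, keys in reference.items()]
    return sorted(scored)
```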
For more information visit the patient mode page.