PXF

PXF stands for Phenotype eXchange Format. See the Phenopackets v2 documentation.

Phenopackets v2

Figure extracted from www.ga4gh.org

Phenopackets organize information using top-level elements. Our software, Pheno-Ranker, specifically processes data from the Phenopacket element, serialized in PXF format.

Browsing PXF JSON data

You can browse a public Phenopackets v2 file with one of the following JSON viewers:

PXF (Phenopacket top-element) as input

When using the pheno-ranker command-line interface, simply ensure the correct syntax is provided.

What happens with deeply nested arrays such as interpretations.diagnosis.genomicInterpretations?

The property genomicInterpretations is peculiar for several reasons. It can contain multiple nested levels or arrays, and both the key "id" and the key subjectOrBiosampleId may refer to a specific patient. Users are typically interested in the variants, but because the patient ids end up in the flattened keys, those keys will never match another patient.

Pheno-Ranker handles this for you for the term interpretations. The approach is to convert array properties into objects keyed by an identifier that is shared across patients.

Imagine you have PXF data that looks like this:

{
   "id": "Sample_1",
   "interpretations": [
     {
       "id": "Interpretation_1",
       "progressStatus": "SOLVED",
       "diagnosis": {
         "disease": {
           "id": "OMIM:148600",
           "label": "Disease 1"
         },
         "genomicInterpretations": [
           {
             "subjectOrBiosampleId": "Subject_1",
             "interpretationStatus": "CAUSATIVE",
             "variantInterpretation": {
               "variationDescriptor": {
                 "geneContext": {
                   "valueId": "HGNC:25662",
                   "symbol": "AAGAB"
                 }
               }
             }
           }
         ]
       }
     }
   ],
   "subject": {
     "id": "Subject_1"
   }
}

The processed JSON will look like this:

{
    "id": "Sample_1",
    "interpretations": {
        "OMIM:148600": {
            "genomicInterpretations": {
                "HGNC:25662": {
                    "interpretationStatus": "CAUSATIVE",
                    "variantInterpretation": {
                        "variationDescriptor": {
                            "geneContext": {
                                "symbol": "AAGAB",
                                "valueId": "HGNC:25662"
                            }
                        }
                    }
                }
            },
            "progressStatus": "SOLVED"
        }
    },
    "subject": {
        "id": "Subject_1"
    }
}
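The conversion shown above can be pictured with a minimal Python sketch. It is not Pheno-Ranker's own code: the function name `rekey_interpretations` is hypothetical, and the choice of keys (disease id for each interpretation, geneContext.valueId for each genomic interpretation, with the interpretation "id" and subjectOrBiosampleId dropped) is inferred from the example input and output only.

```python
import json

def rekey_interpretations(pxf: dict) -> dict:
    """Return a shallow copy of `pxf` with `interpretations` turned into nested objects.

    Assumption (inferred from the example): interpretations are keyed by
    diagnosis.disease.id and genomicInterpretations by geneContext.valueId.
    """
    by_disease = {}
    for interp in pxf.get("interpretations", []):
        diagnosis = interp.get("diagnosis", {})
        disease_id = diagnosis.get("disease", {}).get("id")
        if disease_id is None:
            continue  # assumption: entries without a disease id are skipped
        by_gene = {}
        for gi in diagnosis.get("genomicInterpretations", []):
            gene_id = (
                gi.get("variantInterpretation", {})
                .get("variationDescriptor", {})
                .get("geneContext", {})
                .get("valueId")
            )
            if gene_id is None:
                continue  # assumption: entries without a gene id are skipped
            # Drop the patient-specific id so keys can match across patients.
            by_gene[gene_id] = {
                k: v for k, v in gi.items() if k != "subjectOrBiosampleId"
            }
        entry = {"genomicInterpretations": by_gene}
        if "progressStatus" in interp:
            entry["progressStatus"] = interp["progressStatus"]
        by_disease[disease_id] = entry

    out = dict(pxf)  # keeps the original key order
    if "interpretations" in pxf:
        out["interpretations"] = by_disease
    return out

# The example input from this page.
example = {
    "id": "Sample_1",
    "interpretations": [
        {
            "id": "Interpretation_1",
            "progressStatus": "SOLVED",
            "diagnosis": {
                "disease": {"id": "OMIM:148600", "label": "Disease 1"},
                "genomicInterpretations": [
                    {
                        "subjectOrBiosampleId": "Subject_1",
                        "interpretationStatus": "CAUSATIVE",
                        "variantInterpretation": {
                            "variationDescriptor": {
                                "geneContext": {
                                    "valueId": "HGNC:25662",
                                    "symbol": "AAGAB",
                                }
                            }
                        },
                    }
                ],
            },
        }
    ],
    "subject": {"id": "Subject_1"},
}

processed = rekey_interpretations(example)
print(json.dumps(processed, indent=4))
```

If two entries share the same disease or gene id, the later one overwrites the earlier one in this sketch; how Pheno-Ranker resolves such collisions is not covered here.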

Now you can run Pheno-Ranker as usual. The flattened keys will look like this:

"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.interpretationStatus.CAUSATIVE" : 1,
"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.variantInterpretation.variationDescriptor.geneContext.symbol.AAGAB" : 1,
"interpretations.OMIM:148600.progressStatus.SOLVED" : 1,
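As a rough illustration of how such flattened keys arise, the sketch below joins every path from the root to a leaf, including the leaf value, with dots and assigns it the value 1. The function name `flatten` is hypothetical and the handling of lists by index is an assumption; Pheno-Ranker may filter or treat some keys differently, so this sketch emits every leaf, including ones not listed in the excerpt above.

```python
def flatten(node, prefix=""):
    """Map each root-to-leaf path (leaf value included) to 1."""
    keys = {}
    if isinstance(node, dict):
        for name, value in node.items():
            keys.update(flatten(value, f"{prefix}{name}."))
    elif isinstance(node, list):
        # Assumption: list items are addressed by their position.
        for index, value in enumerate(node):
            keys.update(flatten(value, f"{prefix}{index}."))
    else:
        keys[f"{prefix}{node}"] = 1
    return keys

# The "interpretations" part of the processed JSON from this page.
processed = {
    "interpretations": {
        "OMIM:148600": {
            "genomicInterpretations": {
                "HGNC:25662": {
                    "interpretationStatus": "CAUSATIVE",
                    "variantInterpretation": {
                        "variationDescriptor": {
                            "geneContext": {
                                "symbol": "AAGAB",
                                "valueId": "HGNC:25662",
                            }
                        }
                    },
                }
            },
            "progressStatus": "SOLVED",
        }
    }
}

flat = flatten(processed)
for key in sorted(flat):
    print(f'"{key}" : {flat[key]},')
```

Because the patient-specific ids are no longer part of any path, two patients with the same disease and gene produce identical keys.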

Other examples of PXF nested array properties

Below are other examples of deeply nested properties. For these, you have to pre-process your data yourself:

"biosamples.diagnosticMarkers",
"biosamples.pathologicalTnmFinding",
"biosamples.phenotypicFeatures",
"diseases.clinicalTnmFinding",
"diseases.diseaseStage",
"measurements.complexValue.typedQuantities",
"medicalActions.treatment.doseIntervals"
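One possible way to pre-process such a property is to apply the same array-to-object idea yourself. The sketch below is only an assumption about what a suitable pre-processing step could look like: the helper `rekey_array` is hypothetical, the choice of `type.id` as the key for typedQuantities is illustrative, and the ontology ids and values in the sample record are placeholders, not real data.

```python
def rekey_array(items, id_path):
    """Turn a list of objects into a dict keyed by the value found at `id_path`.

    `id_path` is a sequence of keys to follow inside each item,
    e.g. ("type", "id"). Later duplicates overwrite earlier ones.
    """
    keyed = {}
    for item in items:
        key = item
        for part in id_path:
            key = key[part]
        keyed[key] = item
    return keyed

# Placeholder measurement; ids and values are made up for illustration.
measurement = {
    "complexValue": {
        "typedQuantities": [
            {"type": {"id": "EX:0001"}, "quantity": {"value": 120}},
            {"type": {"id": "EX:0002"}, "quantity": {"value": 80}},
        ]
    }
}

complex_value = measurement["complexValue"]
complex_value["typedQuantities"] = rekey_array(
    complex_value["typedQuantities"], ("type", "id")
)
print(sorted(complex_value["typedQuantities"]))
```

Which field makes a good key depends on the property; it should identify the item without embedding anything specific to one patient.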

Basic run (cohort mode):

pheno-ranker -r pxf.json

The default output is named matrix.txt and contains an N x N matrix with a pairwise comparison of all individuals.
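To make the idea of an N x N pairwise comparison concrete, here is a small Python sketch. It assumes that each individual is represented by its set of flattened keys and that the pairwise measure is a Hamming distance (the number of keys present in exactly one of the two individuals). The individual ids and keys are placeholders, and neither the function names nor the layout correspond to the actual matrix.txt file.

```python
def hamming(keys_a, keys_b):
    """Count the keys present in exactly one of the two individuals."""
    return len(set(keys_a) ^ set(keys_b))

def pairwise_matrix(cohort):
    """Return (ids, matrix) where matrix[i][j] compares ids[i] with ids[j]."""
    ids = list(cohort)
    matrix = [[hamming(cohort[a], cohort[b]) for b in ids] for a in ids]
    return ids, matrix

# Placeholder cohort: individual id -> set of flattened keys.
cohort = {
    "P1": {"a", "b"},
    "P2": {"a", "c"},
    "P3": {"a", "b"},
}

ids, matrix = pairwise_matrix(cohort)
for name, row in zip(ids, matrix):
    print(name, row)
```

The matrix is symmetric with zeros on the diagonal, since every individual is at distance 0 from itself.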

Test dataset

We are going to use data from the phenopacket-store repository:

wget https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.zip
unzip all_phenopackets.zip

Instead of using all of the more than 5,000 examples, we will work with a random subset of 50, consolidated into a single array:

# Requires jq (for example: sudo apt install jq)
jq -s '.' $(ls -1 */*json | shuf -n 50) > combined.json

And now we perform the calculation:

pheno-ranker -r combined.json -include-terms interpretations

For more information visit the cohort mode page.

Basic run (patient mode):

pheno-ranker -r pxf.json -t patient.json

The output will be printed to STDOUT and to a file named rank.txt. The matching individuals will be sorted according to their Hamming distance to the reference patient.
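The sorting by Hamming distance can be pictured with the following Python sketch. It assumes each individual is encoded as a binary vector over the union of all flattened keys (1 if the key is present, 0 otherwise); the function names, ids and keys are placeholders, and the sketch does not reproduce the columns of rank.txt.

```python
def to_vector(record_keys, vocabulary):
    """Encode a set of keys as a 0/1 vector over a fixed key order."""
    return [1 if key in record_keys else 0 for key in vocabulary]

def rank_against_target(cohort, target):
    """Return (id, distance) pairs sorted by Hamming distance to `target`."""
    vocabulary = sorted(set(target).union(*cohort.values()))
    target_vec = to_vector(target, vocabulary)
    ranked = []
    for individual_id, keys in cohort.items():
        vec = to_vector(keys, vocabulary)
        # Hamming distance: number of positions where the vectors differ.
        distance = sum(x != y for x, y in zip(vec, target_vec))
        ranked.append((individual_id, distance))
    return sorted(ranked, key=lambda pair: pair[1])

# Placeholder data: individual id -> set of flattened keys.
cohort = {
    "A": {"k1", "k2"},
    "B": {"k1", "k3", "k4"},
    "C": {"k1", "k2", "k5"},
}
target = {"k1", "k2"}

ranking = rank_against_target(cohort, target)
for individual_id, distance in ranking:
    print(individual_id, distance)
```

A smaller distance means the individual shares more keys with the patient and has fewer keys that the patient lacks.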

For more information visit the patient mode page.