PXF
PXF stands for Phenotype eXchange Format. For details, see the Phenopackets v2 documentation.
Phenopackets organize information using top-level elements. Our software, Pheno-Ranker, specifically processes data from the Phenopacket element, serialized in PXF format.
Browsing PXF JSON data
You can browse a public Phenopackets v2 file with one of the following JSON viewers:
PXF (Phenopacket top-element) as input
When using the pheno-ranker command-line interface, simply ensure the correct syntax is provided.
What happens with deeply nested arrays such as interpretations.diagnosis.genomicInterpretations?
The property genomicInterpretations is peculiar for several reasons. It can contain multiple nested levels or arrays, and both the key "id" and the key subjectOrBiosampleId may refer to a given patient. Users might be interested in the variants, but because patient ids end up in the flattened key, such a key will never match another patient.
Pheno-Ranker handles this for you for the term interpretations. The approach is to convert array data structures into objects.
Imagine you have PXF data that looks like this:
{
  "id": "Sample_1",
  "interpretations": [
    {
      "id": "Interpretation_1",
      "progressStatus": "SOLVED",
      "diagnosis": {
        "disease": {
          "id": "OMIM:148600",
          "label": "Disease 1"
        },
        "genomicInterpretations": [
          {
            "subjectOrBiosampleId": "Subject_1",
            "interpretationStatus": "CAUSATIVE",
            "variantInterpretation": {
              "variationDescriptor": {
                "geneContext": {
                  "valueId": "HGNC:25662",
                  "symbol": "AAGAB"
                }
              }
            }
          }
        ]
      }
    }
  ],
  "subject": {
    "id": "Subject_1"
  }
}
The processed JSON will look like this:
{
  "id": "Sample_1",
  "interpretations": {
    "OMIM:148600": {
      "genomicInterpretations": {
        "HGNC:25662": {
          "interpretationStatus": "CAUSATIVE",
          "variantInterpretation": {
            "variationDescriptor": {
              "geneContext": {
                "symbol": "AAGAB",
                "valueId": "HGNC:25662"
              }
            }
          }
        }
      },
      "progressStatus": "SOLVED"
    }
  },
  "subject": {
    "id": "Subject_1"
  }
}
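The transformation from the input to the processed form can be sketched in Python as follows. This is only an illustration written for this page, not Pheno-Ranker's actual implementation: the function names `restructure_interpretations` and `flatten` are hypothetical, and the choice of keys (the disease `id` for each interpretation, `geneContext.valueId` for each genomic interpretation) is inferred from the example above. The `flatten` helper shows how dot-joined keys of the kind mentioned earlier could be derived; it handles only nested objects and scalar leaves, and it emits every leaf without any filtering.

```python
def restructure_interpretations(phenopacket):
    """Return a copy whose 'interpretations' array is rewritten as an object."""
    result = dict(phenopacket)
    if "interpretations" not in phenopacket:
        return result
    by_disease = {}
    for interpretation in phenopacket["interpretations"]:
        diagnosis = interpretation["diagnosis"]
        by_gene = {}
        for genomic in diagnosis.get("genomicInterpretations", []):
            descriptor = genomic["variantInterpretation"]["variationDescriptor"]
            gene_id = descriptor["geneContext"]["valueId"]  # assumed key choice
            # Drop the patient-specific id so keys can match across patients.
            by_gene[gene_id] = {
                k: v for k, v in genomic.items() if k != "subjectOrBiosampleId"
            }
        entry = {"genomicInterpretations": by_gene}
        if "progressStatus" in interpretation:
            entry["progressStatus"] = interpretation["progressStatus"]
        # The interpretation's own "id" is not carried over (assumed from the example).
        by_disease[diagnosis["disease"]["id"]] = entry
    result["interpretations"] = by_disease
    return result


def flatten(value, prefix=""):
    """Flatten nested objects into dot-joined keys ending in the leaf value."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
        return flat
    # Leaf: append the value itself to the key and mark it as present.
    return {f"{prefix}.{value}" if prefix else str(value): 1}
```

Calling `flatten(restructure_interpretations(data))` on the input above yields keys that contain the disease and gene identifiers but no patient identifiers inside `interpretations`, which is what allows them to match across patients.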
Now you can run Pheno-Ranker as usual. The flattened keys will look like this:
"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.interpretationStatus.CAUSATIVE" : 1,
"interpretations.OMIM:148600.genomicInterpretations.HGNC:25662.variantInterpretation.variationDescriptor.geneContext.symbol.AAGAB" : 1,
"interpretations.OMIM:148600.progressStatus.SOLVED" : 1,
Other examples of PXF nested array properties
Below are other examples of deeply nested properties.
"biosamples.diagnosticMarkers",
"biosamples.pathologicalTnmFinding",
"biosamples.phenotypicFeatures",
"diseases.clinicalTnmFinding",
"diseases.diseaseStage",
"measurements.complexValue.typedQuantities",
"medicalActions.treatment.doseIntervals"
If issues arise, we recommend addressing them by either:
- Filtering out problematic variables using the config file
- Preprocessing the JSON data, such as converting arrays into objects.
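As an illustration of the second option, the sketch below converts an array of objects into an object keyed by a value found inside each item. It is a generic helper written for this page, not part of Pheno-Ranker: the names `get_path` and `array_to_object` are hypothetical, and keying phenotypic features by `type.id` is an assumption about what makes a stable, patient-independent key for your data.

```python
def get_path(item, dotted_path):
    """Follow a dotted path such as 'type.id' through nested objects."""
    value = item
    for part in dotted_path.split("."):
        value = value[part]
    return value


def array_to_object(items, key_path):
    """Turn a list of objects into an object keyed by the value at key_path.

    If two items share the same key, the later one replaces the earlier one.
    """
    return {get_path(item, key_path): item for item in items}


# Illustrative data only: two phenotypic features keyed by their ontology id.
features = [
    {"type": {"id": "HP:0001250", "label": "Seizure"}},
    {"type": {"id": "HP:0001263", "label": "Global developmental delay"}},
]
keyed = array_to_object(features, "type.id")
```

After this step the position of an item in the original array no longer affects its flattened key, so two individuals with the same feature produce the same key regardless of ordering.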
Any critical issues specific to BFF/PXF will be addressed by our team as they arise.
Basic run:
The default output is named matrix.txt and it is an N x N two-dimensional matrix with a pairwise comparison of all individuals.
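To make the shape of such a matrix concrete, here is a minimal sketch that assumes each individual is represented by the set of its flattened keys and that individuals are compared with the Hamming distance (the metric this page mentions for patient mode). The function names and the toy keys are hypothetical, and the sketch does not reproduce the layout of matrix.txt. For binary presence/absence vectors, the Hamming distance equals the number of keys present in exactly one of the two individuals.

```python
def hamming_distance(keys_a, keys_b):
    """Count keys present in exactly one of the two individuals."""
    return len(set(keys_a) ^ set(keys_b))


def pairwise_matrix(cohort):
    """Return (ids, matrix) where matrix[i][j] compares ids[i] with ids[j]."""
    ids = sorted(cohort)
    matrix = [[hamming_distance(cohort[a], cohort[b]) for b in ids] for a in ids]
    return ids, matrix


# Toy cohort with made-up flattened keys, for illustration only.
cohort = {
    "P1": {"sex.FEMALE", "diseases.OMIM:148600"},
    "P2": {"sex.FEMALE"},
    "P3": {"sex.MALE"},
}
ids, matrix = pairwise_matrix(cohort)
```

The resulting matrix is symmetric with zeros on the diagonal, since each individual is at distance 0 from itself.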
Test dataset
We are going to use data from the phenopacket-store repository:
wget https://github.com/monarch-initiative/phenopacket-store/releases/latest/download/all_phenopackets.zip
unzip all_phenopackets.zip
Instead of using the more than 5K examples, we will work with a subset of 50, consolidated in an array:
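One possible way to build such an array is sketched below in Python. It makes several assumptions that are not stated on this page: that the unzipped archive contains one phenopacket per `.json` file, possibly in subdirectories, and that taking the first 50 files in sorted path order is an acceptable way to choose the subset. The function names, the directory name, and the output file name in the usage note are hypothetical.

```python
import json
from pathlib import Path


def consolidate(directory, limit=50):
    """Load up to `limit` JSON files found under `directory` into a list."""
    files = sorted(Path(directory).rglob("*.json"))[:limit]
    return [json.loads(path.read_text(encoding="utf-8")) for path in files]


def write_array(phenopackets, output_path):
    """Write the list of phenopackets as a single JSON array."""
    text = json.dumps(phenopackets, indent=2)
    Path(output_path).write_text(text, encoding="utf-8")
```

For example, `write_array(consolidate("all_phenopackets"), "subset_50.json")` would write a JSON array with at most 50 phenopackets, provided the archive was extracted into a directory with that name.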
And now we perform the calculation:
For more information visit the cohort mode page.
Basic run:
The output will be printed to STDOUT and to a file named rank.txt. The matching individuals will be sorted according to their Hamming distance to the reference patient. See additional details in the Patient Mode page.
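The sorting step can be illustrated with a short sketch, assuming each individual is represented by the set of its flattened keys. The function name `rank_by_hamming` and the toy keys are hypothetical, and the sketch does not reproduce the columns of rank.txt; it only shows individuals ordered by increasing Hamming distance, with ties broken by identifier.

```python
def rank_by_hamming(reference_keys, cohort):
    """Return (individual_id, distance) pairs sorted by increasing distance."""
    reference = set(reference_keys)
    # Hamming distance on presence/absence data: keys found in only one set.
    distances = [(len(reference ^ set(keys)), pid) for pid, keys in cohort.items()]
    return [(pid, distance) for distance, pid in sorted(distances)]


# Toy data for illustration only.
ranked = rank_by_hamming(
    {"a", "b"},
    {"X": {"a", "b"}, "Y": {"a"}, "Z": {"c"}},
)
```

Here the individual whose keys are identical to the reference comes first with distance 0.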
For more information visit the patient mode page.