Proposal: Implementing Pheno-Ranker in a Federated Network
This proposal explores how Pheno-Ranker could be applied in two distinct contexts: an inter-hospital network and a Beacon v2 network.
The current version of Pheno-Ranker operates on files and recomputes everything from scratch on each run. To adapt the algorithm for use across multiple hospitals without directly sharing clinical data, we propose the following approach:
1. Preparation Stage:
Vector Standardization: Ensure all hospitals use a standardized vector format.
- Store each patient’s vector in a local database, keyed by patient ID, for example:
- "id_1": "1101010101010...n",
- "id_2": "0101010101000...n"
- Utilize a network aggregator to regularly update a global reference vector. Each update gets a new version identifier.
- Periodically update the vector database at each site so that the data stay current.
Privacy Protocols: Set up differential privacy mechanisms or encryption protocols.
Threshold Agreement: Establish a common threshold for the Hamming distance (or other metric) for matches.
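The standardization step above can be illustrated with a minimal sketch. It assumes that the global reference is an ordered list of terms with a version identifier, and that a patient vector has one position per term. The names `encode_vector` and `GLOBAL_REFERENCE`, the term identifiers, and the version string are placeholders chosen for illustration; this is not Pheno-Ranker's actual encoding.

```python
from typing import Iterable, Sequence


def encode_vector(patient_terms: Iterable[str], global_terms: Sequence[str]) -> str:
    """Return a '0'/'1' string with one position per term in the global reference."""
    present = set(patient_terms)
    return "".join("1" if term in present else "0" for term in global_terms)


# Hypothetical global reference shared by all sites (placeholder terms and version).
GLOBAL_REFERENCE = {
    "version": "1.0.0",
    "terms": ["HP:0000118", "HP:0001250", "NCIT:C16576", "NCIT:C20197"],
}

# Each site keeps its own vectors locally, keyed by patient ID.
local_db = {
    "id_1": encode_vector({"HP:0001250", "NCIT:C16576"}, GLOBAL_REFERENCE["terms"]),
    "id_2": encode_vector({"HP:0000118"}, GLOBAL_REFERENCE["terms"]),
}
```

Because every site encodes against the same ordered term list, vectors from different hospitals have the same length and the same meaning at each position, which is what makes a bitwise comparison possible later on.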
2. Query Initiation:
The querying hospital prepares a vector representation of the individual or set of individuals. The vector is processed using the agreed-upon privacy protocols.
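The proposal does not fix a specific privacy protocol. As one possible illustration only, the sketch below applies per-bit randomized response, a standard local differential privacy technique, to a query vector before it leaves the hospital. The function names and the choice of technique are assumptions, not part of the proposal.

```python
import math
import random


def flip_probability_for_epsilon(epsilon: float) -> float:
    """Per-bit flip probability giving epsilon-local differential privacy per bit."""
    return 1.0 / (1.0 + math.exp(epsilon))


def randomized_response(vector: str, flip_probability: float, rng: random.Random) -> str:
    """Flip each bit of a '0'/'1' string independently with the given probability."""
    if not 0.0 <= flip_probability <= 1.0:
        raise ValueError("flip_probability must be between 0 and 1")
    noisy = []
    for bit in vector:
        if rng.random() < flip_probability:
            noisy.append("1" if bit == "0" else "0")  # flipped bit
        else:
            noisy.append(bit)  # bit kept as-is
    return "".join(noisy)


# Example: perturb a query vector with a per-bit epsilon of 2.0 (illustrative value).
rng = random.Random()
noisy_query = randomized_response("1101010101010", flip_probability_for_epsilon(2.0), rng)
```

Noise of this kind inflates Hamming distances, so the agreed threshold would need to account for the chosen flip probability.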
3. Aggregator Mediation:
The querying hospital sends the processed vector to the network aggregator. The network aggregator distributes the query to all hospitals in the federated network.
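A minimal sketch of the fan-out step follows. It assumes each hospital is reachable through a handler callable that accepts a query and returns a response; in a real deployment these would be authenticated network calls. The name `distribute_query` and the query and response shapes are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def distribute_query(
    query: dict, sites: Dict[str, Callable[[dict], dict]]
) -> Dict[str, dict]:
    """Send the same query to every site in parallel and collect responses by site name."""
    if not sites:
        return {}
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = {name: pool.submit(handler, query) for name, handler in sites.items()}
        # result() blocks until the site has answered (or re-raises its error).
        return {name: future.result() for name, future in futures.items()}


# Stand-in handlers; real ones would call each hospital's endpoint.
sites = {
    "hospital_a": lambda q: {"match_count": 2},
    "hospital_b": lambda q: {"match_count": 0},
}
responses = distribute_query({"vector": "0110", "version": "1.0.0"}, sites)
```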
4. Local Computation:
Each receiving hospital computes the Hamming distance against its local patient vectors. The computation is done entirely within the local environment of each hospital.
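The local computation can be sketched as follows, assuming vectors are stored as equal-length strings of `0` and `1` keyed by patient ID. The function names are illustrative.

```python
from typing import Dict


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # XOR leaves a 1 wherever the two vectors disagree; count those bits.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


def local_distances(query: str, local_db: Dict[str, str]) -> Dict[str, int]:
    """Distance from the query to every locally stored patient vector."""
    return {patient_id: hamming_distance(query, vec) for patient_id, vec in local_db.items()}


distances = local_distances("0110", {"id_1": "0110", "id_2": "1000"})
```

The length check also acts as a cheap guard against comparing vectors built from different versions of the global reference, although checking the version identifier explicitly would be more reliable.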
5. Thresholding:
Each hospital applies the agreed-upon thresholding to identify vectors that are considered a "match."
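As a small illustration of the thresholding step, the sketch below treats a patient as a match when the distance is less than or equal to the agreed threshold. Whether the comparison is inclusive is an assumption; the network would need to settle this as part of the threshold agreement.

```python
from typing import Dict, List, Tuple


def select_matches(distances: Dict[str, int], threshold: int) -> List[Tuple[str, int]]:
    """Return (patient_id, distance) pairs within the threshold, closest first."""
    matches = [(pid, dist) for pid, dist in distances.items() if dist <= threshold]
    # Sort by distance, then by ID so the order is deterministic.
    return sorted(matches, key=lambda item: (item[1], item[0]))


matches = select_matches({"id_1": 0, "id_2": 3, "id_3": 2}, threshold=2)
```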
6. Response to Aggregator:
Each hospital sends its response (list of matching vectors, counts, etc.) back to the network aggregator.
7. Aggregation:
The network aggregator collects all the responses, processes them, and sends the aggregated result to the querying hospital.
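One way the aggregation could look, assuming each hospital reports only a match count under a hypothetical `match_count` field. The response shape and the summary fields are illustrative, since the proposal leaves the response format open.

```python
from typing import Dict


def aggregate_responses(responses: Dict[str, dict]) -> dict:
    """Combine per-site match counts into a single summary for the querying hospital."""
    per_site = {site: response["match_count"] for site, response in responses.items()}
    return {
        "total_matches": sum(per_site.values()),
        "sites_with_matches": sorted(site for site, count in per_site.items() if count > 0),
        "per_site": per_site,
    }


summary = aggregate_responses(
    {"hospital_a": {"match_count": 2}, "hospital_b": {"match_count": 0}}
)
```

Returning counts rather than patient-level vectors keeps the aggregated result coarse; the querying hospital can then contact the listed sites directly if more detail is needed.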
8. Post-Processing:
The querying hospital undertakes further analysis, potentially reaching out to specific hospitals for more information based on the aggregated results, and decides on subsequent actions.
To facilitate Pheno-Ranker's integration into the Beacon v2 API ecosystem, we propose adding an optional property to the JSON Schema of the 'individuals' entry type in the Beacon v2 Models. We propose two distinct pathways for query submission to enhance flexibility and security:
- The first mirrors the method used within hospital networks, where queries utilize a precomputed vector. This approach ensures secure and fast similarity evaluations against an existing database. To facilitate this, a Beacon aggregator would periodically aggregate ontology terms via the filtering_terms endpoint from each Beacon v2 API, creating a global lookup table.
- Alternatively, centers may submit queries using actual JSON data (either BFF or PXF objects), which should be anonymized or meet the network's security standards. This option allows the recipient site to perform similarity analyses either on their precomputed data or on-the-fly using Pheno-Ranker's CLI or module, offering greater adaptability.
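For the first pathway, building the global lookup table could be sketched as below. The sketch assumes the term identifiers have already been extracted from each Beacon's filtering_terms response, and it derives the version identifier from a hash of the term list. Both the function name and the hash-based versioning are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Iterable


def build_global_lookup(terms_by_beacon: Dict[str, Iterable[str]]) -> dict:
    """Merge term IDs from all Beacons into one ordered, versioned lookup table."""
    # Sorted union gives every site the same term order, hence the same vector layout.
    all_terms = sorted(set().union(*terms_by_beacon.values()))
    # Hypothetical versioning scheme: any change in the term list changes the version.
    digest = hashlib.sha256("\n".join(all_terms).encode("utf-8")).hexdigest()
    return {
        "version": digest[:12],
        "terms": all_terms,
        "index": {term: position for position, term in enumerate(all_terms)},
    }


lookup = build_global_lookup(
    {
        "beacon_a": ["HP:0001250", "NCIT:C16576"],
        "beacon_b": ["NCIT:C16576", "HP:0000118"],
    }
)
```

A content-derived version makes it easy to detect when a query vector and a site's stored vectors were built from different tables, which is the situation the `version` field in the draft schema is meant to expose.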
The response schema can either adhere to the Beacon v2 specification standards or be adapted to include similarity metrics, enhancing the utility and adaptability of the integration.
Draft Proposal for the JSON Schema of the phenoRanker property
Note: The draft is written in YAML and is subject to future modifications:
$schema: "https://json-schema.org/draft/2020-12/schema"
type: object
properties:
  version:
    type: string
    description: "The global version of the schema."
    example: "0.0.1"
    enum: ["0.0.1"] # Ensures the version is fixed to "0.0.1"
  phenoRanker:
    type: array
    description: "Array of objects representing the phenoRanker data. All objects must be of the same type: 'vector', 'bff', or 'pxf'."
    items:
      type: object
      properties:
        info:
          type: string
          description: "Additional information about the phenoRanker object. This field is optional."
          example: "This is a sample description for the phenoRanker object."
      oneOf:
        - type: object
          properties:
            vector:
              type: string
              pattern: "^[01]+$"
              description: "A binary string composed of ones and zeros representing a specific vector."
              example: "1010101"
            version:
              type: string
              description: "The version of the global lookup table for the phenoRanker object."
              example: "1.0.0"
          required:
            - vector
            - version
          description: "Object representing the 'vector' data and associated version."
        - type: object
          properties:
            bff:
              type: object
              description: "Object representing the BFF data."
            version:
              type: string
              description: "The version of the BFF object."
              example: "2.0"
              enum: ["2.0"]
          required:
            - bff
            - version
          description: "Object representing the 'bff' data."
        - type: object
          properties:
            pxf:
              type: object
              description: "Object representing the PXF data."
            version:
              type: string
              description: "The version of the PXF object."
              example: "2.0"
              enum: ["2.0"]
          required:
            - pxf
            - version
          description: "Object representing the 'pxf' data."
      description: "Object structure for each item in the phenoRanker array must contain exactly one of 'vector', 'bff', or 'pxf'."
    allOf:
      - if:
          contains:
            required: ["vector"]
        then:
          items:
            required: ["vector"]
      - if:
          contains:
            required: ["bff"]
        then:
          items:
            required: ["bff"]
      - if:
          contains:
            required: ["pxf"]
        then:
          items:
            required: ["pxf"]
required:
  - version
  - phenoRanker
description: "This is a proposal for the schema for the phenoRanker property. Each object in the 'phenoRanker' array must contain exactly one of the specified properties: 'vector', 'bff', or 'pxf', and all items in the array must be of the same type."
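To make the two central constraints of the draft concrete (each item carries exactly one of `vector`, `bff`, or `pxf`, and all items share the same kind), here is a small hand-written check using only the Python standard library. It is a sketch for illustration, not a substitute for a JSON Schema validator; the function name `payload_kind` is hypothetical, and rejecting an empty array is stricter than the draft, which sets no minimum length.

```python
import re

KINDS = ("vector", "bff", "pxf")


def payload_kind(payload: dict) -> str:
    """Return the single kind used by all items, or raise ValueError if the payload is invalid."""
    items = payload.get("phenoRanker")
    if not isinstance(items, list) or not items:
        # Stricter than the draft schema, which would accept an empty array.
        raise ValueError("'phenoRanker' must be a non-empty array")
    kinds_seen = set()
    for item in items:
        present = [kind for kind in KINDS if kind in item]
        if len(present) != 1:
            raise ValueError("each item must contain exactly one of 'vector', 'bff', 'pxf'")
        if "version" not in item:
            raise ValueError("each item must contain 'version'")
        if present[0] == "vector":
            value = item["vector"]
            if not isinstance(value, str) or re.fullmatch(r"[01]+", value) is None:
                raise ValueError("'vector' must be a non-empty string of 0s and 1s")
        kinds_seen.add(present[0])
    if len(kinds_seen) != 1:
        raise ValueError("all items must be of the same kind")
    return kinds_seen.pop()


example_payload = {
    "version": "0.0.1",
    "phenoRanker": [
        {"vector": "1010101", "version": "1.0.0", "info": "Sample query vector."},
        {"vector": "0110011", "version": "1.0.0"},
    ],
}
kind = payload_kind(example_payload)
```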