Patient Mode

Patient mode ranks records in a reference cohort against a target patient or object. It uses the same flattened variables and binary-vector representation as cohort mode, but the output is a ranked table instead of an all-vs-all matrix.

Use patient mode when you want to find the closest matches to a patient profile, inspect which variables overlap, or assess match significance with Z-scores and p-values.

  • Compares: Target against reference records
  • Basic command: pheno-ranker -r cohort.json -t patient.json
  • Main output: rank.txt
  • Best for: Closest matches and alignment review

When to Use It

Match

Find similar records

Rank every reference record against one target patient or object.

Compare

Multiple cohorts

Use several reference files and keep each match traceable to its source cohort.

Interpret

Read match statistics

Use Hamming distance, Jaccard similarity, Z-scores, p-values, and overlap percentages.

Audit

Inspect alignments

Use --align to see which variables match or differ between target and reference.

What You Get

  • rank.txt: ranked matches between the target and the reference cohort.
  • alignment*: optional variable-level alignment files when --align is used.
  • export.*.json: optional intermediate hashes, vectors, and coverage statistics when --export is used.
  • Hamming distance, Jaccard similarity, Z-scores, p-values, and overlap statistics for each match.
Patient mode vs cohort mode

Use patient mode when one target should be ranked against a reference cohort. Use cohort mode when every record should be compared with every other record.

Usage

The examples below show the common patient-mode command-line patterns. For the complete CLI reference, see Usage.

Example:

pheno-ranker -r individuals.json -t patient.json
How do I extract one or many patients from a cohort file?
pheno-ranker -r t/data/individuals.json --patients-of-interest 107:week_0_arm_1 125:week_0_arm_1

This command performs a dry run that creates the files 107:week_0_arm_1.json and 125:week_0_arm_1.json. On Windows, characters that are invalid in filenames are percent-encoded, so 107:week_0_arm_1 is written as 107%3Aweek_0_arm_1.json. For the example above, 107:week_0_arm_1.json was renamed to patient.json:

mv 107:week_0_arm_1.json patient.json

Running pheno-ranker -r individuals.json -t patient.json creates the output text file rank.txt.

The first rows in rank.txt are the best matches according to the selected sorting metric. By default, patient mode sorts by Hamming distance; use --sort-by jaccard to sort by Jaccard similarity instead.

How to read rank.txt

For most analyses, start with these columns:

  • RANK: Match order; 1 is the best match under the selected sorting metric.
  • REFERENCE(ID): The matched individual in the reference cohort.
  • HAMMING-DISTANCE: Lower values indicate more similar binary profiles.
  • JACCARD-INDEX: Higher values indicate more similar binary profiles.
  • DISTANCE-P-VALUE / JACCARD-P-VALUE: Significance of the match within the distribution of comparisons in the run.
  • INTERSECT-RATE(%): How much of the target profile is covered by the reference match.
  • COMPLETENESS(%): How much of the reference profile is covered by the target.

Use Hamming distance when you want a distance-like ranking. Use Jaccard similarity when sparse overlap or missingness is important.

Full rank.txt column reference

Identifiers and run metadata

  • RANK: Match order. A rank of 1 is the best match.
  • REFERENCE(ID): The unique identifier (primary_key) for the reference individual.
  • TARGET(ID): The unique identifier (primary_key) for the target individual passed with --target.
  • FORMAT: Input format used by the configuration, such as BFF, PXF, or CSV.
  • WEIGHTED: Whether the calculation used variable weights with --weights.

Alignment size

  • LENGTH: Count of variables that have a 1 in either the reference or the target. In other words, this is the size of the comparison space for that pair.
LENGTH example
REF: 0001001
TAR: 1000001

In this case, LENGTH is 3 because three positions have a 1 in at least one vector.
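
The LENGTH example above can be checked with a few lines of Python. This is a minimal sketch of the definition, not Pheno-Ranker's own implementation; the function name alignment_length is an illustrative assumption.

```python
def alignment_length(ref: str, tar: str) -> int:
    """Count positions where at least one of the two binary strings has a 1."""
    if len(ref) != len(tar):
        raise ValueError("binary vectors must have the same length")
    return sum(1 for r, t in zip(ref, tar) if r == "1" or t == "1")


ref = "0001001"
tar = "1000001"
print(alignment_length(ref, tar))  # 3
```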

Similarity and distance metrics

  • HAMMING-DISTANCE: Count of positions where the reference and target binary vectors differ. Lower values indicate more similar profiles.
  • JACCARD-INDEX: Similarity between the reference and target vectors, calculated as the intersection divided by the union. Higher values indicate more similar profiles.
Metric definitions

Hamming distance counts mismatches between two binary strings of equal length.

Jaccard similarity focuses on shared 1 values:

\[ \text{Jaccard} = \frac{\text{Intersection}}{\text{Union}} \]
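
Both metric definitions can be sketched in Python on binary strings. The function names are illustrative assumptions, and returning 0.0 for two all-zero vectors is a choice made for this sketch only; the source does not say how Pheno-Ranker handles that case.

```python
def hamming_distance(ref: str, tar: str) -> int:
    """Count positions where the two equal-length binary strings differ."""
    if len(ref) != len(tar):
        raise ValueError("binary vectors must have the same length")
    return sum(1 for r, t in zip(ref, tar) if r != t)


def jaccard_index(ref: str, tar: str) -> float:
    """Shared 1s (intersection) divided by positions with a 1 in either (union)."""
    if len(ref) != len(tar):
        raise ValueError("binary vectors must have the same length")
    intersection = sum(1 for r, t in zip(ref, tar) if r == "1" and t == "1")
    union = sum(1 for r, t in zip(ref, tar) if r == "1" or t == "1")
    # Assumption for this sketch: two all-zero vectors get a similarity of 0.0.
    return intersection / union if union else 0.0


print(hamming_distance("0001001", "1000001"))         # 2
print(round(jaccard_index("0001001", "1000001"), 3))  # 0.333
```

The same relations hold in the example rank.txt output: a row with INTERSECT 73 and LENGTH 79 has a Jaccard index of 73/79 ≈ 0.924 and a Hamming distance of 79 - 73 = 6.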

Significance statistics

  • DISTANCE-Z-SCORE: Empirical Z-score for the observed Hamming distance compared with all target-reference comparisons in the run.
  • DISTANCE-P-VALUE: Statistical significance associated with DISTANCE-Z-SCORE.
  • DISTANCE-Z-SCORE(RAND): Estimated Z-score for two random binary vectors, assuming the alignment size is equal to LENGTH.
  • JACCARD-Z-SCORE: Empirical Z-score for the observed Jaccard index compared with all target-reference comparisons in the run.
  • JACCARD-P-VALUE: Statistical significance associated with JACCARD-Z-SCORE.
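
The Hamming-distance statistics can be illustrated with the example rank.txt output on this page. The sketch below assumes that DISTANCE-Z-SCORE uses the mean and the sample standard deviation (n - 1) of the HAMMING-DISTANCE column, and that DISTANCE-P-VALUE is the lower tail of a standard normal distribution. Both assumptions are inferred from the example values, not taken from the Pheno-Ranker source, and the Jaccard columns are not covered.

```python
from statistics import NormalDist, mean, stdev

# HAMMING-DISTANCE column of the example rank.txt on this page (36 comparisons).
distances = [0, 6, 14, 16, 18, 20, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25,
             26, 26, 27, 27, 29, 29, 30, 30, 32, 32, 40, 43, 43, 45, 45, 45,
             45, 49, 50, 51]

mu = mean(distances)
sigma = stdev(distances)  # assumption: sample standard deviation (n - 1)


def empirical_z(distance: float) -> float:
    """Z-score of one distance relative to all comparisons in the run."""
    return (distance - mu) / sigma


def lower_tail_p(z: float) -> float:
    """Assumption: p-value is the standard normal CDF evaluated at z."""
    return NormalDist().cdf(z)


z_best = empirical_z(distances[0])
print(round(z_best, 3))                # -2.419
print(round(lower_tail_p(z_best), 4))  # 0.0078
```

Under these assumptions, the best match (Hamming distance 0) gets a Z-score of -2.419 and a p-value of about 0.0078, in line with the first row of the example output.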
DISTANCE-Z-SCORE(RAND) calculation

This value comes from the estimated mean and standard deviation of the Hamming distance for binary strings. It assumes that each position has a 50% chance of being a mismatch, independently of other positions.

The expected mean is:

\[ \text{Estimated Average} = \text{Length} \times \text{Probability of Mismatch} \]

where the probability of mismatch is set to 0.5.

The standard deviation is:

\[ \text{Estimated Standard Deviation} = \sqrt{\text{Length} \times \text{Probability of Mismatch} \times (1 - \text{Probability of Mismatch})} \]

Finally, the formula for the Z-score is:

\[ Z = \frac{(X - \mu)}{\sigma} \]

where \( X \) is the observed value, \( \mu \) is the estimated mean, and \( \sigma \) is the estimated standard deviation.
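
The three formulas above combine into a short calculation. The following is a minimal sketch; the function name random_z_score and its parameter names are illustrative assumptions, and the inputs are the HAMMING-DISTANCE and LENGTH values from the first two rows of the example output.

```python
from math import sqrt


def random_z_score(hamming: int, length: int, p_mismatch: float = 0.5) -> float:
    """Z-score of an observed Hamming distance against random binary vectors."""
    if length <= 0:
        raise ValueError("length must be positive")
    estimated_mean = length * p_mismatch
    estimated_sd = sqrt(length * p_mismatch * (1 - p_mismatch))
    return (hamming - estimated_mean) / estimated_sd


print(round(random_z_score(0, 77), 4))  # -8.775
print(round(random_z_score(6, 79), 4))  # -7.5381
```

These results agree with the DISTANCE-Z-SCORE(RAND) values -8.7750 and -7.5381 reported for those rows.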

Variable overlap

  • REFERENCE-VARS: Total number of variables present in the reference.
  • TARGET-VARS: Total number of variables present in the target.
  • INTERSECT: Number of variables shared by the reference and target.
  • INTERSECT-RATE(%): Percentage of target variables also present in the reference.
  • COMPLETENESS(%): Percentage of reference variables also present in the target.
INTERSECT-RATE(%) calculation

INTERSECT-RATE(%) measures how much of the target profile is covered by the reference:

\[ \text{INTERSECT-RATE(\%)} = \frac{\text{Intersection Count}}{\text{Number of Variables in Target}} \times 100 \]
COMPLETENESS(%) calculation

COMPLETENESS(%) measures how much of the reference profile is covered by the target:

\[ \text{COMPLETENESS(\%)} = \frac{\text{Intersection Count}}{\text{Number of Variables in Reference}} \times 100 \]
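
As a worked example, both percentages can be computed from the INTERSECT, TARGET-VARS, and REFERENCE-VARS columns. This is a minimal sketch; the function name overlap_rates is an illustrative assumption, and the inputs are taken from the second row of the example output (INTERSECT 73, TARGET-VARS 77, REFERENCE-VARS 75).

```python
def overlap_rates(intersect: int, target_vars: int, reference_vars: int) -> tuple[float, float]:
    """Return (INTERSECT-RATE(%), COMPLETENESS(%)) for one target-reference pair."""
    intersect_rate = 100 * intersect / target_vars    # share of the target covered
    completeness = 100 * intersect / reference_vars   # share of the reference covered
    return intersect_rate, completeness


rate, completeness = overlap_rates(intersect=73, target_vars=77, reference_vars=75)
print(f"{rate:.2f} {completeness:.2f}")  # 94.81 97.33
```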
See results from rank.txt
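| RANK | REFERENCE(ID) | TARGET(ID) | FORMAT | LENGTH | WEIGHTED | HAMMING-DISTANCE | DISTANCE-Z-SCORE | DISTANCE-P-VALUE | DISTANCE-Z-SCORE(RAND) | JACCARD-INDEX | JACCARD-Z-SCORE | JACCARD-P-VALUE | REFERENCE-VARS | TARGET-VARS | INTERSECT | INTERSECT-RATE(%) | COMPLETENESS(%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 107:week_0_arm_1 | 107:week_0_arm_1 | BFF | 77 | False | 0 | -2.419 | 0.0077787 | -8.7750 | 1.000 | 2.949 | 0.0256500 | 77 | 77 | 77 | 100.00 | 100.00 |
| 2 | 125:week_0_arm_1 | 107:week_0_arm_1 | BFF | 79 | False | 6 | -1.924 | 0.0271576 | -7.5381 | 0.924 | 2.269 | 0.1022693 | 75 | 77 | 73 | 94.81 | 97.33 |
| 3 | 275:week_0_arm_1 | 107:week_0_arm_1 | BFF | 86 | False | 14 | -1.265 | 0.1030165 | -6.2543 | 0.837 | 1.491 | 0.3117348 | 81 | 77 | 72 | 93.51 | 88.89 |
| 4 | 215:week_0_arm_1 | 107:week_0_arm_1 | BFF | 88 | False | 16 | -1.100 | 0.1357515 | -5.9696 | 0.818 | 1.321 | 0.3742868 | 83 | 77 | 72 | 93.51 | 86.75 |
| 5 | 305:week_0_arm_1 | 107:week_0_arm_1 | BFF | 89 | False | 18 | -0.935 | 0.1749800 | -5.6180 | 0.798 | 1.138 | 0.4452980 | 83 | 77 | 71 | 92.21 | 85.54 |
| 6 | 365:week_0_arm_1 | 107:week_0_arm_1 | BFF | 87 | False | 20 | -0.770 | 0.2207314 | -5.0389 | 0.770 | 0.890 | 0.5437899 | 77 | 77 | 67 | 87.01 | 87.01 |
| 7 | 125:week_14_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 23 | -0.522 | 0.3007259 | -3.6233 | 0.705 | 0.308 | 0.7555423 | 56 | 77 | 55 | 71.43 | 98.21 |
| 8 | 527:week_14_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 23 | -0.522 | 0.3007259 | -3.6233 | 0.705 | 0.308 | 0.7555423 | 56 | 77 | 55 | 71.43 | 98.21 |
| 9 | 107:week_14_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 23 | -0.522 | 0.3007259 | -3.6233 | 0.705 | 0.308 | 0.7555423 | 56 | 77 | 55 | 71.43 | 98.21 |
| 10 | 125:week_2_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 23 | -0.522 | 0.3007259 | -3.6233 | 0.705 | 0.308 | 0.7555423 | 56 | 77 | 55 | 71.43 | 98.21 |
| 11 | 107:week_2_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 24 | -0.440 | 0.3300253 | -3.3968 | 0.692 | 0.193 | 0.7901267 | 55 | 77 | 54 | 70.13 | 98.18 |
| 12 | 125:week_26_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 24 | -0.440 | 0.3300253 | -3.3968 | 0.692 | 0.193 | 0.7901267 | 55 | 77 | 54 | 70.13 | 98.18 |
| 13 | 527:week_2_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 24 | -0.440 | 0.3300253 | -3.3968 | 0.692 | 0.193 | 0.7901267 | 55 | 77 | 54 | 70.13 | 98.18 |
| 14 | 527:week_0_arm_1 | 107:week_0_arm_1 | BFF | 98 | False | 24 | -0.440 | 0.3300253 | -5.0508 | 0.755 | 0.756 | 0.5965581 | 95 | 77 | 74 | 96.10 | 77.89 |
| 15 | 365:week_2_arm_1 | 107:week_0_arm_1 | BFF | 79 | False | 25 | -0.357 | 0.3604065 | -3.2628 | 0.684 | 0.115 | 0.8120159 | 56 | 77 | 54 | 70.13 | 96.43 |
| 16 | 275:week_2_arm_1 | 107:week_0_arm_1 | BFF | 79 | False | 25 | -0.357 | 0.3604065 | -3.2628 | 0.684 | 0.115 | 0.8120159 | 56 | 77 | 54 | 70.13 | 96.43 |
| 17 | 305:week_26_arm_1 | 107:week_0_arm_1 | BFF | 79 | False | 26 | -0.275 | 0.3916958 | -3.0377 | 0.671 | 0.001 | 0.8410353 | 55 | 77 | 53 | 68.83 | 96.36 |
| 18 | 365:week_14_arm_1 | 107:week_0_arm_1 | BFF | 80 | False | 26 | -0.275 | 0.3916958 | -3.1305 | 0.675 | 0.038 | 0.8319440 | 57 | 77 | 54 | 70.13 | 94.74 |
| 19 | 215:week_2_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 27 | -0.192 | 0.4237022 | -2.7175 | 0.654 | -0.151 | 0.8752035 | 52 | 77 | 51 | 66.23 | 98.08 |
| 20 | 215:week_26_arm_1 | 107:week_0_arm_1 | BFF | 78 | False | 27 | -0.192 | 0.4237022 | -2.7175 | 0.654 | -0.151 | 0.8752035 | 52 | 77 | 51 | 66.23 | 98.08 |
| 21 | 257:week_0_arm_1 | 107:week_0_arm_1 | BFF | 102 | False | 29 | -0.027 | 0.4890344 | -4.3566 | 0.716 | 0.403 | 0.7249040 | 98 | 77 | 73 | 94.81 | 74.49 |
| 22 | 215:week_14_arm_1 | 107:week_0_arm_1 | BFF | 84 | False | 29 | -0.027 | 0.4890344 | -2.8368 | 0.655 | -0.143 | 0.8735091 | 62 | 77 | 55 | 71.43 | 88.71 |
| 23 | 275:week_14_arm_1 | 107:week_0_arm_1 | BFF | 80 | False | 30 | 0.055 | 0.5219230 | -2.2361 | 0.625 | -0.410 | 0.9206854 | 53 | 77 | 50 | 64.94 | 94.34 |
| 24 | 365:week_26_arm_1 | 107:week_0_arm_1 | BFF | 83 | False | 30 | 0.055 | 0.5219230 | -2.5246 | 0.639 | -0.288 | 0.9011791 | 59 | 77 | 53 | 68.83 | 89.83 |
| 25 | 215:week_78_arm_1 | 107:week_0_arm_1 | BFF | 86 | False | 32 | 0.220 | 0.5870339 | -2.3723 | 0.628 | -0.384 | 0.9167688 | 63 | 77 | 54 | 70.13 | 85.71 |
| 26 | 527:week_26_arm_1 | 107:week_0_arm_1 | BFF | 86 | False | 32 | 0.220 | 0.5870339 | -2.3723 | 0.628 | -0.384 | 0.9167688 | 63 | 77 | 54 | 70.13 | 85.71 |
| 27 | 125:week_78_arm_1 | 107:week_0_arm_1 | BFF | 94 | False | 40 | 0.880 | 0.8104854 | -1.4440 | 0.574 | -0.862 | 0.9687183 | 71 | 77 | 54 | 70.13 | 76.06 |
| 28 | 527:week_52_arm_1 | 107:week_0_arm_1 | BFF | 98 | False | 43 | 1.127 | 0.8701495 | -1.2122 | 0.561 | -0.981 | 0.9761986 | 76 | 77 | 55 | 71.43 | 72.37 |
| 29 | 125:week_52_arm_1 | 107:week_0_arm_1 | BFF | 98 | False | 43 | 1.127 | 0.8701495 | -1.2122 | 0.561 | -0.981 | 0.9761986 | 76 | 77 | 55 | 71.43 | 72.37 |
| 30 | 365:week_52_arm_1 | 107:week_0_arm_1 | BFF | 99 | False | 45 | 1.292 | 0.9018282 | -0.9045 | 0.545 | -1.122 | 0.9830870 | 76 | 77 | 54 | 70.13 | 71.05 |
| 31 | 257:week_14_arm_1 | 107:week_0_arm_1 | BFF | 99 | False | 45 | 1.292 | 0.9018282 | -0.9045 | 0.545 | -1.122 | 0.9830870 | 76 | 77 | 54 | 70.13 | 71.05 |
| 32 | 305:week_52_arm_1 | 107:week_0_arm_1 | BFF | 99 | False | 45 | 1.292 | 0.9018282 | -0.9045 | 0.545 | -1.122 | 0.9830870 | 76 | 77 | 54 | 70.13 | 71.05 |
| 33 | 257:week_2_arm_1 | 107:week_0_arm_1 | BFF | 99 | False | 45 | 1.292 | 0.9018282 | -0.9045 | 0.545 | -1.122 | 0.9830870 | 76 | 77 | 54 | 70.13 | 71.05 |
| 34 | 215:week_52_arm_1 | 107:week_0_arm_1 | BFF | 104 | False | 49 | 1.622 | 0.9475899 | -0.5883 | 0.529 | -1.271 | 0.9884232 | 82 | 77 | 55 | 71.43 | 67.07 |
| 35 | 257:week_26_arm_1 | 107:week_0_arm_1 | BFF | 103 | False | 50 | 1.704 | 0.9558461 | -0.2956 | 0.515 | -1.399 | 0.9917759 | 79 | 77 | 53 | 68.83 | 67.09 |
| 36 | 275:week_52_arm_1 | 107:week_0_arm_1 | BFF | 105 | False | 51 | 1.787 | 0.9630202 | -0.2928 | 0.514 | -1.401 | 0.9918315 | 82 | 77 | 54 | 70.13 | 65.85 |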
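
For downstream analysis, rank.txt can be loaded with Python's standard csv module. The sketch below assumes the file is tab-separated with a single header row whose column names match the reference above; that layout is not stated in this section, so check it against your own output. The function names load_rank and top_by_jaccard are hypothetical helpers, not part of Pheno-Ranker.

```python
import csv


def load_rank(path: str) -> list[dict[str, str]]:
    """Read rank.txt into a list of rows keyed by column name.

    Assumption: tab-separated values with one header row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def top_by_jaccard(rows: list[dict[str, str]], n: int = 5) -> list[dict[str, str]]:
    """Return the n rows with the highest JACCARD-INDEX."""
    ordered = sorted(rows, key=lambda row: float(row["JACCARD-INDEX"]), reverse=True)
    return ordered[:n]
```

For example, top_by_jaccard(load_rank("rank.txt"), 3) returns the three reference records with the highest Jaccard index, regardless of the metric used to sort the file.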
Obtaining additional information on the alignments

You can create several files related to the reference-target alignment by adding --align. By default, this creates alignment* files in the current directory, but you can specify a </path/basename>. Example:

pheno-ranker -r individuals.json individuals.json -t patient.json --align

Or using a path + basename:

pheno-ranker -r individuals.json individuals.json -t patient.json --align /my/fav/dir/jobid-001-align

Below is an excerpt of the alignment (C1_107:week_0_arm_1 --- 107:week_0_arm_1) taken from alignment.txt:

REF -- TAR
1 ----- 1 | (w: 1|d: 0|cd: 0|) diseases.NCIT:C3138.diseaseCode.id.NCIT:C3138 (Inflammatory Bowel Disease)
1 ----- 1 | (w: 1|d: 0|cd: 0|) ethnicity.id.NCIT:C41261 (Caucasian)
1 ----- 1 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C154329.exposureCode.id.NCIT:C154329 (Smoking)
1 ----- 1 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C154329.unit.id.NCIT:C65108 (Never Smoker)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C154329.unit.id.NCIT:C67147 (Current Smoker)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C154329.unit.id.NCIT:C67148 (Former Smoker)
1 ----- 1 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C2190.exposureCode.id.NCIT:C2190 (Alcohol)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C2190.unit.id.NCIT:C126379 (Non-Drinker)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C2190.unit.id.NCIT:C156821 (Alcohol Consumption More than 2 Drinks per Day for Men and More than 1 Drink per Day for Women)
1 ----- 1 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C2190.unit.id.NCIT:C17998 (Unknown)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C73993.exposureCode.id.NCIT:C73993 (Pack Year)
0 0 | (w: 1|d: 0|cd: 0|) exposures.NCIT:C73993.unit.id.NCIT:C73993 (Pack Year)
1 xxx-- 0 | (w: 1|d: 1|cd: 1|) id.C1_107:week_0_arm_1 (id.C1_107:week_0_arm_1)
0 0 | (w: 1|d: 0|cd: 1|) id.C1_107:week_14_arm_1 (id.C1_107:week_14_arm_1)
0 0 | (w: 1|d: 0|cd: 1|) id.C1_107:week_2_arm_1 (id.C1_107:week_2_arm_1)
0 0 | (w: 1|d: 0|cd: 1|) id.C1_125:week_0_arm_1 (id.C1_125:week_0_arm_1)
0 0 | (w: 1|d: 0|cd: 1|) id.C1_125:week_14_arm_1 (id.C1_125:week_14_arm_1)
...