Overview

Build Beacon v2-ready datasets from metadata and genomic files.

beacon2-cbi-tools helps validate Beacon metadata, convert VCF or SNP-array input into Beacon Friendly Format, and load the resulting collections into MongoDB.


beacon2-cbi-tools helps you prepare data for Beacon v2 deployments based on the Beacon Friendly Format (BFF).

With this toolkit you can:

- validate metadata from XLSX or JSON files against Beacon v2 schemas
- convert VCF or SNP-array TSV input into BFF genomicVariations
- load BFF collections into MongoDB
- optionally inspect the resulting data with lightweight utilities

Research-use disclaimer

This toolkit is intended for research use. Do not use generated annotations or results for medical decisions.

Typical Workflow

Most users follow this sequence:

1. Prepare and validate metadata with bff-tools validate.
2. Convert genomic data with bff-tools vcf or bff-tools tsv.
3. Load the generated BFF collections into MongoDB with bff-tools load or bff-tools full.

Input: XLSX or BFF metadata; VCF or SNP-array TSV

Process: validate; vcf / tsv / load / full

Output: BFF JSON collections; MongoDB and browser files
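
As a rough illustration of the workflow above, the following Python sketch maps an input file name to the bff-tools subcommand that would typically handle it. The subcommand names (validate, vcf, tsv) come from this page; the helper `suggest_subcommand` and the extension mapping are hypothetical and are not part of the toolkit, and the sketch does not build a full command line because the required options are not described here.

```python
from pathlib import Path

# Subcommand names are taken from the workflow above. The extension mapping is
# an illustrative assumption, not the toolkit's own dispatch logic.
SUBCOMMAND_BY_SUFFIX = {
    ".xlsx": "validate",   # metadata spreadsheet
    ".json": "validate",   # BFF JSON metadata
    ".vcf": "vcf",         # uncompressed VCF
    ".vcf.gz": "vcf",      # compressed VCF
    ".tsv": "tsv",         # SNP-array TSV
}


def suggest_subcommand(filename: str) -> str:
    """Return the bff-tools subcommand that matches an input file name."""
    name = Path(filename).name.lower()
    # Try longer suffixes first so that ".vcf.gz" is checked before ".vcf".
    for suffix in sorted(SUBCOMMAND_BY_SUFFIX, key=len, reverse=True):
        if name.endswith(suffix):
            return SUBCOMMAND_BY_SUFFIX[suffix]
    raise ValueError(f"No supported bff-tools path for: {filename}")


if __name__ == "__main__":
    for example in ("metadata.xlsx", "cohort.vcf.gz", "array.tsv"):
        print(f"{example}: bff-tools {suggest_subcommand(example)}")
```

The file names in the usage loop are placeholders. Loading (bff-tools load or bff-tools full) is left out because it depends on what has already been generated rather than on a single input file.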

Recommended Path

If you are new to the toolkit, use this order:

1. Read the installation overview and pick Docker unless your environment requires Apptainer or a direct install.
2. Use What should I run? to choose the right command for your input.
3. Check Supported Inputs and Outputs to confirm your data fits a supported path.
4. Run the Quick Start with the bundled test data.
5. Use Command Recipes for copy-paste commands.
6. Read the data beaconization tutorial before adapting the workflow to your own data.
7. Check Validation and Reproducibility and Outputs when reviewing generated files and logs.
8. Keep the FAQ open while configuring reference genomes, annotation resources, and MongoDB loading.

What You Need Before Starting

Requirement	Why it matters
Metadata in XLSX or BFF JSON	Required for Beacon entities such as `individuals`, `biosamples`, `runs`, and `datasets`
VCF, VCF.gz, or SNP-array TSV input	Used to generate BFF `genomicVariations`
Reference genome choice	Must match your genomic input, for example `hg19`, `hg38`, `hs37`, or `b37`
External reference data	Required by the genomic conversion workflow
MongoDB	Required only when you want to load and query BFF collections
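
The requirements in the table can be read as a small pre-flight checklist. The Python sketch below is one way to express that checklist. `RunPlan` and `missing_requirements` are hypothetical names introduced for illustration and are not part of beacon2-cbi-tools. The sketch does not restrict the reference genome to a fixed set, because the table lists its values only as examples.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    """Hypothetical description of a planned run; not a toolkit data structure."""
    metadata_format: str       # "xlsx" or "json"
    genomic_format: str        # "vcf", "vcf.gz", or "tsv"
    reference_genome: str      # for example "hg19", "hg38", "hs37", or "b37"
    has_reference_data: bool   # external reference data is available
    load_into_mongodb: bool    # the plan includes loading BFF collections
    mongodb_available: bool    # a MongoDB instance can be used


def missing_requirements(plan: RunPlan) -> list[str]:
    """Return human-readable problems; an empty list means the plan looks complete."""
    problems = []
    if plan.metadata_format.lower() not in {"xlsx", "json"}:
        problems.append("metadata must be XLSX or BFF JSON")
    if plan.genomic_format.lower() not in {"vcf", "vcf.gz", "tsv"}:
        problems.append("genomic input must be VCF, VCF.gz, or SNP-array TSV")
    if not plan.reference_genome.strip():
        problems.append("choose a reference genome that matches the genomic input")
    if not plan.has_reference_data:
        problems.append("external reference data is required for genomic conversion")
    # MongoDB only matters when the plan includes loading.
    if plan.load_into_mongodb and not plan.mongodb_available:
        problems.append("MongoDB is required to load BFF collections")
    return problems


if __name__ == "__main__":
    plan = RunPlan("xlsx", "vcf.gz", "hg38", True, True, False)
    for problem in missing_requirements(plan):
        print("Missing:", problem)
```

The MongoDB check is conditional on purpose: a plan that only validates metadata and converts genomic data passes without a database.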