Overview
Build Beacon v2-ready datasets from metadata and genomic files.
beacon2-cbi-tools helps you prepare data for Beacon v2 deployments based on the Beacon Friendly Format (BFF): it validates Beacon metadata, converts VCF or SNP-array input into BFF, and loads the resulting collections into MongoDB.
With this toolkit you can:
- validate metadata from XLSX or JSON files against Beacon v2 schemas
- convert VCF or SNP-array TSV input into BFF genomicVariations
- load BFF collections into MongoDB
- optionally inspect the resulting data with lightweight utilities
This toolkit is intended for research use. Do not use generated annotations or results for medical decisions.
Typical Workflow
Most users follow this sequence:
- Prepare and validate metadata with bff-tools validate.
- Convert genomic data with bff-tools vcf or bff-tools tsv.
- Load the generated BFF collections into MongoDB with bff-tools load or bff-tools full.
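The sequence above can be sketched as a small planner that turns what you have into an ordered list of subcommands. This is only an illustration: `plan_steps`, `CONVERTERS`, and the parameter names are hypothetical and not part of the toolkit, and the strings contain subcommand names only, without any real flags or arguments.

```python
from typing import List, Optional

# Subcommand names taken from this page; flags and arguments are deliberately omitted.
CONVERTERS = {"vcf": "bff-tools vcf", "tsv": "bff-tools tsv"}


def plan_steps(has_metadata: bool, genomic_kind: Optional[str], load_to_mongodb: bool) -> List[str]:
    """Return the subcommands to run, in the order described above (hypothetical helper)."""
    steps: List[str] = []
    if has_metadata:
        steps.append("bff-tools validate")
    if genomic_kind is not None:
        if genomic_kind not in CONVERTERS:
            raise ValueError(f"unknown genomic input kind: {genomic_kind!r}")
        steps.append(CONVERTERS[genomic_kind])
    if load_to_mongodb:
        steps.append("bff-tools load")
    return steps


if __name__ == "__main__":
    for step in plan_steps(has_metadata=True, genomic_kind="vcf", load_to_mongodb=True):
        print(step)
```

The planner keeps conversion and loading as separate steps; bff-tools full, which runs conversion plus loading in one step, is left out of the sketch for simplicity.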
Recommended Path
If you are new to the toolkit, use this order:
- Read the installation overview and pick Docker unless your environment requires Apptainer or a direct install.
- Use What should I run? to choose the right command for your input.
- Check Supported Inputs and Outputs to confirm your data fits a supported path.
- Run the Quick Start with the bundled test data.
- Use Command Recipes for copy-paste commands.
- Read the data beaconization tutorial before adapting the workflow to your own data.
- Check Validation and Reproducibility and Outputs when reviewing generated files and logs.
- Keep the FAQ open while configuring reference genomes, annotation resources, and MongoDB loading.
What You Need Before Starting
| Requirement | Why it matters |
|---|---|
| Metadata in XLSX or BFF JSON | Required for Beacon entities such as individuals, biosamples, runs, and datasets |
| VCF, VCF.gz, or SNP-array TSV input | Used to generate BFF genomicVariations |
| Reference genome choice | Must match your genomic input, for example hg19, hg38, hs37, or b37 |
| External reference data | Required by the genomic conversion workflow |
| MongoDB | Required only when you want to load and query BFF collections |
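Before running anything, the requirements in the table can be turned into a quick pre-flight check. The sketch below is an assumption-laden illustration, not toolkit behavior: `preflight` is a hypothetical helper, it judges files by extension only, and the genome set contains just the examples named in the table, so the toolkit may accept labels that this check rejects.

```python
from typing import List

# Genome labels given as examples in the table; the toolkit may support others.
EXAMPLE_GENOMES = {"hg19", "hg38", "hs37", "b37"}
METADATA_SUFFIXES = (".xlsx", ".json")
GENOMIC_SUFFIXES = (".vcf", ".vcf.gz", ".tsv")


def preflight(metadata: str, genomic: str, genome: str) -> List[str]:
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems: List[str] = []
    if not metadata.lower().endswith(METADATA_SUFFIXES):
        problems.append(f"metadata should be XLSX or BFF JSON: {metadata}")
    if not genomic.lower().endswith(GENOMIC_SUFFIXES):
        problems.append(f"genomic input should be VCF, VCF.gz, or SNP-array TSV: {genomic}")
    if genome not in EXAMPLE_GENOMES:
        problems.append(f"genome label is not among the documented examples: {genome}")
    return problems


if __name__ == "__main__":
    # File names here are placeholders, not bundled test data.
    for problem in preflight("metadata.xlsx", "cohort.vcf.gz", "hg38"):
        print(problem)
```

A check like this cannot confirm that the reference genome actually matches the genomic input or that the external reference data is installed; it only catches obvious mismatches early.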
Choose Your Path
Install the toolkit
Choose Docker, Apptainer, or a non-containerized setup for your workstation, server, or HPC environment.
Run: Copy a command
Use short recipes for validation, VCF conversion, SNP-array input, MongoDB loading, and inspection.
Workflow: Prepare real data
Follow the end-to-end data beaconization tutorial before adapting the workflow to your own cohort.
Review: Check reproducibility
Understand what validation checks, what it cannot prove, and what to keep when sharing a run.
Examples: Start from test data
Use the GRCh38 / hg38 example and bundled datasets to confirm that your runtime works.
Help: Debug a run
Find the right log file and match common symptoms around reference data, validation, and MongoDB loading.
Main Commands
The main entry point is bff-tools.
- bff-tools validate: validate metadata and write BFF JSON collections
- bff-tools vcf: convert a VCF or VCF.gz file into BFF
- bff-tools tsv: convert a SNP-array TSV file into BFF
- bff-tools load: load BFF collections into MongoDB
- bff-tools full: run conversion plus loading in one step
Utilities
The toolkit also includes optional utilities for browsing or queueing jobs:
- bff-browser: browse static BFF files without a database
- bff-portal: query BFF data stored in MongoDB
- bff-queue: run and monitor many ingestion jobs on a workstation
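To confirm which of these executables are reachable from your shell, a short check along these lines may help. It is a sketch under assumptions: `find_executables` is a hypothetical helper, and whether the names appear on PATH depends on how the toolkit was installed (inside a container they may only be visible from within that container).

```python
import shutil
from typing import Dict, Iterable

# Executable names as listed on this page.
EXECUTABLES = ("bff-tools", "bff-browser", "bff-portal", "bff-queue")


def find_executables(names: Iterable[str] = EXECUTABLES) -> Dict[str, bool]:
    """Map each executable name to True if it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in find_executables().items():
        print(f"{name}: {'found' if found else 'not on PATH'}")
```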