hs37 (1000 Genomes Project version of GRCh37)

hg38 (GRCh38)

Data Download

We download the data with wget because our version of tabix could not read the remote file directly:

wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz

Now, we index the VCF with tabix:

tabix -p vcf ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz
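As a quick sanity check after indexing, the sketch below tests whether an index file sits next to the VCF. It assumes tabix wrote its default `.tbi` index alongside the input; the helper name `has_tbi_index` is hypothetical and not part of tabix or `bff-tools`.

```shell
# Hypothetical helper: succeed when a tabix index (<file>.tbi) exists next to the file.
has_tbi_index() {
  [ -f "$1.tbi" ]
}

vcf="ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz"

if has_tbi_index "$vcf"; then
  echo "index found: $vcf.tbi"
else
  echo "index missing for $vcf"
fi
```

If the index is reported missing, rerun the tabix command above before querying regions.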

Note: If your version of tabix supports the FTP protocol, you can skip the download and extract the region directly from the remote file:

#tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 22:10516173-11016173 | sed -e 's/##contig=<ID=22>/##contig=<ID=chr22>/' -e 's/^22\t/chr22\t/' | bgzip > test_1000G_hg38.vcf.gz

Data subset

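Next, we extract a region of chromosome 22 and convert the GRCh38 chromosome naming to the hg38 convention. This means adding the prefix 'chr' to '22' to obtain chr22, both in the contig header line and in the first column of every record.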

tabix -h ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 22:10516173-11016173 | sed -e 's/##contig=<ID=22>/##contig=<ID=chr22>/' -e 's/^22\t/chr22\t/' | bgzip > test_1000G_hg38.vcf.gz
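To see what the two sed expressions do without touching the real data, the sketch below applies them to a tiny made-up VCF held in a shell variable. The sample records are hypothetical. The example passes a literal tab to sed through a variable instead of writing `\t`, on the assumption that this is more portable across sed implementations; with GNU sed the `\t` form used above works as written.

```shell
# Tiny hypothetical VCF: two header lines, a column header, and two records.
sample_vcf=$(printf '##fileformat=VCFv4.2\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\n22\t10516173\t.\tA\tG\n22\t10516200\t.\tC\tT\n')

# A literal tab character, so the pattern does not depend on sed understanding \t.
tab=$(printf '\t')

# Same two substitutions as in the pipeline: rename the contig header,
# then prefix the chromosome column of every record with 'chr'.
renamed=$(printf '%s\n' "$sample_vcf" \
  | sed -e 's/##contig=<ID=22>/##contig=<ID=chr22>/' \
        -e "s/^22${tab}/chr22${tab}/")

printf '%s\n' "$renamed"
```

Anchoring the second pattern with `^` and the trailing tab keeps it from rewriting any other field that happens to contain 22.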

Run `bff-tools`

The simplest task is to convert a VCF file to the BFF format. The resulting files will be located in the beacon_*/vcf/ directory.

../bin/bff-tools vcf -i test_1000G_hg38.vcf.gz -p param_hg38.yaml
# Here we're using 'hg38' as the reference genome.

Alternative `bff-tools` modes

If your MongoDB container is set up and running, you can convert the VCF and load the data into MongoDB in a single step using the full mode:

../bin/bff-tools full -i test_1000G_hg38.vcf.gz -p param_hg38.yaml
# This runs both 'vcf' and 'load' steps together.

The output of the MongoDB import will be located in the beacon_*/mongodb/ directory.

Loading other Beacon v2 Model entities

To import other Beacon v2 Model entities into MongoDB (without converting VCFs), use the load mode with a YAML file:

../bin/bff-tools load -p load.yaml