hs37 (1000 Genomes Project version of GRCh37)
See the test directory.
hg38 (GRCh38)
Data Download
We download the data with wget because our version of tabix could not read the remote file directly:
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz
Now, we index the VCF with tabix:
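A minimal sketch of the indexing step, assuming tabix (from htslib) is on your PATH and the file name matches the wget command above. The -p vcf option selects the VCF preset. The guard around the call is only illustrative, so the snippet does nothing harmful when the file is missing:

```shell
# File name taken from the wget command above.
vcf=ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz

# Assumption: tabix (htslib) is installed; -p vcf selects the VCF preset.
if command -v tabix >/dev/null 2>&1 && [ -f "$vcf" ]; then
    tabix -p vcf "$vcf"   # writes the index as "$vcf.tbi" next to the input
else
    echo "tabix or $vcf not found; skipping indexing" >&2
fi
```

The index is required for the region query (22:10516173-11016173) used in the subset step.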
Note: If your version of tabix supports the ftp protocol, you can skip the download and subset the remote file directly:
#tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 22:10516173-11016173 | sed 's/^22\t/chr22\t/' | bgzip > test_1000G_hg38.vcf.gz
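Data subset
Next, we extract the region 22:10516173-11016173 and convert the GRCh38 chromosome naming to hg38 naming. This involves adding the prefix 'chr' to '22' to obtain chr22, both in the contig header line and in the first column of every record.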
tabix -h ALL.chr22.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf.gz 22:10516173-11016173 | sed -e 's/##contig=<ID=22>/##contig=<ID=chr22>/' -e 's/^22\t/chr22\t/' | bgzip > test_1000G_hg38.vcf.gz
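To see what the two sed substitutions do without touching the real data, here is a small sketch that applies them to a made-up, four-line VCF fragment. The fragment is hypothetical. The tab is passed as a literal character because the \t escape inside a sed pattern is a GNU sed extension and may not be understood by other sed implementations:

```shell
# Hypothetical stand-in for a GRCh38 VCF: header lines plus one tab-separated record.
sample=$(printf '##fileformat=VCFv4.3\n##contig=<ID=22>\n#CHROM\tPOS\tID\tREF\tALT\n22\t10516173\t.\tA\tG\n')

# Literal tab character, so the pattern does not rely on GNU sed's \t escape.
tab=$(printf '\t')

# Same two substitutions as in the command above:
#   1) rename the contig in the header, 2) prefix the CHROM column with 'chr'.
renamed=$(printf '%s\n' "$sample" \
    | sed -e 's/##contig=<ID=22>/##contig=<ID=chr22>/' \
          -e "s/^22${tab}/chr22${tab}/")

printf '%s\n' "$renamed"
```

Anchoring the second pattern with ^ and a trailing tab restricts the match to the CHROM column, so a position or ID that happens to start with 22 is left unchanged.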
Run bff-tools
The simplest task is to convert a VCF file to the BFF format. The resulting files will be located in the beacon_*/vcf/ directory.
../bin/bff-tools vcf -i test_1000G_hg38.vcf.gz -p param_hg38.yaml
# Here we're using 'hg38' as the reference genome.
Alternative bff-tools modes
If your mongo container is set up and running, you can convert the VCF and load the data into MongoDB in a single step using the full mode:
../bin/bff-tools full -i test_1000G_hg38.vcf.gz -p param_hg38.yaml
# This runs both 'vcf' and 'load' steps together.
The result of the MongoDB import will be located in the beacon_*/mongodb/ directory.
Loading other Beacon v2 Model entities
To import other Beacon v2 Model entities into MongoDB (without converting VCFs), use the load mode with a YAML file: