Containerized Installation¶
Downloading Required Databases and Software¶
First, we need to download the necessary databases and software. In contrast to beacon2-ri-tools, where the data was bundled inside the container to provide a zero-configuration experience for users, we now store the data externally. This change improves data persistence and allows software updates without requiring a full re-download of all data.
Step 1: Download Required Files¶
Navigate to a directory with at least 150GB of available space and run:
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/beacon2-cbi-tools/main/scripts/01_download_external_data.py
Then execute the script:
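A minimal sketch of this step, assuming a `python3` interpreter on the PATH and that the script was saved to the current directory. The helper names `has_free_space` and `run_download` are illustrative and not part of beacon2-cbi-tools. The free-space check mirrors the 150GB guideline above. Any extra Python packages the script may need are not covered here.

```shell
# Succeeds only if directory $1 has at least $2 GB available.
has_free_space() {
  # df -Pk prints one data line per filesystem; column 4 is free space in KB
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Runs the downloaded script from the current directory if space allows.
# Optional argument: minimum free space in GB (default 150).
run_download() {
  if has_free_space . "${1:-150}"; then
    python3 01_download_external_data.py
  else
    echo "Less than ${1:-150} GB free in $(pwd); choose another directory" >&2
    return 1
  fi
}
```

Calling `run_download` with no arguments applies the 150GB threshold before starting the download.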
Note: Google Drive can sometimes restrict downloads. If a download fails, open the URL shown in the error message in a browser and retrieve the file manually.
Step 2: Verify Download Integrity¶
Run a checksum to ensure the files were not corrupted:
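As a sketch of the general technique, `md5sum -c` checks every file listed in a checksum manifest. The demo below runs in a scratch directory with stand-in files. The names `data.part-00`, `data.part-01` and `checksums.md5` are placeholders, and the use of MD5 is an assumption, so substitute the manifest and algorithm that actually ship with the download.

```shell
demo_dir=$(mktemp -d)
printf 'part one\n' > "$demo_dir/data.part-00"
printf 'part two\n' > "$demo_dir/data.part-01"

# A manifest like this would normally ship with the download.
(cd "$demo_dir" && md5sum data.part-* > checksums.md5)

# Verify every file listed in the manifest; the exit status is non-zero
# if any file is missing or its checksum does not match.
if (cd "$demo_dir" && md5sum -c checksums.md5); then
  verify_status=ok
else
  verify_status=corrupted
fi
echo "verification: $verify_status"
```

If verification reports a failure, only the affected part should need to be downloaded again.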
Step 3: Reassemble Split Files¶
The downloaded data is split into parts. Reassemble it into a single tar archive (~130GB required):
Once the files are successfully merged, delete the split parts to free up space:
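The merge and cleanup can be sketched with `cat` and a shell glob. The demo below first creates a small archive and splits it, so that it has parts to work with. The names `data.tar` and `data.tar.part-*` are placeholders and not necessarily the names of the real files.

```shell
demo_dir=$(mktemp -d)

# Build a small archive and split it so the demo has parts to merge.
mkdir "$demo_dir/payload"
printf 'hello\n' > "$demo_dir/payload/file.txt"
tar -cf "$demo_dir/original.tar" -C "$demo_dir" payload
split -b 4096 "$demo_dir/original.tar" "$demo_dir/data.tar.part-"

# The shell expands the glob in sorted order, so the parts are
# concatenated in sequence.
cat "$demo_dir"/data.tar.part-* > "$demo_dir/data.tar"

# Delete the parts only after confirming the merged archive is readable.
if tar -tf "$demo_dir/data.tar" > /dev/null; then
  rm "$demo_dir"/data.tar.part-*
fi
```

Listing the archive before deleting the parts avoids a second download if the merge went wrong. Note that the merged file and the parts coexist on disk until the `rm` runs.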
Step 4: Extract Data¶
Extract the tar archive:
Make sure a tmp directory exists in the directory where you extracted your data:
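A sketch of both actions on a stand-in archive. The top-level directory name `beacon2-cbi-tools-data` is borrowed from the volume path used later in this guide and is an assumption about the archive layout. Placing `tmp` inside that directory is likewise one reading of the instruction above.

```shell
demo_dir=$(mktemp -d)

# Create a stand-in archive with a single top-level data directory.
mkdir -p "$demo_dir/src/beacon2-cbi-tools-data"
printf 'demo\n' > "$demo_dir/src/beacon2-cbi-tools-data/README"
tar -cf "$demo_dir/data.tar" -C "$demo_dir/src" beacon2-cbi-tools-data

# Extract into a chosen directory (-C) rather than the current one.
mkdir -p "$demo_dir/extracted"
tar -xf "$demo_dir/data.tar" -C "$demo_dir/extracted"

# mkdir -p does nothing if tmp already exists.
data_dir="$demo_dir/extracted/beacon2-cbi-tools-data"
mkdir -p "$data_dir/tmp"
```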
Step 5: Configure Path in SnpEff¶
- Navigate to your downloaded data and locate the SnpEff configuration file. It is located at:
- Open snpEff.config with a text editor and find the line containing the data.dir variable.
- Update the data.dir variable to reflect the correct path to your downloaded data directory. It should look similar to this:
Important: Ensure that you use an absolute path and verify that the directory exists to avoid any errors during subsequent analyses.
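The same edit can be made non-interactively with `sed`. The sketch below works on a stand-in config file in a scratch directory. The `other.setting` line and the target path are placeholders, so point `config` and `new_data_dir` at your real file and data location.

```shell
demo_dir=$(mktemp -d)
config="$demo_dir/snpEff.config"

# Stand-in for the real file, which contains many more settings.
printf '%s\n' '# Demo config' 'data.dir = ./data/' 'other.setting = 1' > "$config"

# Placeholder absolute path; it must exist before SnpEff is run.
new_data_dir="$demo_dir/snpeff/data"
mkdir -p "$new_data_dir"

# Rewrite only the line that assigns data.dir, keeping a .bak backup.
sed -i.bak "s|^data\.dir[[:space:]]*=.*|data.dir = $new_data_dir|" "$config"

# Show the resulting line for a quick visual check.
grep '^data\.dir' "$config"
```

The `|` delimiter keeps the slashes in the path from clashing with the `sed` expression. The `.bak` copy allows the change to be reverted.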
Installation Options¶
Method 1: Installing from Docker Hub¶
Pull the latest Docker image from Docker Hub:
docker pull manuelrueda/beacon2-cbi-tools:latest
docker image tag manuelrueda/beacon2-cbi-tools:latest cnag/beacon2-cbi-tools:latest
Method 2: Installing from Dockerfile¶
Download the Dockerfile from GitHub:
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/beacon2-cbi-tools/main/docker/Dockerfile
Then build the container:
- For Docker version 19.03 and above (supports buildx):
- For Docker versions older than 19.03 (no buildx support):
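The two cases can be sketched as a small helper that prints a build command for a given Docker version. The function names are illustrative. The image tag `cnag/beacon2-cbi-tools:latest` follows the tag used elsewhere in this guide, and the commands assume the Dockerfile sits in the current directory.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the build command matching a Docker version string.
build_command_for() {
  if version_ge "$1" "19.03"; then
    echo "docker buildx build -t cnag/beacon2-cbi-tools:latest ."
  else
    echo "docker build -t cnag/beacon2-cbi-tools:latest ."
  fi
}

build_command_for "24.0.7"
build_command_for "18.09.1"
```

On a machine with Docker installed, the version string could come from `docker version --format '{{.Client.Version}}'`.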
Method 3: Full Stack with Docker Compose¶
We now provide an extended Docker Compose file (docker-compose.all.yml) to launch beacon2-cbi-tools, MongoDB, and Mongo Express together in one command. This is recommended if you're deploying the full data-loading and querying stack.
- Download docker-compose.all.yml
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/beacon2-cbi-tools/main/docker/docker-compose.all.yml
- Configure the Data Directory
Ensure you have a directory containing the required data for beacon2-cbi-tools. You can set this directory in the compose file using an environment variable or by editing the volume mapping directly. For example, the volume is defined as:
You can set the BEACON2_DATA_DIR variable in a .env file or in your shell, or replace the default path with the actual absolute path.
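A sketch of the `.env` route. Docker Compose reads `KEY=value` pairs from a `.env` file in the project directory. Here `project_dir` is a scratch stand-in for the directory holding the compose file, and the data path is a placeholder.

```shell
# Stand-in for the directory that holds the compose file.
project_dir=$(mktemp -d)

# Placeholder; use the absolute path to your extracted data.
data_dir=/absolute/path/to/beacon2-cbi-tools-data

# Write the variable where Compose looks for it.
printf 'BEACON2_DATA_DIR=%s\n' "$data_dir" > "$project_dir/.env"
cat "$project_dir/.env"
```

Alternatively, running `export BEACON2_DATA_DIR=/absolute/path/to/beacon2-cbi-tools-data` in the shell before starting the stack sets the variable for that session only.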
- Deploy the Complete Stack
Run the following command from your project directory:
This command will pull the required images from Docker Hub (if not available locally) and start containers for MongoDB, Mongo Express, and beacon2-cbi-tools, all connected on the same network.
- Verify and Interact
Check that all containers are running with:
You can then connect to the beacon2-cbi-tools container or interact with the services as needed.
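The deploy and verify steps can be sketched as two small helpers. This assumes Docker Compose v2 (the `docker compose` subcommand) and that the compose file is in the current directory. The function names `stack_up` and `stack_status` are illustrative.

```shell
compose_file=docker-compose.all.yml

# Start all services defined in the compose file in the background.
stack_up() {
  docker compose -f "$compose_file" up -d
}

# List the services of this stack together with their current state.
stack_status() {
  docker compose -f "$compose_file" ps
}
```

Running `stack_up` followed by `stack_status` should list the MongoDB, Mongo Express, and beacon2-cbi-tools services once they have started.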
Running and Interacting with the Container¶
🔹 If You Used Method 1 or 2 (Docker Hub or Dockerfile)¶
# Please update '/absolute/path/to/beacon2-cbi-tools-data' with your actual local data path
docker run -tid --volume /absolute/path/to/beacon2-cbi-tools-data:/beacon2-cbi-tools-data --name beacon2-cbi-tools cnag/beacon2-cbi-tools:latest
To connect to the container:
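One way to do this is `docker exec` with an interactive shell. The helper name `bff_shell` is illustrative, and the sketch assumes `bash` is available inside the image.

```shell
# Open an interactive shell in a running container.
# Optional argument: container name (defaults to the name given to docker run above).
bff_shell() {
  docker exec -ti "${1:-beacon2-cbi-tools}" bash
}
```

Call `bff_shell` from a terminal, and type `exit` to leave the container shell without stopping the container.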
Or, to run tools directly from the host:
alias bff-tools='docker exec -ti beacon2-cbi-tools /usr/share/beacon2-cbi-tools/bin/bff-tools'
bff-tools
Example run:
bff-tools vcf -i /beacon2-cbi-tools-data/chr22.Test.1000G.phase3.joint.vcf.gz \
-p /beacon2-cbi-tools-data/param.yaml \
--projectdir-override /beacon2-cbi-tools-data/my_test_dir
Note: You can also set the projectdir path in the parameters file.
🔹 If You Used Method 3 (Docker Compose)¶
Your container should already be running if you used:
To connect:
Test the Deployment¶
MongoDB: Manual Setup (Optional)¶
⚠️ This section is only needed if you're not using docker-compose.all.yml, or want to run MongoDB manually.
Step 1: Download docker-compose.yml¶
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/beacon2-cbi-tools/main/docker/docker-compose.yml
Step 2: Start MongoDB¶
Mongo Express will be accessible at http://localhost:8081 with default credentials admin and pass.
IMPORTANT: If you plan to load data into MongoDB from inside the beacon2-cbi-tools container, read the section below.
Access MongoDB from Inside the Container¶
Option A: Before running the container¶
# Please update '/absolute/path/to/beacon2-cbi-tools-data' with your actual local data path
docker run -tid --network=my-app-network --volume /absolute/path/to/beacon2-cbi-tools-data:/beacon2-cbi-tools-data --name beacon2-cbi-tools cnag/beacon2-cbi-tools:latest
Option B: After running the container¶
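For a container that is already running, `docker network connect` attaches it to an existing network. The sketch below reuses the network name `my-app-network` and the container name `beacon2-cbi-tools` from Option A. The helper names are illustrative.

```shell
# Attach a running container to an existing Docker network.
# Arguments: container name, network name (both optional).
join_network() {
  docker network connect "${2:-my-app-network}" "${1:-beacon2-cbi-tools}"
}

# Print the names of the containers attached to a network.
network_members() {
  docker network inspect -f '{{range .Containers}}{{.Name}} {{end}}' "${1:-my-app-network}"
}
```

After `join_network`, the output of `network_members` should include both the MongoDB container and beacon2-cbi-tools.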
System requirements¶
- OS/ARCH supported: linux/amd64 and linux/arm64.
- Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should do as well (untested).
- Docker and docker compose
- Perl 5 (>= 5.10 core; installed by default in most Linux distributions). Check the version with perl -v
- 4GB of RAM (ideally 16GB).
- >= 1 core (ideally i7 or Xeon).
- At least 200GB HDD.
Perl itself does not require much RAM (max load ~400MB), but external tools (e.g., mongod [MongoDB's daemon]) do.
Common errors: Symptoms and treatment¶
- Dockerfile:
  * DNS errors. Error: Temporary failure resolving 'foo'. Solution: https://askubuntu.com/questions/91543/apt-get-update-fails-to-fetch-files-temporary-failure-resolving-error
References¶
- BCFtools: Danecek P, Bonfield JK, et al. Twelve years of SAMtools and BCFtools. Gigascience (2021) 10(2):giab008.
- SnpEff: "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.", Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID: 22728672.
- SnpSift: "Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift", Cingolani, P., et al., Frontiers in Genetics, 3, 2012. PMID: 22435069.
- dbNSFP v4
- Liu X, Jian X, and Boerwinkle E. 2011. dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions. Human Mutation. 32:894-899.
- Liu X, Jian X, and Boerwinkle E. 2013. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation. 34:E2393-E2402.
- Liu X, Wu C, Li C, and Boerwinkle E. 2016. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Non-synonymous and Splice Site SNVs. Human Mutation. 37:235-241.
- Liu X, Li C, Mou C, Dong Y, and Tu Y. 2020. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Medicine. 12:103.