Non-containerized installation¶
Downloading Required Databases and Software¶
First, we need to download the necessary databases and software.
Step 1: Download Required Files¶
Navigate to a directory with at least 150GB of available space and run:
wget https://raw.githubusercontent.com/CNAG-Biomedical-Informatics/beacon2-cbi-tools/main/scripts/01_download_external_data.py
Then execute the script:
Note: Google Drive can sometimes restrict downloads. If you encounter an error, use the provided error URL in a browser to retrieve the file manually.
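Because Step 1 needs roughly 150GB free, it can help to check the target directory before starting the download. The following is a minimal sketch, not part of the project's scripts; the helper name `has_free_space_gb` is hypothetical, and one "GB" is counted as 1024^3 bytes.

```shell
# Hypothetical helper: succeed if DIR has at least MIN_GB free
# (1 GB counted as 1024^3 bytes).
has_free_space_gb() {
    dir=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

if has_free_space_gb . 150; then
    echo "Enough free space in $(pwd)"
else
    echo "Less than 150GB free in $(pwd)" >&2
fi
```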
Step 2: Verify Download Integrity¶
Run a checksum to ensure the files were not corrupted:
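A minimal sketch of such a check, assuming the download ships a checksum file in the usual `md5sum` format (the source does not state the algorithm or file name, so both MD5 and `data.tar.gz.md5` are assumptions here, and `verify_md5` is a hypothetical helper):

```shell
# Hypothetical helper: verify every file listed in an MD5 checksum file.
# The checksum file is expected to use the md5sum format
# ("<hash>  <filename>" per line) and to sit next to the listed files.
verify_md5() {
    sum_file=$1
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$(dirname "$sum_file")" || exit 1
        md5sum -c "$(basename "$sum_file")"
    )
}

# Usage (file name is an assumption):
# verify_md5 data.tar.gz.md5
```

`md5sum -c` exits with a non-zero status if any listed file fails the check, so the helper can be used directly in an `if` or with `&&`.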
Step 3: Reassemble Split Files¶
The downloaded data is split into parts. Reassemble it into a single tar archive (~130GB required):
Once the files are successfully merged, delete the split parts to free up space:
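The merge and cleanup can be sketched as one guarded step, so that the parts are removed only when the merge succeeded. This is an illustration under assumptions: `reassemble_parts` is a hypothetical helper, and the file names in the usage line are placeholders for whatever the download script produced.

```shell
# Hypothetical helper: concatenate split parts into one archive and
# delete the parts only if the concatenation succeeded.
# $1 = output archive, remaining arguments = parts, in order.
reassemble_parts() {
    out=$1
    shift
    cat "$@" > "$out" && rm -- "$@"
}

# Usage (names are assumptions; the glob expands in sorted order):
# reassemble_parts data.tar.gz data.tar.gz.part-??
```

Note that the merged archive and the parts coexist until `rm` runs, which is why the extra free space is needed during this step.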
Step 4: Extract Data¶
Extract the tar archive:
Make sure a tmp directory exists in the directory where you extracted your data:
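Extraction and the tmp directory can be handled together. The sketch below assumes that "the directory where you extracted your data" is the extraction target itself; `extract_with_tmp` is a hypothetical helper and the paths in the usage line are placeholders.

```shell
# Hypothetical helper: extract ARCHIVE into DEST and create DEST/tmp.
extract_with_tmp() {
    archive=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # GNU tar detects gzip compression automatically when reading.
    tar -xf "$archive" -C "$dest" || return 1
    # -p makes this a no-op if tmp already exists.
    mkdir -p "$dest/tmp"
}

# Usage (paths are assumptions):
# extract_with_tmp data.tar.gz /path/to/data
```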
Download from GitHub¶
First, we need to install a few system components:
sudo apt install gcc make libperl-dev libbz2-dev zlib1g-dev libncurses5-dev libncursesw5-dev liblzma-dev libcurl4-openssl-dev libssl-dev cpanminus python3-pip perl-doc default-jre
Next, install mongosh (only if you plan to load data into MongoDB):
wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo gpg --dearmor -o /usr/share/keyrings/mongodb-server-6.0.gpg
echo "deb [signed-by=/usr/share/keyrings/mongodb-server-6.0.gpg arch=amd64,arm64] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list
sudo apt-get update
sudo apt-get install -y mongodb-mongosh
Use git clone to get the latest (stable) version:
If you only need to update to the latest version, run:
bff-tools is a Perl script (no compilation required) designed to run on the Linux command line. Internally, it acts as a wrapper that submits multiple pipelines through customizable Bash scripts (see an example here). While Perl and Bash are pre-installed on most Linux systems, a few additional dependencies must be installed separately.
We use cpanm to install the CPAN modules. We'll install the dependencies at ~/perl5:
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
cpanm --notest --installdeps .
To ensure Perl recognizes your local modules every time you start a new terminal, run:
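One way to do this is to add the local::lib setup line to your shell startup file. The sketch below assumes Bash and `~/.bashrc`; `add_local_lib_to_rc` is a hypothetical helper that appends the line only if it is not already present, so running it twice is harmless.

```shell
# Hypothetical helper: append the local::lib setup line to RC_FILE once.
add_local_lib_to_rc() {
    rc_file=$1
    line='eval "$(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)"'
    # grep -F: fixed string, -x: match the whole line, -q: quiet.
    # The line is appended only when no identical line exists yet.
    grep -qxF "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}

# Usage (assumes Bash reads ~/.bashrc at startup):
# add_local_lib_to_rc ~/.bashrc
```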
We'll also need a few Python 3 modules:
Step 5: Configure Path in SnpEff¶
- Navigate to your downloaded data and locate the SnpEff configuration file. It is located at:
- Open snpEff.config with a text editor and find the line containing the data.dir variable.
- Update the data.dir variable to reflect the correct path to your downloaded data directory. It should look similar to this:
Important: Ensure that you use an absolute path and verify that the directory exists to avoid any errors during subsequent analyses.
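If you prefer to script this edit, the following sketch rewrites an existing `data.dir = ...` line in place. It is an illustration only: `set_snpeff_data_dir` is a hypothetical helper, it relies on GNU sed's `-i` option, and the paths in the usage line are placeholders that you must replace with your own.

```shell
# Hypothetical helper: point data.dir in CONFIG at DATA_DIR.
set_snpeff_data_dir() {
    config=$1
    data_dir=$2
    # Refuse relative or missing paths, as the note above recommends.
    case $data_dir in
        /*) ;;
        *) echo "data.dir must be an absolute path: $data_dir" >&2; return 1 ;;
    esac
    [ -d "$data_dir" ] || { echo "No such directory: $data_dir" >&2; return 1; }
    # Rewrite the existing "data.dir = ..." line in place (GNU sed).
    # The path must not contain '|', '&' or '\', which are special here.
    sed -i "s|^data\.dir[[:space:]]*=.*|data.dir = $data_dir|" "$config"
}

# Usage (both paths are placeholders):
# set_snpeff_data_dir /path/to/snpEff.config /path/to/snpeff/data
```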
Step 6: Update paths in bin/config.yaml¶
Make sure that base: /beacon2-cbi-tools-data points to the directory where you downloaded the data (see above).
Replace mongosh: "/usr/bin/mongosh" with the path to your own mongosh executable.
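To find the value to put in the mongosh entry, you can ask the shell where it resolves the command. A minimal sketch; `tool_path` is a hypothetical helper built on the standard `command -v`.

```shell
# Hypothetical helper: print where the shell finds a tool,
# or fail with a message if it is not in PATH.
tool_path() {
    command -v "$1" || { echo "$1 not found in PATH" >&2; return 1; }
}

# Usage: tool_path mongosh
```

For an installed executable this prints its full path, which can be copied into bin/config.yaml.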
System requirements¶
- OS/ARCH supported: linux/amd64 and linux/arm64.
- Ideally a Debian-based distribution (Ubuntu or Mint), but any other (e.g., CentOS, OpenSUSE) should do as well (untested).
- Perl 5 (>= 5.36 core; installed by default in many Linux distributions). Check the version with perl -v.
- 4GB of RAM (ideally 16GB).
- >= 1 core (ideally i7 or Xeon).
- At least 200GB HDD.
Perl itself does not need much RAM (peak usage is about 400MB), but external tools do (e.g., the mongod process, MongoDB's daemon).
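The requirements above can be checked with a short script. This is a sketch under assumptions: both function names are hypothetical, `version_ge` relies on the `-V` option of GNU sort, and `check_requirements` expects `perl`, `nproc`, and `free` to be installed.

```shell
# Hypothetical helper: succeed if version $1 is >= version $2.
# Uses GNU sort's version ordering, so 5.8.9 sorts before 5.36.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical report of the values listed above; run it manually.
check_requirements() {
    # $^V is the running Perl version; %vd prints it as e.g. 5.36.0.
    perl_version=$(perl -e 'printf "%vd", $^V')
    if version_ge "$perl_version" 5.36; then
        echo "Perl $perl_version: OK"
    else
        echo "Perl $perl_version: older than 5.36"
    fi
    echo "CPU cores: $(nproc)"
    echo "RAM (MB): $(LC_ALL=C free -m | awk '/^Mem:/ {print $2}')"
}

# Usage: check_requirements
```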
Testing the deployment¶
You may want to install jq for running the tests.
Common errors: Symptoms and treatment¶
- Perl errors:
- Error: Unknown PerlIO layer "gzip" at (eval 10) line XXX
Solution:
cpanm PerlIO::gzip
... or ...
sudo apt install libperlio-gzip-perl
References¶
- BCFtools: Danecek P, Bonfield JK, et al. Twelve years of SAMtools and BCFtools. Gigascience (2021) 10(2):giab008.
- SnpEff: "A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3.", Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. Fly (Austin). 2012 Apr-Jun;6(2):80-92. PMID: 22728672.
- SnpSift: "Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift", Cingolani P, et al., Frontiers in Genetics, 3, 2012. PMID: 22435069.
- dbNSFP v4
- Liu X, Jian X, and Boerwinkle E. 2011. dbNSFP: a lightweight database of human non-synonymous SNPs and their functional predictions. Human Mutation. 32:894-899.
- Liu X, Jian X, and Boerwinkle E. 2013. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation. 34:E2393-E2402.
- Liu X, Wu C, Li C, and Boerwinkle E. 2016. dbNSFP v3.0: A One-Stop Database of Functional Predictions and Annotations for Human Non-synonymous and Splice Site SNVs. Human Mutation. 37:235-241.
- Liu X, Li C, Mou C, Dong Y, and Tu Y. 2020. dbNSFP v4: a comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Medicine. 12:103.