Distributed File Storage
Brief Primer on File vs Object Storage
This primer explains why object storage was chosen for the OmicsDM data warehouse solution.
File vs Object Storage, adapted from scaleway.com/en/blog/understanding-the-different-types-of-storage
| Feature | File Storage | Object Storage |
|---|---|---|
| Location | Typically centralized | Inherently distributed |
| Organization | Hierarchical (folders) | Flat namespace (buckets/keys) |
| Metadata | Basic (File properties) | Extensive and customizable |
| Access Method | File paths (e.g., NFS, SMB) | HTTP/HTTPS-based APIs (e.g., REST, S3) |
| Mutability | Overwrite, appends | Objects are replaced as a whole |
| Use Case | Real-time applications | Large datasets |
| Scalability | Vertical (Upgrade existing hardware) | Horizontal (Addition of nodes) |
| Cost | Scaling can be expensive | Cost-efficient at scale |
Object Storage with MinIO
The OmicsDM data warehouse solution employs MinIO, an S3-compatible distributed object storage system, to store all uploaded files in a so-called S3 bucket. Below is a conceptual overview. For the sake of simplicity, the OmicsDM client is not shown in the diagram.
| Step | Description |
|---|---|
| 1. Get Pre-Signed URL | Users send a request to the application server to obtain a pre-signed URL for uploading or downloading files. |
| 2. API Call to MinIO Server | The application server requests the generation of a pre-signed URL for a specific file operation (upload or download) in the S3 bucket. |
| 3. Pre-Signed URL Generation | The MinIO server creates a pre-signed URL, which contains time-limited credentials and permissions to perform the requested file operation. This URL is sent back to the application server. |
| 4. File Upload/Download | The application server provides the pre-signed URL to the user. The user can then use this URL to directly upload files to or download files from the S3 bucket. |
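To make the "time-limited credentials and permissions" of step 3 more concrete, the following Python sketch assembles a pre-signed URL using AWS Signature Version 4 query signing, the scheme used by S3-compatible APIs. It is an illustration under stated assumptions, not OmicsDM's actual code: the function `presign_url`, the endpoint `https://minio.example.org:9000`, the bucket name, and the credentials are all hypothetical, and a real deployment would more likely call an S3 SDK than sign URLs by hand.

```python
import datetime as dt
import hashlib
import hmac
from urllib.parse import quote, urlsplit


def _sign(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key derivation."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_url(method, endpoint, bucket, key, access_key, secret_key,
                expires=900, region="us-east-1", now=None):
    """Return a path-style SigV4 pre-signed URL valid for `expires` seconds."""
    now = now or dt.datetime.now(dt.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    host = urlsplit(endpoint).netloc            # includes the port, if any
    path = "/" + quote(f"{bucket}/{key}", safe="/")

    # Query parameters carry the credential scope and the expiry window.
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    query = "&".join(f"{k}={quote(v, safe='')}" for k, v in sorted(params.items()))

    # The canonical request binds method, object path, and query together.
    canonical = "\n".join(
        [method, path, query, f"host:{host}\n", "host", "UNSIGNED-PAYLOAD"]
    )
    to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key from the secret; the secret never enters the URL.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        signing_key = _sign(signing_key, part)
    signature = hmac.new(signing_key, to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"{endpoint}{path}?{query}&X-Amz-Signature={signature}"


# Hypothetical values, for illustration only.
upload_url = presign_url("PUT", "https://minio.example.org:9000",
                         "demo-bucket", "project-a/reads.fastq.gz",
                         "EXAMPLE_ACCESS_KEY", "EXAMPLE_SECRET_KEY")
```

Because the HTTP method, object path, and expiry are all part of the signed data, a URL issued for uploading one object cannot be reused to download it or to touch a different object.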
File Upload Driven by EvaporateJS
The file upload is driven by EvaporateJS, a JavaScript library that uploads large files in chunks directly from the browser to the S3 bucket.
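The idea behind chunked uploads can be sketched independently of EvaporateJS. The Python example below only plans how a file would be split into parts for an S3 multipart upload; it does not reproduce EvaporateJS's API. The names `Part` and `plan_parts` and the 8 MiB default part size are assumptions made for illustration, while the 5 MiB minimum part size (for every part except the last) and the 10,000-part maximum are the standard S3 multipart limits.

```python
from typing import List, NamedTuple

MIN_PART_SIZE = 5 * 1024 * 1024   # S3 minimum for every part except the last
MAX_PARTS = 10_000                # S3 maximum number of parts per upload


class Part(NamedTuple):
    number: int   # 1-based part number, as used by S3 multipart uploads
    offset: int   # byte offset of the part within the file
    length: int   # number of bytes in the part


def plan_parts(file_size: int, part_size: int = 8 * 1024 * 1024) -> List[Part]:
    """Split `file_size` bytes into consecutive parts of at most `part_size`."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the S3 minimum of 5 MiB")
    # Grow the part size until the file fits within the part-count limit.
    while file_size > part_size * MAX_PARTS:
        part_size *= 2
    parts = []
    offset = 0
    while offset < file_size:
        length = min(part_size, file_size - offset)
        parts.append(Part(len(parts) + 1, offset, length))
        offset += length
    return parts


# A 20 MiB file is split into two full 8 MiB parts and one 4 MiB remainder.
parts = plan_parts(20 * 1024 * 1024)
```

Each part can then be sent as its own request and retried on its own, which is what makes chunked uploads robust for large files on unreliable connections.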
Automatic File Versioning
The OmicsDM data warehouse solution comes with automatic file versioning:
Re-uploading a file with the same name does not overwrite the existing file in the S3 bucket; instead, it creates a new version of that file.
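The versioning behaviour can be pictured with a small in-memory model. This is a toy sketch of the semantics only, not MinIO's implementation; the class `VersionedBucket` and its methods are hypothetical names introduced for illustration.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy model: every upload under an existing key adds a new version."""

    def __init__(self) -> None:
        # key -> list of (version_id, data), oldest version first
        self._versions: Dict[str, List[Tuple[str, bytes]]] = {}

    def put(self, key: str, data: bytes) -> str:
        """Store `data` under `key` and return the new version ID."""
        version_id = uuid.uuid4().hex
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        """Return the latest version, or a specific one if `version_id` is given."""
        versions = self._versions.get(key)
        if not versions:
            raise KeyError(key)
        if version_id is None:
            return versions[-1][1]
        for vid, data in versions:
            if vid == version_id:
                return data
        raise KeyError(f"{key} (version {version_id})")

    def list_versions(self, key: str) -> List[str]:
        """Return all version IDs for `key`, oldest first."""
        return [vid for vid, _ in self._versions.get(key, [])]


bucket = VersionedBucket()
first = bucket.put("samples.csv", b"draft")
second = bucket.put("samples.csv", b"final")   # same name, new version
```

After the second upload, `bucket.get("samples.csv")` returns the newest content, while `bucket.get("samples.csv", first)` still returns the original.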
File Download Leveraging Pre-Signed URLs
For each file selected for download, the user receives a unique download link, a so-called pre-signed URL.
File Deletion
To prevent accidental deletion of files, the OmicsDM data warehouse solution provides no method for deleting files that have been uploaded to the S3 bucket. The only option available to the user is to mark a file as "deleted". This sets a boolean flag in the database, which prevents:
(a) the file from being shown in the files overview
(b) the generation of a pre-signed URL for downloading the respective file
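This soft-delete pattern can be sketched as follows. The example uses an in-memory SQLite database purely for illustration; the table `files`, the column `deleted`, and the functions `mark_deleted`, `list_files`, and `download_url` are hypothetical names, not OmicsDM's actual schema or API.

```python
import sqlite3
from typing import Callable, List


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `files` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"   # boolean flag: 0 or 1
    )
    return conn


def mark_deleted(conn: sqlite3.Connection, file_id: int) -> None:
    """Flip the flag; the object in the bucket is left untouched."""
    conn.execute("UPDATE files SET deleted = 1 WHERE id = ?", (file_id,))


def list_files(conn: sqlite3.Connection) -> List[str]:
    """(a) Files marked as deleted are hidden from the overview."""
    rows = conn.execute("SELECT name FROM files WHERE deleted = 0 ORDER BY id")
    return [name for (name,) in rows]


def download_url(conn: sqlite3.Connection, file_id: int,
                 presign: Callable[[str], str]) -> str:
    """(b) No pre-signed URL is generated for a file marked as deleted."""
    row = conn.execute(
        "SELECT name, deleted FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None or row[1]:
        raise PermissionError("file is not available for download")
    return presign(row[0])


conn = open_db()
conn.execute("INSERT INTO files (name) VALUES ('a.txt'), ('b.txt')")
mark_deleted(conn, 1)
visible = list_files(conn)   # only 'b.txt' remains visible
```

Because only the flag changes, the underlying object and all of its versions stay in the bucket, so an accidental "deletion" can be undone by resetting the flag.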