Backend inference services for GENA-Web: a DNA language model–based platform for sequence annotation and interpretation.
This repository contains the task-specific backend services used to run GENA-Web models. In the accompanying paper, GENA-Web is presented as a web platform for promoter annotation, splice-site annotation, epigenetic/chromatin profiling, and enhancer activity scoring from raw DNA sequence. The deployment described in the paper combines a React/Redux/igv.js frontend with multiple Flask-based model containers; this repository contains the service-side model code, not the full web UI.
At the top level, the repository is organized around separate service directories under src/, with both GENA-LM and DNABERT variants for several tasks:
- gena-promoters_2000: promoter annotation with GENA-LM
- gena-spliceai: splice donor / acceptor annotation with GENA-LM
- gena-deepsea: epigenetic / chromatin feature prediction with GENA-LM
- gena-deepstarr: enhancer activity scoring with GENA-LM
And similarly for DNABERT:
- DNABERT-Promoters_2000, DNABERT-Promoters_original
- DNABERT-SpliceAI
- DNABERT-DeepSea
- DNABERT-DeepSTARR
The repository is best understood as a collection of independent inference backends, rather than a single polished Python package.
Each task directory contains its own Dockerfile. For example, to run the GENA-LM DeepSEA-like service:
docker build -t gena-deepsea ./src/gena-deepsea
docker run --rm -p 3000:3000 gena-deepsea

The included Dockerfile uses Python 3.10, installs the task-specific requirements.txt, copies the service directory into the image, and starts the Flask app with:

python server.py

You can apply the same pattern to the other service directories.
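As a rough illustration of applying that pattern across services, the sketch below builds the `docker build` and `docker run` argument lists for a given service directory. The helper name `docker_commands` is hypothetical, and the container port 3000 is only shown in this README for gena-deepsea; treating it as the default for the other services is an assumption you should check against each `server.py`.

```python
import shlex

# Service directory names as listed in this README.
GENA_SERVICES = ["gena-promoters_2000", "gena-spliceai", "gena-deepsea", "gena-deepstarr"]


def docker_commands(service, host_port=3000, container_port=3000, src_root="src"):
    """Return (build, run) argument lists for one service directory.

    container_port=3000 is an assumption for every service except
    gena-deepsea, which is the only one this README shows.
    """
    context = f"./{src_root}/{service}"
    # Docker image names must be lowercase, so DNABERT-* directories
    # need a normalised tag.
    tag = service.lower()
    build = ["docker", "build", "-t", tag, context]
    run = ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}", tag]
    return build, run


if __name__ == "__main__":
    # Give each service its own host port so several can run side by side.
    for offset, service in enumerate(GENA_SERVICES):
        build, run = docker_commands(service, host_port=3000 + offset)
        print(shlex.join(build))
        print(shlex.join(run))
```

The commands are returned as lists so they can be printed for review or passed to `subprocess.run` without shell quoting issues.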
To run a service without Docker, change into its directory, install the local requirements, and start the server:
cd src/gena-deepsea
pip install -r requirements.txt
python server.py

This assumes that the required model assets already exist under the service’s data/ directory, including:

- data/checkpoints/
- data/configs/
- data/tokenizers/
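Before starting a server locally, it can help to confirm that these directories are in place. The following is a minimal sketch, not part of the repository: the function name `missing_assets` is hypothetical, and treating an empty directory as missing is an assumption about what the services need.

```python
from pathlib import Path

# Asset directories named in this README, relative to a service directory.
REQUIRED_ASSET_DIRS = ("data/checkpoints", "data/configs", "data/tokenizers")


def missing_assets(service_dir):
    """Return the required asset directories that are absent or empty."""
    root = Path(service_dir)
    missing = []
    for rel in REQUIRED_ASSET_DIRS:
        path = root / rel
        # Assumption: an existing but empty directory is as unusable
        # as a missing one.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_assets("src/gena-deepsea")
    if problems:
        print("Missing or empty asset directories:", ", ".join(problems))
    else:
        print("All required asset directories are present.")
```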
Some services also vendor a local gena_lm/ package directly inside the task directory.
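Once a service is running, submit a sequence file to its upload endpoint with a multipart POST request:

curl -X POST \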
-F "<field>=@<input-file>" \
http://localhost:3000/api/gena-deepsea/upload

The exact outputs depend on the task, but the services generally return JSON with paths to generated files, for example:
{
  "bed": [
    "/generated/gena-deepsea/request_..._track1.bed",
    "/generated/gena-deepsea/request_..._track2.bed"
  ],
  "fasta_file": "/generated/gena-deepsea/request_... .fa",
  "fai_file": "/generated/gena-deepsea/request_... .fa.fai",
  "archive": "/generated/gena-deepsea/request_..._archive.zip"
}

For enhancer scoring, the returned files are bedGraph tracks even though the response key is still named bed in the current implementation.
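A client will usually want to turn such a response into downloadable URLs. The sketch below is one way to do that, under two assumptions not confirmed by this README: that the `/generated/...` paths are served by the same host as the API, and that matching "deepstarr" in the service name is a reliable way to spot the enhancer-scoring services. The helper names and the example paths are hypothetical.

```python
from urllib.parse import urljoin


def generated_file_urls(response, base_url="http://localhost:3000"):
    """Flatten a service response into {label: absolute URL}.

    Assumes the returned paths are served from base_url.
    """
    urls = {}
    for key, value in response.items():
        if isinstance(value, str):
            urls[key] = urljoin(base_url, value)
        elif isinstance(value, list):
            # List-valued keys such as "bed" hold one path per track.
            for index, path in enumerate(value):
                urls[f"{key}[{index}]"] = urljoin(base_url, path)
    return urls


def track_format(service):
    """Format of the files under the "bed" key for a given service.

    Enhancer scoring (the DeepSTARR services) returns bedGraph tracks
    even though the response key is named "bed".
    """
    return "bedGraph" if "deepstarr" in service.lower() else "bed"


if __name__ == "__main__":
    # Hypothetical response; real file names are generated per request.
    example = {
        "bed": ["/generated/gena-deepsea/example_track1.bed"],
        "archive": "/generated/gena-deepsea/example_archive.zip",
    }
    for label, url in generated_file_urls(example).items():
        print(label, url)
    print(track_format("gena-deepstarr"))
```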
If you use this code or build on the GENA-Web system, please cite the GENA-Web paper:
Shmelev A, Petrov M, Penzar D, Akhmetyanov N, Tavritskiy M, Mamontov S, Kuratov Y, Burtsev M, Kardymon O, Fishman V. GENA-Web - GENomic Annotations Web Inference using DNA language models. bioRxiv, 2024. DOI: 10.1101/2024.04.26.591391
You may also want to cite the broader GENA-LM paper for the underlying DNA language model family.