This project showcases results obtained for the Longevity hackathon.
A Streamlit-based web application for biomedical drug research and discovery, integrating multiple pharmaceutical and biomedical databases for comprehensive drug analysis and exploration.
Features:
- Multi-Database Integration: Combines data from DrKG, DrugBank, MeSH, SIDER, DOID, and HGNC
- Interactive Web Interface: Built with Streamlit for intuitive data exploration
- Containerized Deployment: Docker-based setup for consistent environments
- Knowledge Graph Analysis: Leverages drug repurposing knowledge graphs
- Disease Ontology Integration: Incorporates structured disease classifications
Prerequisites:
- Docker and Docker Compose
- Make (optional, for convenience commands)
- At least 4 GB of RAM (recommended)
- At least 2 GB of free disk space for data and containers
The application integrates several biomedical databases, which are expected under data/ in the following layout:
```
data/
├── doid/                          # Disease Ontology
│   └── doid.obo                   # Disease classifications and relationships
├── drkg/                          # Drug Repurposing Knowledge Graph
│   ├── drkg.tsv                   # Main knowledge graph data
│   ├── graph.gml                  # Graph structure file
│   ├── relation_glossary.tsv      # Relationship definitions
│   └── relation_glossary.xlsx
├── drugbank/                      # DrugBank Database
│   └── drugbank_vocabulary.csv    # Drug nomenclature and metadata
├── hgnc/                          # HUGO Gene Nomenclature Committee
│   └── HGNC_complete_set.tsv      # Official gene symbols and names
├── mesh/                          # Medical Subject Headings
│   └── desc2025.xml               # Medical terminology hierarchy
├── sider/                         # Side Effect Resource
│   └── meddra_all_indications.tsv # Drug indications (MedDRA terms)
├── drugbank_vocabulary.csv        # Additional drug vocabulary
└── entity_name_mapping.json       # Entity name mappings across databases
```
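A quick way to confirm the files are in place before building is a small shell check. The sketch below is not part of the repository: the `check_data_layout` function is hypothetical, and it only tests for the paths shown in the layout, assuming `data/` sits at the project root.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the project): report every expected
# data file from the layout above that is missing under the given root.
check_data_layout() {
  local root="${1:-data}"
  local missing=0
  local f
  for f in \
    doid/doid.obo \
    drkg/drkg.tsv \
    drkg/graph.gml \
    drkg/relation_glossary.tsv \
    drkg/relation_glossary.xlsx \
    drugbank/drugbank_vocabulary.csv \
    hgnc/HGNC_complete_set.tsv \
    mesh/desc2025.xml \
    sider/meddra_all_indications.tsv \
    drugbank_vocabulary.csv \
    entity_name_mapping.json
  do
    if [[ ! -f "$root/$f" ]]; then
      echo "missing: $root/$f"
      missing=$((missing + 1))
    fi
  done
  echo "$missing file(s) missing"
  # Exit status is non-zero when anything is missing, so the check can gate a build.
  [[ $missing -eq 0 ]]
}
```

For example, `check_data_layout data && make` would only start the build when nothing is missing.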
Data sources:
- DOID: Disease Ontology - Standardized disease classifications
- DrKG: Drug Repurposing Knowledge Graph - Comprehensive biomedical knowledge graph
- DrugBank: DrugBank Database - Pharmaceutical knowledge base
- HGNC: HUGO Gene Nomenclature Committee - Official gene symbols
- MeSH: Medical Subject Headings - NLM's controlled vocabulary
- SIDER: Side Effect Resource - Drug side effects database
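For a first look at the raw files, standard Unix tools are often enough. The helpers below are hypothetical sketches, not project code. They assume that `drkg.tsv` is a header-less, tab-separated file of (head, relation, tail) triples, as distributed by the DrKG project; that each term in `doid.obo` begins with a `[Term]` stanza, as in any OBO file; and that `HGNC_complete_set.tsv` starts with a tab-separated header row containing a column such as `symbol`.

```bash
#!/usr/bin/env bash
# Hypothetical inspection helpers; default paths follow the layout above.

# DrKG: count edges per relation type (second column), most frequent first.
drkg_relation_counts() {
  local tsv="${1:-data/drkg/drkg.tsv}"
  cut -f2 "$tsv" | sort | uniq -c | sort -rn
}

# DOID: count ontology terms by counting "[Term]" stanza headers.
doid_term_count() {
  local obo="${1:-data/doid/doid.obo}"
  grep -c '^\[Term\]' "$obo"
}

# HGNC: print one column, selected by its header name (e.g. "symbol").
# Prints nothing if the header name is not found.
hgnc_column() {
  local col="$1"
  local tsv="${2:-data/hgnc/HGNC_complete_set.tsv}"
  awk -F'\t' -v col="$col" '
    NR == 1 { for (i = 1; i <= NF; i++) { if ($i == col) idx = i }; next }
    idx { print $idx }
  ' "$tsv"
}
```

For example, `drkg_relation_counts | head` lists the most frequent relation types, and `hgnc_column symbol | head` prints the first gene symbols.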
To enable Google Drive access, place the service account key file in the access/ directory as access/service_account.json.
Instructions on creating a Google Drive client and connecting it to the code can be found here.
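A missing or misplaced key file is an easy mistake. A minimal pre-flight check is sketched below; the `check_service_account` function is hypothetical, and it relies only on the fact that Google service account key files contain a `"type": "service_account"` field. It does not verify the key with Google.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the project): confirm the key
# file exists and looks like a Google service account key.
check_service_account() {
  local key="${1:-access/service_account.json}"
  if [[ ! -f "$key" ]]; then
    echo "missing: $key" >&2
    return 1
  fi
  # Service account keys carry "type": "service_account"; other credential
  # types (e.g. user credentials) use a different value.
  if ! grep -Eq '"type"[[:space:]]*:[[:space:]]*"service_account"' "$key"; then
    echo "not a service account key: $key" >&2
    return 1
  fi
  echo "ok: $key"
}
```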
```bash
# Clone the repository
git clone <repository-url>
cd <project-directory>
# Build and start all services
make
```
Before running make, make sure the Google Drive service account JSON file is in place at access/service_account.json.
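Alternatively, set the user and group IDs and use the run script:
```bash
# Set environment variables
# (bash already defines UID as read-only, so export it without reassigning)
export UID
export GID=$(id -g)
# Build and start
./run.sh
```
Once the containers are running, the Streamlit application is available at:
http://localhost:8501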
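Containers can take a while to start. The hypothetical `wait_for_app` helper below polls the URL with `curl` (assumed to be installed on the host) until it responds or the attempt limit is reached. Recent Streamlit releases also expose a health endpoint at `/_stcore/health`, which could be passed as the URL instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the app once per second until it answers,
# giving up after the given number of attempts.
wait_for_app() {
  local url="${1:-http://localhost:8501}"
  local attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP errors, -s: silent, -o: discard the response body.
    if curl -fs -o /dev/null --max-time 2 "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep 1
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}
```

For example: `make up && wait_for_app http://localhost:8501 60`.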
Available Make commands:
```bash
make help      # Show all available commands
make build     # Build all containers
make up        # Start containers in detached mode
make down      # Stop all containers
make logs      # View container logs
make ps        # Show container status
make restart   # Restart services
make clean     # Clean volumes (⚠️ removes data)
```
Development commands:
```bash
# Access container shell
make shell SERVICE=streamlit_app
# View logs in real time
make logs
# Restart a specific service
make restart SERVICE=streamlit_app
```
Ports:
- Streamlit App: port 8501 (configurable in docker-compose.yml)

Volume mounts (host → container):
- ./streamlit_app → /app/streamlit_app (application code)
- ./data → /app/data (database files)
- ./access → /app/access (access control files)
Project structure:
```
.
├── docker-compose.yml   # Docker services configuration
├── Dockerfile           # Container build instructions
├── Makefile             # Development convenience commands
├── requirements.txt     # Python dependencies
├── project_config.py    # Project configuration
├── streamlit_app/       # Streamlit application code
│   └── root_page.py     # Main application entry point
├── app/                 # Core application modules
├── research_scripts/    # Research and analysis scripts
├── data/                # Database files (see structure above)
└── access/              # Access control and authentication
```
Contributing:
- Fork the repository.
- Create a feature branch: `git checkout -b feature-name`
- Make your changes.
- Test with `make build && make up`.
- Submit a pull request.
This project is licensed under the MIT License.
Maintainers: [email protected]