Open Cure Discovery

Accelerating drug discovery through open-source, GPU-powered virtual screening

Created by Alfredo Baratta - Democratizing drug discovery for everyone.

The Vision

Every year, millions of people suffer from diseases that lack effective treatments. Traditional drug discovery is a lengthy and expensive process:

10-15 years from initial discovery to market
$2-3 billion average cost per approved drug
90% failure rate in clinical trials
Limited accessibility to expensive computational resources

Open Cure Discovery aims to revolutionize this process by putting the power of computational drug screening in everyone's hands. With just a consumer-grade GPU (GTX 1060 6GB or better), anyone can contribute to finding new treatments for cancer, Alzheimer's, infectious diseases, and more.

Why This Matters

Pharmaceutical companies focus on profitable diseases, leaving many rare conditions ("orphan diseases") without research investment. By democratizing drug discovery:

Researchers worldwide can screen millions of compounds without expensive infrastructure
Patient advocacy groups can drive research for neglected diseases
Academic institutions can participate in cutting-edge drug discovery
Citizen scientists can contribute computational power to find cures

How It Works

Open Cure Discovery uses a multi-stage pipeline that mimics the early phases of pharmaceutical drug discovery:

┌─────────────────────────────────────────────────────────────────────────────┐
│                         VIRTUAL SCREENING PIPELINE                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌───────────┐ │
│  │  MOLECULES   │    │   FILTERS    │    │   DOCKING    │    │  SCORING  │ │
│  │  ──────────  │ -> │  ──────────  │ -> │  ──────────  │ -> │ ───────── │ │
│  │ SMILES input │    │ PAINS/Lipinski│   │ AutoDock-GPU │    │ Composite │ │
│  │ ChEMBL/ZINC  │    │ Toxicophores │    │ Binding poses│    │ ML+ADMET  │ │
│  └──────────────┘    └──────────────┘    └──────────────┘    └───────────┘ │
│                                                                             │
│                                    ↓                                        │
│                                                                             │
│                        ┌────────────────────────┐                           │
│                        │   RANKED CANDIDATES    │                           │
│                        │   ──────────────────   │                           │
│                        │  CSV, SMILES, JSON     │                           │
│                        │  Top-N with scores     │                           │
│                        │  Ready for validation  │                           │
│                        └────────────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Stage 1: Molecule Input

Load molecules from public databases (ChEMBL, ZINC, PDB)
Support for custom molecule libraries in SMILES format
Batch processing for millions of compounds

Stage 2: Pre-filtering

PAINS filters: Remove Pan-Assay Interference Compounds (false positives)
Lipinski's Rule of Five: Ensure drug-likeness properties
Toxicophore detection: Flag potentially toxic substructures
Eliminates 40-60% of unsuitable candidates early, saving computational time

Stage 3: Molecular Docking

GPU-accelerated docking using AutoDock-GPU
Simulates how molecules bind to target proteins
Calculates binding affinity (how strongly a molecule binds)
CPU fallback with AutoDock Vina for systems without GPU

Stage 4: ML Prediction & ADMET

Machine Learning binding prediction: Neural network models predict binding affinity
ADMET analysis (15+ endpoints):
- Absorption: Can the body absorb the drug?
- Distribution: Where does it go in the body?
- Metabolism: How is it processed?
- Excretion: How is it eliminated?
- Toxicity: Is it safe?

Stage 5: Scoring & Ranking

Composite scoring: Combines docking, ML, and ADMET scores
Pareto optimization: Multi-objective ranking for best trade-offs
Diversity selection: Ensures chemically diverse candidates
Outputs ranked list of promising drug candidates

Key Advantages

Zero Cost

100% free and open-source - No subscriptions, no cloud fees
Uses freely available molecular databases (ChEMBL, ZINC, PDB)
No proprietary software dependencies

Runs on Consumer Hardware

Minimum: NVIDIA GTX 1060 6GB (or CPU-only mode)
Optimized batch processing to fit in limited GPU memory
Automatic hardware detection and configuration

Scientifically Rigorous

Based on established computational chemistry methods
Validated against benchmark datasets
Clear documentation of limitations and assumptions

Modular & Extensible

Add custom scoring functions
Integrate new molecular databases
Extend with additional ADMET endpoints

Performance Estimates

Hardware	Molecules/Day	Use Case
GTX 1060 6GB	~100,000	Personal research
RTX 3060 12GB	~300,000	Small lab
RTX 4090 24GB	~1,000,000	High-throughput screening
CPU only (8 cores)	~10,000	Testing/Development

Current Status

Phase 1 Core Implementation: COMPLETE

Component	Status	Description
Docking Engine	✅	AutoDock-GPU integration with Vina fallback
ML Prediction	✅	Binding affinity and fingerprint models
ADMET	✅	15+ pharmacokinetic/toxicity endpoints
Scoring	✅	Composite scoring with Pareto ranking
Pipeline	✅	Complete screening workflow
CLI	✅	Command-line interface
Receptor Preparation	✅	Professional PDB to PDBQT conversion
Binding Site Detection	✅	Automatic extraction from co-crystallized ligands
DeepChem Integration	✅	Graph neural network binding prediction (optional)
Multi-Target Validation	✅	Validated on COX-2, EGFR, HIV-PR, AChE
Tests	✅	67 unit and integration tests (100% passing)

Installation Guide

TL;DR - Quick Install (Experienced Users)

# Clone and setup
git clone https://github.com/alfredo-baratta/open-cure-discovery.git
cd open-cure-discovery
python -m venv venv && source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -e ".[all]"

# Download Vina (required for docking)
# Windows: Download vina.exe from https://github.com/ccsb-scripps/AutoDock-Vina/releases
# Linux: wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64 -O tools/vina && chmod +x tools/vina

# Validate
python examples/validate_installation.py
python examples/multi_target_validation.py  # Full validation (~10 min)

Prerequisites

Before installing Open Cure Discovery, ensure you have:

Requirement	Version	Notes
Python	3.10+	Download
Git	Any	Download
NVIDIA GPU	Optional	For GPU-accelerated docking
CUDA Toolkit	11.0+	Only if using GPU features

Step 1: Clone the Repository

git clone https://github.com/alfredo-baratta/open-cure-discovery.git
cd open-cure-discovery

Step 2: Create Virtual Environment

Windows (PowerShell):

python -m venv venv
.\venv\Scripts\Activate.ps1

Windows (Command Prompt):

python -m venv venv
venv\Scripts\activate.bat

Linux/macOS:

python -m venv venv
source venv/bin/activate

Step 3: Install Python Dependencies

# Install core dependencies
pip install -e ".[all]"

# Verify RDKit installation (required)
python -c "from rdkit import Chem; print('RDKit OK')"

Step 4: Install AutoDock Vina (Required for Docking)

AutoDock Vina is the molecular docking engine. You must install it separately.

Windows

Download from: https://github.com/ccsb-scripps/AutoDock-Vina/releases
Download vina_1.2.5_windows_x86_64.zip (or latest version)

Extract vina.exe to a tools/ folder in your project:

open-cure-discovery/
└── tools/
    └── vina.exe

Verify installation:

.\tools\vina.exe --version
# Expected: AutoDock Vina 1.2.5

Linux

# Option 1: Download binary
wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_linux_x86_64
chmod +x vina_1.2.5_linux_x86_64
mkdir -p tools && mv vina_1.2.5_linux_x86_64 tools/vina

# Option 2: Install via conda
conda install -c conda-forge autodock-vina

# Verify
./tools/vina --version

macOS

# Download binary
wget https://github.com/ccsb-scripps/AutoDock-Vina/releases/download/v1.2.5/vina_1.2.5_mac_x86_64
chmod +x vina_1.2.5_mac_x86_64
mkdir -p tools && mv vina_1.2.5_mac_x86_64 tools/vina

# For Apple Silicon (M1/M2), use Rosetta or compile from source
# Verify
./tools/vina --version

Step 5: Install Optional Dependencies

AutoDock-GPU (For NVIDIA GPU Users)

If you have an NVIDIA GPU, you can use AutoDock-GPU for 10-100x faster docking.

Windows:

Download from: https://github.com/ccsb-scripps/AutoDock-GPU/releases
Download adgpu-v1.5.3-windows.zip (or latest)

Extract to tools/ folder:

open-cure-discovery/
└── tools/
    ├── vina.exe
    └── autodock_gpu.exe

Verify CUDA is available:

nvidia-smi  # Should show your GPU
.\tools\autodock_gpu.exe --version

Linux:

# Download binary
wget https://github.com/ccsb-scripps/AutoDock-GPU/releases/download/v1.5.3/adgpu-v1.5.3-linux.tar.gz
tar -xzf adgpu-v1.5.3-linux.tar.gz
mv autodock_gpu_* tools/autodock_gpu

# Verify
./tools/autodock_gpu --version

Note: AutoDock-GPU requires NVIDIA GPU with CUDA. Falls back to Vina (CPU) if unavailable.

DeepChem (Advanced ML Models)

DeepChem provides graph neural network models for binding affinity prediction.

# Install DeepChem with PyTorch backend
pip install deepchem[torch]

# Verify installation
python -c "import deepchem; print(f'DeepChem {deepchem.__version__} OK')"

Note: DeepChem is optional. The system works without it using heuristic models.

Meeko (Ligand Preparation)

Meeko is used for preparing ligands for docking.

pip install meeko

# Verify
python -c "from meeko import MoleculePreparation; print('Meeko OK')"

Step 6: Validate Installation

Run the validation script to check everything is working:

python examples/validate_installation.py

Expected output:

============================================================
OPEN CURE DISCOVERY - Installation Validation
============================================================

[1] Checking Python version... OK (3.10.x)
[2] Checking RDKit... OK
[3] Checking NumPy... OK
[4] Checking Meeko... OK
[5] Checking AutoDock Vina... OK (tools/vina.exe)
[6] Checking DeepChem... OK (optional)

============================================================
ALL CHECKS PASSED - Installation successful!
============================================================

Step 7: Run Multi-Target Validation (Recommended)

Verify the docking system works correctly on real drug targets:

python examples/multi_target_validation.py

This tests docking on 4 therapeutic targets (COX-2, EGFR, HIV-PR, AChE) with known drugs.

Quick Start

Run Demo Screening

# Quick demo with sample molecules (~1 second, no docking)
python examples/demo_screening.py

Expected output:

RESULTS
============================================================
Total screened: 10
Passed filters: 6
Duration: 0.71 seconds

Top Candidates:
------------------------------------------------------------
Rank   Name            Score      ADMET      ML Binding
------------------------------------------------------------
1      naproxen        0.5034     0.8211     0.8262
2      ibuprofen       0.4873     0.7616     0.8143
3      omeprazole      0.4764     0.7224     0.8055
...

Run Real Docking Validation

# Test real docking on COX-2 target (~2 minutes)
python examples/real_docking_validation.py

Programmatic Usage

from src.core.models import Molecule, ProteinTarget
from src.core.pipeline import ScreeningPipeline, PipelineConfig
from src.core.docking import ReceptorPreparator

# Prepare receptor automatically from PDB
preparator = ReceptorPreparator(vina_path="tools/vina.exe")  # or "tools/vina" on Linux
receptor_info = preparator.prepare_from_pdb_id(
    pdb_id="1CX2",      # COX-2 structure
    ligand_code="S58",  # Co-crystallized ligand for binding site
)

# Create molecules to screen
molecules = [
    Molecule(id="celecoxib", smiles="CC1=CC=C(C=C1)C2=CC(=NN2C3=CC=C(C=C3)S(=O)(=O)N)C(F)(F)F"),
    Molecule(id="ibuprofen", smiles="CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"),
]

# Define target with auto-detected binding site
target = ProteinTarget(
    id="COX2",
    name="Cyclooxygenase-2",
    pdb_id="1CX2",
    binding_sites=[receptor_info.binding_site],
)

# Configure pipeline with docking enabled
config = PipelineConfig(
    run_docking=True,
    run_ml_prediction=True,
    run_admet=True,
    top_n=100,
    vina_path="tools/vina.exe",
)

# Run screening
pipeline = ScreeningPipeline(config)
results = pipeline.run(molecules, target)

# View top candidates
for candidate in results.candidates[:10]:
    print(f"{candidate.rank}. {candidate.molecule.name}: {candidate.final_score:.3f}")

Troubleshooting

Common Issues

Problem	Solution
`ModuleNotFoundError: rdkit`	Run `pip install rdkit` or use conda: `conda install -c conda-forge rdkit`
`vina.exe not found`	Download Vina and place in `tools/` folder
`CUDA out of memory`	Reduce batch size or use CPU mode
`DeepChem import error`	DeepChem is optional, system works without it
`Meeko preparation failed`	Ensure molecule SMILES is valid

Windows-Specific Issues

PowerShell execution policy: Run Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
Long path names: Enable long paths in Windows settings or use shorter directory names

Linux-Specific Issues

Permission denied on vina: Run chmod +x tools/vina
Missing libstdc++: Install with sudo apt install libstdc++6

Component Summary

Component	Required	Purpose	Installation
Python 3.10+	✅ Yes	Runtime	python.org
RDKit	✅ Yes	Cheminformatics	`pip install rdkit`
AutoDock Vina	✅ Yes*	Molecular docking	Download binary
Meeko	✅ Yes	Ligand preparation	`pip install meeko`
AutoDock-GPU	❌ Optional	GPU-accelerated docking	Download binary
DeepChem	❌ Optional	GNN binding prediction	`pip install deepchem[torch]`
CUDA Toolkit	❌ Optional	GPU acceleration	nvidia.com

*Required only if you want to run molecular docking. ML prediction and ADMET work without it.

Features

Core Capabilities

Feature	Description
Docking	AutoDock-GPU with Vina CPU fallback
Receptor Prep	Automatic PDB download, cleaning, PDBQT conversion
Binding Site	Auto-detection from co-crystallized ligands
Fingerprints	Morgan (ECFP), MACCS, RDKit fingerprints
ML Binding	Neural network + DeepChem graph neural networks
ADMET	QED, Lipinski, BBB permeability, hERG inhibition, AMES mutagenicity, hepatotoxicity
Filters	PAINS, toxicophores, Lipinski, Veber
Scoring	Weighted composite with normalization
Ranking	Top-N, Pareto optimization, MaxMin diversity selection

Supported Databases

Database	Content	Access
ChEMBL	2M+ bioactive molecules	✅ API integrated
PDB	200K+ protein structures	✅ API integrated
ZINC	750M+ purchasable compounds	✅ API integrated

Hardware Requirements

Minimum Requirements

Component	Requirement
GPU	NVIDIA GTX 1060 6GB (or CPU-only mode)
CPU	4 cores, 2.5GHz
RAM	16GB
Storage	50GB (for databases and results)

Project Structure

open-cure-discovery/
├── src/
│   ├── core/
│   │   ├── models.py         # Data models (Molecule, Target, etc.)
│   │   ├── config.py         # Configuration management
│   │   ├── pipeline.py       # Main screening pipeline
│   │   ├── docking/          # Molecular docking engines
│   │   │   ├── engine.py     # AutoDock-GPU and Vina engines
│   │   │   ├── receptor.py   # Professional receptor preparation
│   │   │   └── preparation.py # Ligand preparation
│   │   ├── ml/               # ML prediction
│   │   │   ├── fingerprints.py    # Morgan, MACCS, RDKit fingerprints
│   │   │   ├── binding.py         # Neural network binding prediction
│   │   │   └── deepchem_binding.py # DeepChem GNN models (optional)
│   │   ├── admet/            # ADMET calculation & filters
│   │   └── scoring/          # Scoring & ranking algorithms
│   ├── data/loaders/         # Database loaders (ChEMBL, PDB, ZINC)
│   ├── ui/cli/               # Command-line interface
│   └── utils/                # Utilities (GPU detection, I/O)
├── examples/
│   ├── demo_screening.py          # Demo script
│   ├── validate_installation.py   # Installation validation
│   ├── real_docking_validation.py # Single-target docking test
│   └── multi_target_validation.py # Multi-target validation suite
├── tests/                    # Test suite (67 tests)
├── configs/diseases/         # Disease-specific presets
└── docs/                     # Documentation

Documentation

Architecture Overview - System design and components
Development Roadmap - Project phases and milestones
Task List - Implementation status
Contributing Guide - How to contribute

Running Tests

# Install dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run with coverage report
pytest --cov=src

# Run specific test file
pytest tests/test_integration.py -v

Contributing

We welcome contributions from everyone! See CONTRIBUTING.md for guidelines.

Ways to contribute:

Run validation and report issues
Improve documentation
Add new ADMET prediction endpoints
Implement web dashboard (Phase 2)
Validate results against experimental data
Add support for new molecular databases
Optimize performance for specific hardware

Scientific Validity

This project prioritizes scientific rigor:

Open Methods: All algorithms are fully documented and open-source
Benchmarking: Validation against standard datasets (DUD-E, MUV)
Multi-Target Validation: Tested on 4 real therapeutic targets with known drugs
Limitations: Clear documentation of computational constraints
Reproducibility: Deterministic results with fixed random seeds

Validation Results

The docking system has been validated on 4 therapeutic targets:

Target	PDB	Indication	Drugs Tested	Status
COX-2	1CX2	Inflammation/Pain	Celecoxib, Diclofenac, Naproxen	✅ PASS
EGFR	1M17	Lung Cancer	Erlotinib, Gefitinib, Lapatinib	✅ PASS
HIV-1 Protease	1HVR	HIV/AIDS	Ritonavir, Indinavir, Saquinavir	✅ PASS
Acetylcholinesterase	1EVE	Alzheimer's	Donepezil, Rivastigmine, Galantamine	✅ PASS

All targets correctly rank known drugs above negative controls with significant energy differences (1.5-4.0 kcal/mol).

Important Disclaimer: Computational predictions require experimental validation. This software identifies candidates for further laboratory study—it does not produce ready-to-use drugs. Any promising candidates must undergo rigorous in vitro and in vivo testing before clinical consideration.

Roadmap

Phase	Status	Focus
Phase 0	✅ Complete	Foundation & Architecture
Phase 1	✅ Complete	Core Engine & Pipeline
Phase 2	🔄 In Progress	User Experience & Web UI
Phase 3	⏳ Planned	Community & Distributed Computing

See ROADMAP.md for detailed milestones.

Use Cases

For Researchers

Screen large compound libraries against your target of interest. Export results in standard formats for further analysis.

For Patient Advocacy Groups

Focus computational resources on neglected diseases. Generate preliminary data to attract research funding.

For Students & Educators

Learn computational drug discovery with real tools and data. Hands-on experience with molecular docking and ADMET prediction.

For Citizen Scientists

Contribute to drug discovery from home. Join a global effort to find new treatments.

License

Apache License 2.0 - see LICENSE

This means you can:

Use commercially
Modify and distribute
Use privately
Use patents

Citation

If you use Open Cure Discovery in your research, please cite:

@software{open_cure_discovery,
  author = {Baratta, Alfredo},
  title = {Open Cure Discovery: Democratizing Drug Discovery},
  year = {2025},
  url = {https://github.com/alfredo-baratta/open-cure-discovery}
}

Acknowledgments

This project builds on the work of many open-source projects and databases:

RDKit - Cheminformatics toolkit
AutoDock Vina - Molecular docking
AutoDock-GPU - GPU-accelerated docking
DeepChem - Deep learning for chemistry (optional)
Meeko - Molecular preparation
ChEMBL - Bioactivity database
RCSB PDB - Protein structure database
ZINC - Compound database

Together, we can accelerate the discovery of cures.

"The best time to plant a tree was 20 years ago. The second best time is now."

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
configs/diseases		configs/diseases
data		data
docs		docs
examples		examples
models		models
src		src
tests		tests
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Component	Recommendation
GPU	NVIDIA RTX 3060 12GB or better
CPU	8+ cores
RAM	32GB
Storage	100GB SSD

Folders and files

Latest commit

History

Repository files navigation

Open Cure Discovery

The Vision

Why This Matters

How It Works

Stage 1: Molecule Input

Stage 2: Pre-filtering

Stage 3: Molecular Docking

Stage 4: ML Prediction & ADMET

Stage 5: Scoring & Ranking

Key Advantages

Zero Cost

Runs on Consumer Hardware

Scientifically Rigorous

Modular & Extensible

Performance Estimates

Current Status

Installation Guide

TL;DR - Quick Install (Experienced Users)

Prerequisites

Step 1: Clone the Repository

Step 2: Create Virtual Environment

Step 3: Install Python Dependencies

Step 4: Install AutoDock Vina (Required for Docking)

Windows

Linux

macOS

Step 5: Install Optional Dependencies

AutoDock-GPU (For NVIDIA GPU Users)

DeepChem (Advanced ML Models)

Meeko (Ligand Preparation)

Step 6: Validate Installation

Step 7: Run Multi-Target Validation (Recommended)

Quick Start

Run Demo Screening

Run Real Docking Validation

Programmatic Usage

Troubleshooting

Common Issues

Windows-Specific Issues

Linux-Specific Issues

Component Summary

Features

Core Capabilities

Supported Databases

Hardware Requirements

Minimum Requirements

Recommended

Project Structure

Documentation

Running Tests

Contributing

Scientific Validity

Validation Results

Roadmap

Use Cases

For Researchers

For Patient Advocacy Groups

For Students & Educators

For Citizen Scientists

License

Citation

Acknowledgments

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages