
Eubiota Logo

Eubiota: Agentic AI for Autonomous Microbiome Discovery

A modular framework for mechanistic reasoning and experimental design. Eubiota orchestrates specialized agents to drive tool-grounded discovery through outcome-driven refinement.

X Discord Website Paper App GitHub Dataset

Stanford University · Stanford Engineering · Stanford Medicine · Stanford HAI

Overview

Eubiota is a modular agentic platform for end-to-end discovery in the human microbiome, combining multi-agent reasoning with domain-specific tools.

System Architecture

Specialized agents orchestrate an iterative cycle of planning, execution, and verification via shared memory to ensure rigorous evidence grounding.

Workflow Framework

News & Updates

  • [2026.02] Coming soon.

Setup

Prerequisites

  • Python 3.11 (recommended)

Installation

Quick Install with UV (Recommended)

bash setup.sh
source .venv/bin/activate

This installs with inference dependencies: uv pip install -e ".[infer]"

Installation Options

| Use case | Command |
| --- | --- |
| Inference only (recommended) | `bash setup.sh infer` |
| Training | `bash setup.sh infer train` |
| Extended engines (Dashscope, Together, Ollama) | `uv pip install -e ".[extended-engines]"` |
| Full (all features + training) | `uv pip install -e ".[all]"` and `bash setup_stable_gpu.sh` |

Download Tool Databases

cd data
source create_all_dbs.sh

Configure API Keys

Copy scientist/.env.template to scientist/.env, then set the following variables to your own API keys:

  • OPENAI_API_KEY (for judging responses)
  • GOOGLE_API_KEY (for the Google Search tool)
  • PERPLEXITY_API_KEY (for the Perplexity Search tool)

Please check API Key Setup Guide for detailed instructions on how to obtain these keys.
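At runtime these keys are read from scientist/.env into the process environment. A minimal, stdlib-only sketch of what that loading amounts to (the project itself may use a library such as python-dotenv; `parse_env` and `load_env` here are hypothetical helpers, not part of the repo):

```python
import os

def parse_env(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are ignored."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env

def load_env(path="scientist/.env"):
    """Read a .env file and export its variables into os.environ."""
    with open(path) as f:
        os.environ.update(parse_env(f.read()))
```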

Quick Start

Run a single query:

python scientist/solver_scientist.py

Check Before You Run (Recommended)

Before running inference or training, we recommend verifying that your API keys and environment are properly configured.

Test Tools

Run the following command to test all integrated tools:

cd scientist
python -m tools.test_tools

Example output:

Success Rate: 100.0%
Tools requiring LLM: 8
  - pubmed_search
  - ...
Tools not requiring LLM: 6
  - kegg_gene_search
  - ...

Test LLM Engines

Verify that your LLM engines (OpenAI, DashScope, Gemini, etc.) are correctly initialized and responding:

python scientist/scripts/test_llm_engine.py

Example output:

πŸš€ Starting fault-tolerant test for 11 engines...
βœ… Success: 6
 β€’ gpt-4o
 β€’ azure-gpt-4
 β€’ dashscope-qwen2.5-3b-instruct
 β€’ gemini-1.5-pro
 β€’ vllm-meta-llama/Llama-3-8b-instruct
 β€’ together-meta-llama/Llama-3-70b-chat-hf
...
πŸŽ‰ Testing complete. Script did NOT crash despite errors.

Scientific Benchmark

Serve the trained planner model with VLLM (here we deploy our Eubiota-8b planner model):

bash setup.sh train
bash tests/exp/serve_vllm.sh

For more details on serving a local model with vLLM, please see the guidance.

Run inference on specific benchmark tasks:

cd test
# Run Our Drug-Microbiome_Impact benchmark
bash tests/Drug-Microbiome_Impact/run.sh

After running, each task folder (e.g., test/Drug-Microbiome_Impact/) will contain:

  • data/: Contains the evaluation dataset (e.g., data.json).
  • logs/: Contains detailed execution logs for each problem index (organized by model label).
  • results/: Contains the model's generated answers (output_i.json) and final evaluation scores (finalscore_*.log).
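Given that layout, per-problem answers can be gathered by index. A small stdlib-only sketch, assuming only the `output_i.json` naming convention described above (`collect_outputs` is a hypothetical helper, not part of the repo):

```python
import re
from pathlib import Path

def collect_outputs(results_dir):
    """Map problem index -> file path for files named output_<i>.json
    under a task's results/ folder."""
    outputs = {}
    for p in Path(results_dir).glob("output_*.json"):
        m = re.fullmatch(r"output_(\d+)\.json", p.name)
        if m:
            outputs[int(m.group(1))] = p
    return outputs
```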

You can find more benchmarking details in benchmark.md.

Training

Dataset Preparation

We mix datasets from four domains for training: NQ (Natural Questions) for agentic search, DeepMath-103K for mathematical reasoning, PubMedQA and MedQA-USMLE for general medical-biology reasoning, and our curated microbiome reasoning dataset. (Please make sure you have access to these datasets before running make_train_data.py.)

# train & validation data with specified ratio
python data/make_train_data.py

For more on customizing the training and validation data, see data/data_prepare.md.

After that, the data directory should look like:

data/
β”œβ”€β”€ train/
β”‚   └── train.parquet
β”œβ”€β”€ val/
β”‚   └── val.parquet (100 samples)
└── make_train_data.py
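Before launching training, it can help to confirm the layout above is in place. A quick stdlib-only sanity check (`check_data_layout` is a hypothetical helper, not part of the repo; the relative paths mirror the tree shown above):

```python
from pathlib import Path

def check_data_layout(root):
    """Return the expected training files that are missing under `root`."""
    expected = ["train/train.parquet", "val/val.parquet"]
    return [p for p in expected if not (Path(root) / p).exists()]
```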

Start Training

Training uses Group Relative Policy Optimization for Multi-Agent Systems (GRPO-MAS) for the planner module.
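The core idea behind GRPO is to replace a learned value baseline with group-relative normalization: sample a group of rollouts per prompt and score each one against its group's mean and standard deviation. A minimal sketch of that advantage computation only (the actual GRPO-MAS objective in this repo involves more, e.g. the policy-ratio clipping and multi-agent credit assignment, which are not shown here):

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each rollout's reward by the
    mean and (population) std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```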

Start training with tmux:

# Login to wandb first
wandb login

# Create tmux session and start agentflow service (Window 0, make sure you are at the project root and ran `source .venv/bin/activate` in each window)
tmux new-session -s eubiota
bash trainer/train_scientist/serve_with_logs.sh

# Create new window (Ctrl+B then C), modify the `BASE_DATA_DIR` in `trainer/train_scientist/config.yaml` to the absolute path of the data directory, and then start training (Window 1)
bash trainer/train_scientist/train_with_logs.sh

Configuration: All training hyperparameters are in trainer/train_scientist/config.yaml (model settings, tools, RL parameters, resources, etc.) For more details, please see Configuration Guide.

Logging: We provide comprehensive logging to monitor training. See logs.md for more details.

Customization

For detailed instructions on adding new tools and configuring the agent modules, please refer to the Customization Guide.

An example visualization of a verified scientific-experiment workflow is shown below:

Configuration Workflow

Experience designing your own workflow online

βž• Adding New Tools

Step 1: Create Tool Directory

scientist/tools/your_tool_name/
β”œβ”€β”€ tool.py
β”œβ”€β”€ config.yaml
└── README.md

Step 2: Implement Tool Card

# scientist/tools/your_tool_name/tool.py
class YourTool(BaseTool):
    def execute(self, query):
        # Your tool logic goes here
        return result

Step 3: Register in scientist/tools/__init__.py

from .your_tool_name import YourTool
# Add to __all__ and TOOL_REGISTRY
__all__.append("YourTool")
TOOL_REGISTRY["Your_Tool"] = YourTool

Step 4: Enable the tool in your configuration (the name must match its TOOL_REGISTRY key):

enabled_tools:
  - Your_Tool
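Putting the steps together, tool dispatch reduces to a name-to-class lookup. A self-contained sketch under assumed shapes (this `BaseTool`, `TOOL_REGISTRY`, and `run_enabled_tools` only mirror the snippets above for illustration; the real classes live in scientist/tools):

```python
class BaseTool:
    def execute(self, query):
        raise NotImplementedError

class YourTool(BaseTool):
    def execute(self, query):
        # Placeholder logic standing in for a real tool implementation
        return f"processed: {query}"

# Registration, as done in scientist/tools/__init__.py
TOOL_REGISTRY = {"Your_Tool": YourTool}

def run_enabled_tools(enabled_tools, query):
    """Instantiate each enabled tool from the registry and execute it."""
    return {name: TOOL_REGISTRY[name]().execute(query) for name in enabled_tools}
```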

Scientific Experiments

Main Scientific Results

Main Results

Eubiota driven inflammatory stress gene discovery

Task1-Part1 Task1-Part2

Eubiota assisted therapeutic design for gut diseases

Task2

Eubiota enabled pathogen-biased antibiotic cocktail design

Task3

Eubiota guided anti-inflammatory molecule discovery

Task4

Acknowledgements

Chan Zuckerberg Initiative · Stanford HAI · Renaissance Philanthropy · Google Gemini

We also thank the following open-source projects:

  • VeRL for the excellent RL framework design.
  • vLLM for fast LLM inference support.
  • AgentFlow and Agent Lightning for early-stage exploration in multi-agent RL training.

Eubiota Team

We are grateful for all the help we got from our contributors!

Pan Lu, Yifan Gao, William G. Peng, Haoxiang Zhang, Kunlun Zhu, Elektra Robinson, Qixin Xu, Masakazu Kotaka, Harrison G. Zhang, Bingxuan Li, Anthony Shiver, Yejin Choi, Kerwyn Casey Huang, Justin L. Sonnenburg, James Zou

Citation

@article{lu2026eubiota,
  title = {Eubiota: Modular Agentic AI for Autonomous Discovery in the Gut Microbiome},
  author = {Lu, Pan and Gao, Yifan and Peng, William G. and Zhang, Haoxiang and Zhu, Kunlun and Robinson, Elektra K. and Xu, Qixin and Kotaka, Masakazu and Zhang, Harrison G. and Li, Bingxuan and Shiver, Anthony L. and Choi, Yejin and Huang, Kerwyn Casey and Sonnenburg, Justin and Zou, James},
  journal = {bioRxiv},
  year = {2026},
  month = {feb},
  day = {27},
  doi = {10.64898/2026.02.27.708412},
  url = {https://www.biorxiv.org/content/10.64898/2026.02.27.708412v1},
  publisher = {Cold Spring Harbor Laboratory}
}
