A simple implementation of the knowledge-updating research agent from the DOE Agent Requirements document.
- Searches arXiv for recent papers on protein-ligand binding affinity
- Triages each paper with Claude to decide whether it is relevant
- Extracts binding data (protein, ligand, affinity values) from abstracts
- Stores in SQLite for querying and analysis
- Alerts on high-impact findings (logs to console; extend for email/Slack)
- Tracks retraining threshold (stubs out the actual NERSC job submission)
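
The search step above can be sketched as follows. This is a minimal illustration, not necessarily how agent.py does it: `build_arxiv_url` and `is_recent` are hypothetical helper names, and the sketch assumes the public arXiv API query endpoint with its `search_query`, `sortBy`, `sortOrder`, and `max_results` parameters.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Public arXiv API endpoint (assumed here; agent.py may use a client library instead).
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(search_query: str, max_results: int = 20) -> str:
    """Return an arXiv API URL asking for the newest submissions matching the terms."""
    params = {
        "search_query": f"all:{search_query}",  # search all fields
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def is_recent(published: str, lookback_days: int, now: datetime) -> bool:
    """True if an Atom timestamp like 2024-05-01T12:00:00Z is inside the lookback window."""
    published_at = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return now - published_at <= timedelta(days=lookback_days)


if __name__ == "__main__":
    print(build_arxiv_url("protein ligand binding affinity"))
```

The returned URL can be fetched with any HTTP client. The response is an Atom feed whose `published` timestamps can be passed to `is_recent` to apply the lookback window.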
cd examples/knowledge-agent
pip install -r requirements.txt
export ANTHROPIC_API_KEY=your-key-here
python agent.py

# Run daily at 2am
0 2 * * * cd /path/to/knowledge-agent && python agent.py >> agent.log 2>&1

- knowledge_base.db: SQLite database with papers and extracted binding data
- agent_state.json: Agent state (last run, record counts, model version)
- agent.log: Log file

sqlite3 knowledge_base.db
-- Recent relevant papers
SELECT title, published FROM papers WHERE is_relevant = 1 ORDER BY published DESC LIMIT 10;
-- Extracted binding data
SELECT protein, ligand, affinity_value, affinity_unit, method
FROM binding_data
ORDER BY created_at DESC LIMIT 20;
-- Count by protein
SELECT protein, COUNT(*) as count FROM binding_data GROUP BY protein ORDER BY count DESC;

Edit the CONFIG dict in agent.py:
CONFIG = {
"search_query": "protein ligand binding affinity", # arXiv search terms
"max_papers_per_run": 20, # papers to process per run
"lookback_days": 7, # initial lookback window
"records_before_retrain": 200, # trigger retrain threshold
...
}

Add more sources (PubMed, bioRxiv):
- PubMed: Use biopython with Entrez
- bioRxiv: Use their RSS feed or API
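
A PubMed source could look roughly like the sketch below. It assumes Biopython's `Entrez.esearch` and `Entrez.read`, plus the E-utilities `datetype`, `mindate`, and `maxdate` parameters. The function names `pubmed_search_args` and `search_pubmed` are hypothetical and do not exist in agent.py.

```python
from datetime import date, timedelta


def pubmed_search_args(query: str, lookback_days: int, max_papers: int, today: date) -> dict:
    """Build keyword arguments for Entrez.esearch covering the lookback window."""
    start = today - timedelta(days=lookback_days)
    return {
        "db": "pubmed",
        "term": query,
        "retmax": max_papers,
        "datetype": "pdat",  # filter on publication date
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
    }


def search_pubmed(args: dict, email: str) -> list[str]:
    """Return matching PubMed IDs. Needs biopython installed and network access."""
    from Bio import Entrez  # imported lazily so the helper above works without biopython

    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(**args)
    try:
        return list(Entrez.read(handle)["IdList"])
    finally:
        handle.close()
```

Keeping the argument builder separate from the network call makes the date window easy to check in isolation. The returned IDs would still need a follow-up fetch to retrieve abstracts for triage.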
Real alerts:
- Email: Use smtplib or SendGrid
- Slack: Use slack_sdk
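
As a rough sketch of the email option, the snippet below builds and sends an alert with the standard library only. The subject format, the function names, and the SMTP settings (STARTTLS on port 587) are assumptions for illustration; the current agent only logs alerts to the console.

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert_email(title: str, summary: str, sender: str, recipient: str) -> EmailMessage:
    """Compose a plain-text alert for one high-impact paper."""
    msg = EmailMessage()
    msg["Subject"] = f"[knowledge-agent] High-impact finding: {title}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(summary)
    return msg


def send_alert(
    msg: EmailMessage,
    host: str,
    port: int = 587,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> None:
    """Send the message over SMTP with STARTTLS (assumes the server supports it)."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        if username and password:
            server.login(username, password)
        server.send_message(msg)
```

Credentials should come from the environment or a secrets store rather than from CONFIG.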
Real retraining:
- Use Globus Compute to submit jobs to NERSC
- Add human approval flow (could be a simple prompt or web interface)
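
The threshold check and a simple prompt-based approval could be wired together as in the sketch below. `records_before_retrain` is the CONFIG value shown earlier; everything else (`should_retrain`, `approved_by_human`, `maybe_retrain`, and the `submit_job` callback standing in for a Globus Compute submission) is hypothetical.

```python
from typing import Callable


def should_retrain(new_records: int, records_before_retrain: int) -> bool:
    """True once enough new records have accumulated since the last retrain."""
    return new_records >= records_before_retrain


def approved_by_human(new_records: int, ask: Callable[[str], str] = input) -> bool:
    """Ask an operator to confirm; anything other than y/yes counts as a refusal."""
    answer = ask(f"{new_records} new records since last retrain. Submit retraining job? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def maybe_retrain(
    new_records: int,
    records_before_retrain: int,
    submit_job: Callable[[], None],
    ask: Callable[[str], str] = input,
) -> bool:
    """Submit the job only if the threshold is met and a human approves."""
    if not should_retrain(new_records, records_before_retrain):
        return False
    if not approved_by_human(new_records, ask):
        return False
    submit_job()  # placeholder for the real NERSC submission
    return True
```

Passing `ask` and `submit_job` as parameters keeps the decision logic testable without a terminal or a cluster. A web approval flow would only need to supply a different `ask`.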
This is a demonstration. A production version would need:
- Better error handling and retries
- Rate limiting for APIs
- More sophisticated extraction (full-text, not just abstracts)
- Actual model retraining integration
- Human approval workflow
- Proper secrets management
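
For the first two items, one common approach is a retry wrapper with exponential backoff, which also spaces out repeated API calls. This is a minimal sketch under that assumption: `with_retries` is a hypothetical helper, and a production version would likely catch narrower exception types.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying on any exception with delays of base_delay * 2**n seconds."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A network call can then be wrapped as `with_retries(lambda: fetch(url))`, where `fetch` is whatever function performs the request.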