This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
NodeNormalization is a FastAPI microservice for the NIH NCATS Translator project. It normalizes biomedical CURIEs (Compact URIs) and finds equivalent identifiers across databases. Given a CURIE, it returns the preferred CURIE, equivalent identifiers, and Biolink semantic types.
Data comes from Babel (identifier equivalence project) and is stored in Redis across 7 separate databases. The service supports both standalone and clustered Redis.
Setup:

```shell
python -m venv nodeNormalization-env
source nodeNormalization-env/bin/activate
pip install -r requirements.txt
```

Run web server:

```shell
uvicorn --host 0.0.0.0 --port 8000 --workers 1 node_normalizer.server:app
# API docs at http://localhost:8000/docs
```

Run with Docker:

```shell
docker-compose up  # Starts Redis + web service on port 8080
```

Load data into Redis:

```shell
python load.py  # Requires Redis running and compendia files in configured directory
```

Run tests:

```shell
pytest                                              # All tests
pytest tests/test_endpoints.py                      # Single test file
pytest tests/test_endpoints.py::test_function_name  # Single test
```

Formatting:

```shell
black --line-length 160 .
```

Data flow:

Babel Compendia Files → loader.py → Redis (7 DBs) → FastAPI (server.py) → REST/TRAPI responses
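The loader side of this pipeline can be sketched as: parse one line-delimited JSON compendium record and write its identifiers into the equivalence stores. The record layout below (`type`, `identifiers` with `i`/`l` keys) is an assumption about Babel's compendia format, and plain dicts stand in for Redis:

```python
import json

# One made-up compendium record. The key names ("type", "identifiers",
# "i" for CURIE, "l" for label) are assumptions; check a real Babel file.
line = '{"type": "biolink:Gene", "identifiers": [{"i": "NCBIGene:1017", "l": "CDK2"}, {"i": "HGNC:1771"}]}'

eq_id_to_id = {}  # stand-in for Redis DB 0 (equivalent ID -> canonical ID)
id_to_type = {}   # stand-in for Redis DB 2 (ID -> semantic type)

record = json.loads(line)
# Assumption: the first identifier in the list is the preferred CURIE.
canonical = record["identifiers"][0]["i"]
for ident in record["identifiers"]:
    eq_id_to_id[ident["i"]] = canonical
id_to_type[canonical] = record["type"]
```

The real `NodeLoader` accumulates such writes and flushes them to Redis in batches of 100,000.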
| DB | Name | Purpose |
|---|---|---|
| 0 | eq_id_to_id_db | Equivalent ID → canonical ID |
| 1 | id_to_eqids_db | ID → all equivalent IDs |
| 2 | id_to_type_db | ID → semantic types |
| 3 | curie_to_bl_type_db | CURIE → Biolink types |
| 4 | info_content_db | Information content scores |
| 5 | gene_protein_db | Gene/protein conflation |
| 6 | chemical_drug_db | Chemical/drug conflation |
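The lookup flow this layout implies can be sketched with plain dicts standing in for the Redis databases. The CURIEs, value shapes, and the `normalize` helper are illustrative, not the service's actual storage format:

```python
# Stand-ins for Redis DBs 0-2; all data below is made up for illustration.
eq_id_to_id = {"NCBIGene:1017": "NCBIGene:1017", "HGNC:1771": "NCBIGene:1017"}  # DB 0
id_to_eqids = {"NCBIGene:1017": ["NCBIGene:1017", "HGNC:1771"]}                 # DB 1
id_to_type = {"NCBIGene:1017": ["biolink:Gene"]}                                # DB 2

def normalize(curie: str):
    """Resolve a CURIE to its canonical form, then fetch equivalents and types."""
    canonical = eq_id_to_id.get(curie)
    if canonical is None:
        return None  # unknown CURIE
    return {
        "id": canonical,
        "equivalent_identifiers": id_to_eqids.get(canonical, []),
        "type": id_to_type.get(canonical, []),
    }
```

The key point is the two-step lookup: DB 0 maps any equivalent CURIE to its canonical CURIE, and the remaining databases are keyed by that canonical CURIE.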
- `node_normalizer/server.py` — FastAPI app with all REST endpoints. Uses lifespan events for Redis connection setup/teardown. Root path is `/1.3`.
- `node_normalizer/normalizer.py` — Core logic: `get_normalized_nodes()`, `normalize_message()` (for TRAPI), and equivalent CURIE discovery. Traverses Biolink Model ancestors for semantic type expansion.
- `node_normalizer/loader.py` — `NodeLoader` class that reads flat compendia files and populates Redis. Validates input against `resources/valid_data_format.json`. Batch size: 100,000.
- `node_normalizer/redis_adapter.py` — `RedisConnectionFactory` and `RedisConnection` classes abstracting both clustered and standalone Redis, with async and sync support.
- `node_normalizer/model/` — Pydantic request/response models (`input.py`, `response.py`).
- `config.json` — Lists compendia and conflation files to load, preferred name boost prefixes, and feature flags (test mode, debug).
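The semantic type expansion in `normalizer.py` walks Biolink Model ancestors. The idea can be sketched with a hand-written parent map; the real service derives the hierarchy from the Biolink Model itself, and the tiny map below is an abbreviated stand-in:

```python
# Abbreviated stand-in for the Biolink class hierarchy (not complete).
BIOLINK_PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:NamedThing": None,  # root of the hierarchy
}

def expand_types(leaf: str) -> list[str]:
    """Return a type plus all of its ancestors, most specific first."""
    types = []
    current = leaf
    while current is not None:
        types.append(current)
        current = BIOLINK_PARENT.get(current)
    return types
```

Returning types most-specific-first matches how the service reports a node's most precise Biolink category ahead of its ancestors.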
- `GET/POST /get_normalized_nodes` — Main normalization endpoint; accepts a CURIE list
- `GET/POST /get_setid` — Deterministic hash for a set of CURIEs
- `GET /get_semantic_types` — Lists available Biolink semantic types
- `GET /get_curie_prefixes` — Lists CURIE prefixes per semantic type
- `POST /query` — Normalizes full TRAPI response objects
- `GET /status` — Health check with database info
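A typical `/get_normalized_nodes` exchange looks roughly like the following. The field names reflect the service's response model as commonly documented, but the exact shape is best confirmed against the live `/docs` page; all values here are illustrative:

```python
# Illustrative POST body for /get_normalized_nodes. The "conflate" flag
# is an assumption about the request model; verify against /docs.
request_payload = {"curies": ["HGNC:1771"], "conflate": True}

# Illustrative response: input CURIE -> normalized node record.
sample_response = {
    "HGNC:1771": {
        "id": {"identifier": "NCBIGene:1017", "label": "CDK2"},
        "equivalent_identifiers": [
            {"identifier": "NCBIGene:1017", "label": "CDK2"},
            {"identifier": "HGNC:1771"},
        ],
        "type": ["biolink:Gene", "biolink:BiologicalEntity", "biolink:NamedThing"],
    }
}

# Extract the preferred CURIE for each input CURIE.
preferred = {curie: entry["id"]["identifier"] for curie, entry in sample_response.items()}
```

Unrecognized CURIEs come back mapped to a null entry rather than being dropped, so callers should handle missing records per key.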
Two separate images are built and released to ghcr.io:

- Main webserver (`Dockerfile`) — uvicorn entry point
- Data loader (`data-loading/Dockerfile`) — for loading Babel compendia into Redis
- Uses `pytest-asyncio` (async mode enabled via `pytest.ini`)
- Redis testcontainers for isolated integration tests
- Fixtures in `tests/conftest.py`
- Test data in `tests/resources/`
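The async mode mentioned above is typically switched on with an entry like this; the exact contents of this repo's `pytest.ini` may differ:

```ini
[pytest]
asyncio_mode = auto
```

With `asyncio_mode = auto`, `pytest-asyncio` collects and runs `async def` tests without requiring a `@pytest.mark.asyncio` marker on each one.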