SuyashEkhande/OpenFDA-Semantic-MCP

OpenFDA FastMCP Server 🧬🤖

A production-grade, AI-native Model Context Protocol (MCP) server providing LLMs with comprehensive, intent-driven access to the entire openFDA API Ecosystem.

Built for modern GenAI orchestrators (like Claude Desktop) operating in the Healthcare and Life Sciences (HCLS) domain, this server moves beyond basic 1:1 API proxies. It utilizes Domain-Driven Design (DDD), semantic aggregation, and stateful cursor pagination to allow AI Agents to interrogate millions of FDA adverse events, medical device clearances, and drug labels securely and reliably.


🎯 Key Capabilities & Highlights

  • 100% Endpoint Coverage: Full integration with FDA Drug, Device, Food, Cosmetics, Animal/Veterinary, Tobacco, Transparency, and non-clinical (NSDE/UNII) datasets.
  • Intent-Driven Architecture (HLD): Groups disparate API interactions into high-level LLM capabilities (e.g., analyze_drug_profile) rather than low-level database lookups.
  • Graceful "Zero Match" Intercepts: When a query (including one built from a hallucinated term) matches nothing, the server catches the backend's 404: No Matches Found error and returns a clean, empty array ([]) to the LLM instead of surfacing a stack trace.
  • Stateful Cursor Pagination (LLD): Works around openFDA's cap on skip-based paging (skip may not exceed 25,000, which puts roughly 26,000 records in reach). The MCP server extracts search_after tokens from the Link response header, allowing LLMs to page through 15+ million adverse event records without token bloat.
  • Anti-Hallucination Schemas: Strict Pydantic models use Literal[] typings to bound every valid endpoint (verified against the open.fda.gov documentation via web scrapes), so AI agents cannot pass nonexistent endpoint slugs.
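
As a rough illustration of the cursor pagination described above, the sketch below pulls the rel="next" URL out of a Link header and follows it page by page. It is not the server's actual code: the helper names and the fetch callable are hypothetical, and it assumes the header has the usual `<url>; rel="next"` shape.

```python
import re
from typing import Callable, Iterator, Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Matches one `<url>; rel="next"` entry inside a Link header (assumed format).
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="?next"?')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL, or None when there is no further page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None


def search_after_token(url: str) -> Optional[str]:
    """Extract the decoded search_after cursor from a next-page URL."""
    values = parse_qs(urlsplit(url).query).get("search_after")
    return values[0] if values else None


def iter_results(
    first_url: str,
    fetch: Callable[[str], Tuple[list, Optional[str]]],
    max_pages: int = 10,
) -> Iterator:
    """Yield records page by page.

    `fetch` is a hypothetical callable returning (results, link_header);
    `max_pages` keeps an agent from walking millions of records by accident.
    """
    url: Optional[str] = first_url
    for _ in range(max_pages):
        if url is None:
            return
        results, link_header = fetch(url)
        yield from results
        url = next_page_url(link_header)
```

Handing the model only the opaque cursor (or the next URL) between calls keeps each tool response small, which is the point of paginating this way.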

🏗️ Technical Architecture (HLD & LLD)

High-Level Design (HLD)

  1. Semantic Aggregation & Progressive Disclosure: Rather than handing raw, multi-megabyte JSON payloads to LLMs (which overflows context windows), tools like analyze_drug_profile query the adverse event and recall enforcement endpoints concurrently. The server pulls the brand identifiers from each record's openfda section, condenses the data to top-hit summaries, and returns structured profiles.
  2. Domain-Driven Repository Structure: The codebase strictly separates sub-domains (e.g., domains/drug, domains/device, domains/food). This isolates logic so changes to the complex 510(k) device reporting structure don't risk breaking Cosmetic Adverse Event tools.
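
The aggregation pattern in item 1 might look roughly like the sketch below. It is an assumption-laden illustration, not the repository's implementation: the two fetchers are stubs standing in for real HTTP calls, and NoMatches, or_empty, and build_drug_profile are hypothetical names. It also shows how a "zero match" response can be turned into an empty list before the summary is built.

```python
import asyncio
from typing import Awaitable


class NoMatches(Exception):
    """Hypothetical stand-in for an HTTP 404 'no matches found' response."""


async def or_empty(call: Awaitable[list]) -> list:
    """Await a lookup and map a zero-match error to an empty list."""
    try:
        return await call
    except NoMatches:
        return []


async def fetch_adverse_events(drug: str) -> list:
    # Stub data; a real version would call the drug adverse event endpoint.
    await asyncio.sleep(0)
    return [
        {"reaction": "HEADACHE", "count": 95},
        {"reaction": "NAUSEA", "count": 120},
        {"reaction": "RASH", "count": 12},
    ]


async def fetch_recalls(drug: str) -> list:
    # Stub that simulates a query with no matching recall records.
    await asyncio.sleep(0)
    raise NoMatches(drug)


async def build_drug_profile(drug: str, top_n: int = 2) -> dict:
    """Run both lookups concurrently and return a compact summary."""
    events, recalls = await asyncio.gather(
        or_empty(fetch_adverse_events(drug)),
        or_empty(fetch_recalls(drug)),
    )
    top = sorted(events, key=lambda e: e["count"], reverse=True)[:top_n]
    return {
        "drug": drug,
        "top_reactions": top,
        "recall_count": len(recalls),
        "recalls": recalls[:top_n],
    }


if __name__ == "__main__":
    print(asyncio.run(build_drug_profile("acetaminophen")))
```

Only the top_n entries reach the model, so the size of the response stays bounded no matter how many records the underlying queries match.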

Low-Level Design (LLD)

  1. Client URL Syntax Builder (utils/syntax_builder.py): Standard Python HTTP clients percent-encode a literal + as %2B. openFDA does not accept %2B as a delimiter and expects literal +AND+ between the clauses of a multi-field query, so a custom query builder bypasses the default URL encoding to produce the Lucene-style syntax openFDA requires.
  2. Pydantic Guardrailing & Docstring Routing (schemas.py & tools.py): Each tool exposes an explicit Python docstring written for "Agentic Routing" (e.g., explaining precisely when to use the Cosmetic tool vs the Transparency tool). Pydantic validates inputs before they reach the HTTP client, so malformed requests never hit the API.
  3. Async Core (core/client.py): A fully asynchronous httpx client that handles concurrency, rate limits, and SSL verification settings, and runs on the same asyncio event loop as FastMCP.
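
To make the encoding problem in item 1 concrete, here is a minimal standard-library sketch. It is not the contents of utils/syntax_builder.py; build_search and build_url are hypothetical names, and the field names are only sample inputs.

```python
from typing import Mapping
from urllib.parse import quote, urlencode


def build_search(terms: Mapping[str, str]) -> str:
    """Join field:value clauses with a literal +AND+ delimiter.

    Each value is percent-encoded on its own, so the delimiter's plus
    signs are never passed through an encoder.
    """
    clauses = [f"{field}:{quote(str(value), safe='')}" for field, value in terms.items()]
    return "+AND+".join(clauses)


def build_url(base: str, endpoint: str, terms: Mapping[str, str], limit: int = 10) -> str:
    """Assemble the full request URL by hand instead of via a params dict."""
    return f"{base}/{endpoint}.json?search={build_search(terms)}&limit={limit}"


terms = {"patient.drug.medicinalproduct": "aspirin", "serious": "1"}

# The default encoder rewrites the delimiter, which is the problem described above:
print(urlencode({"search": "serious:1+AND+x:y"}))
# search=serious%3A1%2BAND%2Bx%3Ay

# The hand-built query keeps the literal plus signs:
print(build_url("https://api.fda.gov", "drug/event", terms, limit=5))
# https://api.fda.gov/drug/event.json?search=patient.drug.medicinalproduct:aspirin+AND+serious:1&limit=5
```

The resulting string would need to be handed to the HTTP client as a complete URL rather than through a params argument; how a particular client treats a prebuilt query string is worth checking before relying on it.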

🛠️ Provided Tools

This server injects the following tools into your LLM orchestrator:

| Domain | Tool Name | Agentic Use Case |
| --- | --- | --- |
| Drugs | analyze_drug_profile | Summarize total adverse hits & critical recalls for a specific drug. |
| Drugs | search_drug_labels | Search active ingredients, warnings, and boxed alerts in pill labels. |
| Drugs | search_drug_power_query | Deep-dive raw searches against generic event datasets. |
| Devices | evaluate_medical_device | Correlate a 3-letter FDA Product Code against 510(k) clearances. |
| Devices | search_device_power_query | Raw search across PMAs, UDIs, 510(k)s, and recalls. |
| Food & Cosmetics | search_food_events & search_cosmetic_events | Investigate salmonella outbreaks, product anomalies, and cosmetic injuries. |
| Veterinary | search_animal_events | Interrogate veterinary responses across dog/cat species or flea medications. |
| Tobacco | search_tobacco_data | Query prevention/digital ad studies and problem reports. |
| Transparency | search_transparency_data | Access FDA Complete Response Letters (CRLs). |
| Other | search_other_data | Retrieve historical press releases and substance UNII codes. |
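
Each raw-search tool above accepts only a bounded set of endpoint slugs. The sketch below shows the idea with the standard library alone; the project itself uses Pydantic models, so this is a stand-in rather than the actual schema. DeviceEndpoint and DeviceQuery are hypothetical names, and the four slugs are an illustrative subset.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Illustrative subset of device endpoint slugs (assumption: the real
# schemas enumerate the full list).
DeviceEndpoint = Literal["510k", "pma", "udi", "recall"]


@dataclass(frozen=True)
class DeviceQuery:
    """Validated input for a hypothetical device search tool."""

    endpoint: str
    search: str
    limit: int = 10

    def __post_init__(self) -> None:
        allowed = get_args(DeviceEndpoint)
        if self.endpoint not in allowed:
            # Reject unknown slugs before any HTTP request is made.
            raise ValueError(f"endpoint must be one of {allowed}, got {self.endpoint!r}")
        if not 1 <= self.limit <= 1000:
            raise ValueError("limit must be between 1 and 1000")


query = DeviceQuery(endpoint="510k", search="product_code:DTQ")

try:
    DeviceQuery(endpoint="clearance_db", search="product_code:DTQ")
except ValueError as exc:
    print(f"rejected: {exc}")
```

With Pydantic, annotating the field with the Literal type directly gives the same rejection and also publishes the allowed values in the tool's JSON schema, where the model can see them.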

🚀 Quickstart & Installation

This project uses uv for fast, modern Python dependency management.

  1. Clone & Install

    git clone https://github.com/yourusername/openfda_mcp.git
    cd openfda_mcp
    uv pip install -e .
  2. Configure External Clients (Claude Desktop)

     Add the following to your claude_desktop_config.json:

    {
      "mcpServers": {
        "openfda_server": {
          "command": "uv",
          "args": [
            "--directory",
            "/absolute/path/to/openfda_mcp",
            "run",
            "fastmcp",
            "run",
            "-m",
            "openfda_mcp.main:mcp",
            "--transport",
            "stdio"
          ]
        }
      }
    }
  3. Running the Server Manually (Streamable HTTP)

     If you want to test or host the server over Streamable HTTP instead of stdio:

    uv run fastmcp run src/openfda_mcp/main.py:mcp --transport streamable-http --port 8000

💬 Example LLM Prompts & Workflows

Once mounted in your Agent framework, you can formulate zero-shot biomedical queries natively:

Drug Analysis:

"Use OpenFDA to analyze the safety profile of 'Acetaminophen'. Are there any major recalls associated with it?"

The agent will automatically use analyze_drug_profile, formatting output gracefully without flooding the UI context window.

Medical Device Investigation:

"Look up FDA Product Code DTQ. How many 510(k) clearances does it have, and what are the top applications?"

Cosmetic Research:

"Query the cosmetic events dataset to find if there are any reported cancer outcomes related to Talcum or Baby Powder products from 2015-2020."


Security Note

The defaults in config.py allow local SSL verification to be bypassed, to support testing behind a proxy firewall on macOS. Make sure OPENFDA_API_KEY is supplied through an environment variable to support bursts over 40 requests/minute in production.
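
One way such settings could be read from the environment is sketched below. This is not the project's config.py: Settings, load_settings, and auth_params are hypothetical names, and OPENFDA_VERIFY_SSL is an invented variable used only for illustration. Only OPENFDA_API_KEY comes from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: Optional[str]
    verify_ssl: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables.

    OPENFDA_VERIFY_SSL is a hypothetical switch; verification stays on
    unless it is explicitly set to 0, false, or no.
    """
    api_key = env.get("OPENFDA_API_KEY") or None
    flag = env.get("OPENFDA_VERIFY_SSL", "true").strip().lower()
    return Settings(api_key=api_key, verify_ssl=flag not in {"0", "false", "no"})


def auth_params(settings: Settings) -> dict:
    """Query parameters to attach to each request when a key is configured."""
    return {"api_key": settings.api_key} if settings.api_key else {}
```

Defaulting verify_ssl to True means a missing variable fails safe: verification is only turned off when someone opts out explicitly.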
