deriva-mcp-core

A Model Context Protocol (MCP) server for the DERIVA platform. Exposes ERMrest and Hatrac as MCP tools with per-request OAuth authentication via Credenza.

Transport Modes

Mode	Auth	Use Case
`stdio`	Local credential from `~/.deriva/credential.json`	Local development, Claude Desktop
`http`	Per-request Bearer token via Credenza introspect + exchange	Production multi-user deployment

Installation

Requires Python 3.11+. Install with uv (recommended) or pip.

# Install from the repository
uv pip install git+https://github.com/informatics-isi-edu/deriva-mcp-core.git

# With RAG support (ChromaDB and pgvector dependencies)
uv pip install "deriva-mcp-core[rag] @ git+https://github.com/informatics-isi-edu/deriva-mcp-core.git"

For development:

git clone https://github.com/informatics-isi-edu/deriva-mcp-core.git
cd deriva-mcp-core
uv sync --extra dev --extra rag

Quick Start

stdio (local development)

stdio mode reads credentials from ~/.deriva/credential.json -- the same file used by deriva-py CLI tools. No additional configuration is required.

Run the server:

deriva-mcp-core
# or explicitly:
deriva-mcp-core --transport stdio

Configure Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "deriva": {
      "command": "deriva-mcp-core",
      "args": [
        "--transport",
        "stdio"
      ]
    }
  }
}

Mutating tools are disabled by default in stdio mode too -- set DERIVA_MCP_DISABLE_MUTATING_TOOLS=false if you need write access.

HTTP (production)

HTTP mode validates each request's Bearer token against Credenza and exchanges it for a DERIVA-scoped derived token. Create a deriva-mcp.env file (see Configuration Reference) and run:

deriva-mcp-core --transport http --host 0.0.0.0 --port 8000

The server loads deriva-mcp.env from the first of these paths that exists:

/etc/deriva-mcp/deriva-mcp.env (system-wide)
~/deriva-mcp.env (user home)
./deriva-mcp.env (current directory)

Use --config /path/to/file to override.
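
The lookup order above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not the server's actual loader; resolve_env_file and its parameters are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, Optional

# Search order documented above; the first existing file wins.
DEFAULT_CANDIDATES = (
    Path("/etc/deriva-mcp/deriva-mcp.env"),  # system-wide
    Path.home() / "deriva-mcp.env",          # user home
    Path.cwd() / "deriva-mcp.env",           # current directory
)


def resolve_env_file(
    override: Optional[str] = None,
    candidates: Iterable[Path] = DEFAULT_CANDIDATES,
) -> Optional[Path]:
    """Return the env file to load, or None if nothing is found.

    Hypothetical helper: `override` stands in for the --config flag.
    """
    if override is not None:
        # An explicit --config path bypasses the search entirely.
        return Path(override)
    for path in candidates:
        if path.is_file():
            return path
    return None
```

Passing the candidate list as a parameter keeps the function easy to exercise against temporary directories.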

Minimal deriva-mcp.env for HTTP:

DERIVA_MCP_CREDENZA_URL = https://your-host/authn
DERIVA_MCP_SERVER_URL = https://your-host/mcp
DERIVA_MCP_SERVER_RESOURCE = https://your-host/mcp
DERIVA_MCP_CLIENT_SECRET = your-client-secret
DERIVA_MCP_DISABLE_MUTATING_TOOLS = false

For Docker Compose deployments, see the deriva-docker repository and the Deployment Guide.

Configuration Reference

Core settings

Variable	Default	Description
`DERIVA_MCP_CREDENZA_URL`	(required for HTTP)	Base URL of the Credenza instance
`DERIVA_MCP_SERVER_URL`	(required for HTTP)	Public HTTPS URL of this MCP server
`DERIVA_MCP_SERVER_RESOURCE`	(required for HTTP)	Resource identifier for this server (usually same as SERVER_URL)
`DERIVA_MCP_CLIENT_SECRET`	(required for HTTP)	Client secret for Credenza token exchange
`DERIVA_MCP_CLIENT_ID`	`deriva-mcp`	Client ID registered with Credenza
`DERIVA_MCP_DERIVA_RESOURCE`	`urn:deriva:rest:service:all`	Resource identifier to request in token exchange
`DERIVA_MCP_ALLOW_ANONYMOUS`	`false`	Allow unauthenticated requests (see Anonymous Access)
`DERIVA_MCP_DISABLE_MUTATING_TOOLS`	`true`	When `true`, all tools registered as mutating return an error without executing
`DERIVA_MCP_PLUGIN_ALLOWLIST`	(unset -- allow all)	Comma-separated list of plugin entry point names to load; empty string disables all external plugins
`DERIVA_MCP_MUTATION_REQUIRED_CLAIM`	(unset)	JSON claim spec that must match the token introspection payload before mutations are permitted (e.g. `{"groups": ["deriva-mcp-mutator"]}`)
`DERIVA_MCP_TOKEN_CACHE_BUFFER_SECONDS`	`60`	Re-exchange derived tokens this many seconds before they expire
`DERIVA_MCP_INTROSPECT_CACHE_TTL_SECONDS`	`60`	How long to cache token introspection results
`DERIVA_MCP_AUDIT_LOGFILE_PATH`	`deriva-mcp-audit.log`	Audit log file path (used when syslog is off)
`DERIVA_MCP_AUDIT_USE_SYSLOG`	`false`	Write audit events to syslog (`/dev/log`) instead of a file
`DERIVA_MCP_HOSTNAME_MAP`	`{}`	JSON object mapping external hostnames to internal aliases (e.g. `{"localhost":"deriva"}`)
`DERIVA_MCP_SSL_VERIFY`	`true`	TLS verification for outbound calls: `true`, `false`, or path to a CA bundle
`DERIVA_MCP_DEBUG`	`false`	Enable DEBUG-level logging
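
As a rough illustration of what DERIVA_MCP_TOKEN_CACHE_BUFFER_SECONDS controls, the check below decides whether a cached derived token is close enough to expiry to be exchanged again. The function name and the inclusive boundary are assumptions made for this sketch; the server's internal cache may differ.

```python
import time
from typing import Optional


def needs_reexchange(
    expires_at: float,
    buffer_seconds: float = 60.0,
    now: Optional[float] = None,
) -> bool:
    """True once the token is within `buffer_seconds` of its expiry time.

    Hypothetical helper; `expires_at` and `now` are Unix timestamps.
    """
    current = time.time() if now is None else now
    return current >= expires_at - buffer_seconds
```

With the default buffer of 60, a token expiring at t=1000 is still served from cache at t=939 and is re-exchanged from t=940 onward.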

RAG settings

Variable	Default	Description
`DERIVA_MCP_RAG_ENABLED`	`false`	Enable the RAG subsystem
`DERIVA_MCP_RAG_VECTOR_BACKEND`	`chroma`	Vector store backend: `chroma` or `pgvector`
`DERIVA_MCP_RAG_CHROMA_DIR`	`~/.deriva-mcp/chroma`	Persistence directory for embedded ChromaDB
`DERIVA_MCP_RAG_CHROMA_URL`	(unset)	ChromaDB server URL (enables server mode instead of embedded)
`DERIVA_MCP_RAG_PG_DSN`	(required for pgvector)	PostgreSQL DSN for pgvector backend
`DERIVA_MCP_RAG_AUTO_UPDATE`	`true`	Crawl and incrementally update documentation sources on startup
`DERIVA_MCP_RAG_DATA_DIR`	`~/.deriva-mcp/rag`	SHA cache and runtime-added sources file

Built-in Tools

Mutating tools (* below) are disabled by default (DERIVA_MCP_DISABLE_MUTATING_TOOLS=true). Set it to false to enable them.

Schema introspection

Tool	Description
`get_catalog_info`	Catalog metadata and list of schemas
`list_schemas`	Schema names and comments
`get_schema`	Tables, columns, keys, and foreign keys for a schema
`get_table`	Full definition for one table
`resolve_snaptime`	Convert a snapshot identifier or timestamp to a canonical snaptime
`get_catalog_history_bounds`	Earliest and latest snapshot timestamps

Entity CRUD

Tool	Description
`get_entities`	Fetch rows with optional filters, cursor pagination (`after_rid`), and preflight count
`insert_entities` *	Insert rows; returns inserted records
`update_entities` *	Sparse update (only columns present in the payload are written)
`delete_entities` *	Delete rows matching filters (filters are required)
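
Cursor pagination with after_rid usually follows the pattern sketched below: request a page, remember the RID of the last row, and pass it as the cursor for the next request. The fetch_page callable and the in-memory stand-in are hypothetical; the real get_entities parameters other than after_rid are not shown in this README.

```python
from typing import Callable, Iterator, Optional

# Hypothetical signature: (after_rid, limit) -> list of row dicts with a "RID" key.
FetchPage = Callable[[Optional[str], int], list]


def iter_rows(fetch_page: FetchPage, page_size: int = 100) -> Iterator[dict]:
    """Yield every row by repeatedly asking for the page after the last RID seen."""
    after_rid: Optional[str] = None
    while True:
        page = fetch_page(after_rid, page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means there is nothing left
        after_rid = page[-1]["RID"]


def make_fake_fetch(rows: list) -> FetchPage:
    """In-memory stand-in for a paginated fetch, ordered by RID (illustration only)."""
    ordered = sorted(rows, key=lambda r: r["RID"])

    def fetch(after_rid: Optional[str], limit: int) -> list:
        remaining = [r for r in ordered if after_rid is None or r["RID"] > after_rid]
        return remaining[:limit]

    return fetch
```

The loop stops on an empty or short page, so it makes at most one extra request when the row count is an exact multiple of the page size.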

Queries

Tool	Description
`query_attribute`	ERMrest attribute query on a caller-supplied path expression
`count_table`	Row count with optional equality filters
`query_aggregate`	ERMrest aggregate query (group-by, custom aggregate expressions)

Hatrac object store

Tool	Description
`list_namespace`	List objects in a Hatrac namespace
`get_object_metadata`	Object metadata (size, checksums, version)
`create_namespace` *	Create a Hatrac namespace

Catalog administration

Tool	Description
`create_catalog` *	Create a new empty ERMrest catalog
`delete_catalog` *	Permanently delete a catalog
`clone_catalog` *	Clone a catalog (synchronous)
`clone_catalog_async` *	Clone a catalog as a background task; returns a `task_id`
`create_catalog_alias` *	Create an ERMrest catalog alias
`update_catalog_alias` *	Update alias target or owner
`delete_catalog_alias` *	Delete an alias (not the underlying catalog)
`cite`	Generate a permanent citation URL for a catalog entity

Annotations

Tool	Description
`get_table_annotations`	All annotations on a table
`get_column_annotations`	All annotations on a column
`list_foreign_keys`	Foreign key definitions for a table
`get_handlebars_template_variables`	Variables available in Handlebars row-name patterns
`get_table_sample_data`	Fetch sample rows for template testing
`preview_handlebars_template`	Render a Handlebars template against provided data
`validate_template_syntax`	Validate Handlebars template syntax
`set_display_annotation` *	Set an arbitrary display annotation tag
`set_table_display_name` *	Set the Chaise display name for a table
`set_row_name_pattern` *	Set the Handlebars row-name template
`set_column_display_name` *	Set the Chaise display name for a column
`set_visible_columns` *	Replace the visible-columns annotation for a context
`add_visible_column` *	Add one column to visible-columns
`remove_visible_column` *	Remove one column from visible-columns
`set_visible_foreign_keys` *	Replace the visible-foreign-keys annotation for a context
`add_visible_foreign_key` *	Add one foreign key to visible-foreign-keys
`remove_visible_foreign_key` *	Remove one foreign key from visible-foreign-keys
`reorder_visible_columns` *	Reorder entries in visible-columns
`reorder_visible_foreign_keys` *	Reorder entries in visible-foreign-keys
`set_table_display` *	Set the full table-level display annotation
`set_column_display` *	Set the full column-level display annotation
`apply_navbar_annotations` *	Set catalog-level Chaise navbar and display annotations

Schema DDL

Tool	Description
`create_table` *	Create a table with columns, keys, and foreign keys
`add_column` *	Add a column to an existing table
`set_table_description` *	Set a table's comment
`set_column_description` *	Set a column's comment
`set_column_nullok` *	Change a column's nullability

Vocabulary

Tool	Description
`list_vocabulary_terms`	All terms in a vocabulary table
`lookup_term`	Find a term by name or synonym
`create_vocabulary` *	Create a vocabulary table with standard columns
`add_term` *	Add a term (ID and URI auto-generated)
`update_term` *	Update term name, description, or synonyms
`update_term_description` *	Update only a term's description
`add_synonym` *	Append a synonym to a term
`remove_synonym` *	Remove a synonym from a term
`delete_term` *	Delete a term

Background tasks

Tool	Description
`get_task_status`	Status, progress, result, or error for a background task
`list_tasks`	All background tasks for the current user, with optional status filter
`cancel_task`	Request cancellation of a running task

RAG Subsystem

The RAG subsystem provides semantic search over DERIVA documentation and catalog schemas. It is disabled by default and requires the rag optional dependency group.

Enable it:

DERIVA_MCP_RAG_ENABLED = true

On startup (when DERIVA_MCP_RAG_AUTO_UPDATE=true), the server incrementally crawls the built-in documentation sources (deriva-py, ermrest, chaise) and indexes any changed files. Catalog schemas are indexed automatically when a tool first accesses a catalog (via the on_catalog_connect lifecycle hook).
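
The incremental crawl can be pictured as a digest comparison against a cache: only files whose digest differs from the cached value are re-indexed. The sketch below uses SHA-256 over file contents as a stand-in; which SHA the server actually records, and the changed_files name, are assumptions.

```python
import hashlib


def changed_files(files: dict[str, bytes], sha_cache: dict[str, str]) -> list[str]:
    """Return paths whose content digest differs from the cache, updating the cache.

    Hypothetical helper: `files` maps a path to its current content.
    """
    changed = []
    for path, content in sorted(files.items()):
        digest = hashlib.sha256(content).hexdigest()
        if sha_cache.get(path) != digest:
            changed.append(path)
            sha_cache[path] = digest  # remember the new digest for the next run
    return changed
```

On a first run every file is reported as changed; later runs report only files whose content has been modified or added since.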

Vector backends

ChromaDB (default) -- embedded, zero additional services:

DERIVA_MCP_RAG_VECTOR_BACKEND = chroma
DERIVA_MCP_RAG_CHROMA_DIR = ~/.deriva-mcp/chroma

For multi-instance deployments, point all instances at a shared ChromaDB server:

DERIVA_MCP_RAG_CHROMA_URL = http://chroma:8000

pgvector -- recommended for production multi-instance deployments:

DERIVA_MCP_RAG_VECTOR_BACKEND = pgvector
DERIVA_MCP_RAG_PG_DSN = postgresql://user:pass@db:5432/deriva_mcp

Requires the pgvector PostgreSQL extension. The server creates the required table and index automatically on first startup.

RAG tools

Tool	Description
`rag_search`	Semantic search across documentation and catalog schemas
`rag_status`	Per-source chunk counts, timestamps, and indexed schema hashes
`rag_update_docs`	Incremental documentation update (SHA delta, runs inline)
`rag_update_docs_async`	Same as above, submitted as a background task
`rag_index_schema`	Manually trigger schema reindex for a catalog
`rag_index_table`	Index rows from a specific table into the vector store
`rag_ingest`	Force a full re-crawl of one or all documentation sources
`rag_add_source`	Register a new documentation source at runtime (persisted)
`rag_remove_source`	Remove a runtime-added documentation source

Safety Controls

Mutation kill switch

All tools that write to the DERIVA catalog are registered with mutates=True. When DERIVA_MCP_DISABLE_MUTATING_TOOLS=true (the default), these tools return an error immediately without executing. The server logs a warning on startup when mutations are disabled and an info message when they are enabled, so the active state is always visible in the logs.

RAG tools write to the local vector store, not the DERIVA catalog, and are not affected by this setting.
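
A minimal sketch of how such a kill switch can be enforced is a decorator that short-circuits mutating tools. The guarded decorator, the error payload, and the rule that only the literal value false enables mutations are assumptions for illustration; the example is synchronous for brevity, while real tools are async.

```python
import functools
import os


def mutations_disabled() -> bool:
    """Read the flag; in this sketch only an explicit 'false' turns mutations on."""
    value = os.environ.get("DERIVA_MCP_DISABLE_MUTATING_TOOLS", "true")
    return value.strip().lower() != "false"


def guarded(*, mutates: bool):
    """Hypothetical decorator: block the call when the tool mutates and the switch is on."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if mutates and mutations_disabled():
                # Return an error immediately without executing the tool body.
                return {"error": "mutating tools are disabled"}
            return func(*args, **kwargs)

        return wrapper

    return decorator


@guarded(mutates=True)
def demo_write() -> dict:
    return {"ok": True}


@guarded(mutates=False)
def demo_read() -> dict:
    return {"ok": True}
```

Checking the flag on every call, instead of once at import time, keeps the sketch easy to test by toggling the environment variable.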

Plugin allowlist

By default all installed plugins (packages that declare the deriva_mcp.plugins entry point) are loaded. To restrict loading to a known set:

# Load only the deriva-ml plugin
DERIVA_MCP_PLUGIN_ALLOWLIST = deriva-ml

# Disable all external plugins
DERIVA_MCP_PLUGIN_ALLOWLIST =

Blocked plugins are logged at WARNING level so operators can audit what was skipped.
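
The three states of the allowlist (unset, empty, and a comma-separated list) can be modeled as below. parse_allowlist and is_allowed are hypothetical names for this sketch, not functions exported by the package.

```python
from typing import Optional


def parse_allowlist(raw: Optional[str]) -> Optional[frozenset]:
    """None means 'allow all'; an empty set means 'allow no external plugins'."""
    if raw is None:
        return None  # variable unset
    return frozenset(name.strip() for name in raw.split(",") if name.strip())


def is_allowed(plugin_name: str, allowlist: Optional[frozenset]) -> bool:
    """A plugin loads if there is no allowlist or its entry point name is listed."""
    return allowlist is None or plugin_name in allowlist
```

Keeping None and the empty set distinct is what separates "unset" from "set to an empty string".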

Anonymous access

Set DERIVA_MCP_ALLOW_ANONYMOUS=true to allow requests without an Authorization header. This is useful for deployments serving publicly readable catalogs where requiring users to authenticate would be unnecessary friction.

Behavior:

No token provided: the request proceeds with empty DERIVA credentials, equivalent to what an unauthenticated browser request would receive. Mutations are blocked regardless of the kill-switch setting -- anonymous access is always read-only.
Valid token provided: normal Credenza validation and token exchange; the request runs as that authenticated user with their normal permissions.
Invalid/expired token provided: 401 response. A client that sends a token must send a valid one -- bad tokens are never silently downgraded to anonymous.

Sub-modes:

`DERIVA_MCP_CREDENZA_URL` set?	Mode	Effect
Yes	Mixed	Both authenticated and anonymous requests work
No	Anonymous-only	Credenza fields are not required at startup; any provided token is rejected
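
The behavior and sub-mode rules above reduce to a small decision function. The sketch below is a reading of those rules, not the server's code; classify_request, Outcome, and the handling of a missing token when anonymous access is off are assumptions.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ANONYMOUS = "anonymous, read-only"
    AUTHENTICATED = "authenticated"
    REJECTED = "rejected"


def classify_request(
    token: Optional[str],
    *,
    allow_anonymous: bool,
    credenza_configured: bool,
    token_is_valid: bool = False,
) -> Outcome:
    """Hypothetical helper mapping a request to one of the documented outcomes."""
    if token is None:
        # Assumption: without anonymous access, a missing token is rejected.
        return Outcome.ANONYMOUS if allow_anonymous else Outcome.REJECTED
    if not credenza_configured:
        # Anonymous-only mode: any provided token is rejected.
        return Outcome.REJECTED
    # A presented token is never downgraded to anonymous.
    return Outcome.AUTHENTICATED if token_is_valid else Outcome.REJECTED
```

Note that a bad token is rejected even when anonymous access is enabled, matching the "never silently downgraded" rule.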

Example (anonymous-only, public read-only catalog):

DERIVA_MCP_ALLOW_ANONYMOUS = true
# No DERIVA_MCP_CREDENZA_URL, SERVER_URL, CLIENT_SECRET etc. needed

Example (mixed mode -- auth optional):

DERIVA_MCP_ALLOW_ANONYMOUS = true
DERIVA_MCP_CREDENZA_URL = https://your-host/authn
DERIVA_MCP_SERVER_URL = https://your-host/mcp
DERIVA_MCP_SERVER_RESOURCE = https://your-host/mcp
DERIVA_MCP_CLIENT_ID = deriva-mcp
DERIVA_MCP_CLIENT_SECRET = your-client-secret

Per-user mutation claim gating

When the mutation kill switch is off, all authenticated users can execute mutating tools. To restrict mutations to users with a specific claim in their token introspection payload:

# Require membership in a specific group
DERIVA_MCP_MUTATION_REQUIRED_CLAIM = {"groups": ["deriva-mcp-mutator"]}

# Require a boolean claim
DERIVA_MCP_MUTATION_REQUIRED_CLAIM = {"mcp_can_mutate": true}

List values use OR semantics (any match is sufficient). Multiple keys use AND semantics (all must match). Users who fail the check receive an error response and a mutation_claim_denied audit event. The kill switch takes precedence when both controls are active.
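
The OR-within-a-list, AND-across-keys rule can be expressed as follows. This is a sketch of the stated semantics only; claim_matches is a hypothetical name, and treating a list-valued claim in the payload as a membership set is an assumption.

```python
from typing import Any, Mapping


def claim_matches(required: Mapping[str, Any], payload: Mapping[str, Any]) -> bool:
    """Every required key must match (AND); any listed value may match (OR)."""
    for key, wanted in required.items():
        if key not in payload:
            return False
        actual = payload[key]
        # Normalize both sides to lists so scalars and lists share one code path.
        actual_values = actual if isinstance(actual, list) else [actual]
        wanted_values = wanted if isinstance(wanted, list) else [wanted]
        if not any(w in actual_values for w in wanted_values):
            return False
    return True
```

For example, a spec of {"groups": ["deriva-mcp-mutator"]} matches a payload whose groups claim contains that group alongside others, while adding a second key to the spec requires that key to match as well.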

Plugin Framework

External packages can register additional MCP tools, resources, and prompts by declaring a register entry point in the deriva_mcp.plugins group:

# pyproject.toml of the plugin package
[project.entry-points."deriva_mcp.plugins"]
my-plugin = "my_package.mcp_plugin:register"

The register function receives a PluginContext that exposes the same API used by the built-in tool modules:

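from deriva_mcp_core import get_catalog, get_hatrac_store  # get_hatrac_store is for Hatrac calls (unused here)
from deriva_mcp_core.context import deriva_call


def register(ctx):
    """Entry point: receives the PluginContext and registers this plugin's tools."""

    @ctx.tool(mutates=False)
    async def my_read_tool(hostname: str, catalog_id: str) -> str:
        # deriva_call() provides transparent 401 handling for the wrapped calls.
        with deriva_call():
            catalog = get_catalog(hostname, catalog_id)
            # ... ERMrest calls here; build the string to return
            result = f"read from catalog {catalog_id} on {hostname}"  # placeholder
        return result

    @ctx.tool(mutates=True)
    async def my_write_tool(hostname: str, catalog_id: str, data: dict) -> str:
        with deriva_call():
            catalog = get_catalog(hostname, catalog_id)
            # ... mutating calls here; build the string to return
            result = f"wrote to catalog {catalog_id} on {hostname}"  # placeholder
        return result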

Key rules:

Every ctx.tool() call requires an explicit mutates=True or mutates=False (omitting it raises TypeError at startup).
Wrap all ERMrest/Hatrac calls in a with deriva_call(): block for transparent 401 handling.
Mutating tools (mutates=True) emit audit_event on success and failure.
Use ctx.submit_task(coroutine, name) for long-running operations; returns a task_id immediately.
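
One way to get the "explicit mutates or TypeError" behavior in Python is a required keyword-only argument. The stand-in below can also be handy for exercising a register function in unit tests; SketchContext is hypothetical and is not the real PluginContext.

```python
class SketchContext:
    """Minimal stand-in that records registered tools (illustration only)."""

    def __init__(self) -> None:
        self.tools: dict = {}

    def tool(self, *, mutates: bool):
        # `mutates` is keyword-only with no default, so ctx.tool() raises TypeError.
        def decorator(func):
            self.tools[func.__name__] = (func, mutates)
            return func

        return decorator


ctx = SketchContext()


@ctx.tool(mutates=False)
def ping() -> str:
    return "pong"


try:
    ctx.tool()  # missing the required flag
except TypeError as exc:
    missing_flag_error = str(exc)
```

Because the decorator factory runs when the module is imported and register is called, a forgotten flag fails at startup instead of at the first tool invocation.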

See docs/plugin-authoring-guide.md for the full guide covering credential access patterns, lifecycle hooks, RAG extension, and testing.

Health Endpoint

The HTTP server exposes a health endpoint at GET /health that returns {"status": "ok"} with no authentication required. It is suitable for Docker health probes and load balancer checks.

curl http://localhost:8000/health
# {"status":"ok"}

Development Status

deriva-mcp-core is alpha-quality software. The API and configuration surface are still evolving and breaking changes may occur between releases without advance notice. It has been validated end-to-end against live DERIVA deployments, but has not yet seen broad production use. Use in production environments is at your own risk. Bug reports and contributions are welcome via the issue tracker.
