Skip to content

Mindburn-Labs/helm-oss

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

191 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
title HELM Open Source Scope

HELM — Fail-Closed Execution Firewall for AI Agents

CI OWASP Agentic Top 10 Conformance TLA+ Verified Post-Quantum Provenance License

Fail-closed execution firewall · TLA+-verified policy pipeline · Post-quantum crypto · Court-admissible evidence packs

Fail-closed AI execution substrate — deterministic runtime with signed canonical receipts and offline-verifiable replay. Every agent tool call passes through a non-bypassable policy boundary (core/pkg/firewall/firewall.go), gets JCS-canonicalized and SHA-256 hashed, and produces an Ed25519-signed receipt in a causal DAG you can export and replay. 7 regulatory framework packages plus 9 signed reference policy bundles (SOC 2, PCI-DSS, ISO 42001, EU AI Act, HIPAA, GDPR, DORA). Maps all 10 OWASP Agentic risks to enforcement sites in code. Single static Go binary, 75µs p99 on the benchmark harness (methodology).

Works with your stack — OpenAI-compatible proxy (any SDK; zero code change) plus native adapters for Anthropic, LangChain/LangGraph, CrewAI, LlamaIndex, Vercel AI, Semantic Kernel, Haystack, Dify, Flowise, Mistral, Gemini, and any MCP server. SDKs in Go, Python, TypeScript, Rust, Java.

HELM is a deterministic proxy that sits between your AI agent and the tools it calls. Every tool invocation passes through a fail-closed policy gate, gets canonicalized (JCS + SHA-256), and produces an Ed25519-signed receipt. The receipts form a causal DAG (ProofGraph) that can be exported and verified offline.

- client = openai.OpenAI()
+ client = openai.OpenAI(base_url="http://localhost:8080/v1")

One line. Every tool call is now governed.


What HELM Does

Stops dangerous agent tool calls. Emits signed receipts.

Core Capability Description
MCP interceptor / proxy mode Governs any MCP-compatible or OpenAI-compatible tool call
Tool-call dispatch guard Fail-closed policy gate — undeclared tools are blocked
Connector contract validation Schema pinning on input and output — drift is a hard error
Signed allow/deny receipts Ed25519-signed, Lamport-ordered, even for denied calls
Replayable local evidence EvidencePack export, offline replay, deterministic .tar
Capability-scoped connector bundles Domain-scoped tool sets with explicit capability manifests

What HELM Enforces By Default

Enforcement Meaning
No raw unrestricted tool execution Every call passes through the policy gate
No implicit connector expansion New tools require explicit declaration
No schema drift tolerance Pinned schemas, fail-closed on mismatch
Deny/defer on unknown fields Extra args in tool calls → DENY
Per-call receipts even for denied calls Every deny has a signed receipt with reason code
Deterministic reason codes DENY_TOOL_NOT_FOUND, BUDGET_EXCEEDED, ERR_CONNECTOR_CONTRACT_DRIFT, etc.

The Problem

Incident Root cause
Agent calls undeclared tool → prod outage Nobody declared which tools the model can call
Tool-call overspend GPT-4 made 500 API calls at $0.03 each in a loop
Schema drift breaks prod silently Tool args changed, model sends old format, silent corruption
"Who approved that?" dispute No audit trail for tool call authorization
Compliance gap "Just trust us" doesn't hold for SOC2 / DORA / GDPR

The Fix

Every tool call is governed, hashed, and signed:

  • Fail-closed policy — undeclared tools are blocked, schema drift is a hard error
  • Cryptographic receipts — Ed25519-signed, Lamport-ordered
  • Budget enforcement — ACID locks, fail-closed on ceiling breach
  • Offline verifiable — export EvidencePack, verify without network access
  • Sub-0.1ms p99 overhead — governed hot-path measured at 75µs p99 (methodology)

Install

# Script (macOS / Linux)
curl -fsSL https://raw.githubusercontent.com/Mindburn-Labs/helm-oss/main/install.sh | bash

# Go
go install github.com/Mindburn-Labs/helm-oss/core/cmd/helm@latest

# Docker
docker run --rm ghcr.io/mindburn-labs/helm-oss:latest --help

Quick Start

# 1. Initialize (SQLite + Ed25519 keypair + default config)
helm onboard --yes

# 2. Run the demo — 5 synthetic tool calls, real receipts
helm demo organization --template starter --provider mock

# 3. Export and verify offline
helm export --evidence ./data/evidence --out evidence.tar
helm verify --bundle evidence.tar

The demo produces ALLOW, DENY, and BUDGET_EXCEEDED verdicts. Each verdict has a signed receipt. The EvidencePack is a deterministic .tar — same inputs produce identical output bytes.

Or govern an existing app:

helm proxy --upstream https://api.openai.com/v1
export OPENAI_BASE_URL=http://localhost:8080/v1
python your_app.py

How It Works

Your App (OpenAI SDK)
       │
       │ base_url = localhost:8080
       ▼
   HELM Proxy ──→ Guardian (policy: allow/deny)
       │                │
       │           PEP Boundary (JCS canonicalize → SHA-256)
       │                │
       ▼                ▼
   Executor ──→ Tool ──→ Receipt (Ed25519 signed)
       │                        │
       ▼                        ▼
  ProofGraph DAG          EvidencePack (.tar)
  (append-only)           (offline verifiable)
       │
       ▼
  Replay Verify
  (air-gapped safe)

Execution Security Model

HELM enforces security through three independent layers:

Layer Property What It Does
A — Surface Containment Design-time Reduces the bounded execution surface — capability manifests, connector allowlists, sandbox profiles
B — Dispatch Enforcement Per-call Fail-closed policy gate — schema PEP, budget locks, contract pinning, signed verdicts
C — Verifiable Receipts Post-execution Cryptographic proof — Ed25519 receipts, ProofGraph DAG, offline replay

Execution Security Model · OWASP MCP Threat Mapping


Verify It Works

# 1. Trigger a deny
curl -s http://localhost:8080/v1/tools/execute \
  -H 'Content-Type: application/json' \
  -d '{"tool":"unknown_tool","args":{"bad_field":true}}' | jq .reason_code
# → "DENY_TOOL_NOT_FOUND"

# 2. View receipt
curl -s http://localhost:8080/api/v1/receipts?limit=1 | jq '.[0].receipt_hash'

# 3. Export + verify offline
helm export --evidence ./data/evidence --out pack.tar
helm verify --bundle pack.tar
# → "verification: PASS"

# 4. Conformance
helm conform --level L2 --json
# → {"profile":"CORE","pass":true,"gates":9}

Quickstart · Verification


OWASP Agentic Top 10 -- 10/10 Covered

Risk ID HELM Control Code Path
Prompt Injection ASI-01 Ensemble threat scanner (multi-scanner voting, 12 rule sets, 7 detection vectors) core/pkg/threatscan/ensemble.go
Tool Poisoning ASI-02 Rug-pull detection + DDIPE doc scanning + egress firewall (fail-closed) core/pkg/mcp/rugpull.go · core/pkg/mcp/docscan.go
Excessive Permission ASI-03 Effect permits (single-use, nonce-verified, time-bound) core/pkg/effects/
Insufficient Validation ASI-04 TLA+-verified 6-gate Guardian pipeline core/pkg/guardian/
Improper Output ASI-05 Output quarantine gate Guardian.EvaluateOutput()
Resource Overborrowing ASI-06 Budget gates with ACID locks + memory integrity protection core/pkg/budget/ · core/pkg/kernel/memory_integrity.go
Cascading Effects ASI-07 Circuit breakers + ProofGraph causal DAG core/pkg/effects/circuitbreaker.go
Data Exposure ASI-08 Egress firewall (deny-all default) + selective disclosure JWT core/pkg/firewall/
Plugin/Tool Insecurity ASI-09 MCP governance interceptor + mTLS + SkillFortify capability verification + schema validation core/pkg/mcp/gateway.go · core/pkg/pack/verify_capabilities.go
Insufficient Monitoring ASI-10 Evidence packs (JCS+SHA-256) + ProofGraph + OTel core/pkg/evidencepack/

Full mapping: OWASP Coverage -- includes NIST AI RMF, SOC 2, and EU AI Act cross-references.


Research-Backed Security

Every HELM security feature is grounded in peer-reviewed research:

Capability Paper What It Does
Path-aware policies arXiv 2603.16586 Evaluates full session history, not just current action
Ensemble defense (100% mitigation) arXiv 2509.14285 Multi-scanner voting catches what single scanners miss
DDIPE scanning arXiv 2604.03081 Blocks supply chain attacks via poisoned documentation
Memory integrity arXiv 2603.20357 SHA-256 hash-protected memory with tamper detection
SkillFortify arXiv 2603.00195 Formal proof that skills can't exceed declared capabilities
Dependency provenance arXiv 2604.08407 Prevents LiteLLM-style supply chain attacks
Hybrid PQ signing ePrint 2025/2025 Ed25519 + ML-DSA-65 on every receipt (quantum-ready)
W3C DID identity arXiv 2511.02841 Standard decentralized identifiers for agents
ZK compliance proofs arXiv 2512.14737 Prove governance without revealing decisions
Federated trust arXiv 2512.02410 Cross-organization reputation scoring

58 papers cited. Full research plan: docs/research/


Works With Your Stack

Framework Integration Path
OpenAI SDK Base URL proxy (zero code change) examples/python_openai_baseurl/
Anthropic SDK adapter sdk/ts/adapters/anthropic/
LangChain / LangGraph Middleware sdk/ts/adapters/langchain/
CrewAI Adapter sdk/ts/adapters/crewai/
LlamaIndex Middleware sdk/ts/adapters/llamaindex/
Vercel AI Adapter sdk/ts/adapters/vercel-ai/
Semantic Kernel Adapter sdk/ts/adapters/semantic-kernel/
Haystack Pipeline component sdk/ts/adapters/haystack/
Dify Plugin sdk/ts/adapters/dify/
Flowise Adapter sdk/ts/adapters/flowise/
Mistral Adapter sdk/ts/adapters/mistral/
Gemini Adapter sdk/ts/adapters/gemini/
Any OpenAI-compatible Change base_url 2 min
Any MCP server MCP interceptor helm mcp pack

HELM vs. Alternatives

Feature HELM Microsoft AGT NeMo Guardrails Guardrails AI OPA/Cedar
Enforcement Kernel (every action needs signed permit) Library (middleware) Prompt layer Pre/post validation Generic policy
Fail-closed Default deny (empty policy = block all) Exception = deny Best-effort Advisory App-level
Crypto Hybrid Ed25519 + ML-DSA-65 (post-quantum, dual-verify) Ed25519 only -- -- --
Agent identity W3C DID + AIP delegation chains -- -- -- --
Audit trail Causal DAG + CRDT sync + Rekor anchoring + ZK compliance proofs Merkle chain -- -- --
Evidence Court-admissible packs (JCS + SHA-256) CloudEvents logs -- -- --
Formal verification TLA+ proofs None None None None
Policy sandbox WASM (wazero, deterministic) YAML rules -- -- Rego/Cedar
Compliance 7 framework packages + 9 signed reference bundles 4 frameworks -- -- --
Latency 75us p99 (benchmarked) 0.1ms (no signing) 100ms+ 50ms+ < 5ms
Distribution Single static binary pip install pip install pip install Binary

Integrations

Python (OpenAI SDK)

import openai

client = openai.OpenAI(base_url="http://localhost:8080/v1")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "List files in /tmp"}]
)
# X-Helm-Decision-ID: dec_a1b2c3...
# X-Helm-Verdict: ALLOW

examples/python_openai_baseurl/main.py

TypeScript

const response = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gpt-4",
    messages: [{ role: "user", content: "What time is it?" }],
  }),
});
// X-Helm-Decision-ID: dec_d4e5f6...
// X-Helm-Verdict: ALLOW

examples/js_openai_baseurl/main.js

MCP Interceptor

# Discover governed tools
curl -s http://localhost:8080/mcp/v1/capabilities | jq '.tools[].name'

# Execute with governance
curl -s -X POST http://localhost:8080/mcp/v1/execute \
  -H 'Content-Type: application/json' \
  -d '{"method":"file_read","params":{"path":"/tmp/test.txt"}}' | jq .
# → { "result": ..., "receipt_id": "rec_...", "reason_code": "ALLOW" }

examples/mcp_client/main.sh

Agent Runtimes

Runtime Quickstart Time
DeerFlow examples/deerflow/ 5 min
OpenClaw examples/openclaw/ 5 min
Any OpenAI-compatible Change base_url 2 min

MCP Client Install

# Claude Desktop
helm mcp pack --client claude-desktop --out helm.mcpb

# Claude Code
helm mcp install --client claude-code

# VS Code / Cursor / Windsurf
helm mcp print-config --client windsurf

CI Integration

# .github/workflows/ci.yml
jobs:
  helm-check:
    uses: Mindburn-Labs/helm-oss/.github/workflows/boundary-checks.yml@main
    with:
      level: L2

SDKs

HELM works with your existing SDK first. Point any OpenAI-compatible client at the HELM proxy and you have governed tool calling with zero code changes. Native SDKs are there when you want tighter integration.

Insertion Guide — three copy-paste paths to get started.

Generated from api/openapi/helm.openapi.yaml.

Language Package Version Registry Install
TypeScript @mindburn/helm + 7 adapters 0.4.0 npm npm install @mindburn/helm
Python helm-sdk + 6 adapters 0.4.0 PyPI pip install helm-sdk
Go github.com/Mindburn-Labs/helm-oss/sdk/go 0.4.0 In-repo go get github.com/Mindburn-Labs/helm-oss/sdk/[email protected]
Rust helm-sdk 0.4.0 crates.io cargo add helm-sdk
Java com.github.Mindburn-Labs:helm-oss v0.4.1-java JitPack Add JitPack repo + v0.4.1-java dep (see below)
c := helm.New("http://localhost:8080")
res, err := c.ChatCompletions(helm.ChatCompletionRequest{
    Model:    "gpt-4",
    Messages: []helm.ChatMessage{{Role: "user", Content: "List /tmp"}},
})
if apiErr, ok := err.(*helm.HelmApiError); ok {
    fmt.Println("Denied:", apiErr.ReasonCode) // DENY_TOOL_NOT_FOUND
}

Java install (Maven):

<repositories>
  <repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
  </repository>
</repositories>

<dependency>
  <groupId>com.github.Mindburn-Labs</groupId>
  <artifactId>helm-oss</artifactId>
  <version>v0.4.1-java</version>
</dependency>

Gradle: implementation("com.github.Mindburn-Labs:helm-oss:v0.4.1-java") with maven { url 'https://jitpack.io' } in repositories.

examples/ · SDK docs


What You Get

Capability What It Does Links
6-Gate Guardian Pipeline TLA+-specified policy enforcement (Freeze -> Context -> Identity -> Egress -> Threat -> Delegation), model-checked in CI via apalache.yml. See gate-to-code map. guardian/
3-Layer Policy Composition P0 ceilings -> P1 signed bundles -> P2 per-session overlays (WASM sandbox via wazero) policy/
Causal Proof DAG ProofGraph with CRDT sync, Rekor transparency log anchoring, Lamport ordering proofgraph/
Post-Quantum Crypto Ed25519 + ML-DSA-65 (NIST FIPS 204), HSM support, key rotation, selective disclosure JWT crypto/
Evidence Packs Content-addressed, JCS-canonical, court-admissible proof archives evidencepack/
Threat Scanner 12 rule sets: prompt injection, encoding evasion, social engineering, privilege escalation, data exfiltration threatscan/
MCP Security Rug-pull detection, tool fingerprinting, governance interceptor, typosquatting detection mcp/
Circuit Breakers CLOSED/OPEN/HALF_OPEN state machine per connector + registry effects/circuitbreaker.go
Reversibility Classification Effect types tagged as fully/partially/irreversible with approval gating effects/reversibility.go
SLO Engine Latency/error-rate objectives with error budget tracking and exhaustion alerts slo/
Compliance Frameworks 7 Go packages (GDPR, HIPAA, SOX, SEC, MiCA, DORA, FCA) + 9 signed reference policy bundles (SOC 2, PCI-DSS, ISO 42001, EU AI Act high-risk, HIPAA covered entity, GDPR, customer-ops, procurement, recruiting) + RegWatch monitoring compliance/ · reference_packs/
OpenTelemetry Traces (gate-level spans) + metrics (decision latency, gate denials, effect throughput) guardian/otel.go
CloudEvents SIEM Export ProofGraph nodes serialized as CloudEvents v1.0 for Splunk/Datadog/Elastic cloudevents/
Multi-Agent Runtime (experimental) MAMA lanes-based concurrency scaffolding — see package README for current status; public API unstable experimental/mama/
Agent Lifecycle Virtual employee management (create/suspend/resume/terminate) with budget envelopes workforce/
Fault Attribution Shapley-value causal attribution from ProofGraph for multi-agent failures attribution/
Hybrid PQ Signing Ed25519 + ML-DSA-65 on every receipt (quantum-ready, dual-verify) crypto/hybrid_signer.go
W3C DID Identity Decentralized identifiers for agent identity (standard interop) identity/did/
Memory Integrity SHA-256 hash-protected governed memory with tamper detection kernel/memory_integrity.go
Memory Trust Scoring Temporal decay trust scoring for memory entries kernel/memory_trust.go
Ensemble Threat Scanner Multi-scanner voting with ANY/MAJORITY/UNANIMOUS strategies threatscan/ensemble.go
Constant-Size Evidence Summaries O(1) evidence completeness proof for large evidence packs evidencepack/summary.go
SkillFortify Static analysis proving skills can't exceed declared capabilities pack/verify_capabilities.go
Dependency Provenance Cryptographic verification of pack publisher signatures pack/provenance.go
Cost Attribution Per-agent cost breakdown + pre-execution cost estimation effects/types.go · budget/estimate.go
Policy Suggestion Engine Auto-suggest policy rules from ProofGraph analysis policy/suggest/
Static Policy Verification Detect circular deps, shadowed rules, escalation loops policy/verify/
Federated Trust Scoring Cross-organization reputation blending for MCP servers mcp/trust.go
ZK Compliance Proofs Zero-knowledge proof interfaces for privacy-preserving audit crypto/zkp/
AIP Delegation Verification Agent Identity Protocol for MCP delegation chains mcp/aip.go
Continuous Delegation Time-bound, revocable, scope-narrowing delegation (AITH) identity/continuous_delegation.go
Replay Trace Comparison Compare governance decisions across sessions replay/compare.go
DDIPE Doc Scanning Scans MCP tool docs for 7 executable payload patterns mcp/docscan.go
MCPTox Benchmark Validates HELM blocks all MCPTox attack categories mcp/mcptox_test.go
Policy Linter Static analysis with 10 built-in rules (structure, security, performance) lint/
Conformance Testing L1/L2/L3 crucible suites with property-based fuzzing tests/conformance/
GitHub Action CI/CD governance verification (OWASP scan, security scan, evidence verification) governance-scan/

Not included in OSS: managed federation, pack entitlement, compliance intelligence, Studio, managed control plane. See docs/OSS_SCOPE.md.


Security

  • TCB isolation — 8-package kernel boundary, CI-enforced forbidden imports (TCB Policy)
  • Bounded compute — WASI sandbox with gas/time/memory caps, deterministic traps (UC-005)
  • Schema enforcement — JCS canonicalization + SHA-256 on every tool call, input and output (UC-002)
  • Three-layer model — surface containment + dispatch enforcement + verifiable receipts (Execution Security Model)

SECURITY.md · Threat Model · OWASP MCP Mapping


Build & Test

make test       # 115 packages
make crucible   # 12 use cases + conformance L1/L2
make lint       # go vet

Deploy

docker compose up -d                              # local
docker compose -f docker-compose.demo.yml up -d   # production

deploy/README.md

Project Structure

helm-oss/
├── api/openapi/         # OpenAPI 3.1 spec (source of truth for SDKs)
├── core/                # Go kernel (8-package TCB + executor + ProofGraph)
│   └── cmd/helm/        # CLI: proxy, export, verify, replay, conform
├── packages/
│   └── mindburn-helm-cli/  # @mindburn/helm-cli (npm verifier)
├── sdk/                 # TypeScript, Python, Go, Rust, Java
├── examples/            # Runnable per-language + MCP examples
├── deploy/              # Caddy, compose, deploy guide
├── docs/                # Threat model, security model, conformance
└── Makefile             # build, test, crucible, release-binaries

Documentation

Getting Started

  • Quick Start -- Zero to governed agents in 5 minutes
  • Insertion Guide -- Three copy-paste paths to get started
  • Examples -- 18+ runnable examples in Go, Python, TypeScript, Rust, Java

Architecture & Reference

Research & Specifications

Compliance & Security

Deployment


Contributing

See CONTRIBUTING.md.

License

Apache License 2.0


Built by Mindburn Labs -- applied research for execution security in autonomous systems.