Active development · Lean 4 formalization in progress

Cryptographic provenance for LLM inference

You have no proof your LLM provider ran the model they claim. CommitLLM is a cryptographic commit-and-audit protocol that closes that gap: the provider serves normally on GPU and returns a compact receipt. A verifier checks the receipt and opened trace on CPU.

  • Linear shell: algebraic checks
  • Nonlinear shell: canonical replay
  • Attention: bounded approximate replay
  • Prefix/KV: statistical unless deep audit
Measured on the kept path
  • Routine audit (Llama 70B): 1.3 ms/tok
  • Online tracing overhead: ~12–14%
  • Full audit (1 tok, 70B): ~10 ms
  • Within 1 quant bucket: >99.8%
  • Verifier: CPU only
  • Provider: normal GPU path

Between fingerprints and zero-knowledge proofs

Two unsatisfying extremes—and a design point between them where real deployments need to live.

Insufficient

Fingerprinting

Statistical heuristics provide evidence but not exact per-response verification. A determined provider can game them.

CommitLLM

Commit-and-audit

Commitment-bound end-to-end. Information-theoretically sound algebraic checks for large linear layers, canonical replay for supported nonlinear components, CPU-only verification.

Impractical

ZK proofs

Strong proof objects, but prover costs remain too high for production LLM serving at today's scale.


Setup once. Commit every response. Verify on challenge.

The verifier holds a secret key derived from public weights. The provider commits during normal inference. Expensive work happens only when challenged.

Phase 0 · Setup

Build the verifier key

From a public checkpoint, the verifier computes a Merkle root over weights, secret Freivalds vectors for eight matrix families (Wq, Wk, Wv, Wo, Wgate, Wup, Wdown, LM_head), and the model configuration needed for canonical replay.
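Phase 0 can be sketched as follows. The Merkle layout, the hashing of matrices via their text encoding, and the function names are illustrative assumptions, not the project's API (the real key generation lives in crates/verilm-keygen and operates on tensor shards):

```python
# Hypothetical sketch of Phase 0: a Merkle root over weight leaves plus one
# secret Freivalds vector per matrix family. Names and encodings are
# illustrative, not CommitLLM's actual keygen.
import hashlib
import random

P = 2**32 - 5  # prime field modulus used for the algebraic checks

def merkle_root(leaves):
    """Binary Merkle tree over raw leaf byte strings."""
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:               # duplicate last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def freivalds_vector(W):
    """Secret r and precomputed v = r^T W over F_p for one matrix family."""
    r = [random.randrange(P) for _ in W]
    v = [sum(r[i] * W[i][j] for i in range(len(W))) % P
         for j in range(len(W[0]))]
    return r, v

# The eight matrix families named in the text above.
FAMILIES = ["Wq", "Wk", "Wv", "Wo", "Wgate", "Wup", "Wdown", "LM_head"]

def build_verifier_key(weights):
    """weights: dict mapping family name -> matrix (list of rows of ints)."""
    leaves = [repr(weights[f]).encode() for f in FAMILIES]
    return {
        "weights_root": merkle_root(leaves),
        "freivalds": {f: freivalds_vector(weights[f]) for f in FAMILIES},
    }
```

The key is built once per checkpoint; the secret `r` vectors never leave the verifier, which is what makes the later algebraic checks sound.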

Phase 1 · Commit

Serve normally, return a receipt

The provider runs inference on the normal GPU path with a tracing sidecar that captures retained state. It returns the response plus a compact receipt binding the execution trace, KV state, deployment manifest, prompt, sampling randomness, and token count.
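A receipt of this shape could look like the sketch below; the field names and hash layout are illustrative assumptions, not the on-wire format:

```python
# Hypothetical receipt layout for Phase 1, showing what the commitment binds.
# Field names are illustrative, not CommitLLM's wire format.
import hashlib
from dataclasses import dataclass

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

@dataclass
class Receipt:
    trace_root: bytes       # Merkle root over the retained execution trace
    kv_root: bytes          # Merkle root over retained KV state
    manifest_hash: bytes    # deployment manifest (the four spec hashes)
    prompt_hash: bytes      # the exact prompt served
    rng_commitment: bytes   # sampling randomness
    token_count: int

    def commitment(self) -> bytes:
        """Single hash binding the whole receipt; audits open against this."""
        return h(self.trace_root, self.kv_root, self.manifest_hash,
                 self.prompt_hash, self.rng_commitment,
                 self.token_count.to_bytes(4, "big"))
```

The point of the single commitment is that any later opening (Phase 2) is checked against a value the provider fixed before it knew what would be challenged.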

Phase 2 · Audit

Challenge specific positions and layers

The verifier challenges token positions and layers after the commitment. The provider opens the requested region. Routine audit samples prefix state; deep audit opens everything.
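One way to draw challenges only after the commitment is fixed is to seed them from the commitment plus a verifier nonce; this derivation scheme is an illustrative assumption, not the protocol's specification:

```python
# Hypothetical Phase 2 challenge sampler: (position, layer) pairs are derived
# from SHA-256 of (commitment || nonce || counter), so they cannot be known
# before the commitment exists. Names are illustrative.
import hashlib

def sample_challenge(commitment: bytes, nonce: bytes,
                     n_tokens: int, n_layers: int, k: int = 4):
    """Pick k distinct (token position, layer) pairs for the provider to open."""
    assert k <= n_tokens * n_layers
    picks, counter = [], 0
    while len(picks) < k:
        digest = hashlib.sha256(
            commitment + nonce + counter.to_bytes(4, "big")).digest()
        pos = int.from_bytes(digest[:4], "big") % n_tokens
        layer = int.from_bytes(digest[4:8], "big") % n_layers
        if (pos, layer) not in picks:
            picks.append((pos, layer))
        counter += 1
    return picks
```

Because the draw is deterministic given (commitment, nonce), both parties can reproduce exactly which region was challenged.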

Phase 3 · Verify

CPU-only checks

  • Embedding Merkle proof
  • Freivalds on shell matmuls
  • Exact INT8 bridge recomputation
  • KV provenance
  • Attention replay against committed post-attention output
  • Final-token tail from captured residual
  • LM-head binding
  • Decode and output policy replay


What is exact, approximate, and statistical

Commitment-bound end-to-end, with explicit boundaries for each verification class. Not “uniformly exact”—honestly delineated.

Component      Verification class
Input          Exact
Embedding      Exact
Shell matmuls  Freivalds
INT8 bridges   Exact
Prefix/KV      Statistical*
Attention      Approximate (FP16/BF16)
Final tail     Exact
LM head        Freivalds
Decode         Fail-closed

* Statistical in routine audit; upgraded to exact in deep audit.

The attention interior remains approximate because native GPU FP16/BF16 attention is not bit-reproducible across devices or even across runs. CommitLLM constrains it strongly—shell-verified Q, K, and V on both sides, commitment-verified prefix state, independent verifier replay, cross-layer consistency through the residual stream—but does not pretend it is exact. In routine audit mode, prefix/KV provenance is statistical: Merkle binding is exact, sampled positions are shell-verified exactly, but unopened positions are covered probabilistically. Deep audit upgrades this to exact full-prefix verification. The honest claim is not “uniformly exact end-to-end” but a precisely delineated guarantee boundary.
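The "within 1 quant bucket" acceptance rule from the stats above can be sketched as follows; the symmetric quantizer and tolerance policy here are illustrative assumptions, not the protocol's exact corridor definition:

```python
# Sketch of a bucket-corridor check: the verifier's CPU attention replay is
# compared to the committed output after INT8-style quantization, tolerating
# at most one bucket of disagreement per element. Illustrative, not the
# protocol's exact rule.
def quantize(x: float, scale: float) -> int:
    """Symmetric INT8-style quantization to a bucket index in [-128, 127]."""
    q = round(x / scale)
    return max(-128, min(127, q))

def corridor_check(committed, replayed, scale, max_bucket_diff=1):
    """Accept iff every element lands within max_bucket_diff buckets."""
    diffs = [abs(quantize(c, scale) - quantize(r, scale))
             for c, r in zip(committed, replayed)]
    return max(diffs) <= max_bucket_diff, diffs
```

A per-element bucket bound is much stronger than a norm bound: a single wildly wrong activation fails the check even if the average error is tiny.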


Routine audit stays cheap. Deep audit upgrades coverage.

CommitLLM uses the same receipt in both modes. Routine audit keeps steady-state verification light; deep audit opens the full retained window and upgrades prefix provenance to exact verification.

Routine audit

Low-friction spot checks

Designed for normal operation when you want frequent verification without opening the full trace every time.

  • Freivalds-based checks on large linear layers
  • Canonical replay for supported nonlinear subcomputations
  • Sampled prefix and KV provenance with statistical coverage
  • Bounded approximate attention replay on CPU
Deep audit

Escalate when the stakes are higher

Use the same commitment, but require a larger opening. This removes the routine-audit statistical gap on the retained prefix window.

  • Full-prefix and KV openings across the retained audit window
  • Exact prefix provenance instead of sampled coverage
  • The same algebraic, replay, and decode checks as routine audit
  • Higher bandwidth and storage cost, not a different serving path
Operationally: routine audit is the default posture; deep audit is the escalation path when a response is high value, disputed, or randomly selected for full review.

Verify huge matrix multiplies cheaply

The provider claims z = W @ x for a public weight matrix W. Recomputing the full product is expensive. Freivalds’ algorithm gives a much cheaper check: the verifier precomputes v = rᵀW with a secret random vector r, then checks v·x ≟ rᵀz in the finite field Fp, where p = 2^32 − 5.

If z ≠ Wx, the check fails with probability ≥ 1−1/p. This is information-theoretically sound. Transformers are mostly matrix multiplication; once those multiplies are cheap to audit, the verifier can check model identity without rerunning the full model.
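The check above fits in a few lines of Python (toy sizes; function names are illustrative, not the project's API):

```python
# Toy Freivalds check over F_p with p = 2^32 - 5, as described in the text.
import random

P = 2**32 - 5  # prime modulus

def matvec_mod(M, x, p=P):
    """z = M @ x reduced mod p (the provider's claimed product)."""
    return [sum(m * xi for m, xi in zip(row, x)) % p for row in M]

def keygen(W, p=P):
    """One-time verifier setup: secret r and precomputed v = r^T W."""
    rows, cols = len(W), len(W[0])
    r = [random.randrange(p) for _ in range(rows)]
    v = [sum(r[i] * W[i][j] for i in range(rows)) % p for j in range(cols)]
    return r, v

def freivalds_check(r, v, x, z, p=P):
    """Accept iff v . x == r^T z (mod p): O(n) work instead of O(n^2)."""
    lhs = sum(vj * xj for vj, xj in zip(v, x)) % p
    rhs = sum(ri * zi for ri, zi in zip(r, z)) % p
    return lhs == rhs

W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [10, 20, 30]
r, v = keygen(W)              # done once per matrix family
z_honest = matvec_mod(W, x)   # the provider's (honest) claim
```

The O(n²) precompute in `keygen` is paid once and amortized over every later O(n) check, which is why the verifier never needs the GPU.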

Interactive demo · 3×3 Freivalds check over Fp

Performance on the corrected replay path

Measured on Qwen2.5-7B-W8A8 and Llama-3.1-8B-W8A8. Attention mismatch is single-digit and bounded.

Measured today Qwen2.5-7B-W8A8 and Llama-3.1-8B-W8A8
Verifier hardware CPU only
Provider path Normal GPU serving with tracing

Verifier cost · Llama 70B

  • Routine audit: 1.3 ms
  • Full audit: ~10 ms

Online tracing overhead

  • Base: baseline (no tracing)
  • +Trace: +12–14% overhead

Attention corridor · Qwen2.5-7B-W8A8

  • L∞: 8
  • frac_eq: >92%
  • frac≤1: >99.8%

Attention corridor · Llama-3.1-8B-W8A8

  • L∞: 9
  • frac_eq: 94–96%
  • frac≤1: >99.9%

Built to sit beside real serving stacks

CommitLLM is not a replacement inference engine. The provider keeps the normal GPU path and produces request-scoped evidence alongside it.

Supported now

Continuous batching and paged attention

Many user requests can share the same GPU microbatch. CommitLLM still produces per-request receipts and per-request audits.

Supported now

Tensor parallelism and fused kernels

The tracing layer follows the existing execution path instead of replacing production kernels with proof-friendly substitutes.

Supported now

Quantized serving

Quantization metadata is receipt-bound, and the kept path is measured on production-style W8A8 checkpoints.

Not the current story

Cross-request cache reuse and shortcut decoding

Cross-request prefix caching, speculative decoding, and other semantics-changing shortcuts need more protocol work. Unsupported paths should fail closed.


Four specs, one receipt

CommitLLM binds the entire deployment surface that affects outputs—not just “some model ran.”

Spec              What it binds
input_spec_hash   Tokenizer, chat template, BOS/EOS, truncation, padding, system prompt
model_spec_hash   Checkpoint identity (Merkle root over weights), quantization, LoRA/adapter, RoPE config, RMSNorm ε
decode_spec_hash  Sampler, temperature, top-k/p, penalties, logit bias, grammar, stop rules
output_spec_hash  Detokenization, cleanup, whitespace normalization
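One way the four spec hashes could roll up into a single receipt-bound manifest hash is sketched below; the canonical-JSON encoding and function names are illustrative assumptions, not the project's serialization:

```python
# Hypothetical derivation of a manifest binding from the four spec hashes.
# Encoding and names are illustrative, not CommitLLM's actual format.
import hashlib
import json

def spec_hash(spec: dict) -> bytes:
    """Hash a spec under a canonical JSON encoding (sorted keys, no spaces)."""
    blob = json.dumps(spec, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).digest()

def manifest_hash(input_spec, model_spec, decode_spec, output_spec) -> bytes:
    """Bind all four surfaces into one value: change any knob (temperature,
    chat template, stop rules, ...) and the manifest hash changes."""
    return hashlib.sha256(
        spec_hash(input_spec) + spec_hash(model_spec)
        + spec_hash(decode_spec) + spec_hash(output_spec)
    ).digest()
```

Binding all four together is what rules out the "right model, silently different sampler" failure mode: the receipt commits to the whole deployment surface, not just the checkpoint.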

Provenance for every deployment

Enterprise procurement

Paying for Llama 70B? Get proof the provider actually served that checkpoint, not a smaller distillation.

Regulated deployments

Banks, hospitals, legal teams—auditable chain from decision to model version, decode policy, and output.

Decentralized compute

Networks like Gensyn, Ritual, or Bittensor cannot rely on “trust the node.” CommitLLM provides the missing layer.

Agent systems

When an agent takes action, which model produced the decision becomes a liability and governance question.


Paper summary

Federico Carrone, Diego Kingston, Manuel Puebla, Mauro Toscano
Lambda Class · Centro de Criptografía y Seguridad Digital, UBA

Large language models are increasingly used in settings where integrity matters, but users still lack technical assurance that a provider actually ran the claimed model, decode policy, and output behavior. Fingerprinting and statistical heuristics can provide signals, but not exact per-response verification. Zero-knowledge proof systems provide stronger guarantees, but at prover costs that remain impractical for production LLM serving.

We present CommitLLM, a cryptographic commit-and-audit protocol for open-weight LLM inference. CommitLLM keeps the provider on the normal serving path and keeps verifier work fast and CPU-only. It combines commitment binding, direct audit, and randomized algebraic fingerprints, including Freivalds-style checks for large matrix products, rather than per-response proof generation or full re-execution. Its main costs are retained-state memory over the audit window and audit bandwidth, not per-response proving.

The protocol is commitment-bound end-to-end. Within that binding, large linear layers are verified by verifier-secret, information-theoretically sound algebraic checks, quantization/dequantization boundaries and supported nonlinear subcomputations are checked by canonical re-execution, attention is verified by bounded approximate replay, and routine prefix-state provenance is statistical unless deep audit is used. Unsupported semantics fail closed.


Code layout

The public project name is CommitLLM. Some internal crate and package paths still use the legacy verilm-* prefix while the rename is being completed.

ComponentPath
Core types and traitscrates/verilm-core
Key generationcrates/verilm-keygen
Verifiercrates/verilm-verify
Prover (Rust)crates/verilm-prover
Python sidecarsidecar/
Python bindingscrates/verilm-py
Test vectorscrates/verilm-test-vectors
Lean formalizationlean/
Paperpaper/main.pdf