Lightweight, Redis-backed conversation state service purpose-built for the OpenAI Responses API, vLLM, and custom inference gateways.
- The Responses API introduces `previous_response_id`, yet most inference runtimes (vLLM, llama.cpp, TGI) are still stateless.
- Teams repeatedly rebuild message persistence, context stitching, summarisation, and contention control.
- Automatic Prefix Caching (APC) delivers major latency gains only when prompts are byte-identical; without strict normalisation the hit rate collapses.
ConvoStore is a thin, standalone service that absorbs those concerns so any gateway can expose stateful conversations without binding logic to the inference layer.
- Session Management
  - Two-level addressing (`conversation_id` + `response_id`).
  - Fast lookup by `previous_response_id` with idempotent writes.
  - Atomic append operations to avoid race conditions.
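The addressing and idempotency contract above can be sketched in plain Go. This is an in-memory stand-in (the real service backs this with Redis); the `Store`, `Append`, and `Resolve` names are illustrative, not ConvoStore's published API.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is one turn of a conversation.
type Message struct {
	Role    string
	Content string
}

// Store sketches two-level addressing: each write is keyed by
// conversation_id plus the response_id it belongs to, so replaying the
// same append is a no-op (idempotent), and a previous_response_id can be
// resolved back to the full message history.
type Store struct {
	mu      sync.Mutex
	history map[string][]Message // conversation_id -> ordered turns
	seen    map[string]bool      // "conv/resp" -> already appended?
	convOf  map[string]string    // response_id -> conversation_id
}

func NewStore() *Store {
	return &Store{
		history: map[string][]Message{},
		seen:    map[string]bool{},
		convOf:  map[string]string{},
	}
}

// Append adds a turn exactly once per (conversationID, responseID) pair.
// It returns false when the write was a duplicate delivery.
func (s *Store) Append(conversationID, responseID string, m Message) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := conversationID + "/" + responseID
	if s.seen[key] {
		return false // idempotent: duplicate is silently dropped
	}
	s.seen[key] = true
	s.convOf[responseID] = conversationID
	s.history[conversationID] = append(s.history[conversationID], m)
	return true
}

// Resolve maps a previous_response_id to its conversation's messages.
func (s *Store) Resolve(previousResponseID string) []Message {
	s.mu.Lock()
	defer s.mu.Unlock()
	conv, ok := s.convOf[previousResponseID]
	if !ok {
		return nil
	}
	out := make([]Message, len(s.history[conv]))
	copy(out, s.history[conv])
	return out
}

func main() {
	s := NewStore()
	s.Append("conv123", "rsp_1", Message{Role: "user", Content: "Hello"})
	s.Append("conv123", "rsp_1", Message{Role: "user", Content: "Hello"}) // duplicate, dropped
	fmt.Println(len(s.Resolve("rsp_1"))) // 1
}
```

The duplicate-drop behaviour is what makes gateway retries safe: a retried `/append` after a network timeout cannot double-write a turn.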
- Prompt Normalisation
  - Frozen system prompts and few-shot exemplars.
  - Consistent delimiters, line endings, and tokenizer settings.
  - Returns a `prefix_fingerprint` so you can track APC hit rates over time.
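Because APC only hits on byte-identical prefixes, normalisation plus a fingerprint is the core trick. A minimal sketch, assuming a rule set of CRLF-to-LF conversion, trailing-whitespace stripping, and a single trailing newline (the service's actual rules are configurable; `NormalizePrefix` and `PrefixFingerprint` are illustrative names):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// NormalizePrefix canonicalises a prompt prefix so that logically
// identical prompts become byte-identical: CRLF -> LF, trailing
// whitespace stripped per line, exactly one trailing newline.
func NormalizePrefix(prompt string) string {
	prompt = strings.ReplaceAll(prompt, "\r\n", "\n")
	lines := strings.Split(prompt, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.TrimRight(strings.Join(lines, "\n"), "\n") + "\n"
}

// PrefixFingerprint hashes the normalised prefix. If this value changes
// between requests, the runtime's prefix cache cannot hit.
func PrefixFingerprint(prompt string) string {
	sum := sha256.Sum256([]byte(NormalizePrefix(prompt)))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := PrefixFingerprint("You are helpful.\r\nBe brief.  ")
	b := PrefixFingerprint("You are helpful.\nBe brief.")
	fmt.Println(a == b) // true: both normalise to the same bytes
}
```

Logging this fingerprint per request is what lets you correlate template drift with APC hit-rate drops.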
- History Trimming & Summaries
  - Preserve the latest K turns verbatim.
  - Collapse older context into templated summaries to stay within token budgets.
  - Configurable maximum token ceilings per session.
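The keep-last-K policy can be sketched as follows; the `summarize` callback is a hypothetical stand-in for whatever summariser you plug in, and `TrimHistory` is an illustrative name rather than the service's API.

```go
package main

import "fmt"

// Turn is one message in a conversation.
type Turn struct {
	Role    string
	Content string
}

// TrimHistory keeps the newest keepTurns turns verbatim and collapses
// everything older into a single templated system-role summary.
func TrimHistory(turns []Turn, keepTurns int, summarize func([]Turn) string) []Turn {
	if len(turns) <= keepTurns {
		return turns // nothing to collapse
	}
	old := turns[:len(turns)-keepTurns]
	recent := turns[len(turns)-keepTurns:]
	summary := Turn{
		Role:    "system",
		Content: "Summary of earlier conversation: " + summarize(old),
	}
	return append([]Turn{summary}, recent...)
}

func main() {
	turns := []Turn{
		{"user", "Hi"}, {"assistant", "Hello!"},
		{"user", "Explain APC"}, {"assistant", "It caches shared prefixes."},
	}
	trimmed := TrimHistory(turns, 2, func(old []Turn) string {
		return fmt.Sprintf("%d earlier turns omitted.", len(old))
	})
	fmt.Println(len(trimmed)) // 3: one summary turn + the last two turns
}
```

Note the trade-off: the summary turn changes the prompt prefix, so trimming should happen at stable boundaries to avoid churning the APC fingerprint on every request.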
- High-Performance Implementation
  - Go HTTP service with low-latency concurrency primitives.
  - Redis backing with Lua-driven append + compare-and-set to ensure atomicity.
  - Sub-millisecond hot path for resolve operations when served from cache.
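The contract the Redis Lua script provides can be shown with an in-memory compare-and-set sketch: append succeeds only when the caller's expected version matches the stored one, in a single critical section (Redis gives the same all-or-nothing behaviour by running the script atomically server-side). Types and method names here are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// casList mimics the versioned-append contract: every successful write
// bumps a version counter, and a write with a stale expected version is
// rejected instead of silently interleaving.
type casList struct {
	mu      sync.Mutex
	version int64
	items   []string
}

// AppendCAS returns (newVersion, newVersion) on success, or
// (-1, currentVersion) when the caller lost the race and must re-read.
func (l *casList) AppendCAS(expectedVersion int64, item string) (int64, int64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if expectedVersion != l.version {
		return -1, l.version // conflict: caller's view is stale
	}
	l.items = append(l.items, item)
	l.version++
	return l.version, l.version
}

func main() {
	var l casList
	v, _ := l.AppendCAS(0, "turn-1")
	fmt.Println(v) // 1
	v, cur := l.AppendCAS(0, "turn-2") // stale expected version
	fmt.Println(v, cur)                // -1 1
}
```

Surfacing the conflict (rather than locking) is what feeds the `append_conflicts` metric and keeps the hot path lock-free from the client's perspective.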
- Observability Tooling
  - Metrics: `resolve_latency_ms`, `append_conflicts`, `session_size_bytes`, `estimated_input_tokens`, `apc_fingerprint_changes`.
  - Ready for Prometheus / Grafana dashboards.
  - Structured logs and tracing hooks for incident triage.
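As a dependency-free sketch of wiring these counters, Go's stdlib `expvar` works (the actual service targets Prometheus; the metric names below are the ones from the list above, the rest is illustrative):

```go
package main

import (
	"expvar"
	"fmt"
)

// Counters registered under the metric names from the README. expvar
// exposes them as JSON on /debug/vars when an HTTP server is running;
// a Prometheus setup would use counters and histograms instead.
var (
	appendConflicts       = expvar.NewInt("append_conflicts")
	sessionSizeBytes      = expvar.NewInt("session_size_bytes")
	apcFingerprintChanges = expvar.NewInt("apc_fingerprint_changes")
)

func main() {
	appendConflicts.Add(1)
	sessionSizeBytes.Add(512)
	apcFingerprintChanges.Add(1)
	fmt.Println(appendConflicts.Value(), sessionSizeBytes.Value()) // 1 512
}
```

Latency metrics like `resolve_latency_ms` need a histogram, which `expvar` does not provide; that is one reason the service exports Prometheus-style metrics instead.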
```
[Client / Gateway / LiteLLM]
             |
         (HTTP/SSE)
             v
┌───────────────────────────────────────────┐
│            ConvoStore Service             │
│  - /resolve (previous_id → messages)      │
│  - /append (append history, idempotent)   │
│  - /store_response (persist model output) │
│  - /trim (summaries / budgeting)          │
│  - Metrics / Auth / Quota                 │
└─────────────────────┬─────────────────────┘
                      │
                      v
               ┌─────────────┐
               │    Redis    │
               │  (Cluster)  │
               └─────────────┘
```
- Resolve P50: 2–5 ms / P95 < 10 ms.
- Append atomic write: < 3 ms.
- APC hit rate: > 80 % with stable templates.
- Scale: millions of sessions with Redis Cluster sharding.
```shell
# Run the service
docker run -d \
  -e REDIS_URL=redis://host:6379 \
  -p 8080:8080 \
  convostore:latest

# Append a message
curl -X POST http://localhost:8080/v1/sessions/conv123/append \
  -H "Content-Type: application/json" \
  -d '{"role":"user","content":"Hello"}'

# Resolve context from a previous response id
curl "http://localhost:8080/v1/sessions/resolve?previous_response_id=rsp_abc123"
```

- Multi-model context support (system prompts, adapters).
- Built-in summariser with pluggable models.
- KServe / LiteLLM integration kits.
- Helm chart and Kubernetes operator for production rollouts.
- Gateway authors who need pluggable session state (LiteLLM, KServe, bespoke proxies).
- Inference platform teams chasing APC gains, lower first-token latency, and predictable quotas.
- Applied ML platforms wanting observability around prompt reuse and session growth.
MIT