
What is OpenCompress?

OpenCompress is a drop-in middleware that sits between your application and any LLM provider. It compresses your prompts before they reach the model, reducing token usage by 30-40% while preserving output quality.

Quick Start

Get running in under 2 minutes. Change two lines of code.

How It Works

Understand the five-layer compression pipeline.

API Reference

Full OpenAI-compatible endpoint documentation.

Pricing

Pay-for-savings model. No savings = no charge.

Why OpenCompress?

Every LLM call you make contains token waste — filler words, redundant context, verbose formatting that models don’t need. OpenCompress removes this waste before the request hits your provider.
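As a toy illustration of the idea (this is not OpenCompress's actual pipeline, just a sketch of what "removing token waste" means), even a single pass that strips common filler words shrinks a prompt:

```python
import re

# Toy sketch only: one filler-stripping pass. The real compression
# pipeline is layered and far more sophisticated than this.
FILLER = re.compile(r"\b(please|kindly|just|really|very)\b\s*", re.IGNORECASE)

def trim_filler(prompt: str) -> str:
    compact = FILLER.sub("", prompt)
    # Collapse any leftover runs of whitespace.
    return re.sub(r"\s+", " ", compact).strip()

before = "Please could you just summarize the following text very briefly?"
print(trim_filler(before))  # "could you summarize the following text briefly?"
```

The same instruction reaches the model in fewer tokens; the question it asks is unchanged.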
1. Same API format

Fully compatible with OpenAI’s Chat Completions API. Works with any SDK that lets you set a custom base URL.
2. Any model, any provider

GPT-4o, Claude, Gemini, Llama, DeepSeek — we compress for all of them.
3. You keep 80%

We charge 20% of what we save you. If we don’t save you money, you pay nothing extra.
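The fee arithmetic is simple. Using hypothetical whole-dollar amounts (the bills below are made up for illustration):

```python
# Worked example of the pay-for-savings model: 20% of gross savings.
baseline_cost = 10_000   # hypothetical provider bill without compression
compressed_cost = 6_000  # hypothetical bill after compression
gross_savings = baseline_cost - compressed_cost  # 4000
fee = gross_savings * 20 // 100                  # 800: OpenCompress's 20%
net_savings = gross_savings - fee                # 3200: you keep 80%
print(net_savings)  # → 3200
```

If `compressed_cost` equals `baseline_cost` (no savings), `gross_savings` is 0 and so is the fee.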
4. Two lines to integrate

Change base_url and api_key. Everything else stays the same.
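With the OpenAI Python SDK, the change looks like this. The endpoint URL and key format below are placeholders, not official values — substitute the ones from your dashboard:

```python
# Hypothetical sketch: the base_url and key shown are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.opencompress.example/v1",  # line 1: point at OpenCompress
    api_key="YOUR_OPENCOMPRESS_KEY",                 # line 2: use your OpenCompress key
)

# Everything downstream is unchanged:
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this document."}],
)
```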

How much can you save?

| Use Case | Typical Compression | Monthly Savings (at $10K spend) |
| --- | --- | --- |
| RAG / retrieval-augmented generation | 40-55% input reduction | $2,400 - $3,300 |
| Agent tool calls | 30-45% input reduction | $1,800 - $2,700 |
| Chat with long context | 35-50% input reduction | $2,100 - $3,000 |
| Code generation | 25-35% input reduction | $1,500 - $2,100 |
Savings vary by prompt structure. Prompts with more natural language and repeated patterns compress best. Try it in the Playground with your actual prompts.
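As a back-of-envelope check on the table, here is one estimator whose figures line up with the RAG row. The 75% input-token share of spend is my assumption for illustration, not something the docs state; only the 20% fee comes from the pricing model above:

```python
# Back-of-envelope net-savings estimator. input_share=0.75 is an
# illustrative assumption (input tokens as a fraction of total spend);
# fee=0.20 is OpenCompress's stated 20% cut of gross savings.
def estimated_net_savings(monthly_spend, input_reduction,
                          input_share=0.75, fee=0.20):
    gross = monthly_spend * input_share * input_reduction
    return round(gross * (1 - fee), 2)  # you keep 80% of gross savings

print(estimated_net_savings(10_000, 0.40))  # → 2400.0 (RAG lower bound)
print(estimated_net_savings(10_000, 0.55))  # → 3300.0 (RAG upper bound)
```

Plug in your own spend and the compression range for your use case to get a rough range before trying the Playground.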