Irreduce compresses long prompts by splitting text into token spans, scoring each span for task relevance and global importance, then selecting the highest information-per-token spans under a tight budget while penalizing redundancy. The result retains about 95% of task performance while cutting roughly 90% of tokens, making large-context inference far cheaper and more scalable.
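The selection step above can be sketched as a greedy loop: repeatedly pick the span with the best redundancy-discounted score per token that still fits the budget. This is a minimal illustration, not Irreduce's actual implementation — the whitespace token count and Jaccard similarity are stand-ins for its real tokenizer and scorer.

```python
def tokens(span: str) -> int:
    # Crude stand-in for a real token count.
    return len(span.split())

def similarity(a: str, b: str) -> float:
    # Jaccard word overlap as a cheap redundancy proxy (an assumption;
    # the real scorer is richer).
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def select_spans(spans, scores, budget, redundancy_weight=0.5):
    """Greedily pick spans by score-per-token, discounting each candidate
    by its max similarity to anything already selected."""
    selected, used = [], 0
    remaining = set(range(len(spans)))
    while True:
        fits = [i for i in remaining if used + tokens(spans[i]) <= budget]
        if not fits:
            break
        def gain(i):
            penalty = max((similarity(spans[i], spans[j]) for j in selected),
                          default=0.0)
            return scores[i] * (1.0 - redundancy_weight * penalty) / tokens(spans[i])
        best = max(fits, key=gain)
        if gain(best) <= 0:
            break
        selected.append(best)
        used += tokens(spans[best])
        remaining.discard(best)
    return [spans[i] for i in sorted(selected)]

spans = ["alpha beta gamma", "alpha beta gamma delta",
         "totally different topic here"]
scores = [1.0, 1.0, 0.8]
# Under a 7-token budget the near-duplicate second span loses to the
# novel third span despite its higher raw score.
picked = select_spans(spans, scores, budget=7)
```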
- UI: http://127.0.0.1:8000/app
- API docs: http://127.0.0.1:8000/docs
- Health: http://127.0.0.1:8000/health
- Python 3.13+
```
cd server
uv sync
uv run uvicorn main:app --reload
```

Open http://127.0.0.1:8000/app.
```
cd server
source .venv/bin/activate
python -m uvicorn main:app --reload
```

- Chunks long context into spans and applies guardrails (headings, entities, code/role markers).
- Scores spans with novelty/entity/number boosts plus optional signal scoring.
- Runs greedy facility-location selection under a token budget.
- Optional paraphrase squeeze (heuristic, local LLM, or Groq).
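The heuristic variant of the paraphrase squeeze can be pictured as a small rule table of verbosity rewrites; the rules below are illustrative assumptions, not Irreduce's actual rules (its local-LLM and Groq modes replace this step entirely).

```python
import re

# Illustrative rewrite rules -- placeholders, not the project's real table.
SQUEEZES = [
    (r"\bin order to\b", "to"),
    (r"\bdue to the fact that\b", "because"),
    (r"\bit is worth noting that\b", ""),
    (r"\s{2,}", " "),  # collapse whitespace left by deletions
]

def squeeze(text: str) -> str:
    """Apply cheap rewrite rules in order; each rule either shortens a
    verbose phrase or cleans up the whitespace the deletions leave."""
    for pattern, repl in SQUEEZES:
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text.strip()
```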
- `GET /health` - liveness
- `POST /compress` - main Irreduce compression
- `POST /compress/longbench` - LongBench-style compression
- `POST /compare` - Irreduce vs TokenCo baseline
- `POST /evaluate` - quality vs savings curve
- `GET /examples` - demo scenarios
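With the server running, the compression endpoint can be exercised from any HTTP client. The field names below are guesses — confirm them against the OpenAPI schema at http://127.0.0.1:8000/docs before relying on them.

```python
import json

# Hypothetical POST /compress body; "text" and "budget_tokens" are
# assumed field names, not confirmed against the real schema.
payload = {
    "text": "...the long context to compress...",
    "budget_tokens": 512,
}
body = json.dumps(payload)

# With the server running (and the requests package installed):
# requests.post("http://127.0.0.1:8000/compress", json=payload).json()
```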
Irreduce runs without keys; extra features unlock with these environment variables:
- `TOKENC_API_KEY` or `TTC_API_KEY` - TokenCo baseline in `/compare`
- `TOKENC_MODEL` - TokenCo model override for scripts
- `TOKEN_COMPANY_API_KEY` - required for the custom compressor in scripts
- `GROQ_API_KEY` (and optional `GROQ_MODEL`) - Groq paraphrase
- `LOCAL_LLM_MODEL` (and optional `LOCAL_LLM_RUNTIME=hf`) - local signals/paraphrase
- `OPENAI_API_KEY` - evaluation scripts
- `GOOGLE_API_KEY` or `GEMINI_API_KEY` - LongBench vision script
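Since python-dotenv is already a dependency, these can live in a `.env` file for local development. The values below are placeholders, not working keys:

```
# Placeholder values -- substitute real credentials.
GROQ_API_KEY=your-groq-key
OPENAI_API_KEY=your-openai-key
LOCAL_LLM_MODEL=your-local-model-id
```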
Optional extras for local LLM:
```
uv add torch transformers accelerate sentencepiece
```

Run a quick evaluation:

```
cd server
uv run python -m cosmos.quick_eval
```

- `client/` - static demo UI
- `server/` - FastAPI API, compression engine, eval scripts
- FastAPI
- Pydantic
- Uvicorn (via FastAPI standard)
- uv (Python package manager)
- Token Company tokenc SDK
- Requests
- Google GenAI SDK (Gemini)
- Pillow
- ChromaDB
- python-dotenv
- OpenAI Python SDK
- Groq API (optional)
- Hugging Face Transformers, PyTorch, Accelerate, SentencePiece (optional)
- Google Fonts (Fraunces, Manrope, IBM Plex Mono)