Compress LLM prompts to reduce costs and latency. 100K tokens compressed in ~85ms.
```shell
pip install tokenc
```

```python
from tokenc import TokenClient

client = TokenClient(api_key="your-api-key")

result = client.compress_input(
    input="Your long prompt here...",
    model="bear-1.2",       # or "bear-1.1", "bear-1", etc.
    aggressiveness=0.5,     # 0.1 = light, 0.5 = balanced, 0.9 = aggressive
)

print(result.output)             # compressed text
print(result.tokens_saved)       # tokens removed
print(result.compression_ratio)  # e.g. 1.8x
```

Wrap text in `<ttc_safe>` tags to exclude it from compression:
```python
result = client.compress_input(
    input="Compress this but <ttc_safe>keep this exactly as is</ttc_safe>.",
    model="bear-1.2",
    aggressiveness=0.7,
)
```

The client can also be used as a context manager:

```python
with TokenClient(api_key="your-api-key") as client:
    result = client.compress_input(input="Your text...", model="bear-1.2", aggressiveness=0.5)
```

Requests are gzip-compressed and use HTTP keep-alive automatically.
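One practical use of `<ttc_safe>` is protecting spans that must survive compression verbatim, such as code snippets. The helper below is a hypothetical sketch (not part of the SDK) that wraps fenced code blocks in `<ttc_safe>` tags before a prompt is sent:

```python
import re

def protect_code_blocks(prompt: str) -> str:
    """Wrap fenced code blocks in <ttc_safe> tags so the compressor
    leaves them untouched. Hypothetical helper, not part of tokenc."""
    return re.sub(
        r"```.*?```",                                   # each fenced block, non-greedy
        lambda m: f"<ttc_safe>{m.group(0)}</ttc_safe>",  # wrap the whole match
        prompt,
        flags=re.DOTALL,                                 # let . span newlines
    )

protected = protect_code_blocks("Summarize:\n```python\nx = 1\n```\nThanks.")
```

The protected prompt can then be passed as `input=` to `compress_input` as usual.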
| Input Size | E2E Latency | Throughput |
|---|---|---|
| 10K tokens | 38ms | 198K tok/s |
| 100K tokens | 85ms | 975K tok/s |
| 1M tokens | 542ms | 1.5M tok/s |
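If `compression_ratio` is the ratio of original to compressed token counts (an assumption; the exact relation between `tokens_saved` and `compression_ratio` isn't spelled out above), expected savings can be estimated as:

```python
def estimated_savings(original_tokens: int, compression_ratio: float) -> int:
    """Estimate tokens removed, assuming ratio = original / compressed."""
    compressed_tokens = original_tokens / compression_ratio
    return round(original_tokens - compressed_tokens)

estimated_savings(100_000, 1.8)  # → 44444 tokens saved at a 1.8x ratio
```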
```python
from tokenc import TokenClient, AuthenticationError, RateLimitError, APIError

try:
    result = client.compress_input(input="Your text...", model="bear-1.2")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
except APIError as e:
    print(f"API error: {e}")
```

- Get API keys: https://thetokencompany.com
- Issues: https://github.com/TheTokenCompany/tokenc-python-sdk/issues
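A common pattern on top of the exceptions above is to retry rate-limited calls with exponential backoff. The sketch below uses a stand-in exception class so it runs without the SDK; in real use you would import `RateLimitError` from `tokenc` instead:

```python
import time

class RateLimitError(Exception):
    """Stand-in for tokenc's RateLimitError; import it from tokenc in real use."""

def compress_with_retry(client, text, retries=3, backoff=1.0):
    """Call compress_input, retrying on rate limits with exponential backoff."""
    for attempt in range(retries):
        try:
            return client.compress_input(input=text, model="bear-1.2")
        except RateLimitError:
            if attempt == retries - 1:
                raise                        # out of retries; surface the error
            time.sleep(backoff * 2 ** attempt)  # wait 1s, 2s, 4s, ...
```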