Kimi Reverse Proxy is a lightweight HTTP reverse proxy that automatically adjusts sampling parameters (temperature, top_p) based on whether a thinking or non-thinking model is being used. It sits between your application and the backend LLM server (e.g., vLLM).
Requirements: Go 1.24.2 or later
Build with `go build -o kimi-rp .`

Configure the proxy using command-line flags or environment variables:

| Flag | Environment Variable | Default | Description |
|---|---|---|---|
| `-listen` | `KIMIRP_LISTEN` | `0.0.0.0` | IP address to listen on |
| `-port` | `KIMIRP_PORT` | `9000` | Port to listen on |
| `-target` | `KIMIRP_TARGET` | `http://127.0.0.1:8000` | Backend target URL |
| `-loglevel` | `KIMIRP_LOGLEVEL` | `INFO` | Log level (DEBUG, INFO, WARN, ERROR) |
| `-thinking-model` | `KIMIRP_THINKING_MODEL_NAME` | (required) | Name of the thinking model |
| `-no-thinking-model` | `KIMIRP_NO_THINKING_MODEL_NAME` | (required) | Name of the non-thinking model |
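
The following is a minimal sketch of how settings like these could be resolved, not the proxy's actual source. It assumes a flag overrides its environment variable, which in turn overrides the default; that precedence is an assumption, and `envOr`, `config`, and `parseConfig` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// envOr returns the value of the environment variable key,
// or def when the variable is unset or empty.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// config mirrors the settings listed in the table (hypothetical type).
type config struct {
	Listen, Port, Target, LogLevel string
	ThinkingModel, NoThinkingModel string
}

// parseConfig resolves each setting with the assumed precedence
// flag > environment variable > default.
func parseConfig(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("kimi-rp", flag.ContinueOnError)
	fs.StringVar(&c.Listen, "listen", envOr("KIMIRP_LISTEN", "0.0.0.0"), "IP address to listen on")
	fs.StringVar(&c.Port, "port", envOr("KIMIRP_PORT", "9000"), "Port to listen on")
	fs.StringVar(&c.Target, "target", envOr("KIMIRP_TARGET", "http://127.0.0.1:8000"), "Backend target URL")
	fs.StringVar(&c.LogLevel, "loglevel", envOr("KIMIRP_LOGLEVEL", "INFO"), "Log level")
	fs.StringVar(&c.ThinkingModel, "thinking-model", os.Getenv("KIMIRP_THINKING_MODEL_NAME"), "Name of the thinking model")
	fs.StringVar(&c.NoThinkingModel, "no-thinking-model", os.Getenv("KIMIRP_NO_THINKING_MODEL_NAME"), "Name of the non-thinking model")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	// Both model names are marked (required) in the table.
	if c.ThinkingModel == "" || c.NoThinkingModel == "" {
		return c, errors.New("both -thinking-model and -no-thinking-model must be set")
	}
	return c, nil
}

func main() {
	c, err := parseConfig(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("listening on %s:%s, forwarding to %s\n", c.Listen, c.Port, c.Target)
}
```

Registering the environment value as the flag's default keeps the precedence logic in one place: the `flag` package only overwrites the default when the flag is actually passed.
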
- Client sends a request with a model name in the request body
- Proxy inspects the `model` field to determine if it's a thinking or non-thinking model
- Proxy sets appropriate sampling parameters:
  - If thinking model: `temperature=1.0`, `top_p=0.95`, `extra_body.thinking=true`
  - If non-thinking model: `temperature=0.6`, `top_p=0.95`, `extra_body.thinking=false`
- Request is forwarded to the backend server
- Response is streamed back to the client
MIT License - see LICENSE file for details.