OpenAI & Ollama compatible API powered by your ChatGPT account.
This is a fork of RayBytes/chatmock. The original Flask + synchronous `requests` stack has been replaced with FastAPI + async `httpx`, a layered architecture (router / service / infra), `pydantic-settings` configuration, and `uv` as the build system.
Integration and coverage badges are updated from local runs. Refresh both by running `scripts/test.sh` with `GIST_TOKEN` available in your environment or `.env`.
gptmock runs a local server that proxies requests to the ChatGPT Codex backend, exposing an OpenAI/Ollama compatible API. Use GPT-5, GPT-5-Codex, and other models directly from your ChatGPT Plus/Pro subscription — no API key required.
Migration note:
`--reasoning-compat` now defaults to `standard`, which emits reasoning via `delta.reasoning_content` / `message.reasoning_content` instead of injecting `<think>` tags into `content`. Set `--reasoning-compat think-tags` (or `GPTMOCK_REASONING_COMPAT=think-tags`) to keep the old behavior.
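A minimal client-side sketch of `standard` mode, assuming the OpenAI Python SDK (which exposes unknown response fields as attributes); the model name and base URL are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="anything")

# Non-streaming: the thinking summary arrives on message.reasoning_content,
# while content holds only the final answer.
resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "hello world"}],
)
msg = resp.choices[0].message
print(getattr(msg, "reasoning_content", None))  # reasoning (may be None)
print(msg.content)                              # final answer only

# Streaming: reasoning arrives on delta.reasoning_content chunks.
stream = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "hello world"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="")
    elif delta.content:
        print(delta.content, end="")
```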
- Python 3.13+
- Paid ChatGPT account (Plus / Pro / Team / Enterprise)
- `uv` (for `uvx` usage)
The fastest way to run gptmock. No clone, no install — just uvx.
```
uvx gptmock login
```
A browser window will open for ChatGPT OAuth. After login, tokens are saved to `~/.config/gptmock/auth.json`.
```
uvx gptmock serve
```
The server starts at http://127.0.0.1:8000. Use `http://127.0.0.1:8000/v1` as your OpenAI base URL.
```
uvx gptmock info
```
For convenience, you can define a shell alias:
```
alias gptmock='uvx gptmock'
gptmock login
gptmock serve --port 9000
gptmock info
```
Note: To install directly from the GitHub repository instead of PyPI:
```
uvx --from "git+https://github.com/rapidrabbit76/GPTMock" gptmock login
uvx --from "git+https://github.com/rapidrabbit76/GPTMock" gptmock serve
```
No build required — pull the pre-built image and run.
```yaml
services:
  serve:
    image: rapidrabbit76/gptmock:latest
    container_name: gptmock
    command: ["serve", "--verbose", "--host", "0.0.0.0"]
    ports:
      - "8000:8000"
      - "1455:1455" # OAuth callback port (needed during first-time login)
    volumes:
      - gptmock-data:/data
    environment:
      - GPTMOCK_HOME=/data
      - GPTMOCK_LOGIN_BIND=0.0.0.0
    healthcheck:
      test: ["CMD-SHELL", "python -c \"import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/health').status==200 else 1)\""]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 120s # Allows time for first-time login before health checks begin

volumes:
  gptmock-data:
```
Run the container interactively. If no credentials are found, the login flow starts automatically:
```
docker compose run --rm --service-ports serve
```
A URL will be printed in the terminal:
```
No credentials found. Starting login flow...
Starting local login server on http://localhost:1455
If your browser did not open, navigate to:
https://auth.openai.com/oauth/authorize?...
If the browser can't reach this machine, paste the full redirect URL here and press Enter:
```
Two ways to complete login:
- Browser on the same machine — the URL opens automatically and the OAuth callback is caught on port 1455.
- Browser on a different machine — open the URL, complete login, then copy the full redirect URL from the browser address bar (starts with `http://localhost:1455/auth/callback?code=...`) and paste it into the terminal.
Once login succeeds, the server starts automatically.
Once credentials are saved in the volume, just run in the background:
```
docker compose up -d serve
```
Check the server:
```
curl -s http://localhost:8000/health | jq .
```
All server options below are also available as environment variables. Use the `GPTMOCK_*` canonical names (see Server Options).
Additional Docker-specific variables:
| Variable | Default | Description |
|---|---|---|
| `GPTMOCK_HOME` | `/data` | Auth file directory — mount a volume here |
| `GPTMOCK_LOGIN_BIND` | `0.0.0.0` | OAuth callback server bind address |
| `GPTMOCK_OLLAMA_VERSION` | `0.12.10` | Ollama API compatibility header version |
Python (OpenAI SDK):
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="anything"  # ignored by gptmock
)

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "hello world"}]
)

print(resp.choices[0].message.content)
```
LangChain:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="anything",
    model="gpt-5.4",
)

response = llm.invoke("hello world")
print(response.content)
```
curl:
```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "hello world"}]
  }'
```

| Model | Reasoning Efforts | Status |
|---|---|---|
| `gpt-5` | minimal / low / medium / high | |
| `gpt-5.1` | low / medium / high | |
| `gpt-5.2` | low / medium / high / xhigh | ✅ Verified upstream |
| `gpt-5-codex` | low / medium / high | |
| `gpt-5.1-codex` | low / medium / high | |
| `gpt-5.1-codex-mini` | low / medium / high | |
| `gpt-5.1-codex-max` | low / medium / high / xhigh | |
| `gpt-5.2-codex` | low / medium / high / xhigh | |
| `gpt-5.3-codex` | low / medium / high / xhigh | ✅ Verified upstream |
| `gpt-5.3-codex-spark` | low / medium / high / xhigh | ✅ Verified upstream |
| `gpt-5.4` | low / medium / high / xhigh | ✅ Verified upstream |
| `gpt-5.4-mini` | low / medium / high / xhigh | ✅ Verified upstream |
| `gpt-5.4-fast` | low / medium / high / xhigh | ✅ Supported (priority tier alias of `gpt-5.4`) |
| `gpt-5.4-mini-fast` | low / medium / high / xhigh | ✅ Supported (priority tier alias of `gpt-5.4-mini`) |
Fast variants (`*-fast`) are synthetic aliases that map to the base model plus `service_tier="priority"` in the upstream payload. No separate endpoint or auth is required — the ChatGPT backend accepts them as paid-tier priority requests.
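Client-side, nothing extra is needed; a sketch (model names from the table above):

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="anything")

# gptmock rewrites this to model="gpt-5.4" plus service_tier="priority"
# in the upstream payload, per the note above.
resp = client.chat.completions.create(
    model="gpt-5.4-fast",
    messages=[{"role": "user", "content": "hello world"}],
)
print(resp.choices[0].message.content)
```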
Upstream availability note: model availability can change independently of GPTMock releases. GPTMock may recognize a model ID even when the current ChatGPT Codex backend rejects it for a specific account or subscription. On 2026-04-17, direct probe requests against the current upstream accepted `gpt-5.2`, `gpt-5.3-codex`, `gpt-5.3-codex-spark`, `gpt-5.4`, and `gpt-5.4-mini`, while rejecting `gpt-5`, `gpt-5.1`, `gpt-5-codex`, `gpt-5.1-codex`, `gpt-5.1-codex-mini`, `gpt-5.1-codex-max`, and `gpt-5.2-codex` with: `The '<model>' model is not supported when using Codex with a ChatGPT account.`
None hardcoded in GPTMock at this time. See the upstream availability note above for models that are currently rejected by the ChatGPT Codex backend.
| Method | Path | Description |
|---|---|---|
| POST | `/v1/chat/completions` | OpenAI Chat Completions (stream / non-stream) |
| POST | `/v1/completions` | OpenAI Text Completions |
| POST | `/v1/responses` | OpenAI Responses API (for LangChain codex routing) |
| GET | `/v1/models` | List available models |
| GET | `/api/version` | Ollama-compatible version info |
| POST | `/api/chat` | Ollama-compatible chat |
| POST | `/api/show` | Ollama-compatible model details |
| GET | `/api/tags` | Ollama model list |
| GET | `/health` | Health check |
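A quick smoke test of this surface from Python, as a sketch using `httpx`; the `/api/chat` body below assumes the standard Ollama chat request shape:

```python
import httpx

BASE = "http://127.0.0.1:8000"

# OpenAI-compatible surface: health check and model listing.
print(httpx.get(f"{BASE}/health").json())
print(httpx.get(f"{BASE}/v1/models").json())

# Ollama-compatible chat (assumes the standard Ollama request shape).
resp = httpx.post(
    f"{BASE}/api/chat",
    json={
        "model": "gpt-5.4",
        "messages": [{"role": "user", "content": "hello world"}],
        "stream": False,
    },
    timeout=120.0,
)
print(resp.json()["message"]["content"])
```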
- Streaming & Non-streaming — real-time SSE and buffered JSON responses
- Structured Output — `response_format` with `json_schema` / `json_object` support (see the sketch after this list)
- Tool / Function Calling — including web search with URL citation annotations via `responses_tools`
- Thinking Summaries — `<think>` tags, `o3` reasoning format, or legacy mode
- Responses API — `POST /v1/responses` for LangChain and other clients that auto-route codex models
- Ollama Compatibility — drop-in replacement for Ollama API consumers
- Auto Token Refresh — JWT tokens are refreshed automatically before expiry
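Here is the Structured Output sketch mentioned above, using the standard OpenAI `response_format` shape; the schema and model name are illustrative:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="anything")

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Extract: Ada Lovelace, born 1815."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "birth_year": {"type": "integer"},
                },
                "required": ["name", "birth_year"],
            },
        },
    },
)
print(resp.choices[0].message.content)  # a JSON string matching the schema
```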
```
gptmock serve [OPTIONS]
```
Each option can also be set via environment variable. Precedence: CLI flag > `GPTMOCK_*` env > `CHATGPT_LOCAL_*` legacy env > default.
| Option | Env var | Default | Description |
|---|---|---|---|
| `--host` | `GPTMOCK_HOST` | `127.0.0.1` | Bind address |
| `--port` | `GPTMOCK_PORT` | `8000` | Bind port |
| `--verbose` | `GPTMOCK_VERBOSE` | off | Log request/response payloads |
| `--verbose-obfuscation` | `GPTMOCK_VERBOSE_OBFUSCATION` | off | Also dump raw SSE/obfuscation events |
| `--debug-model` | `GPTMOCK_DEBUG_MODEL` | — | Force all requests to use this model name |
| `--reasoning-effort` | `GPTMOCK_REASONING_EFFORT` | `medium` | minimal / low / medium / high / xhigh |
| `--reasoning-summary` | `GPTMOCK_REASONING_SUMMARY` | `auto` | auto / concise / detailed / none |
| `--reasoning-compat` | `GPTMOCK_REASONING_COMPAT` | `standard` | How reasoning is exposed: standard / think-tags / o3 / legacy (`openai` is accepted as an alias for `standard`, `current` as an alias for `legacy`) |
| `--expose-reasoning-models` | `GPTMOCK_EXPOSE_REASONING_MODELS` | off | Show effort variants as separate models in `/v1/models` |
| `--enable-web-search` | `GPTMOCK_DEFAULT_WEB_SEARCH` | off | Enable web search by default when `responses_tools` is omitted |
| `--cors-origins` | `GPTMOCK_CORS_ORIGINS` | `*` | Comma-separated allowed CORS origins |
Legacy aliases: `CHATGPT_LOCAL_REASONING_EFFORT`, `CHATGPT_LOCAL_REASONING_SUMMARY`, `CHATGPT_LOCAL_REASONING_COMPAT`, `CHATGPT_LOCAL_EXPOSE_REASONING_MODELS`, `CHATGPT_LOCAL_ENABLE_WEB_SEARCH`, and `CHATGPT_LOCAL_DEBUG_MODEL` are still accepted as fallbacks.
Use `--enable-web-search` to enable the web search tool by default for all requests. When enabled, the model decides autonomously whether a query needs a web search. You can also enable web search per-request without the server flag by passing the parameters below.
| Parameter | Values | Description |
|---|---|---|
| `responses_tools` | `[{"type":"web_search"}]` | Enable web search for this request |
| `responses_tool_choice` | `"auto"` / `"none"` | Let the model decide, or disable |
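With the Python SDK, these non-standard fields can be passed through `extra_body` (the SDK's pass-through for extra JSON parameters); a sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="anything")

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Find current METAR rules"}],
    extra_body={
        "responses_tools": [{"type": "web_search"}],
        "responses_tool_choice": "auto",
    },
)

msg = resp.choices[0].message
print(msg.content)
# url_citation annotations, when present (see the response shapes below):
print(getattr(msg, "annotations", None))
```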
When web search is active, the model may return annotations containing source URLs. These are included automatically in responses:
Non-streaming (`stream: false`) — annotations are attached to the message:
```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "SpaceX launched 29 Starlink satellites...",
        "annotations": [
          {
            "type": "url_citation",
            "start_index": 0,
            "end_index": 150,
            "url": "https://spaceflightnow.com/...",
            "title": "SpaceX Falcon 9 launch"
          }
        ]
      }
    }
  ]
}
```
Streaming (`stream: true`) — annotations arrive as a dedicated chunk before the final stop chunk:
data: {"choices": [{"delta": {"annotations": [{"type": "url_citation", "start_index": 0, "end_index": 150, "url": "https://...", "title": "..."}]}, "finish_reason": null}]}
data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}Responses API (POST /v1/responses, non-streaming) — annotations are nested inside the output content:
```json
{
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "SpaceX launched 29 Starlink satellites...",
          "annotations": [
            {
              "type": "url_citation",
              "start_index": 0,
              "end_index": 150,
              "url": "https://spaceflightnow.com/...",
              "title": "SpaceX Falcon 9 launch"
            }
          ]
        }
      ]
    }
  ]
}
```
A full streaming request with web search enabled:
```bash
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5",
    "messages": [{"role":"user","content":"Find current METAR rules"}],
    "stream": true,
    "responses_tools": [{"type": "web_search"}],
    "responses_tool_choice": "auto"
  }'
```
- Requires an active, paid ChatGPT account.
- Context length may be partially used by internal system instructions.
- For the fastest responses, set `--reasoning-effort` to `low` and `--reasoning-summary` to `none`.
- The context size of this route is larger than what you get in the regular ChatGPT app.
- When the model returns a thinking summary, the default `standard` mode emits `reasoning_content` fields without polluting `content`. Set `--reasoning-compat think-tags` to keep `<think>` tags for older chat apps, or `--reasoning-compat legacy` for the older reasoning fields.
- This project is not affiliated with OpenAI. Use responsibly and at your own risk.
- Original project: RayBytes/chatmock
- This fork: rapidrabbit76/GPTMock
