GPTMock

OpenAI & Ollama compatible API powered by your ChatGPT account.

Tests Coverage Python 3.13+ License: MIT

This is a fork of RayBytes/chatmock. The original Flask + synchronous requests stack has been replaced with FastAPI + async httpx, a layered architecture (router / service / infra), pydantic-settings configuration, and uv as the build system.

Integration and coverage badges are updated from local runs. Refresh both by running scripts/test.sh with GIST_TOKEN available in your environment or .env.

gptmock runs a local server that proxies requests to the ChatGPT Codex backend and exposes an OpenAI/Ollama compatible API. Use GPT-5.4, GPT-5.3-Codex, and the other models listed under Supported Models directly from your paid ChatGPT subscription, with no API key required.

Migration note: --reasoning-compat now defaults to standard, which emits reasoning via delta.reasoning_content / message.reasoning_content instead of injecting <think> tags into content. Set --reasoning-compat think-tags (or GPTMOCK_REASONING_COMPAT=think-tags) to keep the old behavior.
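A client that has to work against both settings can normalize the two shapes itself. The following is a minimal sketch, not part of gptmock: the helper name split_reasoning is hypothetical, and it assumes that think-tags mode wraps the reasoning in a <think>...</think> block inside content.

```python
import re

# Assumption: think-tags mode embeds reasoning as <think>...</think> in content.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(message: dict) -> tuple[str, str]:
    """Return (reasoning, answer) for an assistant message dict."""
    content = message.get("content") or ""

    # standard mode: reasoning arrives in its own field, content is clean.
    reasoning = message.get("reasoning_content")
    if reasoning:
        return reasoning, content

    # think-tags mode: pull the tagged blocks out of content.
    blocks = THINK_RE.findall(content)
    answer = THINK_RE.sub("", content).strip()
    return "\n".join(block.strip() for block in blocks), answer
```

With the OpenAI SDK, the input would be something like resp.choices[0].message.model_dump(); whether reasoning_content survives that round trip depends on the SDK version, so treat that as an assumption too.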

Requirements

  • Python 3.13+
  • Paid ChatGPT account (Plus / Pro / Team / Enterprise)
  • uv (for uvx usage)

Quick Start (uvx)

The fastest way to run gptmock. No clone, no install — just uvx.

1. Login

uvx gptmock login

A browser window will open for ChatGPT OAuth. After login, tokens are saved to ~/.config/gptmock/auth.json.

2. Start the server

uvx gptmock serve

The server starts at http://127.0.0.1:8000. Use http://127.0.0.1:8000/v1 as your OpenAI base URL.

3. Verify

uvx gptmock info

Tip: Shell Alias

alias gptmock='uvx gptmock'

gptmock login
gptmock serve --port 9000
gptmock info

Note: To install directly from the GitHub repository instead of PyPI:

uvx --from "git+https://github.com/rapidrabbit76/GPTMock" gptmock login
uvx --from "git+https://github.com/rapidrabbit76/GPTMock" gptmock serve

Quick Start (Docker)

No build required — pull the pre-built image and run.

1. Create docker-compose.yml

services:
  serve:
    image: rapidrabbit76/gptmock:latest
    container_name: gptmock
    command: ["serve", "--verbose", "--host", "0.0.0.0"]
    ports:
      - "8000:8000"
      - "1455:1455"  # OAuth callback port (needed during first-time login)
    volumes:
      - gptmock-data:/data
    environment:
      - GPTMOCK_HOME=/data
      - GPTMOCK_LOGIN_BIND=0.0.0.0
    healthcheck:
      test: ["CMD-SHELL", "python -c \"import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8000/health').status==200 else 1)\""]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 120s  # Allows time for first-time login before health checks begin

volumes:
  gptmock-data:

2. Start (first run — login + serve in one step)

Run the container interactively. If no credentials are found, the login flow starts automatically:

docker compose run --rm --service-ports serve

A URL will be printed in the terminal:

No credentials found. Starting login flow...
Starting local login server on http://localhost:1455
If your browser did not open, navigate to:
  https://auth.openai.com/oauth/authorize?...

If the browser can't reach this machine, paste the full redirect URL here and press Enter:

Two ways to complete login:

  1. Browser on the same machine — the URL opens automatically and the OAuth callback is caught on port 1455.
  2. Browser on a different machine — open the URL, complete login, then copy the full redirect URL from the browser address bar (starts with http://localhost:1455/auth/callback?code=...) and paste it into the terminal.

Once login succeeds, the server starts automatically.
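For the second option, it can help to see what the pasted redirect URL actually carries. The sketch below is illustrative only (gptmock parses the URL itself; extract_auth_code is a hypothetical name), and the state parameter shown in the usage line is an assumption based on typical OAuth flows.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirect_url: str) -> str:
    """Pull the OAuth authorization code out of a pasted redirect URL."""
    parsed = urlparse(redirect_url.strip())
    if parsed.path != "/auth/callback":
        raise ValueError("not an OAuth callback URL")

    codes = parse_qs(parsed.query).get("code", [])
    if not codes:
        raise ValueError("redirect URL has no 'code' parameter")
    return codes[0]


# Hypothetical example value; a real URL comes from the browser address bar.
code = extract_auth_code("http://localhost:1455/auth/callback?code=abc123&state=xyz")
```

If the pasted URL lacks the code parameter (for example because login was cancelled), the helper raises ValueError instead of returning an empty string.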

3. Subsequent starts

Once credentials are saved in the volume, just run in the background:

docker compose up -d serve

4. Verify

curl -s http://localhost:8000/health | jq .

Docker Environment Variables

All server options below are also available as environment variables. Use the GPTMOCK_* canonical names (see Server Options).

Additional Docker-specific variables:

| Variable | Default | Description |
| --- | --- | --- |
| GPTMOCK_HOME | /data | Auth file directory; mount a volume here |
| GPTMOCK_LOGIN_BIND | 0.0.0.0 | OAuth callback server bind address |
| GPTMOCK_OLLAMA_VERSION | 0.12.10 | Ollama API compatibility header version |

Usage Examples

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="anything"  # ignored by gptmock
)

resp = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "hello world"}]
)
print(resp.choices[0].message.content)

Python (LangChain)

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="anything",
    model="gpt-5.4",
)
response = llm.invoke("hello world")
print(response.content)

curl

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role": "user", "content": "hello world"}]
  }'

Supported Models

| Model | Reasoning Efforts | Status |
| --- | --- | --- |
| gpt-5 | minimal / low / medium / high | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.1 | low / medium / high | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.2 | low / medium / high / xhigh | ✅ Verified upstream |
| gpt-5-codex | low / medium / high | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.1-codex | low / medium / high | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.1-codex-mini | low / medium / high | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.1-codex-max | low / medium / high / xhigh | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.2-codex | low / medium / high / xhigh | ⚠️ Recognized by GPTMock, currently rejected upstream for ChatGPT Codex accounts |
| gpt-5.3-codex | low / medium / high / xhigh | ✅ Verified upstream |
| gpt-5.3-codex-spark | low / medium / high / xhigh | ✅ Verified upstream |
| gpt-5.4 | low / medium / high / xhigh | ✅ Verified upstream |
| gpt-5.4-mini | low / medium / high / xhigh | ✅ Verified upstream |
| gpt-5.4-fast | low / medium / high / xhigh | ✅ Supported (priority tier alias of gpt-5.4) |
| gpt-5.4-mini-fast | low / medium / high / xhigh | ✅ Supported (priority tier alias of gpt-5.4-mini) |

Fast variants (*-fast) are synthetic aliases that map to the base model plus service_tier="priority" in the upstream payload. No separate endpoint or auth is required — the ChatGPT backend accepts them as paid-tier priority requests.
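The alias mapping described above can be sketched in a few lines. This is an illustration of the idea rather than GPTMock's actual code (which may use an explicit lookup table); resolve_model is a hypothetical name.

```python
FAST_SUFFIX = "-fast"


def resolve_model(model: str) -> dict:
    """Return the model-related fields of an upstream payload.

    A *-fast alias becomes the base model plus service_tier="priority";
    any other model ID is passed through unchanged.
    """
    if model.endswith(FAST_SUFFIX):
        return {
            "model": model[: -len(FAST_SUFFIX)],
            "service_tier": "priority",
        }
    return {"model": model}
```

For example, resolve_model("gpt-5.4-mini-fast") yields the base model gpt-5.4-mini with the priority tier attached, while resolve_model("gpt-5.4") carries no service_tier field at all.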

Upstream availability note: model availability can change independently of GPTMock releases. GPTMock may recognize a model ID even when the current ChatGPT Codex backend rejects it for a specific account or subscription. On 2026-04-17, direct probe requests against the current upstream accepted gpt-5.2, gpt-5.3-codex, gpt-5.3-codex-spark, gpt-5.4, and gpt-5.4-mini, while rejecting gpt-5, gpt-5.1, gpt-5-codex, gpt-5.1-codex, gpt-5.1-codex-mini, gpt-5.1-codex-max, and gpt-5.2-codex with: The '<model>' model is not supported when using Codex with a ChatGPT account.

Deprecated / Unsupported Models

None hardcoded in GPTMock at this time. See the upstream availability note above for models that are currently rejected by the ChatGPT Codex backend.

API Endpoints

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | OpenAI Chat Completions (stream / non-stream) |
| POST | /v1/completions | OpenAI Text Completions |
| POST | /v1/responses | OpenAI Responses API (for LangChain codex routing) |
| GET | /v1/models | List available models |
| GET | /api/version | Ollama-compatible version info |
| POST | /api/chat | Ollama-compatible chat |
| POST | /api/show | Ollama-compatible model details |
| GET | /api/tags | Ollama model list |
| GET | /health | Health check |

Features

  • Streaming & Non-streaming: real-time SSE and buffered JSON responses
  • Structured Output: response_format with json_schema / json_object support
  • Tool / Function Calling: including web search with URL citation annotations via responses_tools
  • Thinking Summaries: standard reasoning_content fields, <think> tags, o3 reasoning format, or legacy mode
  • Responses API: POST /v1/responses for LangChain and other clients that auto-route codex models
  • Ollama Compatibility: drop-in replacement for Ollama API consumers
  • Auto Token Refresh: JWT tokens are refreshed automatically before expiry

Server Options

gptmock serve [OPTIONS]

Each option can also be set via environment variable. Precedence: CLI flag > GPTMOCK_* env > CHATGPT_LOCAL_* legacy env > default.

| Option | Env var | Default | Description |
| --- | --- | --- | --- |
| --host | GPTMOCK_HOST | 127.0.0.1 | Bind address |
| --port | GPTMOCK_PORT | 8000 | Bind port |
| --verbose | GPTMOCK_VERBOSE | off | Log request/response payloads |
| --verbose-obfuscation | GPTMOCK_VERBOSE_OBFUSCATION | off | Also dump raw SSE/obfuscation events |
| --debug-model | GPTMOCK_DEBUG_MODEL | (unset) | Force all requests to use this model name |
| --reasoning-effort | GPTMOCK_REASONING_EFFORT | medium | minimal / low / medium / high / xhigh |
| --reasoning-summary | GPTMOCK_REASONING_SUMMARY | auto | auto / concise / detailed / none |
| --reasoning-compat | GPTMOCK_REASONING_COMPAT | standard | How reasoning is exposed: standard / think-tags / o3 / legacy (openai is accepted as an alias for standard, current as an alias for legacy) |
| --expose-reasoning-models | GPTMOCK_EXPOSE_REASONING_MODELS | off | Show effort variants as separate models in /v1/models |
| --enable-web-search | GPTMOCK_DEFAULT_WEB_SEARCH | off | Enable web search by default when responses_tools is omitted |
| --cors-origins | GPTMOCK_CORS_ORIGINS | * | Comma-separated allowed CORS origins |

Legacy aliases: CHATGPT_LOCAL_REASONING_EFFORT, CHATGPT_LOCAL_REASONING_SUMMARY, CHATGPT_LOCAL_REASONING_COMPAT, CHATGPT_LOCAL_EXPOSE_REASONING_MODELS, CHATGPT_LOCAL_ENABLE_WEB_SEARCH, CHATGPT_LOCAL_DEBUG_MODEL are still accepted as fallbacks.
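The precedence rule (CLI flag > GPTMOCK_* env > CHATGPT_LOCAL_* legacy env > default) can be sketched as a small lookup. This is an illustration only; GPTMock itself uses pydantic-settings, and resolve_option is a hypothetical helper. It also assumes the legacy variable shares the same suffix as the canonical one, which does not hold for every option (web search uses GPTMOCK_DEFAULT_WEB_SEARCH but CHATGPT_LOCAL_ENABLE_WEB_SEARCH).

```python
import os
from typing import Mapping, Optional


def resolve_option(
    name: str,
    cli_value: Optional[str],
    default: str,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve one option following the documented precedence order."""
    env = os.environ if env is None else env

    # 1. An explicit CLI flag always wins.
    if cli_value is not None:
        return cli_value

    # 2. Canonical env var, then 3. legacy env var.
    for prefix in ("GPTMOCK_", "CHATGPT_LOCAL_"):
        value = env.get(prefix + name)
        if value is not None:
            return value

    # 4. Built-in default.
    return default
```

With both GPTMOCK_REASONING_EFFORT=low and CHATGPT_LOCAL_REASONING_EFFORT=high set and no CLI flag, the canonical value low is the one that takes effect.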


Web Search

Use --enable-web-search to enable the web search tool by default for all requests. When enabled, the model decides autonomously whether a query needs a web search. You can also enable web search per-request without the server flag by passing the parameters below.

Request Parameters

| Parameter | Values | Description |
| --- | --- | --- |
| responses_tools | [{"type":"web_search"}] | Enable web search for this request |
| responses_tool_choice | "auto" / "none" | Let the model decide, or disable |
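As a rough sketch, the two parameters can be added to an ordinary Chat Completions request body like this. The helper name build_chat_payload is hypothetical, and the default model is simply one of the IDs listed as verified upstream.

```python
import json


def build_chat_payload(
    prompt: str,
    model: str = "gpt-5.4",
    web_search: bool = True,
    stream: bool = False,
) -> dict:
    """Build a /v1/chat/completions body with optional per-request web search."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    if web_search:
        # Offer the web search tool and let the model decide when to use it.
        payload["responses_tools"] = [{"type": "web_search"}]
        payload["responses_tool_choice"] = "auto"
    else:
        # Explicitly disable tool use for this request.
        payload["responses_tool_choice"] = "none"
    return payload


# Serialized body, ready to POST to http://127.0.0.1:8000/v1/chat/completions
body = json.dumps(build_chat_payload("Find current METAR rules")).encode()
```

With the OpenAI Python SDK, the same two keys can presumably be supplied through the extra_body argument of client.chat.completions.create, since they are non-standard fields in the request body.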

Annotations (URL Citations)

When web search is active, the model may return annotations containing source URLs. These are included automatically in responses:

Non-streaming (stream: false) — annotations are attached to the message:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "SpaceX launched 29 Starlink satellites...",
        "annotations": [
          {
            "type": "url_citation",
            "start_index": 0,
            "end_index": 150,
            "url": "https://spaceflightnow.com/...",
            "title": "SpaceX Falcon 9 launch"
          }
        ]
      }
    }
  ]
}

Streaming (stream: true) — annotations arrive as a dedicated chunk before the final stop chunk:

data: {"choices": [{"delta": {"annotations": [{"type": "url_citation", "start_index": 0, "end_index": 150, "url": "https://...", "title": "..."}]}, "finish_reason": null}]}
data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}

Responses API (POST /v1/responses, non-streaming) — annotations are nested inside the output content:

{
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "SpaceX launched 29 Starlink satellites...",
          "annotations": [
            {
              "type": "url_citation",
              "start_index": 0,
              "end_index": 150,
              "url": "https://spaceflightnow.com/...",
              "title": "SpaceX Falcon 9 launch"
            }
          ]
        }
      ]
    }
  ]
}
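Because the two non-streaming shapes nest annotations differently, a small helper can flatten them into one list. This is a sketch based only on the JSON samples above; extract_citations is a hypothetical name.

```python
def extract_citations(response: dict) -> list[dict]:
    """Collect url_citation annotations from either response shape."""
    found: list[dict] = []

    # Chat Completions: choices[].message.annotations
    for choice in response.get("choices", []):
        message = choice.get("message") or {}
        found.extend(message.get("annotations") or [])

    # Responses API: output[].content[].annotations
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue
        for part in item.get("content", []):
            found.extend(part.get("annotations") or [])

    return [
        {"url": a["url"], "title": a.get("title", "")}
        for a in found
        if a.get("type") == "url_citation"
    ]
```

The result is a list of url/title pairs in the order the backend returned them; responses without annotations produce an empty list.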

Example Request

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4",
    "messages": [{"role":"user","content":"Find current METAR rules"}],
    "stream": true,
    "responses_tools": [{"type": "web_search"}],
    "responses_tool_choice": "auto"
  }'

Notes & Limits

  • Requires an active, paid ChatGPT account.
  • Context length may be partially used by internal system instructions.
  • For the fastest responses, set --reasoning-effort to low and --reasoning-summary to none.
  • The context size of this route is larger than what you get in the regular ChatGPT app.
  • When the model returns a thinking summary, the default standard mode emits reasoning_content fields without polluting content. Set --reasoning-compat think-tags to keep <think> tags for older chat apps, or --reasoning-compat legacy for the older reasoning fields.
  • This project is not affiliated with OpenAI. Use responsibly and at your own risk.

Credits
