Full-stack async chat middleware that bridges Open WebUI to custom AI agent pipelines — with user management, real-time status updates, persistent conversation tracking, and graceful error recovery.
This system serves as the backend orchestration layer between Open WebUI (an open-source ChatGPT-style interface) and custom AI agent workflows. Instead of hardcoding Open WebUI to a single LLM, this middleware lets me route conversations through any AI pipeline — with full observability, user tracking, and error handling that the stock Open WebUI doesn't provide.
Key capabilities:
- Accepts chat requests from Open WebUI via authenticated webhooks
- Routes conversations to custom AI agents with any model/tool configuration
- Persists every interaction to Supabase (user, message, response, timing, errors)
- Provides real-time status polling so the UI shows processing state
- Extracts chain-of-thought reasoning from model responses
- Handles errors gracefully — surfaces them to the user instead of silently failing
- Supports multi-user environments with per-user session isolation
```mermaid
graph LR
    subgraph "Frontend"
        A[Open WebUI] -->|POST /chat| B[Webhook - Auth]
    end
    subgraph "Request Pipeline"
        B --> C{System Function?}
        C -->|Yes| D[Return Immediately]
        C -->|No| E[Check/Register User in Supabase]
        E --> F[Prepare Agent Context]
        F --> G[Write Status: Processing]
        G --> H[AI Agent Pipeline]
        H --> I[Extract Chain-of-Thought]
        I --> J[Format Response]
        J --> K[Write Response + Timing to Supabase]
        J --> L[Return to Open WebUI]
    end
    subgraph "Status Polling"
        M[Open WebUI Polls /status] --> N{Check for Errors}
        N -->|Error Found| O[Return Error to User]
        N -->|No Error| P[Check Latest Status]
        P --> Q[Return Status Update]
    end
    subgraph "Error Recovery"
        R[Error Trigger] --> S{Has Execution ID?}
        S -->|Yes| T[Write Error to Supabase]
        S -->|No| U[Silent Fail - No User Context]
    end
```
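The request-pipeline branch above can be sketched in a few lines. This is a minimal sketch with hypothetical names — the production flow runs as n8n nodes against Supabase, which in-memory dicts stand in for here:

```python
# Sketch of the request pipeline (hypothetical names; the real flow
# lives in n8n nodes, not Python). Dicts stand in for Supabase tables.
import time

users = {}      # stands in for the Supabase users table
statuses = {}   # stands in for per-session status rows

def handle_chat(payload):
    """Webhook entry point: authenticated chat request from Open WebUI."""
    if payload.get("system"):  # system function? return immediately
        return {"ok": True}
    # Check/register the user before doing any work.
    users.setdefault(payload["user_id"], {"registered_at": time.time()})
    session = payload["session_id"]
    statuses[session] = "processing"  # write status before the agent call
    started = time.time()
    reply = run_agent(payload["message"])  # stand-in for the agent pipeline
    statuses[session] = "done"
    return {
        "response": reply,
        "elapsed_s": round(time.time() - started, 3),  # timing for the audit trail
    }

def run_agent(message):
    return f"Echo: {message}"  # placeholder for the configurable pipeline

result = handle_chat({"user_id": "u1", "session_id": "s1", "message": "hi"})
```

The key ordering detail: the status row is written *before* the agent call, so the polling endpoint has something to report while the pipeline is still running.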
| Component | Technology | Purpose |
|---|---|---|
| Chat Interface | Open WebUI | User-facing conversational UI |
| API Gateway | n8n Webhooks (3 endpoints) | Request routing, auth, status polling |
| User Management | Supabase (Postgres) | Registration, session tracking, auth validation |
| Status System | Supabase + Polling | Real-time processing status visible in UI |
| AI Processing | Configurable agent pipeline | LLM routing, tool use, chain-of-thought |
| Conversation Store | Supabase | Full audit trail — messages, responses, timing, errors |
| Error Handling | n8n Error Trigger + Supabase | Captures errors, links to execution, surfaces to user |
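The error-handling row hinges on one detail: without an execution ID there is no user context to attach the error to. A minimal sketch, with an in-memory list standing in for the Supabase error table:

```python
# Sketch of the error-recovery branch: only errors carrying an execution ID
# can be linked back to a conversation; the rest fail silently by design.
error_log = []  # stands in for the Supabase error table

def on_error(error, execution_id=None):
    if execution_id is None:
        return False  # silent fail: no user context to surface the error to
    error_log.append({"execution_id": execution_id, "error": str(error)})
    return True  # the polling endpoint will surface this row to the user
```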
Why a separate middleware instead of using Open WebUI's built-in LLM connections? Open WebUI connects directly to OpenAI-compatible APIs, but that limits you to simple request/response. This middleware lets me route through complex agent pipelines with tool calling, multi-model orchestration, and custom business logic — while keeping the clean Open WebUI chat experience.
Why Supabase for state management? The async nature of complex AI agent calls means the webhook can't always return a response synchronously. Supabase acts as the shared state layer — the processing pipeline writes status updates, and the polling endpoint reads them. This decouples the request lifecycle from the processing lifecycle.
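That decoupling can be sketched in a few lines — the processing side appends status rows and the polling endpoint reads the latest one, with no direct connection between the two (an in-memory dict stands in for Supabase; the names are illustrative):

```python
from collections import defaultdict

# session_id -> ordered list of status updates; stands in for a Supabase table
status_rows = defaultdict(list)

def write_status(session_id, status):
    status_rows[session_id].append(status)  # pipeline side: fire-and-forget

def poll_status(session_id):
    rows = status_rows[session_id]
    return rows[-1] if rows else "unknown"  # UI side: latest known state

write_status("s1", "processing")
write_status("s1", "done")
```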
Why extract chain-of-thought? Models that support extended thinking (like o-series or Qwen QwQ) return reasoning traces. Extracting and storing these separately lets me debug agent behavior, improve prompts, and optionally show reasoning to users — without cluttering the main response.
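A minimal extraction sketch, assuming the model wraps its trace in `<think>` tags (the convention Qwen QwQ-style models use; the exact delimiter is an assumption to adjust per model):

```python
import re

# Split a <think>...</think> reasoning trace from the visible answer.
# The tag name is an assumption; change it to match your model's output.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw):
    match = THINK_RE.search(raw)
    if not match:
        return raw.strip(), None  # model emitted no reasoning trace
    answer = THINK_RE.sub("", raw, count=1).strip()
    return answer, match.group(1).strip()

answer, trace = split_reasoning("<think>user wants a greeting</think>Hello!")
```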
Why three separate webhook endpoints? Each serves a distinct purpose in the async flow: (1) receives the chat message and kicks off processing, (2) handles status polling from the UI, (3) handles user registration. Separating them keeps each handler simple and independently testable.
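The separation can be sketched as three plain functions behind a router (paths are hypothetical; in production each is its own n8n webhook trigger, which is what makes them independently testable):

```python
# Sketch of the three-endpoint split; each handler stays small and
# can be exercised in isolation.
def chat_endpoint(payload):
    return {"status": "accepted"}   # kicks off async processing

def status_endpoint(payload):
    return {"status": "processing"}  # reads latest state for the UI

def register_endpoint(payload):
    return {"registered": payload["user_id"]}

ROUTES = {
    "/chat": chat_endpoint,
    "/status": status_endpoint,
    "/register": register_endpoint,
}

def dispatch(path, payload):
    return ROUTES[path](payload)
```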
- Hosting: Self-hosted on personal infrastructure (Coolify PaaS on dedicated server)
- Orchestration: n8n workflow engine (59 nodes, 42 connections)
- Database: Supabase (self-hosted Postgres with Row Level Security)
- Status: Active — running in production
Building this taught me the practical challenges of async AI pipelines: you can't assume LLM calls will complete within a webhook timeout, so you need a state management strategy. The polling pattern works but adds complexity — in a future iteration, I'd explore Server-Sent Events or WebSocket connections for real-time updates instead of polling.
The error recovery system was added after I lost visibility into failures. Now every error is captured with its execution context and surfaced to the user, which has been invaluable for debugging agent behavior in production.