
Ask flow fails with two distinct issues on OpenAI-compatible / Anthropic-compatible providers: hanging SSE streams and invalid tool-call arguments #1105

@leozhengliu-pixel

Description

Summary

We investigated repeated Ask failures in Sourcebot and captured the real upstream model traffic with a transparent proxy. We found two separate failure modes:

  1. Streaming/network failure
    Some model requests return 200 OK with text/event-stream, start streaming, but never terminate correctly. Eventually the client fails with:

    • TypeError: terminated
    • TLSSocket.onHttpSocketClose
    • read ETIMEDOUT
  2. Tool-call argument failure
    In OpenAI-compatible mode, the model sometimes emits invalid function arguments JSON for tool calls, which leads to:

    • invalid function arguments json string
    • upstream 400 Bad Request

These are not the same failure.

Environment

  • Sourcebot: current Docker image as of 2026-04-10
  • Runtime: Node 24 inside container
  • Deployment: local Docker on WSL2
  • Provider modes tested:
    • openai-compatible
    • anthropic

What we ruled out

We ruled out the following as the primary root cause:

  • WSL general outbound connectivity
  • Docker general outbound connectivity
  • basic long-lived fetch/undici SSE streaming in both host and container
  • host proxy env leakage into the Sourcebot container

Using the same Sourcebot container and the same Node fetch/undici stack, an independent long-lived streaming request to the same model endpoint completed successfully:

  • host: 49 chunks, 23594 bytes, 38.4s
  • container: 52 chunks, 25029 bytes, 43.6s

This narrowed the failures to Sourcebot's real Ask workflow rather than general WSL/Docker outbound connectivity.
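For reference, the probe boiled down to draining the response body and counting chunks. A simplified sketch of what we ran (the model endpoint is omitted; `drainStream` is our name for the helper, not a Sourcebot or undici API):

```typescript
// Sketch of the independent streaming probe: read a response body chunk by
// chunk and report totals, mirroring the "49 chunks, 23594 bytes" numbers
// above. In the real probe the stream came from `(await fetch(url)).body`.
async function drainStream(
  body: ReadableStream<Uint8Array>,
): Promise<{ chunks: number; bytes: number }> {
  const reader = body.getReader();
  let chunks = 0;
  let bytes = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks += 1;
    bytes += value.byteLength;
  }
  return { chunks, bytes };
}
```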

Failure mode 1: hanging SSE stream

Symptom

The frontend remains stuck in the "generating" state, and the backend eventually reports:

  • TypeError: terminated
  • TLSSocket.onHttpSocketClose

Captured behavior

For some Ask requests, upstream responds with:

  • 200 OK
  • content-type: text/event-stream

But the stream never ends normally. Our proxy later records:

  • message: "terminated"
  • cause_code: "ETIMEDOUT"
  • cause_message: "read ETIMEDOUT"

This happened in both Anthropic-compatible and OpenAI-compatible runs.

Example evidence

OpenAI-compatible sample:

  • request body size about 45 KB
  • stream starts successfully
  • then hangs until timeout

Anthropic-compatible sample:

  • request to /anthropic/v1/messages
  • stream=true
  • thinking.enabled
  • tool_choice={"type":"auto"}
  • tools_count=8
  • stream starts, then never cleanly finishes

See sanitized evidence: https://gist.github.com/leozhengliu-pixel/6252d9b8415ab65dd106d11e2bf59da0
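A defensive pattern on the consumer side would be an idle timeout between chunks, so a wedged stream fails fast with a descriptive error instead of hanging until a socket-level ETIMEDOUT. A minimal sketch (not Sourcebot's actual code; the function name and the threshold are assumptions):

```typescript
// Sketch of defensive SSE consumption: if no chunk arrives within `idleMs`,
// abort the read with a clear error instead of waiting for the OS-level
// "read ETIMEDOUT" seen in the proxy captures above.
async function readWithIdleTimeout(
  body: ReadableStream<Uint8Array>,
  idleMs: number,
  onChunk: (chunk: Uint8Array) => void,
): Promise<void> {
  const reader = body.getReader();
  try {
    for (;;) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const idle = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`SSE stream idle for ${idleMs} ms`)),
          idleMs,
        );
      });
      // Whichever settles first wins; always clear the pending timer.
      const result = await Promise.race([reader.read(), idle]).finally(() =>
        clearTimeout(timer),
      );
      if (result.done) return;
      onChunk(result.value);
    }
  } finally {
    reader.releaseLock();
  }
}
```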

Failure mode 2: invalid tool-call arguments

Symptom

Frontend error:

Failed after 2 attempts with non-retryable error: 'invalid params, invalid function arguments json string ...'

Captured behavior

In an OpenAI-compatible request, the assistant produced this tool call:

{
  "id": "call_function_j32rtdm8f446_1",
  "type": "function",
  "function": {
    "name": "read_file",
    "arguments": "\"{\""
  }
}

The arguments field is not a serialized JSON object such as:

{"path":"...","repo":"..."}

It is a malformed fragment that cannot be parsed as a JSON object.

The same request also contains the tool error returned into conversation state:

{
  "role": "tool",
  "tool_call_id": "call_function_j32rtdm8f446_1",
  "content": "Invalid input for tool read_file: JSON parsing failed: Text: {.\nError message: Expected property name or '}' in JSON at position 1 (line 1 column 2)"
}

That request ultimately receives 400 Bad Request.
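A minimal safeguard on the client side could be to parse tool-call arguments defensively and convert failures into a structured tool error the model can recover from, instead of forwarding the broken call and eating a 400. A sketch (the helper name and result shape are assumptions, not Sourcebot's actual code):

```typescript
// Sketch: parse `tool_calls[].function.arguments` defensively. A malformed
// arguments string (like the "\"{\"" sample above) yields an error result
// rather than an exception or an invalid upstream request.
type ParseResult =
  | { ok: true; args: Record<string, unknown> }
  | { ok: false; error: string };

function parseToolArguments(raw: string): ParseResult {
  try {
    const parsed = JSON.parse(raw);
    // Valid JSON that is not an object (a bare string, number, or array)
    // is still unusable as tool arguments.
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return { ok: false, error: `expected a JSON object, got: ${raw}` };
    }
    return { ok: true, args: parsed as Record<string, unknown> };
  } catch (err) {
    return {
      ok: false,
      error: `JSON parsing failed for arguments ${JSON.stringify(raw)}: ${(err as Error).message}`,
    };
  }
}
```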

Why I believe this is relevant to Sourcebot

Even if some instability may be provider-side, Sourcebot is the shared client here, and these failures happen specifically under Sourcebot Ask orchestration:

  • tool-rich multi-step interaction
  • title-generation subrequests
  • conversation accumulation
  • streaming consumer path

At minimum, it would help if Sourcebot:

  • handled hanging SSE streams more defensively
  • surfaced raw upstream request/response diagnostics more clearly
  • isolated title-generation failure from main answer flow
  • degraded more safely on malformed tool arguments
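For the title-generation point specifically, one option is a small error boundary so the subrequest degrades to a placeholder instead of aborting the main answer flow. A sketch with hypothetical names (`generateTitle` is not Sourcebot's actual API):

```typescript
// Sketch: run a non-critical subrequest behind its own error boundary so
// its failure cannot take down the main flow.
async function withFallback<T>(task: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await task();
  } catch {
    return fallback;
  }
}

// Hypothetical usage: a failed title subrequest degrades to a placeholder.
// const title = await withFallback(() => generateTitle(conversation), "Untitled conversation");
```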

Requested guidance

Please help clarify:

  1. Is this a known limitation of Ask with OpenAI-compatible / Anthropic-compatible providers?
  2. Are there recommended settings to disable or simplify:
    • title generation
    • thinking
    • tool streaming
    • multi-step Ask behavior
  3. Is there an existing safeguard for malformed tool_calls[].function.arguments?
  4. Is there a supported way to capture Sourcebot's exact model payloads without patching Sourcebot?

I can provide more sanitized payloads if useful.

Metadata

Labels: ask_sb (Issue related to Ask Sourcebot: https://docs.sourcebot.dev/docs/features/ask/overview), bug (Something isn't working)
