Custom openai BYOK model does not send image data to LLM when user uploads an image #156

@c21xdx

Problem

When using a custom BYOK model with provider_type = openai, uploading an image via the chat attachment feature does not result in the image being sent to the LLM. Instead, the local file path (e.g. [/tmp/shelley-screenshots/upload_xxx.png]) is embedded as plain text in the message content.

The LLM receives only the text path and cannot see the image, leading to hallucinated or incorrect responses.

Environment

  • Model: custom BYOK, provider_type = openai
  • Tested model: gpt-5.4-mini via a custom OpenAI-compatible endpoint
  • This is not a Shelley built-in model issue

Evidence

Tested directly with curl against the same endpoint and model — vision works correctly when the image is passed as image_url:

curl -X POST https://my-api/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-5.4-mini",
    "messages": [{"role": "user", "content": [
      {"type": "text", "text": "describe this image"},
      {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
    ]}]
  }'
# → correctly describes the image

But when the user uploads an image in Shelley and sends the same prompt, the user message in the logged request body is:

{
  "role": "user",
  "content": "[/tmp/shelley-screenshots/upload_xxx.png] describe this image"
}

The LLM cannot see the image and hallucinates a response.
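For anyone triaging from request logs, a small check like the one below can tell the two cases apart. This is only a sketch: `has_image_part`, `leaked_image_path`, and the bracketed-path pattern are hypothetical helpers inferred from the logged body above, not anything Shelley provides.

```python
import re
from typing import Optional

# Assumption: the leak always looks like a leading "[/absolute/path.ext]",
# as in the logged request body shown in this issue.
PATH_PREFIX = re.compile(r"^\[(/[^\]]+\.(?:png|jpe?g|gif|webp))\]\s*", re.IGNORECASE)


def has_image_part(message: dict) -> bool:
    """True if the message content is a list that contains an image_url part."""
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(isinstance(p, dict) and p.get("type") == "image_url" for p in content)


def leaked_image_path(message: dict) -> Optional[str]:
    """Return the bracketed file path if the content is plain text starting with one."""
    content = message.get("content")
    if not isinstance(content, str):
        return None
    match = PATH_PREFIX.match(content)
    return match.group(1) if match else None
```

A message for which `leaked_image_path` returns a path and `has_image_part` is false reproduces the bug described here.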

Expected Behavior

Shelley should encode the uploaded image as base64 and include it as an image_url content part in the request, the same way it already embeds images for the anthropic provider type (which correctly includes them as base64 in tool_result content blocks).
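To make the expected request shape concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Shelley's code). It assumes the attachment arrives as a leading `[path]` prefix in the message text, as in the logged body above. `to_data_url` and `build_user_message` are hypothetical names.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Assumption: an uploaded attachment shows up as a leading "[path]" prefix.
PATH_PREFIX = re.compile(r"^\[([^\]]+)\]\s*")


def to_data_url(path: str) -> str:
    """Read an image file and return it as a base64 data URL."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_user_message(raw_text: str) -> dict:
    """Turn "[path] prompt" into a user message with text and image_url parts."""
    match = PATH_PREFIX.match(raw_text)
    if match is None:
        # No attachment prefix: keep the plain string content.
        return {"role": "user", "content": raw_text}
    prompt = raw_text[match.end():]
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": to_data_url(match.group(1))}},
        ],
    }
```

The resulting message has the same structure as the one in the curl request that worked against this endpoint.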

Notes

  • anthropic BYOK provider type handles images correctly ✓
  • openai BYOK provider type sends path as plain text ✗
  • We also tested relying on the model to call the read_image tool, but gpt-5.4-mini (a vision-capable model) skipped the tool call and hallucinated a description instead. The anthropic provider type avoids this by embedding the image directly — the same approach should be applied to openai.
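If the anthropic path already holds the image as a base64 block, the openai path might be able to reuse it with a small conversion. The sketch below assumes the publicly documented Anthropic image block shape (`type: image` with a base64 `source`) and the OpenAI `image_url` content part. How Shelley represents images internally is not shown in this issue, and `anthropic_image_to_openai_part` is a hypothetical name.

```python
def anthropic_image_to_openai_part(block: dict) -> dict:
    """Convert an Anthropic-style base64 image block to an OpenAI image_url part."""
    source = block.get("source", {})
    if block.get("type") != "image" or source.get("type") != "base64":
        raise ValueError("expected an Anthropic base64 image block")
    # The data URL carries the same media type and base64 payload unchanged.
    url = f"data:{source['media_type']};base64,{source['data']}"
    return {"type": "image_url", "image_url": {"url": url}}
```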
