Problem
When using a custom BYOK model with provider_type = openai, uploading an image via the chat attachment feature does not result in the image being sent to the LLM. Instead, the local file path (e.g. [/tmp/shelley-screenshots/upload_xxx.png]) is embedded as plain text in the message content.
The LLM receives only the text path and cannot see the image, leading to hallucinated or incorrect responses.
Environment
- Model: custom BYOK, provider_type = openai
- Tested model: gpt-5.4-mini via a custom OpenAI-compatible endpoint
- This is not a Shelley built-in model issue
Evidence
Tested directly with curl against the same endpoint and model — vision works correctly when the image is passed as image_url:
curl -X POST https://my-api/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-5.4-mini",
"messages": [{"role": "user", "content": [
{"type": "text", "text": "describe this image"},
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
]}]
}'
# → correctly describes the image
But when Shelley sends the same request after the user uploads an image, the actual request body logged is:
{
"role": "user",
"content": "[/tmp/shelley-screenshots/upload_xxx.png] describe this image"
}
The LLM cannot see the image and hallucinates a response.
Expected Behavior
Shelley should encode the uploaded image as base64 and include it as an image_url content part in the request, the same way it already does for the anthropic provider type (which correctly includes images as base64 in tool_result content blocks).
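A minimal sketch of the expected encoding step. The function name and signature are illustrative, not Shelley's actual internals; it only shows the shape the OpenAI-compatible request should take (text part plus a base64 data-URL image_url part), matching the working curl request above:

```python
import base64
import mimetypes


def build_image_message(text, image_path, image_bytes):
    """Build an OpenAI-style multimodal user message.

    Hypothetical helper: `image_path` is used only to guess the MIME
    type; `image_bytes` is the raw file content read from disk.
    """
    mime = mimetypes.guess_type(image_path)[0] or "image/png"
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            # The image travels inline as a data URL, so the LLM
            # actually sees the pixels instead of a local path.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```

The key point is that "content" becomes a list of typed parts rather than a plain string containing the file path.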
Notes
- anthropic BYOK provider type handles images correctly ✓
- openai BYOK provider type sends path as plain text ✗
- We also tested relying on the model to call the read_image tool, but gpt-5.4-mini (a vision-capable model) skipped the tool call and hallucinated a description instead. The anthropic provider type avoids this by embedding the image directly; the same approach should be applied to openai.