Support image input in the chat completion request #55

Open
Youho99 wants to merge 4 commits into lhenault:main from Youho99:main

Youho99 commented Jul 10, 2024

Tested with a single image.

This pull request addresses issue #54.

It adds support for the OpenAI API request structure in which a message's content includes an image.

Example from the OpenAI documentation:

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-turbo",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
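
To make the request shape above concrete, here is a minimal Python sketch of how a server might read a message whose `content` is either a plain string or a list of typed parts. It is not the code from this PR: the helper name `split_content` and the placeholder image URL are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple, Union

# A message's "content" is either a plain string (text-only request)
# or a list of typed parts, as in the request shown above.
Content = Union[str, List[Dict[str, Any]]]


def split_content(content: Content) -> Tuple[str, List[str]]:
    """Return (text, image_urls) for one message's content field.

    Hypothetical helper for illustration; not part of the PR.
    """
    if isinstance(content, str):
        # Text-only message: nothing to extract besides the text itself.
        return content, []

    texts: List[str] = []
    image_urls: List[str] = []
    for part in content:
        part_type = part.get("type")
        if part_type == "text":
            texts.append(part.get("text", ""))
        elif part_type == "image_url":
            url = part.get("image_url", {}).get("url")
            if url:
                image_urls.append(url)
        # Parts of any other type are ignored in this sketch.
    return "\n".join(texts), image_urls


# Example message in the same shape as the request above
# (the URL is a placeholder, not a real image).
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
    ],
}

text, urls = split_content(message["content"])
```

Handling the string case first keeps text-only requests working unchanged, while the list case collects the text and image URLs that a vision model would need.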

The code has not been prettified yet, so that still needs to be reviewed.

@lhenault
Owner

Thanks for your work, will happily review this once you think it's ready (and passing the pre-commit check). If you have a working example for VLM / image processing to share, that would be a nice addition to the existing ones.

@Youho99 Youho99 marked this pull request as ready for review July 15, 2024 14:35
@Youho99
Author

Youho99 commented Jul 15, 2024

Don't use version 1.65.0 of grpcio and grpcio-tools (that release was withdrawn).

I don't know how to exclude it in the Poetry requirements.

@Youho99
Author

Youho99 commented Jul 16, 2024

I just modified the version constraints for grpcio and grpcio-tools in the TOML file, and I regenerated poetry.lock.

Since this is my first time doing this, I would appreciate extra attention to this part during review.

@Youho99
Author

Youho99 commented Jul 16, 2024

I will provide an example of how to use this feature as a follow-up (probably in another PR).

@Youho99
Author

Youho99 commented Jul 16, 2024

@lhenault I think you can review this PR (and change the version accordingly) :)

@lhenault
Owner

Hey @Youho99!

I tried your changes the other day and encountered a few issues, but probably because of me. Thanks again for your PR and sorry for the delay, it's very much appreciated. 😌

Let me have another look soon (and if you have a working example for image inputs that might speed up things).

@Youho99
Author

Youho99 commented Aug 28, 2024

@lhenault

In the next few days I'll get back to it, and provide an example.

Let me know if you have any problems.

@Youho99
Author

Youho99 commented Jan 9, 2025

@lhenault Hello and happy new year!

After a few days (lol), I have finally produced an example for the image support.

It is not yet in the format of the examples already present in the library. We can do that work later.

Here is the project:
https://github.com/Youho99/phi-3_5-vision-onnx-simpleai

@Youho99
Author

Youho99 commented Mar 23, 2025

@lhenault any update?

@lhenault
Owner

Hey, sorry, I somehow missed this and the previous update. I'll have a look at it soon. Thanks a lot for the submission!

@Youho99
Author

Youho99 commented Oct 4, 2025

@lhenault
Can you review this PR?
