Add support for text-to-image by ericstj · Pull Request #6648 · dotnet/extensions

ericstj · 2025-07-23T18:41:23Z


src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/DelegatingTextToImageClient.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/ITextToImageClient.cs

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAITextToImageClient.cs

src/Libraries/Microsoft.Extensions.AI/TextToImage/LoggingTextToImageClient.cs

...s/Microsoft.Extensions.AI/TextToImage/TextToImageClientBuilderTextToImageClientExtensions.cs

test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestJsonSerializerContext.cs

All TextToImageOptions properties are nullable now so that the client can use defaults where appropriate. The quality default is removed since it's not consistent across models. Setting ResponseFormat is also removed since gpt-image-1 does not support it.

src/Libraries/Microsoft.Extensions.AI/TextToImage/LoggingTextToImageClient.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageResponse.cs

Copilot

Pull Request Overview

This pull request adds text-to-image support to the Microsoft.Extensions.AI libraries. It introduces a new ITextToImageClient interface and an OpenAI implementation, along with dependency injection support, middleware pipeline capabilities, and test coverage.

- Defines core text-to-image abstractions including ITextToImageClient, TextToImageOptions, and TextToImageResponse
- Implements OpenAI text-to-image client integration with support for image generation and editing
- Adds middleware pipeline support with logging and configuration options

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| `src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/` | Core abstractions and interfaces for text-to-image functionality |
| `src/Libraries/Microsoft.Extensions.AI/TextToImage/` | Client builder, middleware, and dependency injection extensions |
| `src/Libraries/Microsoft.Extensions.AI.OpenAI/` | OpenAI-specific implementation of ITextToImageClient |
| `test/Libraries/Microsoft.Extensions.AI.*Tests/TextToImage/` | Comprehensive test coverage for all text-to-image components |

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageResponse.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageOptions.cs

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAITextToImageClient.cs

EditImageAsync now accepts multiple images. OpenAI's image API supports multiple images, and this seems to be common functionality and a better generalization. The client library doesn't expose this yet, but we should account for it. Image models may be capable of things like "Combine the subjects of these images into a single image" or "Create a single image that uses the subject from the first image and the background from the second".


src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/ITextToImageClient.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageOptions.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageResponse.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/Image/ImageOptions.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/Image/DelegatingImageClient.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/Image/IImageClient.cs

src/Libraries/Microsoft.Extensions.AI.Abstractions/Image/ImageClientExtensions.cs

src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIImageClient.cs

We don't yet have any good public support for streaming to vet this API. We can guess at how it might behave for OpenAI, but that doesn't really give enough confidence to build the API around it.

ericstj · 2025-08-12T16:14:59Z

/backport to release/9.8

github-actions · 2025-08-12T16:15:08Z

Started backporting to release/9.8: https://github.com/dotnet/extensions/actions/runs/16914549812

github-actions · 2025-08-12T16:15:23Z

@ericstj backporting to "release/9.8" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Add ITextToImageClient
Applying: Remove URI based edit since it's not available
Applying: Add filename for edit
Applying: Add OpenAI implmentation of ITextToImageClient
.git/rebase-apply/patch:158: trailing whitespace.
    /// <summary>Initializes a new instance of the <see cref="OpenAITextToImageClient"/> class for the specified <see cref="OpenAIClient"/> and model.  
.git/rebase-apply/patch:161: trailing whitespace.
    /// <param name="model">The default model to use for image generation.</param>  
.git/rebase-apply/patch:371: trailing whitespace.
        
.git/rebase-apply/patch:421: trailing whitespace.
        
.git/rebase-apply/patch:430: trailing whitespace.
        
warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.
Using index info to reconstruct a base tree...
M	src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
Falling back to patching base and 3-way merge...
Auto-merging src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
CONFLICT (content): Merge conflict in src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0004 Add OpenAI implmentation of ITextToImageClient
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!
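A manual backport along the lines the bot asks for could look roughly like the sketch below. It assumes the change is available as a single commit (for example the squash-merge commit on main, whose SHA you have to look up yourself) and uses `git cherry-pick` in place of the bot's `git am` flow. The `backport` function name, its arguments, and the topic branch name are hypothetical, chosen only for illustration. Going by the log above, a conflict in `src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs` is likely to show up again and will need resolving by hand.

```shell
#!/usr/bin/env bash
# Sketch of a manual backport: cherry-pick one commit onto a topic branch
# created from the release branch. The function name and arguments are
# hypothetical; the caller supplies every branch name and the commit SHA.
#
# Usage: backport <target-branch> <commit-sha> [topic-branch]
backport() {
  local target="$1" sha="$2"
  local topic="${3:-backport-$sha}"

  # Start a fresh topic branch from the release branch.
  git checkout -b "$topic" "$target" || return 1

  # -x appends "(cherry picked from commit <sha>)" to the commit message.
  if ! git cherry-pick -x "$sha"; then
    echo "Conflicts while applying $sha. Unmerged files:" >&2
    git diff --name-only --diff-filter=U >&2
    echo "Fix them, 'git add' each file, then run: git cherry-pick --continue" >&2
    return 2
  fi
}
```

After `git fetch origin`, a call such as `backport origin/release/9.8 "$sha" backport-6648` (with `sha` set to the commit you want) would leave you either with a clean cherry-pick or with a list of unmerged files to resolve before pushing the topic branch and opening a PR against release/9.8.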

* Add ITextToImageClient
* Remove URI based edit since it's not available
* Add filename for edit
* Add OpenAI implmentation of ITextToImageClient
* Fix tests
* Add tests for TextToImage
* Add DeletgatingTextToImageClient and tests
* Add integration test and fix some bugs
* Add remaining support to MEAI for TextToImage
* Make all TextToImageOptions optional
  These are all nullable now so that the client can use defaults where appropriate. Remove quality default since it's not consistent across models. Also remove setting ResponseFormat since this is not supported by gpt-image-1.
* Address feedback
* Document some exceptions
* Address feedback
* Make EditImageAsync plural
  OpenAI's image API supports multiple images and this does seem to be common functionality and a better generalization. The client library doesn't expose this yet, but we should account for it. Image models may be capable of things like "Combine the subjects of these images into a single image" or "Create a single image that uses the subject from the first image and background for the second" etc.
* Address feedback and add/fix tests.
* Fix bad merge
* Address feedback
* Fix test
* Use DataContent.Name for filename.
* Add extensions for EditImageAsync
  Extension that accepts a single DataContent and one that accepts a byte[]. I've left out streams and file paths, since these require more opinions about how to load them. I filed dotnet#6683 to address streams.
* Fix test
* Remove use of `_model` field.
* Rename ImageToText to Image
* Rename TextToImage directories to Image
* Rename files TextToImage -> Image
* Add new request and response type
* Make GenerateImagesAsync accept ImageRequest
* Remove EditImageAsync
* Adding GenerateStreamingImagesAsync
* Update docs
* Rename ImageClient ImageGenerator
* Fix up some text-to-image references
* Rename Image(Options|Request|Response)
* Remove `Images` from `GenerateImagesAsync`
* Remove streaming method
  We don't yet have any good public support for streaming to vet this API. We can guess at how it might behave for OpenAI, but that doesn't really give enough confidence to build the API around it.
* Address feedback
* Provide OpenAI an appropriate filename
* Remove Style from ImageGenerationOptions


ericstj added 9 commits July 18, 2025 10:03

Add ITextToImageClient

ab6a1fe

Remove URI based edit since it's not available

09fc200

Add filename for edit

895c963

Add OpenAI implmentation of ITextToImageClient

5b24228

Fix tests

3c7ec77

Add tests for TextToImage

9408ed4

Add DeletgatingTextToImageClient and tests

b621c24

Add integration test and fix some bugs

136ab90

Add remaining support to MEAI for TextToImage

403d05f

github-actions bot added the area-ai Microsoft.Extensions.AI libraries label Jul 23, 2025

dotnet-policy-service bot assigned ericstj Jul 23, 2025

stephentoub reviewed Jul 23, 2025


ericstj added 4 commits July 25, 2025 10:10

Make all TextToImageOptions optional

82cfd03

These are all nullable now so that the client can use defaults where appropriate. Remove quality default since it's not consistent across models. Also remove setting ResponseFormat since this is not supported by gpt-image-1.

Address feedback

49bf1da

Document some exceptions

78a25ea

Address feedback

c23a8c5

ericstj commented Jul 28, 2025


src/Libraries/Microsoft.Extensions.AI/TextToImage/LoggingTextToImageClient.cs (outdated, resolved)

src/Libraries/Microsoft.Extensions.AI.Abstractions/TextToImage/TextToImageResponse.cs (resolved)

ericstj marked this pull request as ready for review July 30, 2025 14:46

Copilot AI review requested due to automatic review settings July 30, 2025 14:46

ericstj requested a review from a team as a code owner July 30, 2025 14:46

ericstj marked this pull request as draft July 30, 2025 14:46

Copilot AI reviewed Jul 30, 2025


ericstj added 3 commits July 30, 2025 11:22

Address feedback and add/fix tests.

230f02f

Merge branch 'main' of https://github.com/dotnet/extensions into text…

897750a

…ToImage