Realtime Client Proposal #7285

Merged
tarekgh merged 95 commits into dotnet:main from tarekgh:RealtimeClientProposal
Mar 11, 2026
@tarekgh (Member) commented Feb 11, 2026

Realtime Client Proposal

⚠️ Important Notes

  • This is an experimental proposal. All APIs introduced here are subject to change, and breaking changes should be expected as the design evolves.
  • The OpenAI provider currently uses raw WebSocket/JSON rather than the OpenAI .NET SDK's realtime support. The SDK (v2.8.0) does not yet include the latest Realtime API updates (the relevant PR was recently merged). Once a new SDK version is released, the provider will be refactored to use it, eliminating the manual JSON handling.

Overview

This PR introduces a Realtime Client abstraction layer for Microsoft.Extensions.AI, enabling bidirectional, streaming communication with realtime AI services (e.g., OpenAI's Realtime API). The design follows the same middleware/pipeline patterns established by IChatClient and extends them to realtime sessions over WebSocket connections.

Key changes include:

  • New abstractions (IRealtimeClient, IRealtimeClientSession, DelegatingRealtimeClient) in Microsoft.Extensions.AI.Abstractions
  • Strongly-typed client/server message types for audio streaming, text, transcription, function calls, and error handling
  • Immutable session configuration via RealtimeSessionOptions with init-only properties and IReadOnlyList for collection types
  • Extensible server message types: RealtimeServerMessageType is a readonly struct (following the ChatRole pattern) rather than a fixed enum, allowing providers to define custom message types
  • Client-level middleware pipeline via RealtimeClientBuilder with built-in support for:
    • Logging (LoggingRealtimeClient)
    • OpenTelemetry (OpenTelemetryRealtimeClient) following GenAI semantic conventions
    • Function invocation (FunctionInvokingRealtimeClient) with automatic tool call resolution
  • OpenAI provider implementation (OpenAIRealtimeClient, OpenAIRealtimeClientSession) using WebSocket connections
  • Refactored function invocation — extracted shared logic from FunctionInvokingChatClient into reusable components (FunctionInvocationProcessor, FunctionInvocationHelpers, FunctionInvocationLogger) so both chat and realtime sessions share the same invocation pipeline
  • Unified TranscriptionOptions — shared between Realtime and ISpeechToText APIs

Core API Surface

IRealtimeClient

public interface IRealtimeClient : IDisposable
{
    Task<IRealtimeClientSession> CreateSessionAsync(
        RealtimeSessionOptions? options = null,
        CancellationToken cancellationToken = default);

    object? GetService(Type serviceType, object? serviceKey = null);
}

IRealtimeClient is a factory for sessions — analogous to IChatClient. It is stateless, safe for DI registration as a singleton, and serves as the target for middleware composition via RealtimeClientBuilder.

IRealtimeClientSession

public interface IRealtimeClientSession : IAsyncDisposable
{
    RealtimeSessionOptions? Options { get; }

    Task SendAsync(
        RealtimeClientMessage message,
        CancellationToken cancellationToken = default);

    IAsyncEnumerable<RealtimeServerMessage> GetStreamingResponseAsync(
        CancellationToken cancellationToken = default);

    object? GetService(Type serviceType, object? serviceKey = null);
}

IRealtimeClientSession represents a live, bidirectional connection (e.g., a WebSocket). Client messages are sent via SendAsync at any time during the session. Server messages are consumed by enumerating GetStreamingResponseAsync.


Supported Realtime Messages

Client Messages (sent to the server)

| Message Type | Description |
| --- | --- |
| CreateConversationItemRealtimeClientMessage | Creates a conversation item (text, audio, or image) to add to the session context. |
| InputAudioBufferAppendRealtimeClientMessage | Appends a chunk of audio data (PCM) to the server's input audio buffer. |
| InputAudioBufferCommitRealtimeClientMessage | Commits the accumulated audio buffer, signaling the server that the audio input is complete. |
| CreateResponseRealtimeClientMessage | Triggers model inference to generate a response. Properties optionally override session-level configuration for this response only. |
| SessionUpdateRealtimeClientMessage | Updates session-level configuration (e.g., voice, tools, instructions) during an active session. |

Server Messages (received from the server)

| Message Type | Description |
| --- | --- |
| OutputTextAudioRealtimeServerMessage | Carries incremental or completed text (via Text) and audio (via Audio) output from the model. |
| InputAudioTranscriptionRealtimeServerMessage | Carries transcription results (incremental or completed) for user audio input. |
| ResponseCreatedRealtimeServerMessage | Indicates a response has been created or completed; includes token usage on ResponseDone. |
| ResponseOutputItemRealtimeServerMessage | Represents a new output item (e.g., function call) added during response generation. |
| ErrorRealtimeServerMessage | Carries error details including ErrorMessageId to correlate with the originating client message. |

Server Message Types (RealtimeServerMessageType — extensible readonly struct)

| Type | Description |
| --- | --- |
| RawContentOnly | Unrecognized/provider-specific event with raw data in RawRepresentation. |
| OutputTextDelta / OutputTextDone | Incremental / final text output. |
| OutputAudioDelta / OutputAudioDone | Incremental / final audio output. |
| OutputAudioTranscriptionDelta / OutputAudioTranscriptionDone | Model-generated transcription of audio output. |
| InputAudioTranscriptionDelta / InputAudioTranscriptionCompleted / InputAudioTranscriptionFailed | Transcription of user audio input. |
| ResponseCreated / ResponseDone | Response lifecycle events. |
| ResponseOutputItemAdded / ResponseOutputItemDone | Output item lifecycle events. |
| Error | Server error event. |

Middleware Pipeline

Middleware operates at the client level (not the session level), mirroring the IChatClient / ChatClientBuilder pattern. Each middleware class extends DelegatingRealtimeClient and overrides CreateSessionAsync to wrap inner sessions with middleware behavior.
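
The client-level wrapping described above can be illustrated with a small self-contained sketch. It does not use the real DelegatingRealtimeClient (whose constructor and members are not shown in this description); ISketchClient, ISketchSession, RecordingClient, and TaggingClient are hypothetical stand-ins that only mimic the shape of the pattern: the middleware client keeps no per-connection state and wraps each session it hands out.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-ins for IRealtimeClient / IRealtimeClientSession.
public interface ISketchSession
{
    Task SendAsync(string message);
}

public interface ISketchClient
{
    Task<ISketchSession> CreateSessionAsync();
}

// Innermost "provider": its sessions record whatever reaches them.
public sealed class RecordingClient : ISketchClient
{
    public List<string> Sent { get; } = new();

    public Task<ISketchSession> CreateSessionAsync() =>
        Task.FromResult<ISketchSession>(new RecordingSession(Sent));

    private sealed class RecordingSession : ISketchSession
    {
        private readonly List<string> _sent;
        public RecordingSession(List<string> sent) => _sent = sent;
        public Task SendAsync(string message) { _sent.Add(message); return Task.CompletedTask; }
    }
}

// Client-level middleware: overrides session creation to wrap the inner session.
public sealed class TaggingClient : ISketchClient
{
    private readonly ISketchClient _inner;
    private readonly string _tag;

    public TaggingClient(ISketchClient inner, string tag) { _inner = inner; _tag = tag; }

    public async Task<ISketchSession> CreateSessionAsync() =>
        new TaggingSession(await _inner.CreateSessionAsync(), _tag);

    // The session wrapper is private: an implementation detail of the client.
    private sealed class TaggingSession : ISketchSession
    {
        private readonly ISketchSession _inner;
        private readonly string _tag;
        public TaggingSession(ISketchSession inner, string tag) { _inner = inner; _tag = tag; }
        public Task SendAsync(string message) => _inner.SendAsync($"[{_tag}] {message}");
    }
}

public static class PipelineDemo
{
    // Composes two middleware layers and sends one message through them.
    public static async Task<IReadOnlyList<string>> RunAsync()
    {
        var provider = new RecordingClient();
        ISketchClient client = new TaggingClient(new TaggingClient(provider, "inner"), "outer");
        ISketchSession session = await client.CreateSessionAsync();
        await session.SendAsync("hello");
        return provider.Sent;
    }
}
```

In this sketch the outermost layer sees the message first, so the provider records "[inner] [outer] hello". The real builder's ordering rules are not restated here and should be checked against RealtimeClientBuilder itself.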

Built-in Middleware

| Middleware | Description |
| --- | --- |
| FunctionInvokingRealtimeClient | Automatically invokes AIFunction instances when function call requests are received from the model. Configurable properties: IncludeDetailedErrors, AllowConcurrentInvocation, MaximumIterationsPerRequest, AdditionalTools, TerminateOnUnknownCalls, FunctionInvoker. |
| LoggingRealtimeClient | Logs all SendAsync and GetStreamingResponseAsync operations. Sensitive data (message contents) logged only at Trace level. |
| OpenTelemetryRealtimeClient | Emits OpenTelemetry traces and metrics following the GenAI semantic conventions. Supports EnableSensitiveData for capturing message content in spans. |

Builder Extensions

builder.UseFunctionInvocation(configure: f => { ... })
builder.UseLogging(loggerFactory)
builder.UseOpenTelemetry(configure: otel => { ... })

Design Decisions

  • Builder operates on IRealtimeClient — The RealtimeClientBuilder composes middleware around IRealtimeClient (a factory), not IRealtimeClientSession (a live connection). This mirrors ChatClientBuilder / IChatClient and allows the client to be registered as a DI singleton. Sessions are wrapped in middleware when CreateSessionAsync() is called.
  • Session middleware types are internal: FunctionInvokingRealtimeClientSession, LoggingRealtimeClientSession, and OpenTelemetryRealtimeClientSession are implementation details. Users interact with the public client classes (FunctionInvokingRealtimeClient, etc.) for configuration.
  • RealtimeSessionOptions uses init-only properties — The options object exposed via IRealtimeClientSession.Options is immutable after creation. Collection properties (Tools, OutputModalities) use IReadOnlyList. Provider-specific options (e.g., voice activity detection) are passed via RawRepresentationFactory.
  • RealtimeServerMessageType is a readonly struct (not an enum) — Follows the ChatRole extensibility pattern. Providers can define custom message types; unrecognized events use RawContentOnly with data in RawRepresentation.
  • Single SendAsync method — There is one way to send client messages, making middleware interception straightforward. GetStreamingResponseAsync takes no input parameter.
  • Separate client/server message type hierarchies — Provides type safety at the API boundary and mirrors the ChatMessage/ChatResponse pattern.
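
As a rough illustration of the readonly struct decision above, the following self-contained sketch shows the general "smart enum" shape: a string-backed value type with equality operators and a few well-known static instances. MessageKind and MessageKindDemo are hypothetical names, and the case-insensitive comparison is an assumption of this sketch, not a statement about how RealtimeServerMessageType is implemented.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the pattern; not the library's RealtimeServerMessageType.
public readonly struct MessageKind : IEquatable<MessageKind>
{
    // Well-known values exposed as static properties instead of enum members.
    public static MessageKind OutputTextDelta { get; } = new("OutputTextDelta");
    public static MessageKind Error { get; } = new("Error");

    public MessageKind(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new ArgumentException("Value must be non-empty.", nameof(value));
        }

        Value = value;
    }

    public string Value { get; }

    // Assumption of this sketch: values compare case-insensitively.
    public bool Equals(MessageKind other) =>
        string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);

    public override bool Equals(object? obj) => obj is MessageKind other && Equals(other);

    public override int GetHashCode() =>
        Value is null ? 0 : StringComparer.OrdinalIgnoreCase.GetHashCode(Value);

    public static bool operator ==(MessageKind left, MessageKind right) => left.Equals(right);
    public static bool operator !=(MessageKind left, MessageKind right) => !left.Equals(right);

    public override string ToString() => Value ?? string.Empty;
}

public static class MessageKindDemo
{
    // A provider-specific kind, defined without modifying the struct itself.
    public static readonly MessageKind RateLimitsUpdated = new("rate_limits.updated");

    public static string Describe(MessageKind kind)
    {
        // Struct instances are not compile-time constants, so they cannot be
        // used as `case` labels; compare with == instead.
        if (kind == MessageKind.OutputTextDelta) return "text delta";
        if (kind == MessageKind.Error) return "error";
        return $"custom: {kind.Value}";
    }
}
```

The trade-off compared with an enum is that exhaustive `switch` checking is lost, while providers gain the ability to introduce new values without a change to the abstraction.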

Usage Examples

1. Creating a Realtime Client and Session

using Microsoft.Extensions.AI;

IRealtimeClient realtimeClient = new OpenAIRealtimeClient(
    apiKey: "your-api-key",
    model: "gpt-realtime");

await using var session = await realtimeClient.CreateSessionAsync();

2. Building a Middleware Pipeline

var builder = new RealtimeClientBuilder(realtimeClient)
    .UseFunctionInvocation(configure: f =>
    {
        f.AdditionalTools = [getWeatherFunction];
        f.MaximumIterationsPerRequest = 10;
    })
    .UseOpenTelemetry(configure: otel => otel.EnableSensitiveData = true)
    .UseLogging();

IRealtimeClient wrappedClient = builder.Build(services);
await using var session = await wrappedClient.CreateSessionAsync();

3. Configuring the Session

await session.SendAsync(new SessionUpdateRealtimeClientMessage(new RealtimeSessionOptions
{
    OutputModalities = ["audio"],
    Instructions = "You are a helpful assistant.",
    Voice = "alloy",
    TranscriptionOptions = new TranscriptionOptions
    {
        ModelId = "whisper-1",
        SpeechLanguage = "en"
    },
    Tools = [getWeatherFunction]
}));

4. Sending and Receiving Messages

var cts = new CancellationTokenSource();

// Start listening for server messages
_ = Task.Run(async () =>
{
    await foreach (var msg in session.GetStreamingResponseAsync(cts.Token))
    {
        switch (msg)
        {
            case OutputTextAudioRealtimeServerMessage audio
                when audio.Type == RealtimeServerMessageType.OutputAudioDelta:
                PlayAudio(audio.Audio);
                break;
            case OutputTextAudioRealtimeServerMessage text
                when text.Type == RealtimeServerMessageType.OutputTextDelta:
                Console.Write(text.Text);
                break;
            case ErrorRealtimeServerMessage error:
                Console.WriteLine($"Error: {error.Error?.Message}");
                break;
        }
    }
});

// Send a text message
var item = new RealtimeConversationItem(
    [new TextContent("What's the weather in Seattle?")],
    role: ChatRole.User);
await session.SendAsync(new CreateConversationItemRealtimeClientMessage(item));
await session.SendAsync(new CreateResponseRealtimeClientMessage());

// Send audio
await session.SendAsync(new InputAudioBufferAppendRealtimeClientMessage(
    audioContent: new DataContent($"data:audio/pcm;base64,{Convert.ToBase64String(pcmBytes)}")));
await session.SendAsync(new InputAudioBufferCommitRealtimeClientMessage());
await session.SendAsync(new CreateResponseRealtimeClientMessage());

5. Ending the Session

cts.Cancel();
await session.DisposeAsync();
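
One possible refinement of the shutdown step, sketched here without the real session types: keep a reference to the listener task instead of discarding it, cancel, and then await the task so the loop has fully stopped before the session is disposed. FakeServerMessages is a hypothetical stand-in for session.GetStreamingResponseAsync, and the delays are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class ShutdownSketch
{
    // Hypothetical stand-in for a server message stream: yields until cancelled.
    public static async IAsyncEnumerable<int> FakeServerMessages(
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; ; i++)
        {
            // Throws OperationCanceledException once the token is cancelled.
            await Task.Delay(10, cancellationToken);
            yield return i;
        }
    }

    // Returns how many messages were observed before shutdown.
    public static async Task<int> RunAsync()
    {
        using var cts = new CancellationTokenSource();
        int received = 0;

        // Keep the task so it can be awaited (and its failures observed) later.
        Task listener = Task.Run(async () =>
        {
            try
            {
                await foreach (int message in FakeServerMessages(cts.Token))
                {
                    Interlocked.Increment(ref received);
                }
            }
            catch (OperationCanceledException)
            {
                // Expected when the caller cancels during shutdown.
            }
        });

        await Task.Delay(100);   // Simulate some session activity.
        cts.Cancel();            // Ask the listener to stop.
        await listener;          // Wait until it has actually stopped.
        return received;
    }
}
```

Awaiting the listener also surfaces any exception other than cancellation, which a discarded task would leave unobserved.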

Demo Application

A complete application consuming the new realtime interfaces can be found at: RealtimeProposalDemoApp

github-actions bot added the area-ai (Microsoft.Extensions.AI libraries) label Feb 11, 2026
@tarekgh marked this pull request as ready for review February 11, 2026 03:12
@tarekgh requested review from a team as code owners February 11, 2026 03:12
Copilot AI review requested due to automatic review settings February 11, 2026 03:12
@tarekgh added this to the 11.0 milestone Feb 11, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces an experimental Realtime Client / Session abstraction for Microsoft.Extensions.AI, including middleware-style session pipelines (logging, OpenTelemetry, function invocation) and an initial OpenAI realtime provider, while refactoring function-invocation logic to be shared across chat and realtime flows.

Changes:

  • Add IRealtimeClient / IRealtimeSession abstractions plus realtime message/option types (audio, transcription, response items, errors, etc.).
  • Add RealtimeSessionBuilder pipeline + middleware implementations (LoggingRealtimeSession, OpenTelemetryRealtimeSession, FunctionInvokingRealtimeSession).
  • Refactor shared function invocation into reusable internal components (FunctionInvocationProcessor, helpers, logger), used by both chat and realtime.

Reviewed changes

Copilot reviewed 62 out of 63 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionExtensionsTests.cs | Unit tests for IRealtimeSession.GetService<T>() extension behavior. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/RealtimeSessionBuilderTests.cs | Unit tests for RealtimeSessionBuilder pipeline behavior and ordering. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/LoggingRealtimeSessionTests.cs | Unit tests validating logging middleware behavior across methods and log levels. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/FunctionInvokingRealtimeSessionTests.cs | Unit tests for function invocation behavior in realtime streaming. |
| test/Libraries/Microsoft.Extensions.AI.Tests/Realtime/DelegatingRealtimeSessionTests.cs | Unit tests for base delegating session behavior (delegation, disposal, services). |
| test/Libraries/Microsoft.Extensions.AI.Tests/Microsoft.Extensions.AI.Tests.csproj | Includes shared TestRealtimeSession in test compilation. |
| test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeSessionTests.cs | Unit tests for OpenAI realtime session basic behaviors and guardrails. |
| test/Libraries/Microsoft.Extensions.AI.OpenAI.Tests/OpenAIRealtimeClientTests.cs | Unit tests for OpenAI realtime client creation and service exposure. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/TestRealtimeSession.cs | Test double for IRealtimeSession with callback hooks. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeSessionOptionsTests.cs | Tests for RealtimeSessionOptions and related option types. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeServerMessageTests.cs | Tests for server message types and their property roundtrips. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeContentItemTests.cs | Tests for RealtimeContentItem construction and mutation. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeClientMessageTests.cs | Tests for client message types and their properties. |
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeAudioFormatTests.cs | Tests for RealtimeAudioFormat behavior. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionExtensions.cs | Adds GetService<T>() extension for IRealtimeSession. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilderRealtimeSessionExtensions.cs | Adds AsBuilder() extension for sessions. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/RealtimeSessionBuilder.cs | Implements session middleware/pipeline builder. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/OpenTelemetryRealtimeSessionBuilderExtensions.cs | Builder extension to add OpenTelemetry middleware to a realtime session. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSessionBuilderExtensions.cs | Builder extension to add logging middleware to a realtime session. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/LoggingRealtimeSession.cs | Delegating session middleware that logs calls and streaming messages. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSessionBuilderExtensions.cs | Builder extension to add function invocation middleware. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/FunctionInvokingRealtimeSession.cs | Implements tool/function invocation loop for realtime streaming. |
| src/Libraries/Microsoft.Extensions.AI/Realtime/AnonymousDelegatingRealtimeSession.cs | Anonymous delegate-based middleware for streaming interception. |
| src/Libraries/Microsoft.Extensions.AI/OpenTelemetryConsts.cs | Extends OpenTelemetry constants for realtime and token subcategories. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationStatus.cs | Shared internal status enum for invocation outcomes. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationProcessor.cs | Shared processor implementing serial/parallel invocation with instrumentation. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationLogger.cs | Shared logger messages used by chat and realtime invocation flows. |
| src/Libraries/Microsoft.Extensions.AI/Common/FunctionInvocationHelpers.cs | Shared helpers (activity detection, elapsed time, tool map creation). |
| src/Libraries/Microsoft.Extensions.AI/ChatCompletion/FunctionInvokingChatClient.cs | Refactors chat function invocation to use shared processor/helpers/logger. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeClient.cs | Adds OpenAI realtime client implementation that creates/initializes sessions. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIClientExtensions.cs | Adds AsIRealtimeClient extension for OpenAI client integration. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/Microsoft.Extensions.AI.OpenAI.csproj | Adds internals visibility for tests and Channels dependency (non-net10). |
| src/Libraries/Microsoft.Extensions.AI.Evaluation.Reporting/CSharp/Microsoft.Extensions.AI.Evaluation.Reporting.csproj | Comment formatting change. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/UsageDetails.cs | Adds realtime-specific token breakdown fields. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Tools/ToolChoiceMode.cs | Adds tool choice mode enum for realtime use. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/VoiceActivityDetection.cs | Adds VAD options type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/TranscriptionOptions.cs | Adds transcription configuration type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/ServerVoiceActivityDetection.cs | Adds server VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/SemanticVoiceActivityDetection.cs | Adds semantic VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionOptions.cs | Adds session configuration options (audio formats, tools, tracing, etc.). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionKind.cs | Adds session kind enum (realtime vs transcription). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseOutputItemMessage.cs | Adds server message for output items. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerResponseCreatedMessage.cs | Adds server message for response lifecycle/usage metadata. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerOutputTextAudioMessage.cs | Adds server message for output text/audio streaming. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessageType.cs | Adds server message type enum. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerMessage.cs | Adds base server message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerInputAudioTranscriptionMessage.cs | Adds server transcription message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeServerErrorMessage.cs | Adds server error message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeContentItem.cs | Adds realtime conversation item wrapper. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientResponseCreateMessage.cs | Adds client response request message type (modalities/tools/etc.). |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientMessage.cs | Adds base client message type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferCommitMessage.cs | Adds client message for committing audio input buffer. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientInputAudioBufferAppendMessage.cs | Adds client message for appending audio input buffer. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeClientConversationItemCreateMessage.cs | Adds client message for creating a conversation item. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeAudioFormat.cs | Adds audio format specification type. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/NoiseReductionOptions.cs | Adds noise reduction options enum. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeSession.cs | Adds realtime session interface. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeClient.cs | Adds realtime client interface. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/DelegatingRealtimeSession.cs | Adds base delegating session implementation. |

@shyamnamboodiripad (Contributor) left a comment

Signing off on behalf of eval (so that the whitespace change in Reporting.csproj does not block merge)

tarekgh and others added 4 commits February 11, 2026 14:13

Remove AsIRealtimeClient extension method

The extension method on OpenAIClient was not useful because it
completely ignored the OpenAIClient instance - only validating it
for null before creating a new OpenAIRealtimeClient with the
separately provided apiKey and model parameters.

Users can construct OpenAIRealtimeClient directly instead.

Address PR review comments

- Fix RealtimeSessionExtensions XML doc to reference IRealtimeSession
  instead of IChatClient
- Replace non-standard <ref name> tags with <see cref> in
  RealtimeServerMessageType.cs for proper IntelliSense/doc rendering
- Fix ResponseDone doc summary to say 'completed' instead of 'created'
- Add missing Throw.IfNull(updates) in LoggingRealtimeSession
  .GetStreamingResponseAsync for consistency with other sessions

Fix multiple issues found during code review

- Split RealtimeServerMessageType enum: add ResponseOutputItemDone
  and ResponseOutputItemAdded to distinguish per-item events
  (response.output_item.done, conversation.item.done) from
  whole-response events (response.done, response.created)

- Fix function result serialization: use JsonSerializer.Serialize()
  instead of ToString() to properly serialize complex objects

- Fix OTel streaming duration: start stopwatch at method entry
  instead of immediately before recording, so duration histogram
  measures actual streaming time

- URL-encode model name in WebSocket URI for defensive safety

- Fix OTel metadata tag ordering: apply user metadata before
  standard tags so standard OTel attributes take precedence
  if keys collide
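
The function result serialization fix in the commit above can be illustrated with a minimal sketch. FunctionResultSketch is a hypothetical helper, not code from the PR; it only contrasts what ToString() and JsonSerializer.Serialize() produce for a non-trivial result object.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.Json;

public static class FunctionResultSketch
{
    // For most complex objects, ToString() yields the type name, not the data.
    public static string ViaToString(object? result) => result?.ToString() ?? string.Empty;

    // Serializing produces a JSON payload a model can actually read.
    public static string ViaJson(object? result) => JsonSerializer.Serialize(result);

    public static Dictionary<string, object> SampleResult() => new()
    {
        ["city"] = "Seattle",
        ["tempF"] = 52,
    };
}
```

For the sample dictionary, ViaToString returns a string beginning with "System.Collections.Generic.Dictionary", while ViaJson returns {"city":"Seattle","tempF":52}.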
@tarekgh force-pushed the RealtimeClientProposal branch from 81300f4 to 8ad70f5 Compare February 12, 2026 19:59
@tarekgh force-pushed the RealtimeClientProposal branch from 8ad70f5 to fbdc7cb Compare February 12, 2026 20:16
Tarek Mahmoud Sayed added 6 commits February 12, 2026 12:49

Unify TranscriptionOptions across Realtime and SpeechToText

- Move TranscriptionOptions from Realtime/ to SpeechToText/ folder
- Change experimental flag from AIRealTime to AISpeechToText
- Make properties nullable with parameterless constructor
- Rename Language to SpeechLanguage, Model to ModelId
- Replace SpeechToTextOptions.ModelId and .SpeechLanguage with Transcription property
- Update all consumers and tests

Tarek Mahmoud Sayed added 15 commits March 10, 2026 12:54

Restore ModelId and SpeechLanguage as direct properties on
SpeechToTextOptions instead of nesting them under a Transcription
property. TranscriptionOptions class remains for RealtimeSessionOptions.

- Restore SpeechToTextOptions.ModelId and .SpeechLanguage properties
- Remove SpeechToTextOptions.Transcription property
- Update all consumers to use direct property access
- Fix SpeechToTextClientMetadata doc reference
- Regenerate CompatibilitySuppressions.xml
- Update all related tests

Apply the [JsonIgnore] public + [JsonInclude] internal Core backing
property pattern to the 4 experimental audio/text token properties
on UsageDetails, matching the convention used elsewhere (e.g.
ChatOptions.AllowBackgroundResponses, ChatResponse.ContinuationToken).

This enables JSON serialization of these properties when using the
library's JsonSerializerOptions while keeping the public API gated
behind [Experimental].

- InputAudioTokenCount -> InputAudioTokenCountCore
- InputTextTokenCount -> InputTextTokenCountCore
- OutputAudioTokenCount -> OutputAudioTokenCountCore
- OutputTextTokenCount -> OutputTextTokenCountCore
- Add tests for property roundtrip, Add summation, and JSON
  serialization/deserialization of the new properties
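
A minimal sketch of the [JsonIgnore] public + [JsonInclude] internal pattern named in the commit above, assuming .NET 8 or later (where System.Text.Json accepts [JsonInclude] on non-public members). UsageSketch and the JSON property name are hypothetical; the real UsageDetails attributes, names, and [Experimental] annotations are not reproduced here.

```csharp
#nullable enable
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type illustrating the pattern; not the library's UsageDetails.
public sealed class UsageSketch
{
    // Public surface: hidden from the serializer, forwards to the Core property.
    [JsonIgnore]
    public long? InputAudioTokenCount
    {
        get => InputAudioTokenCountCore;
        set => InputAudioTokenCountCore = value;
    }

    // Internal backing property: the one the serializer reads and writes.
    [JsonInclude]
    [JsonPropertyName("input_audio_token_count")]
    internal long? InputAudioTokenCountCore { get; set; }
}

public static class UsageSketchDemo
{
    // Serializes and deserializes to show the value survives via the Core property.
    public static long? RoundTrip(long? value)
    {
        string json = JsonSerializer.Serialize(new UsageSketch { InputAudioTokenCount = value });
        return JsonSerializer.Deserialize<UsageSketch>(json)!.InputAudioTokenCount;
    }
}
```

The point of the split is that gating attributes can sit on the public property while the serialized contract depends only on the internal one.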

Replace silent return on cancellation with ThrowIfCancellationRequested,
throw InvalidOperationException when session is not connected, and remove
catch block that swallowed OperationCanceledException, ObjectDisposedException,
and WebSocketException.

Update tests to verify the new exception behavior.

Refactor FunctionInvocationProcessor to accept a Func<string, AITool?>
delegate instead of Dictionary<string, AITool>? to align with main
branch's FindTool pattern and avoid reintroducing the removed toolMap.

… cast

Options.Tools is IReadOnlyList<AITool> which does not implement
IList<AITool>, so the 'as IList<AITool>' cast could silently return
null and ignore tools. Use IEnumerable<AITool> which both IList<T>
and IReadOnlyList<T> implement.
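
The cast pitfall described in the commit above can be reproduced with a small self-contained sketch. ReadOnlyOnly<T> and ToolCastSketch are hypothetical, and strings stand in for AITool. Whether the `as IList<T>` cast yields null depends on the runtime type: List<T> and arrays implement both interfaces, so the problem only shows up for collections that implement IReadOnlyList<T> alone.

```csharp
#nullable enable
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// A collection that implements IReadOnlyList<T> but not IList<T>.
public sealed class ReadOnlyOnly<T> : IReadOnlyList<T>
{
    private readonly T[] _items;

    public ReadOnlyOnly(params T[] items) => _items = items;

    public T this[int index] => _items[index];
    public int Count => _items.Length;
    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => _items.GetEnumerator();
}

public static class ToolCastSketch
{
    // Fragile: silently treats the collection as empty when it is not an IList<T>.
    public static int CountViaIList(object? tools) =>
        (tools as IList<string>)?.Count ?? 0;

    // Robust: both IList<T> and IReadOnlyList<T> extend IEnumerable<T>.
    public static int CountViaIEnumerable(object? tools) =>
        (tools as IEnumerable<string>)?.Count() ?? 0;
}
```

With `new ReadOnlyOnly<string>("a", "b")`, CountViaIList reports 0 while CountViaIEnumerable reports 2; with a `List<string>` both report 2.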

Add known limitation note to FunctionInvokingRealtimeClientSession
XML docs explaining that incoming server messages (including user
interruptions) are buffered during function invocation.

ConversationItemAdded and ConversationItemDone were incorrectly mapped
to ResponseOutputItemAdded/Done, which could cause the function
invoking session to invoke the same function call twice.

Remove gen_ai.request.tool_choice custom attribute that is not part
of the OpenTelemetry GenAI semantic conventions.

Consistent with constructor which already validates via Throw.IfNull.

Document ArgumentNullException on constructors and property setters
that validate via Throw.IfNull/Throw.IfNullOrWhitespace.

HTTP headers are case-insensitive per RFC 7230.

Avoids baking values into consuming assemblies at compile time.

Add GetRequiredService and GetRequiredService<T> to both
IRealtimeClient and IRealtimeClientSession extensions, matching
the pattern from IChatClient and IEmbeddingGenerator.

@tarekgh merged commit d84159b into dotnet:main Mar 11, 2026
6 checks passed
jeffhandley pushed a commit to jeffhandley/extensions that referenced this pull request Mar 17, 2026
* Realtime Client Proposal

* Replace [Experimental("MEAI001")] with DiagnosticIds.Experiments.AIRealTime

* Apply suggestions from code review

Co-authored-by: Copilot <[email protected]>

* Remove AsIRealtimeClient extension method

The extension method on OpenAIClient was not useful because it
completely ignored the OpenAIClient instance - only validating it
for null before creating a new OpenAIRealtimeClient with the
separately provided apiKey and model parameters.

Users can construct OpenAIRealtimeClient directly instead.

* Address PR review comments

- Fix RealtimeSessionExtensions XML doc to reference IRealtimeSession
  instead of IChatClient
- Replace non-standard <ref name> tags with <see cref> in
  RealtimeServerMessageType.cs for proper IntelliSense/doc rendering
- Fix ResponseDone doc summary to say 'completed' instead of 'created'
- Add missing Throw.IfNull(updates) in LoggingRealtimeSession
  .GetStreamingResponseAsync for consistency with other sessions

* Fix multiple issues found during code review

- Split RealtimeServerMessageType enum: add ResponseOutputItemDone
  and ResponseOutputItemAdded to distinguish per-item events
  (response.output_item.done, conversation.item.done) from
  whole-response events (response.done, response.created)

- Fix function result serialization: use JsonSerializer.Serialize()
  instead of ToString() to properly serialize complex objects

- Fix OTel streaming duration: start stopwatch at method entry
  instead of immediately before recording, so duration histogram
  measures actual streaming time

- URL-encode model name in WebSocket URI for defensive safety

- Fix OTel metadata tag ordering: apply user metadata before
  standard tags so standard OTel attributes take precedence
  if keys collide

* Remove duplicate internal FunctionInvocationStatus enum

* Remove stale CreateToolsMap calls before the loop

* Remove FunctionInvocationResultInternal; use public FunctionInvocationResult directly

* Add IAsyncDisposable to IRealtimeSession; store and await receive task in OpenAIRealtimeSession

* Revert API baseline JSON changes

* Clarify audio/text token counts are subsets of Input/OutputTokenCount

* Replace ToolChoiceMode enum with ChatToolMode for cross-modality consistency

* Unify TranscriptionOptions across Realtime and SpeechToText

- Move TranscriptionOptions from Realtime/ to SpeechToText/ folder
- Change experimental flag from AIRealTime to AISpeechToText
- Make properties nullable with parameterless constructor
- Rename Language to SpeechLanguage, Model to ModelId
- Replace SpeechToTextOptions.ModelId and .SpeechLanguage with Transcription property
- Update all consumers and tests

* Use HasTopLevelMediaType instead of StartsWith for audio check

* Add API compat suppressions for SpeechToTextOptions breaking changes

* Replace string Eagerness with SemanticEagerness struct (ChatRole pattern)

* Remove InternalsVisibleToTest from Microsoft.Extensions.AI.OpenAI

- Make OpenAIRealtimeSession constructor and ConnectAsync public
- Remove InternalsVisibleToTest from csproj
- Remove OpenAIRealtimeSessionSerializationTests (depended on internal ConnectWithWebSocketAsync)

Co-authored-by: Copilot <[email protected]>

* Add RawRepresentationFactory to RealtimeSessionOptions

Add Func<IRealtimeSession, object?>? RawRepresentationFactory property following
the same pattern used by ChatOptions, EmbeddingGenerationOptions, and other
abstraction options types. Add note in OpenAIRealtimeSession to consume the
factory when switching to the OpenAI SDK.

* Remove OpenAI-specific tracing properties from RealtimeSessionOptions

Remove EnableAutoTracing, TracingGroupId, TracingWorkflowName, and
TracingMetadata from the abstraction layer. These are OpenAI-specific
and should be configured via RawRepresentationFactory when the OpenAI
SDK dependency is added.

* Remove AIFunction/HostedMcpServerTool properties, use ChatToolMode

Remove redundant AIFunction and HostedMcpServerTool properties from
RealtimeSessionOptions and RealtimeClientResponseCreateMessage. Callers
should use ChatToolMode.RequireSpecific(functionName) instead.

Update OpenAI serialization to emit structured tool_choice JSON object
when RequireSpecific is used. Update OpenTelemetry and tests accordingly.

* Improve XML docs for Usage on response and transcription messages

* Remove leftover temp files

* Improve ConversationId XML docs on RealtimeServerResponseCreatedMessage

* Split Text/Audio properties on RealtimeServerOutputTextAudioMessage

Add separate Audio property for Base64-encoded audio data. Text is now
only used for text and transcript content. Update OpenAI parser,
OpenTelemetry session, and tests accordingly.

* Replace RealtimeServerMessageType enum with readonly struct

Follow the ChatRole smart-enum pattern: readonly struct with string
Value, IEquatable, operators, and JsonConverter. Providers can now
define custom message types by constructing new instances.

Update pattern-matching in OpenTelemetryRealtimeSession to use ==
comparisons instead of constant patterns.

* Move Parameter into ErrorContent.Details on error messages

Remove the Parameter property from RealtimeServerErrorMessage and
map error.param to ErrorContent.Details instead. Improve ErrorEventId
XML docs to clarify it correlates to the originating client event.

* Add RawRepresentation property to RealtimeContentItem

Add object? RawRepresentation to hold the original provider data
structure, following the same pattern as other types in the
abstraction layer (e.g., ChatMessage). Updated tests accordingly.

* Rename Metadata to AdditionalProperties for consistency

Rename the Metadata property to AdditionalProperties on both
RealtimeClientResponseCreateMessage and RealtimeServerResponseCreatedMessage
to be consistent with the established pattern used across the AI
abstractions (ChatMessage, ChatOptions, AIContent, etc.). Updated
XML docs, OpenAI provider, OTel session, and tests accordingly.

* Improve MaxOutputTokens XML docs to clarify modality scope

Clarify that MaxOutputTokens is a total budget across all output
modalities (text, audio) and tool calls, not per-modality.

* Improve XML docs on RealtimeClientResponseCreateMessage properties

Clarify that ExcludeFromConversation creates an out-of-band response
whose output is not added to conversation history. Document that
Instructions, Tools, ToolMode, OutputModalities, OutputAudioOptions,
and OutputVoice are per-response overrides of session configuration.

* Improve RealtimeClientResponseCreateMessage class-level XML doc

Clarify that this message triggers model inference and that its
properties are per-response overrides of session configuration.

* Rename EventId to MessageId for terminology consistency

Rename EventId to MessageId on RealtimeClientMessage and
RealtimeServerMessage, and ErrorEventId to ErrorMessageId on
RealtimeServerErrorMessage. The abstraction uses 'message' terminology
throughout (class names, docs, method signatures), so properties
should match. The OpenAI provider maps MessageId to/from the wire
protocol's event_id field.

* Remove blank line between using directives in audio buffer messages

* Rename RealtimeAudioFormat.Type to MediaType for consistency

Rename to match the established MediaType naming convention used
across the abstractions (DataContent, HostedFileContent, UriContent,
ImageGenerationOptions). Updated OpenAI provider and tests.

* Refactor IRealtimeSession: rename InjectClientMessageAsync to SendClientMessageAsync and remove updates parameter from GetStreamingResponseAsync

- Rename InjectClientMessageAsync -> SendClientMessageAsync across all implementations
- Remove IAsyncEnumerable<RealtimeClientMessage> updates parameter from GetStreamingResponseAsync
- Move per-message telemetry from WrapClientMessagesForTelemetryAsync into SendClientMessageAsync override in OpenTelemetryRealtimeSession
- Delete WrapUpdatesWithLoggingAsync from LoggingRealtimeSession
- Delete WrapClientMessagesForTelemetryAsync from OpenTelemetryRealtimeSession
- Update AnonymousDelegatingRealtimeSession delegate signature
- Update RealtimeSessionBuilder.Use overload signature
- Update all tests to use new API

* Make RealtimeSessionOptions and related types immutable with init-only properties and IReadOnlyList

* Fix XML doc param order, add null validation to constructors

* OpenAI Realtime Provider using OpenAI SDK

* Fix typo: add missing space in comment ('IDsince' -> 'ID since')

* Make CreateSessionAsync return non-nullable IRealtimeSession

- Change IRealtimeClient.CreateSessionAsync return type from Task<IRealtimeSession?> to Task<IRealtimeSession>
- Remove exception-swallowing catch blocks in OpenAIRealtimeClient.CreateSessionAsync and OpenAIRealtimeSession.ConnectAsync; let exceptions propagate to callers
- Change ConnectAsync from Task<bool> to Task
- Remove unused System.IO and System.Net.WebSockets usings
- Update tests to expect OperationCanceledException instead of null/false returns

* Remove SemanticEagerness from Abstractions; move eagerness to AdditionalProperties

- Delete SemanticEagerness.cs (OpenAI-specific concept)
- Remove Eagerness property from SemanticVoiceActivityDetection
- Add AdditionalProperties to base VoiceActivityDetection for provider-specific settings
- Update OpenAI provider to read/write eagerness via AdditionalProperties["eagerness"]

* Include CompatibilitySuppressions for SpeechToTextOptions API changes

* Remove MCP-specific types from abstractions; add provider contract docs

* Remove trailing blank line in RealtimeServerErrorMessage

* Clarify ErrorMessageId XML doc to distinguish from base MessageId

* Rename ErrorMessageId to OriginatingMessageId for clarity

* Change ExcludeFromConversation from bool to bool? for provider-default consistency

* Remove blank lines between Experimental attribute and class declaration

* Add null validation to Content property setter

* Make RealtimeAudioFormat.SampleRate non-nullable and remove misleading doc

* Rename IRealtimeSession to IRealtimeClientSession and SendClientMessageAsync to SendAsync

* Fix null DefaultConversationConfiguration when ExcludeFromConversation is unset

* Remove IDisposable from IRealtimeClientSession, keep only IAsyncDisposable

* Remove ConversationId from RealtimeServerResponseCreatedMessage

* Rename ResponseCreateMessage and ConversationItemCreateMessage to CreateResponseMessage and CreateConversationItemMessage

* Rename RealtimeContentItem to RealtimeConversationItem

* Remove VoiceSpeed from RealtimeSessionOptions and use RawRepresentationFactory seed pattern

* Remove UpdateAsync from IRealtimeClientSession, use message-based approach

Replace UpdateAsync method on IRealtimeClientSession with a new
RealtimeClientSessionUpdateMessage type sent via SendAsync. This avoids
requiring all providers to implement session updates (e.g. Gemini does
not support mid-session updates).

- Add RealtimeClientSessionUpdateMessage with Options property
- Remove UpdateAsync from IRealtimeClientSession and DelegatingRealtimeSession
- Move update logic to OpenAIRealtimeSession.SendAsync handler
- Remove UpdateAsync overrides from Logging and OpenTelemetry sessions
- Update all tests to use SendAsync with the new message type
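
The shape of the message-based approach can be sketched as follows. Every type here (ClientMessage, SessionUpdateMessage, ISketchSession, UpdatableSession, FixedSession) is a hypothetical stand-in, and the choice to fail with NotSupportedException in a provider without mid-session updates is an assumption of the sketch, not documented provider behavior.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical message hierarchy; the real types carry far more data.
public abstract class ClientMessage { }

public sealed class SessionUpdateMessage : ClientMessage
{
    public SessionUpdateMessage(string instructions) =>
        Instructions = instructions ?? throw new ArgumentNullException(nameof(instructions));

    public string Instructions { get; }
}

// The interface needs only one send method; no UpdateAsync member is required.
public interface ISketchSession
{
    Task SendAsync(ClientMessage message, CancellationToken cancellationToken = default);
}

// A provider that supports mid-session updates handles the message type.
public sealed class UpdatableSession : ISketchSession
{
    public string Instructions { get; private set; } = string.Empty;

    public Task SendAsync(ClientMessage message, CancellationToken cancellationToken = default)
    {
        cancellationToken.ThrowIfCancellationRequested();
        if (message is SessionUpdateMessage update)
        {
            Instructions = update.Instructions;
        }

        return Task.CompletedTask;
    }
}

// A provider without mid-session updates can reject the one message type
// (assumed behavior) instead of stubbing out an interface method.
public sealed class FixedSession : ISketchSession
{
    public Task SendAsync(ClientMessage message, CancellationToken cancellationToken = default) =>
        message is SessionUpdateMessage
            ? Task.FromException(new NotSupportedException("Mid-session updates are not supported."))
            : Task.CompletedTask;
}
```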

* Rename realtime session classes to include Client in name

Align implementation class names with IRealtimeClientSession interface
naming. All session-related classes now include 'Client':

- DelegatingRealtimeSession -> DelegatingRealtimeClientSession
- OpenAIRealtimeSession -> OpenAIRealtimeClientSession
- LoggingRealtimeSession -> LoggingRealtimeClientSession
- OpenTelemetryRealtimeSession -> OpenTelemetryRealtimeClientSession
- FunctionInvokingRealtimeSession -> FunctionInvokingRealtimeClientSession
- RealtimeSessionBuilder -> RealtimeClientSessionBuilder
- RealtimeSessionExtensions -> RealtimeClientSessionExtensions
- TestRealtimeSession -> TestRealtimeClientSession
- All related builder extension and test classes updated

* Simplify DisposeAsyncCore to return ValueTask directly

Remove unnecessary async/await in DelegatingRealtimeClientSession.DisposeAsyncCore,
returning InnerSession.DisposeAsync() directly instead.
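
As a rough illustration of the change, the sketch below forwards the inner ValueTask instead of awaiting it, which avoids generating an async state machine for a method that only delegates. DelegatingResource and TrackingDisposable are hypothetical names, not the types touched by this commit.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical inner resource that records whether it was disposed.
public sealed class TrackingDisposable : IAsyncDisposable
{
    public bool Disposed { get; private set; }

    public ValueTask DisposeAsync()
    {
        Disposed = true;
        return default; // a default ValueTask is already completed
    }
}

// Hypothetical delegating wrapper showing the simplified core method.
public class DelegatingResource : IAsyncDisposable
{
    private readonly IAsyncDisposable _inner;

    public DelegatingResource(IAsyncDisposable inner) =>
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));

    public async ValueTask DisposeAsync()
    {
        await DisposeAsyncCore().ConfigureAwait(false);
        GC.SuppressFinalize(this);
    }

    // Before: protected virtual async ValueTask DisposeAsyncCore()
    //             => await _inner.DisposeAsync().ConfigureAwait(false);
    // After: return the inner ValueTask directly.
    protected virtual ValueTask DisposeAsyncCore() => _inner.DisposeAsync();
}
```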

* Convert RealtimeSessionKind from enum to extensible string struct

Convert RealtimeSessionKind from a closed enum to a readonly struct
following the same pattern as RealtimeServerMessageType and
ChatFinishReason. This allows providers to define custom session kinds.

Rename Realtime -> Conversation for clarity, avoiding the redundant
RealtimeSessionKind.Realtime naming.

* Remove NoiseReductionOptions from abstraction

NoiseReductionOptions is OpenAI-specific and not supported by other
providers. Remove it from the abstraction layer; users who need noise
reduction can use provider-specific options.

* Remove VoiceActivityDetection hierarchy from abstraction

VoiceActivityDetection, ServerVoiceActivityDetection, and
SemanticVoiceActivityDetection are too OpenAI-specific. Remove them
from the abstraction; users who need VAD can configure it through
provider-specific options via RawRepresentationFactory.

* Remove AnonymousDelegatingRealtimeClientSession and anonymous Use overload

Remove the anonymous Use overload that only intercepts
GetStreamingResponseAsync without the ability to intercept SendAsync,
making it too limited to be useful in practice.

* Fix JSON injection in MessageId by using JsonSerializer.Serialize
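
A minimal sketch of the class of bug being fixed, assuming the payload was previously assembled by string concatenation. The EventJson helper and the "response.create" type value are illustrative assumptions; only the event_id field name comes from this log. JsonSerializer.Serialize on a string produces a quoted, escaped JSON string literal, so a caller-supplied ID cannot break out of its field.

```csharp
using System.Text.Json;

public static class EventJson
{
    // Unsafe: a double quote inside messageId terminates the JSON string early,
    // letting the caller inject extra properties.
    public static string BuildUnsafe(string messageId) =>
        "{\"event_id\":\"" + messageId + "\",\"type\":\"response.create\"}";

    // Safer: Serialize emits the surrounding quotes and escapes the content.
    public static string BuildSafe(string messageId) =>
        "{\"event_id\":" + JsonSerializer.Serialize(messageId) + ",\"type\":\"response.create\"}";
}
```

With an ID such as `x","type":"evil`, BuildUnsafe yields an object whose event_id is just `x` followed by an injected property, while BuildSafe round-trips the full ID as a single string value.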

* Remove meaningless duration metric from SendAsync in OTel session

The stopwatch was started and recorded within the same block without
wrapping the actual SendAsync call, measuring nothing useful. Duration
metrics remain in GetStreamingResponseAsync where they measure actual
streaming time.
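
To illustrate why the removed metric measured nothing, here is a sketch contrasting a stopwatch that is stopped before the operation runs with one that wraps the awaited call. TimingSketch and both method names are hypothetical; the commit removed the metric rather than rewriting it this way.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Anti-pattern: the stopwatch starts and stops in the same block,
    // before the operation is awaited, so the elapsed time is near zero.
    public static async Task<TimeSpan> MeasureNothingAsync(Func<Task> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        stopwatch.Stop();
        TimeSpan elapsed = stopwatch.Elapsed;
        await operation().ConfigureAwait(false);
        return elapsed;
    }

    // A meaningful measurement wraps the awaited call itself.
    public static async Task<TimeSpan> MeasureAsync(Func<Task> operation)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            await operation().ConfigureAwait(false);
        }
        finally
        {
            stopwatch.Stop();
        }

        return stopwatch.Elapsed;
    }
}
```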

* Refactor builder to operate on IRealtimeClient instead of IRealtimeClientSession

- Create DelegatingRealtimeClient base class (replaces DelegatingRealtimeClientSession)
- Create RealtimeClientBuilder (replaces RealtimeClientSessionBuilder)
- Create FunctionInvokingRealtimeClient, LoggingRealtimeClient, OpenTelemetryRealtimeClient
- Make session middleware types (FunctionInvokingRealtimeClientSession, etc.) internal
- Rename builder extension classes to remove 'Session' from names
- Add RealtimeClientExtensions and RealtimeClientBuilderRealtimeClientExtensions
- Use DiagnosticIds.Experiments.AIRealTime for Experimental attributes
- Update all tests to use public client APIs only

* Rename message classes: move RealtimeClient/RealtimeServer before Message

Rename pattern: RealtimeClient{Name}Message -> {Name}RealtimeClientMessage
Rename pattern: RealtimeServer{Name}Message -> {Name}RealtimeServerMessage

Base classes RealtimeClientMessage and RealtimeServerMessage unchanged.

* Fix cancellation tests to also accept WebSocketException on net462

* Remove PreviousId from CreateConversationItemRealtimeClientMessage as OpenAI-specific

* Document that ResponseId may be null and response lifecycle events may be synthesized

* Document that CreateResponseRealtimeClientMessage may be a no-op for VAD-driven providers

* Add RealtimeResponseStatus constants and document interruption/barge-in via Status property

* Remove session parameter from RealtimeSessionOptions.RawRepresentationFactory

* Fix AOT compatibility errors in OpenAIRealtimeClientSession by using source-generated JSON serialization

* Fix SpeechToTextOptionsTests for unified TranscriptionOptions after upstream merge

* Revert SpeechToTextOptions changes per review feedback

Restore ModelId and SpeechLanguage as direct properties on
SpeechToTextOptions instead of nesting them under a Transcription
property. TranscriptionOptions class remains for RealtimeSessionOptions.

- Restore SpeechToTextOptions.ModelId and .SpeechLanguage properties
- Remove SpeechToTextOptions.Transcription property
- Update all consumers to use direct property access
- Fix SpeechToTextClientMetadata doc reference
- Regenerate CompatibilitySuppressions.xml
- Update all related tests

* Use JsonInclude/Core pattern for experimental UsageDetails properties

Apply the [JsonIgnore] public + [JsonInclude] internal Core backing
property pattern to the 4 experimental audio/text token properties
on UsageDetails, matching the convention used elsewhere (e.g.
ChatOptions.AllowBackgroundResponses, ChatResponse.ContinuationToken).

This enables JSON serialization of these properties when using the
library's JsonSerializerOptions while keeping the public API gated
behind [Experimental].

- InputAudioTokenCount -> InputAudioTokenCountCore
- InputTextTokenCount -> InputTextTokenCountCore
- OutputAudioTokenCount -> OutputAudioTokenCountCore
- OutputTextTokenCount -> OutputTextTokenCountCore
- Add tests for property round-tripping, summation via Add, and JSON
  serialization/deserialization of the new properties

* Fix SendAsync to propagate exceptions instead of swallowing them

Replace silent return on cancellation with ThrowIfCancellationRequested,
throw InvalidOperationException when session is not connected, and remove
catch block that swallowed OperationCanceledException, ObjectDisposedException,
and WebSocketException.

Update tests to verify the new exception behavior.
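
The new behavior can be sketched roughly as below. SketchSession, its IsConnected flag, and the string payload are hypothetical simplifications; the real session sends over a WebSocket. The point is that cancellation and a disconnected state now surface as exceptions instead of a silent return.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical session used only to show the exception-propagation shape.
public sealed class SketchSession
{
    public bool IsConnected { get; set; }

    public int SentCount { get; private set; }

    public Task SendAsync(string message, CancellationToken cancellationToken = default)
    {
        if (message is null)
        {
            throw new ArgumentNullException(nameof(message));
        }

        // Surface cancellation to the caller instead of returning silently.
        cancellationToken.ThrowIfCancellationRequested();

        // Fail loudly when there is no connection to send on.
        if (!IsConnected)
        {
            throw new InvalidOperationException("The session is not connected.");
        }

        // No catch block here: transport exceptions would propagate as-is.
        SentCount++;
        return Task.CompletedTask;
    }
}
```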

* Replace toolMap dictionary with FindTool delegate pattern

Refactor FunctionInvocationProcessor to accept a Func<string, AITool?>
delegate instead of Dictionary<string, AITool>? to align with main
branch's FindTool pattern and avoid reintroducing the removed toolMap.

* Use IEnumerable<AITool> in FindTool/HasAnyTools to avoid unsafe IList cast

Options.Tools is an IReadOnlyList<AITool>, and IReadOnlyList<T> does not
derive from IList<T>, so the 'as IList<AITool>' cast could silently return
null and the tools would be ignored. Use IEnumerable<AITool>, which both
IList<T> and IReadOnlyList<T> extend.
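
A small sketch of the pitfall, using strings in place of AITool. ReadOnlyOnly<T> and ToolLookup are hypothetical names. Note that List<T> and arrays implement both interfaces, so the cast only fails for collections that implement IReadOnlyList<T> alone, which is what makes the bug easy to miss.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that implements IReadOnlyList<T> but not IList<T>.
public sealed class ReadOnlyOnly<T> : IReadOnlyList<T>
{
    private readonly T[] _items;

    public ReadOnlyOnly(params T[] items) => _items = items;

    public T this[int index] => _items[index];

    public int Count => _items.Length;

    public IEnumerator<T> GetEnumerator() => ((IEnumerable<T>)_items).GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class ToolLookup
{
    // Buggy shape: 'as' yields null when the instance is not an IList<T>,
    // so a non-empty collection is reported as empty.
    public static bool HasAnyViaIListCast(IReadOnlyList<string>? tools) =>
        (tools as IList<string>)?.Count > 0;

    // Fixed shape: IEnumerable<T> is a base interface of both list interfaces.
    public static bool HasAny(IEnumerable<string>? tools) => tools?.Any() == true;
}
```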

* Document that function invocation blocks message processing loop

Add known limitation note to FunctionInvokingRealtimeClientSession
XML docs explaining that incoming server messages (including user
interruptions) are buffered during function invocation.

* Add distinct ConversationItem message types to prevent double invocation

ConversationItemAdded and ConversationItemDone were incorrectly mapped
to ResponseOutputItemAdded/Done, which could cause the function
invoking session to invoke the same function call twice.

* Remove custom ToolChoice telemetry attribute

Remove gen_ai.request.tool_choice custom attribute that is not part
of the OpenTelemetry GenAI semantic conventions.

* Add null validation to Item property setter

Consistent with the constructor, which already validates via Throw.IfNull.

* Add <exception> XML doc tags to realtime types

Document ArgumentNullException on constructors and property setters
that validate via Throw.IfNull/Throw.IfNullOrWhitespace.

* Use OrdinalIgnoreCase comparer for MCP headers dictionary

HTTP header field names are case-insensitive per RFC 7230.
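
For illustration, a dictionary constructed with StringComparer.OrdinalIgnoreCase treats keys that differ only by case as the same entry. HeaderSketch is a hypothetical helper, not the type changed in this commit.

```csharp
using System;
using System.Collections.Generic;

public static class HeaderSketch
{
    // Keys differing only by case ("Authorization" vs "authorization")
    // resolve to the same entry.
    public static Dictionary<string, string> CreateHeaders() =>
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
}
```

With the default comparer, the same two spellings would be stored as two separate headers.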

* Change RealtimeResponseStatus from const fields to static properties

Avoids baking values into consuming assemblies at compile time.
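
A sketch of the two shapes, with hypothetical type names and a placeholder value. A const is copied into each call site when the consuming assembly is compiled, so a consumer keeps the old value until it is rebuilt. A static property is read at run time, so consumers always see the value from the library version that is actually loaded.

```csharp
// Hypothetical names and value; only the const-versus-property contrast matters.
public static class StatusAsConst
{
    // Inlined into consumers at their compile time.
    public const string Completed = "completed";
}

public static class StatusAsProperty
{
    // Resolved at run time from the loaded library.
    public static string Completed { get; } = "completed";
}
```

The trade-off is that a static property is not a compile-time constant, so it cannot be used in `case` labels or attribute arguments.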

* Add GetRequiredService extension methods for realtime types

Add GetRequiredService and GetRequiredService<T> to both
IRealtimeClient and IRealtimeClientSession extensions, matching
the pattern from IChatClient and IEmbeddingGenerator.

* Remove unused System.Threading.Channels package reference

* Update OTel semantic conventions version reference from v1.39 to v1.40

---------

Co-authored-by: Tarek Mahmoud Sayed <[email protected]>
Co-authored-by: Copilot <[email protected]>
Labels

area-ai Microsoft.Extensions.AI libraries

5 participants