Add VoiceActivityDetection options to realtime session abstractions #7399
Merged
tarekgh merged 3 commits into dotnet:main on Mar 17, 2026
Introduce VoiceActivityDetectionOptions with Enabled and AllowInterruption properties to RealtimeSessionOptions. These represent the common VAD options supported across multiple realtime AI models (OpenAI, Gemini, Anthropic Claude, AWS Nova Sonic).

- Add VoiceActivityDetectionOptions class with Enabled (default true) and AllowInterruption (default true) properties
- Add VoiceActivityDetection property to RealtimeSessionOptions
- Map VAD options to OpenAI SDK TurnDetection in both conversation and transcription session paths
- Add concurrency guidance for SendAsync in IRealtimeClientSession docs
- Add unit tests for the new types
Pull request overview
Adds a cross-provider abstraction for Voice Activity Detection (VAD) configuration to Microsoft.Extensions.AI realtime sessions, and wires it into the OpenAI realtime provider so callers can consistently enable/disable VAD and control barge-in behavior.
Changes:
- Introduces `VoiceActivityDetectionOptions` and a `VoiceActivityDetection` property on `RealtimeSessionOptions`.
- Updates `OpenAIRealtimeClientSession` to map the new abstraction onto OpenAI SDK turn detection options for both conversation and transcription sessions.
- Expands unit tests in `Microsoft.Extensions.AI.Abstractions.Tests` to cover defaulting and round-tripping of the new options.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeSessionOptionsTests.cs | Adds tests for RealtimeSessionOptions.VoiceActivityDetection and VoiceActivityDetectionOptions defaults/roundtrip. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeClientSession.cs | Maps VoiceActivityDetectionOptions to OpenAI realtime turn detection configuration in session option builders. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/VoiceActivityDetectionOptions.cs | Adds the new experimental options type representing VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionOptions.cs | Adds VoiceActivityDetection init-only property to the session options abstraction. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeClientSession.cs | Adds concurrency guidance to SendAsync XML docs for provider implementers. |
- Fix type name in SendAsync concurrency docs: FunctionInvokingRealtimeSession -> FunctionInvokingRealtimeClientSession
- Preserve existing TurnDetection from seed options (RawRepresentationFactory) by mutating InterruptResponseEnabled on the existing RealtimeServerVadTurnDetection instead of replacing it, in both conversation and transcription paths
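The "mutate instead of replace" fix above can be illustrated with a minimal sketch. It does not use the OpenAI SDK: `ServerVadSketch` and `TurnDetectionMapper` are hypothetical stand-ins invented for this example, and returning `null` for the disabled case is only an assumption about how "turn detection off" might be modeled.

```csharp
#nullable enable
using System;

// A seed configured elsewhere, e.g. by a RawRepresentationFactory-style hook.
var seeded = new ServerVadSketch { SilenceDurationMs = 700, InterruptResponseEnabled = true };

var result = TurnDetectionMapper.Apply(seeded, enabled: true, allowInterruption: false);

Console.WriteLine(ReferenceEquals(result, seeded)); // True: the seed instance is reused
Console.WriteLine(seeded.SilenceDurationMs);        // 700: provider-specific setting survives
Console.WriteLine(seeded.InterruptResponseEnabled); // False: only this flag was changed

// Hypothetical stand-in for a server-VAD turn detection object (not the SDK type).
public sealed class ServerVadSketch
{
    public int? SilenceDurationMs { get; set; }
    public bool? InterruptResponseEnabled { get; set; }
}

public static class TurnDetectionMapper
{
    // Mutates the seeded instance when one exists, so settings the abstraction
    // does not know about are preserved; creates a new one only when no seed exists.
    public static ServerVadSketch? Apply(ServerVadSketch? existing, bool enabled, bool allowInterruption)
    {
        if (!enabled)
        {
            return null; // assumption: null models "turn detection disabled"
        }

        var target = existing ?? new ServerVadSketch();
        target.InterruptResponseEnabled = allowInterruption;
        return target;
    }
}
```

Replacing the seed with a fresh object would have silently dropped `SilenceDurationMs` in this sketch, which is the kind of loss the review feedback was guarding against.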
@stephentoub could you please take a quick look? Thanks! CC @jeffhandley
stephentoub reviewed Mar 17, 2026
stephentoub approved these changes Mar 17, 2026
Document that AllowInterruption only takes effect when Enabled is true, and that disabling VAD fully disables turn detection making interruption not applicable.
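The documented relationship between the two flags can be written as a one-line predicate. This is only an illustration of the stated semantics; `VadSemantics.InterruptionApplies` is a hypothetical helper and not part of the library.

```csharp
using System;

Console.WriteLine(VadSemantics.InterruptionApplies(enabled: true, allowInterruption: true));   // True
Console.WriteLine(VadSemantics.InterruptionApplies(enabled: true, allowInterruption: false));  // False
Console.WriteLine(VadSemantics.InterruptionApplies(enabled: false, allowInterruption: true));  // False: VAD is off, so the flag is moot

public static class VadSemantics
{
    // Hypothetical helper: interruption (barge-in) can only happen when VAD
    // itself is enabled; with VAD disabled, AllowInterruption is not applicable.
    public static bool InterruptionApplies(bool enabled, bool allowInterruption)
        => enabled && allowInterruption;
}
```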
jeffhandley pushed a commit to jeffhandley/extensions that referenced this pull request Mar 17, 2026
…otnet#7399)

* Add VoiceActivityDetection options to realtime session abstractions

  Introduce VoiceActivityDetectionOptions with Enabled and AllowInterruption properties to RealtimeSessionOptions. These represent the common VAD options supported across multiple realtime AI models (OpenAI, Gemini, Anthropic Claude, AWS Nova Sonic).

  - Add VoiceActivityDetectionOptions class with Enabled (default true) and AllowInterruption (default true) properties
  - Add VoiceActivityDetection property to RealtimeSessionOptions
  - Map VAD options to OpenAI SDK TurnDetection in both conversation and transcription session paths
  - Add concurrency guidance for SendAsync in IRealtimeClientSession docs
  - Add unit tests for the new types

* Address PR review feedback

  - Fix type name in SendAsync concurrency docs: FunctionInvokingRealtimeSession -> FunctionInvokingRealtimeClientSession
  - Preserve existing TurnDetection from seed options (RawRepresentationFactory) by mutating InterruptResponseEnabled on existing RealtimeServerVadTurnDetection instead of replacing it, in both conversation and transcription paths

* Clarify relationship between Enabled and AllowInterruption in VAD docs

  Document that AllowInterruption only takes effect when Enabled is true, and that disabling VAD fully disables turn detection making interruption not applicable.

Co-authored-by: Tarek Mahmoud Sayed
Summary
Add Voice Activity Detection (VAD) options to the MEAI realtime session abstractions, exposing the common VAD capabilities that are supported across multiple realtime AI models.
Motivation
Different realtime AI providers (OpenAI, Google Gemini, Anthropic Claude, AWS Nova Sonic) each support Voice Activity Detection with varying levels of configuration. After analyzing the VAD capabilities across these models, two options emerged as universally supported:
(The per-provider comparison table did not survive extraction. The provider-side names that remain from it are `InterruptResponseEnabled`, `allowInterruptions`, `turn_detection.create_response`, and `voiceConfiguration`.)

Provider-specific options like `SilenceDurationMs` and `PrefixPaddingMs` (only OpenAI and Gemini) are intentionally excluded from this initial abstraction and can still be configured via `RawRepresentationFactory`.

Changes

Abstractions (`Microsoft.Extensions.AI.Abstractions`)
- `VoiceActivityDetectionOptions` class with:
  - `Enabled` (default: `true`): controls whether the provider uses VAD to automatically detect speech
  - `AllowInterruption` (default: `true`): controls whether user speech can interrupt an in-progress model response (barge-in)
- `VoiceActivityDetection` property on `RealtimeSessionOptions`
- Concurrency guidance on `IRealtimeClientSession.SendAsync` for provider implementers

OpenAI Provider (`Microsoft.Extensions.AI.OpenAI`)
- `VoiceActivityDetection.Enabled = false` maps to `inputAudioOptions.DisableTurnDetection()`
- `VoiceActivityDetection.AllowInterruption` maps to `RealtimeServerVadTurnDetection.InterruptResponseEnabled`

Tests
- Unit tests covering the `VoiceActivityDetectionOptions` defaults and round-tripping of the new property on `RealtimeSessionOptions`

Design Decisions
- Follows the `TranscriptionOptions` pattern (nested mutable class on `RealtimeSessionOptions`)
- `null` means "use provider default": no VAD configuration is sent unless explicitly set
- Provider-specific settings remain configurable via `RawRepresentationFactory`
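As a rough illustration of the shape described above, the following self-contained sketch models the two types with stand-ins. `VadOptionsSketch` and `SessionOptionsSketch` are hypothetical names that mirror the described defaults and nullability; they are not the actual `Microsoft.Extensions.AI` types, whose members may differ.

```csharp
#nullable enable
using System;

// Leaving the property unset models "use the provider default":
// in this sketch, null means no VAD configuration would be sent.
var providerDefault = new SessionOptionsSketch();
Console.WriteLine(providerDefault.VoiceActivityDetection is null); // True

// Opting in without overrides picks up both defaults.
var vad = new VadOptionsSketch();
var session = new SessionOptionsSketch { VoiceActivityDetection = vad };
Console.WriteLine(vad.Enabled);           // True
Console.WriteLine(vad.AllowInterruption); // True

// The nested options object is mutable, so barge-in can be turned off
// while VAD itself stays on.
vad.AllowInterruption = false;
Console.WriteLine(session.VoiceActivityDetection!.AllowInterruption); // False

// Hypothetical stand-in for VoiceActivityDetectionOptions.
public sealed class VadOptionsSketch
{
    public bool Enabled { get; set; } = true;
    public bool AllowInterruption { get; set; } = true;
}

// Hypothetical stand-in for RealtimeSessionOptions, reduced to the one property.
public sealed class SessionOptionsSketch
{
    public VadOptionsSketch? VoiceActivityDetection { get; init; }
}
```

Making the property nullable keeps "caller said nothing" distinct from "caller asked for the defaults", which matters if a provider's own default behavior differs from `true`/`true`.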