Add VoiceActivityDetection options to realtime session abstractions #7399

Merged: tarekgh merged 3 commits into dotnet:main from tarekgh:feature/realtime-vad-options on Mar 17, 2026

tarekgh commented Mar 17, 2026

Summary

Add Voice Activity Detection (VAD) options to the MEAI realtime session abstractions, exposing the common VAD capabilities that are supported across multiple realtime AI models.

Motivation

Different realtime AI providers (OpenAI, Google Gemini, Anthropic Claude, AWS Nova Sonic) each support Voice Activity Detection with varying levels of configuration. After analyzing the VAD capabilities across these models, two options emerged as universally supported:

| Option | OpenAI | Gemini | Anthropic Claude | AWS Nova Sonic |
|---|---|---|---|---|
| Enable/Disable VAD | ✅ | ✅ | ✅ | ✅ |
| Allow Interruption | ✅ (InterruptResponseEnabled) | ✅ (allowInterruptions) | ✅ (turn_detection.create_response) | ✅ (voiceConfiguration) |

Provider-specific options such as SilenceDurationMs and PrefixPaddingMs (supported only by OpenAI and Gemini) are intentionally excluded from this initial abstraction; they can still be configured via RawRepresentationFactory.

Changes

Abstractions (Microsoft.Extensions.AI.Abstractions)

  • New VoiceActivityDetectionOptions class with:
    • Enabled (default: true) — controls whether the provider uses VAD to automatically detect speech
    • AllowInterruption (default: true) — controls whether user speech can interrupt an in-progress model response (barge-in)
  • New VoiceActivityDetection property on RealtimeSessionOptions
  • Concurrency documentation added to IRealtimeClientSession.SendAsync for provider implementers
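
As a rough illustration of how a caller might use the new options, here is a minimal sketch. The two classes are stand-ins declared locally so the snippet compiles on its own; they mirror the shape described above (two boolean properties defaulting to true, and an init-only nullable property on the session options) but are not the shipped types, whose accessors and attributes may differ.

```csharp
#nullable enable
using System;

// Stand-in for the VoiceActivityDetectionOptions type described in this PR.
// Assumption: settable properties, both defaulting to true.
public sealed class VoiceActivityDetectionOptions
{
    public bool Enabled { get; set; } = true;
    public bool AllowInterruption { get; set; } = true;
}

// Stand-in for RealtimeSessionOptions, reduced to the one property relevant here.
public sealed class RealtimeSessionOptions
{
    public VoiceActivityDetectionOptions? VoiceActivityDetection { get; init; }
}

public static class Program
{
    public static void Main()
    {
        // Nothing set: the property stays null, meaning "use the provider default".
        var defaults = new RealtimeSessionOptions();
        Console.WriteLine(defaults.VoiceActivityDetection is null); // True

        // Keep VAD on, but do not let user speech interrupt a model response.
        var noBargeIn = new RealtimeSessionOptions
        {
            VoiceActivityDetection = new VoiceActivityDetectionOptions
            {
                AllowInterruption = false,
            },
        };

        var vad = noBargeIn.VoiceActivityDetection!;
        Console.WriteLine($"{vad.Enabled} {vad.AllowInterruption}"); // True False
    }
}
```

Setting Enabled to false would turn VAD off entirely, in which case AllowInterruption has no effect.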

OpenAI Provider (Microsoft.Extensions.AI.OpenAI)

  • Maps VoiceActivityDetection.Enabled = false to inputAudioOptions.DisableTurnDetection()
  • Maps VoiceActivityDetection.AllowInterruption to RealtimeServerVadTurnDetection.InterruptResponseEnabled
  • Applied to both conversation and transcription session paths
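
The mapping rules above can be sketched roughly as follows. This is not the code from OpenAIRealtimeClientSession: ProviderServerVad, ProviderAudioInput, and VadMapper are hypothetical stand-ins invented for illustration, and the OpenAI SDK types they loosely imitate may have different members. The sketch only shows the three cases: null leaves everything untouched, Enabled = false disables turn detection, and otherwise AllowInterruption is written onto the existing turn detection settings so that seed values survive.

```csharp
#nullable enable
using System;

// Stand-in for the abstraction-side options type.
public sealed class VoiceActivityDetectionOptions
{
    public bool Enabled { get; set; } = true;
    public bool AllowInterruption { get; set; } = true;
}

// Hypothetical provider-side settings; NOT the OpenAI SDK types.
public sealed class ProviderServerVad
{
    public bool InterruptResponseEnabled { get; set; } = true;
    public int? SilenceDurationMs { get; set; } // provider-specific tuning from seed options
}

public sealed class ProviderAudioInput
{
    public ProviderServerVad? TurnDetection { get; set; }
    public bool TurnDetectionDisabled { get; private set; }

    public void DisableTurnDetection()
    {
        TurnDetection = null;
        TurnDetectionDisabled = true;
    }
}

public static class VadMapper
{
    public static void Apply(VoiceActivityDetectionOptions? vad, ProviderAudioInput input)
    {
        if (vad is null)
        {
            return; // null: send no VAD configuration, keep provider defaults
        }

        if (!vad.Enabled)
        {
            input.DisableTurnDetection(); // AllowInterruption is not applicable here
            return;
        }

        // Mutate the existing settings instead of replacing them,
        // so values supplied through seed options are preserved.
        input.TurnDetection ??= new ProviderServerVad();
        input.TurnDetection.InterruptResponseEnabled = vad.AllowInterruption;
    }
}

public static class Program
{
    public static void Main()
    {
        var seeded = new ProviderAudioInput
        {
            TurnDetection = new ProviderServerVad { SilenceDurationMs = 500 },
        };

        VadMapper.Apply(new VoiceActivityDetectionOptions { AllowInterruption = false }, seeded);

        Console.WriteLine(seeded.TurnDetection!.InterruptResponseEnabled); // False
        Console.WriteLine(seeded.TurnDetection.SilenceDurationMs);         // 500
    }
}
```

The seeded SilenceDurationMs value is still present after mapping, which is the behavior the later review-feedback commit aims for.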

Tests

  • Added unit tests for default values, property roundtripping, and integration with RealtimeSessionOptions

Design Decisions

  • Follows the existing TranscriptionOptions pattern (nested mutable class on RealtimeSessionOptions)
  • A null VoiceActivityDetection value means "use provider default": no VAD configuration is sent unless explicitly set
  • Only universally-supported options are included; provider-specific tuning remains available through RawRepresentationFactory

Introduce VoiceActivityDetectionOptions with Enabled and AllowInterruption
properties to RealtimeSessionOptions. These represent the common VAD
options supported across multiple realtime AI models (OpenAI, Gemini,
Anthropic Claude, AWS Nova Sonic).

- Add VoiceActivityDetectionOptions class with Enabled (default true) and
  AllowInterruption (default true) properties
- Add VoiceActivityDetection property to RealtimeSessionOptions
- Map VAD options to OpenAI SDK TurnDetection in both conversation and
  transcription session paths
- Add concurrency guidance for SendAsync in IRealtimeClientSession docs
- Add unit tests for the new types
tarekgh requested a review from a team as a code owner Mar 17, 2026 00:27
Copilot AI review requested due to automatic review settings Mar 17, 2026 00:27
github-actions bot added the area-ai (Microsoft.Extensions.AI libraries) label Mar 17, 2026

Copilot AI left a comment

Pull request overview

Adds a cross-provider abstraction for Voice Activity Detection (VAD) configuration to Microsoft.Extensions.AI realtime sessions, and wires it into the OpenAI realtime provider so callers can consistently enable/disable VAD and control barge-in behavior.

Changes:

  • Introduces VoiceActivityDetectionOptions and a VoiceActivityDetection property on RealtimeSessionOptions.
  • Updates OpenAIRealtimeClientSession to map the new abstraction onto OpenAI SDK turn detection options for both conversation and transcription sessions.
  • Expands unit tests in Microsoft.Extensions.AI.Abstractions.Tests to cover defaulting and round-tripping of the new options.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Realtime/RealtimeSessionOptionsTests.cs | Adds tests for RealtimeSessionOptions.VoiceActivityDetection and VoiceActivityDetectionOptions defaults/roundtrip. |
| src/Libraries/Microsoft.Extensions.AI.OpenAI/OpenAIRealtimeClientSession.cs | Maps VoiceActivityDetectionOptions to OpenAI realtime turn detection configuration in session option builders. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/VoiceActivityDetectionOptions.cs | Adds the new experimental options type representing VAD settings. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/RealtimeSessionOptions.cs | Adds VoiceActivityDetection init-only property to the session options abstraction. |
| src/Libraries/Microsoft.Extensions.AI.Abstractions/Realtime/IRealtimeClientSession.cs | Adds concurrency guidance to SendAsync XML docs for provider implementers. |

- Fix type name in SendAsync concurrency docs: FunctionInvokingRealtimeSession -> FunctionInvokingRealtimeClientSession
- Preserve existing TurnDetection from seed options (RawRepresentationFactory) by mutating InterruptResponseEnabled on existing RealtimeServerVadTurnDetection instead of replacing it, in both conversation and transcription paths

tarekgh commented Mar 17, 2026

@stephentoub could you please take a quick look? Thanks!

CC @jeffhandley

tarekgh requested a review from stephentoub Mar 17, 2026 00:43
Document that AllowInterruption only takes effect when Enabled is true,
and that disabling VAD fully disables turn detection making interruption
not applicable.
tarekgh merged commit 228ae1b into dotnet:main Mar 17, 2026
6 checks passed
jeffhandley pushed a commit to jeffhandley/extensions that referenced this pull request Mar 17, 2026
…otnet#7399)

* Add VoiceActivityDetection options to realtime session abstractions

* Address PR review feedback

* Clarify relationship between Enabled and AllowInterruption in VAD docs

Co-authored-by: Tarek Mahmoud Sayed