
feat(voice-call): pre-cache inbound greeting TTS for instant playback#18447

Merged
steipete merged 4 commits into openclaw:main from JayMishra-source:feat/voice-greeting-precache on Feb 16, 2026


JayMishra-source (Contributor) commented Feb 16, 2026

Summary

  • Pre-generates TTS audio for inboundGreeting at gateway startup
  • Plays cached audio instantly on inbound call connect (zero synthesis delay)
  • Falls back to the original TTS synthesis path for outbound calls or when no cached audio is available
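The connect-time decision in the summary can be sketched as a small pure function. This is a minimal sketch, not the PR's code: `CallLike`, `GreetingPath`, and `chooseGreetingPath` are hypothetical names. The condition mirrors the one quoted later in the review thread (`cachedAudio && call?.metadata?.initialMessage && call.direction === "inbound"`), except that an empty buffer is assumed to count as "no cache".

```typescript
// Hypothetical shapes: the PR's real call-record types are not shown.
type CallDirection = "inbound" | "outbound";
type GreetingPath = "cached" | "synthesize" | "none";

interface CallLike {
  direction: CallDirection;
  metadata?: { initialMessage?: string };
}

// Decide how the greeting is delivered when the media stream connects.
function chooseGreetingPath(
  call: CallLike | undefined,
  cachedAudio: Uint8Array | null | undefined,
): GreetingPath {
  if (!call || !call.metadata?.initialMessage) return "none"; // nothing to speak
  // Assumption: an empty buffer is treated the same as no cached audio.
  const hasCache = cachedAudio != null && cachedAudio.length > 0;
  if (hasCache && call.direction === "inbound") {
    return "cached"; // play pre-synthesized audio, skip the TTS round trip
  }
  return "synthesize"; // outbound call or no cache: original TTS path
}
```

Keeping the decision separate from the I/O makes the three test-plan branches (inbound with cache, outbound, no greeting) easy to check without a live call.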

Context

When an inbound call connects, the greeting has a noticeable 500ms+ delay because TTS synthesis happens on demand after the media stream is established. For voice UX, a sub-second greeting is table stakes: callers expect immediate acknowledgment.

This pre-generates the greeting audio once at startup and sends it directly via the media stream when an inbound call connects, bypassing the synthesis step entirely.
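A sketch of the startup side, assuming a hypothetical `TtsProvider` whose `synthesize(text)` method resolves to raw audio bytes (the PR's actual provider interface is not shown). `GreetingCache` stands in for the `cachedGreetingAudio` storage with getter/setter; the success log matches the test plan's "Cached greeting audio: N bytes", while the warning texts are made up. The call is meant to be fire-and-forget so startup is not blocked, and any failure simply leaves the cache empty.

```typescript
// Hypothetical interface: the PR's real TTS provider API is not shown.
interface TtsProvider {
  synthesize(text: string): Promise<Uint8Array>;
}

interface Logger {
  info(msg: string): void;
  warn(msg: string): void;
}

// Stand-in for the cachedGreetingAudio storage added to the Twilio provider.
class GreetingCache {
  private cachedGreetingAudio: Uint8Array | null = null;

  setCachedGreetingAudio(audio: Uint8Array): void {
    this.cachedGreetingAudio = audio;
  }

  getCachedGreetingAudio(): Uint8Array | null {
    return this.cachedGreetingAudio;
  }
}

// Synthesize the greeting once. Errors are logged, never thrown, so a
// missing or failing TTS provider leaves the cache empty and calls fall
// back to on-demand synthesis.
async function precacheGreeting(
  tts: TtsProvider | undefined,
  greeting: string | undefined,
  cache: GreetingCache,
  log: Logger = console,
): Promise<void> {
  if (!greeting) return; // no inboundGreeting configured: nothing to cache
  if (!tts) {
    log.warn("TTS provider unavailable; greeting will be synthesized per call");
    return;
  }
  try {
    const audio = await tts.synthesize(greeting);
    cache.setCachedGreetingAudio(audio);
    log.info(`Cached greeting audio: ${audio.length} bytes`);
  } catch (err) {
    log.warn(`Greeting pre-cache failed: ${String(err)}`);
  }
}

// At startup, after TTS initialization (not awaited, so boot is not delayed):
// void precacheGreeting(ttsProvider, config.inboundGreeting, greetingCache);
```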

Changes

  • providers/twilio.ts: Added cachedGreetingAudio buffer storage with set/get methods
  • runtime.ts: After TTS provider initialization, pre-synthesizes the inboundGreeting asynchronously
  • webhook.ts: On onConnect for inbound calls, streams cached audio chunks directly instead of waiting for TTS synthesis. Reduces fallback delay from 500ms to 100ms for non-cached cases.
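Streaming the cached audio in chunks over the media stream could look roughly like the sketch below. It assumes the cached greeting is already 8 kHz mu-law (the audio format Twilio Media Streams uses) and that each chunk is sent as a `media` message carrying a base64 `payload`. The 160-byte (20 ms) frame size and the `toMediaMessages` helper are assumptions for illustration, not taken from the PR.

```typescript
import { Buffer } from "node:buffer";

// Shape of an outbound Twilio Media Streams "media" message.
interface TwilioMediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string }; // base64-encoded mu-law audio
}

// Split pre-synthesized audio into fixed-size frames and wrap each one as a
// media message. At 8 kHz mu-law, one byte is one sample, so 160 bytes is
// 20 ms of audio (the default frame size here is an assumption).
function toMediaMessages(
  audio: Uint8Array,
  streamSid: string,
  frameBytes = 160,
): TwilioMediaMessage[] {
  if (frameBytes <= 0) throw new RangeError("frameBytes must be positive");
  const messages: TwilioMediaMessage[] = [];
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    const frame = audio.subarray(offset, offset + frameBytes); // last frame may be short
    messages.push({
      event: "media",
      streamSid,
      media: { payload: Buffer.from(frame).toString("base64") },
    });
  }
  return messages;
}

// Usage with a hypothetical WebSocket `ws`:
// for (const m of toMediaMessages(audio, streamSid)) ws.send(JSON.stringify(m));
```

Precomputing the frames keeps the connect handler to a simple send loop; whether the real implementation paces frames or sends them all at once is not stated in the PR.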

Test plan

  • Configure inboundGreeting: "Hello!" → verify "Cached greeting audio: N bytes" in startup logs
  • Place an inbound call → verify the greeting plays instantly with no delay
  • Outbound call → verify normal TTS synthesis path (not affected)
  • No inboundGreeting configured → verify no errors, normal behavior
  • TTS provider unavailable at startup → verify warning logged, fallback works

🤖 Generated with Claude Code

Greptile Summary

Pre-caches TTS audio for inboundGreeting at startup and plays it instantly when inbound calls connect, eliminating the 500ms synthesis delay.

Key changes:

  • Added cachedGreetingAudio buffer storage to TwilioProvider with getter/setter methods
  • Pre-synthesizes greeting asynchronously in runtime.ts after TTS provider initialization
  • Modified webhook.ts onConnect handler to check for cached audio and stream it directly for inbound calls, bypassing TTS synthesis
  • Reduced fallback timeout from 500ms to 100ms for non-cached scenarios

Issues found:

  • State mutation (delete call.metadata.initialMessage) isn't persisted to disk, unlike the existing speakInitialMessage path, which calls persistCallRecord after deletion (manager/outbound.ts:205-207). This could cause the greeting to replay after a gateway restart.
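Following the existing speakInitialMessage pattern, the deletion and the write-back can happen in one step. A minimal sketch: `CallRecord`, `PersistFn`, and `consumeInitialMessage` are hypothetical, and the injected synchronous `persist` callback stands in for persistCallRecord, whose real signature (and whether it is async) is not shown in this thread.

```typescript
// Hypothetical shapes: the real CallRecord type and persistCallRecord
// signature are not shown in this PR thread.
interface CallRecord {
  callId: string;
  direction: "inbound" | "outbound";
  metadata?: { initialMessage?: string };
}

type PersistFn = (call: CallRecord) => void;

// Claim the greeting for the cached path: clear initialMessage so the
// fallback cannot speak it again, then write the record back immediately.
// Returns false if the greeting was already consumed (or never set).
function consumeInitialMessage(call: CallRecord, persist: PersistFn): boolean {
  const metadata = call.metadata;
  if (!metadata?.initialMessage) return false;
  delete metadata.initialMessage;
  persist(call); // same write-through step speakInitialMessage performs
  return true;
}
```

Returning a boolean also makes the cached path and the fallback path mutually exclusive: whichever claims the greeting first speaks it.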

Confidence Score: 3/5

  • Safe to merge with minor risk: the implementation works correctly for the happy path but has a state persistence edge case
  • The core caching mechanism is sound and the performance optimization is well-implemented. However, there's a state persistence bug where call.metadata.initialMessage deletion isn't persisted to disk, which could cause greeting replay after restart. The issue is not critical for production but violates the existing pattern used elsewhere in the codebase.
  • Pay close attention to webhook.ts lines 150-152 where state mutation isn't persisted

Last reviewed commit: 91b667b

Pre-generates TTS audio for the configured inboundGreeting at startup
and serves it instantly when an inbound call connects, eliminating the
500ms+ TTS synthesis delay on the first ring.

Changes:
- twilio.ts: Add cachedGreetingAudio storage with getter/setter
- runtime.ts: Pre-synthesize greeting TTS after provider initialization
- webhook.ts: Play cached audio directly via media stream on inbound
  connect, falling back to the original TTS path for outbound calls
  or when no cached audio is available

Co-Authored-By: Claude Opus 4.6 <[email protected]>
openclaw-barnacle bot added the channel: voice-call (Channel integration: voice-call) and size: S labels on Feb 16, 2026

greptile-apps bot (Contributor) left a comment


3 files reviewed, 1 comment


Comment on lines +150 to +152

```ts
if (cachedAudio && call?.metadata?.initialMessage && call.direction === "inbound") {
  console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
  delete call.metadata.initialMessage; // prevent re-speaking via fallback
```

Deleting call.metadata.initialMessage mutates shared state but isn't persisted to disk.

The code deletes call.metadata.initialMessage to prevent re-speaking via the fallback, but this mutation isn't persisted to disk (unlike speakInitialMessage at manager/outbound.ts:205-207, which calls persistCallRecord). If the gateway restarts after playing the cached greeting but before the call ends, the greeting could be spoken again.

Suggested change:

```diff
-if (cachedAudio && call?.metadata?.initialMessage && call.direction === "inbound") {
-  console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
-  delete call.metadata.initialMessage; // prevent re-speaking via fallback
+  console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
+  delete call.metadata.initialMessage;
+  const handler = this.mediaStreamHandler!;
```

Consider calling the persistence function after mutation, or refactoring to use the existing speakInitialMessage path which already handles this correctly.

Path: extensions/voice-call/src/webhook.ts, lines 150-152

JayMishra-source and others added 3 commits February 16, 2026 10:34
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Address review feedback: the in-memory deletion of initialMessage is
not persisted to disk, which is acceptable because a gateway restart
would also sever the media stream, making replay impossible.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
steipete merged commit 0764999 into openclaw:main on Feb 16, 2026
23 checks passed
sebslight (Member) commented:

Reverted on main as an accidental merge.

Revert commits:

