feat(voice-call): pre-cache inbound greeting TTS for instant playback #18447
Merged
steipete merged 4 commits into openclaw:main on Feb 16, 2026
Pre-generates TTS audio for the configured inboundGreeting at startup and serves it instantly when an inbound call connects, eliminating the 500ms+ TTS synthesis delay on the first ring.
Changes:
- twilio.ts: Add cachedGreetingAudio storage with getter/setter
- runtime.ts: Pre-synthesize greeting TTS after provider initialization
- webhook.ts: Play cached audio directly via media stream on inbound connect, falling back to the original TTS path for outbound calls or when no cached audio is available
Co-Authored-By: Claude Opus 4.6 <[email protected]>
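A minimal TypeScript sketch of the webhook.ts behavior described above, under stated assumptions: the cached greeting is stored as raw 8 kHz, 8-bit mu-law bytes (the audio format Twilio Media Streams carry) and is sent as base64 `media` messages, the shape Twilio documents for sending audio on a bidirectional Media Stream. The guard mirrors the condition in the reviewed webhook.ts snippet. `CallRecord`, `planGreeting`, `greetingFrames`, and the 160-byte chunk size are hypothetical illustrations, not the PR's code.

```typescript
// Hypothetical minimal call record; the real type is not shown in this PR.
interface CallRecord {
  direction: "inbound" | "outbound";
  metadata?: { initialMessage?: string };
}

type GreetingPlan =
  | { kind: "cached"; audio: Uint8Array }
  | { kind: "tts"; text: string }
  | { kind: "none" };

// Same guard as the reviewed snippet: cached audio only for inbound calls that
// still have a pending initial message; otherwise fall back to on-demand TTS,
// or say nothing when no message is pending.
function planGreeting(call: CallRecord | undefined, cachedAudio: Uint8Array | undefined): GreetingPlan {
  const text = call?.metadata?.initialMessage;
  if (!call || !text) return { kind: "none" };
  if (cachedAudio && call.direction === "inbound") return { kind: "cached", audio: cachedAudio };
  return { kind: "tts", text };
}

// Assumption: 8 kHz mu-law is 1 byte per sample, so 160 bytes = 20 ms (8000 * 0.02).
const CHUNK_BYTES = 160;

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Dependency-free base64 encoder; in Node, Buffer.from(bytes).toString("base64")
// would do the same job.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    // Pack up to three bytes into a 24-bit group (missing bytes count as 0).
    const n = (bytes[i] << 16) | ((hasB1 ? bytes[i + 1] : 0) << 8) | (hasB2 ? bytes[i + 2] : 0);
    out += B64.charAt((n >> 18) & 63) + B64.charAt((n >> 12) & 63);
    out += hasB1 ? B64.charAt((n >> 6) & 63) : "=";
    out += hasB2 ? B64.charAt(n & 63) : "=";
  }
  return out;
}

// Outbound "media" message for a Twilio Media Stream WebSocket.
interface MediaMessage {
  event: "media";
  streamSid: string;
  media: { payload: string };
}

// Splits cached audio into fixed-size chunks, one media message per chunk.
function greetingFrames(streamSid: string, audio: Uint8Array, chunkBytes = CHUNK_BYTES): MediaMessage[] {
  const frames: MediaMessage[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkBytes) {
    // subarray clamps the end index, so the last chunk may be shorter.
    const chunk = audio.subarray(offset, offset + chunkBytes);
    frames.push({ event: "media", streamSid, media: { payload: toBase64(chunk) } });
  }
  return frames;
}
```

In a handler, each frame would be serialized with `JSON.stringify` and written to the stream's WebSocket; the `tts` branch is where the original on-demand synthesis path would run.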
extensions/voice-call/src/webhook.ts (outdated)
Comment on lines +150 to +152
if (cachedAudio && call?.metadata?.initialMessage && call.direction === "inbound") {
  console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
  delete call.metadata.initialMessage; // prevent re-speaking via fallback
Contributor
Deleting call.metadata.initialMessage mutates shared state but doesn't persist the change to disk.
The code deletes call.metadata.initialMessage to prevent re-speaking via the fallback path, but this mutation isn't persisted to disk (unlike speakInitialMessage at manager/outbound.ts:205-207, which calls persistCallRecord). If the gateway restarts after playing the cached greeting but before the call ends, the greeting could be spoken again.
Suggested change
- if (cachedAudio && call?.metadata?.initialMessage && call.direction === "inbound") {
-   console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
-   delete call.metadata.initialMessage; // prevent re-speaking via fallback
+   console.log(`[voice-call] Playing cached greeting (${cachedAudio.length} bytes)`);
+   delete call.metadata.initialMessage;
+   const handler = this.mediaStreamHandler!;
Consider calling the persistence function after the mutation, or refactoring to use the existing speakInitialMessage path, which already handles this correctly.
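One way to act on the reviewer's first option, sketched in TypeScript: clear the pending message and persist the record in the same step, as the review says speakInitialMessage already does. The `persist` callback is a stand-in for the codebase's persistCallRecord, whose real signature (including whether it is async) is not shown here; `PersistedCall` and `consumeInitialMessage` are hypothetical names.

```typescript
// Hypothetical minimal call record.
interface PersistedCall {
  callId: string;
  metadata?: { initialMessage?: string };
}

// Stand-in for persistCallRecord; assumed synchronous for this sketch.
type Persist = (call: PersistedCall) => void;

// Removes the pending greeting and persists the record right away, so a
// reload from disk does not see the greeting as still pending. Returns the
// cleared message, or undefined if none was pending (nothing is persisted then).
function consumeInitialMessage(call: PersistedCall, persist: Persist): string | undefined {
  const metadata = call.metadata;
  const message = metadata?.initialMessage;
  if (!metadata || message === undefined) return undefined;
  delete metadata.initialMessage;
  persist(call);
  return message;
}
```

Because the helper returns the message it cleared, a second call returns undefined, which is what keeps the fallback path from speaking the greeting twice.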
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Address review feedback: the in-memory deletion of initialMessage is not persisted to disk, which is acceptable because a gateway restart would also sever the media stream, making replay impossible.
Co-Authored-By: Claude Opus 4.6 <[email protected]>
Summary
inboundGreeting at gateway startup
Context
When an inbound call connects, the greeting arrives after a noticeable 500ms+ delay because TTS synthesis happens on demand after the media stream is established. For voice UX, a sub-second greeting is table stakes: callers expect immediate acknowledgment.
This pre-generates the greeting audio once at startup and sends it directly via the media stream when an inbound call connects, bypassing the synthesis step entirely.
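A minimal sketch of the startup step, assuming a TTS function that resolves to encoded audio bytes. `GreetingCache`, `precacheGreeting`, and the `Synthesize` type are hypothetical; the success log line mirrors the one the test plan checks for, while the failure message is invented for this sketch.

```typescript
// Hypothetical TTS signature: text in, encoded audio bytes out.
type Synthesize = (text: string) => Promise<Uint8Array>;

// Stand-in for the provider-side storage; the PR adds a cachedGreetingAudio
// buffer with a getter/setter to the Twilio provider.
class GreetingCache {
  private audio: Uint8Array | undefined;

  set(audio: Uint8Array): void {
    this.audio = audio;
  }

  get(): Uint8Array | undefined {
    return this.audio;
  }
}

// Synthesizes the greeting once. Rejections from synthesize are caught and
// logged, so the caller can fire and forget this (startup is not blocked) or
// await it in tests. On failure the cache stays empty and calls fall back to
// on-demand TTS.
function precacheGreeting(
  cache: GreetingCache,
  greeting: string | undefined,
  synthesize: Synthesize,
  log: (msg: string) => void = console.log,
): Promise<void> {
  if (!greeting) return Promise.resolve(); // no greeting configured: no-op
  return synthesize(greeting).then(
    (audio) => {
      cache.set(audio);
      log(`Cached greeting audio: ${audio.length} bytes`);
    },
    (err: unknown) => {
      log(`Greeting pre-cache failed; using on-demand TTS: ${String(err)}`);
    },
  );
}
```

Keeping the cache empty on failure means the inbound path needs no extra error handling: "no cached audio" and "synthesis failed" take the same fallback.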
Changes
- providers/twilio.ts: Added cachedGreetingAudio buffer storage with set/get methods
- runtime.ts: After TTS provider initialization, pre-synthesizes the inboundGreeting asynchronously
- webhook.ts: In onConnect for inbound calls, streams cached audio chunks directly instead of waiting for TTS synthesis. Reduces the fallback delay from 500ms to 100ms for non-cached cases.
Test plan
- inboundGreeting: "Hello!" → verify "Cached greeting audio: N bytes" in startup logs
- inboundGreeting configured → verify no errors, normal behavior
🤖 Generated with Claude Code
Greptile Summary
Pre-caches TTS audio for inboundGreeting at startup and plays it instantly when inbound calls connect, eliminating the 500ms synthesis delay.
Key changes:
- cachedGreetingAudio buffer storage to TwilioProvider with getter/setter methods
- runtime.ts after TTS provider initialization
- webhook.ts onConnect handler to check for cached audio and stream it directly for inbound calls, bypassing TTS synthesis
Issues found:
- delete call.metadata.initialMessage) isn't persisted to disk, unlike the existing speakInitialMessage path, which calls persistCallRecord after deletion (manager/outbound.ts:205-207). This could cause the greeting to replay after a gateway restart.
Confidence Score: 3/5
call.metadata.initialMessage deletion isn't persisted to disk, which could cause greeting replay after a restart. The issue is not critical for production but violates the existing pattern used elsewhere in the codebase.
Last reviewed commit: 91b667b