This project explores how to build a native iOS conversational assistant with a clean, testable architecture. The app captures spoken input, transcribes it on-device with Apple's speech frameworks, sends the conversation context to OpenAI, and speaks the assistant's response back to the user.
The implementation follows the same spirit as essential-feed-case-study: describe the product behavior as specs first, isolate details behind protocol boundaries, and keep the UI driven by presentation models instead of service logic.
Feature Specs
Story: User starts a voice conversation
Narrative #1
As a user
I want to speak to the app
So I can have a hands-free conversation with the assistant
Scenarios (Acceptance Criteria)
Given the user granted speech recognition and microphone access
When the user taps the microphone button
Then the app should start listening
And the app should display the live transcript
Given the app is listening
When the user taps stop
And the transcript contains meaningful text
Then the app should send the transcript as a user message
And the app should show a processing state
Given the app is listening
When the user taps stop
And the transcript is empty or too short
Then the app should not send a message
And the app should return to idle
Story: User sends a typed message
Narrative #2
As a user
I want to type a message
So I can use the assistant without speaking
Scenarios (Acceptance Criteria)
Given the user entered text
When the user taps send
Then the app should append the text as a user message
And the app should request a response from the remote AI service
And the app should show a processing state
Story: User receives an assistant response
Narrative #3
As a user
I want the assistant to answer my message
So I can continue the conversation naturally
Scenarios (Acceptance Criteria)
Given the remote AI service returns a valid response
When the response is received
Then the app should append the assistant message to the conversation
And the app should transition to speaking
And the response should be spoken aloud
Given speech playback finishes
When the synthesizer completes
Then the app should return to idle
Story: User cancels or resets the conversation
Narrative #4
As a user
I want to stop an in-flight request or reset the session
So I can recover quickly and start over
Scenarios (Acceptance Criteria)
Given the app is processing a remote response
When the user taps cancel
Then the pending request should be cancelled
And the app should return to idle
Given the user wants a fresh conversation
When the user taps reset
Then the app should clear all messages
And the app should stop speech playback
And the app should return to idle
Story: User encounters configuration, transport, or device errors
Narrative #5
As a user
I want failures to be surfaced clearly
So I understand why the conversation could not continue
Scenarios (Acceptance Criteria)
Given the app has no API key configured
When the user sends a message
Then the app should fail with a configuration error
Given the device has no connectivity
When the user sends a message
Then the app should fail with a connectivity error
Given the server returns invalid or unexpected data
When the response is mapped
Then the app should fail with an invalid data error
Given the app cannot access a valid recording input
When the user starts recording
Then the app should fail with a microphone availability error
Use Cases
Start Speech Recognition Use Case
Primary course
Execute "Start Recording" command.
System cancels any in-flight AI request.
System stops any active speech playback.
System activates the audio session for recording.
System starts speech recognition.
System delivers listening state.
System delivers live transcript updates.
Recording error course
System delivers an error.
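
The primary course maps closely onto SFSpeechRecognizer and AVAudioEngine. Below is a minimal sketch under those APIs; SpeechRecorder is a hypothetical wrapper, not the project's actual type, and the preceding cancellation and playback-stop steps are omitted for brevity.

```swift
import AVFoundation
import Speech

final class SpeechRecorder {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onTranscript: @escaping (String) -> Void) throws {
        // Activate the audio session for recording.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        // Stream microphone buffers into the recognition request.
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Deliver live transcript updates as partial results arrive.
        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result { onTranscript(result.bestTranscription.formattedString) }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()   // lets the recognizer finalize the transcript
    }
}
```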
Stop Speech Recognition And Submit Transcript Use Case
Primary course
Execute "Stop Recording" command.
System stops speech recognition.
System reads the current transcript.
System validates the transcript is not empty.
System appends a user message.
System requests a remote assistant response.
System delivers processing state.
Empty transcript course
System does not send a message.
System delivers idle state.
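
The transcript guard can be as small as a trim-and-length check. A minimal sketch, assuming a two-character minimum; the project's actual validation rule may differ:

```swift
// Returns the cleaned transcript, or nil when it is empty or too short.
struct TranscriptValidator {
    let minimumLength = 2   // assumed threshold

    func validated(_ transcript: String) -> String? {
        let text = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        return text.count >= minimumLength ? text : nil
    }
}
```

A nil result drives the empty-transcript course: no message is sent and the app returns to idle.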
Load Chat Response From Remote Use Case
Data
Conversation messages
Primary course
Execute "Load Chat Response" command with the conversation messages.
System validates API configuration.
System builds the OpenAI chat completions request.
System performs the HTTP request.
System validates the HTTP response.
System maps the JSON payload into assistant text.
System delivers the assistant response.
Invalid request course
System delivers invalid data error.
No connectivity course
System delivers connectivity error.
Invalid response course
System delivers invalid data error.
API error response course
System delivers the API error returned by the server.
Cancel course
System delivers neither a response nor an error.
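
These courses translate into a straightforward async loader. Below is a sketch with the endpoint building and JSON mapping folded into one type for brevity; the real project splits these across ChatEndpoint, ChatMapper, and RemoteChatLoader, and every shape here (protocol signature, error cases, model string) is an assumption:

```swift
import Foundation

// Assumed shape of the HTTPClient seam.
protocol HTTPClient {
    func perform(_ request: URLRequest) async throws -> (Data, HTTPURLResponse)
}

struct RemoteChatLoader {
    enum Error: Swift.Error {
        case missingAPIKey, connectivity, invalidData
        case apiError(statusCode: Int, message: String)
    }

    let client: HTTPClient
    let apiKey: String?

    func load(messages: [[String: String]]) async throws -> String {
        // Validate API configuration.
        guard let apiKey, !apiKey.isEmpty else { throw Error.missingAPIKey }

        // Build the chat completions request.
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONSerialization.data(withJSONObject: [
            "model": "gpt-4o-mini",   // assumed; the project pins its own fixed model string
            "messages": messages
        ])

        // Perform the HTTP request; transport failures become connectivity errors.
        let (data, response): (Data, HTTPURLResponse)
        do { (data, response) = try await client.perform(request) }
        catch { throw Error.connectivity }

        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any]

        // API error response course: surface the server's own error message.
        guard response.statusCode == 200 else {
            if let error = json?["error"] as? [String: Any],
               let message = error["message"] as? String {
                throw Error.apiError(statusCode: response.statusCode, message: message)
            }
            throw Error.invalidData
        }

        // Map the JSON payload into assistant text.
        guard let choices = json?["choices"] as? [[String: Any]],
              let message = choices.first?["message"] as? [String: Any],
              let content = message["content"] as? String
        else { throw Error.invalidData }

        return content
    }
}
```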
Speak Assistant Response Use Case
Data
Assistant response text
Primary course
Execute "Speak Response" command with the assistant text.
System activates the audio session for playback.
System starts speech synthesis.
System delivers speaking state.
System notifies completion when playback ends.
System delivers idle state.
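
AVSpeechSynthesizer's delegate callback is the natural completion hook for the last two steps. A minimal sketch; the delegate wiring is an assumption about how the project's SpeechSynthesizer notifies the presenter:

```swift
import AVFoundation

final class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var onFinish: (() -> Void)?

    func speak(_ text: String, onFinish: @escaping () -> Void) throws {
        self.onFinish = onFinish
        synthesizer.delegate = self

        // Activate the audio session for playback.
        try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio, options: [])
        try AVAudioSession.sharedInstance().setActive(true)

        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    // Notifies completion when playback ends, so the presenter
    // can deliver the idle state.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        onFinish?()
    }
}
```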
Cancel Pending Request Use Case
Primary course
Execute "Cancel Pending Request" command.
System cancels the in-flight async task.
System does not deliver a stale response.
System delivers idle state.
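
Stale-delivery protection falls out of structured concurrency. A minimal sketch; the coordinator shape and closure signatures are assumptions, not the adapter's real API:

```swift
final class RequestCoordinator {
    private var pendingTask: Task<Void, Never>?

    func loadResponse(_ load: @escaping () async throws -> String,
                      deliver: @escaping (Result<String, Error>) -> Void) {
        pendingTask?.cancel()
        pendingTask = Task {
            do {
                let response = try await load()
                guard !Task.isCancelled else { return }   // no stale response delivery
                deliver(.success(response))
            } catch {
                guard !Task.isCancelled else { return }   // no stale error delivery
                deliver(.failure(error))
            }
        }
    }

    // Cancels the in-flight async task; the guards above ensure nothing
    // is delivered afterwards, so the caller can move straight to idle.
    func cancel() {
        pendingTask?.cancel()
        pendingTask = nil
    }
}
```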
Reset Conversation Use Case
Primary course
Execute "Reset Conversation" command.
System cancels the in-flight request.
System stops speech playback.
System clears all messages.
System clears the live transcript.
System delivers idle state.
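
Reset composes the earlier pieces. A sketch building on the RequestCoordinator from the cancellation sketch above; every property name here is a hypothetical stand-in, not the project's API:

```swift
import AVFoundation

final class ConversationSession {
    var messages: [String] = []
    var transcript = ""
    let coordinator = RequestCoordinator()
    private let synthesizer = AVSpeechSynthesizer()

    func reset() {
        coordinator.cancel()                          // cancel the in-flight request
        _ = synthesizer.stopSpeaking(at: .immediate)  // stop speech playback
        messages.removeAll()                          // clear all messages
        transcript = ""                               // clear the live transcript
        // ...and finally deliver the idle state to the presenter.
    }
}
```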
Request Speech Permissions Use Case
Primary course
Execute "Request Permissions" command.
System requests speech recognition authorization from iOS.
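
SFSpeechRecognizer.requestAuthorization is the real iOS entry point for this course; wrapping it in async/await, as sketched below, is an assumption about how the project exposes it:

```swift
import Speech

// Resolves to true only when the user grants speech recognition access.
func requestSpeechAuthorization() async -> Bool {
    await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status == .authorized)
        }
    }
}
```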
Flowcharts
Architecture
The project is split so business rules and request orchestration can be tested without SwiftUI or concrete platform services. The app target acts as the composition root and owns the concrete platform and network implementation details, while the reusable feature module defines protocols, request mapping, and presentation flow.
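
A sketch of the protocol seams the feature module might define so tests can substitute spies for platform services. These exact protocols are assumptions; of the names below, the README only mentions HTTPClient:

```swift
// Assumed domain model for a conversation entry.
struct ChatMessage {
    enum Role: String { case user, assistant }
    let role: Role
    let text: String
}

// Seam for the remote AI service.
protocol ChatLoader {
    func load(_ conversation: [ChatMessage]) async throws -> String
}

// Seam for on-device speech recognition.
protocol SpeechRecognizing {
    func startTranscribing(onUpdate: @escaping (String) -> Void) throws
    func stopTranscribing() -> String
}

// Seam for text-to-speech playback.
protocol SpeechSynthesizing {
    func speak(_ text: String, onFinish: @escaping () -> Void)
    func stopSpeaking()
}
```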
Request Lifecycle
ContentView renders a ConversationViewModel.
The view model forwards user actions such as startRecording(), stopRecording(), sendMessage(_:), resetConversation(), and cancelPendingRequest() to its delegate.
ConversationPresentationAdapter coordinates the concrete use cases:
starts or stops recording
collects transcript updates
appends user messages
launches the remote chat request
triggers speech synthesis
RemoteChatLoader validates the API key, builds a ChatEndpoint request, and executes it through HTTPClient.
ChatMapper validates the HTTP response and extracts assistant text from the JSON payload.
ConversationPresenter converts domain events into plain display models for state, messages, and transcript.
ConversationViewModel publishes those updates to SwiftUI.
SpeechSynthesizer speaks the assistant response and notifies the presenter when playback finishes.
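
The last two steps are where SwiftUI comes in. A sketch of how ConversationViewModel might publish display state and forward actions; the property shapes and closures are assumptions (the real view model talks to a delegate), and only the type and method names come from the lifecycle above:

```swift
import SwiftUI

@MainActor
final class ConversationViewModel: ObservableObject {
    enum DisplayState { case idle, listening, processing, speaking, error(String) }

    // Published display models the presenter updates and SwiftUI observes.
    @Published var state: DisplayState = .idle
    @Published var transcript = ""
    @Published var messages: [String] = []

    // Stand-ins for the delegate hookup to ConversationPresentationAdapter.
    var onStartRecording: (() -> Void)?
    var onStopRecording: (() -> Void)?
    var onSend: ((String) -> Void)?
    var onReset: (() -> Void)?
    var onCancel: (() -> Void)?

    func startRecording() { onStartRecording?() }
    func stopRecording() { onStopRecording?() }
    func sendMessage(_ text: String) { onSend?(text) }
    func resetConversation() { onReset?() }
    func cancelPendingRequest() { onCancel?() }
}
```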
Error Map

| Failure source | Technical error | Mapped by | Presented as | User-visible effect |
| --- | --- | --- | --- | --- |
| Missing `OPENAI_API_KEY` | `ChatMapper.Error.apiError(statusCode: 401, message: "Missing OpenAI API key.", type: nil)` | `RemoteChatLoader` | `ChatState.error(error.localizedDescription)` | Status text shows an error while the conversation stays intact |
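
A sketch of the error type implied by this map. The apiError case mirrors the documented ChatMapper.Error; the invalidData case and the copy are assumptions. Conforming to LocalizedError is what lets the presenter lean on localizedDescription:

```swift
import Foundation

enum ChatMapperError: LocalizedError {
    case invalidData
    case apiError(statusCode: Int, message: String, type: String?)

    // Feeds localizedDescription, which the presenter surfaces as ChatState.error.
    var errorDescription: String? {
        switch self {
        case .invalidData:
            return "The server returned invalid data."
        case let .apiError(statusCode, message, _):
            return "API error \(statusCode): \(message)"
        }
    }
}
```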
Build and run on a real device when possible, since simulator microphone behavior can be limited.
Tradeoffs And Next Steps
The current implementation uses the Chat Completions endpoint and a fixed model string. A future version could move to streaming or realtime APIs for lower perceived latency.
Errors currently flow into localizedDescription, which is simple and centralized but not yet polished for product-quality copy.
Cancellation prevents stale UI delivery, but there is no explicit remote cancellation handshake beyond local task cancellation.
Conversation history is in-memory only. Persistence or resumable sessions could be added later.
There is no offline cache yet, which keeps the feature lean but leaves the app without a fallback when disconnected.
About
A case-study style iOS conversational AI app built with SwiftUI, speech recognition, text-to-speech, OpenAI integration, clean modular architecture, unit tests, CI, and architecture/flow documentation.