This project explores how to build a native iOS conversational assistant with a clean, testable architecture. The app captures spoken input, transcribes it on-device with Apple's speech frameworks, sends the conversation context to OpenAI, and speaks the assistant's response back to the user.
The implementation follows the same spirit as essential-feed-case-study: describe the product behavior as specs first, isolate details behind protocol boundaries, and keep the UI driven by presentation models instead of service logic.
Feature Specs
Story: User starts a voice conversation
Narrative #1
As a user
I want to speak to the app
So I can have a hands-free conversation with the assistant
Scenarios (Acceptance Criteria)
Given the user granted speech recognition and microphone access
When the user taps the microphone button
Then the app should start listening
And the app should display the live transcript
Given the app is listening
When the user taps stop
And the transcript contains meaningful text
Then the app should send the transcript as a user message
And the app should show a processing state
Given the app is listening
When the user taps stop
And the transcript is empty or too short
Then the app should not send a message
And the app should return to idle
Story: User sends a typed message
Narrative #2
As a user
I want to type a message
So I can use the assistant without speaking
Scenarios (Acceptance Criteria)
Given the user entered text
When the user taps send
Then the app should append the text as a user message
And the app should request a response from the remote AI service
And the app should show a processing state
Story: User receives an assistant response
Narrative #3
As a user
I want the assistant to answer my message
So I can continue the conversation naturally
Scenarios (Acceptance Criteria)
Given the remote AI service returns a valid response
When the response is received
Then the app should append the assistant message to the conversation
And the app should transition to speaking
And the response should be spoken aloud
Given speech playback finishes
When the synthesizer completes
Then the app should return to idle
Story: User cancels or resets the conversation
Narrative #4
As a user
I want to stop an in-flight request or reset the session
So I can recover quickly and start over
Scenarios (Acceptance Criteria)
Given the app is processing a remote response
When the user taps cancel
Then the pending request should be cancelled
And the app should return to idle
Given the user wants a fresh conversation
When the user taps reset
Then the app should clear all messages
And the app should stop speech playback
And the app should return to idle
Story: User encounters configuration, transport, or device errors
Narrative #5
As a user
I want failures to be surfaced clearly
So I understand why the conversation could not continue
Scenarios (Acceptance Criteria)
Given the app has no API key configured
When the user sends a message
Then the app should fail with a configuration error
Given the device has no connectivity
When the user sends a message
Then the app should fail with a connectivity error
Given the server returns invalid or unexpected data
When the response is mapped
Then the app should fail with an invalid data error
Given the app cannot access a valid recording input
When the user starts recording
Then the app should fail with a microphone availability error
Use Cases
Start Speech Recognition Use Case
Primary course
Execute "Start Recording" command.
System cancels any in-flight AI request.
System stops any active speech playback.
System activates the audio session for recording.
System starts speech recognition.
System delivers listening state.
System delivers live transcript updates.
Recording error course
System delivers an error.
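
The primary course maps closely onto SFSpeechRecognizer and AVAudioEngine. Below is a minimal sketch under those APIs; SpeechRecorder is a hypothetical wrapper, not the project's actual type, and the preceding cancellation and playback-stop steps are omitted for brevity.

```swift
import AVFoundation
import Speech

final class SpeechRecorder {
    private let audioEngine = AVAudioEngine()
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?

    func start(onTranscript: @escaping (String) -> Void) throws {
        // Activate the audio session for recording.
        let session = AVAudioSession.sharedInstance()
        try session.setCategory(.playAndRecord, mode: .measurement, options: .duckOthers)
        try session.setActive(true, options: .notifyOthersOnDeactivation)

        // Stream microphone buffers into the recognition request.
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true
        self.request = request

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Deliver live transcript updates as partial results arrive.
        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result { onTranscript(result.bestTranscription.formattedString) }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        request?.endAudio()   // lets the recognizer finalize the transcript
    }
}
```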
Stop Speech Recognition And Submit Transcript Use Case
Primary course
Execute "Stop Recording" command.
System stops speech recognition.
System reads the current transcript.
System validates the transcript is not empty.
System appends a user message.
System requests a remote assistant response.
System delivers processing state.
Empty transcript course
System does not send a message.
System delivers idle state.
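
The transcript guard can be as small as a trim-and-length check. A minimal sketch, assuming a two-character minimum; the project's actual validation rule may differ:

```swift
// Returns the cleaned transcript, or nil when it is empty or too short.
struct TranscriptValidator {
    let minimumLength = 2   // assumed threshold

    func validated(_ transcript: String) -> String? {
        let text = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
        return text.count >= minimumLength ? text : nil
    }
}
```

A nil result drives the empty-transcript course: no message is sent and the app returns to idle.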
Load Chat Response From Remote Use Case
Data
Conversation messages
Primary course
Execute "Load Chat Response" command with the conversation messages.
System validates API configuration.
System builds the OpenAI chat completions request.
System performs the HTTP request.
System validates the HTTP response.
System maps the JSON payload into assistant text.
System delivers the assistant response.
Invalid request course
System delivers invalid data error.
No connectivity course
System delivers connectivity error.
Invalid response course
System delivers invalid data error.
API error response course
System delivers the API error returned by the server.
Cancel course
System delivers neither a response nor an error.
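
These courses translate into a straightforward async loader. Below is a sketch with the endpoint building and JSON mapping folded into one type for brevity; the real project splits these across ChatEndpoint, ChatMapper, and RemoteChatLoader, and every shape here (protocol signature, error cases, model string) is an assumption:

```swift
import Foundation

// Assumed shape of the HTTPClient seam.
protocol HTTPClient {
    func perform(_ request: URLRequest) async throws -> (Data, HTTPURLResponse)
}

struct RemoteChatLoader {
    enum Error: Swift.Error {
        case missingAPIKey, connectivity, invalidData
        case apiError(statusCode: Int, message: String)
    }

    let client: HTTPClient
    let apiKey: String?

    func load(messages: [[String: String]]) async throws -> String {
        // Validate API configuration.
        guard let apiKey, !apiKey.isEmpty else { throw Error.missingAPIKey }

        // Build the chat completions request.
        var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
        request.httpMethod = "POST"
        request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONSerialization.data(withJSONObject: [
            "model": "gpt-4o-mini",   // assumed; the project pins its own fixed model string
            "messages": messages
        ])

        // Perform the HTTP request; transport failures become connectivity errors.
        let (data, response): (Data, HTTPURLResponse)
        do { (data, response) = try await client.perform(request) }
        catch { throw Error.connectivity }

        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any]

        // API error response course: surface the server's own error message.
        guard response.statusCode == 200 else {
            if let error = json?["error"] as? [String: Any],
               let message = error["message"] as? String {
                throw Error.apiError(statusCode: response.statusCode, message: message)
            }
            throw Error.invalidData
        }

        // Map the JSON payload into assistant text.
        guard let choices = json?["choices"] as? [[String: Any]],
              let message = choices.first?["message"] as? [String: Any],
              let content = message["content"] as? String
        else { throw Error.invalidData }

        return content
    }
}
```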
Speak Assistant Response Use Case
Data
Assistant response text
Primary course
Execute "Speak Response" command with the assistant text.
System activates the audio session for playback.
System starts speech synthesis.
System delivers speaking state.
System notifies completion when playback ends.
System delivers idle state.
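
AVSpeechSynthesizer's delegate callback is the natural completion hook for the last two steps. A minimal sketch; the delegate wiring is an assumption about how the project's SpeechSynthesizer notifies the presenter:

```swift
import AVFoundation

final class Speaker: NSObject, AVSpeechSynthesizerDelegate {
    private let synthesizer = AVSpeechSynthesizer()
    private var onFinish: (() -> Void)?

    func speak(_ text: String, onFinish: @escaping () -> Void) throws {
        self.onFinish = onFinish
        synthesizer.delegate = self

        // Activate the audio session for playback.
        try AVAudioSession.sharedInstance().setCategory(.playback, mode: .spokenAudio, options: [])
        try AVAudioSession.sharedInstance().setActive(true)

        synthesizer.speak(AVSpeechUtterance(string: text))
    }

    // Notifies completion when playback ends, so the presenter
    // can deliver the idle state.
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           didFinish utterance: AVSpeechUtterance) {
        onFinish?()
    }
}
```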
Cancel Pending Request Use Case
Primary course
Execute "Cancel Pending Request" command.
System cancels the in-flight async task.
System does not deliver a stale response.
System delivers idle state.
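
Stale-delivery protection falls out of structured concurrency. A minimal sketch; the coordinator shape and closure signatures are assumptions, not the adapter's real API:

```swift
final class RequestCoordinator {
    private var pendingTask: Task<Void, Never>?

    func loadResponse(_ load: @escaping () async throws -> String,
                      deliver: @escaping (Result<String, Error>) -> Void) {
        pendingTask?.cancel()
        pendingTask = Task {
            do {
                let response = try await load()
                guard !Task.isCancelled else { return }   // no stale response delivery
                deliver(.success(response))
            } catch {
                guard !Task.isCancelled else { return }   // no stale error delivery
                deliver(.failure(error))
            }
        }
    }

    // Cancels the in-flight async task; the guards above ensure nothing
    // is delivered afterwards, so the caller can move straight to idle.
    func cancel() {
        pendingTask?.cancel()
        pendingTask = nil
    }
}
```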
Reset Conversation Use Case
Primary course
Execute "Reset Conversation" command.
System cancels the in-flight request.
System stops speech playback.
System clears all messages.
System clears the live transcript.
System delivers idle state.
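
Reset composes the earlier pieces. A sketch building on the RequestCoordinator from the cancellation sketch above; every property name here is a hypothetical stand-in, not the project's API:

```swift
import AVFoundation

final class ConversationSession {
    var messages: [String] = []
    var transcript = ""
    let coordinator = RequestCoordinator()
    private let synthesizer = AVSpeechSynthesizer()

    func reset() {
        coordinator.cancel()                          // cancel the in-flight request
        _ = synthesizer.stopSpeaking(at: .immediate)  // stop speech playback
        messages.removeAll()                          // clear all messages
        transcript = ""                               // clear the live transcript
        // ...and finally deliver the idle state to the presenter.
    }
}
```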
Request Speech Permissions Use Case
Primary course
Execute "Request Permissions" command.
System requests speech recognition authorization from iOS.
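
SFSpeechRecognizer.requestAuthorization is the real iOS entry point for this course; wrapping it in async/await, as sketched below, is an assumption about how the project exposes it:

```swift
import Speech

// Resolves to true only when the user grants speech recognition access.
func requestSpeechAuthorization() async -> Bool {
    await withCheckedContinuation { continuation in
        SFSpeechRecognizer.requestAuthorization { status in
            continuation.resume(returning: status == .authorized)
        }
    }
}
```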
Flowcharts
Architecture
The project is split so business rules and request orchestration can be tested without SwiftUI or concrete platform services. The app target acts as the composition root and owns the concrete platform and network implementation details, while the reusable feature module defines protocols, request mapping, and presentation flow.
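
A sketch of the protocol seams the feature module might define so tests can substitute spies for platform services. These exact protocols are assumptions; of the names below, the README only mentions HTTPClient:

```swift
// Assumed domain model for a conversation entry.
struct ChatMessage {
    enum Role: String { case user, assistant }
    let role: Role
    let text: String
}

// Seam for the remote AI service.
protocol ChatLoader {
    func load(_ conversation: [ChatMessage]) async throws -> String
}

// Seam for on-device speech recognition.
protocol SpeechRecognizing {
    func startTranscribing(onUpdate: @escaping (String) -> Void) throws
    func stopTranscribing() -> String
}

// Seam for text-to-speech playback.
protocol SpeechSynthesizing {
    func speak(_ text: String, onFinish: @escaping () -> Void)
    func stopSpeaking()
}
```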
Request Lifecycle
ContentView renders a ConversationViewModel.
The view model forwards user actions such as startRecording(), stopRecording(), sendMessage(_:), resetConversation(), and cancelPendingRequest() to its delegate.
ConversationPresentationAdapter coordinates the concrete use cases:
starts or stops recording
collects transcript updates
appends user messages
launches the remote chat request
triggers speech synthesis
RemoteChatLoader validates the API key, builds a ChatEndpoint request, and executes it through HTTPClient.
ChatMapper validates the HTTP response and extracts assistant text from the JSON payload.
ConversationPresenter converts domain events into plain display models for state, messages, and transcript.
ConversationViewModel publishes those updates to SwiftUI.
SpeechSynthesizer speaks the assistant response and notifies the presenter when playback finishes.
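
The last two steps are where SwiftUI comes in. A sketch of how ConversationViewModel might publish display state and forward actions; the property shapes and closures are assumptions (the real view model talks to a delegate), and only the type and method names come from the lifecycle above:

```swift
import SwiftUI

@MainActor
final class ConversationViewModel: ObservableObject {
    enum DisplayState { case idle, listening, processing, speaking, error(String) }

    // Published display models the presenter updates and SwiftUI observes.
    @Published var state: DisplayState = .idle
    @Published var transcript = ""
    @Published var messages: [String] = []

    // Stand-ins for the delegate hookup to ConversationPresentationAdapter.
    var onStartRecording: (() -> Void)?
    var onStopRecording: (() -> Void)?
    var onSend: ((String) -> Void)?
    var onReset: (() -> Void)?
    var onCancel: (() -> Void)?

    func startRecording() { onStartRecording?() }
    func stopRecording() { onStopRecording?() }
    func sendMessage(_ text: String) { onSend?(text) }
    func resetConversation() { onReset?() }
    func cancelPendingRequest() { onCancel?() }
}
```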
Error Map

| Failure source | Technical error | Mapped by | Presented as | User-visible effect |
| --- | --- | --- | --- | --- |
| Missing `OPENAI_API_KEY` | `ChatMapper.Error.apiError(statusCode: 401, message: "Missing OpenAI API key.", type: nil)` | `RemoteChatLoader` | `ChatState.error(error.localizedDescription)` | Status text shows an error while the conversation stays intact |
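
A sketch of the error type implied by this map. The apiError case mirrors the documented ChatMapper.Error; the invalidData case and the copy are assumptions. Conforming to LocalizedError is what lets the presenter lean on localizedDescription:

```swift
import Foundation

enum ChatMapperError: LocalizedError {
    case invalidData
    case apiError(statusCode: Int, message: String, type: String?)

    // Feeds localizedDescription, which the presenter surfaces as ChatState.error.
    var errorDescription: String? {
        switch self {
        case .invalidData:
            return "The server returned invalid data."
        case let .apiError(statusCode, message, _):
            return "API error \(statusCode): \(message)"
        }
    }
}
```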
Build and run on a real device when possible, since simulator microphone behavior can be limited.
Tradeoffs And Next Steps
The current implementation uses the Chat Completions endpoint and a fixed model string. A future version could move to streaming or realtime APIs for lower perceived latency.
Errors currently flow into localizedDescription, which is simple and centralized but not yet polished for product-quality copy.
Cancellation prevents stale UI delivery, but there is no explicit remote cancellation handshake beyond local task cancellation.
Conversation history is in-memory only. Persistence or resumable sessions could be added later.
There is no offline cache yet, which keeps the feature lean but leaves the app without a fallback when disconnected.
About
A case-study style iOS conversational AI app built with SwiftUI, speech recognition, text-to-speech, OpenAI integration, clean modular architecture, unit tests, CI, and architecture/flow documentation.