💡 Inspiration
The world is more connected than ever, yet language remains the final barrier.
While traveling, I realized that existing translation apps felt like applications, not extensions of the self.
They were:
- Too Slow: the Input → Cloud → Processing → Return loop introduced an awkward 2–3 second delay that killed natural conversation flow.
- Privacy Invasive: sending intimate conversations to a remote server felt wrong.
- Dependent: losing signal meant losing your voice.
I envisioned Aether (formerly GlobalSense):
A tool that wasn't a "Chat Box" but an invisible conversational layer:
a real-time, offline interpreter that lived on the device, privacy-first.
🛠️ How We Built It
We adopted a Simulator-First approach to validate the UX before diving into Android internals.
1. The UX Simulator
We built a high-fidelity web simulator (simulator.html) to perfect the interaction model.
Stack
- HTML5
- Vanilla JavaScript
- CSS3 (Glassmorphism)
Design
- Moved away from Chat Bubbles (WhatsApp-style)
- Adopted Transcript Cards (Professional Interpreter-style)
Visuals
- CSS variables for dynamic theming
- Keyframe animations for a "breathing" microphone effect
2. The Android Architecture
The core is an Offline-First Android pipeline developed in Kotlin.
🎙️ Speech-to-Text (STT)
- Integrated Vosk for offline speech recognition
- Runs a small neural network model directly on the CPU
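On Android, wiring up Vosk's streaming recognizer typically looks like the sketch below. This is an assumed setup, not the project's exact code: the model path is a placeholder for an unpacked model directory, and `listener` stands for an app-provided `org.vosk.android.RecognitionListener` that receives partial and final results.

```kotlin
import org.vosk.Model
import org.vosk.Recognizer
import org.vosk.android.SpeechService

// Sketch only: model path and listener are placeholders.
val model = Model("model-small-en-us")        // directory of an unpacked Vosk small model
val recognizer = Recognizer(model, 16000.0f)  // 16 kHz mono audio
val speechService = SpeechService(recognizer, 16000.0f)
speechService.startListening(listener)        // listener receives partial/final JSON results
```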
🌐 Translation Engine
- Designed to support NLLB (No Language Left Behind) via ONNX Runtime
Math Note
Standard Transformer self-attention complexity is O(n²) with respect to sequence length n.
We optimized for sentence-level processing, where n is small, ensuring latency L < 200 ms.
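A back-of-the-envelope illustration of why sentence-level chunking keeps attention cost low (the token counts are illustrative, not the project's benchmarks):

```kotlin
// Self-attention cost grows with the square of sequence length,
// so translating one short sentence is far cheaper than a long passage.
fun attentionCost(n: Int): Long = n.toLong() * n

fun main() {
    val sentence = attentionCost(16)    // a ~16-token spoken sentence
    val paragraph = attentionCost(512)  // a ~512-token passage
    // (512/16)^2 = 1024: the passage costs ~1000x more per pass
    println(paragraph / sentence)       // prints 1024
}
```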
🎧 Audio Routing
- Configured BluetoothHeadset SCO (Synchronous Connection Oriented) channels
- Enabled Travel Mode:
- Phone stays in pocket
- Audio plays directly in the earpiece
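In Kotlin, routing audio over the SCO link typically looks like the sketch below, using Android's `AudioManager`. This is an assumed flow, not the project's verified code; note that `startBluetoothSco()` is deprecated on newer API levels in favor of `setCommunicationDevice`.

```kotlin
import android.media.AudioManager

// Sketch: open the Bluetooth SCO link so TTS output plays in the headset earpiece.
fun enterTravelMode(audioManager: AudioManager) {
    audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
    audioManager.startBluetoothSco()      // request the SCO audio connection
    audioManager.isBluetoothScoOn = true  // route subsequent audio through SCO
}
```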
🧩 Challenges Faced
The "Chat Box" Trap
Challenge
Initial designs looked like a messaging app, causing users to wait for replies instead of speaking naturally.
Solution
We redesigned the UI to resemble a live transcript:
- Source language displayed on top
- Translation displayed below
- High-contrast cards replaced chat bubbles
This shifted the user's mental model from Messaging → Broadcasting.
Latency vs. Accuracy
Challenge
Balancing model size:
- Large models (1GB+) = accurate but slow
- Tiny models = fast but lose context
Solution: Hybrid Pipeline
- Vosk Small Model for instant STT (~50 ms)
- Lightweight text-correction layer
  - Regex + heuristics
  - Fixes common phonetic errors before translation
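A regex-plus-heuristics correction pass can be sketched as a small ordered rule table. The rules below are illustrative stand-ins, not the app's actual rule set:

```kotlin
// Ordered correction rules applied before translation.
// These example rules are hypothetical; a real table would target
// the phonetic confusions the STT model actually produces.
val corrections = mapOf(
    Regex("\\bwanna\\b") to "want to",
    Regex("\\bgonna\\b") to "going to",
    Regex("\\s+") to " "               // collapse stray whitespace from STT
)

fun correct(raw: String): String =
    corrections.entries
        .fold(raw.trim()) { text, (pattern, fix) -> pattern.replace(text, fix) }

fun main() {
    println(correct("i  wanna go"))  // prints: i want to go
}
```

Keeping the rules as plain regexes means the pass adds effectively no latency compared to the translation step itself.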
The "Offline" Constraint
Challenge
Most high-quality translation APIs (DeepL, Google) require internet access.
Solution
- Pivoted to on-device inference
- Used INT8 quantization
Results
- Model size reduced by 4×
- Negligible accuracy loss: ΔBLEU < 1.5
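The 4× figure falls directly out of the storage math: INT8 stores each weight in 1 byte instead of FP32's 4 bytes. A quick sketch with an illustrative parameter count (not the project's actual model size):

```kotlin
// Model size = parameter count x bytes per weight.
// FP32 uses 4 bytes/weight; INT8 uses 1, hence the ~4x reduction.
fun modelSizeMb(params: Long, bytesPerWeight: Int): Long =
    params * bytesPerWeight / (1024 * 1024)

fun main() {
    val params = 600_000_000L           // illustrative NLLB-scale parameter count
    println(modelSizeMb(params, 4))     // FP32 size in MB
    println(modelSizeMb(params, 1))     // INT8 size in MB, 4x smaller
}
```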
📚 What I Learned
UX is Performance
A 200ms delay feels instantaneous if the UI provides immediate visual feedback
(e.g., the breathing microphone animation).
Local AI is the Future
On-device models aren't just about privacy; they are a UX advantage.
Offline reliability beats unpredictable cloud latency every time.
Iterative Prototyping Works
Building the web simulator first saved weeks of Android development by:
- Failing fast
- Fixing UX logic cheaply
- Entering native development with confidence

