💡 Inspiration

The world is more connected than ever, yet language remains the final barrier.
While traveling, I realized that existing translation apps felt like applications, not extensions of the self.

They were:

  • Too Slow
    The Input → Cloud → Processing → Return loop introduced an awkward 2–3 second delay that killed natural conversation flow.

  • Privacy Invasive
    Sending intimate conversations to a remote server felt wrong.

  • Dependent
    Losing signal meant losing your voice.

I envisioned Aether (formerly GlobalSense):
A tool that wasn't a "Chat Box" but an invisible conversational layer:
a real-time, offline, privacy-first interpreter that lives entirely on the device.


🛠️ How We Built It

We adopted a Simulator-First approach to validate the UX before diving into Android internals.


1. The UX Simulator

We built a high-fidelity web simulator (simulator.html) to perfect the interaction model.

Stack

  • HTML5
  • Vanilla JavaScript
  • CSS3 (Glassmorphism)

Design

  • Moved away from Chat Bubbles (WhatsApp-style)
  • Adopted Transcript Cards (Professional Interpreter-style)

Visuals

  • CSS variables for dynamic theming
  • Keyframe animations for a "breathing" microphone effect
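The theming and the breathing microphone described above can be sketched in a few lines of CSS. The class names, variable names, and timing values here are illustrative, not the project's actual stylesheet:

```css
/* Glassmorphism surface driven by CSS variables (names are illustrative) */
:root {
  --surface: rgba(255, 255, 255, 0.12);
  --accent: #4fd1c5;
}

.card {
  background: var(--surface);
  backdrop-filter: blur(14px); /* the frosted-glass effect */
  border: 1px solid rgba(255, 255, 255, 0.2);
  border-radius: 16px;
}

/* "Breathing" microphone: a slow scale/glow pulse while listening */
@keyframes breathe {
  0%, 100% { transform: scale(1);    box-shadow: 0 0 0 0 var(--accent); }
  50%      { transform: scale(1.08); box-shadow: 0 0 24px 4px var(--accent); }
}

.mic.listening {
  animation: breathe 2.4s ease-in-out infinite;
}
```

Swapping the `--surface` and `--accent` variables is enough to retheme every card at once, which is the main reason to prefer variables over hard-coded colors here.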

2. The Android Architecture

The core is an Offline-First Android pipeline developed in Kotlin.

🎙️ Speech-to-Text (STT)

  • Integrated Vosk for offline speech recognition
  • Runs a small neural network model directly on the CPU
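Vosk's recognizer hands back each result as a small JSON payload (e.g. `{"text": "..."}`). A minimal sketch of reducing that payload to the plain transcript the translation stage needs; the helper name is ours, not part of Vosk, and the real model/recognizer wiring is omitted since it requires the native library:

```kotlin
// Vosk's Recognizer.getResult() returns JSON such as {"text": "hello world"}.
// This helper (ours, not Vosk's API) extracts the transcript without pulling
// in a JSON library, which is all the downstream translation stage needs.
fun extractTranscript(voskResultJson: String): String {
    val match = Regex("\"text\"\\s*:\\s*\"([^\"]*)\"").find(voskResultJson)
    return match?.groupValues?.get(1)?.trim() ?: ""
}

fun main() {
    println(extractTranscript("""{"text": "where is the train station"}"""))
    // prints: where is the train station
}
```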

🌍 Translation Engine

  • Designed to support NLLB (No Language Left Behind) via ONNX Runtime

Math Note
Standard Transformer self-attention scales as

O(n²)

with respect to sequence length n.

We optimized for sentence-level processing, where n stays small, keeping per-sentence latency

L < 200 ms
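The quadratic scaling is easy to see numerically. A toy comparison of the relative attention cost for a single spoken sentence versus a long document (token counts are illustrative, not measured):

```kotlin
// Self-attention cost grows with n^2, so keeping utterances short keeps
// inference cheap. Relative cost of a long input vs. one spoken sentence
// (token counts below are illustrative):
fun attentionCost(n: Long): Long = n * n

fun main() {
    val sentence = attentionCost(16)   // ~one spoken sentence
    val document = attentionCost(512)  // ~a paragraph-heavy document
    println(document / sentence)       // prints: 1024
}
```

A 32× longer input costs roughly 1000× more attention compute, which is why translating sentence-by-sentence is what makes the latency budget reachable on a phone CPU.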


🎧 Audio Routing

  • Configured BluetoothHeadset SCO (Synchronous Connection Oriented) channels
  • Enabled Travel Mode:
    • Phone stays in pocket
    • Audio plays directly in the earpiece
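The SCO routing can be sketched with Android's `AudioManager`. This is a device-only fragment (it cannot run off-device, so it is shown as a sketch rather than tested code); the function name is ours, and note that `startBluetoothSco()` is deprecated from API 31, where `setCommunicationDevice()` replaces it:

```kotlin
import android.content.Context
import android.media.AudioManager

// Route mic + playback through the Bluetooth headset's SCO channel so the
// phone can stay in the user's pocket ("Travel Mode"). Android-only sketch;
// on API 31+ prefer AudioManager.setCommunicationDevice().
fun enableTravelMode(context: Context) {
    val audio = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    audio.mode = AudioManager.MODE_IN_COMMUNICATION  // voice-call audio path
    audio.startBluetoothSco()                        // open the SCO link
    audio.isBluetoothScoOn = true                    // route audio over SCO
}
```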

🧩 Challenges Faced


The "Chat Box" Trap

Challenge
Initial designs looked like a messaging app, causing users to wait for replies instead of speaking naturally.

Solution
We redesigned the UI to resemble a live transcript:

  • Source language displayed on top
  • Translation displayed below
  • High-contrast cards replaced chat bubbles

This shifted the user's mental model from Messaging → Broadcasting.


Latency vs. Accuracy

Challenge
Balancing model size:

  • Large models (1GB+) = accurate but slow
  • Tiny models = fast but lose context

Solution: Hybrid Pipeline

  • Vosk Small Model for instant STT (~50 ms)
  • Lightweight text-correction layer
    • Regex + heuristics
    • Fixes common phonetic errors before translation
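A minimal sketch of what such a correction layer can look like, sitting between STT and translation. The rule table and function name are our illustration; a real table would be built from observed recognizer errors:

```kotlin
// Tiny text-correction pass between STT and translation: a rule table of
// common phonetic confusions plus light whitespace cleanup. The rules below
// are illustrative, not the project's actual ruleset.
val phoneticFixes = listOf(
    Regex("\\bwear is\\b", RegexOption.IGNORE_CASE) to "where is",
    Regex("\\bto go two\\b", RegexOption.IGNORE_CASE) to "to go to",
)

fun correct(raw: String): String {
    var text = raw.trim().replace(Regex("\\s+"), " ")  // collapse whitespace
    for ((pattern, replacement) in phoneticFixes) {
        text = pattern.replace(text, replacement)
    }
    return text
}

fun main() {
    println(correct("wear is  the station"))
    // prints: where is the station
}
```

Because it is plain regex over short strings, this pass costs microseconds and never touches the model, so it adds no meaningful latency to the pipeline.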

The "Offline" Constraint

Challenge
Most high-quality translation APIs (DeepL, Google) require internet access.

Solution

  • Pivoted to on-device inference
  • Used INT8 quantization

Results

  • Model size reduced by 4×
  • Negligible accuracy loss: ΔBLEU < 1.5
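The 4× figure falls directly out of the storage math: one INT8 byte replaces each Float32 weight, at the cost of a small rounding error bounded by the quantization scale. A self-contained sketch of symmetric INT8 quantization (function names and the toy weights are ours, not the NLLB pipeline):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric INT8 quantization: each Float32 weight becomes one byte (4x
// smaller), and dequantizing shows the rounding error stays below one
// quantization step.
fun quantize(weights: FloatArray): Pair<ByteArray, Float> {
    val scale = weights.maxOf { abs(it) } / 127f  // map max |w| onto 127
    val q = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return q to scale
}

fun dequantize(q: ByteArray, scale: Float): FloatArray =
    FloatArray(q.size) { i -> q[i] * scale }

fun main() {
    val w = floatArrayOf(0.81f, -0.33f, 0.05f, 1.20f)
    val (q, scale) = quantize(w)
    val back = dequantize(q, scale)
    println(4 * w.size / q.size)  // bytes per value, Float32 vs INT8: prints 4
    println(w.indices.all { abs(w[it] - back[it]) < scale })  // prints: true
}
```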


🎓 What I Learned

UX is Performance

A 200ms delay feels instantaneous if the UI provides immediate visual feedback
(e.g., the breathing microphone animation).

Local AI is the Future

On-device models aren't just about privacy; they are a UX advantage.
Offline reliability beats unpredictable cloud latency every time.

Iterative Prototyping Works

Building the web simulator first saved weeks of Android development by:

  • Failing fast
  • Fixing UX logic cheaply
  • Entering native development with confidence
