💡 Inspiration

The world is more connected than ever, yet language remains the final barrier.
While traveling, I realized that existing translation apps felt like applications, not extensions of the self.

They were:

  • Too Slow
    The Input → Cloud → Processing → Return loop introduced an awkward 2–3 second delay that killed natural conversation flow.

  • Privacy Invasive
    Sending intimate conversations to a remote server felt wrong.

  • Dependent
    Losing signal meant losing your voice.

I envisioned Aether (formerly GlobalSense):
A tool that wasn't a "Chat Box" but an invisible conversational layer:
a real-time, offline, privacy-first interpreter that lives entirely on the device.


🛠️ How We Built It

We adopted a Simulator-First approach to validate the UX before diving into Android internals.


1. The UX Simulator

We built a high-fidelity web simulator (simulator.html) to perfect the interaction model.

Stack

  • HTML5
  • Vanilla JavaScript
  • CSS3 (Glassmorphism)

Design

  • Moved away from Chat Bubbles (WhatsApp-style)
  • Adopted Transcript Cards (Professional Interpreter-style)

Visuals

  • CSS variables for dynamic theming
  • Keyframe animations for a "breathing" microphone effect
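The theming and the breathing microphone described above can be sketched in a few lines of CSS. The class names, variable names, and timing values here are illustrative, not the project's actual stylesheet:

```css
/* Glassmorphism surface driven by CSS variables (names are illustrative) */
:root {
  --surface: rgba(255, 255, 255, 0.12);
  --accent: #4fd1c5;
}

.card {
  background: var(--surface);
  backdrop-filter: blur(14px); /* the frosted-glass effect */
  border: 1px solid rgba(255, 255, 255, 0.2);
  border-radius: 16px;
}

/* "Breathing" microphone: a slow scale/glow pulse while listening */
@keyframes breathe {
  0%, 100% { transform: scale(1);    box-shadow: 0 0 0 0 var(--accent); }
  50%      { transform: scale(1.08); box-shadow: 0 0 24px 4px var(--accent); }
}

.mic.listening {
  animation: breathe 2.4s ease-in-out infinite;
}
```

Swapping the `--surface` and `--accent` variables is enough to retheme every card at once, which is the main reason to prefer variables over hard-coded colors here.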

2. The Android Architecture

The core is an Offline-First Android pipeline developed in Kotlin.

🎙️ Speech-to-Text (STT)

  • Integrated Vosk for offline speech recognition
  • Runs a small neural network model directly on the CPU
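Vosk's recognizer hands back each result as a small JSON payload (e.g. `{"text": "..."}`). A minimal sketch of reducing that payload to the plain transcript the translation stage needs; the helper name is ours, not part of Vosk, and the real model/recognizer wiring is omitted since it requires the native library:

```kotlin
// Vosk's Recognizer.getResult() returns JSON such as {"text": "hello world"}.
// This helper (ours, not Vosk's API) extracts the transcript without pulling
// in a JSON library, which is all the downstream translation stage needs.
fun extractTranscript(voskResultJson: String): String {
    val match = Regex("\"text\"\\s*:\\s*\"([^\"]*)\"").find(voskResultJson)
    return match?.groupValues?.get(1)?.trim() ?: ""
}

fun main() {
    println(extractTranscript("""{"text": "where is the train station"}"""))
    // prints: where is the train station
}
```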

🌍 Translation Engine

  • Designed to support NLLB (No Language Left Behind) via ONNX Runtime

Math Note
Standard Transformer self-attention scales as

O(n²)

with respect to sequence length n.

We optimized for sentence-level processing, where n stays small, keeping per-sentence latency

L < 200 ms
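The quadratic scaling is easy to see numerically. A toy comparison of the relative attention cost for a single spoken sentence versus a long document (token counts are illustrative, not measured):

```kotlin
// Self-attention cost grows with n^2, so keeping utterances short keeps
// inference cheap. Relative cost of a long input vs. one spoken sentence
// (token counts below are illustrative):
fun attentionCost(n: Long): Long = n * n

fun main() {
    val sentence = attentionCost(16)   // ~one spoken sentence
    val document = attentionCost(512)  // ~a paragraph-heavy document
    println(document / sentence)       // prints: 1024
}
```

A 32× longer input costs roughly 1000× more attention compute, which is why translating sentence-by-sentence is what makes the latency budget reachable on a phone CPU.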


🎧 Audio Routing

  • Configured BluetoothHeadset SCO (Synchronous Connection Oriented) channels
  • Enabled Travel Mode:
    • Phone stays in pocket
    • Audio plays directly in the earpiece
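The SCO routing can be sketched with Android's `AudioManager`. This is a device-only fragment (it cannot run off-device, so it is shown as a sketch rather than tested code); the function name is ours, and note that `startBluetoothSco()` is deprecated from API 31, where `setCommunicationDevice()` replaces it:

```kotlin
import android.content.Context
import android.media.AudioManager

// Route mic + playback through the Bluetooth headset's SCO channel so the
// phone can stay in the user's pocket ("Travel Mode"). Android-only sketch;
// on API 31+ prefer AudioManager.setCommunicationDevice().
fun enableTravelMode(context: Context) {
    val audio = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    audio.mode = AudioManager.MODE_IN_COMMUNICATION  // voice-call audio path
    audio.startBluetoothSco()                        // open the SCO link
    audio.isBluetoothScoOn = true                    // route audio over SCO
}
```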

🧩 Challenges Faced


The "Chat Box" Trap

Challenge
Initial designs looked like a messaging app, causing users to wait for replies instead of speaking naturally.

Solution
We redesigned the UI to resemble a live transcript:

  • Source language displayed on top
  • Translation displayed below
  • High-contrast cards replaced chat bubbles

This shifted the user's mental model from Messaging → Broadcasting.


Latency vs. Accuracy

Challenge
Balancing model size:

  • Large models (1GB+) = accurate but slow
  • Tiny models = fast but lose context

Solution: Hybrid Pipeline

  • Vosk Small Model for instant STT (~50 ms)
  • Lightweight text-correction layer
    • Regex + heuristics
    • Fixes common phonetic errors before translation
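A minimal sketch of what such a correction layer can look like, sitting between STT and translation. The rule table and function name are our illustration; a real table would be built from observed recognizer errors:

```kotlin
// Tiny text-correction pass between STT and translation: a rule table of
// common phonetic confusions plus light whitespace cleanup. The rules below
// are illustrative, not the project's actual ruleset.
val phoneticFixes = listOf(
    Regex("\\bwear is\\b", RegexOption.IGNORE_CASE) to "where is",
    Regex("\\bto go two\\b", RegexOption.IGNORE_CASE) to "to go to",
)

fun correct(raw: String): String {
    var text = raw.trim().replace(Regex("\\s+"), " ")  // collapse whitespace
    for ((pattern, replacement) in phoneticFixes) {
        text = pattern.replace(text, replacement)
    }
    return text
}

fun main() {
    println(correct("wear is  the station"))
    // prints: where is the station
}
```

Because it is plain regex over short strings, this pass costs microseconds and never touches the model, so it adds no meaningful latency to the pipeline.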

The "Offline" Constraint

Challenge
Most high-quality translation APIs (DeepL, Google) require internet access.

Solution

  • Pivoted to on-device inference
  • Used INT8 quantization

Results

  • Model size reduced by 4×
  • Negligible accuracy loss: ΔBLEU < 1.5
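The 4× figure falls directly out of the storage math: one INT8 byte replaces each Float32 weight, at the cost of a small rounding error bounded by the quantization scale. A self-contained sketch of symmetric INT8 quantization (function names and the toy weights are ours, not the NLLB pipeline):

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric INT8 quantization: each Float32 weight becomes one byte (4x
// smaller), and dequantizing shows the rounding error stays below one
// quantization step.
fun quantize(weights: FloatArray): Pair<ByteArray, Float> {
    val scale = weights.maxOf { abs(it) } / 127f  // map max |w| onto 127
    val q = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return q to scale
}

fun dequantize(q: ByteArray, scale: Float): FloatArray =
    FloatArray(q.size) { i -> q[i] * scale }

fun main() {
    val w = floatArrayOf(0.81f, -0.33f, 0.05f, 1.20f)
    val (q, scale) = quantize(w)
    val back = dequantize(q, scale)
    println(4 * w.size / q.size)  // bytes per value, Float32 vs INT8: prints 4
    println(w.indices.all { abs(w[it] - back[it]) < scale })  // prints: true
}
```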


🎓 What I Learned

UX is Performance

A 200ms delay feels instantaneous if the UI provides immediate visual feedback
(e.g., the breathing microphone animation).

Local AI is the Future

On-device models aren't just about privacy; they are a UX advantage.
Offline reliability beats unpredictable cloud latency every time.

Iterative Prototyping Works

Building the web simulator first saved weeks of Android development by:

  • Failing fast
  • Fixing UX logic cheaply
  • Entering native development with confidence
