# DelightfulOS — Devpost Submission (HARD MODE: Hardware x AI)


Project Story

Inspiration

AI has made every person more insular. We asked: what if AI facilitated physical experience instead, opening our perception toward others?

DelightfulOS is a distributed operating system for real-time interactions between wearable devices under an Internet of Bodies framework. We believe the most meaningful AI interfaces aren't screens or speakers; they're the body itself.

We were inspired by a few core questions:

  • How much control would you give a friend over your perception? The collar is a physical interface that others interact with. Tapping someone's collar toggles what AR overlays you see over them — your body becomes a shared control surface.
  • Can we detect intent before speech? A piezo contact microphone on the throat picks up vocal-cord preparation and inhalation ~200ms before you make a sound. That anticipatory signal drives turn-taking, highlights "about to speak" in AR, and keeps mediation subtle.
  • Can hardware and AI be co-designed in a structured way? We built a Hardware Description Language (HDL) with a five-dimensional wearable grammar (body site, signal ecology, output modality, intelligence function, temporal scope) so humans and AI can reason together about full-stack wearable specs — from anatomy to embodiment philosophy.

We wanted to show that hardware × AI isn't just "sensors + cloud model." It's a layered system: sub-50ms signal processing, sub-200ms rule-based policy, and ~2s LLM mediation when the situation is ambiguous. The collar, the glasses, and the AI are one pipeline: body → bus → state → policy → output → device.
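As a rough sketch of how those tiers could compose (all names are hypothetical, not the project's actual API):

```python
# Hypothetical three-tier dispatcher (names are illustrative, not the
# actual delightfulos API): signal-reactive rules fire per-signal,
# batched state rules run every ~200 ms, and the LLM mediator is
# consulted only when the rule layer flags the situation as ambiguous.
def dispatch(signal, state, now, last_batch_at, ambiguous, llm_mediate):
    actions = []
    # Tier 1 (sub-50 ms): per-signal rules, e.g. collar tap -> AR toggle
    if signal.get("type") == "collar_tap":
        actions.append({"action": "toggle_overlay", "user": signal["user"]})
    # Tier 2 (~200 ms): batched state policy, e.g. turn-taking highlight
    if now - last_batch_at >= 0.2 and state.get("speech_intent"):
        actions.append({"action": "highlight", "user": state["user"]})
    # Tier 3 (~2 s): LLM social mediation for ambiguous cases only
    if ambiguous:
        actions.append(llm_mediate(state))
    return actions
```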


What We Built

  • Custom wearable collar (ESP32-S3)
    Piezo contact mic on the throat (pre-speech + speech + tap detection), built-in PDM MEMS mic for speech capture, 4-channel directional haptics. Firmware streams events over WebSocket and optionally raw PCM for Gemini Live. See collar/WIRING.md and collar/firmware/contact_mic.ino.

  • Real-time body state pipeline
    Raw piezo → RMS/ZCR/spectral features → voice activity detector (speech + pre-speech) → state estimator (speech_intent, speech_active, stress_level, engagement, overloaded). State flows to a policy engine: fast rules (turn-taking, overload protection, collar-tap → AR toggle) and an LLM social mediator (Prime Intellect) for complex situations.

  • AR layer with Snap Spectacles
    Spectacles connect via Supabase Realtime (Snap Cloud). They receive per-user body state at 2Hz, live transcriptions from Gemini Live, and overlay commands (highlight, show/remove cube) — including actions triggered by physical collar taps.

  • Hardware Description Language (HDL)
    A compositional grammar for AI wearables (body location, signal ecology, output modality, intelligence function, temporal scope, plus electronics, firmware, interaction/embodiment). AI-assisted co-design: natural language → full wearable spec (sensors, outputs, BOM, consent model, embodiment principles). System gap analysis across multi-device setups. See delightfulos/hdl/grammar.py, delightfulos/hdl/codesign.py, and delightfulos/hdl/library/devices/.

  • Multi-user, multi-device
    Every user has independent state; turn-taking and collar-tap policies are multi-user aware. A live dashboard (React + Three.js) visualizes piezo telemetry and tap events.
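The five-dimensional grammar from the HDL bullet above can be sketched as a typed record. The real grammar (delightfulos/hdl/grammar.py) is richer, covering electronics, firmware, consent model, and embodiment principles; field values here are illustrative only.

```python
from dataclasses import dataclass

# Sketch of the five-dimensional wearable grammar as a typed record;
# every field value below is illustrative, not the project's vocabulary.
@dataclass(frozen=True)
class WearableSpec:
    body_site: str              # e.g. "throat"
    signal_ecology: str         # e.g. "mechanical_vibration"
    output_modality: str        # e.g. "haptic"
    intelligence_function: str  # e.g. "intent_anticipation"
    temporal_scope: str         # e.g. "sub_second"

collar = WearableSpec("throat", "mechanical_vibration", "haptic",
                      "intent_anticipation", "sub_second")
```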


How We Built It

Architecture
Layered monorepo inspired by ROS (typed pub/sub signal bus) and manager-based composition:

  • OS (delightfulos/os/): Zero external deps — Signal, Action, DeviceInfo, bus, registry, state estimator.
  • Runtime (delightfulos/runtime/): Policy engine (rules + signal-reactive handlers), output router.
  • AI (delightfulos/ai/): Piezo signal processing (VAD, features), Prime Intellect client, Gemini Live sessions, social mediator.
  • XR (delightfulos/xr/): Platform-agnostic protocol; Spectacles adapter; WebSocket handler.
  • Networking (delightfulos/networking/): Collar WebSocket, Supabase Realtime bridge, device simulator.
  • HDL (delightfulos/hdl/): Grammar, loader, library, AI co-design and gap-analysis prompts.
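A minimal version of the typed pub/sub signal bus might look like this (API and topic names are illustrative, not the actual delightfulos/os interface):

```python
from collections import defaultdict
from typing import Any, Callable

# Minimal pub/sub bus in the spirit of the ROS-inspired signal bus
# described above; handler and topic names are made up for illustration.
class Bus:
    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._subs[topic].append(handler)

    def publish(self, topic: str, signal: Any) -> None:
        for handler in self._subs[topic]:
            handler(signal)

bus = Bus()
seen = []
bus.subscribe("collar.tap", seen.append)
bus.publish("collar.tap", {"user": "a", "ts_ms": 1234})
```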

Signal flow:
Device → Bus → StateEstimator → PolicyManager → OutputRouter → Device (with LLM mediator for complex cases).
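The front of that flow, feature extraction feeding the voice activity detector, can be sketched as follows (thresholds are illustrative placeholders, not the tuned per-user values):

```python
import math

# RMS and zero-crossing rate over one window of piezo samples; these
# two features feed the VAD described in the pipeline above.
def features(window):
    rms = math.sqrt(sum(x * x for x in window) / len(window))
    crossings = sum((a < 0) != (b < 0) for a, b in zip(window, window[1:]))
    return rms, crossings / (len(window) - 1)

def detect(rms, zcr, recent_avg, pre_thresh=0.02, speech_thresh=0.06):
    # Sustained energy with moderate ZCR reads as active speech;
    # rising energy before audible voicing reads as pre-speech intent.
    if rms > speech_thresh and 0.05 < zcr < 0.6:
        return "speech_active"
    if rms > pre_thresh and rms > 1.5 * recent_avg:
        return "speech_intent"
    return "quiet"
```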

Hardware
XIAO ESP32-S3 Sense; piezo on A0 (ADC1) with a 100 nF coupling capacitor and 1 MΩ bias divider; 4× haptic motors driven via MOSFETs; Arduino firmware with NVS-stored WiFi/server/user config.
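The event frames the firmware streams over WebSocket might be parsed server-side like this (field names are assumptions; the actual schema lives in collar/firmware/contact_mic.ino and may differ):

```python
import json

# Hypothetical shape of one collar event frame. The firmware streams
# small JSON events like this; raw PCM goes over a separate binary
# path when Gemini Live audio is enabled.
def parse_collar_event(frame: str) -> dict:
    event = json.loads(frame)
    if event["type"] not in {"pre_speech", "speech", "tap"}:
        raise ValueError(f"unknown event type: {event['type']}")
    return {
        "user": event["user"],
        "type": event["type"],
        "rms": float(event.get("rms", 0.0)),
        "ts_ms": int(event["ts_ms"]),
    }
```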

Server
FastAPI app with routers for system, collar, AI, HDL, Supabase; WebSocket endpoints for XR, collar, and Gemini Live audio.


Challenges We Faced

  1. Pre-speech detection
    Piezo signal is noisy and user-dependent. We tuned thresholds (pre-speech vs speech vs tap) and used a sliding-window history (recent vs older RMS) plus ZCR to reduce false positives. Calibration mode and per-user baselines help.

  2. Latency tiers
    Collar taps and gestures must drive AR changes immediately; we added signal-reactive policies that run per-signal instead of waiting for the 200ms state batch. Turn-taking rules had to avoid highlighting "about to speak" when someone is already talking, and to resolve ties (yield haptic for one user, highlight the other).

  3. Spectacles / Supabase protocol
    Matching Snap’s Phoenix/Realtime topic naming (realtime:cursor-{channel}), user ID conventions, and our os-state / os-action / ar-overlay payloads required careful alignment with the Spectacles sample and docs.

  4. HDL as a design material
    Making the grammar both machine-parseable (YAML, JSON) and rich enough for embodiment (consent model, social dynamics, signal interpretations) took several iterations. The AI co-design prompt had to balance structure (exact JSON schema) with reasoning (anatomical and psychological notes).

  5. Multi-user state and overlay toggles
    Collar tap toggles “show/hide this person’s cube” for the viewer. We track hidden_overlays per user so each viewer has independent visibility; actions are routed to the right Spectacles clients.
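The per-viewer bookkeeping from challenge 5 fits in a few lines; a sketch, with the hidden_overlays name taken from the description above and everything else assumed:

```python
from collections import defaultdict

# Per-viewer visibility: each viewer keeps their own set of hidden
# targets, so one viewer's collar tap never changes another's view.
hidden_overlays: dict[str, set[str]] = defaultdict(set)

def on_collar_tap(viewer: str, target: str) -> str:
    """Toggle the target's cube for this viewer; return the AR action."""
    hidden = hidden_overlays[viewer]
    if target in hidden:
        hidden.discard(target)
        return "show_cube"
    hidden.add(target)
    return "remove_cube"
```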


Built With

  • Languages: Python 3.12+, C++ (Arduino), TypeScript (React), YAML
  • Frameworks & platforms: FastAPI, uvicorn, Pydantic, React Three Fiber, Snap Spectacles (Lens Studio), Supabase Realtime
  • Hardware: Seeed XIAO ESP32-S3 Sense, piezo contact microphone, PDM MEMS mic, 4× haptic motors, MOSFET drivers
  • AI / cloud: Prime Intellect (OpenAI-compatible, 100+ models), Google Gemini Live (bidirectional audio), Supabase (Realtime WebSocket)
  • Tooling: uv, Arduino IDE, WebSockets, PyYAML

Try It Out

  • GitHub: https://github.com/delightfulvision/DelightfulOS
  • Docs: See README.md for quick start; docs/VISION.md for concept and technical position; collar/WIRING.md for hardware.
  • Run locally: uv pip install -e ".[dev]", then cd server, configure .env, and uv run uvicorn app.main:app --host 0.0.0.0 --port 8000. Dashboard: http://localhost:8000/dashboard. Use scripts/demo.sh for server + simulators.
