Inspiration

Facial expressions are a fundamental part of human communication, conveying emotions and social cues that sighted people often take for granted. For individuals who are blind or visually impaired, however, this vital channel is inaccessible. Without visual cues, they must rely solely on tone of voice and context to interpret others' emotions, an incomplete picture that can lead to missed social connections. Our project bridges this gap by using artificial intelligence to detect facial expressions in real time and translate them into haptic feedback. This enables blind individuals to perceive the emotional landscape of their interactions, supporting better communication, deeper connections, and greater confidence in social situations.

What it does

Our system combines computer vision and biometric sensing to detect emotions through two complementary methods:

  1. Facial Expression Analysis: Using a webcam and deep learning models, the system analyzes facial expressions in real-time to identify emotions (happy, sad, angry, surprise, fear, disgust, or neutral).

  2. Biometric Integration: The system integrates with Presage technology to collect heartbeat and breathing rate data, providing additional physiological context that supports and enhances emotion detection accuracy.

When a blind user wants to know someone's emotional state, they perform a long-press gesture on the iOS app, which triggers the system to analyze the current emotion data and send it to Gemini for processing. The system processes both visual and biometric data, then communicates the detected emotion through distinctive beep patterns on the hardware. Each emotion corresponds to a unique auditory pattern, allowing users to quickly and intuitively understand the emotional context of their interactions.
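The emotion-to-pattern mapping can be sketched as a simple lookup table. This is an illustrative sketch only: the actual frequencies and rhythms used in our Arduino firmware are not shown here, so the specific tone pairs below are assumptions.

```python
# Hypothetical emotion-to-beep mapping: each emotion maps to a list of
# (frequency_hz, duration_ms) tone pairs played in sequence on the buzzer.
BEEP_PATTERNS = {
    "happy":    [(880, 100), (1175, 100)],          # quick rising pair
    "sad":      [(440, 300), (330, 300)],           # slow falling pair
    "angry":    [(660, 80), (660, 80), (660, 80)],  # rapid repeated pulses
    "surprise": [(523, 60), (1047, 200)],           # short jump upward
    "fear":     [(392, 120), (392, 120)],           # two tense mid tones
    "disgust":  [(311, 250)],                       # single low tone
    "neutral":  [(523, 150)],                       # single mid tone
}

def pattern_for(emotion: str) -> list[tuple[int, int]]:
    """Return the beep pattern for an emotion, falling back to neutral."""
    return BEEP_PATTERNS.get(emotion, BEEP_PATTERNS["neutral"])
```

Keeping each pattern short and rhythmically distinct is what lets users tell emotions apart without counting individual beeps.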

How we built it

Our system is built as a distributed architecture with three main components working together. The core emotion detection engine runs on a Python server, where we use DeepFace's pre-trained emotion recognition models, built on TensorFlow, to analyze facial expressions in real time. The system captures video frames from a webcam using OpenCV, which handles video capture, frame processing, and image preprocessing. DeepFace then uses the SSD (Single Shot Detector) backend for accurate face detection before analyzing emotions through deep learning models to identify the seven core emotions.
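The per-frame detection step can be sketched as follows. This is a simplified sketch, not our exact server code: the `analyze` parameter is a hypothetical seam we add here so the helper can be exercised without a webcam, and the result shape assumed is the list-of-dicts format returned by recent DeepFace versions.

```python
def detect_emotion(frame, analyze=None):
    """Return the dominant emotion label for one BGR frame, or None when
    no face is found. By default this wraps DeepFace.analyze with the
    SSD detector backend; `analyze` is pluggable for testing."""
    if analyze is None:
        from deepface import DeepFace  # heavy dependency, imported lazily

        def analyze(f):
            return DeepFace.analyze(f, actions=["emotion"],
                                    detector_backend="ssd")
    try:
        results = analyze(frame)
    except ValueError:  # DeepFace raises ValueError when no face is detected
        return None
    # Recent DeepFace versions return one result dict per detected face.
    return results[0]["dominant_emotion"]

# Capture loop sketch with OpenCV:
# import cv2
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     print(detect_emotion(frame))
```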

The Python server exposes a Flask REST API with two key endpoints: /biometrics receives heartbeat and breathing rate data from the iOS app, while /gemini processes recent emotion detections and uses Google's Gemini AI to provide a summarized, context-aware emotion assessment. We implemented a multi-threaded architecture where the detection loop runs in a separate thread from the Flask server, ensuring real-time processing without blocking API requests. A sliding window maintains the last 5 seconds of emotion data for analysis, allowing the system to provide more stable and context-aware results. The hardware bridge module translates detected emotions into buzzer patterns and sends them to Arduino hardware via USB serial connection.
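A minimal sketch of the two Flask endpoints looks like this. The JSON field names and the placeholder `/gemini` response body are assumptions; in the real server, the assembled summary is forwarded to Gemini rather than returned directly.

```python
from collections import deque

from flask import Flask, jsonify, request

app = Flask(__name__)

# Latest biometric reading from the iOS app, plus a short emotion history
# filled in by the detection thread (bounded so it cannot grow unchecked).
latest_biometrics = {}
emotion_history = deque(maxlen=50)

@app.route("/biometrics", methods=["POST"])
def biometrics():
    # Hypothetical payload shape, e.g. {"heart_rate": 72, "breathing_rate": 14}.
    latest_biometrics.update(request.get_json(force=True))
    return jsonify({"status": "ok"})

@app.route("/gemini", methods=["POST"])
def gemini():
    # Assemble the material a Gemini prompt would summarize. The real
    # endpoint sends this to Gemini and returns its assessment instead.
    summary = {
        "recent_emotions": list(emotion_history),
        "biometrics": latest_biometrics,
    }
    return jsonify(summary)
```

Because the detection loop runs in its own thread, these handlers only read shared state and return quickly, which is what keeps API requests from blocking frame processing.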

The iOS companion application, built with SwiftUI, integrates Presage's SmartSpectra technology to capture physiological data such as heartbeat and breathing rates through the device's camera without requiring additional sensors. This biometric data is sent to the Python server via HTTP POST requests, providing additional context that enhances emotion detection accuracy beyond facial analysis alone.

The hardware component consists of an Arduino microcontroller with a buzzer module that receives emotion signals via serial communication. When an emotion is detected, the Arduino plays distinctive beep patterns corresponding to each emotion, providing auditory feedback that blind users can easily interpret.
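The serial protocol between the Python bridge and the Arduino can be as small as one byte per detection. The specific character codes below are hypothetical, as is the port name in the commented pyserial usage.

```python
# Hypothetical one-byte-per-emotion serial protocol: the bridge writes a
# single character and the Arduino sketch maps it to a beep pattern.
EMOTION_CODES = {
    "happy": b"H", "sad": b"S", "angry": b"A", "surprise": b"U",
    "fear": b"F", "disgust": b"D", "neutral": b"N",
}

def encode_emotion(emotion: str) -> bytes:
    """Return the one-byte serial code for an emotion (neutral fallback)."""
    return EMOTION_CODES.get(emotion, EMOTION_CODES["neutral"])

# Sending over the wire with pyserial (sketch; port name is an assumption):
# import serial
# ser = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)
# ser.write(encode_emotion("happy"))
```

A single-byte protocol keeps the Arduino side trivial: a `switch` on the received character selects the pattern, with no parsing needed.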

The complete system flow works as follows: the webcam continuously captures facial expressions, which are analyzed with the DeepFace library, while the iOS app collects biometric data in the background. When the user performs a long-press gesture on the iOS app, the Python Flask server gathers the most recent data from both sources and sends it to Gemini AI, which processes the recent detection history to produce a refined emotion assessment. The server then sends the emotion signal to the Arduino hardware, which translates it into the corresponding beep pattern.

Several key technical decisions shaped our architecture. We chose multi-modal sensing by combining visual and physiological data to improve accuracy beyond facial analysis alone. The sliding window approach maintains recent history for more stable, context-aware emotion detection. We designed a modular architecture that separates detection, API, and hardware components, enabling independent development and testing.
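The sliding-window decision can be sketched as a small class: keep timestamped detections, drop anything older than the window, and report the most frequent label. This is a simplified illustration of the approach, not our exact implementation.

```python
import time
from collections import Counter, deque

WINDOW_SECONDS = 5.0  # matches the 5-second window described above

class EmotionWindow:
    """Keep (timestamp, emotion) samples and report the most frequent
    emotion over the last few seconds, smoothing single-frame noise."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.samples = deque()

    def add(self, emotion, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, emotion))
        # Evict samples that have fallen out of the window.
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def dominant(self):
        if not self.samples:
            return None
        counts = Counter(label for _, label in self.samples)
        return counts.most_common(1)[0][0]
```

Majority voting over the window is what keeps a one-frame misclassification from triggering a spurious beep pattern.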

Challenges we ran into

During development, we encountered several challenges.

One significant challenge was optimizing emotion detection accuracy, particularly for subtle expressions like sadness. The default DeepFace configuration with basic OpenCV backend wasn't providing sufficient accuracy for our use case. We experimented with multiple detector backends (mtcnn, retinaface, SSD) and found that SSD provided the best balance of accuracy and performance for real-time processing.
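Our backend comparison boiled down to timing the same frames through each detector. A harness like the following sketch makes that comparison like-for-like; the `analyze` callable is pluggable, and the commented loop over backend names assumes DeepFace is installed.

```python
import time

def benchmark_backend(analyze, frames):
    """Run an emotion-analysis callable over a batch of frames and
    return (results, average_seconds_per_frame)."""
    start = time.perf_counter()
    results = [analyze(f) for f in frames]
    avg = (time.perf_counter() - start) / max(len(frames), 1)
    return results, avg

# Comparing detector backends (sketch; assumes DeepFace is installed):
# from deepface import DeepFace
# for backend in ("opencv", "ssd", "mtcnn", "retinaface"):
#     _, avg = benchmark_backend(
#         lambda f: DeepFace.analyze(f, actions=["emotion"],
#                                    detector_backend=backend),
#         frames)
#     print(backend, f"{avg * 1000:.1f} ms/frame")
```

Measuring accuracy on held-out clips alongside latency is what led us to SSD as the best trade-off for real-time use.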

Moreover, one of the biggest hurdles was integrating Presage's SDK for biometric data collection. Presage's SDK is not publicly released for macOS, so we had to pivot to developing an iOS companion application despite having no prior iOS development experience. Our team quickly learned Swift and SwiftUI to build the iOS app within our project timeline. Additionally, Presage's emotion expression features were not yet available in the SDK, so we had to work with the SDK's existing heartbeat and breathing rate APIs to fulfill our accessibility requirements, then combine this physiological data with our own emotion detection system on the backend.

We also faced dependency management challenges when dealing with Python library version conflicts. The compatibility between NumPy, OpenCV, and TensorFlow required careful version pinning, especially with newer Python versions. We resolved this by creating a strict version constraint in our requirements file and using a virtual environment to isolate dependencies.
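The version pinning took roughly this shape. The exact versions below are illustrative assumptions, not the precise pins we shipped; the key constraint we hit was that NumPy 2.x breaks older TensorFlow and OpenCV builds.

```text
# requirements.txt (illustrative pins; our exact versions differed)
numpy<2.0            # NumPy 2.x breaks older TensorFlow/OpenCV builds
opencv-python==4.9.*
tensorflow==2.15.*
deepface>=0.0.79
flask==3.0.*
pyserial==3.5
```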

One of our original goals was to deliver feedback in a way that felt subtle and non-disruptive during conversations, which led us to explore haptic feedback through vibration. Unfortunately, due to hardware constraints and limited access to a vibration motor within the hackathon timeframe, we weren’t able to implement this approach. As a practical alternative, we pivoted to using a buzzer to generate distinct sound patterns that could still effectively convey the intended signals.

Lastly, we were out of snacks!

Accomplishments that we're proud of

One of our biggest accomplishments was rapidly learning iOS development from scratch to build the Presage integration. Despite having no prior experience with Swift or SwiftUI, we were able to create a working iOS application that successfully captures and transmits biometric data to our backend server.

We're also proud of our multi-modal approach to emotion detection. By combining facial expression analysis with physiological data (heartbeat and breathing rates), we created a more robust and accurate system than either method alone could provide. The integration of Google's Gemini AI further enhances this by providing context-aware emotion assessment that considers recent detection history.

From a technical standpoint, we successfully integrated multiple complex systems: DeepFace emotion models, the Presage SDK, a Flask REST API, Arduino hardware communication, and a Tkinter UI, all working together in real time without blocking or performance issues. Our multi-threaded architecture ensures smooth operation even when processing video, handling API requests, and managing hardware communication simultaneously.

Finally, we're proud that we overcame significant technical challenges, from macOS audio quirks to dependency conflicts to cross-platform hardware communication, and delivered a working prototype that demonstrates the potential of assistive technology to bridge communication gaps.

What we learned

On the mobile development front, we learned iOS development from scratch, mastering Swift, SwiftUI, and the intricacies of iOS app architecture. Working with the Presage SDK taught us how to integrate third-party APIs, handle biometric data collection, and build REST API clients that communicate with backend servers in real time. For accessibility, we implemented a high-contrast yellow color theme and gesture-based interactions instead of buttons (a long press sends data to Gemini, and a double tap starts the biometric measurement), which proved easier for blind users to navigate than traditional button interfaces. This required understanding iOS gesture recognizers and UI state management while keeping accessibility at the forefront of our design decisions.

What's next for VibeSense

Looking ahead, we have several exciting improvements planned for VibeSense. First, we aim to replace the buzzer-based audio feedback with vibration-based haptic feedback. This will allow blind users to feel the emotion patterns through tactile vibrations rather than relying on beep sounds, providing a more intuitive and discreet way to receive emotional information in social situations. Unlike audio beeps that others can hear, vibrations are completely private, felt only by the user, allowing for natural, unobtrusive social interactions without drawing attention to the assistive technology.

Second, we plan to enhance facial expression detection accuracy through improved preprocessing techniques, better model fine-tuning, and potentially training custom emotion recognition models optimized for our specific use case. This will help the system better distinguish between subtle emotional expressions and reduce false positives.

Third, when Presage supports emotion detection in the future, we aim to capture both emotion and physiological data from a single camera instead of the current two: one for DeepFace and one for Presage. This unified approach would simplify the hardware setup, reduce system complexity, and create a smoother user experience while maintaining the multi-modal benefits of combining visual and biometric data.
