
SOMO — Hand-Gesture VR Interaction System

Real-time controller-free VR interactions using hand-gesture recognition.

Overview

SOMO enables intuitive VR interactions without physical controllers by combining lightweight ML gesture classification with Unity XR. Using hand-tracking data from MediaPipe (or Oculus/Meta/Ultraleap), the system recognizes five core gestures and maps them to VR actions like menu toggling, object manipulation, and confirmation.

Key Features:

  • 5 gesture classes: Open hand, Fist, Pinch, Point, Thumbs-up
  • Real-time classification with < 50ms latency
  • REST API for remote inference (FastAPI)
  • Interaction mechanics: radial menu, grab/rotate/scale objects
  • Hardware-agnostic ML model (31-feature vector)
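
The exact layout of the 31-dimensional feature vector is defined by the extraction scripts in `core/ml/scripts`. As an illustrative sketch only, the snippet below assumes a hypothetical layout (20 wrist-to-landmark distances, 10 pairwise fingertip distances, plus the normalization scale) computed from MediaPipe's 21 hand landmarks:

```python
import numpy as np

# Standard MediaPipe Hands landmark indices:
WRIST, MIDDLE_MCP = 0, 9
FINGERTIPS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky tips

def landmarks_to_features(landmarks):
    """Turn 21 (x, y, z) landmarks into a 31-value feature vector.

    The feature layout here is a hypothetical stand-in for the real
    extraction in core/ml/scripts, not the project's actual definition.
    """
    lm = np.asarray(landmarks, dtype=float).reshape(21, 3)
    centered = lm - lm[WRIST]                       # translation-invariant
    scale = np.linalg.norm(centered[MIDDLE_MCP]) or 1.0
    norm = centered / scale                         # scale-invariant
    wrist_dists = np.linalg.norm(norm[1:], axis=1)  # 20 values
    tips = norm[FINGERTIPS]
    tip_pairs = np.array([np.linalg.norm(tips[i] - tips[j])
                          for i in range(5) for j in range(i + 1, 5)])  # 10 values
    return np.concatenate([wrist_dists, tip_pairs, [scale]])  # 31 values
```

Normalizing against the wrist and palm size is what makes a feature vector like this hardware-agnostic: any tracker that emits 21 landmarks can feed it.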

Project Structure

somo/
├── core/
│   ├── api/                # FastAPI REST API server
│   │   ├── main.py         # FastAPI application
│   │   ├── config.py       # API configuration
│   │   ├── models/         # Pydantic schemas
│   │   ├── routers/        # API endpoints
│   │   ├── services/       # Business logic
│   │   └── utils/          # Helpers and exceptions
│   └── ml/
│       ├── data/           # Datasets
│       ├── models/         # Trained models
│       └── scripts/        # Training & feature extraction
│
├── unity-vr/              # Unity XR project
│   ├── Assets/
│   │   ├── Scenes/        # MainScene (XR Origin + interaction objects)
│   │   ├── Scripts/       # C# gesture classifier + interaction logic
│   │   └── Prefabs/       # Radial menu, interactable cube
│   ├── Packages/          # XR Interaction Toolkit, Barracuda
│   └── ProjectSettings/
│
├── artifacts/             # Model artifacts for deployment
│   └── model/             # Production models (.pkl)
│
├── docs/                  # Documentation
│   ├── api/               # API design documents
│   └── ml/                # ML pipeline docs
│
├── run_api.py             # API server launcher
└── test_api_client.py     # API test client

Tech Stack

| Component         | Technology                                    |
|-------------------|-----------------------------------------------|
| Hand Tracking     | MediaPipe Hands (webcam-based, 21 landmarks)  |
| ML Framework      | scikit-learn (kNN / Random Forest baseline)   |
| Model Export      | ONNX (for Unity Barracuda inference)          |
| API Server        | FastAPI + Uvicorn (Python REST API)           |
| VR Engine         | Unity 2022.3 LTS + XR Interaction Toolkit     |
| Runtime Inference | Unity Barracuda (ONNX runtime)                |

Gesture Classes

| Gesture   | Action             | Detection Logic                   |
|-----------|--------------------|-----------------------------------|
| Open Hand | Toggle radial menu | All fingers extended              |
| Fist      | Idle / no action   | All fingers curled                |
| Pinch     | Grab/release cube  | Thumb-index distance < threshold  |
| Point     | Menu hover         | Index extended, others curled     |
| Thumbs-up | Confirm selection  | Thumb extended, others curled     |
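
The trained classifier handles recognition, but the detection rules above can also be checked directly on raw landmarks. A minimal pinch test, for example (the 0.05 threshold is an assumed value in normalized coordinates, not the project's tuned one):

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8  # standard MediaPipe Hands indices

def is_pinch(landmarks, threshold=0.05):
    """True when the thumb-index fingertip distance falls below threshold."""
    lm = np.asarray(landmarks, dtype=float).reshape(21, 3)
    return bool(np.linalg.norm(lm[THUMB_TIP] - lm[INDEX_TIP]) < threshold)
```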

Getting Started

Prerequisites

# Python environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install ML dependencies
pip install -r requirements.txt

# Install API dependencies (optional, for REST API server)
pip install -r requirements-api.txt

# Unity (for VR integration)
# Install Unity 2022.3 LTS via Unity Hub
# Open unity-vr/ folder as Unity project

Quick Start: API Server

The fastest way to use SOMO is through the REST API:

# 1. Start the API server
python run_api.py

# 2. Test it (in another terminal)
python test_api_client.py

# 3. Access interactive docs
# Open http://localhost:8000/docs in your browser

API Endpoints:

  • GET /health - Server health check
  • POST /predict/features - Predict from 31-dimensional feature vector
  • POST /predict/landmarks - Predict from 21 MediaPipe landmarks
  • POST /predict/batch - Batch predictions (up to 100 samples)
  • GET /models/info - Model information
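
A client can validate a batch payload before sending it to `/predict/batch`. The sketch below assumes the batch schema uses a `samples` key mirroring the single-prediction `features`/`model` fields; that key name is an assumption, so verify it against the interactive docs at `/docs`:

```python
def build_batch_request(feature_rows, model="rf"):
    """Build a /predict/batch payload, enforcing the documented 100-sample cap.

    "samples" is an assumed field name; check http://localhost:8000/docs
    for the actual schema before relying on it.
    """
    rows = [list(map(float, row)) for row in feature_rows]
    if not 1 <= len(rows) <= 100:
        raise ValueError("batch size must be between 1 and 100 samples")
    for row in rows:
        if len(row) != 31:
            raise ValueError("each sample needs exactly 31 features")
    return {"samples": rows, "model": model}
```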

Example API Call:

import requests

response = requests.post(
    "http://localhost:8000/predict/features",
    json={
        "features": [0.15] * 31,  # 31 feature values
        "model": "rf"
    }
)

result = response.json()
print(f"Gesture: {result['gesture']}, Confidence: {result['confidence']:.2%}")

📖 Full API Documentation: core/api/README.md
🚀 Quick Start Guide: QUICKSTART_API.md


ML Pipeline Usage

Recording Gestures

python core/ml/scripts/record_gestures.py --gesture open_hand --samples 200 --camera 0

Training Models

# Train both KNN and Random Forest
python core/ml/scripts/train_classifier.py \
  --data core/ml/data/processed/gestures_merged.csv \
  --model both \
  --output-dir artifacts/model
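
Under the hood, the training script presumably fits the two scikit-learn baselines and serializes them as `.pkl` artifacts. A minimal sketch of that flow, using a synthetic dataset in place of `gestures_merged.csv`:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real gesture data: 5 classes, 31 features.
rng = np.random.default_rng(42)
y = rng.integers(0, 5, size=600)
centers = rng.normal(size=(5, 31))
X = centers[y] + rng.normal(scale=0.1, size=(600, 31))

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name}: test accuracy {clf.score(X_te, y_te):.2%}")

# Serialize the RF model the way artifacts/model/*.pkl suggests.
out_path = os.path.join(tempfile.gettempdir(), "gesture_classifier_rf.pkl")
joblib.dump(models["rf"], out_path)
```

The hyperparameters shown (k=5, 100 trees) are illustrative defaults, not the project's chosen values.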

Testing Models

# Test on dataset
python core/ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl

# Test with live webcam
python core/ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl --live

Unity VR Integration

  1. Open unity-vr/ in Unity Hub
  2. Load Assets/Scenes/MainScene.unity
  3. Import trained ONNX model to Assets/Models/
  4. Attach GestureClassifier.cs to XR Origin
  5. Enter Play mode (requires VR headset or simulator)

Documentation

| Document                   | Description                      |
|----------------------------|----------------------------------|
| QUICKSTART_API.md          | Get the API running in 3 steps   |
| core/api/README.md         | API usage guide with examples    |
| docs/api/FASTAPI_DESIGN.md | Complete API design document     |
| IMPLEMENTATION_SUMMARY.md  | Implementation details           |
| REFACTORING_SUMMARY.md     | Code refactoring documentation   |

Performance

| Metric            | Value         |
|-------------------|---------------|
| API Latency (P50) | ~2-5 ms       |
| API Latency (P99) | <50 ms        |
| Model Size (KNN)  | 501 KB        |
| Model Size (RF)   | 2.1 MB        |
| Accuracy (KNN)    | ~92%          |
| Accuracy (RF)     | ~96%          |
| Throughput        | 200-300 req/s |

Future Enhancements (V2 Roadmap)

API Server

  • WebSocket support for streaming predictions
  • API authentication (JWT tokens, API keys)
  • Rate limiting and request throttling
  • Model versioning and A/B testing
  • Prometheus metrics and monitoring

ML Pipeline

  • More Gestures: Swipe, rotate, peace sign
  • Continuous Tracking: LSTM for temporal patterns
  • Multi-Hand: Simultaneous two-hand gestures
  • Model Improvements: Deep learning models, data augmentation

VR Integration

  • Haptic Feedback: Vibration via controller-free haptics
  • Production VR: Migrate to Oculus/Meta native hand tracking
  • Multi-platform: Support for Quest, Pico, Vive

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

MIT License - see LICENSE file for details.
