Real-time controller-free VR interactions using hand-gesture recognition.
SOMO enables intuitive VR interactions without physical controllers by combining lightweight ML gesture classification with Unity XR. Using hand-tracking data from MediaPipe (or Oculus/Meta/Ultraleap), the system recognizes five core gestures and maps them to VR actions like menu toggling, object manipulation, and confirmation.
Key Features:
- 5 gesture classes: Open hand, Fist, Pinch, Point, Thumbs-up
- Real-time classification with < 50ms latency
- REST API for remote inference (FastAPI)
- Interaction mechanics: radial menu, grab/rotate/scale objects
- Hardware-agnostic ML model (31-feature vector)
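The 31-feature vector is hardware-agnostic because it is derived purely from landmark geometry rather than any device-specific signal. The actual feature recipe lives in the training scripts; the sketch below is one *hypothetical* way to build a 31-dimensional vector from 21 (x, y, z) landmarks (20 wrist distances + 10 fingertip pair distances + 1 scale term is an assumption, not SOMO's actual recipe):

```python
import numpy as np
from itertools import combinations

FINGERTIPS = [4, 8, 12, 16, 20]  # MediaPipe tip indices: thumb..pinky

def landmarks_to_features(landmarks: np.ndarray) -> np.ndarray:
    """Map 21 (x, y, z) hand landmarks to a 31-dim feature vector.

    Hypothetical recipe: 20 wrist-to-landmark distances, 10 pairwise
    fingertip distances, and 1 hand-span term, normalized so the vector
    is independent of camera distance and hand size.
    """
    assert landmarks.shape == (21, 3)
    wrist = landmarks[0]
    # 20 distances from the wrist to every other landmark
    wrist_dists = np.linalg.norm(landmarks[1:] - wrist, axis=1)
    # 10 pairwise distances between the five fingertips
    tip_dists = np.array([
        np.linalg.norm(landmarks[a] - landmarks[b])
        for a, b in combinations(FINGERTIPS, 2)
    ])
    # normalize by hand span (wrist -> middle fingertip) for scale invariance
    span = np.linalg.norm(landmarks[12] - wrist) or 1.0
    return np.concatenate([wrist_dists / span, tip_dists / span, [span]])
```

Whatever the exact features, the key property is that any tracker producing 21 landmarks (MediaPipe, Meta, Ultraleap) can feed the same model.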
```
somo/
├── core/
│   ├── api/                  # FastAPI REST API server
│   │   ├── main.py           # FastAPI application
│   │   ├── config.py         # API configuration
│   │   ├── models/           # Pydantic schemas
│   │   ├── routers/          # API endpoints
│   │   ├── services/         # Business logic
│   │   └── utils/            # Helpers and exceptions
│   └── ml/
│       ├── data/             # Datasets
│       ├── models/           # Trained models
│       └── scripts/          # Training & feature extraction
│
├── unity-vr/                 # Unity XR project
│   ├── Assets/
│   │   ├── Scenes/           # MainScene (XR Origin + interaction objects)
│   │   ├── Scripts/          # C# gesture classifier + interaction logic
│   │   └── Prefabs/          # Radial menu, interactable cube
│   ├── Packages/             # XR Interaction Toolkit, Barracuda
│   └── ProjectSettings/
│
├── artifacts/                # Model artifacts for deployment
│   └── model/                # Production models (.pkl)
│
├── docs/                     # Documentation
│   ├── api/                  # API design documents
│   └── ml/                   # ML pipeline docs
│
├── run_api.py                # API server launcher
└── test_api_client.py        # API test client
```
| Component | Technology |
|---|---|
| Hand Tracking | MediaPipe Hands (webcam-based, 21 landmarks) |
| ML Framework | scikit-learn (kNN / Random Forest baseline) |
| Model Export | ONNX (for Unity Barracuda inference) |
| API Server | FastAPI + Uvicorn (Python REST API) |
| VR Engine | Unity 2022.3 LTS + XR Interaction Toolkit |
| Runtime Inference | Unity Barracuda (ONNX runtime) |
| Gesture | Action | Detection Logic |
|---|---|---|
| Open Hand | Toggle radial menu | All fingers extended |
| Fist | Idle / No action | All fingers curled |
| Pinch | Grab/release cube | Thumb-index distance < threshold |
| Point | Menu hover | Index extended, others curled |
| Thumbs-up | Confirm selection | Thumb extended, others curled |
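The detection-logic column can also be expressed as plain geometric heuristics, which is useful as a fallback or a sanity check on the ML output. A minimal sketch of the pinch rule from the table (the 0.05 threshold is an assumed value, not SOMO's tuned one):

```python
import math

# Standard MediaPipe Hands landmark indices
THUMB_TIP, INDEX_TIP = 4, 8

def is_pinch(landmarks, threshold=0.05):
    """Return True when the thumb tip and index tip are closer than `threshold`.

    `landmarks` is a sequence of 21 (x, y, z) points in MediaPipe's
    normalized coordinates; `threshold` here is an illustrative value.
    """
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) < threshold
```

The Point and Thumbs-up rows follow the same pattern: compare each fingertip's position against its knuckle to decide extended vs. curled.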
```bash
# Python environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install ML dependencies
pip install -r requirements.txt

# Install API dependencies (optional, for REST API server)
pip install -r requirements-api.txt

# Unity (for VR integration)
# Install Unity 2022.3 LTS via Unity Hub
# Open unity-vr/ folder as a Unity project
```

The fastest way to use SOMO is through the REST API:
```bash
# 1. Start the API server
python run_api.py

# 2. Test it (in another terminal)
python test_api_client.py

# 3. Access interactive docs
# Open http://localhost:8000/docs in your browser
```

API Endpoints:
- `GET /health` - Server health check
- `POST /predict/features` - Predict from a 31-dimensional feature vector
- `POST /predict/landmarks` - Predict from 21 MediaPipe landmarks
- `POST /predict/batch` - Batch predictions (up to 100 samples)
- `GET /models/info` - Model information
Example API Call:

```python
import requests

response = requests.post(
    "http://localhost:8000/predict/features",
    json={
        "features": [0.15] * 31,  # 31 feature values
        "model": "rf"
    }
)
result = response.json()
print(f"Gesture: {result['gesture']}, Confidence: {result['confidence']:.2%}")
```

📖 Full API Documentation: core/api/README.md
🚀 Quick Start Guide: QUICKSTART_API.md
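Because `POST /predict/batch` caps a request at 100 samples, a client with more data needs to split it into chunks. A hedged sketch of a client-side helper (the `samples`/`model` field names and `predictions` response key are assumptions — check the Pydantic schemas in `core/api/models/` for the actual contract):

```python
BATCH_LIMIT = 100  # documented cap on /predict/batch

def chunked(rows, size=BATCH_LIMIT):
    """Yield successive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def predict_all(feature_rows, model="rf", api="http://localhost:8000"):
    """POST feature vectors to /predict/batch in chunks of <= 100.

    Payload field names and response shape are assumptions about the
    API schema, not taken from the SOMO source.
    """
    import requests  # local import keeps the chunking helper dependency-free
    results = []
    for chunk in chunked(feature_rows):
        resp = requests.post(f"{api}/predict/batch",
                             json={"samples": list(chunk), "model": model})
        resp.raise_for_status()
        results.extend(resp.json()["predictions"])
    return results
```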
```bash
python ml/scripts/record_gestures.py --gesture open_hand --samples 200 --camera 0

# Train both KNN and Random Forest
python ml/scripts/train_classifier.py \
    --data ml/data/processed/gestures_merged.csv \
    --model both \
    --output-dir artifacts/model

# Test on dataset
python ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl

# Test with live webcam
python ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl --live
```

- Open `unity-vr/` in Unity Hub
- Load `Assets/Scenes/MainScene.unity`
- Import the trained ONNX model to `Assets/Models/`
- Attach `GestureClassifier.cs` to the XR Origin
- Enter Play mode (requires a VR headset or simulator)
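Under the hood, `train_classifier.py` amounts to a standard scikit-learn fit-and-pickle loop over the two baselines. A rough sketch of the core (the real script also handles CLI arguments, dataset merging, and metrics; the data layout here is assumed):

```python
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def train_models(X, y):
    """Fit KNN and Random Forest baselines on 31-feature rows.

    Returns {name: (fitted_model, holdout_accuracy)}. Hyperparameters
    here are common defaults, not necessarily SOMO's tuned values.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    models = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "rf": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    return {name: (m.fit(X_tr, y_tr), m.score(X_te, y_te))
            for name, m in models.items()}

def save_model(model, path):
    """Serialize a fitted model to .pkl, matching artifacts/model/."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

The `.pkl` artifacts serve the REST API directly; the ONNX export step converts them for Barracuda inference inside Unity.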
| Document | Description |
|---|---|
| QUICKSTART_API.md | Get API running in 3 steps |
| core/api/README.md | API usage guide with examples |
| docs/api/FASTAPI_DESIGN.md | Complete API design document |
| IMPLEMENTATION_SUMMARY.md | Implementation details |
| REFACTORING_SUMMARY.md | Code refactoring documentation |
| Metric | Value |
|---|---|
| API Latency (P50) | ~2-5ms |
| API Latency (P99) | <50ms |
| Model Size (KNN) | 501 KB |
| Model Size (RF) | 2.1 MB |
| Accuracy (KNN) | ~92% |
| Accuracy (RF) | ~96% |
| Throughput | 200-300 req/s |
- WebSocket support for streaming predictions
- API authentication (JWT tokens, API keys)
- Rate limiting and request throttling
- Model versioning and A/B testing
- Prometheus metrics and monitoring
- More Gestures: Swipe, rotate, peace sign
- Continuous Tracking: LSTM for temporal patterns
- Multi-Hand: Simultaneous two-hand gestures
- Model Improvements: Deep learning models, data augmentation
- Haptic Feedback: Vibration via controller-free haptics
- Production VR: Migrate to Oculus/Meta native hand tracking
- Multi-platform: Support for Quest, Pico, Vive
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details.