Real-time controller-free VR interactions using hand-gesture recognition.
SOMO enables intuitive VR interactions without physical controllers by combining lightweight ML gesture classification with Unity XR. Using hand-tracking data from MediaPipe (or Oculus/Meta/Ultraleap), the system recognizes five core gestures and maps them to VR actions like menu toggling, object manipulation, and confirmation.
Key Features:
- 5 gesture classes: Open hand, Fist, Pinch, Point, Thumbs-up
- Real-time classification with < 50ms latency
- REST API for remote inference (FastAPI)
- Interaction mechanics: radial menu, grab/rotate/scale objects
- Hardware-agnostic ML model (31-feature vector)
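The 31-feature vector is hardware-agnostic because it is derived purely from landmark geometry rather than any device-specific signal. The actual feature recipe lives in the training scripts; the sketch below is one *hypothetical* way to build a 31-dimensional vector from 21 (x, y, z) landmarks (20 wrist distances + 10 fingertip pair distances + 1 scale term is an assumption, not SOMO's actual recipe):

```python
import numpy as np
from itertools import combinations

FINGERTIPS = [4, 8, 12, 16, 20]  # MediaPipe tip indices: thumb..pinky

def landmarks_to_features(landmarks: np.ndarray) -> np.ndarray:
    """Map 21 (x, y, z) hand landmarks to a 31-dim feature vector.

    Hypothetical recipe: 20 wrist-to-landmark distances, 10 pairwise
    fingertip distances, and 1 hand-span term, normalized so the vector
    is independent of camera distance and hand size.
    """
    assert landmarks.shape == (21, 3)
    wrist = landmarks[0]
    # 20 distances from the wrist to every other landmark
    wrist_dists = np.linalg.norm(landmarks[1:] - wrist, axis=1)
    # 10 pairwise distances between the five fingertips
    tip_dists = np.array([
        np.linalg.norm(landmarks[a] - landmarks[b])
        for a, b in combinations(FINGERTIPS, 2)
    ])
    # normalize by hand span (wrist -> middle fingertip) for scale invariance
    span = np.linalg.norm(landmarks[12] - wrist) or 1.0
    return np.concatenate([wrist_dists / span, tip_dists / span, [span]])
```

Whatever the exact features, the key property is that any tracker producing 21 landmarks (MediaPipe, Meta, Ultraleap) can feed the same model.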
```
somo/
├── core/
│   ├── api/                  # FastAPI REST API server
│   │   ├── main.py           # FastAPI application
│   │   ├── config.py         # API configuration
│   │   ├── models/           # Pydantic schemas
│   │   ├── routers/          # API endpoints
│   │   ├── services/         # Business logic
│   │   └── utils/            # Helpers and exceptions
│   └── ml/
│       ├── data/             # Datasets
│       ├── models/           # Trained models
│       └── scripts/          # Training & feature extraction
│
├── unity-vr/                 # Unity XR project
│   ├── Assets/
│   │   ├── Scenes/           # MainScene (XR Origin + interaction objects)
│   │   ├── Scripts/          # C# gesture classifier + interaction logic
│   │   └── Prefabs/          # Radial menu, interactable cube
│   ├── Packages/             # XR Interaction Toolkit, Barracuda
│   └── ProjectSettings/
│
├── artifacts/                # Model artifacts for deployment
│   └── model/                # Production models (.pkl)
│
├── docs/                     # Documentation
│   ├── api/                  # API design documents
│   └── ml/                   # ML pipeline docs
│
├── run_api.py                # API server launcher
└── test_api_client.py        # API test client
```
| Component | Technology |
|---|---|
| Hand Tracking | MediaPipe Hands (webcam-based, 21 landmarks) |
| ML Framework | scikit-learn (kNN / Random Forest baseline) |
| Model Export | ONNX (for Unity Barracuda inference) |
| API Server | FastAPI + Uvicorn (Python REST API) |
| VR Engine | Unity 2022.3 LTS + XR Interaction Toolkit |
| Runtime Inference | Unity Barracuda (ONNX runtime) |
| Gesture | Action | Detection Logic |
|---|---|---|
| Open Hand | Toggle radial menu | All fingers extended |
| Fist | Idle / No action | All fingers curled |
| Pinch | Grab/release cube | Thumb-index distance < threshold |
| Point | Menu hover | Index extended, others curled |
| Thumbs-up | Confirm selection | Thumb extended, others curled |
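The detection-logic column can also be expressed as plain geometric heuristics, which is useful as a fallback or a sanity check on the ML output. A minimal sketch of the pinch rule from the table (the 0.05 threshold is an assumed value, not SOMO's tuned one):

```python
import math

# Standard MediaPipe Hands landmark indices
THUMB_TIP, INDEX_TIP = 4, 8

def is_pinch(landmarks, threshold=0.05):
    """Return True when the thumb tip and index tip are closer than `threshold`.

    `landmarks` is a sequence of 21 (x, y, z) points in MediaPipe's
    normalized coordinates; `threshold` here is an illustrative value.
    """
    return math.dist(landmarks[THUMB_TIP], landmarks[INDEX_TIP]) < threshold
```

The Point and Thumbs-up rows follow the same pattern: compare each fingertip's position against its knuckle to decide extended vs. curled.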
```bash
# Python environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install ML dependencies
pip install -r requirements.txt

# Install API dependencies (optional, for REST API server)
pip install -r requirements-api.txt

# Unity (for VR integration)
# Install Unity 2022.3 LTS via Unity Hub
# Open unity-vr/ folder as a Unity project
```

The fastest way to use SOMO is through the REST API:
```bash
# 1. Start the API server
python run_api.py

# 2. Test it (in another terminal)
python test_api_client.py

# 3. Access interactive docs
# Open http://localhost:8000/docs in your browser
```

API Endpoints:
- `GET /health` - Server health check
- `POST /predict/features` - Predict from a 31-dimensional feature vector
- `POST /predict/landmarks` - Predict from 21 MediaPipe landmarks
- `POST /predict/batch` - Batch predictions (up to 100 samples)
- `GET /models/info` - Model information
Example API Call:

```python
import requests

response = requests.post(
    "http://localhost:8000/predict/features",
    json={
        "features": [0.15] * 31,  # 31 feature values
        "model": "rf"
    }
)
result = response.json()
print(f"Gesture: {result['gesture']}, Confidence: {result['confidence']:.2%}")
```

📖 Full API Documentation: core/api/README.md
🚀 Quick Start Guide: QUICKSTART_API.md
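Because `POST /predict/batch` caps a request at 100 samples, a client with more data needs to split it into chunks. A hedged sketch of a client-side helper (the `samples`/`model` field names and `predictions` response key are assumptions — check the Pydantic schemas in `core/api/models/` for the actual contract):

```python
BATCH_LIMIT = 100  # documented cap on /predict/batch

def chunked(rows, size=BATCH_LIMIT):
    """Yield successive slices of at most `size` rows."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def predict_all(feature_rows, model="rf", api="http://localhost:8000"):
    """POST feature vectors to /predict/batch in chunks of <= 100.

    Payload field names and response shape are assumptions about the
    API schema, not taken from the SOMO source.
    """
    import requests  # local import keeps the chunking helper dependency-free
    results = []
    for chunk in chunked(feature_rows):
        resp = requests.post(f"{api}/predict/batch",
                             json={"samples": list(chunk), "model": model})
        resp.raise_for_status()
        results.extend(resp.json()["predictions"])
    return results
```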
```bash
python ml/scripts/record_gestures.py --gesture open_hand --samples 200 --camera 0

# Train both KNN and Random Forest
python ml/scripts/train_classifier.py \
    --data ml/data/processed/gestures_merged.csv \
    --model both \
    --output-dir artifacts/model

# Test on dataset
python ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl

# Test with live webcam
python ml/scripts/test_model.py --model artifacts/model/gesture_classifier_rf.pkl --live
```

- Open `unity-vr/` in Unity Hub
- Load `Assets/Scenes/MainScene.unity`
- Import the trained ONNX model to `Assets/Models/`
- Attach `GestureClassifier.cs` to the XR Origin
- Enter Play mode (requires a VR headset or simulator)
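Under the hood, `train_classifier.py` amounts to a standard scikit-learn fit-and-pickle loop over the two baselines. A rough sketch of the core (the real script also handles CLI arguments, dataset merging, and metrics; the data layout here is assumed):

```python
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def train_models(X, y):
    """Fit KNN and Random Forest baselines on 31-feature rows.

    Returns {name: (fitted_model, holdout_accuracy)}. Hyperparameters
    here are common defaults, not necessarily SOMO's tuned values.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y)
    models = {
        "knn": KNeighborsClassifier(n_neighbors=5),
        "rf": RandomForestClassifier(n_estimators=100, random_state=42),
    }
    return {name: (m.fit(X_tr, y_tr), m.score(X_te, y_te))
            for name, m in models.items()}

def save_model(model, path):
    """Serialize a fitted model to .pkl, matching artifacts/model/."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

The `.pkl` artifacts serve the REST API directly; the ONNX export step converts them for Barracuda inference inside Unity.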
| Document | Description |
|---|---|
| QUICKSTART_API.md | Get API running in 3 steps |
| core/api/README.md | API usage guide with examples |
| docs/api/FASTAPI_DESIGN.md | Complete API design document |
| IMPLEMENTATION_SUMMARY.md | Implementation details |
| REFACTORING_SUMMARY.md | Code refactoring documentation |
| Metric | Value |
|---|---|
| API Latency (P50) | ~2-5ms |
| API Latency (P99) | <50ms |
| Model Size (KNN) | 501 KB |
| Model Size (RF) | 2.1 MB |
| Accuracy (KNN) | ~92% |
| Accuracy (RF) | ~96% |
| Throughput | 200-300 req/s |
- WebSocket support for streaming predictions
- API authentication (JWT tokens, API keys)
- Rate limiting and request throttling
- Model versioning and A/B testing
- Prometheus metrics and monitoring
- More Gestures: Swipe, rotate, peace sign
- Continuous Tracking: LSTM for temporal patterns
- Multi-Hand: Simultaneous two-hand gestures
- Model Improvements: Deep learning models, data augmentation
- Haptic Feedback: Vibration via controller-free haptics
- Production VR: Migrate to Oculus/Meta native hand tracking
- Multi-platform: Support for Quest, Pico, Vive
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
MIT License - see LICENSE file for details.