A multi-camera 3D scene visualization platform for monitoring battery and screw assembly processes. Integrates YOLOv11 for segmentation, DOPE for 6D object pose estimation, and VGGT for real-time 3D scene reconstruction using synchronized video recordings.
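A DOPE pose is expressed relative to the camera that observed the object, so placing it in a shared multi-camera scene means chaining it with that camera's extrinsics. The sketch below assumes each camera's calibration can be written as a 4x4 camera-to-world matrix `T_world_cam`, and that the DOPE pose has already been converted to a 4x4 matrix `T_cam_obj`. All function names are hypothetical and not taken from the repository. If the calibration stores world-to-camera extrinsics instead, invert them first with `invert_rigid`.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul4(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Multiply two 4x4 homogeneous transforms (a @ b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Sequence[Sequence[float]]) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p] without a general matrix inverse."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # transpose of the 3x3 rotation
    p = [t[i][3] for i in range(3)]                         # translation column
    q = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def object_in_world(t_world_cam: Sequence[Sequence[float]],
                    t_cam_obj: Sequence[Sequence[float]]) -> Matrix:
    """Chain camera-to-world extrinsics with an object pose given in the camera frame."""
    return matmul4(t_world_cam, t_cam_obj)
```

For example, a camera translated 1 unit along world x (no rotation) that sees an object 2 units along its optical axis places the object at world position (1, 0, 2). Using the closed-form rigid inverse avoids a general 4x4 inversion and keeps the rotation block orthonormal.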
conda create -n HAUP python=3.10 -y
conda activate HAUP

For macOS (Apple Silicon - M1/M2/M3):
conda install pytorch::pytorch torchvision torchaudio -c pytorch -y

For NVIDIA GPU (CUDA 12.1):
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install -r requirements.txt
python 3d_scene/3dscene.py

Open your browser at http://localhost:8085.
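To confirm that the PyTorch build installed above matches your hardware, a quick check like the following can help. It is a hypothetical helper, not part of the repository, and the README does not say how `3dscene.py` itself selects a device.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" based on what the installed PyTorch exposes."""
    try:
        import torch
    except ImportError:
        # The PyTorch install step has not been completed in this environment.
        return "torch-not-installed"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch releases, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch device: {pick_device()}")
```

With the conda build on an Apple Silicon Mac this would typically report `mps`; with the CUDA 12.1 wheels on a machine with a working NVIDIA driver, `cuda`. A result of `cpu` on a GPU machine usually means the wrong variant was installed.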
├── 3d_scene/ # Main application
│   ├── 3dscene.py # Backend server (aiohttp)
│   ├── web_interface.html # 3D visualization frontend (Three.js)
│   ├── screw_sequence_tracker.py # Screw sequence state machine
│   ├── sequence_from_distance_tool.py # CLI monitoring tool
│   ├── distance_tool_screw.py # Distance API client
│   ├── dope_inference.py # DOPE 6D pose estimation
│   ├── yolo_inference.py # YOLOv11 segmentation
│   ├── vggt_inference.py # 3D point cloud reconstruction
│   ├── battery_fsm_module.py # Battery tracking state machine (YOLO-based)
│   └── config/ # Camera calibrations & DOPE config
│
├── data/
│   ├── recording_1-12/ # Multi-camera recordings (8 cameras each)
│   ├── scanned_objects/ # 3D models (case, e-screwdriver)
│   └── cams_calibrations.yml # Camera calibration data
│
├── weights/ # Model weights
│   ├── dope_tool.pth # DOPE weights for screwdriver
│   ├── dope_case.pth # DOPE weights for case
│   └── model.pt # YOLOv11 finetuned weights
│
├── frameworks/ # External frameworks
│   ├── dope/ # DOPE implementation
│   └── vggt/ # VGGT point cloud
│
└── yolov11_finetuned/ # YOLOv11 training & testing
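Judging by the file names, `screw_sequence_tracker.py` is a state machine driven by tool-to-screw distances (`distance_tool_screw.py`). Its real states, thresholds, and inputs are not documented here, so the following is only a hypothetical sketch of that idea: the class name, the 2 cm threshold, the dwell count, and the state labels are all assumptions. A screw counts as fastened once the tool stays within the threshold for `dwell_frames` consecutive frames, and a screw visited out of order is recorded as an error.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ScrewSequenceSketch:
    """Hypothetical tracker: screws must be fastened in `expected_order`."""

    expected_order: List[str]
    threshold_m: float = 0.02  # assumed "tool is at the screw" distance, in meters
    dwell_frames: int = 3      # consecutive close frames needed to count a screw
    index: int = field(default=0, init=False)  # position of the next expected screw
    errors: List[str] = field(default_factory=list, init=False)
    _near: Optional[str] = field(default=None, init=False)
    _count: int = field(default=0, init=False)

    @property
    def state(self) -> str:
        if self.index >= len(self.expected_order):
            return "COMPLETE"
        return f"WAITING_FOR_{self.expected_order[self.index]}"

    def update(self, distances: Dict[str, float]) -> str:
        """Consume one frame of tool-to-screw distances and return a state label."""
        if self.state == "COMPLETE":
            return self.state
        # Closest screw in this frame (None if no distances were given).
        screw, dist = min(distances.items(), key=lambda kv: kv[1],
                          default=(None, float("inf")))
        if screw is None or dist > self.threshold_m:
            # Tool is away from every screw: reset the dwell counter.
            self._near, self._count = None, 0
            return self.state
        if screw != self._near:
            # Tool moved onto a different screw: restart the dwell count.
            self._near, self._count = screw, 0
        self._count += 1
        if self._count < self.dwell_frames:
            return f"APPROACHING_{screw}"
        if self._count > self.dwell_frames:
            # This visit was already counted; wait for the tool to move away.
            return self.state
        # Dwell reached exactly once per visit: accept or flag the screw.
        expected = self.expected_order[self.index]
        if screw != expected:
            self.errors.append(f"expected {expected}, got {screw}")
            return f"OUT_OF_ORDER_{screw}"
        self.index += 1
        return self.state
```

Requiring a dwell period filters out frames where the tool merely passes over a screw, and counting each visit once prevents a single long fastening action from being registered repeatedly.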
This project is licensed under the MIT License - see the LICENSE file for details.
This project was developed as part of the Practical Laboratory: Human Activity Understanding at the Technical University of Munich (TUM), Chair of Media Technology, supervised by Prof. Dr.-Ing. Eckehard Steinbach.
- Lucía Balsa Picado (luciabalsa)
- Ioannis Papadongonas (ipapadongonas)
This project builds on the following open-source work:
- DOPE (Deep Object Pose Estimation) - 6D pose estimation of known objects: https://github.com/NVlabs/Deep_Object_Pose
- VGGT (Visual Geometry Grounded Transformer) - 3D scene reconstruction: https://vgg-t.github.io/
- YOLO (You Only Look Once, by Ultralytics) - real-time object detection and segmentation: https://github.com/ultralytics/ultralytics

