
jimezsa/Deep_Assembly_Lines


Human Activity Understanding - Deep Assembly Lines


A multi-camera 3D scene visualization platform for monitoring battery and screw assembly processes. It integrates YOLOv11 for segmentation, DOPE for 6D object pose estimation, and VGGT for real-time 3D scene reconstruction from synchronized video recordings.


Installation

1. Create Conda Environment

conda create -n HAUP python=3.10 -y
conda activate HAUP

2. Install PyTorch

For macOS (Apple Silicon - M1/M2/M3):

conda install pytorch::pytorch torchvision torchaudio -c pytorch -y

For NVIDIA GPU (CUDA 12.1):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
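To confirm that the install picked up the intended backend, a small check like the one below can help. This is a minimal sketch and not part of the repository: the helper name `pick_device` is hypothetical, and it only relies on the standard `torch.cuda.is_available()` and `torch.backends.mps.is_available()` calls.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', or 'unavailable' if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return "unavailable"  # PyTorch is not importable in this environment

    if torch.cuda.is_available():
        return "cuda"  # NVIDIA GPU with a working CUDA build

    # torch.backends.mps only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU

    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch backend: {pick_device()}")
```

If this prints `cpu` on a machine with a GPU, the PyTorch build most likely does not match the hardware and step 2 should be repeated with the matching command.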

3. Install Dependencies

pip install -r requirements.txt

Run

python 3d_scene/3dscene.py

Open your browser at http://localhost:8085
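If the page does not load, it can be useful to check from a second terminal whether the backend is answering at all. The snippet below is a hedged sketch using only the Python standard library; the function name `server_is_up` is hypothetical, and the only detail taken from this README is the address `http://localhost:8085`.

```python
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:8085", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url` without a server error."""
    # The server is local, so bypass any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # A 4xx reply still proves that the server process is running.
        return err.code < 500
    except OSError:
        # Connection refused, timeout, DNS failure (URLError is an OSError).
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```

A result of `False` usually means the server has not finished starting or is bound to a different port.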

📁 Project Structure

├── 3d_scene/                    # Main application
│   ├── 3dscene.py              # Backend server (aiohttp)
│   ├── web_interface.html      # 3D visualization frontend (Three.js)
│   ├── screw_sequence_tracker.py   # Screw sequence state machine
│   ├── sequence_from_distance_tool.py  # CLI monitoring tool
│   ├── distance_tool_screw.py  # Distance API client
│   ├── dope_inference.py       # DOPE 6D pose estimation
│   ├── yolo_inference.py       # YOLOv11 segmentation
│   ├── vggt_inference.py       # 3D point cloud reconstruction
│   ├── battery_fsm_module.py   # Battery tracking state machine (YOLO-based)
│   └── config/                 # Camera calibrations & DOPE config
│
├── data/
│   ├── recording_1-12/         # Multi-camera recordings (8 cameras each)
│   ├── scanned_objects/        # 3D models (case, e-screwdriver)
│   └── cams_calibrations.yml   # Camera calibration data
│
├── weights/                    # Model weights
│   ├── dope_tool.pth          # DOPE weights for screwdriver
│   ├── dope_case.pth          # DOPE weights for case
│   └── model.pt               # YOLOv11 finetuned weights
│
├── frameworks/                 # External frameworks
│   ├── dope/                  # DOPE implementation
│   └── vggt/                  # VGGT point cloud
│
└── yolov11_finetuned/         # YOLOv11 training & testing
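Before the first run, it may help to verify that the weights and calibration files sit where the tree above places them. The following is an illustrative sketch, not a script shipped with the repository: `EXPECTED` and `missing_paths` are hypothetical names, and the paths are copied from the layout above on the assumption that the check runs from the repository root.

```python
from pathlib import Path

# Paths copied from the project tree; adjust if your checkout differs.
EXPECTED = (
    "3d_scene/3dscene.py",
    "3d_scene/config",
    "data/cams_calibrations.yml",
    "weights/dope_tool.pth",
    "weights/dope_case.pth",
    "weights/model.pt",
)


def missing_paths(root: str = ".") -> list[str]:
    """Return the expected files or folders that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are in place.")
```

The list is kept deliberately short; recordings and scanned objects are left out because their exact file names are not given here.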

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This project was developed as part of the Practical Laboratory: Human Activity Understanding at the Technical University of Munich (TUM), Chair of Media Technology, supervised by Prof. Dr.-Ing. Eckehard Steinbach.

Huge Thanks to My Teammates

Research Works Used

About

🛠️ 3D-Assembly-Vision: Real-time 3D monitoring for industrial assembly. Powered by YOLOv11, DOPE, VGGT and LSTM for synchronized 6D pose estimation and scene reconstruction.
