YOLO-3D is an integrated computer vision system that combines YOLOv11 for object detection, Depth Anything v2 for depth estimation, and Segment Anything Model (SAM 2.0) for instance segmentation. This creates a comprehensive 3D scene understanding pipeline from a single camera input.
- Multi-model Integration: Combines state-of-the-art models for detection, depth, and segmentation
- 3D Visualization: Renders 3D bounding boxes with accurate depth perception
- Bird's Eye View: Top-down spatial visualization of detected objects
- Instance Segmentation: Precise object masks using SAM 2.0
- Object Tracking: Track objects across video frames
- Modern GUI: User-friendly interface for video processing and visualization
- Multi-view Display: View original, detection, depth, segmentation, and 3D results simultaneously
- Python 3.8+
- PyTorch 2.0+
- CUDA-compatible GPU recommended (CPU mode also works)
- Other dependencies listed in requirements.txt
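To check which mode a machine will run in, a minimal sketch along these lines can help. `torch.cuda.is_available()` is a standard PyTorch call; the helper name `pick_device` is hypothetical, and how (or whether) the project's classes accept a device string is not documented here, so treat that part as an assumption.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # imported lazily so the check also runs before dependencies are installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Inference device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably a CPU-only one.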
- Clone this repository:
git clone https://github.com/Pavankunchala/Yolo-3d-GUI.git
cd Yolo-3d-GUI
- Install dependencies:
pip install -r requirements.txt
- Models will be downloaded automatically on first run, or you can download them manually:
# YOLOv11 Nano model
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov11n.pt
# SAM 2.0 Base model
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/sam2_b.pt
The easiest way to use YOLO-3D is through the GUI:
python yolo3d_gui.py
This will open the interface, where you can:
- Select video files or webcam input
- Choose models and adjust parameters
- Toggle different features
- View all visualization modes
- Save processed video or individual frames
For batch processing or integration into other pipelines, you can use the command line:
python run.py --source path/to/video.mp4 --output output.mp4 --yolo nano --depth small --sam sam2_b.pt
You can also use the components in your own Python code:
import cv2

from detection_model import ObjectDetector
from depth_model import DepthEstimator
from segmentation_model import SegmentationModel
from bbox3d_utils import BBox3DEstimator, BirdEyeView

# Initialize models
detector = ObjectDetector(model_size="nano")
depth_estimator = DepthEstimator(model_size="small")
segmenter = SegmentationModel(model_name="sam2_b.pt")
bbox3d_estimator = BBox3DEstimator()
bev = BirdEyeView(scale=60, size=(300, 300))

# Process a frame
frame = cv2.imread("example.jpg")
detections = detector.detect(frame)
depth_map = depth_estimator.estimate_depth(frame)
# Prompt the segmenter with the 2D box of each detection
segmentation = segmenter.segment_with_boxes(frame, [d[0] for d in detections])

# Create 3D visualization
for detection, seg_result in zip(detections, segmentation):
    bbox, score, class_id, obj_id = detection
    # Depth of the object, taken from the depth map inside its 2D box
    depth_value = depth_estimator.get_depth_in_region(depth_map, bbox)
    box_3d = {
        'bbox_2d': bbox,
        'depth_value': depth_value,
        'class_name': detector.get_class_names()[class_id],
        'mask': seg_result['mask'],
    }
    frame = bbox3d_estimator.draw_box_3d(frame, box_3d)
YOLO-3D/
│
├── yolo3d_gui.py # GUI application
├── run.py # Command line entry point
├── detection_model.py # YOLOv11 detector implementation
├── depth_model.py # Depth Anything v2 implementation
├── segmentation_model.py # SAM 2.0 implementation
├── bbox3d_utils.py # 3D bounding box and BEV utilities
├── requirements.txt # Project dependencies
├── assets/ # Demo images/videos
└── README.md # This file
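Because run.py is the command-line entry point, offline runs over a folder of videos can be scripted around it. The sketch below reuses only the flags from the command-line example (`--source`, `--output`, `--yolo`, `--depth`, `--sam`); the helper names, the `_3d` output suffix, and the `*.mp4` pattern are illustrative assumptions, not part of the project.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(source: str, output_dir: str) -> List[str]:
    """Build the run.py argument list for one input video."""
    src = Path(source)
    # Hypothetical naming scheme: clip.mp4 -> <output_dir>/clip_3d.mp4
    output = Path(output_dir) / f"{src.stem}_3d.mp4"
    return [
        sys.executable, "run.py",
        "--source", str(src),
        "--output", str(output),
        "--yolo", "nano",
        "--depth", "small",
        "--sam", "sam2_b.pt",
    ]


def process_folder(input_dir: str, output_dir: str, pattern: str = "*.mp4") -> None:
    """Run run.py once per matching video; stops at the first failing run."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for video in sorted(Path(input_dir).glob(pattern)):
        subprocess.run(build_command(str(video), output_dir), check=True)
```

Calling `process_folder("videos", "results")` from the repository root would process each video in turn. Each run reloads the models, so this trades throughput for simplicity.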
For better performance:
- Use smaller models (nano/small) for real-time applications
- Enable frame skipping for segmentation (every 2-3 frames)
- Process at a lower resolution (640x480) for faster inference
- Use a CUDA-capable GPU for acceleration (roughly 10-20x faster than CPU)
- Consider batch processing for offline video analysis
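Two of the tips above, frame skipping and lower processing resolution, can be sketched in plain Python. This is a minimal illustration rather than the project's own implementation: `segment_every_n` and `fit_within` are hypothetical helper names, and `segment` stands in for whatever segmentation call you use.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def segment_every_n(
    frames: Iterable[Frame],
    segment: Callable[[Frame], Result],
    interval: int = 3,
) -> Iterator[Tuple[Frame, Optional[Result], bool]]:
    """Yield (frame, result, fresh), calling `segment` only on every `interval`-th frame.

    On skipped frames the most recent result is reused and `fresh` is False.
    """
    if interval < 1:
        raise ValueError("interval must be >= 1")
    last: Optional[Result] = None
    for index, frame in enumerate(frames):
        fresh = index % interval == 0
        if fresh:
            last = segment(frame)
        yield frame, last, fresh


def fit_within(width: int, height: int, max_width: int = 640, max_height: int = 480) -> Tuple[int, int]:
    """Return a size no larger than the limits, preserving aspect ratio (never upscales)."""
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The size returned by `fit_within` can be passed to `cv2.resize` before inference. Reused masks lag behind fast-moving objects, so a smaller `interval` is a safer choice for scenes with a lot of motion.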
- Yolo-3d for the initial scripts
- Ultralytics for YOLOv11 and SAM implementations
- Depth Anything for the depth estimation model
- PyQt5 for the GUI framework
This project is released under the MIT License. See the LICENSE file for details.
If you find this project useful, consider giving it a star! For issues, feature requests, or contributions, please open an issue or pull request.
