YOLO-3D: Monocular 3D Object Detection System

YOLO-3D is an integrated computer vision system that combines YOLOv11 for object detection, Depth Anything v2 for depth estimation, and Segment Anything Model (SAM 2.0) for instance segmentation. This creates a comprehensive 3D scene understanding pipeline from a single camera input.

Features

Multi-model Integration: Combines state-of-the-art models for detection, depth, and segmentation
3D Visualization: Renders 3D bounding boxes with accurate depth perception
Bird's Eye View: Top-down spatial visualization of detected objects
Instance Segmentation: Precise object masks using SAM 2.0
Object Tracking: Track objects across video frames
Modern GUI: User-friendly interface for video processing and visualization
Multi-view Display: View original, detection, depth, segmentation, and 3D results simultaneously

Requirements

Python 3.8+
PyTorch 2.0+
CUDA compatible GPU recommended (though CPU mode works)
Other dependencies listed in requirements.txt

Installation

Clone this repository:

https://github.com/Pavankunchala/Yolo-3d-GUI.git
cd YOLO-3d-GUI

Install dependencies:
```
pip install -r requirements.txt
```

Models will be downloaded automatically on first run or you can download them manually:

# YOLOv11 Nano model
wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov11n.pt

# SAM 2.0 Base model
wget https://github.com/ultralytics/assets/releases/download/v8.1.0/sam2_b.pt

Usage

GUI Interface

The easiest way to use YOLO-3D is through the GUI:

python yolo3d_gui.py

This will open the interface where you can:

Select video files or webcam input
Choose models and adjust parameters
Toggle different features
View all visualization modes
Save processed video or individual frames

Command Line

For batch processing or integration into other pipelines, you can use the command line:

python run.py --source path/to/video.mp4 --output output.mp4 --yolo nano --depth small --sam sam2_b.pt

API Usage

You can also use the components in your own Python code:

from detection_model import ObjectDetector
from depth_model import DepthEstimator
from segmentation_model import SegmentationModel
from bbox3d_utils import BBox3DEstimator, BirdEyeView

# Initialize models
detector = ObjectDetector(model_size="nano")
depth_estimator = DepthEstimator(model_size="small")
segmenter = SegmentationModel(model_name="sam2_b.pt")
bbox3d_estimator = BBox3DEstimator()
bev = BirdEyeView(scale=60, size=(300, 300))

# Process a frame
frame = cv2.imread("example.jpg")
detections = detector.detect(frame)
depth_map = depth_estimator.estimate_depth(frame)
segmentation = segmenter.segment_with_boxes(frame, [d[0] for d in detections])

# Create 3D visualization
for detection, seg_result in zip(detections, segmentation):
    bbox, score, class_id, obj_id = detection
    depth_value = depth_estimator.get_depth_in_region(depth_map, bbox)
    box_3d = {'bbox_2d': bbox, 'depth_value': depth_value, 'class_name': detector.get_class_names()[class_id], 'mask': seg_result['mask']}
    frame = bbox3d_estimator.draw_box_3d(frame, box_3d)

Project Structure

YOLO-3D/
│
├── yolo3d_gui.py           # GUI application
├── run.py                  # Command line entry point
├── detection_model.py      # YOLOv11 detector implementation
├── depth_model.py          # Depth Anything v2 implementation
├── segmentation_model.py   # SAM 2.0 implementation
├── bbox3d_utils.py         # 3D bounding box and BEV utilities
├── requirements.txt        # Project dependencies
├── assets/                 # Demo images/videos
└── README.md               # This file

Performance Optimization

For better performance:

Use smaller models (nano/small) for real-time applications
Enable frame skipping for segmentation (every 2-3 frames)
Process at a lower resolution (640x480) for faster inference
Use CUDA GPU for acceleration (10-20x faster than CPU)
Consider batch processing for offline video analysis

Acknowledgments

Yolo-3d for the initial scripts
Ultralytics for YOLOv11 and SAM implementations
Depth Anything for the depth estimation model
PyQt5 for the GUI framework

License

This project is released under the MIT License. See the LICENSE file for details.

If you find this project useful, consider giving it a star! For issues, feature requests, or contributions, please open an issue or pull request.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

YOLO-3D: Monocular 3D Object Detection System

Features

Requirements

Installation

Usage

GUI Interface

Command Line

API Usage

Project Structure

Performance Optimization

Acknowledgments

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
assets		assets
README.md		README.md
bbox3d_utils.py		bbox3d_utils.py
depth_model.py		depth_model.py
detection_model.py		detection_model.py
load_camera_params.py		load_camera_params.py
requirements.txt		requirements.txt
run.py		run.py
segmentation_model.py		segmentation_model.py
yolo3d_gui.py		yolo3d_gui.py

Folders and files

Latest commit

History

Repository files navigation

YOLO-3D: Monocular 3D Object Detection System

Features

Requirements

Installation

Usage

GUI Interface

Command Line

API Usage

Project Structure

Performance Optimization

Acknowledgments

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages