Revolutionary Single-Class Object Detection
Ultra-fast YOLO architecture optimized exclusively for single-class detection
Quick Start • Benchmarks • Documentation • Examples
⚠️ WORK IN PROGRESS - YOLO-One is currently under active development. The architecture is functional with promising initial results. Star ⭐ this repository to stay updated on our progress!
While existing YOLO models excel at multi-class detection, most real-world applications only need to detect ONE type of object. YOLO-One is the first YOLO architecture designed from the ground up for single-class detection:
- Faster inference with an optimized single-class architecture
- 3x smaller model size (1.9MB vs 6.2MB YOLOv8n)
- Same accuracy for single-class tasks (target)
- Lower power consumption (mobile/edge optimized)
- Reduced memory usage (streamlined architecture)
- Multi-platform deployment ready
- Model Size: 2MB (vs 6.2MB YOLOv8n) - 3x smaller
- Parameters: 750K (vs ~3M YOLOv8n) - 4x fewer params
- Inference Time: 2.56ms (vs 9ms YOLOv8n) - ~3.5x faster
- Channels: 5 per scale (vs 85 for COCO) - single-class optimized
- Memory Format: Float32 (FP16/INT8 planned) - further optimization ready

| Platform | Resolution | Current | Target (Optimized) | Improvement Path |
|---|---|---|---|---|
| Development GPU | 640x640 | 140 FPS | 400+ FPS | +TensorRT +FP16 |
| Inference Time | 640x640 | 2.56ms | ~0.5ms | +Optimizations |
```text
# Performance Projection Pipeline
Current: 140 FPS   # PyTorch Float32
Step 1:  250+ FPS  # + torch.compile
Step 2:  350+ FPS  # + TensorRT
Step 3:  600+ FPS  # + FP16 precision
Mobile:  TBD       # Core ML / TFLite
```

- iOS: Core ML export (in development)
- Android: TensorFlow Lite export (in development)
- Edge Devices: ONNX export ready
- ARM Optimization: Native support planned
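FP16 precision is one of the projected optimization steps above. Here is a hedged sketch of half-precision inference; the one-layer `Conv2d` is a stand-in for the real YoloOne model, and the CPU branch only exists to keep the example runnable without a GPU:

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be YoloOne(model_size='nano')
model = nn.Conv2d(3, 5, kernel_size=1).eval()

if torch.cuda.is_available():
    # FP16 halves weight/activation memory and enables Tensor Core math
    model = model.half().cuda()
    x = torch.randn(1, 3, 640, 640, dtype=torch.float16, device="cuda")
else:
    # CPU fallback keeps the sketch runnable without a GPU
    x = torch.randn(1, 3, 640, 640)

with torch.no_grad():
    y = model(x)
print(tuple(y.shape))  # (1, 5, 640, 640)
```

Half precision generally needs a GPU to pay off; on CPU, FP16 convolutions are often slower than FP32.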
- Windows: ✅ PyTorch ready, TensorRT planned
- Linux: ✅ PyTorch ready, CUDA optimization
- macOS: ✅ PyTorch ready, Metal planned
- Docker: Container-ready architecture
- ONNX: Export capability implemented
- TensorRT: High-priority optimization
- Serverless: Lightweight deployment ready
```shell
pip install git+https://github.com/IAtrax/YOLO-One.git

# Run the architecture test script to validate the installation
python tests/test_architecture.py
```

```python
import torch
from yolo_one.models import YoloOne

# Create model
model = YoloOne(model_size='nano')

# Test inference
input_tensor = torch.randn(1, 3, 640, 640)
predictions = model(input_tensor)
print(f"Model size: {model.count_parameters():,} parameters")
# Output: Model size: 485,312 parameters
```

```python
# Traditional YOLO (multi-class)
output_channels = 4 + 1 + num_classes  # bbox + conf + classes
# Example: 4 + 1 + 80 = 85 channels for COCO

# YOLO-One (single-class)
output_channels = 4 + 1  # bbox + fused confidence
# Always: 5 channels only!
```

- No class probability computation (always 1 class)
- Simplified NMS (no per-class separation)
- Fused confidence (objectness + class probability)
- Streamlined post-processing pipeline
- Optimized loss function for single-class
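The simplified, class-agnostic NMS from the list above can be sketched as follows. This is a NumPy illustration of the idea (no per-class loop is needed when there is only one class), not the project's actual implementation:

```python
import numpy as np

def single_class_nms(boxes, scores, iou_thresh=0.5):
    """Class-agnostic NMS: keep highest-scoring boxes, drop overlapping ones.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    """
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the top box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # suppress heavily overlapping boxes
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(single_class_nms(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```

Because every detection shares the same class, the per-class partitioning step of standard multi-class NMS disappears entirely.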
| Metric | Current (Nano) | Target (Optimized) | YOLOv8n Baseline |
|---|---|---|---|
| Model Size | ✅ 800KB | 500KB | 6.2MB |
| Parameters | ✅ 750K | 255K | ~3M |
| Inference (GPU) | 140 FPS | 500+ FPS | ~110 FPS |
| Memory Usage | <100MB | <50MB | ~2GB |
| Accuracy (mAP) | TBD | Same | Baseline |
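Until the automated benchmark suite lands, a minimal latency measurement can be sketched like this. The `Conv2d` is a stand-in model, and the timing is illustrative only, not the source of the table's figures:

```python
import time

import torch
import torch.nn as nn

def measure_latency_ms(model: nn.Module, input_size: int = 640, runs: int = 10) -> float:
    """Average forward-pass latency in milliseconds over `runs` iterations."""
    x = torch.randn(1, 3, input_size, input_size)
    model.eval()
    with torch.no_grad():
        model(x)  # warm-up pass (kernel selection, memory allocation)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        # On GPU, call torch.cuda.synchronize() here before reading the clock
    return (time.perf_counter() - start) / runs * 1000

# Stand-in model; a real comparison would time YoloOne against a YOLOv8n export
tiny = nn.Conv2d(3, 5, kernel_size=1)
latency = measure_latency_ms(tiny, input_size=64, runs=3)
print(latency > 0)  # True
```

For fair GPU comparisons, both models should be benchmarked at the same resolution, batch size, and precision.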
- Core Architecture: Backbone + Neck + Head
- Multi-scale Detection: P3, P4, P5 outputs
- Single-class Optimization: 5-channel output
- Model Variants: nano, small, medium, large
- Basic Testing: Architecture validation
- Multi-resolution: 320-640px support
- Training Pipeline: Loss function + trainer
- Benchmark Suite: vs YOLOv8n comparison
- Export Pipeline: ONNX, TensorRT, Core ML
- Mobile Optimization: INT8 quantization
- Documentation: Complete API docs
- Training Pipeline - Validate accuracy claims
- TensorRT Export - Achieve speed targets
- Benchmark Suite - Automated comparisons
- Mobile Deployment - iOS/Android support
```text
# Component breakdown
Backbone: ~700K params (~93%)  # Feature extraction
Neck:     ~40K params (~5%)    # Feature fusion
Head:     ~1K params (<1%)     # Detection output
Total:    ~750K params         # Ultra-lightweight
```

```text
# Multi-resolution memory usage
640x640: ~5MB GPU memory
# Scales efficiently with resolution
```

To launch the training pipeline, run the following command:
```shell
# Train with custom parameters
python train.py \
  --data path/to/dataset/directory \
  --config path/to/config/file \
  --model-size nano \
  --epochs 500 \
  --batch-size 16 \
  --lr 0.001 \
  --output-dir path/to/output \
  --device cuda
```

To launch the inference pipeline, use the following command:
```shell
python detect.py \
  --source path/to/image.jpg \
  --weights path/to/model.pt \
  --device cuda \
  --output-dir runs/detect \
  --model-size nano \
  --input-size 640 \
  --conf 0.5 \
  --iou 0.5
```

We welcome contributions! Key areas:
- Training Pipeline Development
- Mobile Optimization
- Benchmark Implementation
- Documentation & Examples
- Export Format Support
See CONTRIBUTING.md for guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
Ready to revolutionize single-class object detection?

⭐ Star this repository • Fork and contribute • Read the docs

Built with ❤️ by the IAtrax team