
DeepStream SAHI


GStreamer plugins that bring SAHI (Slicing Aided Hyper Inference) slicing to NVIDIA DeepStream. The project keeps the slicing, inference, and merge steps inside the DeepStream pipeline, using nvinfer for TensorRT execution and NvDsObjectMeta for post-processing.

Overview

This repository provides two plugins:

  • nvsahipreprocess: computes frame slices, crops them on GPU, and prepares the input for nvinfer
  • nvsahipostprocess: merges overlapping detections produced at slice boundaries using GreedyNMM

Typical pipeline:

```
nvstreammux -> nvsahipreprocess -> nvinfer -> nvsahipostprocess -> nvtracker -> nvdsosd
```

Architecture

Most SAHI integrations around DeepStream run outside the pipeline, often in Python. This project keeps the workflow inside GStreamer so it can work with standard DeepStream components such as tracking, analytics, message brokers, and display elements.

Key points:

  • SAHI slicing implemented as DeepStream plugins
  • TensorRT inference handled by nvinfer
  • Support for DeepStream 8.x and 9.x
  • Test scripts and sample models included in the repository

Compatibility

| Component | DeepStream 8.0 | DeepStream 9.0 |
|---|---|---|
| DeepStream SDK | 8.0 | 9.0 |
| CUDA Toolkit | 12.8 | 13.1 |
| TensorRT | 10.9.0 | 10.14.1 |
| GStreamer | 1.24.2 | 1.24.2 |
| Python bindings | pyds 1.2.2 | built from source |

The install.sh script detects the installed DeepStream version, installs the matching Python bindings, builds the SAHI GStreamer plugins, and builds and installs libnvds_infer_yolo.so from deepstream_source/libs/nvdsinfer_yolo. That library is required to run the bundled ONNX models: they use TensorRT's EfficientNMS post-processing, and this project's parser decodes that output (it is not NVIDIA's stock sample parser). Core TensorRT execution still uses the SDK's libnvds_infer.so via nvinfer, which is not rebuilt here. For licensing, see the LICENSE files at the repository root and in deepstream_source/libs/nvdsinfer_yolo/.

Quick Start

This repository uses Git LFS for ONNX model files.

```shell
git lfs install
git clone https://github.com/levipereira/deepstream-sahi.git
cd deepstream-sahi
```

Run a DeepStream container:

```shell
docker run -it --name deepstream-sahi --net=host --gpus all \
    -v `pwd`:/apps/deepstream-sahi \
    -w /apps/deepstream-sahi \
    nvcr.io/nvidia/deepstream:9.0-triton-multiarch
```

Inside the container:

```shell
/apps/deepstream-sahi/install.sh
source /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/pyds/bin/activate
cd /apps/deepstream-sahi/python_test/deepstream-test-sahi
python3 deepstream_test_sahi.py --model visdrone-full-640 --no-display --csv -i ../videos/aerial_crowding_01.mp4
```

Test videos are available on Google Drive and should be placed in python_test/videos/.

For container variants, display notes, rebuild mode, and environment details, see docs/INSTALL.md.

Documentation

| Document | Description |
|---|---|
| Installation Guide | container setup, dependencies, plugin build |
| Usage Guide | pipeline execution, CLI arguments, result comparison |
| Plugin Reference | plugin properties and behavior |
| Training Guide | training workflow for sliced models |
| Test Results | evaluation data and charts |
| Parameter Tests — Vehicles | postprocess parameter validation (moderate density) |
| Parameter Tests — Dense Crowd | postprocess parameter validation (high density) |

Repository Structure

```
deepstream-sahi/
├── deepstream_source/
│   ├── gst-plugins/
│   │   ├── gst-nvsahipreprocess/
│   │   └── gst-nvsahipostprocess/
│   └── libs/
│       └── nvdsinfer_yolo/
├── python_test/
│   ├── common/
│   ├── deepstream-test-sahi/
│   └── videos/
├── train_yolov9_visdrone/
├── test_results/
├── scripts/
│   └── test_postprocess_params.sh
├── docs/
├── install.sh
└── README.md
```

Included Components

nvsahipreprocess

  • computes slice windows for each frame
  • crops and rescales slices with NvBufSurfTransform
  • forwards the resulting data to nvinfer
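
Conceptually, the slice computation mirrors upstream SAHI's sliding-window generation: fixed-size windows stepped across the frame with a configurable overlap ratio, with the last row and column clamped to the frame border. A minimal Python sketch of that idea (function and parameter names are illustrative, not the plugin's actual properties):

```python
def compute_slice_windows(frame_w, frame_h, slice_w=640, slice_h=640, overlap=0.2):
    """Return (x, y, w, h) crop windows covering the frame with overlap."""
    step_x = max(1, int(slice_w * (1 - overlap)))
    step_y = max(1, int(slice_h * (1 - overlap)))
    windows = []
    y = 0
    while True:
        x = 0
        while True:
            # Clamp the last column/row so every window stays inside the frame.
            wx = min(x, max(0, frame_w - slice_w))
            wy = min(y, max(0, frame_h - slice_h))
            win = (wx, wy, min(slice_w, frame_w), min(slice_h, frame_h))
            if win not in windows:
                windows.append(win)
            if x + slice_w >= frame_w:
                break
            x += step_x
        if y + slice_h >= frame_h:
            break
        y += step_y
    return windows
```

For a 2560×1440 frame with 640×640 slices and 20% overlap this yields a 5×3 grid of 15 windows; other slice sizes and overlaps produce other grid shapes, such as the 9-slices-per-frame configuration used in the benchmarks below.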

nvsahipostprocess (v1.2)

  • reads detections from NvDsObjectMeta
  • merges duplicates created by overlapping slices using a two-phase GreedyNMM algorithm
  • supports IoU and IoS based matching with spatial hash grid indexing
  • merges instance-segmentation masks (element-wise maximum)
  • supports multiple GIE targeting (gie-ids="1;3;5")
  • parallel per-frame processing via OpenMP
  • configurable merge strategy (union / weighted / largest)
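
A simplified, pure-Python sketch of the greedy matching idea (not the plugin's C++ implementation; the real plugin adds spatial hashing, OpenMP parallelism, and mask merging). Boxes are visited in descending score order, and each unvisited box absorbs later boxes whose IoU/IoS match exceeds the threshold:

```python
def _inter(a, b):
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def _area(a):
    return max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])

def iou(a, b):
    i = _inter(a, b)
    u = _area(a) + _area(b) - i
    return i / u if u else 0.0

def ios(a, b):
    i = _inter(a, b)
    m = min(_area(a), _area(b))
    return i / m if m else 0.0

def greedy_nmm(dets, threshold=0.5, metric="ios"):
    """dets: list of (x1, y1, x2, y2, score); returns merged detections."""
    match = ios if metric == "ios" else iou
    dets = sorted(dets, key=lambda d: d[4], reverse=True)
    used = [False] * len(dets)
    merged = []
    for i, d in enumerate(dets):
        if used[i]:
            continue
        used[i] = True
        box, score = list(d[:4]), d[4]
        for j in range(i + 1, len(dets)):
            if not used[j] and match(tuple(box), dets[j][:4]) >= threshold:
                used[j] = True
                # "union" merge strategy: grow the box to cover both.
                box = [min(box[0], dets[j][0]), min(box[1], dets[j][1]),
                       max(box[2], dets[j][2]), max(box[3], dets[j][3])]
        merged.append((*box, score))
    return merged
```

With IoS matching, two fragments of one object split across a slice boundary collapse into a single union box, while distant detections are left untouched.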

Inference (nvinfer) and nvdsinfer_yolo

  • nvinfer / libnvds_infer.so: from the DeepStream SDK (unchanged by this repo).
  • libnvds_infer_yolo.so: built from this repository’s deepstream_source/libs/nvdsinfer_yolo/. Not optional for the default pipelines: the shipped models are exported with EfficientNMS (EfficientNMS_TRT / related ops), and this custom parser implements the bounding-box decoding for that layout. The sample PGIE configs set custom-lib-path to .../libnvds_infer_yolo.so.
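
For context, TensorRT's EfficientNMS_TRT plugin emits four output tensors per batch item: a valid-detection count plus padded box, score, and class tensors. A pure-Python sketch of the decoding step the parser performs (the actual parser is C++ inside nvdsinfer_yolo; field names here are illustrative):

```python
def parse_efficient_nms(num_dets, boxes, scores, classes, conf_threshold=0.25):
    """Decode one batch item of EfficientNMS-style output.

    Only the first `num_dets` entries are valid; the rest of each
    tensor is zero padding up to the export-time max-detections.
    """
    objects = []
    for i in range(num_dets):
        if scores[i] < conf_threshold:
            continue
        x1, y1, x2, y2 = boxes[i]
        objects.append({
            "left": x1, "top": y1,
            "width": x2 - x1, "height": y2 - y1,
            "class_id": int(classes[i]),
            "confidence": float(scores[i]),
        })
    return objects
```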

Results Summary

Test setup:

  • input video resolution: 2560x1440
  • GPU: NVIDIA RTX 5080
  • precision: FP16
  • batch size: 16

Detection Counts Per Frame

visdrone-full-640

| Video | No SAHI | SAHI | Change |
|---|---|---|---|
| aerial_crowding_01 | 13.8 | 84.2 | +510% |
| aerial_crowding_02 | 206.2 | 664.7 | +222% |
| aerial_vehicles | 92.3 | 252.5 | +174% |

visdrone-sliced-448

| Video | No SAHI | SAHI | Change |
|---|---|---|---|
| aerial_crowding_01 | 2.3 | 85.3 | +3619% |
| aerial_crowding_02 | 35.9 | 614.9 | +1613% |
| aerial_vehicles | 28.6 | 226.7 | +694% |

Full-Frame Training vs Sliced Training With SAHI

| Video | full-640 + SAHI | sliced-448 + SAHI | Difference |
|---|---|---|---|
| aerial_crowding_01 | 84.2 | 85.3 | +1.3% |
| aerial_crowding_02 | 664.7 | 614.9 | -7.5% |
| aerial_vehicles | 252.5 | 226.7 | -10.2% |

For the complete benchmark, see docs/TEST_RESULTS.md.

Example Charts

Dense Pedestrian Crowd

  • Total objects per frame for aerial_crowding_01
  • Class comparison for aerial_crowding_01

Very Dense Crowd

  • Total objects per frame for aerial_crowding_02
  • Class comparison for aerial_crowding_02

Dense Vehicle Traffic

  • Total objects per frame for aerial_vehicles
  • Class comparison for aerial_vehicles

Video Demos

  • Dense Pedestrian Crowd
  • Very Dense Crowd
  • Dense Vehicle Traffic

Training Notes

The repository includes both full-frame and slice-oriented training artifacts. The current results indicate that SAHI allows smaller model inputs to recover object scale on high-resolution video, which can help balance accuracy and throughput.

Training details are documented in docs/TRAINING.md.

Plugin Parameter Validation

All nvsahipostprocess parameters have been validated with automated tests across two density regimes:

| Video | Detections/frame | Scene | Tests |
|---|---|---|---|
| aerial_vehicles.mp4 | ~311 | Moderate — vehicles | 21/21 passed |
| aerial_crowding_02.mp4 | ~1312 | Very dense — pedestrians + motorcycles | 21/21 passed |

Key findings:

  • match-metric: IoS suppresses more duplicates than IoU (recommended for SAHI)
  • match-threshold: monotonic — lower threshold → more aggressive suppression
  • class-agnostic=true: +36% more suppression on vehicles, +13% on dense crowds
  • enable-merge=false: reliably produces zero merges (pure NMS mode)
  • max-detections: exact cap — removes 789 extra detections in dense scenes
  • PERF profiling: GST_DEBUG=nvsahipostprocess:4 shows latency summary every ~1s
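
The IoS recommendation follows from geometry: a slice-boundary fragment fully contained in a larger detection has IoS = 1 regardless of relative size, while its IoU can stay below a typical matching threshold. A small illustration (helper functions are ad hoc, not the plugin's code):

```python
def _inter(a, b):
    w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h

def _area(a):
    return (a[2] - a[0]) * (a[3] - a[1])

def iou(a, b):
    i = _inter(a, b)
    return i / (_area(a) + _area(b) - i)

def ios(a, b):
    return _inter(a, b) / min(_area(a), _area(b))

big = (0, 0, 200, 200)       # full-object detection
frag = (150, 150, 200, 200)  # contained fragment from a slice boundary

print(iou(big, frag))  # 0.0625: survives IoU matching at typical thresholds
print(ios(big, frag))  # 1.0: always matched and suppressed under IoS
```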

Pipeline Throughput (RTX 5080, FP16, 9 slices/frame, 2560×1440)

| Video | Dets/frame | Pipeline FPS | Postprocess ms/frame | Postprocess overhead |
|---|---|---|---|---|
| aerial_vehicles | ~311 | 29.9 fps | 0.35 ms | 1.0% |
| aerial_crowding_02 | ~1,312 | 24.4 fps | 1.55 ms | 3.8% |
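
The overhead column is consistent with per-frame postprocess latency multiplied by the frame rate, i.e. the fraction of each second spent in postprocessing:

```python
def postprocess_overhead_pct(post_ms_per_frame, pipeline_fps):
    # Milliseconds of postprocess per second of wall clock, as a percentage.
    return post_ms_per_frame * pipeline_fps / 1000.0 * 100.0

print(round(postprocess_overhead_pct(0.35, 29.9), 1))  # 1.0
print(round(postprocess_overhead_pct(1.55, 24.4), 1))  # 3.8
```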

The postprocess NMM is never the bottleneck — TensorRT inference on 9 slices dominates. At roughly 4× more detections, postprocess latency grows only about 4×, far below the quadratic growth a naive pairwise NMM would show, thanks to spatial hash grid indexing.
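
That scaling behavior comes from the spatial hash grid: boxes are bucketed by the grid cells they touch, and pairwise matching only considers boxes sharing a cell. An illustrative sketch (cell size is arbitrary here, not the plugin's value):

```python
from collections import defaultdict

def candidate_pairs(boxes, cell=64):
    """Bucket boxes into grid cells; only boxes sharing a cell are compared."""
    grid = defaultdict(list)
    for idx, (x1, y1, x2, y2) in enumerate(boxes):
        for cx in range(int(x1) // cell, int(x2) // cell + 1):
            for cy in range(int(y1) // cell, int(y2) // cell + 1):
                grid[(cx, cy)].append(idx)
    pairs = set()
    for bucket in grid.values():
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                pairs.add((min(bucket[i], bucket[j]), max(bucket[i], bucket[j])))
    return pairs
```

Distant boxes never land in the same bucket, so the number of candidate pairs stays far below the all-pairs count as detection density grows.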

Run the automated test suite:

```shell
# Default video (aerial_vehicles.mp4)
scripts/test_postprocess_params.sh

# Custom video
scripts/test_postprocess_params.sh python_test/videos/aerial_crowding_02.mp4
```

Full results: Parameter Tests — Vehicles | Parameter Tests — Dense Crowd

Limitations

  • The bidirectional NMM algorithm (non-greedy, transitive merge chains) is not implemented. GreedyNMM covers real-time use-cases adequately.
  • Merged mask resolution is capped at 512x512 to prevent excessive memory allocation.
  • Only single-source pipelines have been validated end-to-end; multi-source is supported via OpenMP parallelism but has not been benchmarked.

See docs/PLUGINS.md for the full property reference and algorithm details.

License

The project is distributed under the terms of the NVIDIA DeepStream SDK License Agreement. See the LICENSE file at the repository root.

Some components carry additional notices in source headers (for example, derivative works of NVIDIA DeepStream samples in gst-nvsahipreprocess, or third-party copyright lines in python_test/common/). Preserve those notices when redistributing.

About

Native GStreamer plugins that integrate SAHI (Slicing Aided Hyper Inference) into NVIDIA DeepStream for real-time small object detection in high-resolution video streams.
