GStreamer plugins that bring SAHI slicing to NVIDIA DeepStream. The project keeps slicing, inference, and merge steps inside the DeepStream pipeline, using nvinfer for TensorRT execution and NvDsObjectMeta for post-processing.
This repository provides two plugins:
- `nvsahipreprocess`: computes frame slices, crops them on GPU, and prepares the input for `nvinfer`
- `nvsahipostprocess`: merges overlapping detections produced at slice boundaries using GreedyNMM
Typical pipeline:
```
nvstreammux -> nvsahipreprocess -> nvinfer -> nvsahipostprocess -> nvtracker -> nvdsosd
```
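As an illustration only, that pipeline could be expressed as a `gst-launch-1.0` sketch. The input file, PGIE config path, and stream resolution below are placeholders, and the custom plugins' properties are left at their defaults:

```bash
gst-launch-1.0 \
  filesrc location=input.mp4 ! qtdemux ! h264parse ! nvv4l2decoder ! mux.sink_0 \
  nvstreammux name=mux batch-size=1 width=2560 height=1440 ! \
  nvsahipreprocess ! \
  nvinfer config-file-path=pgie_config.txt ! \
  nvsahipostprocess ! \
  nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so ! \
  nvvideoconvert ! nvdsosd ! nveglglessink
```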
Most SAHI integrations around DeepStream run outside the pipeline, often in Python. This project keeps the workflow inside GStreamer so it can work with standard DeepStream components such as tracking, analytics, message brokers, and display elements.
Key points:
- SAHI slicing implemented as DeepStream plugins
- TensorRT inference handled by `nvinfer`
- Support for DeepStream 8.x and 9.x
- Test scripts and sample models included in the repository
| Component | DeepStream 8.0 | DeepStream 9.0 |
|---|---|---|
| DeepStream SDK | 8.0 | 9.0 |
| CUDA Toolkit | 12.8 | 13.1 |
| TensorRT | 10.9.0 | 10.14.1 |
| GStreamer | 1.24.2 | 1.24.2 |
| Python bindings | pyds 1.2.2 | built from source |
The install.sh script detects the installed DeepStream version for Python bindings, builds the SAHI GStreamer plugins, and builds and installs libnvds_infer_yolo.so from deepstream_source/libs/nvdsinfer_yolo. That library is required to run the bundled ONNX models: they use TensorRT’s EfficientNMS post-processing, and this project’s parser decodes that output (it is not NVIDIA’s stock sample parser). Core TensorRT execution still uses the SDK’s libnvds_infer.so with nvinfer (not rebuilt here). Licensing: see LICENSE at the repository root and in deepstream_source/libs/nvdsinfer_yolo/.
This repository uses Git LFS for ONNX model files.
```bash
git lfs install
git clone https://github.com/levipereira/deepstream-sahi.git
cd deepstream-sahi
```
Run a DeepStream container:
```bash
docker run -it --name deepstream-sahi --net=host --gpus all \
  -v `pwd`:/apps/deepstream-sahi \
  -w /apps/deepstream-sahi \
  nvcr.io/nvidia/deepstream:9.0-triton-multiarch
```
Inside the container:
```bash
/apps/deepstream-sahi/install.sh
source /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/pyds/bin/activate
cd /apps/deepstream-sahi/python_test/deepstream-test-sahi
python3 deepstream_test_sahi.py --model visdrone-full-640 --no-display --csv -i ../videos/aerial_crowding_01.mp4
```
Test videos are available on Google Drive and should be placed in python_test/videos/.
For container variants, display notes, rebuild mode, and environment details, see docs/INSTALL.md.
| Document | Description |
|---|---|
| Installation Guide | container setup, dependencies, plugin build |
| Usage Guide | pipeline execution, CLI arguments, result comparison |
| Plugin Reference | plugin properties and behavior |
| Training Guide | training workflow for sliced models |
| Test Results | evaluation data and charts |
| Parameter Tests — Vehicles | postprocess parameter validation (moderate density) |
| Parameter Tests — Dense Crowd | postprocess parameter validation (high density) |
```
deepstream-sahi/
├── deepstream_source/
│   ├── gst-plugins/
│   │   ├── gst-nvsahipreprocess/
│   │   └── gst-nvsahipostprocess/
│   └── libs/
│       └── nvdsinfer_yolo/
├── python_test/
│   ├── common/
│   ├── deepstream-test-sahi/
│   └── videos/
├── train_yolov9_visdrone/
├── test_results/
├── scripts/
│   └── test_postprocess_params.sh
├── docs/
├── install.sh
└── README.md
```
- computes slice windows for each frame
- crops and rescales slices with `NvBufSurfTransform`
- forwards the resulting data to `nvinfer`
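The slicing step can be sketched in Python. This is an illustrative computation of overlapping slice windows, not the plugin's actual API; the function name and its defaults are hypothetical:

```python
def compute_slice_windows(frame_w, frame_h, slice_w, slice_h, overlap=0.2):
    """Return (x, y, w, h) windows that tile the frame with the given overlap.

    Hypothetical helper for illustration: windows step by slice size minus
    overlap, and the last row/column is clamped so no window leaves the frame.
    """
    step_x = max(1, int(slice_w * (1 - overlap)))
    step_y = max(1, int(slice_h * (1 - overlap)))
    windows = []
    y = 0
    while True:
        y0 = min(y, max(0, frame_h - slice_h))  # clamp bottom row to frame edge
        x = 0
        while True:
            x0 = min(x, max(0, frame_w - slice_w))  # clamp right column
            windows.append((x0, y0, min(slice_w, frame_w), min(slice_h, frame_h)))
            if x0 + slice_w >= frame_w:
                break
            x += step_x
        if y0 + slice_h >= frame_h:
            break
        y += step_y
    return windows
```

For a 2560x1440 frame with 640x640 slices and 20% overlap, this yields a 5x3 grid of 15 windows; the actual slice count depends on the plugin's configured slice size and overlap.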
- reads detections from `NvDsObjectMeta`
- merges duplicates created by overlapping slices using a two-phase GreedyNMM algorithm
- supports IoU and IoS based matching with spatial hash grid indexing
- merges instance-segmentation masks (element-wise maximum)
- supports multiple GIE targeting (`gie-ids="1;3;5"`)
- parallel per-frame processing via OpenMP
- configurable merge strategy (union / weighted / largest)
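The merge idea can be sketched in Python. This is an illustrative single-pass GreedyNMM with the union merge strategy, not the plugin's two-phase C implementation; the function names are hypothetical:

```python
def box_area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def match_score(a, b, metric="ios"):
    """IoU, or IoS (intersection over the smaller box), for two xyxy boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    if metric == "ios":
        denom = min(box_area(a), box_area(b))
    else:  # "iou"
        denom = box_area(a) + box_area(b) - inter
    return inter / denom

def greedy_nmm(dets, metric="ios", threshold=0.5):
    """dets: list of (xyxy_box, score). Greedily take the highest-scoring
    detection, absorb every match above the threshold into it (union
    strategy), and repeat until no detections remain."""
    rest = sorted(dets, key=lambda d: d[1], reverse=True)
    merged = []
    while rest:
        (box, score), rest = rest[0], rest[1:]
        keep = []
        for b2, s2 in rest:
            if match_score(box, b2, metric) >= threshold:
                box = (min(box[0], b2[0]), min(box[1], b2[1]),
                       max(box[2], b2[2]), max(box[3], b2[3]))  # union merge
            else:
                keep.append((b2, s2))
        rest = keep
        merged.append((box, score))
    return merged
```

Two boxes split across a slice boundary illustrate why IoS is the more aggressive metric: a 20% sliver of overlap gives IoS 0.2 but IoU only about 0.11, so at a 0.2 threshold IoS merges the pair while IoU keeps both.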
- `nvinfer`/`libnvds_infer.so`: from the DeepStream SDK (unchanged by this repo).
- `libnvds_infer_yolo.so`: built from this repository's `deepstream_source/libs/nvdsinfer_yolo/`. Not optional for the default pipelines: the shipped models are exported with EfficientNMS (`EfficientNMS_TRT` / related ops), and this custom parser implements the bounding-box decoding for that layout. The sample PGIE configs set `custom-lib-path` to `.../libnvds_infer_yolo.so`.
Test setup:
- input video resolution: 2560x1440
- GPU: NVIDIA RTX 5080
- precision: FP16
- batch size: 16
| Video | No SAHI | SAHI | Change |
|---|---|---|---|
| aerial_crowding_01 | 13.8 | 84.2 | +510% |
| aerial_crowding_02 | 206.2 | 664.7 | +222% |
| aerial_vehicles | 92.3 | 252.5 | +174% |
| Video | No SAHI | SAHI | Change |
|---|---|---|---|
| aerial_crowding_01 | 2.3 | 85.3 | +3619% |
| aerial_crowding_02 | 35.9 | 614.9 | +1613% |
| aerial_vehicles | 28.6 | 226.7 | +694% |
| Video | full-640 + SAHI | sliced-448 + SAHI | Difference |
|---|---|---|---|
| aerial_crowding_01 | 84.2 | 85.3 | +1.3% |
| aerial_crowding_02 | 664.7 | 614.9 | -7.5% |
| aerial_vehicles | 252.5 | 226.7 | -10.2% |
For the complete benchmark, see docs/TEST_RESULTS.md.
Sample frames (screenshots omitted here): Dense Pedestrian Crowd, Very Dense Crowd, Dense Vehicle Traffic.
The repository includes both full-frame and slice-oriented training artifacts. The current results indicate that SAHI allows smaller model inputs to recover object scale on high-resolution video, which can help balance accuracy and throughput.
Training details are documented in docs/TRAINING.md.
All nvsahipostprocess parameters have been validated with automated tests across two density regimes:
| Video | Detections/frame | Scene | Tests |
|---|---|---|---|
| aerial_vehicles.mp4 | ~311 | Moderate — vehicles | 21/21 passed |
| aerial_crowding_02.mp4 | ~1312 | Very dense — pedestrians + motorcycles | 21/21 passed |
Key findings:
- match-metric: IoS suppresses more duplicates than IoU (recommended for SAHI)
- match-threshold: monotonic — lower threshold → more aggressive suppression
- class-agnostic=true: +36% more suppression on vehicles, +13% on dense crowds
- enable-merge=false: reliably produces zero merges (pure NMS mode)
- max-detections: exact cap — removes 789 extra detections in dense scenes
- PERF profiling: `GST_DEBUG=nvsahipostprocess:4` shows a latency summary every ~1s
| Video | Dets/frame | Pipeline FPS | Postprocess ms/frame | Postprocess overhead |
|---|---|---|---|---|
| aerial_vehicles | ~311 | 29.9 fps | 0.35 ms | 1.0% |
| aerial_crowding_02 | ~1,312 | 24.4 fps | 1.55 ms | 3.8% |
The postprocess NMM is never the bottleneck — TensorRT inference on 9 slices dominates. At 4× more detections, postprocess latency scales sub-linearly (spatial grid indexing).
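The favorable scaling comes from the grid index: detections are bucketed into coarse cells, and only boxes sharing a cell are compared, instead of an all-pairs scan. A minimal Python sketch of that idea (the cell size and function name are illustrative, not the plugin's implementation):

```python
from collections import defaultdict

def grid_candidate_pairs(boxes, cell=128):
    """Bucket xyxy boxes into a coarse grid and return the index pairs that
    share at least one cell -- the only pairs the matcher needs to score."""
    grid = defaultdict(list)
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # A box is registered in every cell its extent touches.
        for cx in range(int(x1) // cell, int(x2) // cell + 1):
            for cy in range(int(y1) // cell, int(y2) // cell + 1):
                grid[(cx, cy)].append(i)
    pairs = set()
    for members in grid.values():
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                a, b = members[i], members[j]
                pairs.add((min(a, b), max(a, b)))
    return pairs
```

Boxes in distant cells are never scored against each other, so for spatially spread detections the candidate set grows far more slowly than the quadratic all-pairs count.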
Run the automated test suite:
```bash
# Default video (aerial_vehicles.mp4)
scripts/test_postprocess_params.sh

# Custom video
scripts/test_postprocess_params.sh python_test/videos/aerial_crowding_02.mp4
```
Full results: Parameter Tests — Vehicles | Parameter Tests — Dense Crowd
- The bidirectional NMM algorithm (non-greedy, transitive merge chains) is not implemented. GreedyNMM covers real-time use-cases adequately.
- Merged mask resolution is capped at 512x512 to prevent excessive memory allocation.
- Only single-source pipelines have been validated end-to-end; multi-source is supported via OpenMP parallelism but has not been benchmarked.
See docs/PLUGINS.md for the full property reference and algorithm details.
The project is distributed under the terms of the NVIDIA DeepStream SDK License Agreement. See the LICENSE file at the repository root.
Some components carry additional notices in source headers (for example, derivative works of NVIDIA DeepStream samples in gst-nvsahipreprocess, or third-party copyright lines in python_test/common/). Preserve those notices when redistributing.