t-teja/deepstream-apriltag-cuda

deepstream-apriltag-cuda

A custom NVIDIA DeepStream GStreamer plugin that runs AprilTag detection entirely on the GPU using CUDA, targeting Jetson Orin NX / AGX with DeepStream 7.1.

Tested with 8 simultaneous 1080p RTSP streams on a Jetson Orin NX 16 GB — CPU usage stays below 20%, GPU usage up to ~40%.

Target platform: NVIDIA Jetson Orin NX / AGX — JetPack 6.x, DeepStream 7.1, CUDA 12.x


Based on apriltags_cuda by FRC Team 766 / Team 971

The GPU detection core is built on top of the excellent apriltags_cuda library by FRC Team 766 / Team 971, which implements the entire AprilTag detection pipeline (thresholding, connected-component labeling, quad fitting, tag decoding) as CUDA kernels — no CPU involvement in the detection path.

This repository wraps that library as a native NVIDIA DeepStream GStreamer plugin (nvdsapriltagcuda) and includes a set of modifications that were necessary to make it build and run inside a DeepStream Docker container on Jetson without a full Jetson SDK on the host machine.

Modifications over upstream

| Change | Details |
| --- | --- |
| CMakeLists_minimal.txt | Strips WPILib, NetworkTables, OpenCV, and seasocks; builds with just CUDA + apriltag C lib + glog |
| EGL / NVMM interop | CudaEglFrame maps NvBufSurface NVMM buffers directly into CUDA without any CPU copy |
| RGBA to YUYV kernel | Custom rgba_to_yuyv_kernel; the frc971 detector expects YUYV input, while DeepStream delivers RGBA |
| Docker stub libraries | Mock .so files for libnvbufsurface allow link-time resolution inside Docker without a Jetson SDK |
| DeepStream metadata | Results written as NvDsObjectMeta (class ID, object ID, bounding boxes, text labels) |
| FIFO JSON IPC | Each detection serialised as JSON to /tmp/apriltag_detections_cam_<CAMERA_ID> for companion processes |
| detection-interval property | GObject property to detect every N frames and reuse cached results in between |

Directory layout

nvds_apriltag_cuda/
├── plugin/                     # GStreamer DeepStream plugin
│   ├── gstnvdsapriltagcuda.cu  # Main GStreamer element (CUDA TU)
│   ├── cuda_utils.cu           # EGL interop + RGBA->YUYV CUDA kernel
│   ├── cuda_utils.h
│   └── CMakeLists.txt
├── apriltags_cuda/             # Modified GPU detection library
│   ├── CMakeLists_minimal.txt  # Key contribution: minimal build for DeepStream/Docker
│   ├── install_deps.sh         # Native host dependency installer
│   └── src/                   # Core GPU detection pipeline (CUDA kernels)
├── Dockerfile                  # Full build: apriltag lib + plugin in one image
└── README.md

How it works

The element is a GstBaseTransform in-place plugin registered as nvdsapriltagcuda.

  1. The incoming NvBufSurface (NVMM/RGBA) is mapped as an EGL image via NvBufSurfaceMapEglImage
  2. CudaEglFrame registers it as a CUDA graphics resource (cuGraphicsEGLRegisterImage)
  3. A custom CUDA kernel (rgba_to_yuyv_kernel) converts RGBA → YUYV on the GPU
  4. frc971::apriltag::GpuDetector::Detect() runs the full detection pipeline on-device
  5. Results are written as NvDsObjectMeta (class/object IDs, bounding boxes, text labels)
  6. Each detection is also serialised as JSON and written to a named FIFO pipe for consumption by a companion process
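
To make step 3 concrete, here is a CPU reference sketch in Python of what an RGBA to YUYV conversion does to one row of pixels. It is not the plugin's kernel: the function names are hypothetical, and the BT.601 full-range coefficients and the averaging of chroma across each pixel pair are assumptions, since the README does not state which variant rgba_to_yuyv_kernel uses.

```python
def _clamp_u8(x: float) -> int:
    """Round to the nearest integer and clamp to the 0..255 byte range."""
    return max(0, min(255, int(round(x))))


def rgb_to_yuv(r: int, g: int, b: int):
    """Convert one RGB pixel to Y, U, V (assumed BT.601 full-range)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return y, u, v


def rgba_row_to_yuyv(rgba: bytes) -> bytes:
    """Pack one row of RGBA pixels (4 bytes each) into YUYV (2 bytes each).

    YUYV stores two horizontally adjacent pixels as Y0 U Y1 V, so the row
    must contain an even number of pixels. Alpha is ignored.
    """
    if len(rgba) % 8 != 0:
        raise ValueError("row must contain an even number of RGBA pixels")
    out = bytearray()
    for i in range(0, len(rgba), 8):
        r0, g0, b0, _a0, r1, g1, b1, _a1 = rgba[i:i + 8]
        y0, u0, v0 = rgb_to_yuv(r0, g0, b0)
        y1, u1, v1 = rgb_to_yuv(r1, g1, b1)
        # Each pixel keeps its own luma; the pair shares one U and one V
        # (averaged here, which is an assumption about the real kernel).
        out += bytes((
            _clamp_u8(y0),
            _clamp_u8((u0 + u1) / 2),
            _clamp_u8(y1),
            _clamp_u8((v0 + v1) / 2),
        ))
    return bytes(out)
```

A GPU version would typically assign one thread per pixel pair instead of looping, but the per-pair arithmetic is the same idea.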

The detection-interval property lets you run detection every N frames and reuse cached results in between, reducing GPU load on high-framerate streams.
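
The caching behaviour can be modelled in a few lines of Python. This is a sketch of the idea only: the class name is hypothetical, and both the choice to detect on the very first frame and the exact frame-counting rule are assumptions rather than a description of the plugin's code.

```python
from typing import Any, Callable, List


class IntervalDetector:
    """Run an expensive detector every `interval` frames and reuse the
    most recent result for the frames in between."""

    def __init__(self, detect: Callable[[Any], List[Any]], interval: int = 1):
        if interval < 1:
            raise ValueError("interval must be >= 1")
        self._detect = detect
        self._interval = interval
        self._frame_index = 0
        self._cached: List[Any] = []

    def process(self, frame: Any) -> List[Any]:
        # Detect on frames 0, interval, 2*interval, ... (assumed rule).
        if self._frame_index % self._interval == 0:
            self._cached = self._detect(frame)
        self._frame_index += 1
        # Frames between detections get the cached result unchanged.
        return self._cached
```

With interval set to 3, the detector runs on one frame in three, while every frame still carries detections for downstream elements.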


Prerequisites

| Dependency | Version |
| --- | --- |
| Jetson platform | Orin NX or AGX (sm_87), or any CUDA GPU |
| JetPack | 6.x |
| DeepStream | 7.1 |
| CUDA toolkit | 12.x (nvcc) |
| CMake | 3.18+ |
| libgoogle-glog | any recent version |
| apriltag C library | 3.3.0 (cgpadwick fork) |
| NVIDIA CCCL | v2.3.2 (CUB / Thrust headers) |

Build option A: Docker (recommended)

The Dockerfile handles every step: CMake install, apriltag library, CCCL headers, stub libs, and the plugin itself.

# Build from the nvds_apriltag_cuda/ directory
docker build -t nvds-apriltag-cuda:latest .

# Run with GPU access
docker run --rm --runtime=nvidia --gpus all nvds-apriltag-cuda:latest

CUDA architecture is set to 87 (Jetson Orin). Change -DCMAKE_CUDA_ARCHITECTURES=87 in the Dockerfile if targeting a different GPU.


Build option B: Native on device

Step 1: Install system dependencies

sudo apt-get install -y \
  build-essential pkg-config git wget \
  libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev \
  libgoogle-glog-dev

Step 2: Build and install the apriltag C library

git clone --depth 1 --branch 3.3.0 https://github.com/cgpadwick/apriltag.git
cd apriltag && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXAMPLES=OFF ..
make -j$(nproc) && sudo make install && sudo ldconfig

Step 3: Get CCCL headers

# /opt is normally root-owned on the device, hence sudo
sudo git clone --depth 1 --branch v2.3.2 https://github.com/NVIDIA/cccl.git /opt/cccl

Step 4: Build the apriltags_cuda library

# Run from the nvds_apriltag_cuda/ directory
cd apriltags_cuda
cp CMakeLists_minimal.txt CMakeLists.txt
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CUDA_ARCHITECTURES=87 \
      -DCCCL_DIR=/opt/cccl \
      ..
make -j$(nproc) && sudo make install && sudo ldconfig

Step 5: Build the GStreamer plugin

# Run from the nvds_apriltag_cuda/ directory
cd plugin
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install

The .so installs to /opt/nvidia/deepstream/deepstream/lib/gst-plugins/.

export GST_PLUGIN_PATH=/opt/nvidia/deepstream/deepstream/lib/gst-plugins:$GST_PLUGIN_PATH
gst-inspect-1.0 nvdsapriltagcuda

GStreamer pipeline example

gst-launch-1.0 \
  rtspsrc location=rtsp://<ip>:<port>/<path> latency=100 ! \
  rtph264depay ! h264parse ! nvv4l2decoder ! \
  nvstreammux name=mux batch-size=1 width=1920 height=1080 ! \
  nvdsapriltagcuda detection-interval=3 ! \
  nvdsosd ! \
  nvegltransform ! nveglglessink

Set detection-interval to 1 to detect on every frame. Higher values (e.g. 3-5) reduce GPU load on high-framerate streams; frames between detections reuse the last cached result for OSD rendering.
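
When scripting several cameras, the pipeline above can be generated rather than typed by hand. The Python sketch below assumes one gst-launch-1.0 process per camera, each with its own CAMERA_ID environment variable; that arrangement and the helper names are illustrative assumptions, not something the README prescribes. The element chain is copied from the example above.

```python
import os
from typing import Dict, List


def launch_command(rtsp_url: str, detection_interval: int = 3) -> List[str]:
    """Build the gst-launch-1.0 argv for the single-stream pipeline shown
    in this README. The URL must not contain whitespace, because the
    description is split on spaces."""
    description = (
        f"rtspsrc location={rtsp_url} latency=100 ! "
        "rtph264depay ! h264parse ! nvv4l2decoder ! "
        "nvstreammux name=mux batch-size=1 width=1920 height=1080 ! "
        f"nvdsapriltagcuda detection-interval={detection_interval} ! "
        "nvdsosd ! nvegltransform ! nveglglessink"
    )
    return ["gst-launch-1.0", *description.split()]


def camera_env(camera_id: int) -> Dict[str, str]:
    """Copy the current environment and set CAMERA_ID for one pipeline."""
    env = dict(os.environ)
    env["CAMERA_ID"] = str(camera_id)
    return env


# Example use (not executed here):
#   import subprocess
#   subprocess.Popen(launch_command("rtsp://<ip>:<port>/<path>"),
#                    env=camera_env(0))
```

Passing the pipeline as an argv list avoids shell quoting issues with the "!" separators.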


IPC / FIFO output

Every detection is serialised as a JSON object and written to a named pipe at:

/tmp/apriltag_detections_cam_<CAMERA_ID>

where CAMERA_ID is read from the CAMERA_ID environment variable (defaults to 0). A companion process can open this pipe for reading to consume detections in real time without modifying the GStreamer pipeline. The FIFO path is currently hardcoded in gstnvdsapriltagcuda.cu; if you need it to be runtime-configurable, expose it as a GObject property.
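
A companion process could consume the pipe along these lines. This Python sketch assumes each detection arrives as one JSON object per line; the README does not specify the framing or the field names, so the reader treats every object as opaque and skips lines that fail to parse. The helper names are hypothetical.

```python
import json
import os
from typing import Any, Iterable, Iterator, Optional


def fifo_path(camera_id: Optional[object] = None) -> str:
    """Return the FIFO path for a camera, defaulting to the CAMERA_ID
    environment variable (or 0 when it is unset), as the plugin does."""
    if camera_id is None:
        camera_id = os.environ.get("CAMERA_ID", "0")
    return f"/tmp/apriltag_detections_cam_{camera_id}"


def iter_detections(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line.

    Assumes newline-delimited JSON; malformed or partial lines are
    skipped instead of stopping the reader.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


# Example use (blocks until the plugin opens the pipe for writing):
#   with open(fifo_path(), "r", encoding="utf-8") as fifo:
#       for detection in iter_detections(fifo):
#           print(detection)
```

Keeping the parser separate from the open() call makes it easy to test against recorded output without a running pipeline.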


Measured performance (Jetson Orin NX 16 GB)

Tested with 8 simultaneous 1080p RTSP streams on a Jetson Orin NX 16 GB running JetPack 6 / DeepStream 7.1:

| Metric | Measured value |
| --- | --- |
| CPU usage (all cores) | < 20% |
| GPU usage | up to ~40% |
| RAM usage | 6 - 8 GB |
| Streams | 8 x 1080p RTSP |
| Detection interval | 3 (detect every 3rd frame) |

The low CPU usage is the key advantage over CPU-based AprilTag libraries: thresholding, connected-component labeling, quad fitting, and tag decoding all run as CUDA kernels. The CPU is only involved in GStreamer buffer management and metadata writing.


Credits

GPU detection core: apriltags_cuda by FRC Team 766 / Team 971, originally based on the apriltag work from FRC Team 971 (Spartan Robotics). The original library is licensed under the MIT License.

DeepStream integration, EGL/NVMM interop, RGBA-to-YUYV kernel, Docker stub library technique, and CMakeLists_minimal.txt by t-teja.


Keywords

nvidia jetson jetson-orin jetson-orin-nx jetson-orin-agx deepstream gstreamer apriltag apriltag-detection cuda gpu-accelerated computer-vision multi-stream rtsp jetpack embedded-vision edge-ai frc robotics nvmm egl-interop
