deepstream-apriltag-cuda

A custom NVIDIA DeepStream GStreamer plugin for AprilTag detection entirely on the GPU using CUDA, targeting Jetson Orin NX / AGX with DeepStream 7.1.

Tested with 8 simultaneous 1080p RTSP streams on a Jetson Orin NX 16 GB — CPU usage stays below 20%, GPU usage up to ~40%.

Target platform: NVIDIA Jetson Orin NX / AGX — JetPack 6.x, DeepStream 7.1, CUDA 12.x

Based on apriltags_cuda by FRC Team 766 / Team 971

The GPU detection core is built on top of the excellent apriltags_cuda library by FRC Team 766 / Team 971, which implements the entire AprilTag detection pipeline (thresholding, connected-component labeling, quad fitting, tag decoding) as CUDA kernels — no CPU involvement in the detection path.

This repository wraps that library as a native NVIDIA DeepStream GStreamer plugin (nvdsapriltagcuda) and includes a set of modifications that were necessary to make it build and run inside a DeepStream Docker container on Jetson without a full Jetson SDK on the host machine.

Modifications over upstream

Change	Details
`CMakeLists_minimal.txt`	Strips WPILib, NetworkTables, OpenCV, and seasocks — builds with just CUDA + apriltag C lib + glog
EGL / NVMM interop	`CudaEglFrame` maps `NvBufSurface` NVMM buffers directly into CUDA without any CPU copy
RGBA to YUYV kernel	Custom `rgba_to_yuyv_kernel` — the frc971 detector expects YUYV input; DeepStream delivers RGBA
Docker stub libraries	Mock `.so` files for `libnvbufsurface` allow link-time resolution inside Docker without a Jetson SDK
DeepStream metadata	Results written as `NvDsObjectMeta` (class ID, object ID, bounding boxes, text labels)
FIFO JSON IPC	Each detection serialised as JSON to `/tmp/apriltag_detections_cam_<CAMERA_ID>` for companion processes
`detection-interval` property	GObject property to detect every N frames and reuse cached results in between

Directory layout

nvds_apriltag_cuda/
├── plugin/                     # GStreamer DeepStream plugin
│   ├── gstnvdsapriltagcuda.cu  # Main GStreamer element (CUDA TU)
│   ├── cuda_utils.cu           # EGL interop + RGBA->YUYV CUDA kernel
│   ├── cuda_utils.h
│   └── CMakeLists.txt
├── apriltags_cuda/             # Modified GPU detection library
│   ├── CMakeLists_minimal.txt  # Key contribution: minimal build for DeepStream/Docker
│   ├── install_deps.sh         # Native host dependency installer
│   └── src/                   # Core GPU detection pipeline (CUDA kernels)
├── Dockerfile                  # Full build: apriltag lib + plugin in one image
└── README.md

How it works

The element is a GstBaseTransform in-place plugin registered as nvdsapriltagcuda.

The incoming NvBufSurface (NVMM/RGBA) is mapped as an EGL image via NvBufSurfaceMapEglImage
CudaEglFrame registers it as a CUDA graphics resource (cuGraphicsEGLRegisterImage)
A custom CUDA kernel (rgba_to_yuyv_kernel) converts RGBA → YUYV in-GPU
frc971::apriltag::GpuDetector::Detect() runs the full detection pipeline on-device
Results are written as NvDsObjectMeta (class/object IDs, bounding boxes, text labels)
Each detection is also serialised as JSON and written to a named FIFO pipe for consumption by a companion process

The detection-interval property lets you run detection every N frames and reuse cached results in between, reducing GPU load on high-framerate streams.

Prerequisites

Dependency	Version
Jetson platform	Orin NX or AGX (sm_87), or any CUDA GPU
JetPack	6.x
DeepStream	7.1
CUDA toolkit	12.x (nvcc)
CMake	3.18+
libgoogle-glog	any recent version
apriltag C library	3.3.0 (cgpadwick fork)
NVIDIA CCCL	v2.3.2 (CUB / Thrust headers)

Build option A: Docker (recommended)

The Dockerfile handles every step: CMake install, apriltag library, CCCL headers, stub libs, and the plugin itself.

# Build from the nvds_apriltag_cuda/ directory
docker build -t nvds-apriltag-cuda:latest .

# Run with GPU access
docker run --rm --runtime=nvidia --gpus all nvds-apriltag-cuda:latest

CUDA architecture is set to 87 (Jetson Orin). Change -DCMAKE_CUDA_ARCHITECTURES=87 in the Dockerfile if targeting a different GPU.

Build option B: Native on device

Step 1: Install system dependencies

sudo apt-get install -y \
  build-essential pkg-config git wget \
  libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev \
  libgoogle-glog-dev

Step 2: Build and install the apriltag C library

git clone --depth 1 --branch 3.3.0 https://github.com/cgpadwick/apriltag.git
cd apriltag && mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_EXAMPLES=OFF ..
make -j$(nproc) && sudo make install && sudo ldconfig

Step 3: Get CCCL headers

git clone --depth 1 --branch v2.3.2 https://github.com/NVIDIA/cccl.git /opt/cccl

Step 4: Build the apriltag_cuda library

cd apriltags_cuda
cp CMakeLists_minimal.txt CMakeLists.txt
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_CUDA_ARCHITECTURES=87 \
      -DCCCL_DIR=/opt/cccl \
      ..
make -j$(nproc) && sudo make install && sudo ldconfig

Step 5: Build the GStreamer plugin

cd plugin
mkdir build && cd build
cmake ..
make -j$(nproc)
sudo make install

The .so installs to /opt/nvidia/deepstream/deepstream/lib/gst-plugins/.

export GST_PLUGIN_PATH=/opt/nvidia/deepstream/deepstream/lib/gst-plugins:$GST_PLUGIN_PATH
gst-inspect-1.0 nvdsapriltagcuda

GStreamer pipeline example

gst-launch-1.0 \
  rtspsrc location=rtsp://<ip>:<port>/<path> latency=100 ! \
  rtph264depay ! h264parse ! nvv4l2decoder ! \
  nvstreammux name=mux batch-size=1 width=1920 height=1080 ! \
  nvdsapriltagcuda detection-interval=3 ! \
  nvdsosd ! \
  nvegltransform ! nveglglessink

Set detection-interval to 1 to detect on every frame, or higher values (e.g. 3-5) to reduce GPU load on high-framerate streams while still reusing the last cached result for OSD rendering.

IPC / FIFO output

Every detection is serialised as a JSON object and written to a named pipe at:

/tmp/apriltag_detections_cam_<CAMERA_ID>

where CAMERA_ID is read from the CAMERA_ID environment variable (defaults to 0). A companion process can open this pipe for reading to consume detections in real time without modifying the GStreamer pipeline. The FIFO path is currently hardcoded in gstnvdsapriltagcuda.cu; if you need it to be runtime-configurable, expose it as a GObject property.

Measured performance (Jetson Orin NX 16 GB)

Tested with 8 simultaneous 1080p RTSP streams on a Jetson Orin NX 16 GB running JetPack 6 / DeepStream 7.1:

Metric	Measured value
CPU usage (all cores)	< 20%
GPU usage	up to ~40%
RAM usage	6 - 8 GB
Streams	8 x 1080p RTSP
Detection interval	3 (detect every 3rd frame)

The low CPU usage is the key advantage over CPU-based AprilTag libraries: thresholding, connected-component labeling, quad fitting, and tag decoding all run as CUDA kernels. The CPU is only involved in GStreamer buffer management and metadata writing.

Credits

GPU detection core: apriltags_cuda by FRC Team 766 / Team 971, originally based on the apriltag work from FRC Team 971 (Spartan Robotics). The original library is licensed under the MIT License.

DeepStream integration, EGL/NVMM interop, RGBA-to-YUYV kernel, Docker stub library technique, and CMakeLists_minimal.txt by t-teja.

Keywords

nvidia jetson jetson-orin jetson-orin-nx jetson-orin-agx deepstream gstreamer apriltag apriltag-detection cuda gpu-accelerated computer-vision multi-stream rtsp jetpack embedded-vision edge-ai frc robotics nvmm egl-interop

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

deepstream-apriltag-cuda

Based on apriltags_cuda by FRC Team 766 / Team 971

Modifications over upstream

Directory layout

How it works

Prerequisites

Build option A: Docker (recommended)

Build option B: Native on device

Step 1: Install system dependencies

Step 2: Build and install the apriltag C library

Step 3: Get CCCL headers

Step 4: Build the apriltag_cuda library

Step 5: Build the GStreamer plugin

GStreamer pipeline example

IPC / FIFO output

Measured performance (Jetson Orin NX 16 GB)

Related

Credits

Keywords

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
apriltags_cuda		apriltags_cuda
plugin		plugin
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

deepstream-apriltag-cuda

Based on apriltags_cuda by FRC Team 766 / Team 971

Modifications over upstream

Directory layout

How it works

Prerequisites

Build option A: Docker (recommended)

Build option B: Native on device

Step 1: Install system dependencies

Step 2: Build and install the apriltag C library

Step 3: Get CCCL headers

Step 4: Build the apriltag_cuda library

Step 5: Build the GStreamer plugin

GStreamer pipeline example

IPC / FIFO output

Measured performance (Jetson Orin NX 16 GB)

Related

Credits

Keywords

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages