Video Processing Pipeline Analysis Results

This document records the performance benchmarks and hardware initialization status of the DE10-Nano video processing pipeline.

1. DMA Performance Benchmark Results (2026-02-12)

| Test Case | Size | Software (cycles) | Hardware (cycles) | MB/s (HW) | Speedup |
|---|---|---|---|---|---|
| OCM to DDR | 4KB x 100 | 4,185,427 | 166,211 | 117.5 | 25x |
| DDR to DDR | 1MB | 207,071,817 | 393,942 | 126.9 | 525x |

Note

DMA (Burst Master 4) significantly offloads the CPU, providing over 500x speedup for 1MB transfers.
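The throughput and speedup columns can be re-derived from the raw cycle counts. The sketch below assumes a 50 MHz counter clock and that "MB/s" means MiB/s (2^20 bytes per second). Neither is stated in this report, but both are consistent with the logged figures. The names `throughput_mib_s` and `speedup` are illustrative, not part of the project code.

```python
# Assumptions (not stated in the report): the cycle counter runs at 50 MHz
# and "MB/s" in the table means MiB/s (2**20 bytes per second).
CLOCK_HZ = 50_000_000
MIB = 1 << 20


def throughput_mib_s(num_bytes: int, cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a transfer size and a cycle count into MiB/s."""
    seconds = cycles / clock_hz
    return (num_bytes / MIB) / seconds


def speedup(sw_cycles: int, hw_cycles: int) -> float:
    """Ratio of software-copy cycles to hardware-DMA cycles."""
    return sw_cycles / hw_cycles


if __name__ == "__main__":
    # (name, bytes transferred, software cycles, hardware cycles) from the table
    cases = [
        ("OCM to DDR", 4096 * 100, 4_185_427, 166_211),
        ("DDR to DDR", 1 << 20, 207_071_817, 393_942),
    ]
    for name, size, sw, hw in cases:
        print(f"{name}: {throughput_mib_s(size, hw):.1f} MiB/s, "
              f"{speedup(sw, hw):.1f}x faster than the software copy")
```

If the counter clock differs from 50 MHz, only `CLOCK_HZ` needs to change. The speedup ratio is independent of the clock.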

2. Hardware Initialization Status

  • HDMI PLL: Locked at ~37.8 MHz (960x540p60 target)
  • ADV7513 IC: Configured via I2C successfully
  • Memory Map: Nios II & DMA isolated at 0x20000000 (512MB offset)
  • Modular Filter: 4-bit mode CSR verified, 60fps throughput confirmed
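As a sanity check on the figures above, the pixel-clock budget and the memory offset can be worked out with simple arithmetic. This is only a sketch: it treats 37.8 MHz as exact, and the real horizontal and vertical totals (blanking split) are not given in this report, so only the per-frame clock count and the blanking fraction are derived.

```python
H_ACTIVE, V_ACTIVE, FPS = 960, 540, 60
PIXEL_CLOCK_HZ = 37_800_000   # approximate PLL output from the status list
DDR_WINDOW_BASE = 0x20000000  # Nios II / DMA base address from the memory map

# Visible pixels in one frame.
active_pixels = H_ACTIVE * V_ACTIVE

# Pixel clocks available per frame at 60 fps (active + blanking).
clocks_per_frame = PIXEL_CLOCK_HZ / FPS

# Share of each frame spent in horizontal/vertical blanking.
blanking_fraction = 1 - active_pixels / clocks_per_frame

# The base address expressed in MiB.
base_offset_mib = DDR_WINDOW_BASE // (1 << 20)

if __name__ == "__main__":
    print(f"active pixels per frame : {active_pixels}")
    print(f"pixel clocks per frame  : {clocks_per_frame:.0f}")
    print(f"blanking share          : {blanking_fraction:.1%}")
    print(f"base offset             : {base_offset_mib} MiB")
```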

3. Official Execution Log

--- [TEST 1] OCM to DDR DMA (burst_master_0) ---
Starting SW Copy (4KB x 100)... Done (4185427 cycles, ~4.6 MB/s)
Starting HW DMA (4KB x 100)... Done (166211 cycles, ~117.5 MB/s)
SUCCESS: OCM to DDR Verified!

--- [TEST 2] DDR to DDR DMA (Burst Master 4) ---
Starting HW DMA (1MB)... Done (393942 cycles, ~126.9 MB/s)
SUCCESS: DDR to DDR Verified!

--- [TEST 3] Real-time Modular Filter Verified (2026-02-20) ---
- **Resolution**: 960x540p @ 60fps (Stable)
- **Filter Pipeline**: 
    - Mode 1: Grayscale (Verified)
    - Mode 3: Color Blur (Verified via Cocotb & Hardware)
    - Mode 5: Color Edge (Verified)
    - Mode 6: Emboss (Verified via Cocotb)
    - Mode 7: Sharpen (Verified via Cocotb)
- **Verification**: Cocotb Simulation processed 518,400 pixels in 98.21s (Sim time).
- **Latency**: 3-clock pipeline delay matched across all modes.
- **Visuals**: Confirmed zero jitter and correct spatial convolution.

RTL Hardware Verification (Cocotb)

[Figure: RTL Sharpen (Mode 7)]

4. Filter Algorithm (Python Simulation) Results

The sim_filters.py script provides a high-level reference implementation.

[Figure panels: Original, Grayscale, Blur, Edge (Gray), Edge (Color), Emboss, Sharpen]
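For readers without access to sim_filters.py, a 3x3 spatial filter of this kind can be sketched in pure Python. The kernel below is the common textbook sharpen kernel and the border handling is edge replication. Both are assumptions: the actual coefficients and border policy used by sim_filters.py and the RTL are not shown in this report.

```python
# Textbook sharpen kernel (assumed; the project's coefficients may differ).
SHARPEN = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def filter3x3(image, kernel):
    """Apply a 3x3 kernel to a 2D list of 8-bit values.

    Borders are handled by replicating the nearest edge pixel, and results
    are clamped to 0..255. The kernel is applied without flipping, which is
    equivalent to convolution for symmetric kernels such as SHARPEN.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    acc += kernel[ky][kx] * image[sy][sx]
            out[y][x] = min(max(acc, 0), 255)
    return out


if __name__ == "__main__":
    test_image = [
        [10, 10, 10],
        [10, 20, 10],
        [10, 10, 10],
    ]
    for row in filter3x3(test_image, SHARPEN):
        print(row)
```

Because the kernel coefficients sum to 1, flat regions pass through unchanged and only local contrast is amplified.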

5. 2-Stage Hybrid Dithering Results

The pipeline implements a Bit-Split Hybrid Architecture: 2-bit Temporal LSB scrambling followed by 4-bit spatial Floyd-Steinberg Error Diffusion.
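The spatial stage can be illustrated with the textbook Floyd-Steinberg algorithm. The sketch below covers only that stage: it quantizes 8-bit values to 4-bit codes and diffuses the error with the standard 7/16, 3/16, 5/16, 1/16 weights. The temporal LSB scrambling stage, the bit-split details and any fixed-point simplifications in the RTL are not described here, so they are left out. The function name and the reconstruction `code * 17` are assumptions.

```python
def floyd_steinberg_4bit(image):
    """Quantize a 2D list of 8-bit values to 4-bit codes (0..15).

    Textbook Floyd-Steinberg error diffusion in raster order. A code q is
    assumed to be displayed as q * 17, which maps 0..15 onto 0..255.
    """
    height, width = len(image), len(image[0])
    work = [[float(v) for v in row] for row in image]  # pixel + diffused error
    codes = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            q = max(0, min(15, int(old / 17 + 0.5)))  # nearest 4-bit level
            codes[y][x] = q
            err = old - q * 17
            # Push the quantization error onto unprocessed neighbours.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return codes


if __name__ == "__main__":
    flat_gray = [[128] * 8 for _ in range(8)]
    for row in floyd_steinberg_4bit(flat_gray):
        print(row)
```

On a flat mid-gray input that falls between two 4-bit levels, the output alternates between the two neighbouring codes, which is how error diffusion trades banding for fine-grained noise.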

Quantitative Visual Quality Assessment (PSNR)

Comparison of the 2-Stage Hybrid algorithm against simple 4-bit truncation using a dog portrait test image.

| Evaluation Region | Truncation (Baseline) | Proposed Hybrid | Improvement |
|---|---|---|---|
| Whole Image | 29.16 dB | 32.46 dB | +3.30 dB |
| Near-Black (< 0x20) | 29.15 dB | 32.44 dB | +3.29 dB |
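For reference, PSNR for 8-bit images is 10·log10(255² / MSE). The sketch below shows the metric together with a simple 4-bit truncation baseline. It assumes truncation means keeping the upper four bits (`v & 0xF0`). The actual evaluation script and test image are not part of this report, so the helper names are illustrative.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized 2D lists."""
    total, count = 0, 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def truncate_to_4bit(image):
    """Baseline quantizer (assumed): drop the four least significant bits."""
    return [[v & 0xF0 for v in row] for row in image]


if __name__ == "__main__":
    ramp = [list(range(256))]  # synthetic 0..255 gradient, not the test image
    print(f"truncation PSNR on a ramp: {psnr(ramp, truncate_to_4bit(ramp)):.2f} dB")
```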

[Figure: PSNR Metric Graph]

Visual Comparison

[Figure panels: Original (8-bit), 4-bit Truncated (Banding), Hybrid 2-Stage (Result)]

Important

The +3.29 dB PSNR improvement in near-black regions (< 0x20) is quantitative evidence that our "Seamless Error Propagation" logic is effective. The architecture achieves 3D-like (spatial plus temporal) dithering quality without any external frame buffer (0 MB/s DDR bandwidth).

6. 12-bit High-Precision Linear Path Verification (2026-02-22)

To eliminate color banding in dark regions during 3x3 gamut correction, we upgraded the internal precision of the linear path from 8-bit to 12-bit.

Test Case: 12-bit Color Chain (De-gamma → 3x3 Matrix → Gamma)

Verification conducted via bit-accurate Cocotb simulation.

| Metric | Target | Result | Status |
|---|---|---|---|
| Internal Bit-depth | 12-bit (4096 levels) | 12-bit Verified | ✅ PASSED |
| Roundtrip Error | Max 1 LSB (8-bit space) | 1 LSB Verified | ✅ PASSED |
| Shadow detail preservation | No Crushing (< 0.05% error) | Verified OK | ✅ PASSED |
| Pipeline Throughput | 37.8 MHz (60fps) | 37.8 MHz Verified | ✅ PASSED |

Analysis: By using 12-bit for the 3x3 matrix stage, we preserved the distinct steps of the lowest sRGB gradations (sRGB 1-15) that were previously crushed to zero in an 8-bit pipeline. This ensures a smooth, artifact-free transition in shadows even after complex gamut remapping.
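The effect of internal precision on shadow codes can be reproduced with a small floating-point model of the chain. This is a sketch under stated assumptions: it uses the standard sRGB transfer functions and an identity 3x3 matrix, because the actual gamut matrix coefficients and the LUT-based hardware implementation are not given in this report. The function `color_chain` is hypothetical.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding (de-gamma) for c in 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(lin):
    """Standard sRGB encoding (gamma) for lin in 0..1."""
    return 12.92 * lin if lin <= 0.0031308 else 1.055 * lin ** (1 / 2.4) - 0.055


# Identity matrix as a stand-in for the real gamut-correction coefficients.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]


def color_chain(rgb8, matrix, bits):
    """De-gamma -> 3x3 matrix -> gamma with `bits` of linear precision."""
    scale = (1 << bits) - 1
    # Quantize linear light to the internal bit depth.
    lin = [round(srgb_to_linear(c / 255) * scale) for c in rgb8]
    # Matrix stage, rounded and clamped to the internal range.
    mixed = [
        min(scale, max(0, round(sum(m * v for m, v in zip(row, lin)))))
        for row in matrix
    ]
    # Back to 8-bit sRGB.
    return tuple(round(linear_to_srgb(v / scale) * 255) for v in mixed)


if __name__ == "__main__":
    for bits in (8, 12):
        shadows = {color_chain((v, v, v), IDENTITY, bits) for v in range(1, 16)}
        print(f"{bits}-bit linear path: {len(shadows)} distinct outputs "
              f"for sRGB codes 1-15")
```

In this model the 12-bit path keeps all 15 shadow codes distinct, while the 8-bit path merges most of them, which matches the behaviour described in the analysis above.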


Created by Nios II Performance Monitoring Unit.