Pr/compressor integration #1560

Draft
sgerber-main wants to merge 10 commits into Xilinx:dev from sgerber-main:pr/compressor-integration

@sgerber-main

No description provided.

Port of compressor-python for efficient low-bitwidth dot products
using LUT6CY primitives instead of DSP blocks.
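
As a rough illustration of the general idea behind a compressor-based dot product (not the RTL generated by this PR), the sketch below builds the partial-product bit heap for unsigned operands and reduces it with plain 3:2 counters until every column holds at most two bits. The function names are hypothetical, signed operands are not handled, and the counters in the PR (for example MuxCYRippleSum) are more specialized than the full adders assumed here.

```python
from collections import defaultdict

def dot_bit_heap(a, w, bits=4):
    """Collect the AND partial-product bits of sum(a[i] * w[i]) by column weight."""
    heap = defaultdict(list)
    for x, y in zip(a, w):
        for j in range(bits):
            for k in range(bits):
                heap[j + k].append(((x >> j) & 1) & ((y >> k) & 1))
    return dict(heap)

def compress(heap):
    """Apply 3:2 counters (full adders) until every column holds at most two bits."""
    heap = {col: list(col_bits) for col, col_bits in heap.items()}
    while any(len(col_bits) > 2 for col_bits in heap.values()):
        nxt = defaultdict(list)
        for col, col_bits in heap.items():
            while len(col_bits) >= 3:
                x, y, z = col_bits.pop(), col_bits.pop(), col_bits.pop()
                nxt[col].append(x ^ y ^ z)                        # sum stays in this column
                nxt[col + 1].append((x & y) | (x & z) | (y & z))  # carry moves one column up
            nxt[col].extend(col_bits)                             # leftovers pass through
        heap = dict(nxt)
    return heap

def heap_value(heap):
    """Numeric value represented by a bit heap."""
    return sum(bit << col for col, col_bits in heap.items() for bit in col_bits)

a, w = [3, 15, 7], [5, 15, 2]
reduced = compress(dot_bit_heap(a, w))
assert heap_value(reduced) == sum(x * y for x, y in zip(a, w))
```

The two remaining rows would then go to a single carry-propagate adder, which is where a carry chain such as CARRY4 or CARRY8 typically comes in.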

Supports:
- 7-Series (CARRY4), UltraScale+ (CARRY8), Versal (LOOKAHEAD8)
- Signed/unsigned operands up to 4-bit
- Fused accumulation with constant absorption
- Two optimization paths: dotp_comp and add_multi

- mvu_vvu_axi.sv: USE_COMPRESSOR gating, MAX_IN_FLIGHT safety floor
- add_multi.sv: CATCH_COMP macro for lane reduction compressors
- mvu_vvu_axi_wrapper.v: COMP_PIPELINE_DEPTH parameter propagation

- Add compressor eligibility checks (_is_dotp_comp_eligible, etc.)
- Call generate_dotp_comp() and generate_add_multi_comps()
- Add generated files to RTL file list
- Propagate COMP_PIPELINE_DEPTH and USE_COMPRESSOR template vars

- Core compressor unit tests (run_tests.sh): 21 XSim-based configs
- dotp_comp integration tests: 8 configs (Versal + 7-Series)
- add_multi integration tests: 8 configs (DSP lane reduction)
- MVU synthesis tests for timing closure validation

- REPORT.md: Status, known issues, benchmarking results
- TEST_GUIDE.md: Testing procedures and troubleshooting
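
A minimal sketch of what an eligibility check along these lines could look like, based only on the constraints listed above (three device families, operands up to 4 bits). The function name, the family keys, and the argument list are hypothetical. The actual `_is_dotp_comp_eligible` in this PR may take different inputs and apply further conditions.

```python
# Carry primitive per device family, as listed in the commit message.
# The dictionary keys are illustrative names, not identifiers used by FINN.
SUPPORTED_CARRY = {
    "7series": "CARRY4",
    "ultrascale+": "CARRY8",
    "versal": "LOOKAHEAD8",
}
MAX_OPERAND_BITS = 4  # "operands up to 4-bit"

def dotp_comp_eligible_sketch(family: str, weight_bits: int, act_bits: int) -> bool:
    """Return True if the compressor path could apply (hypothetical criteria)."""
    return (
        family.lower() in SUPPORTED_CARRY
        and 1 <= weight_bits <= MAX_OPERAND_BITS
        and 1 <= act_bits <= MAX_OPERAND_BITS
    )

print(dotp_comp_eligible_sketch("versal", 4, 4))   # within the stated limits
print(dotp_comp_eligible_sketch("7series", 8, 4))  # weights too wide for this path
```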

Correct the LUT6_2 predicate order and the CARRY4 carry-in wiring for the
MuxCYRippleSum and MuxCYPredAdder counters on 7-Series FPGAs.

Key fixes:
- Swap O5/O6 outputs to match hardware behavior
- Add initial carry-in=0 for first LUT in chain
- Implement MuxCYPredAdder for horizontal multi-column absorption

Re-enables gate absorption optimization for 7-Series targets.
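
To show why the O5/O6 assignment and the initial carry-in matter, here is a small behavioral model of a LUT6_2 feeding a CARRY4, wired as a plain 4-bit adder. It is a sketch for illustration only and is not the counter logic from this PR. `ADD_INIT`, `add4`, and the `swap_outputs` flag are made up for the example. The model assumes the usual primitive behavior: O6 is the full 6-input lookup, O5 is the lookup into the lower 32 INIT bits, and each carry stage computes `O = S xor C` and `CO = S ? C : DI`.

```python
def lut6_2(init: int, inputs: int) -> tuple[int, int]:
    """Return (O5, O6) for a 64-bit INIT and a 6-bit input word I5..I0."""
    o6 = (init >> (inputs & 0x3F)) & 1  # full 6-input lookup
    o5 = (init >> (inputs & 0x1F)) & 1  # lower 32 INIT bits, indexed by I4..I0
    return o5, o6

def carry4(s, di, ci=0):
    """Four carry stages: XORCY forms the sum bit, MUXCY selects the next carry."""
    c, o, co = ci, [], []
    for s_i, di_i in zip(s, di):
        o.append(s_i ^ c)        # XORCY
        c = c if s_i else di_i   # MUXCY: propagate C when S=1, else take DI
        co.append(c)
    return o, co

# With I5 tied high: O6 = I0 xor I1 (propagate), O5 = I0 (generate).
ADD_INIT = (1 << 33) | (1 << 34) | (1 << 1) | (1 << 3)

def add4(a: int, b: int, swap_outputs: bool = False) -> int:
    s, di = [], []
    for i in range(4):
        inputs = ((a >> i) & 1) | (((b >> i) & 1) << 1) | (1 << 5)
        o5, o6 = lut6_2(ADD_INIT, inputs)
        if swap_outputs:         # model the mis-wired predicate order
            o5, o6 = o6, o5
        s.append(o6)
        di.append(o5)
    o, co = carry4(s, di, ci=0)  # the first stage needs an explicit carry-in of 0
    return sum(bit << i for i, bit in enumerate(o)) | (co[3] << 4)

assert all(add4(a, b) == a + b for a in range(16) for b in range(16))
assert add4(1, 1, swap_outputs=True) != 2  # swapped O5/O6 breaks the sum
```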

The compressor path handles the full weight range (no narrow-weight restriction).
The DSP path uses the NARROW_WEIGHTS module parameter for range adjustment.

This enables RTL MVAU on 7-Series (DSP48E1) with full weight ranges.
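
For reference, the difference between a full and a narrow weight range can be sketched as follows. This assumes the common convention that a narrow signed range drops the most negative value to make the range symmetric. The helper name is hypothetical, and how NARROW_WEIGHTS is interpreted inside the RTL is not shown in this PR text.

```python
def weight_range(bits: int, signed: bool = True, narrow: bool = False) -> tuple[int, int]:
    """Return (min, max) of a weight datatype under the assumed narrow-range convention."""
    if not signed:
        return 0, (1 << bits) - 1
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if narrow:
        lo += 1  # exclude the most negative value, giving a symmetric range
    return lo, hi

print(weight_range(4))               # full signed 4-bit range
print(weight_range(4, narrow=True))  # narrow signed 4-bit range
```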

Add the COMP_PIPELINE_DEPTH and USE_COMPRESSOR template vars to VVU
for consistency with MVU. VVU does not use the compressor (it has a
different compute pattern), so USE_COMPRESSOR is set to 0.
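
A minimal sketch of how such template vars might be substituted into a wrapper template, assuming `$NAME$`-style placeholders. The helper, the template line, and the COMP_PIPELINE_DEPTH values are placeholders chosen for illustration. Only `USE_COMPRESSOR=0` for VVU comes from the commit message.

```python
def fill_template(template: str, values: dict[str, object]) -> str:
    """Replace every $KEY$ placeholder with its value (assumed placeholder style)."""
    for key, val in values.items():
        template = template.replace(f"${key}$", str(val))
    return template

# Illustrative parameter line, not copied from mvu_vvu_axi_wrapper.v.
wrapper_line = ".USE_COMPRESSOR($USE_COMPRESSOR$), .COMP_PIPELINE_DEPTH($COMP_PIPELINE_DEPTH$)"

mvu_vars = {"USE_COMPRESSOR": 1, "COMP_PIPELINE_DEPTH": 2}  # example values only
vvu_vars = {"USE_COMPRESSOR": 0, "COMP_PIPELINE_DEPTH": 0}  # VVU: compressor disabled

print(fill_template(wrapper_line, mvu_vars))
print(fill_template(wrapper_line, vvu_vars))
```

Defining both vars for VVU, even though the compressor is off, means the shared wrapper template has no unresolved placeholders on either path.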

Aggregate the CATCH_COMP entries from all MVAU nodes into a single
add_multi.sv file during out-of-context synthesis. Required for the
add_multi compressor path to work in full builds.
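
The aggregation step could look roughly like the sketch below. It treats each entry as an opaque string and assumes that identical entries from different nodes should appear only once. Both the de-duplication and the helper name are assumptions, and the real entry format of the CATCH_COMP macro is not reproduced here.

```python
def aggregate_entries(per_node: dict[str, list[str]]) -> list[str]:
    """Merge per-node entry lists into one list, keeping first occurrences only."""
    seen: set[str] = set()
    merged: list[str] = []
    for node in sorted(per_node):      # deterministic order across builds
        for entry in per_node[node]:
            if entry not in seen:
                seen.add(entry)
                merged.append(entry)
    return merged

# Placeholder entries standing in for the generated macro lines.
entries = {
    "MVAU_rtl_1": ["entry_B", "entry_C"],
    "MVAU_rtl_0": ["entry_A", "entry_B"],
}
merged = aggregate_entries(entries)
assert merged == ["entry_A", "entry_B", "entry_C"]
```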

Ignore Claude Code workspace files and benchmark results directories.

Remove test scripts and templates that are redundant with pytest coverage:
- run_add_multi_comp_tests.sh + add_multi_comp_tb_template.sv/.tcl
- run_mvu_add_multi_comp_tests.sh + mvu_add_multi_comp_tb_template.sv/.tcl
- run_mvu_comp_synth_tests.sh + mvu_comp_synth_tb_template.tcl

These remain available in the compressor-benchmarking branch for
detailed performance analysis. Core validation is covered by:
- run_tests.sh (compressor unit tests)
- run_dotp_comp_tests.sh (dotp_comp integration)
- run_mvu_comp_tests.sh (full MVU integration)
- pytest test_fpgadataflow_rtl_mvau (FINN node tests)