
Merge changes for cuda-bindings 13.1.0 release#1315

Merged
rwgk merged 7 commits into NVIDIA:main from rwgk:merge-for-13.1
Dec 4, 2025

Conversation

@rwgk
Collaborator

@rwgk rwgk commented Dec 4, 2025

@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 4, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Dec 4, 2025

/ok to test


@rwgk rwgk self-assigned this Dec 4, 2025
@rwgk rwgk added P0 High priority - Must do! cuda.bindings Everything related to the cuda.bindings module cuda.pathfinder Everything related to the cuda.pathfinder module labels Dec 4, 2025
@rwgk rwgk marked this pull request as ready for review December 4, 2025 20:14
@copy-pr-bot
Contributor

copy-pr-bot bot commented Dec 4, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@rwgk
Collaborator Author

rwgk commented Dec 4, 2025

User prompt

We're at the final stage of making the cuda-bindings v13.1.0 release.

The PR is under merge-for-13.1 here.

CI passes. One teammate approved already.

merge-for-13.1 is in sync with the main branch here.

Could you please make yourself aware of the diff between main and merge-for-13.1, then scan the selected CI logs here for anything suspicious?

~/cuda-python_ci_run_19941271439

This is only for sanity checking.

cuda-python_ci_run_19941271439/
├── build-linux-64.txt  [2025-12-04 12:34]  (1,321,111 bytes)
├── build-linux-aarch64.txt  [2025-12-04 12:34]  (1,320,103 bytes)
├── build-win-64.txt  [2025-12-04 12:35]  (2,194,293 bytes)
├── test-linux-64.txt  [2025-12-04 12:31]  (642,109 bytes)
├── test-linux-aarch64.txt  [2025-12-04 12:32]  (644,421 bytes)
└── test-win-64.txt  [2025-12-04 12:30]  (498,781 bytes)
    ____________________________________________________________
    Number of subdirs: 0
    Sum of file sizes: (6,620,818 bytes)

Assistant response (Diff sanity check)

Diff sanity check (main ↔ merge-for-13.1)

  • Scope of changes:
    • CI/config: ci/test-matrix.yml adds coverage for CUDA 13.1.0 across Linux/Windows, x86_64/arm64, multiple Python versions; ci/versions.yml bumps the CUDA build version from 13.0.2 → 13.1.0.
    • Bindings: Large updates in cuda_bindings/cuda/bindings/* (driver/runtime/nvrtc/cufile/etc.) to track CTK 13.1.0, including new APIs (e.g. multicast *_v2 variants) and adjusted types/flags; _version.py updated accordingly.
    • Docs/release notes: New/renamed release notes files for 12.9.5 and 13.1.0, and module docs (driver.rst, runtime.rst, nvrtc.rst) updated for new APIs and behavior.
    • Support matrix pruning: supported_nvidia_libs.py and related pathfinder data drop older 11.x-era libs and some very old DLLs, keeping 12/13-era sonames and DLLs.
    • cuda-pathfinder: Version bumped 1.3.3a0 → 1.3.3, with new release notes and an extra cu13 dependency (nvidia-cufftmp-cu13), plus updated nv-versions JSON.
    • Minor infra: .gitattributes extended (e.g. *.cmd EOL, *.svg binary), .spdx-ignore extended for a Draw.io SVG.

Nothing in the diff itself looks out-of-character for a CTK 13.1.0 cut: it’s the expected combination of new APIs, doc updates, CI matrix expansion, and support-matrix cleanup.

CI logs: build jobs

From build-linux-64.txt, build-linux-aarch64.txt, build-win-64.txt:

  • Errors / failures:

    • Grep for FAILED, ERROR, Traceback, E across all build logs returned no matches indicative of build failures or exceptions.
    • The only hits on that pattern were:
      • except OSError: around a symlink helper (which then falls back to shutil.copy2), and
      • C/C++ compiler warnings in generated cufile.cpp (see below), not errors.
  • Notable warnings (non-fatal, but worth being aware of):

    • cufile stream type warnings (Linux x86_64 & aarch64):
      • Several warning: invalid conversion from 'void*' to 'CUstream' in generated cuda/bindings/cufile.cpp for:
        • cuFileReadAsync
        • cuFileWriteAsync
        • cuFileStreamRegister
        • cuFileStreamDeregister
      • These are compile-time warnings; builds still complete successfully and produce the .so modules.
      • Semantically, they reflect passing a void*-typed stream handle to functions expecting CUstream; if that’s intentional in your Cython layer, this is noisy but benign. If not, it might be something to tighten later, but it’s not blocking CI.
    • Windows loader warnings in build-win-64.txt:
      • MSVC C4996 for wcstombs and strncat in cuda/bindings/_bindings/loader.cpp (“unsafe” CRT functions). Typical for older-style C APIs; compilation still succeeds.
      • C4551 warnings in cynvrtc.cpp about “function call missing argument list” — also warnings only; module still links and is used by tests.
    • Setuptools packaging warnings:
      • _Warning: Package 'cuda.bindings' is absent from the packages configuration (and similarly for cuda.bindings._bindings, _internal, _lib) during wheel build on Windows. This pattern is consistent with using ext_modules and non-standard package layouts; since the build completes and tests run against installed wheels, these appear to be expected noise in your existing configuration.
    • Node.js deprecation warnings:
      • Several [DEP0040] DeprecationWarning: The 'punycode' module is deprecated messages, coming from GitHub Actions tooling, not from your project.

Nothing in the build logs suggests incomplete artifacts, missing symbols, or link errors for the new 13.1.0 APIs.
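The grep pass over the build logs can be reproduced with a short script. This is a minimal sketch, assuming the logs are plain-text files in the `~/cuda-python_ci_run_19941271439` directory listed above; the helper names `scan_lines` and `scan_log_dir` are hypothetical, and the pattern list covers only FAILED, ERROR and Traceback. The match is case-sensitive, so a line such as `except OSError:` is not reported.

```python
import re
from pathlib import Path

# Case-sensitive, whole-word patterns, similar to a plain grep for these tokens.
SUSPICIOUS = re.compile(r"\b(FAILED|ERROR|Traceback)\b")


def scan_lines(lines):
    """Yield (line_number, text) for every line matching a suspicious pattern."""
    for lineno, text in enumerate(lines, start=1):
        if SUSPICIOUS.search(text):
            yield lineno, text.rstrip("\n")


def scan_log_dir(log_dir, glob="build-*.txt"):
    """Map each matching log file name to its list of suspicious lines."""
    hits = {}
    for path in sorted(Path(log_dir).expanduser().glob(glob)):
        # CI logs occasionally contain undecodable bytes; replace rather than crash.
        with path.open(encoding="utf-8", errors="replace") as f:
            found = list(scan_lines(f))
        if found:
            hits[path.name] = found
    return hits
```

For example, `scan_log_dir("~/cuda-python_ci_run_19941271439")` checks the three build logs, and passing `glob="test-*.txt"` checks the test logs instead. An empty result only means none of these tokens appeared; warnings still need a separate look.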

CI logs: test jobs

From test-linux-64.txt, test-linux-aarch64.txt, test-win-64.txt:

  • Errors / failures:

    • Grep for FAILED, ERROR, Traceback, and typical pytest failure-summary regex across all three test logs found:
      • No failures or errors in the two Linux test logs.
      • In test-win-64.txt, only a clean pytest summary with skips:
        • 190 passed, 4 skipped in 5.49s
        • Skips were:
          • tests/test_cufile.py: could not import cuda.bindings.cufile (no module in that environment).
          • tests/test_cudart.py: “cudart dynamic lib not available”.
          • tests/test_graphics_apis.py: GL context/texture could not be created (AttributeError: opengl32).
        • These match the usual pattern of environment-dependent tests (file I/O libs, cudart dynlib, OpenGL) being skipped when prerequisites aren’t present, not new regressions.
  • Warnings:

    • No additional suspicious warnings (beyond the Node deprecations already mentioned for the build jobs) surfaced in the test logs.
    • Pytest’s short-test-summary is clean other than the above skips; no failed/errors entries.
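Reading the pytest summary line mechanically is less error-prone than eyeballing it. A minimal sketch, assuming the standard pytest final summary format quoted above (`190 passed, 4 skipped in 5.49s`); the helper names are hypothetical.

```python
import re

# "<count> <outcome>" pairs as printed in pytest's final summary line.
_COUNT = re.compile(
    r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed|warnings?|deselected)"
)
# pytest writes "1 error" but "2 errors" (same for warnings); normalise to plural keys.
_PLURAL = {"error": "errors", "warning": "warnings"}


def parse_pytest_summary(line):
    """Return a dict mapping outcome -> count for one pytest summary line."""
    counts = {}
    for number, outcome in _COUNT.findall(line):
        key = _PLURAL.get(outcome, outcome)
        counts[key] = counts.get(key, 0) + int(number)
    return counts


def is_clean(counts):
    """A run is clean when it has no failures and no errors (skips are allowed)."""
    return counts.get("failed", 0) == 0 and counts.get("errors", 0) == 0
```

Applied to the Windows summary, `parse_pytest_summary("190 passed, 4 skipped in 5.49s")` yields 190 passed and 4 skipped, and `is_clean` accepts it; the skips themselves still have to be reviewed by reason, as done above.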

Overall sanity verdict

  • Diff review: Changes between main and merge-for-13.1 align with a normal CTK bump and binding refresh (new 13.1.0 APIs, CI matrix updates, docs and version bumps, and cleanup of old libs); nothing stands out as an obviously risky, off-topic, or accidental change.
  • CI logs:
    • No build or test failures appear in the captured logs.
    • The only noteworthy items are:
      • Expected compiler warnings in generated code (cufile stream casts, MSVC “unsafe” CRT calls).
      • Standard packaging and Node.js deprecation warnings.
      • Expected environment-related pytest skips on Windows.
  • Sanity conclusion: From the perspective of these diffs and the provided CI logs, merge-for-13.1 looks clean for the cuda-bindings v13.1.0 release, with no suspicious red flags that would block merging.

@leofang leofang added this to the cuda-python 13-next, 12-next milestone Dec 4, 2025
Member

@leofang leofang left a comment


Left one question on the release notes / version selector; otherwise LGTM!

@rwgk rwgk merged commit df3c661 into NVIDIA:main Dec 4, 2025
83 checks passed
@rwgk rwgk deleted the merge-for-13.1 branch December 4, 2025 21:08
@github-actions

github-actions bot commented Dec 4, 2025

Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang added a commit to NVIDIA/numba-cuda that referenced this pull request Jan 16, 2026
## Summary: Add CUDA 13.1 Testing Support ✅

Successfully propagated CUDA 13.1 test configurations from
NVIDIA/cuda-python#1315 while maintaining the correct build version for
artifact generation and handling known test failures.

- [x] Update `ci/test-matrix.yml` to add CUDA 13.1.0 test configurations
for all Python versions and architectures
- [x] Keep `ci/versions.yml` at CUDA 13.0.2 for build artifacts
(reverted from 13.1.0)
- [x] Add explanation comment in versions.yml about nvjitlink version
requirements
- [x] Comment out free-threading (3.14t) pipelines (not yet supported)
- [x] Skip only the 5 actually failing tests on nvjitlink 13.1 + sm_120
- [x] Improve exception handling in skip decorator
- [x] Fix ruff-format pre-commit error
- [x] Verify changes are appropriate for numba-cuda repository structure
- [x] Validate YAML syntax and configuration counts
- [x] Code review passed with no issues
- [x] Security scan completed (no issues for YAML files)

## Changes Made

### 1. ci/versions.yml
- **Kept** `cuda.build.version` at `13.0.2` (reverted from temporary
change to 13.1.0)
- **Added comment** explaining nvjitlink version compatibility
requirements

### 2. ci/test-matrix.yml  
Added **16 new CUDA 13.1.0 test configurations** (excluding 3.14t):

**Linux (10 active configs, 2 commented out)**:
- Python 3.10: 2 configs (amd64 + arm64)
- Python 3.11: 2 configs (amd64 + arm64)
- Python 3.12: 2 configs (amd64 + arm64)
- Python 3.13: 2 configs (amd64 + arm64)
- Python 3.14: 2 configs (amd64 + arm64)
- Python 3.14t: 2 configs (amd64 + arm64) - **COMMENTED OUT**

**Windows (5 active configs, 1 commented out)**:
- Python 3.10, 3.11, 3.12, 3.13, 3.14 on amd64
- Python 3.14t on amd64 - **COMMENTED OUT**

**Special runners (1 config)**:
- H100 GPU for Python 3.13 on amd64

### 3. Test Skips for nvjitlink 13.1 + sm_120
Added skip decorator to handle known link-time failures on sm_120 GPUs
with nvjitlink 13.1:

**New skip decorator** in `numba_cuda/numba/cuda/testing.py`:
- `skip_on_nvjitlink_13_1_sm_120()`: Skips tests when nvjitlink version
is 13.1.x AND compute capability is 12.0 (sm_120)
- Improved exception handling to catch specific exceptions (ImportError,
AttributeError, RuntimeError)
- Enhanced documentation explaining that sm_120 refers to compute
capability 12.0 represented as tuple (12, 0)
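The skip condition can be sketched with `unittest.skipIf`. This is a minimal illustration, not the code in `numba_cuda/numba/cuda/testing.py`: to keep it self-contained, the nvjitlink version and the device compute capability come from two hypothetical getter callables instead of being queried from cuda-bindings and the driver. It also assumes that if either value cannot be determined, the test runs rather than being skipped.

```python
import unittest


def should_skip(nvjitlink_version, compute_capability):
    """True only for nvjitlink 13.1.x on a compute capability 12.0 (sm_120) device."""
    if nvjitlink_version is None or compute_capability is None:
        return False
    return (
        tuple(nvjitlink_version[:2]) == (13, 1)
        and tuple(compute_capability) == (12, 0)
    )


def skip_on_nvjitlink_13_1_sm_120(get_nvjitlink_version, get_compute_capability):
    """Return a unittest decorator; the getters are hypothetical stand-ins."""
    try:
        version = get_nvjitlink_version()
        cc = get_compute_capability()
    except (ImportError, AttributeError, RuntimeError):
        # Only these specific exceptions are caught; anything else propagates.
        # Assumption: unknown version or device means "do not skip".
        version = cc = None
    return unittest.skipIf(
        should_skip(version, cc),
        "nvjitlink 13.1 + sm_120: known ptxas internal compiler error at link time",
    )
```

In a test class this would be applied as `@skip_on_nvjitlink_13_1_sm_120(get_version, get_cc)` on each affected method, with `get_version` and `get_cc` returning tuples such as `(13, 1)` and `(12, 0)`.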

**Applied to only the 5 failing tests** in
`numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py`:
- `test_sum_basic` ✓
- `test_mean_basic` ✓
- `test_nansum_basic` ✓
- `test_nanmean_basic` ✓
- `test_nanprod_basic` ✓

**Tests that are NOT skipped** (these passed in CI):
- test_all_basic
- test_any_basic
- test_var_basic
- test_std_basic
- test_min_basic
- test_max_basic
- test_nanmin_basic
- test_nanmax_basic

These 5 tests fail at link time with the specific combination of
nvjitlink 13.1 and sm_120 GPUs due to an internal compiler error in
ptxas.

### 4. Code Formatting
- Fixed ruff-format pre-commit error by splitting long decorator lines
to comply with line length limits

### 5. Preserved Configurations
Maintained existing CUDA 12.9.1 configurations for Python 3.14 on Linux

### 6. Free-Threading (3.14t) Pipelines
Commented out 7 free-threading configurations to stay in sync with
cuda-python

## Final Configuration Summary

**Build Configuration**:
- `cuda.build.version`: **13.0.2** (kept at the lowest CUDA 13 version in
the test matrix, for artifact generation and nvjitlink compatibility)
- `cuda.prev_build.version`: 12.9.1

**Total active test configurations**:
- Linux pull-request: 30 configs
- Windows pull-request: 15 configs
- Linux special_runners: 2 configs
- **Total: 47 active configs**

**CUDA version breakdown**:
- CUDA 12.9.1: 16 configs
- CUDA 13.0.2: 15 configs (3.14t commented out)
- CUDA 13.1.0: 16 configs (3.14t commented out) ← **NEW test coverage**
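Both breakdowns should sum to the same total. A quick arithmetic check using only the figures listed above:

```python
# Figures copied from the "Final Configuration Summary" above.
by_platform = {
    "Linux pull-request": 30,
    "Windows pull-request": 15,
    "Linux special_runners": 2,
}
by_cuda_version = {"12.9.1": 16, "13.0.2": 15, "13.1.0": 16}

platform_total = sum(by_platform.values())
cuda_total = sum(by_cuda_version.values())
print(platform_total, cuda_total)  # prints: 47 47
```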

## Validation ✅

- ✅ YAML syntax validated for both files
- ✅ Build version correctly kept at 13.0.2 with clear explanation
- ✅ 16 active CUDA 13.1.0 test configurations added (3 for 3.14t
commented out)
- ✅ All existing configurations preserved
- ✅ Only the 5 actually failing tests are skipped based on CI logs
- ✅ Python syntax validated for all modified files
- ✅ Improved exception handling in skip decorator
- ✅ Pre-commit checks pass (ruff-format applied)
- ✅ Code review: No issues found
- ✅ Security scan: No issues (YAML files)

## Key Points

**Why build version stays at 13.0.2**:
- Test artifacts are generated with `cuda.build.version` (13.0.2)
- Tests run against multiple CUDA versions including 13.1.0
- Building artifacts with 13.1.0 and then linking them with nvjitlink
13.0.2 is not supported
- The nvjitlink used at test time must therefore be at least as new as the
CUDA version used to build the artifacts
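The build/link constraint can be written as a small predicate. This is a sketch, assuming the rule generalizes to: within one CUDA major version, the nvjitlink used at test time must be at least as new as the toolkit that built the artifacts, and artifacts are not linked across major versions. `link_supported` is a hypothetical name, not a numba-cuda or cuda-bindings API.

```python
def parse_version(text):
    """Convert '13.0.2' to (13, 0, 2) for tuple comparison."""
    return tuple(int(part) for part in text.split("."))


def link_supported(build_version, nvjitlink_version):
    """Assumed rule: same major version, and nvjitlink not older than the build."""
    build = parse_version(build_version)
    link = parse_version(nvjitlink_version)
    return build[0] == link[0] and link >= build


# Building with 13.0.2 works for every CUDA 13 version in the test matrix.
TEST_VERSIONS = ["13.0.2", "13.1.0"]
all_ok = all(link_supported("13.0.2", v) for v in TEST_VERSIONS)
```

Under this rule, building with 13.1.0 would break the 13.0.2 test jobs, which is why the build version stays at 13.0.2.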

**Test Skip Strategy**:
- Based on actual CI logs from commit 1fb2e77, only 5 tests fail at link
time with nvjitlink 13.1 + sm_120
- Skip condition is specific: nvjitlink version == 13.1.x AND compute
capability == 12.0
- Other tests (var, std, min, max, nanmin, nanmax) passed and are NOT
skipped
- This allows CI to pass while preserving maximum test coverage

**Test Coverage Strategy**:
- Build once with CUDA 13.0.2 (the lowest CUDA 13 version in the test matrix)
- Test against both CUDA 13.0.2 and 13.1.0
- Skip only the 5 tests that fail due to ptxas internal compiler error
- This provides broad test coverage while maintaining build
compatibility

**Changes from cuda-python#1315 not applicable to numba-cuda**:
- `merge_cuda_core_wheels.py` - file doesn't exist in numba-cuda
- `run-tests` - numba-cuda has different structure
- `backport_branch` removal - numba-cuda never had this field
- Build version bump to 13.1.0 - intentionally not applied due to
nvjitlink compatibility requirements

**Repository-specific considerations**:
- numba-cuda maintains CUDA 12.9.1 support for Python 3.14 (preserved)
- Free-threading (3.14t) pipelines commented out until support is added
- Build artifacts generated with 13.0.2, but tests cover both 13.0.2 and
13.1.0
- Only 5 specific array reduction tests skip on nvjitlink 13.1 + sm_120
due to known ptxas errors
- Code formatted with ruff-format to comply with pre-commit checks

<!-- START COPILOT ORIGINAL PROMPT -->



<details>

<summary>Original prompt</summary>

> 
> ----
> 
> *This section details on the original issue you should resolve*
> 
> <issue_title>CI: Test CUDA 13.1</issue_title>
> <issue_description></issue_description>
> 
> <agent_instructions>Please propagate all changes made in
NVIDIA/cuda-python#1315 under the ci/ directory to this repo, to ensure
both repos are in sync and we cover CUDA 13.1
testing.</agent_instructions>
> 
> ## Comments on the Issue (you are @copilot in this section)
> 
> <comments>
> <comment_new><author>@leofang</author><body>
> @copilot Please propagate all changes made in
NVIDIA/cuda-python#1315 under the `ci/`
directory to this repo, to ensure both repos are in sync and we cover
CUDA 13.1 testing.</body></comment_new>
> </comments>
> 


</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

- Fixes #684


---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 24, 2026
cufftMp should have been removed in PR NVIDIA#1315 (cuda-bindings v13.1.0 release), but this was overlooked.
rwgk added a commit that referenced this pull request Mar 25, 2026
… `cutlass`, `cute` (#1816)

* Add nvidia-libmathdx-cu13 in cuda_pathfinder/pyproject.toml

* Remove cufftMp, mathdx from _is_expected_load_nvidia_dynamic_lib_failure

cufftMp should have been removed in PR #1315 (cuda-bindings v13.1.0 release), but this was overlooked.

* Add HeaderDescriptorSpec for mathdx

* Add HeaderDescriptorSpec entries for cute, cutlass

* Restore mathdx in IMPORTLIB_METADATA_DISTRIBUTIONS_NAMES

The cu13 Windows CI jobs do not run `pip install --group test-cu13`,
so nvidia-libmathdx is not installed when the all_must_work pathfinder
test run executes. Without the distribution-presence check the test
fails unconditionally. Restore the entry so the test treats a missing
mathdx wheel as an expected failure under all_must_work strictness.

Made-with: Cursor

* Skip all_must_work pathfinder tests for free-threaded Python on Windows (#1820)

The nvidia-cutlass wheel contains paths that exceed the 260-char
Windows MAX_PATH limit when installed under the longer
x64-freethreaded site-packages prefix. Skip the pip --group install
and the all_must_work test run for free-threaded builds until the CI
runners enable LongPathsEnabled.

Made-with: Cursor

* Fix mathdx site_packages_windows path for cu13

The nvidia-libmathdx-cu13 wheel installs mathdx64_0.dll under
nvidia/cu13/bin, not nvidia/cu13/bin/x86_64. The x86_64 subdirectory
is used by cuda-toolkit CTK packages, but nvidia-libmathdx is packaged
separately and follows the flat layout (matching nvidia-cudss and the
cu12 mathdx wheel).

Made-with: Cursor
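The distribution-presence check described in the "Restore mathdx" commit can be sketched as follows. This is a hypothetical simplification, not cuda-pathfinder's `_is_expected_load_nvidia_dynamic_lib_failure` or its `IMPORTLIB_METADATA_DISTRIBUTIONS_NAMES` table: it assumes a mapping from library name to candidate wheel distribution names (the `nvidia-libmathdx-cu13` name comes from this PR; the mapping's shape is illustrative) and treats a load failure as expected only when none of those distributions is installed.

```python
from importlib.metadata import PackageNotFoundError, distribution

# Hypothetical mapping: library name -> wheel distributions that provide it.
DISTRIBUTIONS_FOR_LIB = {
    "mathdx": ("nvidia-libmathdx-cu13",),
}


def is_installed(dist_name):
    """True if the named distribution is present in the current environment."""
    try:
        distribution(dist_name)
    except PackageNotFoundError:
        return False
    return True


def is_expected_load_failure(libname, installed=is_installed):
    """A failure is expected only if the lib is listed and none of its wheels is installed."""
    dists = DISTRIBUTIONS_FOR_LIB.get(libname)
    if dists is None:
        # Not listed: under all_must_work strictness, a load failure is a real failure.
        return False
    return not any(installed(name) for name in dists)
```

Without the `mathdx` entry, the check would always return `False` for mathdx, so a Windows cu13 job that never installed the wheel would fail unconditionally, matching the situation the commit describes.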

Labels

cuda.bindings Everything related to the cuda.bindings module cuda.pathfinder Everything related to the cuda.pathfinder module P0 High priority - Must do!

Projects

None yet


3 participants