Merge changes for cuda-bindings 13.1.0 release by rwgk · Pull Request #1315 · NVIDIA/cuda-python

rwgk · 2025-12-04T19:19:51Z

Add support for new APIs introduced in CUDA 13.1
- https://developer.download.nvidia.com/compute/cuda/redist/redistrib_13.1.0.json
Prepare for cuda-pathfinder 1.3.3 release
- Purge support for CTK 11 (those became unsupported with the CTK 13.0 release in August 2025).
- Remove cudart64_65.dll and cudart64_101.dll from supported_nvidia_libs.py; these outdated DLLs were included accidentally.

copy-pr-bot · 2025-12-04T19:19:54Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2025-12-04T19:24:48Z

/ok to test

.spdx-ignore

copy-pr-bot · 2025-12-04T20:14:20Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

rwgk · 2025-12-04T20:48:17Z

User prompt

We're at the final stage of making the cuda-bindings v13.1.0 release.

The PR is under merge-for-13.1 here.

CI passes. One teammate approved already.

merge-for-13.1 is in sync with the main branch here.

Could you please make yourself aware of the diff between main and merge-for-13.1, then scan the selected CI logs here for anything suspicious?

~/cuda-python_ci_run_19941271439

This is only for sanity checking.

cuda-python_ci_run_19941271439/
├── build-linux-64.txt  [2025-12-04 12:34]  (1,321,111 bytes)
├── build-linux-aarch64.txt  [2025-12-04 12:34]  (1,320,103 bytes)
├── build-win-64.txt  [2025-12-04 12:35]  (2,194,293 bytes)
├── test-linux-64.txt  [2025-12-04 12:31]  (642,109 bytes)
├── test-linux-aarch64.txt  [2025-12-04 12:32]  (644,421 bytes)
└── test-win-64.txt  [2025-12-04 12:30]  (498,781 bytes)
    ____________________________________________________________
    Number of subdirs: 0
    Sum of file sizes: (6,620,818 bytes)

Assistant response (Diff sanity check)

Diff sanity check (main ↔ `merge-for-13.1`)

Scope of changes:
- CI/config: ci/test-matrix.yml adds coverage for CUDA 13.1.0 across Linux/Windows, x86_64/arm64, multiple Python versions; ci/versions.yml bumps the CUDA build version from 13.0.2 → 13.1.0.
- Bindings: Large updates in cuda_bindings/cuda/bindings/* (driver/runtime/nvrtc/cufile/etc.) to track CTK 13.1.0, including new APIs (e.g. multicast *_v2 variants) and adjusted types/flags; _version.py updated accordingly.
- Docs/release notes: New/renamed release notes files for 12.9.5 and 13.1.0, and module docs (driver.rst, runtime.rst, nvrtc.rst) updated for new APIs and behavior.
- Support matrix pruning: supported_nvidia_libs.py and related pathfinder data drop older 11.x-era libs and some very old DLLs, keeping 12/13-era sonames and DLLs.
- cuda-pathfinder: Version bumped 1.3.3a0 → 1.3.3, with new release notes and an extra cu13 dependency (nvidia-cufftmp-cu13), plus updated nv-versions JSON.
- Minor infra: .gitattributes extended (e.g. *.cmd EOL, *.svg binary), .spdx-ignore extended for a Draw.io SVG.

Nothing in the diff itself looks out-of-character for a CTK 13.1.0 cut: it’s the expected combination of new APIs, doc updates, CI matrix expansion, and support-matrix cleanup.

CI logs: build jobs

From build-linux-64.txt, build-linux-aarch64.txt, build-win-64.txt:

Errors / failures:
- Grep for FAILED, ERROR, Traceback, E across all build logs returned no matches indicative of build failures or exceptions.
- The only hits on that pattern were:
  - except OSError: around a symlink helper (which then falls back to shutil.copy2), and
  - C/C++ compiler warnings in generated cufile.cpp (see below), not errors.
Notable warnings (non-fatal, but worth being aware of):
- cufile stream type warnings (Linux x86_64 & aarch64):
  - Several warning: invalid conversion from 'void*' to 'CUstream' in generated cuda/bindings/cufile.cpp for:
    - cuFileReadAsync
    - cuFileWriteAsync
    - cuFileStreamRegister
    - cuFileStreamDeregister
  - These are compile-time warnings; builds still complete successfully and produce the .so modules.
  - Semantically, they reflect passing a void*-typed stream handle to functions expecting CUstream; if that’s intentional in your Cython layer, this is noisy but benign. If not, it might be something to tighten later, but it’s not blocking CI.
- Windows loader warnings in build-win-64.txt:
  - MSVC C4996 for wcstombs and strncat in cuda/bindings/_bindings/loader.cpp (“unsafe” CRT functions). Typical for older-style C APIs; compilation still succeeds.
  - C4551 warnings in cynvrtc.cpp about “function call missing argument list” — also warnings only; module still links and is used by tests.
- Setuptools packaging warnings:
  - _Warning: Package 'cuda.bindings' is absent from the packages configuration (and similarly for cuda.bindings._bindings, _internal, _lib) during wheel build on Windows. This pattern is consistent with using ext_modules and non-standard package layouts; since the build completes and tests run against installed wheels, these appear to be expected noise in your existing configuration.
- Node.js deprecation warnings:
  - Several [DEP0040] DeprecationWarning: The 'punycode' module is deprecated messages, coming from GitHub Actions tooling, not from your project.

Nothing in the build logs suggests incomplete artifacts, missing symbols, or link errors for the new 13.1.0 APIs.
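
The grep pass described above can be sketched in a few lines of Python. This is an illustration only, not the command that was actually run: the function names `scan_text` and `scan_logs`, the `*.txt` glob, and the case-sensitive word-boundary regex are all assumptions. Because this pattern is case-sensitive, a line such as `except OSError:` does not match here, unlike the hits reported in the review.

```python
import re
from pathlib import Path

# Markers named in the review above; the exact regex is an assumption.
FAILURE_PATTERN = re.compile(r"\b(FAILED|ERROR|Traceback)\b")


def scan_text(text):
    """Return (line_number, line) pairs for lines matching FAILURE_PATTERN."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if FAILURE_PATTERN.search(line)
    ]


def scan_logs(log_dir):
    """Map each *.txt log file under log_dir to its list of matching lines."""
    hits = {}
    for path in sorted(Path(log_dir).glob("*.txt")):
        # errors="replace" keeps the scan going if a log has odd bytes.
        hits[path.name] = scan_text(path.read_text(errors="replace"))
    return hits
```

Calling `scan_logs` on a directory of downloaded logs returns a dict keyed by file name. An empty list for every file corresponds to the "no matches" outcome reported above.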

CI logs: test jobs

From test-linux-64.txt, test-linux-aarch64.txt, test-win-64.txt:

Errors / failures:
- Grep for FAILED, ERROR, Traceback, and typical pytest failure-summary regex across all three test logs found:
  - No failures or errors in the two Linux test logs.
  - In test-win-64.txt, only a clean pytest summary with skips:
    - 190 passed, 4 skipped in 5.49s
    - Skips were:
      - tests/test_cufile.py: could not import cuda.bindings.cufile (no module in that environment).
      - tests/test_cudart.py: “cudart dynamic lib not available”.
      - tests/test_graphics_apis.py: GL context/texture could not be created (AttributeError: opengl32).
    - These match the usual pattern of environment-dependent tests (file I/O libs, cudart dynlib, OpenGL) being skipped when prerequisites aren’t present, not new regressions.
Warnings:
- No additional suspicious warnings (beyond the Node deprecations already mentioned for the build jobs) surfaced in the test logs.
- Pytest’s short-test-summary is clean other than the above skips; no failed/errors entries.
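
As a rough sketch of how a summary line such as `190 passed, 4 skipped in 5.49s` could be checked mechanically, the snippet below extracts the outcome counts with a regular expression. The helper names `parse_summary` and `is_clean` are hypothetical, and the set of outcome words is an assumption about typical pytest output rather than an exhaustive list.

```python
import re

# Matches count/outcome pairs such as "190 passed" or "4 skipped".
COUNT_PATTERN = re.compile(r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed)")


def parse_summary(line):
    """Return a dict like {"passed": 190, "skipped": 4} from a summary line."""
    counts = {}
    for number, outcome in COUNT_PATTERN.findall(line):
        # Normalize "error" and "errors" to a single key.
        key = "error" if outcome.startswith("error") else outcome
        counts[key] = counts.get(key, 0) + int(number)
    return counts


def is_clean(counts):
    """Treat a run as clean when nothing failed or errored; skips are allowed."""
    return counts.get("failed", 0) == 0 and counts.get("error", 0) == 0
```

With this definition, the Windows summary quoted above counts as clean because it contains only passes and skips.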

Overall sanity verdict

Diff review: Changes between main and merge-for-13.1 align with a normal CTK bump and binding refresh (new 13.1.0 APIs, CI matrix updates, docs and version bumps, and cleanup of old libs); nothing stands out as an obviously risky, off-topic, or accidental change.
CI logs:
- No build or test failures appear in the captured logs.
- The only noteworthy items are:
  - Expected compiler warnings in generated code (cufile stream casts, MSVC “unsafe” CRT calls).
  - Standard packaging and Node.js deprecation warnings.
  - Expected environment-related pytest skips on Windows.
Sanity conclusion: From the perspective of these diffs and the provided CI logs, merge-for-13.1 looks clean for the cuda-bindings v13.1.0 release, with no suspicious red flags that would block merging.

cuda_bindings/docs/nv-versions.json

leofang

Left one question on release note / version selector, otherwise LGTM!

github-actions · 2025-12-04T21:21:05Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

@leofang

## Summary: Add CUDA 13.1 Testing Support

✅ Successfully propagated CUDA 13.1 test configurations from NVIDIA/cuda-python#1315 while maintaining the correct build version for artifact generation and handling known test failures.

- [x] Update `ci/test-matrix.yml` to add CUDA 13.1.0 test configurations for all Python versions and architectures
- [x] Keep `ci/versions.yml` at CUDA 13.0.2 for build artifacts (reverted from 13.1.0)
- [x] Add explanation comment in versions.yml about nvjitlink version requirements
- [x] Comment out free-threading (3.14t) pipelines (not yet supported)
- [x] Skip only the 5 actually failing tests on nvjitlink 13.1 + sm_120
- [x] Improve exception handling in skip decorator
- [x] Fix ruff-format pre-commit error
- [x] Verify changes are appropriate for numba-cuda repository structure
- [x] Validate YAML syntax and configuration counts
- [x] Code review passed with no issues
- [x] Security scan completed (no issues for YAML files)

## Changes Made

### 1. ci/versions.yml

- **Kept** `cuda.build.version` at `13.0.2` (reverted from temporary change to 13.1.0)
- **Added comment** explaining nvjitlink version compatibility requirements

### 2. ci/test-matrix.yml

Added **16 new CUDA 13.1.0 test configurations** (excluding 3.14t):

**Linux (10 active configs, 2 commented out)**:
- Python 3.10: 2 configs (amd64 + arm64)
- Python 3.11: 2 configs (amd64 + arm64)
- Python 3.12: 2 configs (amd64 + arm64)
- Python 3.13: 2 configs (amd64 + arm64)
- Python 3.14: 2 configs (amd64 + arm64)
- Python 3.14t: 2 configs (amd64 + arm64) - **COMMENTED OUT**

**Windows (3 active configs, 3 commented out)**:
- Python 3.10, 3.11, 3.12, 3.13, 3.14 on amd64
- Python 3.14t on amd64 - **COMMENTED OUT**

**Special runners (1 config)**:
- H100 GPU for Python 3.13 on amd64

### 3. Test Skips for nvjitlink 13.1 + sm_120

Added skip decorator to handle known link-time failures on sm_120 GPUs with nvjitlink 13.1:

**New skip decorator** in `numba_cuda/numba/cuda/testing.py`:
- `skip_on_nvjitlink_13_1_sm_120()`: Skips tests when nvjitlink version is 13.1.x AND compute capability is 12.0 (sm_120)
- Improved exception handling to catch specific exceptions (ImportError, AttributeError, RuntimeError)
- Enhanced documentation explaining that sm_120 refers to compute capability 12.0 represented as tuple (12, 0)

**Applied to only the 5 failing tests** in `numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py`:
- `test_sum_basic` ✓
- `test_mean_basic` ✓
- `test_nansum_basic` ✓
- `test_nanmean_basic` ✓
- `test_nanprod_basic` ✓

**Tests that are NOT skipped** (these passed in CI):
- test_all_basic
- test_any_basic
- test_var_basic
- test_std_basic
- test_min_basic
- test_max_basic
- test_nanmin_basic
- test_nanmax_basic

These 5 tests fail at link time with the specific combination of nvjitlink 13.1 and sm_120 GPUs due to an internal compiler error in ptxas.

### 4. Code Formatting

- Fixed ruff-format pre-commit error by splitting long decorator lines to comply with line length limits

### 5. Preserved Configurations

Maintained existing CUDA 12.9.1 configurations for Python 3.14 on Linux

### 6. Free-Threading (3.14t) Pipelines

Commented out 7 free-threading configurations to stay in sync with cuda-python

## Final Configuration Summary

**Build Configuration**:
- `cuda.build.version`: **13.0.2** (kept at lowest 13.0 version for artifact generation and nvjitlink compatibility)
- `cuda.prev_build.version`: 12.9.1

**Total active test configurations**:
- Linux pull-request: 30 configs
- Windows pull-request: 15 configs
- Linux special_runners: 2 configs
- **Total: 47 active configs**

**CUDA version breakdown**:
- CUDA 12.9.1: 16 configs
- CUDA 13.0.2: 15 configs (3.14t commented out)
- CUDA 13.1.0: 16 configs (3.14t commented out) ← **NEW test coverage**

## Validation ✅

- ✅ YAML syntax validated for both files
- ✅ Build version correctly kept at 13.0.2 with clear explanation
- ✅ 16 active CUDA 13.1.0 test configurations added (3 for 3.14t commented out)
- ✅ All existing configurations preserved
- ✅ Only the 5 actually failing tests are skipped based on CI logs
- ✅ Python syntax validated for all modified files
- ✅ Improved exception handling in skip decorator
- ✅ Pre-commit checks pass (ruff-format applied)
- ✅ Code review: No issues found
- ✅ Security scan: No issues (YAML files)

## Key Points

**Why build version stays at 13.0.2**:
- Test artifacts are generated with `cuda.build.version` (13.0.2)
- Tests run against multiple CUDA versions including 13.1.0
- Using 13.1.0 to build artifacts while linking with nvjitlink 13.0.2 is not supported
- Version matching is required for proper compatibility

**Test Skip Strategy**:
- Based on actual CI logs from commit 1fb2e77, only 5 tests fail at link time with nvjitlink 13.1 + sm_120
- Skip condition is specific: nvjitlink version == 13.1.x AND compute capability == 12.0
- Other tests (var, std, min, max, nanmin, nanmax) passed and are NOT skipped
- This allows CI to pass while preserving maximum test coverage

**Test Coverage Strategy**:
- Build once with CUDA 13.0.2 (lowest 13.0 version)
- Test against both CUDA 13.0.2 and 13.1.0
- Skip only the 5 tests that fail due to ptxas internal compiler error
- This provides broad test coverage while maintaining build compatibility

**Changes from cuda-python#1315 not applicable to numba-cuda**:
- `merge_cuda_core_wheels.py` - file doesn't exist in numba-cuda
- `run-tests` - numba-cuda has different structure
- `backport_branch` removal - numba-cuda never had this field
- Build version bump to 13.1.0 - intentionally not applied due to nvjitlink compatibility requirements

**Repository-specific considerations**:
- numba-cuda maintains CUDA 12.9.1 support for Python 3.14 (preserved)
- Free-threading (3.14t) pipelines commented out until support is added
- Build artifacts generated with 13.0.2, but tests cover both 13.0.2 and 13.1.0
- Only 5 specific array reduction tests skip on nvjitlink 13.1 + sm_120 due to known ptxas errors
- Code formatted with ruff-format to comply with pre-commit checks

<details>
<summary>Original prompt</summary>

> ----
>
> *This section details on the original issue you should resolve*
>
> <issue_title>CI: Test CUDA 13.1</issue_title>
> <issue_description></issue_description>
>
> <agent_instructions>Please propagate all changes made in NVIDIA/cuda-python#1315 under the ci/ directory to this repo, to ensure both repos are in sync and we cover CUDA 13.1 testing.</agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> <comment_new><author>@leofang</author><body>
> @copilot Please propagate all changes made in NVIDIA/cuda-python#1315 under the `ci/` directory to this repo, to ensure both repos are in sync and we cover CUDA 13.1 testing.</body></comment_new>
> </comments>

</details>

- Fixes #684

---------

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
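
The skip condition described in the commit message above (nvjitlink 13.1.x together with compute capability 12.0) could look roughly like the sketch below. This is not the code in `numba_cuda/numba/cuda/testing.py`: the names `should_skip` and `make_skip_decorator`, and the two probe callables passed in, are hypothetical stand-ins for however the real decorator obtains the nvjitlink version and the device compute capability.

```python
import unittest


def should_skip(nvjitlink_version, compute_capability):
    """True only for nvjitlink 13.1.x on compute capability (12, 0)."""
    return (
        tuple(nvjitlink_version[:2]) == (13, 1)
        and tuple(compute_capability) == (12, 0)
    )


def make_skip_decorator(get_version, get_cc,
                        reason="nvjitlink 13.1 + sm_120 link failure"):
    """Build a unittest skip decorator from two hypothetical probe callables.

    get_version() is assumed to return a tuple such as (13, 1, 80) and
    get_cc() a tuple such as (12, 0).
    """
    try:
        skip = should_skip(get_version(), get_cc())
    except (ImportError, AttributeError, RuntimeError):
        # If a probe cannot run, do not skip; let the test surface the issue.
        skip = False
    return unittest.skipIf(skip, reason)
```

Passing the probes in as arguments keeps the decision logic testable without a GPU. Catching only the three exception types mirrors the "specific exceptions" wording in the commit message.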

… `cutlass`, `cute` (#1816)

* Add nvidia-libmathdx-cu13 in cuda_pathfinder/pyproject.toml

* Remove cufftMp, mathdx from _is_expected_load_nvidia_dynamic_lib_failure

  cufftMp should have been removed in PR #1315 (cuda-bindings v13.1.0 release), but this was overlooked.

* Add HeaderDescriptorSpec for mathdx

* Add HeaderDescriptorSpec entries for cute, cutlass

* Restore mathdx in IMPORTLIB_METADATA_DISTRIBUTIONS_NAMES

  The cu13 Windows CI jobs do not run `pip install --group test-cu13`, so nvidia-libmathdx is not installed when the all_must_work pathfinder test run executes. Without the distribution-presence check the test fails unconditionally. Restore the entry so the test treats a missing mathdx wheel as an expected failure under all_must_work strictness.

  Made-with: Cursor

* Skip all_must_work pathfinder tests for free-threaded Python on Windows (#1820)

  The nvidia-cutlass wheel contains paths that exceed the 260-char Windows MAX_PATH limit when installed under the longer x64-freethreaded site-packages prefix. Skip the pip --group install and the all_must_work test run for free-threaded builds until the CI runners enable LongPathsEnabled.

  Made-with: Cursor

* Fix mathdx site_packages_windows path for cu13

  The nvidia-libmathdx-cu13 wheel installs mathdx64_0.dll under nvidia/cu13/bin, not nvidia/cu13/bin/x86_64. The x86_64 subdirectory is used by cuda-toolkit CTK packages, but nvidia-libmathdx is packaged separately and follows the flat layout (matching nvidia-cudss and the cu12 mathdx wheel).

  Made-with: Cursor
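
To illustrate the MAX_PATH problem mentioned in the commit message above, here is a minimal sketch that flags files whose installed path would exceed 260 characters under a given site-packages prefix. The function name `paths_over_limit`, the example prefix, and the use of a strict "greater than" comparison are assumptions made for illustration. The sketch does not reproduce the actual nvidia-cutlass wheel contents or the CI check.

```python
from pathlib import PureWindowsPath

WINDOWS_MAX_PATH = 260  # limit cited in the commit message above


def paths_over_limit(site_packages, relative_paths, limit=WINDOWS_MAX_PATH):
    """Return (length, full_path) pairs for paths longer than the limit.

    site_packages is a hypothetical install prefix and relative_paths are
    file paths as they would appear inside a wheel.
    """
    over = []
    for rel in relative_paths:
        # PureWindowsPath builds Windows-style paths on any host OS.
        full = str(PureWindowsPath(site_packages) / rel)
        if len(full) > limit:
            over.append((len(full), full))
    return over
```

A longer prefix, such as one for a free-threaded interpreter, pushes more files over the threshold, which is the effect the commit message describes.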

rwgk added 3 commits December 4, 2025 08:47

automatic code-gen changes: driver, runtime, nvrtc

1eaebec

automatic code-gen changes: cufile, nvjitlink, nvvm

540f9a3

accumulated changes from QA testing period

23490cc

rwgk added 2 commits December 4, 2025 11:21

cuda_pathfinder release notes and version

056af39

cuda_bindings release notes (without changing cuda_bindings/docs/vers…

636ca47

…ions.json, with 12.9.5 release notes)

rwgk force-pushed the merge-for-13.1 branch from 1966bf9 to 636ca47 Compare December 4, 2025 19:22

rwgk added 2 commits December 4, 2025 11:22

ci/ changes: adds 19 test jobs (for a new total of 52 test jobs)

57684c7

Add nvidia-cufftmp-cu13 in cuda_pathfinder/pyproject.toml

0d50ae2

kkraus14 reviewed Dec 4, 2025

.spdx-ignore

rwgk self-assigned this Dec 4, 2025

rwgk added the P0 (High priority - Must do!), cuda.bindings (Everything related to the cuda.bindings module), and cuda.pathfinder (Everything related to the cuda.pathfinder module) labels Dec 4, 2025

rwgk marked this pull request as ready for review December 4, 2025 20:14

kkraus14 approved these changes Dec 4, 2025

leofang added this to the cuda-python 13-next, 12-next milestone Dec 4, 2025

leofang reviewed Dec 4, 2025

cuda_bindings/docs/nv-versions.json

leofang approved these changes Dec 4, 2025

rwgk merged commit df3c661 into NVIDIA:main Dec 4, 2025
83 checks passed

rwgk deleted the merge-for-13.1 branch December 4, 2025 21:08

rwgk mentioned this pull request Dec 5, 2025

Add cuda_python/docs/source/release/13.1.0-notes.rst and update version selector #1325

Merged

leofang mentioned this pull request Dec 6, 2025

CI: Restore the old test matrix size #1335

Closed

leofang mentioned this pull request Jan 7, 2026

CI: Test CUDA 13.1 NVIDIA/numba-cuda#684

Closed

Copilot AI mentioned this pull request Jan 7, 2026

CI: Add CUDA 13.1 testing support NVIDIA/numba-cuda#705

Merged

11 tasks

rwgk added a commit to rwgk/cuda-python that referenced this pull request Mar 24, 2026

Remove cufftMp, mathdx from _is_expected_load_nvidia_dynamic_lib_failure

16427f8

cufftMp should have been removed in PR NVIDIA#1315 (cuda-bindings v13.1.0 release), but this was overlooked.

rwgk merged 7 commits into NVIDIA:main from rwgk:merge-for-13.1

3 participants
