Merge changes for cuda-bindings 13.1.0 release #1315
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
…ions.json, with 12.9.5 release notes)
/ok to test
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here.
User prompt:
We're at the final stage of making the cuda-bindings v13.1.0 release. The PR is under merge-for-13.1 here. CI passes. One teammate approved already. merge-for-13.1 is in sync with the main branch here. Could you please make yourself aware of the diff between main and merge-for-13.1, then scan the selected CI logs here for anything suspicious? ~/cuda-python_ci_run_19941271439 This is only for sanity checking.

Assistant response (Diff sanity check):
Diff sanity check (main ↔
leofang
left a comment
Left one question on release note / version selector, otherwise LGTM!
## Summary: Add CUDA 13.1 Testing Support ✅

Successfully propagated CUDA 13.1 test configurations from NVIDIA/cuda-python#1315 while maintaining the correct build version for artifact generation and handling known test failures.

- [x] Update `ci/test-matrix.yml` to add CUDA 13.1.0 test configurations for all Python versions and architectures
- [x] Keep `ci/versions.yml` at CUDA 13.0.2 for build artifacts (reverted from 13.1.0)
- [x] Add explanation comment in versions.yml about nvjitlink version requirements
- [x] Comment out free-threading (3.14t) pipelines (not yet supported)
- [x] Skip only the 5 actually failing tests on nvjitlink 13.1 + sm_120
- [x] Improve exception handling in skip decorator
- [x] Fix ruff-format pre-commit error
- [x] Verify changes are appropriate for the numba-cuda repository structure
- [x] Validate YAML syntax and configuration counts
- [x] Code review passed with no issues
- [x] Security scan completed (no issues for YAML files)

## Changes Made

### 1. ci/versions.yml

- **Kept** `cuda.build.version` at `13.0.2` (reverted from temporary change to 13.1.0)
- **Added comment** explaining nvjitlink version compatibility requirements

### 2. ci/test-matrix.yml

Added **16 new CUDA 13.1.0 test configurations** (excluding 3.14t):

**Linux (10 active configs, 2 commented out)**:
- Python 3.10: 2 configs (amd64 + arm64)
- Python 3.11: 2 configs (amd64 + arm64)
- Python 3.12: 2 configs (amd64 + arm64)
- Python 3.13: 2 configs (amd64 + arm64)
- Python 3.14: 2 configs (amd64 + arm64)
- Python 3.14t: 2 configs (amd64 + arm64) - **COMMENTED OUT**

**Windows (5 active configs, 1 commented out)**:
- Python 3.10, 3.11, 3.12, 3.13, 3.14 on amd64
- Python 3.14t on amd64 - **COMMENTED OUT**

**Special runners (1 config)**:
- H100 GPU for Python 3.13 on amd64

### 3. Test Skips for nvjitlink 13.1 + sm_120

Added a skip decorator to handle known link-time failures on sm_120 GPUs with nvjitlink 13.1:

**New skip decorator** in `numba_cuda/numba/cuda/testing.py`:
- `skip_on_nvjitlink_13_1_sm_120()`: Skips tests when the nvjitlink version is 13.1.x AND the compute capability is 12.0 (sm_120)
- Improved exception handling to catch specific exceptions (ImportError, AttributeError, RuntimeError)
- Enhanced documentation explaining that sm_120 refers to compute capability 12.0, represented as the tuple (12, 0)

**Applied to only the 5 failing tests** in `numba_cuda/numba/cuda/tests/cudapy/test_array_reductions.py`:
- `test_sum_basic` ✓
- `test_mean_basic` ✓
- `test_nansum_basic` ✓
- `test_nanmean_basic` ✓
- `test_nanprod_basic` ✓

**Tests that are NOT skipped** (these passed in CI):
- test_all_basic
- test_any_basic
- test_var_basic
- test_std_basic
- test_min_basic
- test_max_basic
- test_nanmin_basic
- test_nanmax_basic

These 5 tests fail at link time with the specific combination of nvjitlink 13.1 and sm_120 GPUs due to an internal compiler error in ptxas.

### 4. Code Formatting

- Fixed ruff-format pre-commit error by splitting long decorator lines to comply with line length limits

### 5. Preserved Configurations

Maintained existing CUDA 12.9.1 configurations for Python 3.14 on Linux

### 6. Free-Threading (3.14t) Pipelines

Commented out 7 free-threading configurations to stay in sync with cuda-python

## Final Configuration Summary

**Build Configuration**:
- `cuda.build.version`: **13.0.2** (kept at the lowest 13.0 version for artifact generation and nvjitlink compatibility)
- `cuda.prev_build.version`: 12.9.1

**Total active test configurations**:
- Linux pull-request: 30 configs
- Windows pull-request: 15 configs
- Linux special_runners: 2 configs
- **Total: 47 active configs**

**CUDA version breakdown**:
- CUDA 12.9.1: 16 configs
- CUDA 13.0.2: 15 configs (3.14t commented out)
- CUDA 13.1.0: 16 configs (3.14t commented out) ← **NEW test coverage**

## Validation ✅

- ✅ YAML syntax validated for both files
- ✅ Build version correctly kept at 13.0.2 with clear explanation
- ✅ 16 active CUDA 13.1.0 test configurations added (3 for 3.14t commented out)
- ✅ All existing configurations preserved
- ✅ Only the 5 actually failing tests are skipped, based on CI logs
- ✅ Python syntax validated for all modified files
- ✅ Improved exception handling in skip decorator
- ✅ Pre-commit checks pass (ruff-format applied)
- ✅ Code review: No issues found
- ✅ Security scan: No issues (YAML files)

## Key Points

**Why the build version stays at 13.0.2**:
- Test artifacts are generated with `cuda.build.version` (13.0.2)
- Tests run against multiple CUDA versions, including 13.1.0
- Using 13.1.0 to build artifacts while linking with nvjitlink 13.0.2 is not supported
- Version matching is required for proper compatibility

**Test Skip Strategy**:
- Based on actual CI logs from commit 1fb2e77, only 5 tests fail at link time with nvjitlink 13.1 + sm_120
- The skip condition is specific: nvjitlink version == 13.1.x AND compute capability == 12.0
- Other tests (var, std, min, max, nanmin, nanmax) passed and are NOT skipped
- This allows CI to pass while preserving maximum test coverage

**Test Coverage Strategy**:
- Build once with CUDA 13.0.2 (lowest 13.0 version)
- Test against both CUDA 13.0.2 and 13.1.0
- Skip only the 5 tests that fail due to the ptxas internal compiler error
- This provides broad test coverage while maintaining build compatibility

**Changes from cuda-python#1315 not applicable to numba-cuda**:
- `merge_cuda_core_wheels.py` - file doesn't exist in numba-cuda
- `run-tests` - numba-cuda has a different structure
- `backport_branch` removal - numba-cuda never had this field
- Build version bump to 13.1.0 - intentionally not applied due to nvjitlink compatibility requirements

**Repository-specific considerations**:
- numba-cuda maintains CUDA 12.9.1 support for Python 3.14 (preserved)
- Free-threading (3.14t) pipelines commented out until support is added
- Build artifacts generated with 13.0.2, but tests cover both 13.0.2 and 13.1.0
- Only 5 specific array reduction tests skip on nvjitlink 13.1 + sm_120, due to known ptxas errors
- Code formatted with ruff-format to comply with pre-commit checks

<details>
<summary>Original prompt</summary>

> *This section details the original issue you should resolve*
>
> <issue_title>CI: Test CUDA 13.1</issue_title>
> <issue_description></issue_description>
>
> <agent_instructions>Please propagate all changes made in NVIDIA/cuda-python#1315 under the ci/ directory to this repo, to ensure both repos are in sync and we cover CUDA 13.1 testing.</agent_instructions>
>
> ## Comments on the Issue (you are @copilot in this section)
>
> <comments>
> <comment_new><author>@leofang</author><body>
> @copilot Please propagate all changes made in NVIDIA/cuda-python#1315 under the `ci/` directory to this repo, to ensure both repos are in sync and we cover CUDA 13.1 testing.</body></comment_new>
> </comments>

</details>

- Fixes #684

---

Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: leofang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
Co-authored-by: Leo Fang <[email protected]>
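The skip decorator described in the summary above can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual numba-cuda code: the `cuda.bindings.nvjitlink.version()` call, the `compute_capability` attribute access, and the placement of the imports are assumptions for illustration.

```python
import unittest


def skip_on_nvjitlink_13_1_sm_120():
    """Skip a test when nvjitlink is 13.1.x and the GPU is sm_120 (CC 12.0).

    Hypothetical sketch of the helper described for
    numba_cuda/numba/cuda/testing.py.
    """
    should_skip = False
    try:
        from cuda.bindings import nvjitlink  # assumed import path
        from numba import cuda

        major, minor = nvjitlink.version()
        # sm_120 means compute capability 12.0, i.e. the tuple (12, 0).
        cc = cuda.get_current_device().compute_capability
        should_skip = (major, minor) == (13, 1) and cc == (12, 0)
    except (ImportError, AttributeError, RuntimeError):
        # No GPU or missing bindings: do not skip; let the test run and
        # report its own environment errors.
        should_skip = False
    return unittest.skipIf(
        should_skip,
        "known ptxas internal compiler error with nvjitlink 13.1 on sm_120",
    )
```

Used as `@skip_on_nvjitlink_13_1_sm_120()` on each of the five failing reduction tests, so every other configuration still exercises them.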
… `cutlass`, `cute` (#1816)

* Add nvidia-libmathdx-cu13 in cuda_pathfinder/pyproject.toml
* Remove cufftMp, mathdx from `_is_expected_load_nvidia_dynamic_lib_failure`

  cufftMp should have been removed in PR #1315 (cuda-bindings v13.1.0 release), but this was overlooked.

* Add HeaderDescriptorSpec for mathdx
* Add HeaderDescriptorSpec entries for cute, cutlass
* Restore mathdx in IMPORTLIB_METADATA_DISTRIBUTIONS_NAMES

  The cu13 Windows CI jobs do not run `pip install --group test-cu13`, so nvidia-libmathdx is not installed when the all_must_work pathfinder test run executes. Without the distribution-presence check the test fails unconditionally. Restore the entry so the test treats a missing mathdx wheel as an expected failure under all_must_work strictness.

  Made-with: Cursor

* Skip all_must_work pathfinder tests for free-threaded Python on Windows (#1820)

  The nvidia-cutlass wheel contains paths that exceed the 260-char Windows MAX_PATH limit when installed under the longer x64-freethreaded site-packages prefix. Skip the pip --group install and the all_must_work test run for free-threaded builds until the CI runners enable LongPathsEnabled.

  Made-with: Cursor

* Fix mathdx site_packages_windows path for cu13

  The nvidia-libmathdx-cu13 wheel installs mathdx64_0.dll under nvidia/cu13/bin, not nvidia/cu13/bin/x86_64. The x86_64 subdirectory is used by cuda-toolkit CTK packages, but nvidia-libmathdx is packaged separately and follows the flat layout (matching nvidia-cudss and the cu12 mathdx wheel).

  Made-with: Cursor
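The MAX_PATH overflow behind the free-threaded Windows skip can be illustrated with a small stand-alone helper. This is an illustrative sketch, not part of the pathfinder code; the function name is invented here.

```python
import os

# Classic Windows path-length limit, which applies when the registry key
# LongPathsEnabled is not set on the runner.
MAX_PATH = 260


def overlong_paths(root):
    """Yield absolute file paths under `root` longer than MAX_PATH."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) > MAX_PATH:
                yield full
```

Running a check like this over the x64-freethreaded site-packages prefix would list the nvidia-cutlass files that trip the limit, which is what forces the skip until LongPathsEnabled is turned on.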
Add support for new APIs introduced in CUDA 13.1
Prepare for cuda-pathfinder 1.3.3 release
Remove `cudart64_65.dll` and `cudart64_101.dll` from supported_nvidia_libs.py; these outdated DLLs were included accidentally.