Add NVRTC PCH runtime APIs to cuda.core.Program by cpcloud · Pull Request #1689 · NVIDIA/cuda-python

cpcloud · 2026-02-25T14:58:54Z

Summary

Closes #670.

Exposes NVRTC 12.8+ precompiled header runtime APIs as Program methods:
get_pch_create_status(), get_pch_heap_size_required(), get_pch_heap_size() (static),
and set_pch_heap_size() (static).
All methods gate on the NVRTC backend and the bindings version, raising a clear RuntimeError
when either is unavailable.
The NVRTC 12.9 caching feature is already supported via the existing no_cache compile option
in ProgramOptions.

Test plan

test_cpp_program_pch_runtime_apis — compiles with create_pch, validates get_pch_create_status() and get_pch_heap_size_required() return values (skipped if NVRTC < 12.8)
test_cpp_program_pch_heap_size_apis — exercises get_pch_heap_size() / set_pch_heap_size() round-trip (skipped if NVRTC < 12.8)
test_cpp_program_pch_set_heap_size_rejects_negative — validates ValueError on negative input
test_cpp_program_pch_runtime_apis_require_nvrtc_backend — verifies RuntimeError when called on a non-NVRTC program
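
The behaviours these tests target can be illustrated with a small stand-in. This is a minimal sketch, not the cuda.core implementation: `FakeProgram`, the `"NVRTC"` backend string, and the error messages are all hypothetical, and the heap size is modelled as a plain class attribute rather than a call into NVRTC.

```python
class FakeProgram:
    """Hypothetical stand-in for Program; not the cuda.core implementation."""

    # Models a process-wide setting behind the two static methods.
    _pch_heap_size = 0

    def __init__(self, backend):
        self.backend = backend

    def _require_nvrtc(self):
        # Gate instance-level PCH calls on the backend in use.
        if self.backend != "NVRTC":
            raise RuntimeError("PCH runtime APIs require the NVRTC backend")

    def get_pch_heap_size_required(self):
        self._require_nvrtc()
        return 0  # placeholder value; a real call would query NVRTC

    @staticmethod
    def get_pch_heap_size():
        return FakeProgram._pch_heap_size

    @staticmethod
    def set_pch_heap_size(size):
        # Reject negative sizes before touching any shared state.
        if size < 0:
            raise ValueError("PCH heap size must be non-negative")
        FakeProgram._pch_heap_size = size
```

The sketch validates arguments and the backend up front, so a caller would see a Python exception rather than an opaque NVRTC error code.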

Made with Cursor

copy-pr-bot · 2026-02-25T14:58:58Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cpcloud · 2026-02-25T15:30:12Z

/ok to test

kkraus14 · 2026-02-25T16:02:28Z

It feels like we're missing an opportunity to provide a somewhat higher-level abstraction here. Is there a way we could make these PCH features more intuitive and delightful for Python developers to use?

cpcloud · 2026-02-25T16:23:23Z

Yep, playing around with it now.

cpcloud · 2026-02-25T17:21:34Z

/ok to test

cpcloud · 2026-02-25T19:14:57Z

/ok to test

cpcloud · 2026-02-25T20:56:02Z

@kkraus14 Thoughts on the DX now?

I've reworked it with a higher-level design:

Auto-retry on heap exhaustion: When create_pch is set in ProgramOptions, compile() now automatically checks the PCH creation status. If the heap was too small, it queries the required size, resizes the heap, creates a fresh NVRTC program, and retries. Users just set create_pch and it works.

pch_status property: Instead of exposing raw nvrtcResult enum values, program.pch_status returns a clean string: "created", "not_attempted", "failed", or None (when PCH wasn't requested or the program hasn't been compiled yet).

The workflow is now just:

program = Program(code, "c++", ProgramOptions(create_pch="my.pch"))
obj = program.compile("ptx")
assert program.pch_status == "created"
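
The auto-retry described above can be sketched against a toy model of the PCH heap. This is an illustration under stated assumptions, not the code in `_program.pyx`: `FakeNvrtc`, `compile_with_pch_retry`, and the status strings are hypothetical, and the single retry is an assumption since the comment does not say how many attempts are made.

```python
from dataclasses import dataclass

# Stand-ins for NVRTC status codes (hypothetical names).
HEAP_EXHAUSTED = "heap_exhausted"
CREATED = "created"


@dataclass
class FakeNvrtc:
    """Toy model of the process-wide PCH heap; not the real NVRTC bindings."""

    heap_size: int = 256
    required: int = 1024
    compiles: int = 0

    def compile(self):
        # Each call models compiling a freshly created NVRTC program.
        self.compiles += 1
        return CREATED if self.heap_size >= self.required else HEAP_EXHAUSTED

    def heap_size_required(self):
        return self.required

    def set_heap_size(self, size):
        self.heap_size = size


def compile_with_pch_retry(nvrtc):
    status = nvrtc.compile()
    if status == HEAP_EXHAUSTED:
        # Grow the heap to the reported size, then retry once (assumed).
        nvrtc.set_heap_size(nvrtc.heap_size_required())
        status = nvrtc.compile()
    return CREATED if status == CREATED else "failed"
```

With the default toy values the first attempt exhausts the heap, the heap is resized to 1024, and the second attempt succeeds. A heap that is already large enough compiles only once.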

cpcloud · 2026-02-26T10:19:50Z

/ok to test

cpcloud · 2026-02-26T12:55:51Z

/ok to test

cpcloud · 2026-02-26T15:19:28Z

@kkraus14 Ready for another round.

kkraus14 · 2026-02-26T17:26:43Z

cuda_core/cuda/core/_program.pyx

+cdef bint _has_nvrtc_pch_apis():
+    global _nvrtc_pch_apis_cached
+    if _nvrtc_pch_apis_cached < 0:
+        _nvrtc_pch_apis_cached = hasattr(nvrtc, "nvrtcGetPCHCreateStatus")


In hindsight, I'm not sure this is the right approach. Someone could have cuda.bindings v12.9.5 that was built against a sufficiently new toolkit and then run it in an environment with an older libnvrtc.so. In that case I think the attribute exists on the nvrtc module, but calling it raises a RuntimeError because the symbol cannot be found at runtime.

Maybe we need to catch that potential RuntimeError somewhere and present something gracefully to the user?

The internal _inspect_function_pointers autogenerated by both codegens would serve this need. It offers the source of truth (if the function exists and can be loaded).

cpcloud · 2026-02-26T21:20:52Z

/ok to test

cpcloud · 2026-02-26T23:30:36Z

/ok to test

github-actions · 2026-02-27T03:02:19Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

cuda_core/docs/source/release/0.6.0-notes.rst

leofang · 2026-02-27T03:58:23Z

cuda_core/cuda/core/_program.pyx

+        return None  # sentinel: caller should auto-retry
+    if err == cynvrtc.nvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
+        return _PCH_STATUS_NOT_ATTEMPTED
+    return _PCH_STATUS_FAILED


I assume this return refers to NVRTC_ERROR_PCH_CREATE
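
For context, the mapping this snippet belongs to can be sketched in plain Python. The enum below is a stand-in with placeholder values, not `cynvrtc.nvrtcResult`; `NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED` and the `NVRTC_SUCCESS` branch are assumptions about the part of the function not shown in the diff, and `classify_pch_status` is a hypothetical name.

```python
from enum import Enum, auto


class NvrtcResult(Enum):
    """Stand-in for nvrtcResult; member values are placeholders."""

    NVRTC_SUCCESS = auto()
    NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED = auto()
    NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = auto()
    NVRTC_ERROR_PCH_CREATE = auto()


def classify_pch_status(err):
    if err is NvrtcResult.NVRTC_SUCCESS:
        return "created"
    if err is NvrtcResult.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED:
        return None  # sentinel: caller should auto-retry
    if err is NvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
        return "not_attempted"
    # Catch-all: NVRTC_ERROR_PCH_CREATE and any other code land here.
    return "failed"
```

In this sketch the final `return` is a catch-all, so it covers `NVRTC_ERROR_PCH_CREATE` as well as any other unexpected result code.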

leofang · 2026-02-27T04:01:44Z

cc @seberg for vis (who has been playing with PCH for CuPy: cupy/cupy#9714)

* Add PCH support to cuda.core.Program (NVIDIA#670)

  When `create_pch` is set in ProgramOptions, compile() now automatically resizes the NVRTC PCH heap and retries with a fresh program when PCH creation fails due to heap exhaustion. The `pch_status` property reports the outcome ("created", "not_attempted", "failed", or None).

  Made-with: Cursor

* Cache _has_nvrtc_pch_apis() result

  Avoid repeated hasattr() calls on every compile by caching the result in a module-level sentinel.

  Made-with: Cursor

* Document that pch_status returns None for non-NVRTC backends

  PCH is only relevant for code_type="c++" programs using NVRTC. Make this explicit in the docstring so PTX/NVVM users aren't confused.

  Made-with: Cursor

* Check errors on PCH heap resize in retry path

  nvrtcGetPCHHeapSizeRequired and nvrtcSetPCHHeapSize were called without error checking during the auto-retry. Route them through HANDLE_RETURN_NVRTC so failures raise NVRTCError.

  Made-with: Cursor

* Check pch option in addition to create_pch for PCH status/retry

  The --pch flag (automatic PCH mode) can also trigger PCH creation, not just --create-pch. Check both options when deciding whether to query PCH status and attempt auto-retry.

  Made-with: Cursor

* Catch RuntimeError from missing PCH symbols in old libnvrtc

  When cuda.bindings is built against a newer toolkit but runs with an older libnvrtc.so that lacks the PCH C symbols, the binding wrappers exist (hasattr passes) but the actual call raises RuntimeError from failing to resolve the function pointer at runtime. Extract PCH status/retry logic into _pch_status_and_retry() and wrap the call in try/except RuntimeError so we gracefully degrade to pch_status=None instead of crashing.

  Made-with: Cursor

* chore: fix toml

cpcloud force-pushed the issue-670 branch from c963e20 to a398370 Compare February 25, 2026 15:28

cpcloud force-pushed the issue-670 branch from 73e250d to 045b33a Compare February 25, 2026 15:54

cpcloud force-pushed the issue-670 branch from 1e5aee0 to 579b86c Compare February 25, 2026 17:18

cpcloud force-pushed the issue-670 branch from 579b86c to c5608f5 Compare February 25, 2026 19:14

cpcloud force-pushed the issue-670 branch from c5608f5 to b6991e7 Compare February 25, 2026 20:53

kkraus14 reviewed Feb 25, 2026

kkraus14 reviewed Feb 25, 2026

cpcloud force-pushed the issue-670 branch from b6991e7 to 35b5213 Compare February 26, 2026 10:01

cpcloud requested a review from kkraus14 February 26, 2026 16:14

kkraus14 reviewed Feb 26, 2026

cpcloud force-pushed the issue-670 branch from 1400b6b to b7b9259 Compare February 26, 2026 21:03

kkraus14 reviewed Feb 26, 2026

kkraus14 approved these changes Feb 26, 2026

cpcloud enabled auto-merge (squash) February 26, 2026 21:21

cpcloud force-pushed the issue-670 branch from c143407 to b7b9259 Compare February 26, 2026 22:01

cpcloud added 3 commits February 26, 2026 18:30

Cache _has_nvrtc_pch_apis() result

04feb00

Avoid repeated hasattr() calls on every compile by caching the result in a module-level sentinel. Made-with: Cursor

Document that pch_status returns None for non-NVRTC backends

871634c

PCH is only relevant for code_type="c++" programs using NVRTC. Make this explicit in the docstring so PTX/NVVM users aren't confused. Made-with: Cursor

cpcloud added 3 commits February 26, 2026 18:30

Check errors on PCH heap resize in retry path

3046fca

nvrtcGetPCHHeapSizeRequired and nvrtcSetPCHHeapSize were called without error checking during the auto-retry. Route them through HANDLE_RETURN_NVRTC so failures raise NVRTCError. Made-with: Cursor

Check pch option in addition to create_pch for PCH status/retry

12d2551

The --pch flag (automatic PCH mode) can also trigger PCH creation, not just --create-pch. Check both options when deciding whether to query PCH status and attempt auto-retry. Made-with: Cursor

cpcloud force-pushed the issue-670 branch from b7b9259 to 886dcdd Compare February 26, 2026 23:30

cpcloud merged commit 7a36d70 into NVIDIA:main Feb 27, 2026
86 checks passed

leofang reviewed Feb 27, 2026

leofang reviewed Feb 27, 2026

leofang added this to the cuda.core v0.7.0 milestone Apr 7, 2026

leofang mentioned this pull request Apr 8, 2026 in "Prepare cuda.core v0.7.0 release" #1877 (merged)
