Add NVRTC PCH runtime APIs to cuda.core.Program #1689

Merged
cpcloud merged 6 commits into NVIDIA:main from cpcloud:issue-670
Feb 27, 2026

@cpcloud
Contributor

@cpcloud cpcloud commented Feb 25, 2026

Summary

Closes #670.

  • Exposes NVRTC 12.8+ precompiled header runtime APIs as Program methods:
    get_pch_create_status(), get_pch_heap_size_required(), get_pch_heap_size() (static),
    and set_pch_heap_size() (static).
  • All methods gate on the NVRTC backend and bindings version, raising a clear RuntimeError
    when unavailable.
  • The NVRTC 12.9 caching feature is already supported via the existing no_cache compile option
    in ProgramOptions.

Test plan

  • test_cpp_program_pch_runtime_apis — compiles with create_pch, validates get_pch_create_status() and get_pch_heap_size_required() return values (skipped if NVRTC < 12.8)
  • test_cpp_program_pch_heap_size_apis — exercises get_pch_heap_size() / set_pch_heap_size() round-trip (skipped if NVRTC < 12.8)
  • test_cpp_program_pch_set_heap_size_rejects_negative — validates ValueError on negative input
  • test_cpp_program_pch_runtime_apis_require_nvrtc_backend — verifies RuntimeError when called on a non-NVRTC program

Made with Cursor

@copy-pr-bot
Contributor

copy-pr-bot bot commented Feb 25, 2026

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

@cpcloud
Contributor Author

cpcloud commented Feb 25, 2026

/ok to test

@kkraus14
Collaborator

It feels like we're missing an opportunity to provide a higher-level abstraction here. Is there a way we could make using these PCH features more intuitive and delightful for Python developers?

@cpcloud
Contributor Author

cpcloud commented Feb 25, 2026

Yep, playing around with it now.

@cpcloud
Contributor Author

cpcloud commented Feb 25, 2026

/ok to test

@cpcloud
Contributor Author

cpcloud commented Feb 25, 2026

/ok to test

@cpcloud
Contributor Author

cpcloud commented Feb 25, 2026

@kkraus14 Thoughts on the DX now?

I've reworked it with a higher-level design:

Auto-retry on heap exhaustion: When create_pch is set in ProgramOptions, compile() now automatically checks the PCH creation status. If the heap was too small, it queries the required size, resizes the heap, creates a fresh NVRTC program, and retries. Users just set create_pch and it works.

pch_status property: Instead of exposing raw nvrtcResult enum values, program.pch_status returns a clean string: "created", "not_attempted", "failed", or None (when PCH wasn't requested or the program hasn't been compiled yet).

The workflow is now just:

program = Program(code, "c++", ProgramOptions(create_pch="my.pch"))
obj = program.compile("ptx")
assert program.pch_status == "created"
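
To make the auto-retry flow described above concrete, here is a minimal sketch in plain Python. It does not call cuda.core or NVRTC: `FakeNvrtc`, `compile_with_pch`, and the `"heap_exhausted"` signal are hypothetical stand-ins invented for illustration, and the real implementation in this PR may be structured differently.

```python
from dataclasses import dataclass

# Status strings named in the comment above; HEAP_EXHAUSTED is an
# internal-only signal invented for this sketch.
CREATED, NOT_ATTEMPTED, FAILED = "created", "not_attempted", "failed"
HEAP_EXHAUSTED = "heap_exhausted"


@dataclass
class FakeNvrtc:
    """Hypothetical stand-in for the NVRTC runtime (not the real API)."""
    heap_size: int = 1024   # current PCH heap size in bytes
    required: int = 4096    # heap size the PCH actually needs
    compiles: int = 0       # how many compile attempts were made

    def compile(self) -> str:
        # PCH creation succeeds only when the heap is large enough.
        self.compiles += 1
        return CREATED if self.heap_size >= self.required else HEAP_EXHAUSTED


def compile_with_pch(rt: FakeNvrtc, max_retries: int = 1) -> str:
    """Compile; on heap exhaustion, grow the heap and retry."""
    status = rt.compile()
    for _ in range(max_retries):
        if status != HEAP_EXHAUSTED:
            break
        rt.heap_size = rt.required  # resize to the reported requirement
        status = rt.compile()       # retry with a fresh compile
    # Still exhausted after retrying: report a plain failure to the caller.
    return FAILED if status == HEAP_EXHAUSTED else status
```

The point of the design is that heap exhaustion never reaches the user as a distinct state: it is either resolved by the retry or folded into "failed".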

@cpcloud
Contributor Author

cpcloud commented Feb 26, 2026

/ok to test

@cpcloud
Contributor Author

cpcloud commented Feb 26, 2026

@kkraus14 Ready for another round.

@cpcloud cpcloud requested a review from kkraus14 February 26, 2026 16:14
cdef bint _has_nvrtc_pch_apis():
    global _nvrtc_pch_apis_cached
    if _nvrtc_pch_apis_cached < 0:
        _nvrtc_pch_apis_cached = hasattr(nvrtc, "nvrtcGetPCHCreateStatus")
Collaborator

I'm not sure this is the right approach in hindsight. Someone could have cuda.bindings v12.9.5 built against a sufficiently new toolkit and then run it in an environment with an older libnvrtc.so. In that case I think the attribute exists on the nvrtc module, but calling it raises a RuntimeError because the symbol can't be found at runtime.

Maybe we need to catch that potential RuntimeError somewhere and present something gracefully to the user?

Member

The internal _inspect_function_pointers autogenerated by both codegens would serve this need. It offers the source of truth (if the function exists and can be loaded).

@cpcloud
Contributor Author

cpcloud commented Feb 26, 2026

/ok to test

@cpcloud cpcloud enabled auto-merge (squash) February 26, 2026 21:21
* Add PCH support to cuda.core.Program (#670)

When `create_pch` is set in ProgramOptions, compile() now automatically
resizes the NVRTC PCH heap and retries with a fresh program when PCH
creation fails due to heap exhaustion. The `pch_status` property reports
the outcome ("created", "not_attempted", "failed", or None).

Made-with: Cursor

* Cache _has_nvrtc_pch_apis() result

Avoid repeated hasattr() calls on every compile by caching the result
in a module-level sentinel.

Made-with: Cursor

* Document that pch_status returns None for non-NVRTC backends

PCH is only relevant for code_type="c++" programs using NVRTC. Make
this explicit in the docstring so PTX/NVVM users aren't confused.

Made-with: Cursor

* Check errors on PCH heap resize in retry path

nvrtcGetPCHHeapSizeRequired and nvrtcSetPCHHeapSize were called without
error checking during the auto-retry. Route them through
HANDLE_RETURN_NVRTC so failures raise NVRTCError.

Made-with: Cursor

* Check pch option in addition to create_pch for PCH status/retry

The --pch flag (automatic PCH mode) can also trigger PCH creation,
not just --create-pch. Check both options when deciding whether to
query PCH status and attempt auto-retry.

Made-with: Cursor

* Catch RuntimeError from missing PCH symbols in old libnvrtc

When cuda.bindings is built against a newer toolkit but runs with an
older libnvrtc.so that lacks the PCH C symbols, the binding wrappers
exist (hasattr passes) but the actual call raises RuntimeError from
failing to resolve the function pointer at runtime.

Extract PCH status/retry logic into _pch_status_and_retry() and wrap
the call in try/except RuntimeError so we gracefully degrade to
pch_status=None instead of crashing.

Made-with: Cursor
@cpcloud
Contributor Author

cpcloud commented Feb 26, 2026

/ok to test

@cpcloud cpcloud merged commit 7a36d70 into NVIDIA:main Feb 27, 2026
86 checks passed

        return None  # sentinel: caller should auto-retry
    if err == cynvrtc.nvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
        return _PCH_STATUS_NOT_ATTEMPTED
    return _PCH_STATUS_FAILED
Member

I assume this return refers to NVRTC_ERROR_PCH_CREATE
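
For readers following this thread, the mapping under discussion can be sketched in full. Only the last three branches appear in the quoted snippet. The success branch and the heap-exhaustion check are assumptions about the elided lines, and `NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED` is assumed to be the name of the corresponding nvrtcResult member. `Result` is a local stand-in enum with arbitrary values, not the real binding.

```python
from enum import Enum, auto
from typing import Optional


class Result(Enum):
    """Local stand-in for the relevant nvrtcResult members (values arbitrary)."""
    NVRTC_SUCCESS = auto()
    NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED = auto()
    NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = auto()  # assumed member name
    NVRTC_ERROR_PCH_CREATE = auto()


def pch_status_from_result(err: Result) -> Optional[str]:
    """Map a PCH-creation result code to the string exposed as pch_status."""
    if err is Result.NVRTC_SUCCESS:
        return "created"
    if err is Result.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED:
        return None  # sentinel: caller should resize the heap and retry
    if err is Result.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
        return "not_attempted"
    # Everything else, including NVRTC_ERROR_PCH_CREATE, is a failure.
    return "failed"
```

Under these assumptions the final return acts as a catch-all, so it covers `NVRTC_ERROR_PCH_CREATE` as the reviewer suggests, along with any other unexpected code.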

@leofang
Member

leofang commented Feb 27, 2026

cc @seberg for vis (who has been playing with PCH for CuPy: cupy/cupy#9714)

cpcloud added a commit to cpcloud/cuda-python that referenced this pull request Mar 3, 2026
* Add PCH support to cuda.core.Program (NVIDIA#670)
* Cache _has_nvrtc_pch_apis() result
* Document that pch_status returns None for non-NVRTC backends
* Check errors on PCH heap resize in retry path
* Check pch option in addition to create_pch for PCH status/retry
* Catch RuntimeError from missing PCH symbols in old libnvrtc
* chore: fix toml
@leofang leofang added this to the cuda.core v0.7.0 milestone Apr 7, 2026
@leofang leofang mentioned this pull request Apr 8, 2026
3 tasks

Development

Successfully merging this pull request may close these issues.

[FEA]: Support latest NVRTC features in cuda.core.Program
