Add NVRTC PCH runtime APIs to cuda.core.Program #1689
Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
|
It feels like we're missing an opportunity to do a bit higher level of an abstraction here. Is there a way we could make using these PCH features more intuitive and delightful for Python developers? |
|
Yep, playing around with it now. |
|
/ok to test |
|
/ok to test |
|
@kkraus14 Thoughts on the DX now? I've reworked it with a higher-level design: Auto-retry on heap exhaustion: When
The workflow is now just:
program = Program(code, "c++", ProgramOptions(create_pch="my.pch"))
obj = program.compile("ptx")
assert program.pch_status == "created" |
|
/ok to test |
|
/ok to test |
|
@kkraus14 Ready for another round. |
cdef bint _has_nvrtc_pch_apis():
    global _nvrtc_pch_apis_cached
    if _nvrtc_pch_apis_cached < 0:
        _nvrtc_pch_apis_cached = hasattr(nvrtc, "nvrtcGetPCHCreateStatus")
I'm not sure if this is the right approach in hindsight. Someone could have cuda.bindings v12.9.5 that was built against a sufficiently new toolkit and then run it in an environment with an older libnvrtc.so, in which case I think this attribute exists on the nvrtc module, but returns a RuntimeError from failing to find the symbol at runtime.
Maybe we need to catch that potential RuntimeError somewhere and present something gracefully to the user?
The internal _inspect_function_pointers autogenerated by both codegens would serve this need. It offers the source of truth (if the function exists and can be loaded).
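The concern in this thread can be illustrated with a small sketch: a `hasattr` check only proves the Python wrapper exists, so the probe below also calls the wrapper once and treats a `RuntimeError` as "symbol not loadable". `FakeNvrtc` and `PchAvailability` are hypothetical stand-ins written for this illustration; they are not part of cuda.bindings or cuda.core, and probing via `nvrtcGetPCHHeapSize` is an assumption, not necessarily what the PR does.

```python
class FakeNvrtc:
    """Hypothetical stand-in for a bindings module.

    The wrapper attribute always exists, but calling it raises RuntimeError
    when the underlying shared library lacks the symbol.
    """

    def __init__(self, symbol_loadable):
        self._symbol_loadable = symbol_loadable
        self.calls = 0

    def nvrtcGetPCHHeapSize(self):
        self.calls += 1
        if not self._symbol_loadable:
            raise RuntimeError("function pointer could not be resolved")
        return 0


class PchAvailability:
    """Caches whether the PCH entry points are actually callable."""

    def __init__(self, nvrtc):
        self._nvrtc = nvrtc
        self._cached = None  # None means "not probed yet"

    def available(self):
        if self._cached is None:
            self._cached = self._probe()
        return self._cached

    def _probe(self):
        fn = getattr(self._nvrtc, "nvrtcGetPCHHeapSize", None)
        if fn is None:
            return False  # bindings too old: no wrapper at all
        try:
            fn()  # cheap, side-effect-free query in this sketch
        except RuntimeError:
            return False  # wrapper exists, but the symbol is missing
        return True


new_env = PchAvailability(FakeNvrtc(symbol_loadable=True))
old_lib = FakeNvrtc(symbol_loadable=False)
old_env = PchAvailability(old_lib)
print(new_env.available(), old_env.available())
```

Because the result is cached per probe object, the potentially failing call happens at most once, which keeps the benefit of the cached `hasattr` approach while covering the mismatched-library case.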
|
/ok to test |
When `create_pch` is set in ProgramOptions, compile() now automatically
resizes the NVRTC PCH heap and retries with a fresh program when PCH
creation fails due to heap exhaustion. The `pch_status` property reports
the outcome ("created", "not_attempted", "failed", or None).
Made-with: Cursor
Avoid repeated hasattr() calls on every compile by caching the result in a module-level sentinel. Made-with: Cursor
PCH is only relevant for code_type="c++" programs using NVRTC. Make this explicit in the docstring so PTX/NVVM users aren't confused. Made-with: Cursor
nvrtcGetPCHHeapSizeRequired and nvrtcSetPCHHeapSize were called without error checking during the auto-retry. Route them through HANDLE_RETURN_NVRTC so failures raise NVRTCError. Made-with: Cursor
The --pch flag (automatic PCH mode) can also trigger PCH creation, not just --create-pch. Check both options when deciding whether to query PCH status and attempt auto-retry. Made-with: Cursor
When cuda.bindings is built against a newer toolkit but runs with an older libnvrtc.so that lacks the PCH C symbols, the binding wrappers exist (hasattr passes) but the actual call raises RuntimeError from failing to resolve the function pointer at runtime. Extract PCH status/retry logic into _pch_status_and_retry() and wrap the call in try/except RuntimeError so we gracefully degrade to pch_status=None instead of crashing. Made-with: Cursor
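The retry and graceful-degradation behavior described in these commit messages could be sketched roughly as follows. Everything here is a simplified model: `FakeCompiler`, `HEAP_EXHAUSTED`, and `compile_with_pch` are hypothetical names for illustration and do not reproduce the actual `_pch_status_and_retry()` implementation.

```python
HEAP_EXHAUSTED = object()  # hypothetical sentinel meaning "retry with a bigger heap"


class FakeCompiler:
    """Hypothetical stand-in for NVRTC with a process-wide PCH heap."""

    def __init__(self, heap_size, required, symbols_present=True):
        self.heap_size = heap_size
        self.required = required
        self.symbols_present = symbols_present
        self.compiles = 0

    def compile(self):
        self.compiles += 1
        if not self.symbols_present:
            # Models an old libnvrtc.so without the PCH symbols.
            raise RuntimeError("PCH symbol not found")
        if self.heap_size < self.required:
            return HEAP_EXHAUSTED
        return "created"

    def heap_size_required(self):
        return self.required

    def set_heap_size(self, size):
        if size < 0:
            raise ValueError("heap size must be non-negative")
        self.heap_size = size


def compile_with_pch(compiler):
    """Return a pch_status-like value: 'created', 'failed', or None."""
    try:
        status = compiler.compile()
        if status is HEAP_EXHAUSTED:
            # Resize the heap to what the failed attempt asked for, then retry once.
            compiler.set_heap_size(compiler.heap_size_required())
            status = compiler.compile()
            if status is HEAP_EXHAUSTED:
                status = "failed"
        return status
    except RuntimeError:
        # Degrade gracefully instead of crashing when PCH APIs are unusable.
        return None


small_heap = FakeCompiler(heap_size=16, required=64)
print(compile_with_pch(small_heap), small_heap.compiles, small_heap.heap_size)
```

The sketch retries exactly once; whether the real code bounds retries the same way is not stated in the commit messages.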
|
/ok to test |
|
    return None  # sentinel: caller should auto-retry
if err == cynvrtc.nvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
    return _PCH_STATUS_NOT_ATTEMPTED
return _PCH_STATUS_FAILED
I assume this return refers to NVRTC_ERROR_PCH_CREATE
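One way to make the assumption in this comment explicit is to name every result code in the mapping rather than falling through. The sketch below uses a local `NvrtcResult` enum as a stand-in; the member names follow the NVRTC result codes mentioned in this thread, but the numeric values are placeholders and `pch_status_from_result` is a hypothetical helper, not the function under review.

```python
from enum import Enum, auto


class NvrtcResult(Enum):
    # Placeholder values; only the names mirror the NVRTC result codes.
    NVRTC_SUCCESS = auto()
    NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED = auto()
    NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED = auto()
    NVRTC_ERROR_PCH_CREATE = auto()


def pch_status_from_result(err):
    """Map a PCH creation result code to a status string.

    Returns None for heap exhaustion so the caller can resize and retry.
    """
    if err is NvrtcResult.NVRTC_SUCCESS:
        return "created"
    if err is NvrtcResult.NVRTC_ERROR_PCH_CREATE_HEAP_EXHAUSTED:
        return None  # sentinel: caller should auto-retry
    if err is NvrtcResult.NVRTC_ERROR_NO_PCH_CREATE_ATTEMPTED:
        return "not_attempted"
    if err is NvrtcResult.NVRTC_ERROR_PCH_CREATE:
        return "failed"
    # Anything else is unexpected here; surface it rather than report "failed".
    raise ValueError(f"unexpected result code: {err!r}")


for code in NvrtcResult:
    print(code.name, "->", pch_status_from_result(code))
```

Raising on unknown codes is a design choice of this sketch; the reviewed code instead returns the failed status for every remaining case.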
|
cc @seberg for vis (who has been playing with PCH for CuPy: cupy/cupy#9714) |
* Add PCH support to cuda.core.Program (NVIDIA#670)
  When `create_pch` is set in ProgramOptions, compile() now automatically resizes the NVRTC PCH heap and retries with a fresh program when PCH creation fails due to heap exhaustion. The `pch_status` property reports the outcome ("created", "not_attempted", "failed", or None).
  Made-with: Cursor
* Cache _has_nvrtc_pch_apis() result
  Avoid repeated hasattr() calls on every compile by caching the result in a module-level sentinel.
  Made-with: Cursor
* Document that pch_status returns None for non-NVRTC backends
  PCH is only relevant for code_type="c++" programs using NVRTC. Make this explicit in the docstring so PTX/NVVM users aren't confused.
  Made-with: Cursor
* Check errors on PCH heap resize in retry path
  nvrtcGetPCHHeapSizeRequired and nvrtcSetPCHHeapSize were called without error checking during the auto-retry. Route them through HANDLE_RETURN_NVRTC so failures raise NVRTCError.
  Made-with: Cursor
* Check pch option in addition to create_pch for PCH status/retry
  The --pch flag (automatic PCH mode) can also trigger PCH creation, not just --create-pch. Check both options when deciding whether to query PCH status and attempt auto-retry.
  Made-with: Cursor
* Catch RuntimeError from missing PCH symbols in old libnvrtc
  When cuda.bindings is built against a newer toolkit but runs with an older libnvrtc.so that lacks the PCH C symbols, the binding wrappers exist (hasattr passes) but the actual call raises RuntimeError from failing to resolve the function pointer at runtime. Extract PCH status/retry logic into _pch_status_and_retry() and wrap the call in try/except RuntimeError so we gracefully degrade to pch_status=None instead of crashing.
  Made-with: Cursor
* chore: fix toml
Summary
Closes #670.
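* Program methods: get_pch_create_status(), get_pch_heap_size_required(), get_pch_heap_size() (static), and set_pch_heap_size() (static).
* RuntimeError when unavailable.
* no_cache compile option in ProgramOptions.
Test plan
* test_cpp_program_pch_runtime_apis: compiles with create_pch, validates get_pch_create_status() and get_pch_heap_size_required() return values (skipped if NVRTC < 12.8)
* test_cpp_program_pch_heap_size_apis: exercises get_pch_heap_size() / set_pch_heap_size() round-trip (skipped if NVRTC < 12.8)
* test_cpp_program_pch_set_heap_size_rejects_negative: validates ValueError on negative input
* test_cpp_program_pch_runtime_apis_require_nvrtc_backend: verifies RuntimeError when called on a non-NVRTC program
Made with Cursor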
test_cpp_program_pch_runtime_apis— compiles withcreate_pch, validatesget_pch_create_status()andget_pch_heap_size_required()return values (skipped if NVRTC < 12.8)test_cpp_program_pch_heap_size_apis— exercisesget_pch_heap_size()/set_pch_heap_size()round-trip (skipped if NVRTC < 12.8)test_cpp_program_pch_set_heap_size_rejects_negative— validatesValueErroron negative inputtest_cpp_program_pch_runtime_apis_require_nvrtc_backend— verifiesRuntimeErrorwhen called on a non-NVRTC programMade with Cursor