[WIP] Guard host managed-memory access on concurrentManagedAccess=0 by rwgk · Pull Request #1769 · NVIDIA/cuda-python

rwgk · 2026-03-16T17:39:37Z

This PR does the following:

Guard host managed-memory access on CMA=0
Add a small helper (in helpers/buffers.py) that calls Device.sync() (or
otherwise ensures no work is in flight) before any host memset/memcmp of
managed memory when concurrentManagedAccess == 0. This is targeted and
keeps behavior unchanged on CMA=1 systems.
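
For illustration only, here is a minimal sketch of what such a guard could look like. It is not the helper from this PR. The function names, the `FakeDevice` stand-in, and the attribute path `properties.concurrent_managed_access` are assumptions made for the example; only the idea of calling `Device.sync()` before host access on CMA=0 comes from the description above.

```python
import ctypes
from types import SimpleNamespace


def sync_if_no_concurrent_managed_access(device):
    """Sync `device` before host access to managed memory when CMA == 0.

    Returns True if a sync was issued, False otherwise.
    """
    # Assumption: the device object exposes the attribute as
    # `properties.concurrent_managed_access` (truthy on CMA=1 systems).
    if device.properties.concurrent_managed_access:
        return False  # CMA=1: behavior unchanged, no extra sync
    device.sync()  # CMA=0: ensure no GPU work is in flight first
    return True


def guarded_host_memset(device, ptr, value, nbytes):
    """Host-side memset of a (managed) allocation, guarded on CMA=0."""
    sync_if_no_concurrent_managed_access(device)
    ctypes.memset(ptr, value, nbytes)


class FakeDevice:
    """Hypothetical stand-in so the sketch runs without a GPU."""

    def __init__(self, cma):
        self.properties = SimpleNamespace(concurrent_managed_access=cma)
        self.sync_calls = 0

    def sync(self):
        self.sync_calls += 1
```

With a real device object in place of `FakeDevice`, the same guard would sit in front of every host-side memset/memcmp in the test helpers, so CMA=1 systems pay no extra synchronization cost.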

Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the device before touching managed allocations.

Made-with: Cursor

copy-pr-bot · 2026-03-16T17:39:46Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

rwgk · 2026-03-16T20:30:02Z

There are no flakes in 100 trials with this PR at commit b611a87:

smc120-0009.ipp2a2.colossus.nvidia.com:/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_pr1769_commit_001_b611a870 $ analyze_qa_tests_logs.py trial*log.txt
================================================================================
QA Test Logs Analysis Summary
================================================================================

Total files analyzed: 100
Files with no flakes (all passed): 100
Files with failures: 0
Files with errors: 0
Files with crashes: 0

✓ All files have no flakes, errors, or crashes - all tests passed!

================================================================================
Overall Statistics
================================================================================

Total tests passed (across all files): 325916
Total tests failed (across all files): 0
Total tests skipped (across all files): 28084
Total test errors (across all files): 0

================================================================================
SKIPPED Summary
================================================================================

   800  SKIPPED [1] tests\test_nvfatbin.py:304 - nvcc found on PATH but failed to compile a trivial input.
   600  SKIPPED [1] tests\example_tests\utils.py:43: skip C - \Users\rgrossekunst\wrk\forked\cuda-python\cuda_core\tests\example_tests\..\..\examples\strided_memory_view_cpu.py
   200  SKIPPED [1] tests\example_tests\utils.py:37 - torch not installed, skipping related tests
   200  SKIPPED [1] tests\nvml\test_compute_mode.py:20 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_device.py:148 - No permission to set power management limit
   200  SKIPPED [1] tests\nvml\test_device.py:165 - No permission to set temperature threshold
   200  SKIPPED [1] tests\nvml\test_init.py:38 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:47 - device doesn't support ECC for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:75 - page_retirement not supported for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_pynvml.py:53 - No MIG devices found
   200  SKIPPED [1] tests\test_cufile.py:19: could not import 'cuda.bindings.cufile' - No module named 'cuda.bindings.cufile'
   200  SKIPPED [2] tests\nvml\test_pynvml.py:66 - Not supported on WSL or Windows
   200  SKIPPED [2] tests\nvml\test_pynvml.py:77 - Not supported on WSL or Windows
   200  SKIPPED [6] tests\example_tests\utils.py:37 - cupy not installed, skipping related tests
   200  SKIPPED [9] cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED (Two or more
   100  SKIPPED [18] tests\test_utils.py:486: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] examples\0_Introduction\simpleP2P_test.py:48 - Two or more GPUs with Peer-to-Peer access capability are required
   100  SKIPPED [1] examples\0_Introduction\systemWideAtomics_test.py:172 - Atomics not supported on Windows
   100  SKIPPED [1] tests\graph\test_device_launch.py:133 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\graph\test_device_launch.py:81 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\memory_ipc\test_event_ipc.py:98 - Device does not support IPC
   100  SKIPPED [1] tests\memory_ipc\test_peer_access.py:23 - Test requires at least N GPUs
   100  SKIPPED [1] tests\system\test_nvml_context.py:54 - Probably a non-WSL system
   100  SKIPPED [1] tests\system\test_system_device.py:107 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:263 - Events not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:313 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:343 - Persistence mode not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:407 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:462 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:477 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:492 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:502 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:97 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_events.py:16 - System events not supported on WSL or Windows
   100  SKIPPED [1] tests\test_device.py:367 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_device.py:415 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_launcher.py:123 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_launcher.py:93 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_linker.py:114 - nvjitlink requires lto for ptx linking
   100  SKIPPED [1] tests\test_linker.py:204 - driver backend test
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:105 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:137 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:58 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_memory.py:1188 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1272 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1299 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1449 - Driver rejects IPC-enabled mempool creation on this platform
   100  SKIPPED [1] tests\test_memory.py:864 - This test requires a device that doesn't support GPU Direct RDMA
   100  SKIPPED [1] tests\test_memory_peer_access.py:14 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:147 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:51 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:84 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_module.py:405 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:101 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:124 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:22 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:49 - Device does not support IPC
   100  SKIPPED [1] tests\test_program.py:211 - device_float128 requires sm_100 or later
   100  SKIPPED [1] tests\test_utils.py - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:220 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:510: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] tests\test_utils.py:618 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils_env_vars.py:135 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [1] tests\test_utils_env_vars.py:173 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [24] tests\memory_ipc\test_leaks.py:82 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [27] tests\conftest.py:57 - Device does not support managed memory pool operations
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:114 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:21 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_ipc_duplicate_import.py:64 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_leaks.py:26 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:111 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:162 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:18 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:60 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_peer_access.py:62 - Test requires at least N GPUs
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:20 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:72 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:138 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:26 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:82 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:112 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:30 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:67 - Device does not support IPC
   100  SKIPPED [2] tests\test_launcher.py:285 - cupy not installed
   100  SKIPPED [2] tests\test_module.py:390 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [2] tests\test_object_protocols.py:317 - requires multi-GPU
   100  SKIPPED [2] tests\test_object_protocols.py:357 - requires multi-GPU
   100  SKIPPED [2] tests\test_utils.py:453 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:634 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:665 - PyTorch is not installed
   100  SKIPPED [2] tests\test_utils.py:702 - CuPy is not installed
   100  SKIPPED [3] tests\graph\test_capture_alloc.py:149 - auto_free_on_launch not supported on Windows
   100  SKIPPED [3] tests\test_utils.py - got empty parameter set for (in_arr, use_stream)
   100  SKIPPED [4] tests\test_utils.py:416 - CuPy is not installed
   100  SKIPPED [6] ..\cuda_bindings\cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED [7] tests\test_module.py:346 - Test requires numba to be installed
   100  SKIPPED [8] tests\memory_ipc\test_errors.py:22 - Device does not support IPC
   100  SKIPPED [8] tests\memory_ipc\test_event_ipc.py:132 - Device does not support IPC
    73  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: TypeError - Argument 'itemsize' has incorrect type (expected int, got getset_descriptor)
    30  SKIPPED [1] tests\test_graphics.py:126: Could not create GL context/texture: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     7  SKIPPED [3] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [12] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [4] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     5  SKIPPED [5] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [15] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [2] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [7] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [8] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [6] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [11] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [13] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [9] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [10] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [14] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.

Additional sanity check: grep -a OSError *.txt (no output)

rwgk · 2026-03-16T22:20:37Z

Surprise: there are also no flakes with main at commit 3ed5217 (the commit this PR is based on):

smc120-0009.ipp2a2.colossus.nvidia.com:/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_cuda-python_main_at_3ed52171 $ analyze_qa_tests_logs.py trial*log.txt
================================================================================
QA Test Logs Analysis Summary
================================================================================

Total files analyzed: 100
Files with no flakes (all passed): 100
Files with failures: 0
Files with errors: 0
Files with crashes: 0

✓ All files have no flakes, errors, or crashes - all tests passed!

================================================================================
Overall Statistics
================================================================================

Total tests passed (across all files): 325987
Total tests failed (across all files): 0
Total tests skipped (across all files): 28015
Total test errors (across all files): 0

================================================================================
SKIPPED Summary
================================================================================

   800  SKIPPED [1] tests\test_nvfatbin.py:304 - nvcc found on PATH but failed to compile a trivial input.
   600  SKIPPED [1] tests\example_tests\utils.py:43: skip C - \Users\rgrossekunst\wrk\forked\cuda-python\cuda_core\tests\example_tests\..\..\examples\thread_block_cluster.py
   200  SKIPPED [1] tests\example_tests\utils.py:37 - torch not installed, skipping related tests
   200  SKIPPED [1] tests\nvml\test_compute_mode.py:20 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_device.py:148 - No permission to set power management limit
   200  SKIPPED [1] tests\nvml\test_device.py:165 - No permission to set temperature threshold
   200  SKIPPED [1] tests\nvml\test_init.py:38 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:47 - device doesn't support ECC for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:75 - page_retirement not supported for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_pynvml.py:53 - No MIG devices found
   200  SKIPPED [1] tests\test_cufile.py:19: could not import 'cuda.bindings.cufile' - No module named 'cuda.bindings.cufile'
   200  SKIPPED [2] tests\nvml\test_pynvml.py:66 - Not supported on WSL or Windows
   200  SKIPPED [2] tests\nvml\test_pynvml.py:77 - Not supported on WSL or Windows
   200  SKIPPED [6] tests\example_tests\utils.py:37 - cupy not installed, skipping related tests
   200  SKIPPED [9] cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED (Two or more
   100  SKIPPED [18] tests\test_utils.py:486: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] examples\0_Introduction\simpleP2P_test.py:48 - Two or more GPUs with Peer-to-Peer access capability are required
   100  SKIPPED [1] examples\0_Introduction\systemWideAtomics_test.py:172 - Atomics not supported on Windows
   100  SKIPPED [1] tests\graph\test_device_launch.py:133 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\graph\test_device_launch.py:81 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\memory_ipc\test_event_ipc.py:98 - Device does not support IPC
   100  SKIPPED [1] tests\memory_ipc\test_peer_access.py:23 - Test requires at least N GPUs
   100  SKIPPED [1] tests\system\test_nvml_context.py:54 - Probably a non-WSL system
   100  SKIPPED [1] tests\system\test_system_device.py:107 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:263 - Events not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:313 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:343 - Persistence mode not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:407 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:462 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:477 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:492 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:502 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:97 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_events.py:16 - System events not supported on WSL or Windows
   100  SKIPPED [1] tests\test_device.py:367 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_device.py:415 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_launcher.py:123 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_launcher.py:93 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_linker.py:114 - nvjitlink requires lto for ptx linking
   100  SKIPPED [1] tests\test_linker.py:204 - driver backend test
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:105 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:137 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:58 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_memory.py:1188 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1272 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1299 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1449 - Driver rejects IPC-enabled mempool creation on this platform
   100  SKIPPED [1] tests\test_memory.py:864 - This test requires a device that doesn't support GPU Direct RDMA
   100  SKIPPED [1] tests\test_memory_peer_access.py:14 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:147 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:51 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:84 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_module.py:405 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:101 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:124 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:22 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:49 - Device does not support IPC
   100  SKIPPED [1] tests\test_program.py:211 - device_float128 requires sm_100 or later
   100  SKIPPED [1] tests\test_utils.py - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:220 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:510: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] tests\test_utils.py:618 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils_env_vars.py:135 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [1] tests\test_utils_env_vars.py:173 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [24] tests\memory_ipc\test_leaks.py:82 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [27] tests\conftest.py:57 - Device does not support managed memory pool operations
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:114 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:21 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_ipc_duplicate_import.py:64 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_leaks.py:26 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:111 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:162 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:18 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:60 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_peer_access.py:62 - Test requires at least N GPUs
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:20 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:72 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:138 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:26 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:82 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:112 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:30 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:67 - Device does not support IPC
   100  SKIPPED [2] tests\test_launcher.py:285 - cupy not installed
   100  SKIPPED [2] tests\test_module.py:390 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [2] tests\test_object_protocols.py:317 - requires multi-GPU
   100  SKIPPED [2] tests\test_object_protocols.py:357 - requires multi-GPU
   100  SKIPPED [2] tests\test_utils.py:453 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:634 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:665 - PyTorch is not installed
   100  SKIPPED [2] tests\test_utils.py:702 - CuPy is not installed
   100  SKIPPED [3] tests\graph\test_capture_alloc.py:149 - auto_free_on_launch not supported on Windows
   100  SKIPPED [3] tests\test_utils.py - got empty parameter set for (in_arr, use_stream)
   100  SKIPPED [4] tests\test_utils.py:416 - CuPy is not installed
   100  SKIPPED [6] ..\cuda_bindings\cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED [7] tests\test_module.py:346 - Test requires numba to be installed
   100  SKIPPED [8] tests\memory_ipc\test_errors.py:22 - Device does not support IPC
   100  SKIPPED [8] tests\memory_ipc\test_event_ipc.py:132 - Device does not support IPC
    73  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: TypeError - Argument 'itemsize' has incorrect type (expected int, got getset_descriptor)
    22  SKIPPED [1] tests\test_graphics.py:126: Could not create GL context/texture: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     7  SKIPPED [9] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [4] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [2] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [7] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [10] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [11] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [12] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [3] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [8] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [13] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [14] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [15] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [6] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.

Additional sanity check: grep -a OSError *.txt (no output)

rwgk · 2026-03-16T22:25:31Z

Note:

I did not rebuild between running the tests reported under

[WIP] Guard host managed-memory access on concurrentManagedAccess=0 #1769 (comment) (this PR)
[WIP] Guard host managed-memory access on concurrentManagedAccess=0 #1769 (comment) (main)

I.e. everything was exactly identical, except for the presence/absence of commit b611a87. This is reflected in all log files, e.g.:

/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_pr1769_commit_001_b611a870/trial_001_2026-03-16+105613_log.txt

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager log -n 1
commit b611a8705cded7a6b83ad9fb518198fb71503fcd
Author: Ralf W. Grosse-Kunstleve <[email protected]>
Date:   Mon Mar 16 10:36:36 2026 -0700

    Sync device before host access to managed buffers

    Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the
    device before touching managed allocations.

    Made-with: Cursor

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager status
On branch guard_host_managed-memory_access_on_CMA_zero
Your branch is up to date with 'origin/guard_host_managed-memory_access_on_CMA_zero'.

nothing to commit, working tree clean

/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_cuda-python_main_at_3ed52171/trial_001_2026-03-16+131641_log.txt

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager log -n 1
commit 3ed52171d2fdbbec08a393c00e1d7ae7e5b16d7d
Author: Andy Jost <[email protected]>
Date:   Mon Mar 16 08:39:35 2026 -0700

    Infrastructure changes preparing for explicit graph construction (#1762)

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager status
On branch main
Your branch is up to date with 'upstream/main'.

nothing to commit, working tree clean

rwgk · 2026-03-16T22:29:51Z

I don't know what changed, but I cannot reproduce the flakes anymore. All details are in the log files under /home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs. (See cuda-python-private issues 235 and 245 for pointers to log files with flakes.)

Closing this PR and #1576 for now. If we see the flakes again later, we can come back here.

Sync device before host access to managed buffers

b611a87

Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the device before touching managed allocations. Made-with: Cursor
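
For readers without access to the diff, the guard described in this commit message might look roughly like the sketch below. This is not the code from b611a87. The helper name `sync_before_host_access`, the stand-in `FakeDevice` class, and the assumption that the device object exposes `properties.concurrent_managed_access` alongside `sync()` are all illustrative. The stand-in exists only so the sketch runs without a GPU.

```python
from types import SimpleNamespace


def sync_before_host_access(device):
    """Sync `device` before the host touches managed memory when CMA == 0.

    Returns True if a sync was issued, False if it was skipped.
    Hypothetical helper: name and return value are illustrative only.
    """
    # Assumption: `device.properties.concurrent_managed_access` reflects
    # CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS. If the attribute is
    # missing, be conservative and behave as if it were 0.
    cma = getattr(device.properties, "concurrent_managed_access", 0)
    if cma:
        # CMA=1: behavior is left unchanged, no extra synchronization.
        return False
    # CMA=0: make sure no device work is in flight before host memset/memcmp.
    device.sync()
    return True


class FakeDevice:
    """Stand-in for a real device object, used only to exercise the sketch."""

    def __init__(self, cma):
        self.properties = SimpleNamespace(concurrent_managed_access=cma)
        self.sync_calls = 0

    def sync(self):
        self.sync_calls += 1


if __name__ == "__main__":
    for cma in (0, 1):
        dev = FakeDevice(cma)
        issued = sync_before_host_access(dev)
        print(f"CMA={cma}: sync issued={issued}, sync calls={dev.sync_calls}")
```

A test helper would call something like this immediately before any host-side memset or memcmp on a managed allocation, so CMA=1 systems pay no extra synchronization cost.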

rwgk closed this Mar 16, 2026

rwgk deleted the guard_host_managed-memory_access_on_CMA_zero branch March 16, 2026 22:30

rwgk mentioned this pull request Mar 16, 2026

[WIP] Skip tests using managed memory if CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS == 0 #1576

Closed

rwgk wants to merge 1 commit into NVIDIA:main from rwgk:guard_host_managed-memory_access_on_CMA_zero
