
[WIP] Guard host managed-memory access on concurrentManagedAccess=0 #1769

Closed
rwgk wants to merge 1 commit into NVIDIA:main from rwgk:guard_host_managed-memory_access_on_CMA_zero

Collaborator

rwgk commented Mar 16, 2026

xref: #1576 (comment)

This PR implements:

  1. Guard host managed-memory access on CMA=0
    Add a small helper (in helpers/buffers.py) that calls Device.sync() (or
    otherwise ensures no work is in flight) before any host memset/memcmp of
    managed memory when concurrentManagedAccess == 0 (CMA=0). The change is
    targeted and keeps behavior unchanged on CMA=1 systems.

Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the
device before touching managed allocations.
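
For readers who want a concrete picture of the guard described above, here is a minimal sketch, not the actual helper from this PR. The names `sync_if_cma_zero`, `guarded_host_memset`, and `FakeDevice` are hypothetical, and the concurrentManagedAccess value is passed in as a plain argument rather than queried from a real device, so the snippet runs without a GPU. The only assumption about the device object is that it exposes a `sync()` method.

```python
import ctypes


def sync_if_cma_zero(device, concurrent_managed_access):
    """Hypothetical guard: wait for in-flight device work before host access.

    `device` only needs a sync() method; `concurrent_managed_access` is the
    attribute value (0 or 1), supplied by the caller in this sketch.
    """
    if not concurrent_managed_access:
        device.sync()


def guarded_host_memset(device, concurrent_managed_access, ptr, value, nbytes):
    """Host-side memset that syncs first on CMA=0 and is unchanged on CMA=1."""
    sync_if_cma_zero(device, concurrent_managed_access)
    ctypes.memset(ptr, value, nbytes)


class FakeDevice:
    """Stand-in for a real device object; it only counts sync() calls."""

    def __init__(self):
        self.sync_calls = 0

    def sync(self):
        self.sync_calls += 1


# An ordinary host buffer stands in for a managed allocation here.
buf = ctypes.create_string_buffer(8)
dev = FakeDevice()
guarded_host_memset(dev, 0, ctypes.addressof(buf), 0xAB, 8)  # CMA=0: syncs first
guarded_host_memset(dev, 1, ctypes.addressof(buf), 0xCD, 8)  # CMA=1: no sync
```

Passing the attribute value explicitly keeps the guard trivially testable; a real helper would presumably read it from the device once and cache it.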

Made-with: Cursor
Contributor

copy-pr-bot bot commented Mar 16, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Collaborator Author

rwgk commented Mar 16, 2026

There are no flakes in 100 trials with this PR at commit b611a87:

smc120-0009.ipp2a2.colossus.nvidia.com:/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_pr1769_commit_001_b611a870 $ analyze_qa_tests_logs.py trial*log.txt
================================================================================
QA Test Logs Analysis Summary
================================================================================

Total files analyzed: 100
Files with no flakes (all passed): 100
Files with failures: 0
Files with errors: 0
Files with crashes: 0

✓ All files have no flakes, errors, or crashes - all tests passed!

================================================================================
Overall Statistics
================================================================================

Total tests passed (across all files): 325916
Total tests failed (across all files): 0
Total tests skipped (across all files): 28084
Total test errors (across all files): 0

================================================================================
SKIPPED Summary
================================================================================

   800  SKIPPED [1] tests\test_nvfatbin.py:304 - nvcc found on PATH but failed to compile a trivial input.
   600  SKIPPED [1] tests\example_tests\utils.py:43: skip C - \Users\rgrossekunst\wrk\forked\cuda-python\cuda_core\tests\example_tests\..\..\examples\strided_memory_view_cpu.py
   200  SKIPPED [1] tests\example_tests\utils.py:37 - torch not installed, skipping related tests
   200  SKIPPED [1] tests\nvml\test_compute_mode.py:20 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_device.py:148 - No permission to set power management limit
   200  SKIPPED [1] tests\nvml\test_device.py:165 - No permission to set temperature threshold
   200  SKIPPED [1] tests\nvml\test_init.py:38 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:47 - device doesn't support ECC for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:75 - page_retirement not supported for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_pynvml.py:53 - No MIG devices found
   200  SKIPPED [1] tests\test_cufile.py:19: could not import 'cuda.bindings.cufile' - No module named 'cuda.bindings.cufile'
   200  SKIPPED [2] tests\nvml\test_pynvml.py:66 - Not supported on WSL or Windows
   200  SKIPPED [2] tests\nvml\test_pynvml.py:77 - Not supported on WSL or Windows
   200  SKIPPED [6] tests\example_tests\utils.py:37 - cupy not installed, skipping related tests
   200  SKIPPED [9] cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED (Two or more
   100  SKIPPED [18] tests\test_utils.py:486: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] examples\0_Introduction\simpleP2P_test.py:48 - Two or more GPUs with Peer-to-Peer access capability are required
   100  SKIPPED [1] examples\0_Introduction\systemWideAtomics_test.py:172 - Atomics not supported on Windows
   100  SKIPPED [1] tests\graph\test_device_launch.py:133 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\graph\test_device_launch.py:81 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\memory_ipc\test_event_ipc.py:98 - Device does not support IPC
   100  SKIPPED [1] tests\memory_ipc\test_peer_access.py:23 - Test requires at least N GPUs
   100  SKIPPED [1] tests\system\test_nvml_context.py:54 - Probably a non-WSL system
   100  SKIPPED [1] tests\system\test_system_device.py:107 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:263 - Events not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:313 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:343 - Persistence mode not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:407 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:462 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:477 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:492 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:502 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:97 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_events.py:16 - System events not supported on WSL or Windows
   100  SKIPPED [1] tests\test_device.py:367 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_device.py:415 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_launcher.py:123 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_launcher.py:93 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_linker.py:114 - nvjitlink requires lto for ptx linking
   100  SKIPPED [1] tests\test_linker.py:204 - driver backend test
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:105 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:137 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:58 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_memory.py:1188 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1272 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1299 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1449 - Driver rejects IPC-enabled mempool creation on this platform
   100  SKIPPED [1] tests\test_memory.py:864 - This test requires a device that doesn't support GPU Direct RDMA
   100  SKIPPED [1] tests\test_memory_peer_access.py:14 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:147 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:51 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:84 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_module.py:405 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:101 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:124 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:22 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:49 - Device does not support IPC
   100  SKIPPED [1] tests\test_program.py:211 - device_float128 requires sm_100 or later
   100  SKIPPED [1] tests\test_utils.py - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:220 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:510: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] tests\test_utils.py:618 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils_env_vars.py:135 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [1] tests\test_utils_env_vars.py:173 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [24] tests\memory_ipc\test_leaks.py:82 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [27] tests\conftest.py:57 - Device does not support managed memory pool operations
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:114 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:21 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_ipc_duplicate_import.py:64 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_leaks.py:26 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:111 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:162 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:18 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:60 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_peer_access.py:62 - Test requires at least N GPUs
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:20 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:72 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:138 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:26 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:82 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:112 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:30 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:67 - Device does not support IPC
   100  SKIPPED [2] tests\test_launcher.py:285 - cupy not installed
   100  SKIPPED [2] tests\test_module.py:390 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [2] tests\test_object_protocols.py:317 - requires multi-GPU
   100  SKIPPED [2] tests\test_object_protocols.py:357 - requires multi-GPU
   100  SKIPPED [2] tests\test_utils.py:453 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:634 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:665 - PyTorch is not installed
   100  SKIPPED [2] tests\test_utils.py:702 - CuPy is not installed
   100  SKIPPED [3] tests\graph\test_capture_alloc.py:149 - auto_free_on_launch not supported on Windows
   100  SKIPPED [3] tests\test_utils.py - got empty parameter set for (in_arr, use_stream)
   100  SKIPPED [4] tests\test_utils.py:416 - CuPy is not installed
   100  SKIPPED [6] ..\cuda_bindings\cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED [7] tests\test_module.py:346 - Test requires numba to be installed
   100  SKIPPED [8] tests\memory_ipc\test_errors.py:22 - Device does not support IPC
   100  SKIPPED [8] tests\memory_ipc\test_event_ipc.py:132 - Device does not support IPC
    73  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: TypeError - Argument 'itemsize' has incorrect type (expected int, got getset_descriptor)
    30  SKIPPED [1] tests\test_graphics.py:126: Could not create GL context/texture: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     7  SKIPPED [3] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [12] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [4] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     5  SKIPPED [5] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [15] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [2] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [7] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [8] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [6] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [11] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [13] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [9] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [10] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [14] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.

Additional sanity check: grep -a OSError *.txt (no output)
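
As a rough guide to what a clean run of 100 trials can and cannot show, the sketch below computes the largest per-trial flake rate that is still consistent with seeing zero flakes. It assumes the trials are independent and uses a 95% confidence level; the function name `flake_rate_upper_bound` is hypothetical and not part of analyze_qa_tests_logs.py.

```python
def flake_rate_upper_bound(trials, confidence=0.95):
    """Upper bound on the per-trial flake probability p after zero flakes.

    Solves (1 - p) ** trials = 1 - confidence for p, assuming independent trials.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


bound = flake_rate_upper_bound(100)
print(f"Zero flakes in 100 trials: flake rate below about {bound:.2%} (95% confidence)")
```

In other words, 100 clean trials rule out a flake rate above roughly 3% per trial, but not a rarer flake.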

Collaborator Author

rwgk commented Mar 16, 2026

Surprise: there are also no flakes on main at commit 3ed5217 (the commit this PR is based on):

smc120-0009.ipp2a2.colossus.nvidia.com:/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_cuda-python_main_at_3ed52171 $ analyze_qa_tests_logs.py trial*log.txt
================================================================================
QA Test Logs Analysis Summary
================================================================================

Total files analyzed: 100
Files with no flakes (all passed): 100
Files with failures: 0
Files with errors: 0
Files with crashes: 0

✓ All files have no flakes, errors, or crashes - all tests passed!

================================================================================
Overall Statistics
================================================================================

Total tests passed (across all files): 325987
Total tests failed (across all files): 0
Total tests skipped (across all files): 28015
Total test errors (across all files): 0

================================================================================
SKIPPED Summary
================================================================================

   800  SKIPPED [1] tests\test_nvfatbin.py:304 - nvcc found on PATH but failed to compile a trivial input.
   600  SKIPPED [1] tests\example_tests\utils.py:43: skip C - \Users\rgrossekunst\wrk\forked\cuda-python\cuda_core\tests\example_tests\..\..\examples\thread_block_cluster.py
   200  SKIPPED [1] tests\example_tests\utils.py:37 - torch not installed, skipping related tests
   200  SKIPPED [1] tests\nvml\test_compute_mode.py:20 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_device.py:148 - No permission to set power management limit
   200  SKIPPED [1] tests\nvml\test_device.py:165 - No permission to set temperature threshold
   200  SKIPPED [1] tests\nvml\test_init.py:38 - Test not supported on Windows
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:47 - device doesn't support ECC for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_page_retirement.py:75 - page_retirement not supported for NNNNNNNNNNNNNNN
   200  SKIPPED [1] tests\nvml\test_pynvml.py:53 - No MIG devices found
   200  SKIPPED [1] tests\test_cufile.py:19: could not import 'cuda.bindings.cufile' - No module named 'cuda.bindings.cufile'
   200  SKIPPED [2] tests\nvml\test_pynvml.py:66 - Not supported on WSL or Windows
   200  SKIPPED [2] tests\nvml\test_pynvml.py:77 - Not supported on WSL or Windows
   200  SKIPPED [6] tests\example_tests\utils.py:37 - cupy not installed, skipping related tests
   200  SKIPPED [9] cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED (Two or more
   100  SKIPPED [18] tests\test_utils.py:486: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] examples\0_Introduction\simpleP2P_test.py:48 - Two or more GPUs with Peer-to-Peer access capability are required
   100  SKIPPED [1] examples\0_Introduction\systemWideAtomics_test.py:172 - Atomics not supported on Windows
   100  SKIPPED [1] tests\graph\test_device_launch.py:133 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\graph\test_device_launch.py:81 - Device-side graph launch requires Hopper (sm_90+) architecture
   100  SKIPPED [1] tests\memory_ipc\test_event_ipc.py:98 - Device does not support IPC
   100  SKIPPED [1] tests\memory_ipc\test_peer_access.py:23 - Test requires at least N GPUs
   100  SKIPPED [1] tests\system\test_nvml_context.py:54 - Probably a non-WSL system
   100  SKIPPED [1] tests\system\test_system_device.py:107 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:263 - Events not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:313 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:343 - Persistence mode not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:407 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:462 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:477 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:492 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:502 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_device.py:97 - Device attributes not supported on WSL or Windows
   100  SKIPPED [1] tests\system\test_system_events.py:16 - System events not supported on WSL or Windows
   100  SKIPPED [1] tests\test_device.py:367 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_device.py:415 - Test requires at least 2 CUDA devices
   100  SKIPPED [1] tests\test_launcher.py:123 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_launcher.py:93 - Driver or GPU not new enough for thread block clusters
   100  SKIPPED [1] tests\test_linker.py:114 - nvjitlink requires lto for ptx linking
   100  SKIPPED [1] tests\test_linker.py:204 - driver backend test
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:105 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:137 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_load_nvidia_dynamic_lib_using_mocker.py:58 - Windows support for cupti not yet implemented
   100  SKIPPED [1] tests\test_memory.py:1188 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1272 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1299 - IPC not implemented for Windows
   100  SKIPPED [1] tests\test_memory.py:1449 - Driver rejects IPC-enabled mempool creation on this platform
   100  SKIPPED [1] tests\test_memory.py:864 - This test requires a device that doesn't support GPU Direct RDMA
   100  SKIPPED [1] tests\test_memory_peer_access.py:14 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:147 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:51 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_memory_peer_access.py:84 - Test requires at least N GPUs
   100  SKIPPED [1] tests\test_module.py:405 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:101 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:124 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:22 - Device does not support IPC
   100  SKIPPED [1] tests\test_multiprocessing_warning.py:49 - Device does not support IPC
   100  SKIPPED [1] tests\test_program.py:211 - device_float128 requires sm_100 or later
   100  SKIPPED [1] tests\test_utils.py - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:220 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils.py:510: could not import 'cupy' - No module named 'cupy'
   100  SKIPPED [1] tests\test_utils.py:618 - CuPy is not installed
   100  SKIPPED [1] tests\test_utils_env_vars.py:135 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [1] tests\test_utils_env_vars.py:173 - Exercising symlinks intentionally omitted for simplicity
   100  SKIPPED [24] tests\memory_ipc\test_leaks.py:82 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [27] tests\conftest.py:57 - Device does not support managed memory pool operations
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:114 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_event_ipc.py:21 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_ipc_duplicate_import.py:64 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_leaks.py:26 - mempool allocation handle is not using fds or psutil is unavailable
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:111 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:162 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:18 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_memory_ipc.py:60 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_peer_access.py:62 - Test requires at least N GPUs
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:20 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_send_buffers.py:72 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:138 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:26 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_serialize.py:82 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:112 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:30 - Device does not support IPC
   100  SKIPPED [2] tests\memory_ipc\test_workerpool.py:67 - Device does not support IPC
   100  SKIPPED [2] tests\test_launcher.py:285 - cupy not installed
   100  SKIPPED [2] tests\test_module.py:390 - Device with compute capability 90 or higher is required for cluster support
   100  SKIPPED [2] tests\test_object_protocols.py:317 - requires multi-GPU
   100  SKIPPED [2] tests\test_object_protocols.py:357 - requires multi-GPU
   100  SKIPPED [2] tests\test_utils.py:453 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:634 - CuPy is not installed
   100  SKIPPED [2] tests\test_utils.py:665 - PyTorch is not installed
   100  SKIPPED [2] tests\test_utils.py:702 - CuPy is not installed
   100  SKIPPED [3] tests\graph\test_capture_alloc.py:149 - auto_free_on_launch not supported on Windows
   100  SKIPPED [3] tests\test_utils.py - got empty parameter set for (in_arr, use_stream)
   100  SKIPPED [4] tests\test_utils.py:416 - CuPy is not installed
   100  SKIPPED [6] ..\cuda_bindings\cuda\bindings\_test_helpers\arch_check.py:55 - Unsupported call for device architecture AMPERE on device 'NVIDIA RTX ANNNN'
   100  SKIPPED [7] tests\test_module.py:346 - Test requires numba to be installed
   100  SKIPPED [8] tests\memory_ipc\test_errors.py:22 - Device does not support IPC
   100  SKIPPED [8] tests\memory_ipc\test_event_ipc.py:132 - Device does not support IPC
    73  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: TypeError - Argument 'itemsize' has incorrect type (expected int, got getset_descriptor)
    22  SKIPPED [1] tests\test_graphics.py:126: Could not create GL context/texture: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     7  SKIPPED [9] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     6  SKIPPED [4] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [1] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [2] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     4  SKIPPED [7] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [10] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [11] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [12] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [3] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     3  SKIPPED [8] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     2  SKIPPED [13] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [14] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [15] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.
     1  SKIPPED [6] tests\test_graphics.py:62: Could not create GL context/buffer: CUDAError: CUDA_ERROR_INVALID_CONTEXT: This most frequently indicates that there is no context bound to the current thread. This can also be returned if the context passed to an API call is not a valid handle (such as a context that has had ::cuCtxDestroy() invoked on it). This can also be returned if a user mixes different API versions (i.e. 3010 context with 3020 API calls). See ::cuCtxGetApiVersion() for more details. This can also be returned if the green context passed to an API call was not converted to a ::CUcontext using : - cuCtxFromGreenCtx API.

Additional sanity check: grep -a OSError *.txt (no output)

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

Note:

I did not rebuild between running the tests reported under

I.e. everything was exactly identical, except for the presence/absence of commit b611a87. This is reflected in all log files, e.g.:

/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_pr1769_commit_001_b611a870/trial_001_2026-03-16+105613_log.txt

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager log -n 1
commit b611a8705cded7a6b83ad9fb518198fb71503fcd
Author: Ralf W. Grosse-Kunstleve <[email protected]>
Date:   Mon Mar 16 10:36:36 2026 -0700

    Sync device before host access to managed buffers

    Guard host-side memset/memcmp in test helpers on CMA=0 by syncing the
    device before touching managed allocations.

    Made-with: Cursor

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager status
On branch guard_host_managed-memory_access_on_CMA_zero
Your branch is up to date with 'origin/guard_host_managed-memory_access_on_CMA_zero'.

nothing to commit, working tree clean

/home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs/qa_tests_multi_cuda-python_main_at_3ed52171/trial_001_2026-03-16+131641_log.txt

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager log -n 1
commit 3ed52171d2fdbbec08a393c00e1d7ae7e5b16d7d
Author: Andy Jost <[email protected]>
Date:   Mon Mar 16 08:39:35 2026 -0700

    Infrastructure changes preparing for explicit graph construction (#1762)

C:\Users\rgrossekunst\wrk\forked\cuda-python>git --no-pager status
On branch main
Your branch is up to date with 'upstream/main'.

nothing to commit, working tree clean
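
For readers without access to the diff, the guard described in the commit message above could look roughly like the following. This is only a minimal sketch, not the code from commit b611a87: the function names `sync_before_host_access` and `host_memset` are hypothetical, and the `device` argument is assumed to be any object exposing a `sync()` method and a `properties.concurrent_managed_access` attribute that mirrors `concurrentManagedAccess`.

```python
import ctypes


def sync_before_host_access(device, is_managed):
    """Sync `device` if host access to a managed buffer could race with GPU work.

    Hypothetical helper. Returns True if a sync was issued, False otherwise.
    """
    if not is_managed:
        # Non-managed host memory needs no guard.
        return False
    # Assumption: this attribute mirrors concurrentManagedAccess
    # (truthy on CMA=1 systems, falsy on CMA=0 systems).
    if device.properties.concurrent_managed_access:
        # CMA=1: behavior is left unchanged, no extra sync.
        return False
    # CMA=0: make sure no work is in flight before the host touches the buffer.
    device.sync()
    return True


def host_memset(device, ptr, value, nbytes, is_managed=True):
    """Fill `nbytes` bytes at address `ptr` from the host, guarded on CMA=0."""
    sync_before_host_access(device, is_managed)
    ctypes.memset(ptr, value, nbytes)
```

A host-side memcmp helper would call the same guard before reading. Keeping the check in a single function means CMA=1 systems pay only for one attribute lookup.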

@rwgk
Collaborator Author

rwgk commented Mar 16, 2026

I don't know what changed, but I cannot reproduce the flakes anymore. All details are in the log files under /home/scratch.rgrossekunst_sw/logs_mirror/rdc-gitbash/logs. (See cuda-python-private issues 235 and 245 for pointers to log files with flakes.)

Closing this PR and #1576 for now. If we see the flakes again later, we can come back here.

@rwgk closed this Mar 16, 2026
@rwgk deleted the guard_host_managed-memory_access_on_CMA_zero branch March 16, 2026 22:30