[LDS] Add pre-check to ensure all copies are DMA-convertible before converting any #23472

Merged
lialan merged 2 commits into main from users/lialan/async_guard on Feb 18, 2026

@lialan lialan commented Feb 12, 2026

This patch ensures that either all promoted operands use DMA or none of them do. This all-or-nothing behavior is needed so that async buffer pipelines can be formed.

  • Add isCopyDMAConvertible() to check DMA viability without modifying IR.
  • In GPUConvertToCoalescedDMAPass, collect all linalg.copy ops with use_global_load_dma or derived_thread_config. If ALL are DMA-convertible, upgrade them all to use_global_load_dma. If ANY fails, downgrade them all to derived_thread_config.
  • Add new lit tests for paired copy conversion and mixed attribute scenarios.
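
The all-or-nothing rule in the list above can be sketched in plain C++ as a two-phase pass: first check every copy without changing anything, then apply a single decision to the whole group. This is only an illustration under stated assumptions, not the code in the PR. `CopyConfig`, `CopyCandidate`, and `unifyCopyConfigs` are hypothetical names, the `dmaConvertible` flag stands in for whatever `isCopyDMAConvertible()` reports on a real `linalg.copy`, and the early return for a group with no `use_global_load_dma` copy is an assumption inferred from the "no DMA intent" negative test mentioned in the follow-up commit.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two lowering configs named in the PR.
enum class CopyConfig { UseGlobalLoadDMA, DerivedThreadConfig };

// Hypothetical model of a linalg.copy candidate; this is not an MLIR type.
struct CopyCandidate {
  std::string name;
  CopyConfig config;
  bool dmaConvertible;  // what a read-only viability check would report
};

// Returns true when the whole group ends up on the DMA path.
bool unifyCopyConfigs(std::vector<CopyCandidate> &copies) {
  // Assumption: if no copy asks for DMA, the group is left untouched.
  bool anyDMAIntent = std::any_of(
      copies.begin(), copies.end(), [](const CopyCandidate &c) {
        return c.config == CopyConfig::UseGlobalLoadDMA;
      });
  if (!anyDMAIntent) return false;

  // Phase 1: check every copy before modifying any of them.
  bool allConvertible = std::all_of(
      copies.begin(), copies.end(),
      [](const CopyCandidate &c) { return c.dmaConvertible; });

  // Phase 2: apply one decision to the whole group.
  CopyConfig target = allConvertible ? CopyConfig::UseGlobalLoadDMA
                                     : CopyConfig::DerivedThreadConfig;
  for (CopyCandidate &c : copies) c.config = target;

  // Postcondition: no mixed configs remain in the group.
  assert(std::all_of(copies.begin(), copies.end(),
                     [target](const CopyCandidate &c) {
                       return c.config == target;
                     }));
  return allConvertible;
}
```

Separating the check from the rewrite is what makes the fallback safe: if one copy turns out not to be convertible, no other copy has been converted yet, so nothing needs to be undone.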

@lialan lialan force-pushed the users/lialan/async_guard branch from 51829cb to 4611c5f Compare February 12, 2026 17:35
@lialan lialan changed the title [AMDGPU][LDS] Add pre-check to ensure all copies are DMA-convertible ... [LDS] Add pre-check to ensure all copies are DMA-convertible before converting any Feb 12, 2026
@lialan lialan force-pushed the users/lialan/async_guard branch from 4611c5f to d3fe89e Compare February 12, 2026 18:02
@lialan lialan requested a review from jerryyin February 12, 2026 18:29
* Rename checkDMAAlignment to getDMAAlignedSubgroupSize for clarity.
* Add TODO about isCopyDMAConvertible vs tracesToTensorEmpty divergence.
* Make test comments more explicit about minElementsPerTransfer calculation.
* Add negative test for "no DMA intent" path (only derived_thread_config).

Co-Authored-By: Claude Opus 4.6
@lialan lialan force-pushed the users/lialan/async_guard branch from 36bc0f0 to a74256e Compare February 12, 2026 18:57
@lialan lialan marked this pull request as ready for review February 12, 2026 19:42
@lialan lialan requested a review from krzysz00 February 12, 2026 19:58
@qedawkins qedawkins left a comment

Makes sense to me

@lialan lialan merged commit 571048d into main Feb 18, 2026
61 checks passed
@lialan lialan deleted the users/lialan/async_guard branch February 18, 2026 17:46
MaheshRavishankar pushed a commit to MaheshRavishankar/iree that referenced this pull request Feb 24, 2026
…onverting any (iree-org#23472)
