Reapply "[CPU] Support dynamic attention by tiling K1 when needed." by hanhanW · Pull Request #23544 · iree-org/iree

hanhanW · 2026-02-22T13:27:56Z

The K1 dimension (head_dim) in attention was unconditionally left untiled, which leads to large stack allocations when the dimension is dynamic.

K1 is typically small (64/128 per the AttentionOpDetail docs), so the original heuristic of leaving it untiled was reasonable. This revision sets a tile size for K1 if the dimension is dynamic or falls outside the typical range (<= 128).
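The decision described above can be sketched as a small standalone function. This is only an illustration of the rule as stated in this description, not the code in the PR: `chooseK1TileSize`, `kDynamic`, `kTypicalMaxK1`, and the `defaultTile` parameter are hypothetical names, and returning 0 to mean "leave untiled" is an assumption made for the sketch.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sentinel marking a dynamic (unknown at compile time) dimension.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Upper bound of the "typical" head_dim range quoted in the description.
constexpr int64_t kTypicalMaxK1 = 128;

// Returns the tile size to use for K1.
// Assumption for this sketch: 0 means "leave the dimension untiled".
// `defaultTile` is a caller-provided tile size; how the real heuristic
// picks that value is not covered here.
constexpr int64_t chooseK1TileSize(int64_t k1Size, int64_t defaultTile) {
  // Tile when the size is dynamic or larger than the typical range;
  // otherwise keep the original behavior and leave K1 untiled.
  return (k1Size == kDynamic || k1Size > kTypicalMaxK1) ? defaultTile : 0;
}
```

Under these assumptions, a static head_dim of 64 or 128 stays untiled, while a head_dim of 256 or a dynamic head_dim gets the provided tile size, which bounds the size of the temporary buffers.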

E2E tests are added with the same inputs and expected outputs as attention.mlir (the static version). Some backends, e.g., AMDGPU, do not support dynamic attention, so the tests live in a new file. In this revision, the test is enabled on the CPU and VMVX backends.

Fixes #23277

Previously, this change triggered a bug in the Android build. The root cause is that vectorization is not enabled (because masking is not natively supported), which leads to non-trivial buffer allocation. Forcing the emulation is not ideal, but it enables the functionality. Performance on this path has not been prioritized for at least two years, so we accept the emulation as a workaround. The evidence is that the e2e tests have not been enabled for ARM CPUs for at least two years.

iree/tests/e2e/attention/CMakeLists.txt

Lines 1 to 5 in 243fe33

    
    # TODO: (#17751) Add the arm_64 tests when the bug resolved. See:
    #   https://github.com/iree-org/iree/actions/runs/10468944505/job/28990909321#step:4:9815
    if(IREE_ARCH STREQUAL "arm_64")
      return()
    endif()

…ree-org#23313) This reverts commit acbfa27. Signed-off-by: hanhanW <[email protected]>

Signed-off-by: hanhanW <[email protected]>

hanhanW · 2026-02-22T13:29:02Z

cc @banach-space @egebeysel: this is a pure improvement, as it enables e2e compilation and execution.

hanhanW · 2026-02-22T13:29:47Z

#23318 removes the workaround, but there is more to investigate, so I am keeping this workaround for now.

Groverkss · 2026-02-22T17:28:33Z

I don't have context on the cpu side changes, so LGTM because I already approved the attention changes

hanhanW · 2026-02-22T23:22:26Z

> I don't have context on the cpu side changes, so LGTM because I already approved the attention changes

Can you help approve the change? Thanks.

hanhanW · 2026-02-22T23:24:43Z

> I don't have context on the cpu side changes, so LGTM because I already approved the attention changes

For more context, this is not a CPU-specific issue. It is a bufferization problem that shows up when the ops cannot be vectorized after they are converted to OnlineAttention and lowered to loops, like the old issue: #16956

The bufferization part requires further analysis. We did not fix it in the past, and my other PR improves the analysis. However, that change needs further investigation: #23318

hanhanW requested a review from Groverkss February 22, 2026 13:27

hanhanW requested a review from bjacob as a code owner February 22, 2026 13:27

hanhanW added 2 commits February 22, 2026 05:28

Reapply "[CPU] Support dynamic attention by tiling K1 when needed." (i…

37ad0c6

…ree-org#23313) This reverts commit acbfa27. Signed-off-by: hanhanW <[email protected]>

Workaround for aarch64.

4bcd3e2

Signed-off-by: hanhanW <[email protected]>

hanhanW force-pushed the users/hanhanW/fix-dynamic-attention-2 branch from e0a4173 to 4bcd3e2 Compare February 22, 2026 13:28

Groverkss approved these changes Feb 23, 2026


hanhanW merged commit f30a96c into iree-org:main Feb 23, 2026
52 of 57 checks passed

hanhanW deleted the users/hanhanW/fix-dynamic-attention-2 branch February 23, 2026 22:27
