Skip to content

[Impeller] Do not wait for a frame's acquire fence if the frame was never presented#183288

Merged
auto-submit[bot] merged 3 commits intoflutter:masterfrom
jason-simmons:b488786379
Mar 6, 2026
Merged

[Impeller] Do not wait for a frame's acquire fence if the frame was never presented#183288
auto-submit[bot] merged 3 commits intoflutter:masterfrom
jason-simmons:b488786379

Conversation

@jason-simmons
Copy link
Member

This could happen if a previous call to vkAcquireNextImageKHR failed. If that occurs, then AcquireNextDrawable will not return a surface. The caller will not render a frame or call Present, and the fence will never be signaled.

See internal issue b/488786379

…ever presented

This could happen if a previous call to vkAcquireNextImageKHR failed.  If that occurs, then AcquireNextDrawable will not return a surface. The caller will not render a frame or call Present, and the fence will never be signaled.

See internal issue b/488786379
@jason-simmons jason-simmons requested a review from gaaclarke March 5, 2026 23:26
@github-actions github-actions bot added engine flutter/engine related. See also e: labels. e: impeller Impeller rendering backend issues and features requests labels Mar 5, 2026
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a potential hang in the Vulkan backend. If vkAcquireNextImageKHR fails, the associated fence is never signaled, which could cause a deadlock on a subsequent frame. The fix introduces a flag, acquire_fence_pending, to track whether a wait on this fence is necessary. This flag is only set when a frame is successfully presented, ensuring that the wait is skipped if the image acquisition failed. The changes also include updates to the Vulkan mocks to allow for testing this scenario, and a new unit test is added to verify that the fence wait is correctly skipped after an acquisition failure.

return VK_SUCCESS;
}

static thread_local std::function<std::remove_pointer_t<PFN_vkWaitForFences>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use absl::NoDestructor

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the benefit of NoDestructor here?

Other similar state in the mock Vulkan implementation does not use NoDestructor. Skipping the destructor call could cause leaks during thread shutdown.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a requirement of the c++ style guide. The tradeoffs and decision are enumerated here: https://google.github.io/styleguide/cppguide.html#Static_and_Global_Variables

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could zero it out in a teardown method for your test suite if you are concerned about the leak

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or adding a RAII wrapper...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables are thread-local and do not have static/global duration.

See https://google.github.io/styleguide/cppguide.html#thread_local - specifically the discussion of why removing the destructor call will result in resource leaks when the thread terminates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that the destructor will be called when the thread is killed. The concerns about statics still apply here.

Prefer trivial types, or types that provably run no user-provided code at destruction

The absl::NoDestructor makes the code satisfy that suggestion.

Relying on the thread destruction to clean up your variable is unreliable because the timing is difficult to manage. If instead of relying on the thread destruction to do your cleanup, you have a RAII wrapper or clear it out in the tearDown you have a clear deterministic time when it will happen and any temporal dependencies can be negotiated. It also makes it easier to know there is no leakage between tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thread-local variables used by the Vulkan mocks will be predictably destructed when the thread running the test terminates. The destructors of these objects do not have any dependencies on the order in which they are invoked.

This pattern has been used safely in the Vulkan mocks since your original implementation in flutter/engine@30d0cf8. This is how the parameters of the MockVulkanContextBuilder are made available to the functions exported by the mock Vulkan API resolver.

I agree that refactoring the lifecycle of the mock Vulkan state might be worthwhile (for example, creating a test fixture for mock Vulkan tests that cleans this state after each test case). But that's out of scope for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with jason offline. We are in agreement in what a better setup for mock_vulkan would be and outlined it in a followup issue: #183323

delete reinterpret_cast<MockSemaphore*>(semaphore);
}

static thread_local std::function<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absl::NoDestructor here too

@jason-simmons jason-simmons added the autosubmit Merge PR when tree becomes green via auto submit App label Mar 6, 2026
@auto-submit auto-submit bot added this pull request to the merge queue Mar 6, 2026
Merged via the queue into flutter:master with commit c5d0a84 Mar 6, 2026
184 of 185 checks passed
@flutter-dashboard flutter-dashboard bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Mar 6, 2026
@gaaclarke
Copy link
Member

@jason-simmons fyi I'm getting impeller validation errors when running this new test locally.

$ ../out/host_debug_unopt_arm64/impeller_unittests --gtest_filter="*NoFenceWaitAfterAcquireNextImageFailure*"
[INFO:flutter/testing/test_timeout_listener.cc(75)] Test timeout of 300 seconds per test case will be enforced.
Note: Google Test filter = *NoFenceWaitAfterAcquireNextImageFailure*
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from SwapchainTest
[ RUN      ] SwapchainTest.NoFenceWaitAfterAcquireNextImageFailure
[WARNING:flutter/impeller/renderer/backend/vulkan/driver_info_vk.cc(290)] Unknown GPU Driver Vendor: 0. This is not an error.
[ERROR:flutter/impeller/renderer/backend/vulkan/swapchain/khr/khr_swapchain_impl_vk.cc(396)] Break on 'ImpellerValidationBreak' to inspect point of failure: Could not acquire next swapchain image: ErrorSurfaceLostKHR
[ERROR:flutter/impeller/renderer/backend/vulkan/swapchain/khr/khr_swapchain_impl_vk.cc(396)] Break on 'ImpellerValidationBreak' to inspect point of failure: Could not acquire next swapchain image: ErrorSurfaceLostKHR
[       OK ] SwapchainTest.NoFenceWaitAfterAcquireNextImageFailure (1 ms)
[----------] 1 test from SwapchainTest (2 ms total)

@jason-simmons
Copy link
Member Author

Yes - that is intentional. The test is simulating an unrecoverable error in vkAcquireNextImageKHR that flows through this code path:
https://github.com/flutter/flutter/blob/master/engine/src/flutter/impeller/renderer/backend/vulkan/swapchain/khr/khr_swapchain_impl_vk.cc#L395

Is this causing an issue? Should the style of logging be changed for this class of errors?

@gaaclarke
Copy link
Member

I noticed it when I was refactoring the mockvulkan. I thought I introduced it. I've usually seen it used for non-recoverable errors so it is surprising if we continue executing fine.

At a minimum maybe we should add a log statement that shows up in that test that those errors are expected. That's going to trip people up who are looking at logs trying to figure out what went wrong.

@jason-simmons
Copy link
Member Author

Another option would be changing KHRSwapchainImplVK::AcquireNextDrawable to skip logging if the error code is VK_ERROR_SURFACE_LOST_KHR.

Android appears to be using this error code to represent situations that may be recoverable. In the issue that led to this PR, Android's implementation of vkAcquireNextImageKHR was returning VK_ERROR_SURFACE_LOST_KHR because it was unable to dequeue a buffer for the surface.

Android logged the error as vulkan : dequeueBuffer failed: Try again (-11), implying that it could be recoverable. And with this PR, the app was able to successfully retry the vkAcquireNextImageKHR on later frames and continue rendering.

@gaaclarke
Copy link
Member

Yea, that sounds good to me if it sounds good to you.

jason-simmons added a commit to jason-simmons/flutter that referenced this pull request Mar 7, 2026
…AcquireNextImageKHR

Android may use this error code to indicate situations that are recoverable
(such as a temporary failure to dequeue a buffer).  This PR removes this error
from the validation log to reduce noise in the logs.

See flutter#183288
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 7, 2026
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 7, 2026
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 7, 2026
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 8, 2026
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 8, 2026
engine-flutter-autoroll added a commit to engine-flutter-autoroll/packages that referenced this pull request Mar 9, 2026
auto-submit bot pushed a commit to flutter/packages that referenced this pull request Mar 9, 2026
flutter/flutter@d182143...2ec61af

2026-03-08 [email protected] Roll Fuchsia Linux SDK from P9D6h4D2ks99Ct7TO... to giLoee6arX5CRHuRh... (flutter/flutter#183366)
2026-03-07 [email protected] Roll Skia from 6643c1bd93bb to af994ae4d990 (1 revision) (flutter/flutter#183359)
2026-03-07 [email protected] refactor: remove material import from animated_cross_fade, physical_model_test, pinned_header_sliver_test, spell_check_test (flutter/flutter#183234)
2026-03-07 [email protected] Roll Skia from a69ef43650ee to 6643c1bd93bb (13 revisions) (flutter/flutter#183346)
2026-03-07 [email protected] Roll pub packages (flutter/flutter#183344)
2026-03-07 [email protected] Roll Dart SDK from 0c24edc41e09 to 1604910613c7 (2 revisions) (flutter/flutter#183342)
2026-03-07 [email protected] Roll Fuchsia Test Scripts from nR2ESa1Gd8yPcWo06... to R2EllDf4DgBXVNuiN... (flutter/flutter#183341)
2026-03-07 [email protected] Support BGRA textures in BlitCopyTextureToBufferCommandGLES::Encode and fix improper mapping of BGRA to RGBA in blit_command_gles and texture_gles (flutter/flutter#182397)
2026-03-06 [email protected] Roll Fuchsia Linux SDK from 8ay15_eQOEgPHCypm... to P9D6h4D2ks99Ct7TO... (flutter/flutter#183329)
2026-03-06 [email protected] [doc]add discord channel to ios triage meeting (flutter/flutter#183285)
2026-03-06 [email protected] Roll Dart SDK from 7c7c1e3d024d to 0c24edc41e09 (2 revisions) (flutter/flutter#183328)
2026-03-06 [email protected] Revert "Make HCPP upgrading work for vd/tlhc (#181024)" (flutter/flutter#183310)
2026-03-06 [email protected] Add AI contribution guidelines (flutter/flutter#183326)
2026-03-06 [email protected] [Impeller] Do not wait for a frame's acquire fence if the frame was never presented (flutter/flutter#183288)
2026-03-06 [email protected] Add back in accidentally removed line from `create_updated_flutter_deps.py` (flutter/flutter#183314)
2026-03-06 [email protected] Fix typo in README (flutter/flutter#183245)
2026-03-06 [email protected] Roll pub packages (flutter/flutter#183319)
2026-03-06 [email protected] Add displayCornerRadii support to predictive back transitions. (flutter/flutter#181326)
2026-03-06 [email protected] refactor: remove material from widget_inspector_test, sliver_cross_axis_group_test, editable_text_show_on_screen_test, scrollable_fling_test, selection_container_test (flutter/flutter#182702)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-packages
Please CC [email protected],[email protected] on the revert to ensure that a human
is aware of the problem.

To file a bug in Packages: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
github-merge-queue bot pushed a commit that referenced this pull request Mar 10, 2026
…AcquireNextImageKHR (#183338)

Android may use this error code to indicate situations that are
recoverable (such as a temporary failure to dequeue a buffer). This PR
removes this error from the validation log to reduce noise in the logs.

See #183288
xxxOVALxxx pushed a commit to xxxOVALxxx/flutter that referenced this pull request Mar 10, 2026
…ever presented (flutter#183288)

This could happen if a previous call to vkAcquireNextImageKHR failed. If
that occurs, then AcquireNextDrawable will not return a surface. The
caller will not render a frame or call Present, and the fence will never
be signaled.

See internal issue b/488786379
xxxOVALxxx pushed a commit to xxxOVALxxx/flutter that referenced this pull request Mar 10, 2026
…AcquireNextImageKHR (flutter#183338)

Android may use this error code to indicate situations that are
recoverable (such as a temporary failure to dequeue a buffer). This PR
removes this error from the validation log to reduce noise in the logs.

See flutter#183288
@reidbaker reidbaker added cp: beta cherry pick this pull request to beta release candidate branch cp: stable cherry pick this pull request to stable release candidate branch labels Mar 19, 2026
flutteractionsbot pushed a commit to flutteractionsbot/flutter that referenced this pull request Mar 19, 2026
…ever presented (flutter#183288)

This could happen if a previous call to vkAcquireNextImageKHR failed. If
that occurs, then AcquireNextDrawable will not return a surface. The
caller will not render a frame or call Present, and the fence will never
be signaled.

See internal issue b/488786379
flutteractionsbot pushed a commit to flutteractionsbot/flutter that referenced this pull request Mar 19, 2026
…ever presented (flutter#183288)

This could happen if a previous call to vkAcquireNextImageKHR failed. If
that occurs, then AcquireNextDrawable will not return a surface. The
caller will not render a frame or call Present, and the fence will never
be signaled.

See internal issue b/488786379
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cp: beta cherry pick this pull request to beta release candidate branch cp: stable cherry pick this pull request to stable release candidate branch e: impeller Impeller rendering backend issues and features requests engine flutter/engine related. See also e: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants