Fix race in BackgroundService exception aggregation during Host shutdown #125590

Merged
svick merged 4 commits into dotnet:main from danmoseley:fix-bgservice-exception-race on Mar 17, 2026

danmoseley commented Mar 15, 2026

Fixes #125589

Problem

When multiple BackgroundService instances fault with BackgroundServiceExceptionBehavior.StopHost, some exceptions can be silently lost.

In real workloads, multiple BackgroundServices commonly fail together — for example, when a shared dependency like a database or message broker goes down. With this bug, only one of those failures is reported; the rest are silently dropped. This makes production incidents harder to diagnose: operators see one service failed but have no indication that others also failed, leading to incomplete root-cause analysis and potentially missing the actual source of the problem.

The BackgroundServiceExceptionTests.BackgroundService_MultipleExceptions_ThrowsAggregateException test is flaky because of this (observed on osx-arm64 Debug).

Root Cause

In StartAsync, TryExecuteBackgroundServiceAsync is fire-and-forget (_ =). This method awaits the service's ExecuteTask and adds any exception to _backgroundServiceExceptions. During StopAsync, BackgroundService.StopAsync also awaits the same ExecuteTask. When the task faults, both continuations are scheduled on the thread pool. If the StopAsync continuation runs first, Host.StopAsync proceeds to read _backgroundServiceExceptions before the monitoring task has added its exception.
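
The ordering problem can be modelled outside the host. The following is a minimal sketch, not the code in Host.cs: `RaceSketch`, `MonitorAsync`, and the `gate` task are hypothetical names, and the gate only stands in for a monitoring continuation that the thread pool has not run yet. It assumes .NET 5 or later for the non-generic `TaskCompletionSource`.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class RaceSketch
{
    // Hypothetical stand-in for the monitoring method: awaits the service's
    // task and records its exception. The gate delays the recording step to
    // model a continuation that has been scheduled but not yet run.
    private static async Task MonitorAsync(Task executeTask, Task gate, List<Exception> recorded)
    {
        try
        {
            await executeTask;
        }
        catch (Exception ex)
        {
            await gate;
            lock (recorded) { recorded.Add(ex); }
        }
    }

    public static async Task<(int before, int after)> RunAsync()
    {
        var recorded = new List<Exception>(); // stand-in for the exception list
        var execute = new TaskCompletionSource();
        var gate = new TaskCompletionSource();

        // In the buggy shape this task is discarded; it is kept here only so
        // the sketch can show the list again once the monitor has finished.
        Task monitor = MonitorAsync(execute.Task, gate.Task, recorded);

        execute.SetException(new InvalidOperationException("service failed"));

        // Stop path: awaits the same task, then reads the list immediately.
        try { await execute.Task; } catch (InvalidOperationException) { }
        int before;
        lock (recorded) { before = recorded.Count; }

        // Only now does the monitor get to record the exception.
        gate.SetResult();
        await monitor;
        int after;
        lock (recorded) { after = recorded.Count; }

        return (before, after);
    }

    public static async Task Main()
    {
        var (before, after) = await RunAsync();
        Console.WriteLine($"read during stop: {before}, after monitor finished: {after}");
        // prints "read during stop: 0, after monitor finished: 1"
    }
}
```

The stop path observes the fault but sees an empty list, which is the lost-exception window described above.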

Fix

Store the TryExecuteBackgroundServiceAsync tasks and await them in StopAsync (respecting the shutdown timeout) before reading the exception list.
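
The shape of this change can be sketched in isolation. This is an illustration under stated assumptions, not the Host.cs implementation: `FixSketch`, `MonitorAsync`, and `StopAndCountAsync` are hypothetical names, the 100 ms lag is an artificial stand-in for a late continuation, and the timeout is expressed with `Task.WaitAsync(TimeSpan)` (.NET 6 or later), which may differ from how the host actually honours its shutdown timeout.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class FixSketch
{
    // Hypothetical monitor: records the fault after an artificial lag.
    private static async Task MonitorAsync(Task executeTask, List<Exception> recorded, TimeSpan lag)
    {
        try
        {
            await executeTask;
        }
        catch (Exception ex)
        {
            await Task.Delay(lag);
            lock (recorded) { recorded.Add(ex); }
        }
    }

    // Returns how many exceptions the stop path sees when it reads the list.
    public static async Task<int> StopAndCountAsync(bool awaitMonitors, TimeSpan shutdownTimeout)
    {
        var recorded = new List<Exception>();
        var monitors = new List<Task>(); // stored instead of discarded
        var services = new List<Task>
        {
            Task.FromException(new InvalidOperationException("service A failed")),
            Task.FromException(new InvalidOperationException("service B failed")),
        };

        foreach (Task service in services)
        {
            monitors.Add(MonitorAsync(service, recorded, TimeSpan.FromMilliseconds(100)));
        }

        if (awaitMonitors)
        {
            try
            {
                // Wait for every monitor to finish recording, bounded by the timeout.
                await Task.WhenAll(monitors).WaitAsync(shutdownTimeout);
            }
            catch (TimeoutException)
            {
                // Timeout elapsed: fall through and report whatever was recorded.
            }
        }

        lock (recorded) { return recorded.Count; }
    }

    public static async Task Main()
    {
        int withWait = await StopAndCountAsync(awaitMonitors: true, TimeSpan.FromSeconds(5));
        int withoutWait = await StopAndCountAsync(awaitMonitors: false, TimeSpan.FromSeconds(5));
        Console.WriteLine($"with wait: {withWait}, without wait: {withoutWait}");
    }
}
```

With the wait in place both exceptions are recorded before the list is read. Without it, the read normally happens before either monitor has recorded anything, so the count is expected to be lower.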

Verification

The original failure was only observed on macOS arm64 and could not be reproduced directly on Windows. However, injecting a 500ms Task.Delay into TryExecuteBackgroundServiceAsync deterministically simulates the thread-pool scheduling that causes the race, providing high-confidence verification on any platform:

  • Without fix + delay: test fails 10/10
  • With fix + delay: test fails 0/10
  • With fix, no delay: all 291 hosting unit tests pass

TryExecuteBackgroundServiceAsync tasks were fire-and-forget, creating a race
where Host.StopAsync could read _backgroundServiceExceptions before the
monitoring tasks had added their exceptions. When multiple BackgroundServices
fault, this caused some exceptions to be silently lost.

The fix stores the monitoring tasks and awaits them (with shutdown timeout)
in StopAsync before reading the exception list.

Fix dotnet#125589

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings March 15, 2026 22:15
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-extensions-hosting
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment

Pull request overview

This PR updates the internal Host shutdown path to avoid missing BackgroundService failures due to a race between BackgroundService.StopAsync and the host’s background-service monitoring continuation.

Changes:

  • Track background-service monitoring tasks instead of fire-and-forget.
  • During StopAsync, wait for monitoring tasks to finish recording exceptions before reading and rethrowing them.

Use LazyInitializer.EnsureInitialized + lock, matching the existing
pattern used for _backgroundServiceExceptions.

Co-authored-by: Copilot <[email protected]>
Contributor

Copilot AI left a comment

Pull request overview

This PR updates Microsoft.Extensions.Hosting’s internal Host implementation to better surface BackgroundService failures during shutdown by tracking the background-service monitoring tasks and (best-effort) waiting for them to finish before aggregating background exceptions in StopAsync.

Changes:

  • Track TryExecuteBackgroundServiceAsync(...) monitor tasks for each BackgroundService started by the host.
  • During StopAsync, wait for these monitor tasks to complete (or for shutdown cancellation) before reading _backgroundServiceExceptions, reducing a race where exceptions could be missed.

@svick svick requested a review from mrek-msft March 16, 2026 12:14
@svick svick requested review from cincuranet and rosebyte and removed request for mrek-msft March 16, 2026 14:09
@danmoseley danmoseley requested a review from Copilot March 16, 2026 23:48
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses a shutdown-time race in Microsoft.Extensions.Hosting where background-service monitoring exceptions could be missed because the fire-and-forget monitoring task hadn’t yet recorded its exception when Host.StopAsync aggregated exceptions.

Changes:

  • Track background-service monitoring tasks created by TryExecuteBackgroundServiceAsync.
  • During Host.StopAsync, wait for background-service monitoring tasks to finish recording exceptions before aggregating and throwing.
  • Add a regression test that deterministically reproduces the lost-exception window using an overridden ExecuteTask.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File | Description
src/libraries/Microsoft.Extensions.Hosting/src/Internal/Host.cs | Stores background-service monitoring tasks and waits (with cancellation support) for them during shutdown before reading exception state.
src/libraries/Microsoft.Extensions.Hosting/tests/UnitTests/BackgroundServiceExceptionTests.cs | Adds a regression test and a specialized BackgroundService to reproduce the exception-recording race.

@svick
Member

svick commented Mar 17, 2026

/ba-g iOS failures seem to be dotnet/dnceng#6473

@svick svick merged commit c9def27 into dotnet:main Mar 17, 2026
85 of 90 checks passed

Development

Successfully merging this pull request may close these issues.

BackgroundService_MultipleExceptions_ThrowsAggregateException is racy

4 participants