Managed fixes for async task tracking#123727
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds managed fixes for async task tracking by introducing tick count tracking for runtime async tasks and continuations to support debugging scenarios.
Changes:
- Adds two new ConcurrentDictionary fields to track task and continuation tick counts for debugging
- Implements methods to set, get, update, and remove tick count tracking for both tasks and continuations
- Integrates tick count tracking into the continuation allocation and dispatch flow
- Adds TPL event source tracing for synchronous work begin/end and operation begin/end events
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs | Adds static dictionaries and helper methods for tracking task and continuation tick counts, removes unnecessary blank line |
| src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs | Integrates tick count tracking into continuation lifecycle, adds ContinuationEqualityComparer, extends AsyncDispatcherInfo with Task field, adds TPL event source tracing |
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
Added comments to clarify the purpose of the Task field.
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Outdated
Show resolved
Hide resolved
...raries/System.Runtime/tests/System.Threading.Tasks.Tests/System.Threading.Tasks.Tests.csproj
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
|
It failed on Mono legs only since we pass different options to the compiler in Mono builds. It is still a bug. Roslyn should not crash. Opened dotnet/roslyn#82397 (repros on vanilla .NET 11 P1 SDK). |
src/coreclr/System.Private.CoreLib/src/System/Runtime/CompilerServices/AsyncHelpers.CoreCLR.cs
Show resolved
Hide resolved
This PR includes a few additions to the runtime-async handling loop to facilitate inspection by a debugger - specifically, Visual Studio. 1. Timestamp tracking for continuations. In VS there is a feature that lets you see, while F5 debugging, the relative start times and durations of different tasks. There is only one actual Task at the root of an await chain, but we intend to replicate the outward behavior in runtime-async by treating Continuations as "virtual Tasks". As such, we populate a dictionary to track the start times for each logical invocation. 2. TPL events. In Concord in asyncv1 we are notified of Task status changes via TPL events; specifically, we use these events to detect which Tasks are currently active and which thread each Task runs on. There are ways to do this without TPL in Concord; however, I believe a change to this would be across different (non-runtime async) Tasks and beyond the scope of this PR. --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Jan Kotas <[email protected]> Co-authored-by: Steve Pfister <[email protected]>
|
I did some measurements and this regressed suspension/resumption performance by about 7%: using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
namespace OSRPerf;
public class Program
{
static void Main()
{
NullAwaiter na = new NullAwaiter();
Task t = Foo(10_000_000, na);
while (!t.IsCompleted)
{
na.Continue();
}
}
static int s_value;
static async Task Foo(int n, NullAwaiter na)
{
for (int i = 0; i < n; i++)
{
s_value += i;
}
Stopwatch timer = Stopwatch.StartNew();
for (int i = 0; i < 10_000_000; i++)
{
await na;
}
Console.WriteLine("Took {0:F1} ms", timer.Elapsed.TotalMilliseconds);
}
private class NullAwaiter : ICriticalNotifyCompletion
{
public Action Continue;
public NullAwaiter GetAwaiter() => this;
public bool IsCompleted => false;
public void GetResult()
{
}
public void UnsafeOnCompleted(Action continuation)
{
Continue = continuation;
}
public void OnCompleted(Action continuation)
{
throw new NotImplementedException();
}
}
}Before: 792.9 ms Not trying to say anything by this, just keeping track of some of the papercuts. |
|
I will add some additional instrumentation into the dispatcher loop as well, mainly to track activity for an optimized "TPL-Lite" event source, but will make sure to combine all checks into simple conditions that will be highly predictable inside the hot path, but since we will need "instrumentation" in this very hot loop an alternative would be to duplicate the DispatchContinuations with added profiling/eventing checks, then the only change to the current DispatchContinuations would be an highly predictable branch at the start of the method and potentially a check when a continuation complete if the flag changed, and if that happens hit a cold path that would handle next continuation as suspended, breaking out of DispatchContinuations. That would at least make sure we won't get perf regressions inside the DispatchContinuations method at the cost of duplicating that code, ~100 lines. I assume this method won't change that much going forward, so maybe it's acceptable to have the duplication to protect the performance of the very hot dispatcher loop going forward. Having two versions will also make sure they get optimized differently by the JIT, since the profiled version will have a very different execution flow compared to the regular one. @jakobbotsch, thoughts? |
|
Found a way to put this into a close to zero cost abstraction, so instead of duplicating the code there will be instrumentation points in the dispatcher loop, using generics + struct to produce two native version of the dispatch loop method, where the one without profiling support will optimize away all the instrumentation points producing a native version without instrumentation, while the instrumented version will get optimized codegen based on profiling scenarios. I can take a look if some of the changes done into the dispatcher loop in this PR could be refactored into the same pattern, reducing the impact of the change. I will also run the sample provided in this PR above to measure overhead. |
I like the idea, please do feel free to open a PR to try that out. I think it would make it close to zero cost in performance, at the cost of some size, since there is a dispatcher for every |
|
I implemented the generic valuetype version of the DispatchContinuations loop that includes capabilities to "upgrade" to instrumented version on entry and after continuation completion. I optimized as much as possible on what gets left in the none instrumented version of DispatchContinuations and looks like I recovered most of the perf lost in this PR. I compared the generated assembler using
I have most prepared, need to look at some tests when enabling the instrumentation + code size when enabled, but it will probably land close to the old metrics above for that version of the dispatch loop. I will build on this to add more instrumentation logic for async profiling, but now all will happen in the instrumented version of the DispatchContinuations loop, not affecting the performance of none instrumented scenarios. |
This PR includes a few additions to the runtime-async handling loop to facilitate inspection by a debugger - specifically, Visual Studio.
Timestamp tracking for continuations. In VS there is a feature that lets you see, while F5 debugging, the relative start times and durations of different tasks. There is only one actual Task at the root of an await chain, but we intend to replicate the outward behavior in runtime-async by treating Continuations as "virtual Tasks". As such, we populate a dictionary to track the start times for each logical invocation.
TPL events. In Concord in asyncv1 we are notified of Task status changes via TPL events; specifically, we use these events to detect which Tasks are currently active and which thread each Task runs on. There are ways to do this without TPL in Concord; however, I believe a change to this would be across different (non-runtime async) Tasks and beyond the scope of this PR.