Skip to content

Fix stderr event loss due to missing WaitForExit in DisposeProcess#1280

Merged
stephentoub merged 6 commits intomainfrom
copilot/fix-failing-job-issue
Feb 15, 2026
Merged

Fix stderr event loss due to missing WaitForExit in DisposeProcess#1280
stephentoub merged 6 commits intomainfrom
copilot/fix-failing-job-issue

Conversation

Copy link
Contributor

Copilot AI commented Feb 15, 2026

  • Investigate failing CI job - StdioClientTransportTests.CreateAsync_ValidProcessInvalidServer_StdErrCallbackInvoked fails on net8.0 with count == 0
  • Root cause analysis: Race condition in StdioClientTransport.DisposeProcess where process.Dispose() could be called without calling WaitForExit() (no args), causing ErrorDataReceived events to be lost
  • Fix: Call WaitForExit() (no args) before Dispose() only when the process exited on its own (not after KillTree), to ensure stderr events are flushed without risking a hang from child processes keeping output handles open
  • Verified all 1448 tests pass on net8.0, net9.0, and net10.0

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

…events

When GetUnexpectedExitExceptionAsync sees HasExited as false and returns
null, DisposeProcess is called next. By that time the process may have
exited, causing it to skip KillTree and call Dispose() without first
calling WaitForExit() to flush ErrorDataReceived callbacks. Add a
WaitForExit() call for already-exited processes to ensure all redirected
output events are dispatched before disposal.

Co-authored-by: stephentoub <[email protected]>
Copilot AI changed the title [WIP] Investigate and fix failing job in CI Fix stderr event loss due to missing WaitForExit in DisposeProcess Feb 15, 2026
Copilot AI requested a review from stephentoub February 15, 2026 17:47
@stephentoub
Copy link
Contributor

After line 222 recomputes processRunning as `processRunning && !HasExited(process)`,
the else branch already implies HasExited was true (when processRunning was
originally true). The redundant check is unnecessary.

Co-authored-by: stephentoub <[email protected]>
The previous fix only called WaitForExit() in the else branch (process
already exited). But when the process is still running, KillTree is
called which uses either Kill+WaitForExit(int) or returns early if the
process already exited - neither of which guarantees ErrorDataReceived
event dispatch. Move WaitForExit() (no args) to be unconditional after
the if/else block so it always runs before Dispose().

Co-authored-by: stephentoub <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 15, 2026

@copilot this test is still failing in CI.
https://github.com/modelcontextprotocol/csharp-sdk/actions/runs/22040210703/job/63680737298?pr=1280

The previous fix only covered the else branch (process already exited). But when the process hasn't exited yet, KillTree is called which either uses Kill + WaitForExit(int) (doesn't guarantee event dispatch) or catches an exception and returns early if the process exited between the HasExited check and the Kill call. Neither path calls WaitForExit() (no args).

Moved the WaitForExit() call to be unconditional — it now always runs after KillTree (or in place of the old else branch) and before Dispose(). Fixed in 1a58fd5.

…es out

If KillTree's kill succeeds but WaitForExit(int) times out, the process
is still alive. Calling WaitForExit() (no args) unconditionally would
then block indefinitely. Guard with HasExited to only flush stderr
events when we know the process has actually exited.

Co-authored-by: stephentoub <[email protected]>
Copilot AI requested a review from stephentoub February 15, 2026 19:04
@stephentoub stephentoub marked this pull request as ready for review February 15, 2026 19:21
@stephentoub
Copy link
Contributor

@copilot there are a bunch of tests failing in CI now.

…lTree

The previous fix called WaitForExit() unconditionally or guarded only by
HasExited, which could hang on Windows when KillTree left child processes
holding output handles. Now WaitForExit() is only called when the process
exited before DisposeProcess was called (!processRunning), meaning KillTree
was not invoked. In that case, no child processes hold handles and
WaitForExit() completes quickly.

Co-authored-by: stephentoub <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 15, 2026

@copilot there are a bunch of tests failing in CI now.

The Windows test host crashes were caused by WaitForExit() (no args) hanging when called after KillTree — child processes could still be holding output handles, blocking the wait indefinitely.

Reverted DisposeProcess to the original and added a guarded WaitForExit() that only runs when the process exited on its own (!processRunning), meaning KillTree was never called. In that case, there are no child processes holding handles, so WaitForExit() completes immediately. Fixed in 891c34c.

@stephentoub
Copy link
Contributor

@copilot CI is still failing

Copy link
Contributor Author

Copilot AI commented Feb 15, 2026

@copilot CI is still failing

The CI failures are pre-existing and unrelated to this change. On the Windows Release job, all 1447 ModelContextProtocol.Tests passed on every TFM (net8.0, net9.0, net10.0, net472), including the originally flaky StdErrCallbackInvoked test. The make test exit code 1 comes from ConformanceClient.exe crash dumps — the same issue also appears on the main branch (e.g., run 22037347750 on main failed with DiagnosticTests.Session_TracksActivities and conformance client crashes, also unrelated to this PR). The macOS Debug job similarly passed all tests but had conformance test issues.

The only file changed in this PR is StdioClientTransport.cs — the conformance client infrastructure is untouched.

Copy link
Contributor

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging to try to unblock very unstable ci

@stephentoub stephentoub merged commit 1dd06e7 into main Feb 15, 2026
16 of 18 checks passed
@stephentoub stephentoub deleted the copilot/fix-failing-job-issue branch February 15, 2026 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants