Detect and cleanup leaky processes by dnfield · Pull Request #29196 · flutter/flutter

dnfield · 2019-03-12T01:41:10Z

Description

Today, if a devicelab test leaks a dart process, it isn't marked as a failure. If it's on Windows, this will cause the subsequent tasks on the bot to fail because they can't clean up files that the dart.exe process is holding.

This PR will check that we don't leak dart processes - if we do, we'll mark the task as failed and kill any dart processes we didn't have to begin with. This will allow subsequent runs to run successfully if they don't leak.
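
The check described above amounts to a set difference over two process snapshots, one taken before the task and one after. A minimal sketch in Dart, assuming a hypothetical `ProcessSnapshot` value type and `findLeaked` helper (the PR's own `RunningProcessInfo` may carry different fields), and leaving out the platform-specific process listing:

```dart
// Hypothetical value type for illustration; not the PR's RunningProcessInfo.
class ProcessSnapshot {
  const ProcessSnapshot(this.pid, this.commandLine);

  final int pid;
  final String commandLine;

  // Value equality so Set.contains and Set.difference compare by content.
  @override
  bool operator ==(Object other) =>
      other is ProcessSnapshot &&
      other.pid == pid &&
      other.commandLine == commandLine;

  @override
  int get hashCode => Object.hash(pid, commandLine);

  @override
  String toString() => 'ProcessSnapshot(pid: $pid, commandLine: $commandLine)';
}

/// Returns the processes present in [after] that were not in [before].
Set<ProcessSnapshot> findLeaked(
  Set<ProcessSnapshot> before,
  Set<ProcessSnapshot> after,
) =>
    after.difference(before);

void main() {
  final before = {const ProcessSnapshot(100, 'dart analysis_server')};
  final after = {
    const ProcessSnapshot(100, 'dart analysis_server'),
    const ProcessSnapshot(245, 'dart flutter_tools.snapshot run'),
  };
  for (final leaked in findLeaked(before, after)) {
    // A real harness would kill the process here (for example with
    // Process.killPid from dart:io) and record the leak in the task result.
    print('Leaked: $leaked');
  }
}
```

Comparing by both pid and command line is an assumption of this sketch; it guards against a pid being reused by an unrelated process between the two snapshots.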

Related Issues

#29141

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]). This will ensure a smooth and quick review process.

I read the Contributor Guide and followed the process outlined there for submitting PRs.
My PR includes tests for all changed/updated/fixed behaviors (See Test Coverage).
All existing and new tests are passing.
I updated/added relevant documentation (doc comments with ///).
The analyzer (flutter analyze --flutter-repo) does not report any problems on my PR.
I read and followed the Flutter Style Guide, including Features we expect every widget to implement.
I signed the CLA.
I am willing to follow-up on review comments in a timely manner.

Breaking Change

Does your PR require Flutter developers to manually update their apps to accommodate your change?

Yes, this is a breaking change (Please read Handling breaking changes). - but only in a good way and it doesn't impact the framework 😉
No, this is not a breaking change.

jonahwilliams

It's possible we have several tests that leak processes, either consistently or only occasionally. To make the process of rolling this out somewhat smoother (especially in the event that these are hard to find) we might want to consider some sort of soft-break initially. I'm not sure if that is possible to do though without updating the agent logic.

jonahwilliams · 2019-03-12T02:06:03Z

dev/devicelab/lib/framework/framework.dart

+      final Set<RunningProcessInfo> beforeRunningDartInstances = await getRunningProcesses(
+        processName: 'dart$exe',
+      ).toSet();
+      beforeRunningDartInstances.forEach(print);


Do these prints end up in the devicelab logs? If so we might want to only print on a failure, and in that case make sure we print both sets together so it is easier to find.

They do. My thought is that this is useful information if anything goes wrong and we need to debug something that's hard to reproduce locally.

dev/devicelab/lib/framework/framework.dart

jonahwilliams · 2019-03-12T02:06:46Z

dev/devicelab/lib/framework/framework.dart

+      ).toList();
+      afterRunningDartInstances.forEach(print);
+      for(final RunningProcessInfo info in afterRunningDartInstances) {
+        if (!beforeRunningDartInstances.contains(info)) {


Above prints seem redundant with this

This one is printing out exactly why we thought we had a matching process. Maybe I'm being paranoid, but this should only add ~5-10 lines extra to the logs.

dev/devicelab/lib/framework/running_processes.dart

dnfield · 2019-03-12T03:30:57Z

I'd be game for a softer failure but we don't really have facilities for that.

This issue is already causing failures in the devicelab on Windows - the main difference here is that this would help us not need manual intervention to continue testing.

Maybe we should only make it fail on Windows for now?

dnfield · 2019-03-12T21:06:42Z

Trying to determine if the devicelab can use an existing orange status for this without needing to reimage the whole thing

jonahwilliams · 2019-03-22T18:32:14Z

dev/devicelab/lib/framework/framework.dart

 }
+
+class TaskResultCheckProcesses extends TaskResult {
+  TaskResultCheckProcesses() : super.success(null);


What does passing null here do?

Doesn't set any data in the task result - leaves it empty. Other callers already use TaskResult.success(null) if they aren't doing a benchmark and have no other data to report.
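
To illustrate the answer above, here is a simplified stand-in for the result type, written in current null-safe Dart. The `TaskResult` shown is an assumption made for this sketch and omits most of what the real devicelab class contains; only the `TaskResultCheckProcesses` subclass mirrors the diff.

```dart
// Simplified, hypothetical stand-in for devicelab's TaskResult.
class TaskResult {
  /// A successful result; [data] may be null when there is nothing to report.
  TaskResult.success(this.data)
      : succeeded = true,
        message = null;

  /// A failed result carrying a human-readable [message].
  TaskResult.failure(this.message)
      : succeeded = false,
        data = null;

  final bool succeeded;
  final Map<String, dynamic>? data;
  final String? message;
}

// Mirrors the subclass in the diff: a success result with no data attached.
class TaskResultCheckProcesses extends TaskResult {
  TaskResultCheckProcesses() : super.success(null);
}

void main() {
  final result = TaskResultCheckProcesses();
  print('succeeded: ${result.succeeded}, data: ${result.data}');
}
```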

dev/devicelab/manifest.yaml

jonahwilliams · 2019-03-22T18:38:14Z

dev/devicelab/lib/tasks/run_without_leak.dart

+        stderrDone.complete();
+      });
+
+      await Future.wait<void>(


Is there any sort of timeout process in the devicelab? I worry that if we change the help message ("For a more detailed help message, press "h". To detach, press "d"; to quit, press "q".") we're going to get some odd failures.

15 minutes.

I'm not sure of a better line to look for - I'm open to suggestions, but we could also just catch this failure and update it worst case.

As long as it times out eventually this is good for now
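
One way to avoid relying solely on the 15 minute devicelab timeout would be a local timeout on the wait for the marker line. A minimal sketch, assuming a hypothetical `waitForLine` helper and an arbitrary one-minute default; this is not what the PR implements:

```dart
import 'dart:async';

/// Completes with the first line containing [marker], or throws a
/// TimeoutException if no such line arrives within [timeout].
/// If the stream closes without a match, the future fails with a StateError.
Future<String> waitForLine(
  Stream<String> lines,
  String marker, {
  Duration timeout = const Duration(minutes: 1),
}) {
  return lines.firstWhere((line) => line.contains(marker)).timeout(timeout);
}

Future<void> main() async {
  // Stand-in for the decoded stdout lines of a `flutter run` process.
  final lines = Stream<String>.fromIterable(<String>[
    'Launching app...',
    'To detach, press "d"; to quit, press "q".',
  ]);
  final ready = await waitForLine(lines, 'to quit, press "q"');
  print('Saw ready line: $ready');
}
```

With this shape, a reworded help message surfaces as a `TimeoutException` naming the wait that failed instead of a task that hangs until the lab-wide limit.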

jonahwilliams

LGTM modulo question about help message fragility

Detect and cleanup leaky processes

527f47c

dnfield added the "a: tests" and "team: flakes" labels Mar 12, 2019

dnfield requested a review from jonahwilliams March 12, 2019 01:41

googlebot added the cla: yes label Mar 12, 2019

jonahwilliams reviewed Mar 12, 2019

review

1858e7b

dnfield added 4 commits March 12, 2019 08:57

fix windows?

88bafa3

fix powershell

2731ffe

linux, windows

1eb2fcf

testing

0ff8def

dnfield force-pushed the device_lab_leaky_processes branch 2 times, most recently from 6534a71 to cebc03f Compare March 12, 2019 20:41

check ps

6a4b980

dnfield force-pushed the device_lab_leaky_processes branch from cebc03f to 6a4b980 Compare March 12, 2019 20:46

dnfield added 2 commits March 12, 2019 14:02

Only run ps if available, do not mark test as failed for now

106479e

Only run ps if available, do not mark test as failed for now

97def28

dnfield added 4 commits March 12, 2019 14:13

analysis issue

34e1c4d

fix tests!

7027c3e

LeakedDartProcesses in data

f608a73

add tests

7ba3b90

dnfield requested a review from jonahwilliams March 19, 2019 21:19

jonahwilliams reviewed Mar 22, 2019

dev/devicelab/manifest.yaml (outdated)

jonahwilliams reviewed Mar 22, 2019

jonahwilliams approved these changes Mar 22, 2019

newline

4b36493

dnfield merged commit ecfdd7e into flutter:master Mar 22, 2019

dnfield deleted the device_lab_leaky_processes branch March 22, 2019 21:32

dnfield mentioned this pull request Mar 26, 2019

Avoid overwriting task result for non-leak checkers #29989

Merged

10 tasks

github-actions bot locked as resolved and limited conversation to collaborators Aug 7, 2021

3 participants