Skip to content

Work around the glibc bug that causes rare Chrome crashes#67466

Merged
yjbanov merged 13 commits intoflutter:masterfrom
yjbanov:premature-chrome-exit
Oct 8, 2020
Merged

Work around the glibc bug that causes rare Chrome crashes#67466
yjbanov merged 13 commits intoflutter:masterfrom
yjbanov:premature-chrome-exit

Conversation

@yjbanov
Copy link
Contributor

@yjbanov yjbanov commented Oct 6, 2020

Description

Due to https://sourceware.org/bugzilla/show_bug.cgi?id=19329 once every few thousands times Chrome fails to launch, crashing with exit code 127.

This PR works around that issue by checking for an error message specific to that bug and relaunching Chrome. With this change a test run that launched Chrome 18000 times was 100% green, when previously that many attempts would be 100% red.

Other changes:

  • When in verbose mode print the version of Chromium.
  • Change try_builders.json to always run tests whenever anything under packages/ changes.

Related Issues

Fixes #67454

@flutter-dashboard flutter-dashboard bot added tool Affects the "flutter" command-line tool. See also t: labels. work in progress; do not review labels Oct 6, 2020
@google-cla google-cla bot added the cla: yes label Oct 6, 2020
@yjbanov yjbanov changed the title WIP: watch premature chrome exits Work around the glibc bug that causes rare Chrome crashes Oct 7, 2020
@flutter-dashboard flutter-dashboard bot added the c: contributor-productivity Team-specific productivity, code health, technical debt. label Oct 7, 2020
@yjbanov yjbanov requested a review from jonahwilliams October 7, 2020 19:31
@yjbanov yjbanov marked this pull request as ready for review October 7, 2020 19:31
@yjbanov yjbanov requested a review from godofredoc October 7, 2020 19:33
"task_name":"linux_web_tests",
"enabled":true,
"run_if":["dev/", "packages/flutter/", "packages/flutter_goldens_client/", "packages/flutter_goldens/", "packages/flutter_test/", "packages/flutter_tools/lib/src/test/", "packages/flutter_web_plugins/", "bin/"]
"run_if":["dev/", "packages/", "bin/"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the definition of "right" :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, I read this as "dev/packages/bin", lol, disregard


Future<Process> _spawnChromiumProcess(List<String> args) async {
// Keep attempting to launch the browser until one of:
// - We successfully launched it, in which case we just return from the loop.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go/no-we

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.transform(const LineSplitter())
.map((String line) {
_logger.printTrace('[CHROME]:$line');
if (line.contains('Inconsistency detected by ld.so')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider hoisting this into a constant, along with the documentation you added above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

'Will try launching browser again.',
);
return null;
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else isn't necessary due to return above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// A precaution that avoids accumulating browser processes, in case the
// glibc bug doesn't cause the browser to quit and we keep looping and
// launching more processes.
process.kill();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this is successful. What happens if the tool fails to kill it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the tool normally do this check? I can't find an example. I'm not sure what to do if it fails to kill it. I added it only as a precaution. I think it's OK if we fail. That would probably indicate that the system is in a completely busted and unrecoverable state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any advice :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the exited flag, it's a little more idiomatic to put a timeout out process.exitCode, and then do process.kill() if it doesn't finish in time. Following sending the kill, if stdout might be useful for debugging, you'll want to explicitly drain that stream.


if (!hitGlibcBug) {
return process;
} else if (!exited) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (!exited) { on the next line instead of else if

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

);
return null;
} else {
throw ToolExit('Failed to spawn: ${args.join(' ')}');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throwToolExit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I would stick with a more actionable message here, the args themselves is more like debugging info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nit

final Process process = await _processManager.start(args);

bool exited = false;
unawaited(process.exitCode.then((int exitCode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since you don't use the exitCode, use whenComplete(() { ... })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return null;
}
_logger.printTrace('Failed to launch browser. Command used to launch it: ${args.join(' ')}');
throw ToolExit(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throwToolExit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't use throwToolExit because the function must end with a return or throw.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it help if we made throwToolExit return Never?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do that until we migrate to null safety

})
.firstWhere((String line) => line.startsWith('DevTools listening'), orElse: () {
if (hitGlibcBug) {
_logger.printStatus(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user probably can't do too much with this message, so maybe it should just be a printTrace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// A precaution that avoids accumulating browser processes, in case the
// glibc bug doesn't cause the browser to quit and we keep looping and
// launching more processes.
process.kill();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using the exited flag, it's a little more idiomatic to put a timeout out process.exitCode, and then do process.kill() if it doesn't finish in time. Following sending the kill, if stdout might be useful for debugging, you'll want to explicitly drain that stream.

// Wait until the DevTools are listening before trying to connect. This is
// only required for flutter_test --platform=chrome and not flutter run.
bool hitGlibcBug = false;
await process.stderr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can throw dart:io exceptions if the process fails in certain ways/at certain times. Ditto for process.stdout which is another reason to do .listen().asFuture() and await the Future in a try {} catch {} for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like a good future enhancement. I'd prefer to wait until it actually happens, so we can recreate the exact situation in a test. I ran this code 18000 times and it worked fine.

@yjbanov
Copy link
Contributor Author

yjbanov commented Oct 7, 2020

@zanderso

Instead of using the exited flag, it's a little more idiomatic to put a timeout out process.exitCode, and then do process.kill() if it doesn't finish in time. Following sending the kill, if stdout might be useful for debugging, you'll want to explicitly drain that stream.

Done.

@yjbanov
Copy link
Contributor Author

yjbanov commented Oct 8, 2020

Landing on stuck "Google testing". Google is using ddr instead of flutter tools for this anyway.

@yjbanov yjbanov merged commit 0b78110 into flutter:master Oct 8, 2020
// glibc bug doesn't cause the browser to quit and we keep looping and
// launching more processes.
unawaited(process.exitCode.timeout(const Duration(seconds: 1), onTimeout: () {
process.kill();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drive by comment: if this code ever has to run on windows, it might fail to kill the process and fail to block execution, leading to accumulation of processes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's a good way to do this on Windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't one unfortunately. As far as I can tell, Dart does not offer a way to reliably kill processes on windows - if you want to be really paranoid about it you can try to sleep/poll for when the process actually dies, but if the process is really stuck (like, you have to reboot the machine stuck), that might never end.

Windows doesn't quite have a "kill -9" idea.

@yjbanov
Copy link
Contributor Author

yjbanov commented Oct 14, 2020

To follow-up. This fix seems to have been effective:

Screen Shot 2020-10-14 at 9 55 35 AM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c: contributor-productivity Team-specific productivity, code health, technical debt. tool Affects the "flutter" command-line tool. See also t: labels.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Web test timing out from after two hours

5 participants