Work around the glibc bug that causes rare Chrome crashes#67466
Work around the glibc bug that causes rare Chrome crashes#67466yjbanov merged 13 commits intoflutter:masterfrom
Conversation
| "task_name":"linux_web_tests", | ||
| "enabled":true, | ||
| "run_if":["dev/", "packages/flutter/", "packages/flutter_goldens_client/", "packages/flutter_goldens/", "packages/flutter_test/", "packages/flutter_tools/lib/src/test/", "packages/flutter_web_plugins/", "bin/"] | ||
| "run_if":["dev/", "packages/", "bin/"] |
There was a problem hiding this comment.
Depends on the definition of "right" :)
There was a problem hiding this comment.
ahh, I read this as "dev/packages/bin", lol, disregard
|
|
||
| Future<Process> _spawnChromiumProcess(List<String> args) async { | ||
| // Keep attempting to launch the browser until one of: | ||
| // - We successfully launched it, in which case we just return from the loop. |
| .transform(const LineSplitter()) | ||
| .map((String line) { | ||
| _logger.printTrace('[CHROME]:$line'); | ||
| if (line.contains('Inconsistency detected by ld.so')) { |
There was a problem hiding this comment.
Consider hoisting this into a constant, along with the documentation you added above.
| 'Will try launching browser again.', | ||
| ); | ||
| return null; | ||
| } else { |
There was a problem hiding this comment.
else isn't necessary due to return above
| // A precaution that avoids accumulating browser processes, in case the | ||
| // glibc bug doesn't cause the browser to quit and we keep looping and | ||
| // launching more processes. | ||
| process.kill(); |
There was a problem hiding this comment.
Check if this is successful. What happens if the tool fails to kill it?
There was a problem hiding this comment.
How does the tool normally do this check? I can't find an example. I'm not sure what to do if it fails to kill it. I added it only as a precaution. I think it's OK if we fail. That would probably indicate that the system is in a completely busted and unrecoverable state.
There was a problem hiding this comment.
I don't have any advice :)
There was a problem hiding this comment.
Instead of using the exited flag, it's a little more idiomatic to put a timeout out process.exitCode, and then do process.kill() if it doesn't finish in time. Following sending the kill, if stdout might be useful for debugging, you'll want to explicitly drain that stream.
|
|
||
| if (!hitGlibcBug) { | ||
| return process; | ||
| } else if (!exited) { |
There was a problem hiding this comment.
if (!exited) { on the next line instead of else if
| ); | ||
| return null; | ||
| } else { | ||
| throw ToolExit('Failed to spawn: ${args.join(' ')}'); |
There was a problem hiding this comment.
Generally I would stick with a more actionable message here, the args themselves is more like debugging info.
| final Process process = await _processManager.start(args); | ||
|
|
||
| bool exited = false; | ||
| unawaited(process.exitCode.then((int exitCode) { |
There was a problem hiding this comment.
since you don't use the exitCode, use whenComplete(() { ... })
| return null; | ||
| } | ||
| _logger.printTrace('Failed to launch browser. Command used to launch it: ${args.join(' ')}'); | ||
| throw ToolExit( |
There was a problem hiding this comment.
Can't use throwToolExit because the function must end with a return or throw.
There was a problem hiding this comment.
Would it help if we made throwToolExit return Never?
There was a problem hiding this comment.
We cannot do that until we migrate to null safety
| }) | ||
| .firstWhere((String line) => line.startsWith('DevTools listening'), orElse: () { | ||
| if (hitGlibcBug) { | ||
| _logger.printStatus( |
There was a problem hiding this comment.
The user probably can't do too much with this message, so maybe it should just be a printTrace.
| // A precaution that avoids accumulating browser processes, in case the | ||
| // glibc bug doesn't cause the browser to quit and we keep looping and | ||
| // launching more processes. | ||
| process.kill(); |
There was a problem hiding this comment.
Instead of using the exited flag, it's a little more idiomatic to put a timeout out process.exitCode, and then do process.kill() if it doesn't finish in time. Following sending the kill, if stdout might be useful for debugging, you'll want to explicitly drain that stream.
| // Wait until the DevTools are listening before trying to connect. This is | ||
| // only required for flutter_test --platform=chrome and not flutter run. | ||
| bool hitGlibcBug = false; | ||
| await process.stderr |
There was a problem hiding this comment.
This can throw dart:io exceptions if the process fails in certain ways/at certain times. Ditto for process.stdout which is another reason to do .listen().asFuture() and await the Future in a try {} catch {} for it.
There was a problem hiding this comment.
This sounds like a good future enhancement. I'd prefer to wait until it actually happens, so we can recreate the exact situation in a test. I ran this code 18000 times and it worked fine.
Done. |
|
Landing on stuck "Google testing". Google is using ddr instead of flutter tools for this anyway. |
| // glibc bug doesn't cause the browser to quit and we keep looping and | ||
| // launching more processes. | ||
| unawaited(process.exitCode.timeout(const Duration(seconds: 1), onTimeout: () { | ||
| process.kill(); |
There was a problem hiding this comment.
drive by comment: if this code ever has to run on windows, it might fail to kill the process and fail to block execution, leading to accumulation of processes.
There was a problem hiding this comment.
What's a good way to do this on Windows?
There was a problem hiding this comment.
There isn't one unfortunately. As far as I can tell, Dart does not offer a way to reliably kill processes on windows - if you want to be really paranoid about it you can try to sleep/poll for when the process actually dies, but if the process is really stuck (like, you have to reboot the machine stuck), that might never end.
Windows doesn't quite have a "kill -9" idea.

Description
Due to https://sourceware.org/bugzilla/show_bug.cgi?id=19329 once every few thousands times Chrome fails to launch, crashing with exit code 127.
This PR works around that issue by checking for an error message specific to that bug and relaunching Chrome. With this change a test run that launched Chrome 18000 times was 100% green, when previously that many attempts would be 100% red.
Other changes:
try_builders.jsonto always run tests whenever anything underpackages/changes.Related Issues
Fixes #67454