Description
- Version: 6.6.0, x64
- Platform: win10, x64
- Subsystem: v8 (probably)
A little background: I need to parse a lot of text-based data. Everything ran smoothly and the code was very fast until I ran it on a larger amount of data (about 15 GB). Around halfway through the computation, the code suddenly slowed down, and sections that should have taken 30 s took up to 10 minutes.
The issue seems to be that if garbage collection is triggered enough times, it causes some code to run very slowly. I have also tested Node 4.5 and 6.3 on different machines, and the issue was the same.
I experimented with the problem and narrowed it down to the code available here: https://gist.github.com/dsehnal/004f7f46a238ab2e391a62a172b9b19c.
The code includes two slightly different implementations, both stripped-down versions of my parser's tokenizer:
- In the original (slow) version, the tokenizer is a "class" (prototype methods that access the tokenizer state via this), so moving to the next token is done through tokenizer.moveNext().
- In the updated (fast) version, a literal object represents the state, and "normal" functions receive that state explicitly: moveNext(state).
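To make the two styles concrete, here is a minimal sketch of what they look like side by side. The names and the whitespace-splitting logic are illustrative only; the actual tokenizer lives in the linked gist.

```javascript
// Slow version: prototype "class", state accessed via `this`.
function Tokenizer(data) {
  this.data = data;
  this.position = 0;
}
Tokenizer.prototype.moveNext = function () {
  // Advance past the current whitespace-separated token.
  while (this.position < this.data.length && this.data[this.position] !== ' ') {
    this.position++;
  }
  if (this.position < this.data.length) this.position++; // skip the space
  return this.position < this.data.length; // true while tokens remain
};

// Fast version: plain state object, functions take the state explicitly.
function createState(data) {
  return { data: data, position: 0 };
}
function moveNext(state) {
  while (state.position < state.data.length && state.data[state.position] !== ' ') {
    state.position++;
  }
  if (state.position < state.data.length) state.position++; // skip the space
  return state.position < state.data.length;
}
```

Both versions do the same work per token; the only difference is whether the state is reached through this or passed as an argument.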
I have also included code that triggers manual garbage collection to reproduce the issue:

if (global.gc && t % 100 === 0) {
    global.gc();
}

When I run the code without the manual GC, both approaches perform the same:
> node test slow
7 runs: min 125ms, max 129ms, median 127ms
> node test fast
7 runs: min 129ms, max 133ms, median 132ms
However, when I enable manual garbage collection (via --expose_gc), the "slow" code is more than 2 times slower:
> node --expose_gc test slow
7 runs: min 574ms, max 629ms, median 611ms
> node --expose_gc test fast
7 runs: min 257ms, max 296ms, median 276ms
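For reference, a timing harness in the spirit of the numbers above can be sketched like this (the actual measurement code is in the gist; this version and its names are assumptions):

```javascript
// Run `fn` several times and report min/max/median elapsed milliseconds,
// matching the "7 runs: min …, max …, median …" output format above.
function benchmark(fn, runs) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime();
    fn();
    const [s, ns] = process.hrtime(start);
    times.push(s * 1000 + ns / 1e6); // elapsed time in milliseconds
  }
  times.sort((a, b) => a - b);
  return {
    min: times[0],
    max: times[times.length - 1],
    median: times[Math.floor(times.length / 2)]
  };
}
```

process.hrtime is used rather than Date.now because it provides sub-millisecond resolution and was available in the Node 4.x/6.x versions tested.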