I have a benchmark here, in which the order of the tests seems to completely change the results.
That is, if the order is like this, sigma:defer wins by about 10-15% ...
add('sigma:defer', () => parseSigmaDefer(SAMPLE)),
add('sigma:grammar', () => parseSigmaGrammar(SAMPLE)),
add('parjs', () => parseParjs(SAMPLE)),
Whereas, if the order is like this, sigma:grammar wins by the same 10-15% ...
add('sigma:grammar', () => parseSigmaGrammar(SAMPLE)),
add('sigma:defer', () => parseSigmaDefer(SAMPLE)),
add('parjs', () => parseParjs(SAMPLE)),
So it would appear whatever runs first just wins.
I tried tweaking all the options as well (minimums, delay, etc.), but nothing changes.
I wonder if benchmark is still reliable after 6 years and no updates? Its benchmarking method was first described 13 years ago - a lot of water under the bridge since then, I'm sure?
To start with, I'd expect benchmarks to run in dedicated Workers, which I don't think existed back then?
Even then, they probably shouldn't run one after the other (111122223333) but rather round-robin (123123123123) or perhaps even randomly, to make sure they all get equally affected by the garbage collector, run-time optimizations, and other side-effects? Ideally, they probably shouldn't even run in the same process though.
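A minimal sketch of the round-robin idea (again hypothetical, not benchmark's actual API): every cycle runs each case once, so GC pauses and warm-up costs are spread evenly across all cases instead of landing on whichever case happens to run first.

```javascript
// Hypothetical round-robin runner: interleave cases (123123123123) rather
// than exhausting one before the next (111122223333).
// cases is an array of [name, fn] pairs.
function runRoundRobin(cases, cycles) {
  const totals = new Map(cases.map(([name]) => [name, 0]));
  for (let i = 0; i < cycles; i++) {
    for (const [name, fn] of cases) {
      const start = process.hrtime.bigint();
      fn();
      totals.set(name, totals.get(name) + Number(process.hrtime.bigint() - start));
    }
  }
  // Average nanoseconds per call for each case.
  return new Map([...totals].map(([name, ns]) => [name, ns / cycles]));
}
```

Shuffling the inner loop each cycle would randomize the order as well, for the same reason.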