I'd like to suggest implementing tail-call dispatch in QuickJS. Quick demo of what it is.
This was recently done in CPython with great success (+5-10% performance). I've slapped together a WIP patch for QuickJS, and here are some preliminary results (Debian 13 arm64 VM on a Mac M4 with clang 19.1.7, median of 10 runs):
| Test | b226856 | 2dcc05b1 | % | p_welch |
|---|---|---|---|---|
| Richards | 1799.5 | 1968 | +9.36% | 0.0000* |
| DeltaBlue | 1872.5 | 1836 | -1.95% | 0.0000* |
| Crypto | 2033 | 2472 | +21.59% | 0.0288* |
| RayTrace | 3408.5 | 3652 | +7.14% | 0.0000* |
| EarleyBoyer | 4052 | 4257.5 | +5.07% | 0.5588 |
| RegExp | 1012 | 1023 | +1.09% | 0.5063 |
| Splay | 5693.5 | 5813 | +2.10% | 0.8515 |
| SplayLatency | 19752 | 20219 | +2.36% | 0.0019* |
| NavierStokes | 3730 | 5150.5 | +38.08% | 0.0000* |
| Geomean | 3247 | 3534 | +8.84% | |
The diff is somewhat large (914+ 659-), but the bulk of it is harmless formatting changes that make the CASE blocks less interdependent and splittable into separate functions. Besides tail-call dispatch, being able to split them up like that could also be useful for experimenting with adding a JIT.
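For readers unfamiliar with the technique, here is a minimal sketch of what "one function per opcode, chained by tail calls" looks like. It is a toy stack machine, not the patch itself: the opcodes, the `run_program` helper, and the `DISPATCH` macro are all hypothetical names made up for illustration, and QuickJS's real handlers carry far more state. It assumes a compiler with the `musttail` attribute (Clang has it); without it, the macro falls back to an ordinary call, which only stays stack-neutral if the optimizer happens to turn it into a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; these are NOT QuickJS's real bytecodes. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT, OP_COUNT };

/* Use a guaranteed tail call where the compiler offers one. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* assumption: plain call, relies on the optimizer */
#endif

/* All handlers share one signature so each can tail-call the next. */
typedef int32_t Handler(const uint8_t *pc, int32_t *sp);

static Handler op_push, op_add, op_mul, op_halt;

static Handler *const dispatch_table[OP_COUNT] = {
    [OP_PUSH] = op_push,
    [OP_ADD]  = op_add,
    [OP_MUL]  = op_mul,
    [OP_HALT] = op_halt,
};

/* Fetch the next opcode and jump to its handler without growing the stack. */
#define DISPATCH() MUSTTAIL return dispatch_table[*pc](pc + 1, sp)

static int32_t op_push(const uint8_t *pc, int32_t *sp)
{
    *sp++ = (int8_t)*pc++;      /* one-byte signed immediate operand */
    DISPATCH();
}

static int32_t op_add(const uint8_t *pc, int32_t *sp)
{
    sp[-2] += sp[-1];           /* pop two values, push their sum */
    sp--;
    DISPATCH();
}

static int32_t op_mul(const uint8_t *pc, int32_t *sp)
{
    sp[-2] *= sp[-1];           /* pop two values, push their product */
    sp--;
    DISPATCH();
}

static int32_t op_halt(const uint8_t *pc, int32_t *sp)
{
    (void)pc;
    return sp[-1];              /* result is the top of the stack */
}

/* Hypothetical entry point: runs bytecode that must end with OP_HALT. */
int32_t run_program(const uint8_t *code)
{
    int32_t stack[32];          /* toy fixed-size stack, no overflow checks */
    assert(code != NULL);
    return dispatch_table[*code](code + 1, stack);
}
```

Calling `run_program` on the byte sequence `OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT` evaluates (2 + 3) * 4. The point of the layout is that the interpreter state (`pc`, `sp`) travels in function arguments, so the compiler can keep it in registers and optimize each handler on its own, rather than treating one giant `switch` body as a single function.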