Speed up Websock receive queue reads via DataView #2024

Open
PekingSpades wants to merge 4 commits into novnc:master from PekingSpades:master

Conversation

@PekingSpades

Summary

  • Replace the byte-by-byte addition in core/websock.js:_rQshift() with a DataView-backed fast path for 1/2/4 byte reads to cut CPU time in the receive queue.
  • Maintain a cached DataView whenever the receive queue buffer is allocated or resized so the optimized path is always available.
  • Capture and share reproducible browser benchmarks that highlight the performance win across different engines and machines.
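
As a rough illustration of the first two bullets, the sketch below contrasts a byte-by-byte big-endian read with a DataView-backed read over the same buffer. The class name `ReceiveQueueSketch` and its methods are hypothetical; only `_rQ`, `_rQi`, and `_rQdv` mirror names used in this PR, and the actual `core/websock.js` code may differ.

```javascript
// Hypothetical sketch, not the actual core/websock.js implementation.
class ReceiveQueueSketch {
    constructor(size) {
        this._allocate(size);
        this._rQi = 0; // read index into the queue
    }

    // Rebuild the cached DataView whenever the buffer is (re)allocated,
    // so the fast path never sees a stale or missing view.
    _allocate(size) {
        this._rQ = new Uint8Array(size);
        this._rQdv = new DataView(this._rQ.buffer, this._rQ.byteOffset, this._rQ.byteLength);
    }

    // Baseline: assemble a big-endian integer one byte at a time.
    shiftLoop(bytes) {
        let res = 0;
        for (let byte = bytes - 1; byte >= 0; byte--) {
            res += this._rQ[this._rQi++] << (byte * 8);
        }
        return res >>> 0; // force an unsigned 32-bit result
    }

    // Fast path: one DataView call per read (DataView defaults to big-endian).
    shift8() { return this._rQdv.getUint8(this._rQi++); }
    shift16() { const v = this._rQdv.getUint16(this._rQi); this._rQi += 2; return v; }
    shift32() { const v = this._rQdv.getUint32(this._rQi); this._rQi += 4; return v; }
}

const q = new ReceiveQueueSketch(8);
q._rQ.set([0x12, 0x34, 0x56, 0x78, 0xfe, 0xdc, 0xba, 0x98]);
const viaLoop = [q.shiftLoop(4), q.shiftLoop(4)];
q._rQi = 0; // rewind and read the same bytes through the DataView
const viaDataView = [q.shift32(), q.shift32()];
console.log(viaLoop, viaDataView);
```

Both paths should return the same unsigned values; the difference being measured is only how many operations each read costs.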

Performance Summary

Average speed-up = mean of the relative reductions in average time, (loop − DataView) / loop, across the 1-, 2-, and 4-byte benchmark cases (higher is better).

Speed-up (% faster)
                0        10       20       30       40       50
                |--------|--------|--------|--------|--------|
Chrome   45.2%  █████████████████████████████████████████
Edge     40.9%  █████████████████████████████████████
Firefox  29.9%  ███████████████████████████
Safari   43.5%  ███████████████████████████████████████

| Browser / Platform | Avg speed-up |
| --- | --- |
| Windows Chrome 142 | 43.6% faster |
| Windows Chrome 142 (Machine 2) | 46.4% faster |
| Windows Chrome 101 | 41.7% faster |
| Windows Chrome 92.0 | 45.6% faster |
| Windows Chrome 83.0 | 44.7% faster |
| Windows Chrome 71.0 | 49.0% faster |
| Windows Edge 142 | 46.1% faster |
| Windows Edge 142 (Machine 2) | 35.6% faster |
| Windows Firefox 113 | 31.6% faster |
| Windows Firefox 142 | 35.7% faster |
| Windows Firefox 145.0 | 22.5% faster |
| Safari 18 | 43.5% faster |
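
As a worked check of how an "Avg speed-up" figure can be derived, the snippet below averages the per-case reductions using the Avg ms values from the "Windows Chrome 142" results table under Benchmark Results. The function name `averageSpeedUp` is made up for this illustration.

```javascript
// Avg ms values copied from the "Windows Chrome 142" results table.
const chrome142 = [
    { bytes: 1, loop: 205.920, dataView: 164.550 },
    { bytes: 2, loop: 179.260, dataView: 99.260 },
    { bytes: 4, loop: 184.910, dataView: 62.880 },
];

// Mean of the relative time reductions, expressed as a percentage.
function averageSpeedUp(cases) {
    const reductions = cases.map(c => 1 - c.dataView / c.loop);
    const mean = reductions.reduce((a, b) => a + b, 0) / reductions.length;
    return 100 * mean;
}

console.log(averageSpeedUp(chrome142).toFixed(1) + "% faster"); // prints "43.6% faster"
```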

Testing

  • Manual benchmark – Windows 10, Chrome 142 (20 logical cores) ✅
  • Manual benchmark – Windows 10, Chrome 142 (dual-core machine) ✅
  • Manual benchmark – Windows 10, Chrome 101/92/83/71 ✅
  • Manual benchmark – Windows 10, Edge 142 (two hardware profiles) ✅
  • Manual benchmark – Windows 10, Firefox 113/142/145 ✅
  • Manual benchmark – macOS 10.15, Safari 18.6 ✅

Benchmark Results

Windows Chrome 142

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 |
| Platform | Win32 | HW concurrency | 20 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 183.7 | Used JS heap (MB) | 176.1 |
| Performance timeOrigin | 1763268027246.2 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 205.920 | 192.900 | 249.300 | |
| 1 | DataView | 10 | 164.550 | 156.800 | 201.300 | 🏆 |
| 2 | loop | 10 | 179.260 | 177.000 | 181.200 | |
| 2 | DataView | 10 | 99.260 | 96.000 | 118.300 | 🏆 |
| 4 | loop | 10 | 184.910 | 181.100 | 197.600 | |
| 4 | DataView | 10 | 62.880 | 60.700 | 73.600 | 🏆 |

Windows Chrome 142 (Machine 2)

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 75.5 | Used JS heap (MB) | 72.3 |
| Performance timeOrigin | 1763455005177.6 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 366.130 | 353.100 | 424.600 | |
| 1 | DataView | 10 | 235.270 | 226.700 | 275.500 | 🏆 |
| 2 | loop | 10 | 215.860 | 206.400 | 279.500 | |
| 2 | DataView | 10 | 136.190 | 129.600 | 166.100 | 🏆 |
| 4 | loop | 10 | 261.090 | 238.400 | 290.300 | |
| 4 | DataView | 10 | 87.640 | 76.900 | 96.700 | 🏆 |

Windows Chrome 101

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.0.0 Safari/537.36 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 73.5 | Used JS heap (MB) | 71.3 |
| Performance timeOrigin | 1763454953935.7 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 292.310 | 270.600 | 343.900 | |
| 1 | DataView | 10 | 243.210 | 228.300 | 273.700 | 🏆 |
| 2 | loop | 10 | 227.190 | 216.700 | 290.900 | |
| 2 | DataView | 10 | 127.550 | 123.700 | 147.800 | 🏆 |
| 4 | loop | 10 | 238.620 | 233.300 | 264.500 | |
| 4 | DataView | 10 | 85.110 | 81.900 | 98.400 | 🏆 |

Windows Chrome 92.0

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 72.2 | Used JS heap (MB) | 69.8 |
| Performance timeOrigin | 1763454862654.8 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 332.680 | 310.400 | 374.800 | |
| 1 | DataView | 10 | 239.060 | 226.800 | 267.200 | 🏆 |
| 2 | loop | 10 | 224.740 | 217.400 | 248.000 | |
| 2 | DataView | 10 | 127.160 | 122.800 | 154.100 | 🏆 |
| 4 | loop | 10 | 241.880 | 230.600 | 282.900 | |
| 4 | DataView | 10 | 83.860 | 75.000 | 113.400 | 🏆 |

Windows Chrome 83.0

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 3585.8 |
| Total JS heap (MB) | 73.1 | Used JS heap (MB) | 68.9 |
| Performance timeOrigin | 1763454793849.9363 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 372.971 | 349.125 | 447.595 | |
| 1 | DataView | 10 | 288.493 | 251.800 | 475.025 | 🏆 |
| 2 | loop | 10 | 269.894 | 263.340 | 302.725 | |
| 2 | DataView | 10 | 148.638 | 144.555 | 166.145 | 🏆 |
| 4 | loop | 10 | 267.037 | 245.535 | 303.445 | |
| 4 | DataView | 10 | 89.074 | 82.745 | 119.115 | 🏆 |

Windows Chrome 71.0

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.44 Safari/537.36 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, zh | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 2222.1 |
| Total JS heap (MB) | 9.5 | Used JS heap (MB) | 9.5 |
| Performance timeOrigin | 1763454480039.055 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 268.720 | 224.600 | 449.000 | |
| 1 | DataView | 10 | 218.370 | 210.400 | 236.700 | 🏆 |
| 2 | loop | 10 | 318.890 | 229.300 | 422.500 | |
| 2 | DataView | 10 | 151.140 | 113.800 | 217.400 | 🏆 |
| 4 | loop | 10 | 259.290 | 229.800 | 318.600 | |
| 4 | DataView | 10 | 63.390 | 57.000 | 81.200 | 🏆 |

Windows Edge 142

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0 |
| Platform | Win32 | HW concurrency | 16 |
| Device memory (GB) | 8 | Language | en |
| Languages | en, zh-CN, en-GB, en-US | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 123.7 | Used JS heap (MB) | 114.7 |
| Performance timeOrigin | 1763453859070.7 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 281.940 | 267.900 | 343.200 | |
| 1 | DataView | 10 | 213.270 | 202.200 | 268.100 | 🏆 |
| 2 | loop | 10 | 228.380 | 226.100 | 241.200 | |
| 2 | DataView | 10 | 126.480 | 123.100 | 155.000 | 🏆 |
| 4 | loop | 10 | 249.870 | 246.300 | 264.600 | |
| 4 | DataView | 10 | 76.660 | 74.500 | 88.000 | 🏆 |

Windows Edge 142 (Machine 2)

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0 |
| Platform | Win32 | HW concurrency | 2 |
| Device memory (GB) | 8 | Language | zh-CN |
| Languages | zh-CN, en, en-GB, en-US | Screen resolution | 2560x1440 |
| Screen pixel depth | 24 | JS heap size limit (MB) | 4095.8 |
| Total JS heap (MB) | 93.1 | Used JS heap (MB) | 89.2 |
| Performance timeOrigin | 1763432982331 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 281.980 | 249.200 | 366.600 | |
| 1 | DataView | 10 | 245.630 | 233.200 | 272.700 | 🏆 |
| 2 | loop | 10 | 260.800 | 207.600 | 384.000 | |
| 2 | DataView | 10 | 211.140 | 135.500 | 398.800 | 🏆 |
| 4 | loop | 10 | 868.830 | 295.300 | 3937.300 | |
| 4 | DataView | 10 | 219.100 | 90.300 | 735.600 | 🏆 |

Windows Firefox 113

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0 |
| Platform | Win32 | HW concurrency | 2 |
| Language | zh-CN | Languages | zh-CN, zh, zh-TW, zh-HK, en-US, en |
| Screen resolution | 2560x1440 | Screen pixel depth | 24 |
| Performance timeOrigin | 1763454236144 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 679.600 | 611.000 | 918.000 | |
| 1 | DataView | 10 | 656.000 | 621.000 | 747.000 | 🏆 |
| 2 | loop | 10 | 500.000 | 487.000 | 519.000 | |
| 2 | DataView | 10 | 364.100 | 342.000 | 430.000 | 🏆 |
| 4 | loop | 10 | 893.800 | 590.000 | 1707.000 | |
| 4 | DataView | 10 | 319.800 | 244.000 | 506.000 | 🏆 |

Windows Firefox 142

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:142.0) Gecko/20100101 Firefox/142.0 |
| Platform | Win32 | HW concurrency | 2 |
| Language | zh-CN | Languages | zh-CN, zh, zh-TW, zh-HK, en-US, en |
| Screen resolution | 2560x1440 | Screen pixel depth | 24 |
| Performance timeOrigin | 1763454368830 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 1069.500 | 547.000 | 2812.000 | |
| 1 | DataView | 10 | 614.900 | 532.000 | 700.000 | 🏆 |
| 2 | loop | 10 | 457.100 | 386.000 | 562.000 | |
| 2 | DataView | 10 | 378.400 | 327.000 | 550.000 | 🏆 |
| 4 | loop | 10 | 498.000 | 244.000 | 2302.000 | |
| 4 | DataView | 10 | 261.600 | 196.000 | 591.000 | 🏆 |

Windows Firefox 145.0

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:145.0) Gecko/20100101 Firefox/145.0 |
| Platform | Win32 | HW concurrency | 2 |
| Language | zh-CN | Languages | zh-CN, zh, zh-TW, zh-HK, en-US, en |
| Screen resolution | 2560x1440 | Screen pixel depth | 24 |
| Performance timeOrigin | 1763455136427 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 601.100 | 541.000 | 749.000 | |
| 1 | DataView | 10 | 503.900 | 459.000 | 646.000 | 🏆 |
| 2 | loop | 10 | 426.900 | 346.000 | 525.000 | |
| 2 | DataView | 10 | 327.800 | 268.000 | 377.000 | 🏆 |
| 4 | loop | 10 | 256.500 | 233.000 | 310.000 | |
| 4 | DataView | 10 | 184.100 | 169.000 | 203.000 | 🏆 |

Safari 18

| Key | Value | Key | Value |
| --- | --- | --- | --- |
| Buffer size (bytes) | 67108864 | Buffer size (MB) | 64.0 |
| Rounds per case | 10 | Bytes tested | 1, 2, 4 |
| Timer | performance.now() | User agent | Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.6 Safari/605.1.15 |
| Platform | MacIntel | HW concurrency | 4 |
| Language | en-US | Languages | en-US |
| Screen resolution | 3840x2160 | Screen pixel depth | 24 |
| Performance timeOrigin | 1763455352927 | | |

| Bytes | Method | Rounds | Avg ms | Min ms | Max ms | Winner |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | loop | 10 | 1415.600 | 1365.000 | 1546.000 | |
| 1 | DataView | 10 | 962.200 | 934.000 | 1029.000 | 🏆 |
| 2 | loop | 10 | 889.100 | 870.000 | 917.000 | |
| 2 | DataView | 10 | 472.100 | 468.000 | 474.000 | 🏆 |
| 4 | loop | 10 | 849.900 | 694.000 | 1098.000 | |
| 4 | DataView | 10 | 412.500 | 269.000 | 615.000 | 🏆 |

Karma Test

  Websock
    Receive queue methods
      rQpeek8
        √ should peek at the next byte without poping it off the queue
      rQshift8()
        √ should pop a single byte from the receive queue
      rQshift16()
        √ should pop two bytes from the receive queue and return a single number
      rQshift32()
        √ should pop four bytes from the receive queue and return a single number
      rQlen())
        √ should return the number of buffered bytes in the receive queue
      rQshiftStr
        √ should shift the given number of bytes off of the receive queue and return a string
        √ should be able to handle very large strings
      rQshiftBytes
        √ should shift the given number of bytes of the receive queue and return an array
        √ should return a shared array if requested
      rQpeekBytes
        √ should not modify the receive queue
        √ should return a shared array if requested
      rQwait
        √ should return true if there are not enough bytes in the receive queue
        √ should return false if there are enough bytes in the receive queue
        √ should return true and reduce rQi by "goback" if there are not enough bytes
        √ should raise an error if we try to go back more than possible
        √ should not reduce rQi if there are enough bytes
    Send queue methods
      sQpush8()
        √ should send a single byte
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpush16()
        √ should send a number as two bytes
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpush32()
        √ should send a number as two bytes
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
      sQpushString()
        √ should send a string buffer
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
        √ should implicitly split a large buffer
      sQpushBytes()
        √ should send a byte buffer
        √ should not send any data until flushing
        √ should implicitly flush if the queue is full
        √ should implicitly split a large buffer
      flush
        √ should actually send on the websocket
        √ should not call send if we do not have anything queued up
    lifecycle methods
      opening
        √ should pick the correct protocols if none are given
        √ should open the actual websocket
      attaching
        √ should attach to an existing websocket
      closing
        √ should close the actual websocket if it is open
        √ should close the actual websocket if it is connecting
        √ should not try to close the actual websocket if closing
        √ should not try to close the actual websocket if closed
        √ should reset onmessage to not call _recvMessage
      event handlers
        √ should call _recvMessage on a message
        √ should call the open event handler on opening
        √ should call the close event handler on closing
        √ should call the error event handler on error
      ready state
        √ should be "unused" after construction
        √ should be "connecting" if WebSocket is connecting
        √ should be "open" if WebSocket is open
        √ should be "closing" if WebSocket is closing
        √ should be "closed" if WebSocket is closed
        √ should be "unknown" if WebSocket state is unknown
        √ should be "connecting" if RTCDataChannel is connecting
        √ should be "open" if RTCDataChannel is open
        √ should be "closing" if RTCDataChannel is closing
        √ should be "closed" if RTCDataChannel is closed
        √ should be "unknown" if RTCDataChannel state is unknown
    WebSocket receiving
      √ should support adding data to the receive queue
      √ should call the message event handler if present
      √ should not call the message event handler if there is nothing in the receive queue
      √ should compact the receive queue when fully read
      √ should compact the receive queue when we reach the end of the buffer
      √ should automatically resize the receive queue if the incoming message is larger than the buffer
      √ should automatically resize the receive queue if the incoming message is larger than 1/8th of the buffer and we reach the end of the buffer

Can I use

https://caniuse.com/mdn-javascript_builtins_dataview

Comment thread core/websock.js Outdated
@PekingSpades
Author

  1. Websock now reads 8/16/32-bit values directly via the cached DataView in rQshift8/16/32() and no longer calls _rQshift, so every call assumes _rQdv is in sync with _rQ.
  2. Tests that manually overwrite _rQ (the shared buffer stub in tests/test.rfb.js, plus the buffer-mutation cases in tests/test.websock.js) were updated to rebuild _rQdv whenever they replace the queue. Without that, the new inline methods dereferenced null.getUint* and the suite failed.
  3. For any future test that patches _rQ (or _sQ) to custom buffers, make sure to mimic the production invariant by also assigning new DataView(buffer) to _rQdv; otherwise the hot path will crash before the test actually exercises the intended behavior.
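
The invariant described in the points above could be captured in a small test helper along these lines. This is only a sketch: `replaceReceiveQueue` and the `stub` object are hypothetical stand-ins and not code from this PR; only the `_rQ`, `_rQi`, and `_rQdv` field names come from the discussion.

```javascript
// Hypothetical test helper: swap in a custom receive buffer and rebuild
// the DataView in the same step so the two can never drift apart.
function replaceReceiveQueue(sock, bytes) {
    sock._rQ = new Uint8Array(bytes);
    // Pass byteOffset/byteLength so the view also works for subarrays.
    sock._rQdv = new DataView(sock._rQ.buffer, sock._rQ.byteOffset, sock._rQ.byteLength);
    sock._rQi = 0;
}

// Minimal stand-in for a Websock-like object (an assumption for illustration).
const stub = {
    _rQ: null,
    _rQdv: null,
    _rQi: 0,
    rQshift16() {
        const value = this._rQdv.getUint16(this._rQi); // throws if _rQdv is null
        this._rQi += 2;
        return value;
    },
};

replaceReceiveQueue(stub, [0x01, 0x02, 0xff]);
console.log(stub.rQshift16()); // prints 258 (0x0102)
```

Routing every buffer replacement in tests through one helper like this keeps the "always rebuild `_rQdv`" rule in a single place.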

@PekingSpades PekingSpades requested a review from demike December 31, 2025 14:12

@demike demike left a comment

LGTM

@PekingSpades
Author

@samhed

@samhed
Member

samhed commented Jan 14, 2026

This looks interesting. It looks like a lot of testing was involved, I'm curious about the process here.

How was this improvement discovered? And how did you go about testing?

@PekingSpades
Author

PekingSpades commented Jan 29, 2026

Hi! Happy to share the background and testing process. @samhed

How this improvement was discovered

I noticed a small detail while reading websock.js: there’s a comment that says
TODO(directxman12): test performance with these vs a DataView.

That TODO caught my attention because my team is also building a high-performance remote control product, and our controller side is Web-based as well. I’ve been digging into VNC and noVNC to learn how the high-throughput message buffering is done, and websock.js is a great reference for that. So I decided to actually run the performance comparison suggested by the TODO, and that’s what led to this change.

How I went about testing

The benchmarking approach was straightforward:

  • I used a small JS benchmark script (written with GPT’s help) to repeatedly run the relevant buffer operations.
  • The script did multiple rounds with a warm-up phase first, then measured steady-state performance across many iterations.
  • I ran the tests on multiple physical machines that I already have access to (so I could validate across different CPU/memory/OS combinations, not just a single device).

Unfortunately I can’t locate the exact script anymore (I didn’t preserve it at the time), but the structure was roughly:

  1. warm up loops,
  2. timed loops using performance.now(),
  3. repeat multiple times and take the average/median,
  4. avoid counting the first runs to reduce JIT/GC noise.
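
The four steps above can be sketched as a small harness. This is a reconstruction under stated assumptions, not the original script: the names `benchmark` and `median` and the default round counts are made up for illustration.

```javascript
// Median of a list of numbers (does not mutate the input).
function median(values) {
    const sorted = [...values].sort((a, b) => a - b);
    const mid = sorted.length >> 1;
    return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run `fn` a few times untimed (warm-up), then time it with performance.now().
function benchmark(fn, { warmupRounds = 3, timedRounds = 10 } = {}) {
    for (let i = 0; i < warmupRounds; i++) fn(); // discarded: JIT/GC settle here
    const times = [];
    for (let i = 0; i < timedRounds; i++) {
        const t0 = performance.now();
        fn();
        times.push(performance.now() - t0);
    }
    const avg = times.reduce((a, b) => a + b, 0) / times.length;
    return {
        rounds: times.length,
        avg,
        median: median(times),
        min: Math.min(...times),
        max: Math.max(...times),
    };
}

// Usage: time big-endian 32-bit reads over a 1 MiB buffer.
const data = new Uint8Array(1 << 20);
const view = new DataView(data.buffer);
let sink = 0; // accumulate results so the reads are not optimized away
const stats = benchmark(() => {
    for (let i = 0; i + 4 <= data.length; i += 4) sink ^= view.getUint32(i);
});
console.log(stats, sink);
```

Reporting the median alongside the average helps when a single round is hit by a GC pause or scheduler hiccup, as some of the Max ms outliers in the tables suggest.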

Why so many devices/browsers?

The main reason I tested across so many machines/browsers was to rule out “this TODO exists for a reason” scenarios — e.g. historical browser compatibility issues, engine-specific slow paths, etc. In practice I didn’t see compatibility problems, and the improvement was consistently measurable.

For older Safari versions, I didn’t have an old macOS machine available, so I used LambdaTest (their free quota) to cover those versions.

Outcome

Across the devices/browsers I tested, the change showed a clear performance improvement and I didn’t observe regressions or compatibility issues in the test matrix I ran.

If it would help, I can recreate a new minimal benchmark script and share it in the PR so others can reproduce/extend the testing going forward.

@PekingSpades
Author

@samhed

@samhed
Member

samhed commented Feb 7, 2026

It sounds like you only tested JS code snippets out of context from the rest of the noVNC code? Or am I misunderstanding? Did you do any manual "real-life" testing as well?

Yes, please share a benchmarking script, preferably similar to the one you used.

@PekingSpades
Author

@samhed

(() => {
  const SIZE   = 64 * 1024 * 1024; // 64MB
  const ROUNDS = 10;     
  const BYTES_LIST = [1, 2, 4];

  const hasPerf = typeof performance !== "undefined" && typeof performance.now === "function";
  const now = hasPerf ? () => performance.now() : () => Date.now();
  const timerName = hasPerf ? "performance.now()" : "Date.now()";
  const buf = new ArrayBuffer(SIZE);
  const u8  = new Uint8Array(buf);
  const dv  = new DataView(buf);

  for (let i = 0; i < u8.length; i++) {
    u8[i] = i & 0xFF;
  }

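  // Receive queue and read index for the baseline, named as in the PR description.
  let _rQ  = u8;
  let _rQi = 0;

  // Baseline: assemble a big-endian value one byte at a time.
  function _rQshift_loop(bytes) {
    let res = 0;
    for (let byte = bytes - 1; byte >= 0; byte--) {
      res += _rQ[_rQi++] << (byte * 8);
    }
    return res >>> 0; // force an unsigned 32-bit result
  }

  // Separate read index for the DataView variant.
  let _rQiDV = 0;

  // Candidate: a single big-endian DataView read (littleEndian = false).
  function _rQshift_dataview(bytes) {
    let res;
    if (bytes === 1) {
      res = dv.getUint8(_rQiDV);
    } else if (bytes === 2) {
      res = dv.getUint16(_rQiDV, false);
    } else if (bytes === 4) {
      res = dv.getUint32(_rQiDV, false);
    } else {
      throw new Error("only 1-, 2-, and 4-byte reads are supported");
    }
    _rQiDV += bytes;
    return res >>> 0;
  }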

  // ===== benchmark =====
  const results = []; // { method, bytes, round, timeMs }

  function bench(bytes) {
    const iterations = (SIZE / bytes) | 0;
    let dummy = 0;

    for (let round = 1; round <= ROUNDS; round++) {
      // loop
      _rQi = 0;
      let t0 = now();
      for (let i = 0; i < iterations; i++) {
        dummy ^= _rQshift_loop(bytes);
      }
      let t1 = now();
      results.push({ method: "loop", bytes, round, timeMs: t1 - t0 });

      // DataView
      _rQiDV = 0;
      t0 = now();
      for (let i = 0; i < iterations; i++) {
        dummy ^= _rQshift_dataview(bytes);
      }
      t1 = now();
      results.push({ method: "DataView", bytes, round, timeMs: t1 - t0 });
    }

    globalThis.__benchmarkDummy = dummy;
  }

  BYTES_LIST.forEach(bench);

  function summarize(method, bytes) {
    const rows = results.filter(r => r.method === method && r.bytes === bytes);
    const times = rows.map(r => r.timeMs);
    const sum = times.reduce((a, b) => a + b, 0);
    const avg = sum / times.length;
    const min = Math.min(...times);
    const max = Math.max(...times);
    return { method, bytes, rounds: rows.length, avg, min, max };
  }

  const summaries = [];
  ["loop", "DataView"].forEach(method => {
    BYTES_LIST.forEach(bytes => {
      summaries.push(summarize(method, bytes));
    });
  });

  const winners = {}; // { [bytes]: "loop" | "DataView" | "tie" }
  BYTES_LIST.forEach(bytes => {
    const sLoop = summaries.find(s => s.bytes === bytes && s.method === "loop");
    const sDV   = summaries.find(s => s.bytes === bytes && s.method === "DataView");
    if (!sLoop || !sDV) return;
    if (Math.abs(sLoop.avg - sDV.avg) < 1e-6) {
      winners[bytes] = "tie";
    } else if (sLoop.avg < sDV.avg) {
      winners[bytes] = "loop";
    } else {
      winners[bytes] = "DataView";
    }
  });

  const envPairs = [];

  function addEnvPair(key, value) {
    if (value === undefined || value === null) return;
    const v = String(value).replace(/\|/g, "\\|");
    envPairs.push({ key, value: v });
  }

  // Config
  addEnvPair("Buffer size (bytes)", SIZE);
  addEnvPair("Buffer size (MB)", (SIZE / (1024 * 1024)).toFixed(1));
  addEnvPair("Rounds per case", ROUNDS);
  addEnvPair("Bytes tested", BYTES_LIST.join(", "));
  addEnvPair("Timer", timerName);

  // Client Info
  try {
    addEnvPair("User agent", navigator.userAgent);
    addEnvPair("Platform", navigator.platform);
    addEnvPair("HW concurrency", navigator.hardwareConcurrency);
    addEnvPair("Device memory (GB)", navigator.deviceMemory);
    addEnvPair("Language", navigator.language);
    addEnvPair("Languages", navigator.languages && navigator.languages.join(", "));
  } catch (e) {}

  try {
    addEnvPair("Screen resolution", `${screen.width}x${screen.height}`);
    addEnvPair("Screen pixel depth", screen.pixelDepth);
  } catch (e) {}

  try {
    if (hasPerf && performance && performance.memory) {
      addEnvPair("JS heap size limit (MB)", (performance.memory.jsHeapSizeLimit / (1024 * 1024)).toFixed(1));
      addEnvPair("Total JS heap (MB)", (performance.memory.totalJSHeapSize / (1024 * 1024)).toFixed(1));
      addEnvPair("Used JS heap (MB)", (performance.memory.usedJSHeapSize / (1024 * 1024)).toFixed(1));
    }
    if (hasPerf && performance && typeof performance.timeOrigin === "number") {
      addEnvPair("Performance timeOrigin", performance.timeOrigin);
    }
  } catch (e) {}

  let md = "";

  // Config + Client Info 
  md += `## Config & Client Info\n\n`;
  md += `| Key | Value | Key | Value |\n`;
  md += `| --- | ----- | --- | ----- |\n`;
  for (let i = 0; i < envPairs.length; i += 2) {
    const a = envPairs[i];
    const b = envPairs[i + 1];
    md += `| ${a.key} | ${a.value} | ${b ? b.key : ""} | ${b ? b.value : ""} |\n`;
  }
  md += `\n`;

  md += `## Result\n\n`;
  md += `| Bytes | Method   | Rounds | Avg ms | Min ms | Max ms | Winner |\n`;
  md += `| ----- | -------- | ------ | ------ | ------ | ------ | ------ |\n`;

  BYTES_LIST.forEach(bytes => {
    const sLoop = summaries.find(s => s.bytes === bytes && s.method === "loop");
    const sDV   = summaries.find(s => s.bytes === bytes && s.method === "DataView");
    const winner = winners[bytes];

    const loopWinEmoji =
      winner === "loop" ? "🏆" :
      winner === "tie"  ? "⚖️" : "";
    const dvWinEmoji =
      winner === "DataView" ? "🏆" :
      winner === "tie"      ? "⚖️" : "";

    if (sLoop) {
      md += `| ${bytes} | loop     | ${sLoop.rounds} | ${sLoop.avg.toFixed(3)} | ${sLoop.min.toFixed(3)} | ${sLoop.max.toFixed(3)} | ${loopWinEmoji} |\n`;
    }
    if (sDV) {
      md += `| ${bytes} | DataView | ${sDV.rounds} | ${sDV.avg.toFixed(3)} | ${sDV.min.toFixed(3)} | ${sDV.max.toFixed(3)} | ${dvWinEmoji} |\n`;
    }
  });

  md += `\n`;

  console.log(md);
})();

@samhed
Member

samhed commented Feb 10, 2026

It sounds like you only tested JS code snippets out of context from the rest of the noVNC code? Or am I misunderstanding? Did you do any manual "real-life" testing as well?

@PekingSpades
Author

My earlier numbers were mostly from isolated JS benchmarks, not a full live VNC-session benchmark.

I’ve now added two browser-level checks to the PR that exercise noVNC itself rather than standalone snippets:

  1. A smoke test that drives an actual noVNC RFB handshake plus a Raw framebuffer update in headless Chrome. That passes and renders the expected pixel.
  2. A protocol-stream benchmark that feeds complete FramebufferUpdate/CopyRect messages through the actual noVNC RFB/Websock path.
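
For readers unfamiliar with the wire format, here is a minimal sketch of what one FramebufferUpdate message carrying CopyRect rectangles looks like, following the RFB protocol layout (message type 0, one padding byte, a 16-bit rectangle count, then per rectangle a 12-byte header and, for CopyRect encoding 1, a 4-byte source position). The function names are hypothetical and this is not the benchmark script added to the PR.

```javascript
// Build a FramebufferUpdate containing only CopyRect rectangles.
function buildCopyRectUpdate(rects) {
    const msg = new Uint8Array(4 + rects.length * 16);
    const dv = new DataView(msg.buffer);
    dv.setUint8(0, 0);               // message-type: FramebufferUpdate
    dv.setUint8(1, 0);               // padding
    dv.setUint16(2, rects.length);   // number-of-rectangles (big-endian)
    let off = 4;
    for (const r of rects) {
        dv.setUint16(off, r.x);
        dv.setUint16(off + 2, r.y);
        dv.setUint16(off + 4, r.width);
        dv.setUint16(off + 6, r.height);
        dv.setInt32(off + 8, 1);     // encoding-type: CopyRect
        dv.setUint16(off + 12, r.srcX);
        dv.setUint16(off + 14, r.srcY);
        off += 16;
    }
    return msg;
}

// Parse the same message back using DataView reads.
function parseCopyRectUpdate(msg) {
    const dv = new DataView(msg.buffer, msg.byteOffset, msg.byteLength);
    if (dv.getUint8(0) !== 0) throw new Error("not a FramebufferUpdate");
    const count = dv.getUint16(2);
    const rects = [];
    let off = 4;
    for (let i = 0; i < count; i++) {
        if (dv.getInt32(off + 8) !== 1) throw new Error("only CopyRect is handled here");
        rects.push({
            x: dv.getUint16(off),
            y: dv.getUint16(off + 2),
            width: dv.getUint16(off + 4),
            height: dv.getUint16(off + 6),
            srcX: dv.getUint16(off + 12),
            srcY: dv.getUint16(off + 14),
        });
        off += 16;
    }
    return rects;
}

const update = buildCopyRectUpdate([{ x: 10, y: 20, width: 30, height: 40, srcX: 5, srcY: 6 }]);
console.log(update.length, parseCopyRectUpdate(update));
```

A message like this is dominated by 16- and 32-bit header reads, which is why a CopyRect-heavy stream is a reasonable way to exercise the receive path without much pixel data.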

In the parser-focused configuration of that benchmark (display work stubbed so the measurement stays attributable to the receive path changed by this PR), I’m seeing about 30-34% improvement versus current master on repeated runs on my machine.

I also ran the same protocol stream with display work enabled. There the total time was essentially unchanged, which is why I think this particular optimization is hard to validate with end-to-end “real-life session” timing alone: once rendering is included, the receive-path signal gets swamped by display cost.
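
A quick back-of-the-envelope calculation shows why the signal disappears. The time split below is purely hypothetical (chosen only for illustration, not measured in this PR); the 32% figure sits inside the 30-34% range reported above.

```javascript
// Fraction by which total time shrinks when only the receive path gets faster.
function overallReduction(receiveMs, displayMs, receiveImprovement) {
    const before = receiveMs + displayMs;
    const after = receiveMs * (1 - receiveImprovement) + displayMs;
    return 1 - after / before;
}

// Hypothetical split: 30 ms in the receive path, 970 ms in display work.
const reduction = overallReduction(30, 970, 0.32);
console.log((reduction * 100).toFixed(2) + "% less total time");
```

With a split like that, a 32% receive-path win moves the end-to-end total by roughly 1%, which is easily lost in run-to-run noise.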

So the short answer is: I had not originally done a good live-session benchmark, but I have now added browser-level smoke/perf scripts to the PR that run noVNC in context rather than just isolated snippets. They can be rerun locally with node tests/perf/run_rfb_bench.mjs.
