<h1>Benchmarking rustls 0.23.37 vs OpenSSL 3.6.1 vs BoringSSL on x86_64</h1>
<p>2026-03-07 · <a href="https://rustls.dev/perf/2026-03-07-report/">https://rustls.dev/perf/2026-03-07-report/</a></p>
<h3 id="system-configuration">System configuration</h3>
<p>We ran the benchmarks on a bare-metal server with the following characteristics:</p>
<ul>
<li>OS: Debian 12 (Bookworm).</li>
<li>C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.</li>
<li>Rust toolchain: 1.94.0.</li>
<li>CPU: <a href="https://www.intel.com/content/www/us/en/products/sku/214806/intel-xeon-e2386g-processor-12m-cache-3-50-ghz/specifications.html">Xeon E-2386G</a> (supporting AVX-512).</li>
<li>Memory: 32GB.</li>
<li>Extra configuration: hyper-threading disabled, dynamic frequency scaling disabled, CPU scaling
governor set to <code>performance</code> for all cores.</li>
</ul>
<h3 id="versions">Versions</h3>
<p>The benchmarking tool used for both OpenSSL and BoringSSL was <a href="https://github.com/ctz/openssl-bench/tree/82b86b22">openssl-bench 82b86b22</a>.</p>
<p>This was built from source with its makefile.</p>
<h4 id="boringssl">BoringSSL</h4>
<p>The tested version of BoringSSL is <a href="https://github.com/google/boringssl/tree/30cd935">30cd935</a>, which was the most recent point on main
when we started these measurements.</p>
<p>BoringSSL was built from source with <code>CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release</code>.</p>
<h4 id="openssl">OpenSSL</h4>
<p>The tested version of OpenSSL is <a href="https://github.com/openssl/openssl/tree/openssl-3.6.1">3.6.1</a>, which was the latest release at the time of writing.</p>
<p>OpenSSL was built from source with <code>./Configure ; make -j12</code>.</p>
<h4 id="rustls">Rustls</h4>
<p>The tested version of rustls is <a href="https://github.com/rustls/rustls/releases/tag/v%2F0.23.37">0.23.37</a>, which was the latest stable release at the time of writing.
This was used with aws-lc-rs 1.16.0 / aws-lc-sys 0.37.1.</p>
<h3 id="measurements">Measurements</h3>
<p>BoringSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
</span></code></pre>
<p>OpenSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure
</span></code></pre>
<p>rustls was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/rustls
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
</span></code></pre>
<h2 id="results">Results</h2>
<p>Transfer measurements are in megabytes per second.
Handshake units are handshakes per second.</p>
<table><thead><tr><th></th><th>BoringSSL 30cd935</th><th>OpenSSL 3.6.1</th><th>rustls 0.23.37</th></tr></thead><tbody>
<tr><td>transfer, 1.2, aes-128-gcm, sending</td><td>8291.9</td><td>6610.19</td><td>8133.85</td></tr>
<tr><td>transfer, 1.2, aes-128-gcm, receiving</td><td>6722.26</td><td>7129.43</td><td>7946.96</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, sending</td><td>7564.71</td><td>5844.35</td><td>7421.02</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, receiving</td><td>6217.68</td><td>6237.52</td><td>7332.91</td></tr>
<tr><td></td><td>BoringSSL 30cd935</td><td>OpenSSL 3.6.1</td><td>rustls 0.23.37</td></tr>
<tr><td>full handshakes, 1.2, rsa, client</td><td>5657.81</td><td>3186.87</td><td>8122.99</td></tr>
<tr><td>full handshakes, 1.2, rsa, server</td><td>1476.66</td><td>2154.41</td><td>2835.45</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, client</td><td>3545.32</td><td>2201.16</td><td>4344.30</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, server</td><td>9057.57</td><td>5226.96</td><td>13,524.99</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>3179.48</td><td>2219.64</td><td>4766.63</td></tr>
<tr><td>full handshakes, 1.3, rsa, server</td><td>1302.03</td><td>1712.64</td><td>2357.36</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, client</td><td>2364.8</td><td>1642.88</td><td>3160.39</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, server</td><td>4981.38</td><td>3176.58</td><td>6786.26</td></tr>
<tr><td></td><td>BoringSSL 30cd935</td><td>OpenSSL 3.6.1</td><td>rustls 0.23.37</td></tr>
<tr><td>resumed handshakes, 1.2, client</td><td>45,390.3</td><td>21,136.9</td><td>63,870.34</td></tr>
<tr><td>resumed handshakes, 1.2, server</td><td>44,429</td><td>22,647.1</td><td>72,480.52</td></tr>
<tr><td>resumed handshakes, 1.3, client</td><td>4648.41</td><td>3594.88</td><td>6735.88</td></tr>
<tr><td>resumed handshakes, 1.3, server</td><td>5687.32</td><td>3780</td><td>7249.08</td></tr>
</tbody></table>
<p><img src="/2026-03-07-transfer.svg" alt="graph of transfer speeds" /></p>
<p><img src="/2026-03-07-full-handshake.svg" alt="graph of full handshakes" /></p>
<p><img src="/2026-03-07-resumed-handshake.svg" alt="graph of resumed handshakes" /></p>
<h1>Rustls and the Rust Foundation's Rust Innovation Lab</h1>
<p>2025-09-03 · Joe Birr-Pixton, with Dirkjan Ochtman, Daniel McCarney and Josh Aas · <a href="https://rustls.dev/blog/2025-09-03-rustls-and-rust-foundation/">https://rustls.dev/blog/2025-09-03-rustls-and-rust-foundation/</a></p>
<p>As you may have seen, <a href="https://rustfoundation.org/media/rust-foundation-launches-rust-innovation-lab-with-rustls-as-inaugural-project/">Rustls is the first project in the Rust Foundation's new Rust Innovation Lab program</a>.</p>
<p>Rustls is a project I started in May 2016 as I learned Rust. Since then it has grown from a casual project, to having multiple maintainers, hundreds of contributors, funded contributions, and finally multiple funded maintainers.</p>
<p>As a project it has expanded to incorporate adjacent efforts, such as an <a href="https://github.com/rustls/rustls-openssl-compat">OpenSSL-compatible API</a>, a <a href="https://github.com/rustls/rustls-ffi">C API</a>, <a href="https://github.com/rustls/rustls-platform-verifier">deep integration into platform-specific certificate verifiers</a> and integrations with important Rust ecosystem crates such as <a href="http://github.com/rustls/tokio-rustls">tokio</a> and <a href="http://github.com/rustls/hyper-rustls">hyper</a>.</p>
<p>Now it supports a significant amount of the crates ecosystem and applications with <em>billions</em> of users.</p>
<p>Giving the Rustls project an administrative and legal home is the next step in that development.</p>
<p>Users will see no change in the project's direction or personnel. If you rely on Rustls in a commercial context we would love to talk about how we can address your needs, and how we can work together to support the project long-term.</p>
<p>We want to thank ISRG for its <a href="https://www.memorysafety.org/initiative/rustls/">significant and ongoing support</a>, and <a href="https://www.sovereign.tech/tech/rustls">Sovereign Tech Agency</a> for their recent funding of the project.</p>
<h1 id="q-a">Q+A</h1>
<h2 id="why-do-this-now">Why do this now?</h2>
<p>We want to make it easier for potential funding sources to support the project. In conversations, it became apparent that a well-defined legal and governance status would make the project more attractive to funders.</p>
<h2 id="why-the-rust-foundation">Why the Rust Foundation?</h2>
<p>We feel the Rust Foundation shares our goals in promoting the Rust language and serving its users.</p>
<h2 id="does-this-mean-rustls-is-being-funded-by-my-organization-s-membership-of-the-rust-foundation">Does this mean Rustls is being funded by my organization's membership of the Rust Foundation?</h2>
<p>No. Only funding specifically for Rustls will be made available to the Rustls project. This does not affect funding to the Rust Project or other initiatives funded by the Rust Foundation.</p>
<h2 id="does-this-change-rustls-technical-direction-or-personnel">Does this change Rustls' technical direction or personnel?</h2>
<p>No. The existing maintainers retain complete and unequivocal control over the Rustls project and its roadmap.</p>
<h1>Benchmarking rustls 0.23.31 vs OpenSSL 3.5.1 vs BoringSSL on x86_64</h1>
<p>2025-07-31 · <a href="https://rustls.dev/perf/2025-07-31-report/">https://rustls.dev/perf/2025-07-31-report/</a></p>
<h3 id="system-configuration">System configuration</h3>
<p>We ran the benchmarks on a bare-metal server with the following characteristics:</p>
<ul>
<li>OS: Debian 12 (Bookworm).</li>
<li>C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.</li>
<li>CPU: <a href="https://www.intel.com/content/www/us/en/products/sku/214806/intel-xeon-e2386g-processor-12m-cache-3-50-ghz/specifications.html">Xeon E-2386G</a> (supporting AVX-512).</li>
<li>Memory: 32GB.</li>
<li>Extra configuration: hyper-threading disabled, dynamic frequency scaling disabled, CPU scaling
governor set to <code>performance</code> for all cores.</li>
</ul>
<h3 id="versions">Versions</h3>
<p>The benchmarking tool used for both OpenSSL and BoringSSL was <a href="https://github.com/ctz/openssl-bench/tree/82b86b22">openssl-bench 82b86b22</a>.</p>
<p>This was built from source with its makefile.</p>
<h4 id="boringssl">BoringSSL</h4>
<p>The tested version of BoringSSL is <a href="https://github.com/google/boringssl/tree/0.20250701.0">0.20250701.0</a>, which was the most recent point on master
when we started these measurements.</p>
<p>BoringSSL was built from source with <code>CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release</code>.</p>
<h4 id="openssl">OpenSSL</h4>
<p>The tested version of OpenSSL is <a href="https://github.com/openssl/openssl/tree/openssl-3.5.1">3.5.1</a>, which was the latest release at the time of writing.</p>
<p>OpenSSL was built from source with <code>./Configure ; make -j12</code>.</p>
<h4 id="rustls">Rustls</h4>
<p>The tested version of rustls is <a href="https://github.com/rustls/rustls/releases/tag/v%2F0.23.31">0.23.31</a>, which was the latest release at the time of writing.
This was used with aws-lc-rs 1.13.1 / aws-lc-sys 0.29.0.</p>
<h3 id="measurements">Measurements</h3>
<p>BoringSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
</span></code></pre>
<p>OpenSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure
</span></code></pre>
<p>rustls was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/rustls
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
</span></code></pre>
<h2 id="results">Results</h2>
<p>Transfer measurements are in megabytes per second.
Handshake units are handshakes per second.</p>
<table><thead><tr><th></th><th>BoringSSL 0.20250701.0</th><th>OpenSSL 3.5.1</th><th>rustls 0.23.31</th></tr></thead><tbody>
<tr><td>transfer, 1.2, aes-128-gcm, sending</td><td>8575.27</td><td>6565.22</td><td>8074.82</td></tr>
<tr><td>transfer, 1.2, aes-128-gcm, receiving</td><td>6986.81</td><td>7219.67</td><td>7952.68</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, sending</td><td>7739.61</td><td>6093.27</td><td>7628.68</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, receiving</td><td>6421.36</td><td>6472.3</td><td>7407.83</td></tr>
<tr><td></td><td>BoringSSL 0.20250701.0</td><td>OpenSSL 3.5.1</td><td>rustls 0.23.31</td></tr>
<tr><td>full handshakes, 1.2, rsa, client</td><td>5375.06</td><td>3251.54</td><td>8206.33</td></tr>
<tr><td>full handshakes, 1.2, rsa, server</td><td>1447.33</td><td>2169</td><td>2857.81</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, client</td><td>3454.89</td><td>2195.55</td><td>4345.05</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, server</td><td>9096.44</td><td>5178.02</td><td>13,618.81</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>3125.36</td><td>2222.21</td><td>4187.28</td></tr>
<tr><td>full handshakes, 1.3, rsa, server</td><td>1285.88</td><td>1714.24</td><td>2273.13</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, client</td><td>2344.76</td><td>1650.56</td><td>2884.83</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, server</td><td>5113.83</td><td>3183.26</td><td>6229.71</td></tr>
<tr><td></td><td>BoringSSL 0.20250701.0</td><td>OpenSSL 3.5.1</td><td>rustls 0.23.31</td></tr>
<tr><td>resumed handshakes, 1.2, client</td><td>47,509.5</td><td>19,936.5</td><td>65,617.35</td></tr>
<tr><td>resumed handshakes, 1.2, server</td><td>46,561.8</td><td>21,043.1</td><td>74,771.51</td></tr>
<tr><td>resumed handshakes, 1.3, client</td><td>4695.79</td><td>3574.86</td><td>5614.4</td></tr>
<tr><td>resumed handshakes, 1.3, server</td><td>5803.03</td><td>3771.28</td><td>6623.94</td></tr>
</tbody></table>
<p><img src="/2025-07-31-transfer.svg" alt="graph of transfer speeds" /></p>
<p><img src="/2025-07-31-full-handshake.svg" alt="graph of full handshakes" /></p>
<p><img src="/2025-07-31-resumed-handshake.svg" alt="graph of resumed handshakes" /></p>
<h3 id="notable-changes-since-last-time">Notable changes since <a href="/perf/2024-10-18-report">last time</a></h3>
<h4 id="post-quantum-key-exchange">Post-quantum key exchange</h4>
<p>OpenSSL and rustls now use X25519MLKEM768 post-quantum key exchange by default.
BoringSSL is configured to do the same. This applies to all TLS1.3 handshakes.</p>
<table><thead><tr><th></th><th>old</th><th></th><th>new</th></tr></thead><tbody>
<tr><td></td><td>BoringSSL 76968bb3</td><td>➡️</td><td>BoringSSL 0.20250701.0</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>4813.91 hs/s</td><td>1.54x slower</td><td>3125.36 hs/s</td></tr>
<tr><td></td><td>OpenSSL 3.3.2</td><td>➡️</td><td>OpenSSL 3.5.1</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>2788.76 hs/s</td><td>1.25x slower</td><td>2222.21 hs/s</td></tr>
<tr><td></td><td>rustls 0.23.15</td><td>➡️</td><td>rustls 0.23.31</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>6803.93 hs/s</td><td>1.62x slower</td><td>4187.28 hs/s</td></tr>
</tbody></table>
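<p>The slowdown columns above are simply the ratio of the old handshake rate to the new one. A tiny sketch reproduces them (the helper name is ours, not part of the benchmark tooling):</p>

```rust
/// Slowdown factor between two measured handshake rates (hs/s).
/// (Helper name is ours, not part of the benchmark tooling.)
fn slowdown(old_hs_per_s: f64, new_hs_per_s: f64) -> f64 {
    old_hs_per_s / new_hs_per_s
}

fn main() {
    // BoringSSL full TLS1.3 RSA client handshakes: 4813.91 -> 3125.36 hs/s.
    println!("BoringSSL: {:.2}x slower", slowdown(4813.91, 3125.36)); // 1.54x
    // OpenSSL: 2788.76 -> 2222.21 hs/s.
    println!("OpenSSL:   {:.2}x slower", slowdown(2788.76, 2222.21)); // 1.25x
    // rustls: 6803.93 -> 4187.28 hs/s.
    println!("rustls:    {:.2}x slower", slowdown(6803.93, 4187.28)); // 1.62x
}
```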
<h4 id="boringssl-avx-512-aes-gcm">BoringSSL AVX-512 AES-GCM</h4>
<p>BoringSSL now has AVX512-accelerated AES-GCM. Since last time, that looks like:</p>
<table><thead><tr><th></th><th>old</th><th></th><th>new</th></tr></thead><tbody>
<tr><td></td><td>BoringSSL 76968bb3</td><td>➡️</td><td>BoringSSL 0.20250701.0</td></tr>
<tr><td>transfer, 1.2, aes-128-gcm, sending</td><td>5043.04 MB/s</td><td>1.7x faster</td><td>8575.27 MB/s</td></tr>
</tbody></table>
<h4 id="rustls-extension-optimizations">rustls extension optimizations</h4>
<p>We spent some time improving our internal representation of TLS extensions. This applied
to clients and servers, and to all TLS versions, but it's most visible here in TLS1.2
performance because there are no cryptography changes masking it.</p>
<table><thead><tr><th></th><th>old</th><th></th><th>new</th></tr></thead><tbody>
<tr><td></td><td>rustls 0.23.15</td><td>➡️</td><td>rustls 0.23.31</td></tr>
<tr><td>resumed handshakes, 1.2, client</td><td>64,722.55 hs/s</td><td>1.02x faster</td><td>65,617.35 hs/s</td></tr>
<tr><td>resumed handshakes, 1.2, server</td><td>71,149.91 hs/s</td><td>1.05x faster</td><td>74,771.51 hs/s</td></tr>
</tbody></table>
<h1>Measuring and (slightly) Improving Post-Quantum Handshake Performance</h1>
<p>2024-12-17 · <a href="https://rustls.dev/perf/2024-12-17-pq-kx/">https://rustls.dev/perf/2024-12-17-pq-kx/</a></p>
<p>To defend against the potential advent of "Cryptographically Relevant Quantum Computers"
there is a move to using "hybrid" key exchange algorithms. These glue together
a widely-deployed classical algorithm (like <a href="https://datatracker.ietf.org/doc/html/rfc7748">X25519</a>) and a new post-quantum-secure algorithm
(like <a href="https://csrc.nist.gov/pubs/fips/203/final">ML-KEM</a>) and treat the result as one TLS-level key exchange algorithm (like <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-ecdhe-mlkem/">X25519MLKEM768</a>).</p>
<p>In this report, first we'll measure the additional cost of post-quantum-secure key exchange.
Then we'll describe and measure an optimization we have implemented.</p>
<h1 id="headline-measurements">Headline measurements</h1>
<p>All these measurements are taken on our amd64 benchmarking machine, which has a
<a href="https://www.intel.com/content/www/us/en/products/sku/214806/intel-xeon-e2386g-processor-12m-cache-3-50-ghz/specifications.html">Xeon E-2386G</a> CPU. We'll compare:</p>
<ul>
<li>rustls using post-quantum-insecure X25519 key exchange,</li>
<li>rustls using post-quantum-secure X25519MLKEM768 key exchange, and</li>
<li>OpenSSL 3.3.2 using post-quantum-insecure X25519 key exchange.</li>
</ul>
<p>All three are taken on the same hardware, and the latter measurements are from
<a href="https://rustls.dev/perf/2024-10-18-report/">our previous report</a> -- which also contains reproduction
instructions and describes what the benchmarks measure.</p>
<p>One important thing to note is that post-quantum key exchange involves sending and
receiving much larger messages than classical ones. Our benchmark design only covers
CPU costs -- and does not include networking -- so real-world performance will
be worse than these measurements.</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/tls13-client-hs-openssl.svg" alt="client handshake performance results on amd64 architecture" /></p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/tls13-server-hs-openssl.svg" alt="server handshake performance results on amd64 architecture" /></p>
<p>The cost of X25519MLKEM768 post-quantum key exchange is clearly visible for
both clients and servers.</p>
<p>We can see that the performance headroom that rustls has attained means we can <em>almost</em>
completely absorb the extra cost of post-quantum key exchange, while still performing
better than (post-quantum-insecure) OpenSSL -- with the exception of client resumption.</p>
<p>We will do further comparative benchmarking in this area when OpenSSL gains post-quantum key
exchange support.</p>
<h1 id="sharing-x25519-setup-costs">Sharing X25519 setup costs</h1>
<h2 id="background">Background</h2>
<p>In TLS1.3, the client starts the key exchange in its first message (the <code>ClientHello</code>).
The <code>ClientHello</code> includes both a description of which algorithms the client supports, and
zero or more presumptive "key shares".</p>
<p>The server then evaluates which algorithms it is willing to use, and either uses one
of the presumptive key shares, or replies with a <code>HelloRetryRequest</code> which instructs
the client to send new <code>ClientHello</code> with a specific, mutually-acceptable key share.</p>
<p>A <code>HelloRetryRequest</code> can be expensive, because it introduces an additional round trip
into the handshake. It also means any work the client did for its presumptive key
shares is wasted.</p>
<p>It's therefore advantageous for a client to avoid <code>HelloRetryRequest</code>s, by:</p>
<ul>
<li>
<p><strong>Having prior knowledge of the server's preferences.</strong> <a href="https://datatracker.ietf.org/doc/draft-ietf-tls-key-share-prediction/">draft-ietf-tls-key-share-prediction</a>
is an effort to standardize a mechanism for a client to learn this out-of-band.</p>
</li>
<li>
<p><strong>Remembering a server's preferences from a previous connection.</strong> rustls has
done this since adding support for TLS1.3 in 2017. This generally means
a client making many connections to one server may avoid repeated <code>HelloRetryRequest</code>s.</p>
</li>
<li>
<p><strong>Sending many presumptive key shares.</strong> Though there's an obvious trade-off
in terms of wasted computation and message size.</p>
</li>
<li>
<p><strong>Following ecosystem preferences.</strong> <a href="https://datatracker.ietf.org/doc/html/rfc7748">X25519</a> key exchange is overwhelmingly
preferred in TLS1.3 implementations, due to its performance and implementation
quality.</p>
</li>
</ul>
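<p>The second strategy above, remembering the server's previous choice, amounts to a small per-server cache consulted when building the next <code>ClientHello</code>. These types are hypothetical stand-ins to show the shape of the idea, not rustls's actual client session storage:</p>

```rust
use std::collections::HashMap;

/// Hypothetical stand-in types: remember which key-exchange group each
/// server chose last time, so the next ClientHello can lead with that share
/// and avoid a HelloRetryRequest round trip.
#[derive(Clone, Copy, PartialEq, Debug)]
enum KxGroup {
    X25519MlKem768,
    X25519,
}

#[derive(Default)]
struct KxHints {
    last_chosen: HashMap<String, KxGroup>,
}

impl KxHints {
    /// Record what the server selected (e.g. learned via a HelloRetryRequest).
    fn remember(&mut self, server: &str, group: KxGroup) {
        self.last_chosen.insert(server.to_string(), group);
    }

    /// Which groups to generate presumptive key shares for, preferred first.
    fn shares_for(&self, server: &str, default: KxGroup) -> Vec<KxGroup> {
        match self.last_chosen.get(server) {
            Some(&g) if g != default => vec![g, default],
            _ => vec![default],
        }
    }
}

fn main() {
    let mut hints = KxHints::default();
    // First connection: no hint, so offer only the default share.
    assert_eq!(
        hints.shares_for("example.com", KxGroup::X25519MlKem768),
        vec![KxGroup::X25519MlKem768]
    );
    // The server turned out to only support X25519; remember that.
    hints.remember("example.com", KxGroup::X25519);
    // Next connection: lead with the remembered group.
    assert_eq!(
        hints.shares_for("example.com", KxGroup::X25519MlKem768),
        vec![KxGroup::X25519, KxGroup::X25519MlKem768]
    );
}
```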
<p>The key shares in a <code>ClientHello</code> would look like:</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/hybrid-only.svg" alt="diagram of TLS1.3 client key exchange with X25519MLKEM768" /></p>
<p>At least for a transitional period, we want to avoid a <code>HelloRetryRequest</code> round
trip when connecting to a server that hasn't been upgraded to support X25519MLKEM768.
That means also offering a separate X25519 key share:</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/hybrid-both.svg" alt="diagram of TLS1.3 client key exchange with X25519MLKEM768 and X25519" /></p>
<p>However, this arrangement is not optimal. While X25519 setup is very fast, we are doing it twice
and then we are guaranteed to throw away half of that work, because the server can only ever select
one key share to use.</p>
<p>Instead, we can do:</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/hybrid-opt.svg" alt="diagram of TLS1.3 optimized client key exchange with X25519MLKEM768 and X25519" /></p>
<p>This report measures the benefit of that optimization.</p>
<p>This optimization is described further in <a href="https://www.ietf.org/archive/id/draft-ietf-tls-hybrid-design-11.html#name-transmitting-public-keys-an">draft-ietf-tls-hybrid-design</a> section 3.2.</p>
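<p>Structurally, the optimization amounts to generating one X25519 keypair and serializing its public key into both shares. A minimal sketch with stand-in types and placeholder bytes, no real cryptography (the sizes come from FIPS 203 and RFC 7748):</p>

```rust
/// Stand-in type for an X25519 keypair; generating one is the work the
/// optimization avoids doing twice.
struct X25519KeyPair {
    public: [u8; 32], // X25519 public keys are 32 bytes (RFC 7748)
}

impl X25519KeyPair {
    fn generate() -> Self {
        // Real code would do a scalar multiplication here.
        X25519KeyPair { public: [0u8; 32] }
    }
}

fn client_key_shares() -> (Vec<u8>, Vec<u8>) {
    let x25519 = X25519KeyPair::generate(); // generated once, used twice
    let mlkem_encaps_key = vec![0u8; 1184]; // ML-KEM-768 encapsulation key size

    // X25519MLKEM768 share: ML-KEM-768 part first, then the X25519 part.
    let hybrid = [mlkem_encaps_key.as_slice(), &x25519.public].concat();
    // Plain X25519 share: the same public key again.
    let classical = x25519.public.to_vec();
    (hybrid, classical)
}

fn main() {
    let (hybrid, classical) = client_key_shares();
    println!("hybrid share: {} bytes", hybrid.len()); // 1216
    println!("classical share: {} bytes", classical.len()); // 32
    // The X25519 portion is byte-identical in both shares.
    assert_eq!(&hybrid[1184..], classical.as_slice());
}
```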
<h2 id="micro-benchmarking">Micro benchmarking</h2>
<p>First, we can micro-benchmark the time to construct and serialize a <code>ClientHello</code>, in a variety
of situations:</p>
<ul>
<li>X25519 key share included only.</li>
<li>X25519MLKEM768 and X25519 key shares, with the optimization.</li>
<li>X25519MLKEM768 and X25519 key shares, without the optimization.</li>
</ul>
<p>We run this on two machines that cover both amd64 (Xeon E-2386G) and aarch64 (Ampere Altra Q80-30)
architectures.</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/microbench-amd64.svg" alt="micro benchmark results on amd64 architecture" /></p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/microbench-arm64.svg" alt="micro benchmark results on arm64 architecture" /></p>
<p>From this we can see:</p>
<ul>
<li>There is a small but measurable benefit, as expected.</li>
<li>ML-KEM-768 key generation costs are significantly more expensive than X25519.</li>
</ul>
<h2 id="whole-handshakes">Whole handshakes</h2>
<p>Next, let's measure the same scenarios in the context of whole client handshakes.
The remaining measurements are only done on our amd64 benchmark machine.</p>
<p>The above optimization only affects the client's first message, so now we'll see
whether the effect of the optimization is meaningful when compared to the rest
of the computation a client must do.</p>
<p><img src="https://rustls.dev/perf/2024-12-17-pq-kx/tls13-client-hs.svg" alt="client handshake performance results on amd64 architecture" /></p>
<p>The difference is visible but small, as it has been diluted by other parts
of the handshake. It is approximately 4.3% for resumptions,
2.8% for full RSA handshakes, and 2.6% for ECDSA handshakes.</p>
<h1>Measuring and Improving rustls's Multithreaded Performance</h1>
<p>2024-11-28 · <a href="https://rustls.dev/perf/2024-11-28-threading/">https://rustls.dev/perf/2024-11-28-threading/</a></p>
<h3 id="system-configuration">System configuration</h3>
<p>We ran the benchmarks on a bare-metal server with the following characteristics:</p>
<ul>
<li>OS: Debian 12 (Bookworm).</li>
<li>C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.</li>
<li>CPU: Ampere Altra Q80-30, 80 cores.</li>
<li>Memory: 128GB.</li>
<li>All cores set to <code>performance</code> CPU frequency governor.</li>
</ul>
<p>This is the <a href="https://www.hetzner.com/dedicated-rootserver/matrix-rx/">Hetzner RX170</a>.</p>
<h3 id="how-the-benchmarks-work">How the benchmarks work</h3>
<p>Compared to <a href="/perf">previous reports</a>, the benchmark tool can now perform the same
benchmarks in many threads simultaneously. Each thread runs the same benchmarking
operation as before, and threads do not contend with each other <em>except</em> via
the internals of the TLS library.</p>
<p>As before, the benchmarking is performed by measuring a TLS client "connecting" to
a TLS server over a memory buffer -- there is no network latency, system calls, or
other overhead that would be present in a typical networked application.
This arrangement should actually be the worst case for multithreaded testing: every
thread is working all the time (rather than waiting for IO), and therefore
contention on any locks in the library under test should be maximal.</p>
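<p>The shape of such a harness, with scoped threads and per-thread throughput as the output metric, might look like this sketch (not the actual benchmark tool):</p>

```rust
use std::thread;
use std::time::Instant;

/// Run `op` in a tight loop on `threads` threads at once and return the
/// average operations per second *per thread*. Threads only contend through
/// whatever shared state `op` itself touches -- mirroring the benchmark
/// design, where that shared state is the TLS library's internals.
fn per_thread_throughput<F>(threads: usize, iters: u64, op: F) -> f64
where
    F: Fn() + Sync,
{
    let start = Instant::now();
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..iters {
                    op();
                }
            });
        }
    }); // scope() joins every thread before returning
    let total_ops = (threads as u64 * iters) as f64;
    total_ops / start.elapsed().as_secs_f64() / threads as f64
}

fn main() {
    // Stand-in workload; the real harness performs a TLS handshake here.
    let rate = per_thread_throughput(4, 100_000, || {
        std::hint::black_box(42u64.wrapping_mul(2685821657736338717));
    });
    println!("{rate:.0} ops/s per thread");
}
```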
<h3 id="versions">Versions</h3>
<p>The benchmarking tool used for both OpenSSL and BoringSSL was <a href="https://github.com/ctz/openssl-bench/tree/02249496d2963e2a0b694e7be3b37f0d85f8eccd">openssl-bench <code>02249496</code></a>.</p>
<h4 id="boringssl-and-openssl">BoringSSL and OpenSSL</h4>
<p>The version of BoringSSL and its build instructions are the same
as <a href="https://rustls.dev/perf/2024-10-18-report/">the previous report</a>.</p>
<p>We test OpenSSL 3.4.0 which is the latest release at the time of writing.</p>
<p>We also include measurements using OpenSSL 3.0.14, as packaged by Debian
as 3.0.14-1~deb12u2. This is included to observe the thread scalability improvements
made between OpenSSL 3.0 and 3.4.</p>
<h4 id="rustls">Rustls</h4>
<p>The tested version of rustls was 0.23.16, which was the latest release when this work
was started. This was used with aws-lc-rs 1.10.0 / aws-lc-sys 0.22.0.</p>
<p>Additionally the following three commits were included, which affect the benchmark tool but do not affect the core crate:</p>
<ul>
<li><a href="https://github.com/rustls/rustls/commit/a5d510ea4e5a44611f49985bbaba84b6c4f51533">a5d510ea</a></li>
<li><a href="https://github.com/rustls/rustls/commit/44522ad089add58bc7df54ec9903528ab6d5f64f">44522ad0</a></li>
<li><a href="https://github.com/rustls/rustls/commit/d1c33f8641c1c69edc27d98047c38f7f852f55eb">d1c33f86</a></li>
</ul>
<h3 id="measurements">Measurements</h3>
<p>BoringSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=2 setarch -R make threads BORINGSSL=1
</span></code></pre>
<p>OpenSSL 3.4.0 was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=2 setarch -R make threads
</span></code></pre>
<p>OpenSSL 3.0.14 was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=2 setarch -R make threads HOST_OPENSSL=1
</span></code></pre>
<p>rustls was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/rustls
</span><span>$ BENCH_MULTIPLIER=2 setarch -R make -f admin/bench-measure.mk threads
</span></code></pre>
<h2 id="initial-results">Initial results</h2>
<p>Since we now have three dimensions of measurement (implementation,
thread count, test case), this is restricted to some selected test cases.</p>
<p>We concentrate on server performance, as servers seem more likely to
be run at large concurrencies.</p>
<p>The graphs below are handshakes per second <em>per thread</em>. The ideal
and expected shape should be a flat, horizontal line up to 80 threads,
with a fall-off after that. A flat line means doubling the number of
threads doubles the throughput; a declining line would mean less than
that, and indicates additional threads reduce the per-thread performance.</p>
<h3 id="full-tls1-3-handshakes-on-the-server">Full TLS1.3 handshakes on the server</h3>
<p>This case is a common one for a TLS server, and excludes
any need for state on the server side.</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/full-server.svg" alt="graph of TLS1.3 full handshake server performance" /></p>
<p>This graph shows the OpenSSL 3 scalability problems, the improvements
made between OpenSSL 3.0 and 3.4, and the absence of such problems
in BoringSSL and rustls.</p>
<h3 id="resumed-tls1-2-handshakes-on-the-server">Resumed TLS1.2 handshakes on the server</h3>
<p>This covers both ticket-based resumption, and session-id-based resumption.
The latter requires the server to maintain a cache across threads, so some
"droop" is expected in these traces.</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/resumed-12-server.svg" alt="graph of TLS1.2 resumed handshake server performance" /></p>
<p>Clearly something is not right with rustls's ticket resumption performance.</p>
<h3 id="resumed-tls1-3-handshakes-on-the-server">Resumed TLS1.3 handshakes on the server</h3>
<p>This covers only ticket-based resumption, so the server needs no state between
threads.</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/resumed-13-server.svg" alt="graph of TLS1.3 resumed handshake server performance" /></p>
<p>Again, rustls's ticket resumption performance is suspect.</p>
<h1 id="improving-rustls-s-ticket-resumption-performance">Improving rustls's ticket resumption performance</h1>
<p>In rustls we have a <code>TicketSwitcher</code>, which is responsible for rotating
between ticket keys periodically. This is important because (especially
in TLS1.2) compromise of the ticket key destroys the security of all past
and future sessions.</p>
<p>Unfortunately, a mutex was held around all ticket operations -- so that
a thread that found the key needed to be rotated did not race another thread
currently using that key.</p>
<p>In <a href="https://github.com/rustls/rustls/pull/2193">PR #2193</a> this was improved
to use a rwlock -- this means there is no contention at all in the common case.</p>
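<p>The shape of that fix is the classic read-mostly pattern: every handshake takes a shared read lock, and only an overdue rotation takes the exclusive write lock. A self-contained sketch of the pattern (not rustls's actual code):</p>

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

/// Sketch of read-mostly key rotation (the shape of the fix, not rustls's
/// actual code): the common case takes only the shared read lock, so
/// concurrent handshakes do not contend; rotation takes the write lock.
struct RotatingKey {
    state: RwLock<(u64, Instant)>, // (current key id, when it was installed)
    lifetime: Duration,
}

impl RotatingKey {
    fn new(lifetime: Duration) -> Self {
        RotatingKey {
            state: RwLock::new((0, Instant::now())),
            lifetime,
        }
    }

    /// Returns the key id to use, rotating first if the current key expired.
    fn current(&self) -> u64 {
        // Fast path: shared read lock, taken by every handshake.
        {
            let state = self.state.read().unwrap();
            if state.1.elapsed() < self.lifetime {
                return state.0;
            }
        }
        // Slow path: exclusive write lock, only when rotation is due.
        // Re-check after acquiring it, in case another thread rotated first.
        let mut state = self.state.write().unwrap();
        if state.1.elapsed() >= self.lifetime {
            state.0 += 1;
            state.1 = Instant::now();
        }
        state.0
    }
}

fn main() {
    let key = RotatingKey::new(Duration::from_secs(3600));
    assert_eq!(key.current(), 0); // fresh key: read lock only
    let key = RotatingKey::new(Duration::ZERO);
    assert_eq!(key.current(), 1); // already expired: rotates immediately
}
```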
<p>The improvement was released in <a href="https://github.com/rustls/rustls/releases/tag/v%2F0.23.17">rustls 0.23.17</a>
and looks like:</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/resumed-12-server-postfix.svg" alt="graph of TLS1.2 resumed handshake server performance" /></p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/resumed-13-server-postfix.svg" alt="graph of TLS1.3 resumed handshake server performance" /></p>
<h1 id="measuring-worst-case-latency-at-high-concurrency">Measuring worst-case latency at high concurrency</h1>
<p>The above measurements only record <em>average handshake throughput per thread</em>,
which is fine for seeing how that changes with the number of threads.
Another important measure is: what is the latency distribution of all handshakes
performed, at high thread counts? What we're looking for here is evidence
that no one handshake experiences poor performance.</p>
<p>From here on, the version of rustls tested moved to <a href="https://github.com/rustls/rustls/commit/fc6b4a193b065604d10e16e79d601d8a30c18492"><code>fc6b4a19</code></a> (which is shortly after the 0.23.18 release)
to add support for outputting individual handshake latencies in the benchmark tool.</p>
<p>In the following test, we measure handshake latency when 80 threads are working
at once. In the rustls test, this produces very satisfying <code>htop</code> output:</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/htop-80-99.png" alt="htop output, showing 80 cores at high utilization" /></p>
<p>Note that the charts below use a logarithmic (base-10) x axis, to adequately show the
four distributions in one figure.</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/latency-resume-tls12-server.svg" alt="graph of TLS1.2 resumed handshake server latency" /></p>
<p>Here we can clearly see an improvement between OpenSSL 3.0 and OpenSSL 3.4.
rustls has the tightest distribution, followed by BoringSSL. (OpenSSL 3.4's distribution
may appear visually tighter than BoringSSL's, however the scale is logarithmic, so it is not.)</p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/latency-resume-tls13-server.svg" alt="graph of TLS1.3 resumed handshake server latency" /></p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/latency-fullhs-tls12-server.svg" alt="graph of TLS1.2 full handshake server latency" /></p>
<p><img src="https://rustls.dev/perf/2024-11-28-threading/latency-fullhs-tls13-server.svg" alt="graph of TLS1.3 full handshake server latency" /></p>
<p>The remainder show the same theme: rustls has a tight distribution, followed
by BoringSSL, followed by OpenSSL 3.4. OpenSSL 3.0 has the widest distribution.</p>
Benchmarking rustls 0.23.15 vs OpenSSL 3.3.2 vs BoringSSL on ARM64 (2024-10-31)
https://rustls.dev/perf/2024-10-31-arm64/<h3 id="system-configuration">System configuration</h3>
<p>We ran the benchmarks on a bare-metal server with the following characteristics:</p>
<ul>
<li>OS: Debian 12 (Bookworm).</li>
<li>C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.</li>
<li>CPU: Ampere Altra Q80-30, 80 cores.</li>
<li>Memory: 128GB.</li>
<li>All cores set to <code>performance</code> CPU frequency governor.</li>
</ul>
<p>This is the <a href="https://www.hetzner.com/dedicated-rootserver/matrix-rx/">Hetzner RX170</a>.</p>
<h3 id="versions">Versions</h3>
<p>The benchmarking tool used for both OpenSSL and BoringSSL was <a href="https://github.com/ctz/openssl-bench/tree/d5de57d92d483169cabf8ec22c351fe3819ba656">openssl-bench d5de57d9</a>.</p>
<p>This was built from source with its makefile.</p>
<h4 id="boringssl">BoringSSL</h4>
<p>The tested version of BoringSSL is <a href="https://github.com/google/boringssl/tree/76968bb3d5">76968bb3d5</a>, which was the most recent point on master
when we started <a href="https://rustls.dev/perf/2024-10-18-report/">the previous measurements</a>.</p>
<p>BoringSSL was built from source with <code>CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release</code>.
clang is used here to <a href="https://issues.chromium.org/issues/42290529">avoid potential performance deficits to GCC</a>
and for consistency with the x86 results.</p>
<h4 id="openssl">OpenSSL</h4>
<p>The tested version of OpenSSL is <a href="https://github.com/openssl/openssl/tree/openssl-3.3.2">3.3.2</a>, which was the latest release at the time of writing.</p>
<p>OpenSSL was built from source with <code>./Configure ; make -j12</code>.</p>
<h4 id="rustls">Rustls</h4>
<p>The tested version of rustls was 0.23.15, which was the latest release at the time of writing.
This was used with aws-lc-rs 1.10.0 / aws-lc-sys 0.22.0.</p>
<p>Additionally the following two commits were included, which affect the benchmark tool but do not affect the core crate:</p>
<ul>
<li><a href="https://github.com/rustls/rustls/commit/13144a0aa391bbec55aa92ee020e88c2bb8c3ea8">13144a0a</a></li>
<li><a href="https://github.com/rustls/rustls/commit/b553880a5f5caf58bbd2c43e4031e8c55d6da486">b553880a</a></li>
</ul>
<h3 id="measurements">Measurements</h3>
<p>BoringSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
</span></code></pre>
<p>OpenSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure
</span></code></pre>
<p>rustls was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/rustls
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
</span></code></pre>
<h2 id="results">Results</h2>
<p>Transfer measurements are in megabytes per second.
Handshake units are handshakes per second.</p>
<table><thead><tr><th></th><th>BoringSSL 76968bb3</th><th>OpenSSL 3.3.2</th><th>rustls 0.23.15</th></tr></thead><tbody>
<tr><td>transfer, 1.2, aes-128-gcm, sending</td><td>2211.53</td><td>2101.23</td><td>2077.19</td></tr>
<tr><td>transfer, 1.2, aes-128-gcm, receiving</td><td>2250.93</td><td>2344.94</td><td>2173.4</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, sending</td><td>1886.17</td><td>1741.07</td><td>1809.8</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, receiving</td><td>1899.72</td><td>1953.49</td><td>1935.8</td></tr>
<tr><td></td><td>BoringSSL 76968bb3</td><td>OpenSSL 3.3.2</td><td>rustls 0.23.15</td></tr>
<tr><td>full handshakes, 1.2, rsa, client</td><td>1968.07</td><td>1588.54</td><td>4498.42</td></tr>
<tr><td>full handshakes, 1.2, rsa, server</td><td>334.077</td><td>319.886</td><td>614.27</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, client</td><td>1527.73</td><td>1118.56</td><td>2154.06</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, server</td><td>3636.48</td><td>2950.54</td><td>8303.67</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>1861.15</td><td>1441.86</td><td>3986.81</td></tr>
<tr><td>full handshakes, 1.3, rsa, server</td><td>330.484</td><td>312.446</td><td>599.39</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, client</td><td>1459.64</td><td>1045.98</td><td>2032.11</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, server</td><td>3252.58</td><td>2440.25</td><td>6212.45</td></tr>
<tr><td></td><td>BoringSSL 76968bb3</td><td>OpenSSL 3.3.2</td><td>rustls 0.23.15</td></tr>
<tr><td>resumed handshakes, 1.2, client</td><td>45452.2</td><td>18396.5</td><td>65267.61</td></tr>
<tr><td>resumed handshakes, 1.2, server</td><td>43356.5</td><td>20426.4</td><td>65313.22</td></tr>
<tr><td>resumed handshakes, 1.3, client</td><td>3969.88</td><td>3282.14</td><td>8443.11</td></tr>
<tr><td>resumed handshakes, 1.3, server</td><td>3791.21</td><td>3071.61</td><td>7841.35</td></tr>
</tbody></table>
<p><img src="/2024-10-31-transfer.svg" alt="graph of transfer speeds" /></p>
<p><img src="/2024-10-31-full-handshake.svg" alt="graph of full handshakes" /></p>
<p><img src="/2024-10-31-resumed-handshake.svg" alt="graph of resumed handshakes" /></p>
<h3 id="observations-on-results">Observations on results</h3>
<p>rustls trails a little in throughput tests. The three underlying
cryptography libraries (BoringSSL, aws-lc, OpenSSL) have their own
benchmarking tools which confirm that there is little variance between
them:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/boringssl
</span><span>$ LD_LIBRARY_PATH=. ./tool/bssl speed -filter AES-256-GCM
</span><span>(...)
</span><span>Did 139000 AES-256-GCM (16384 bytes) seal operations in 1004138us (138427.2 ops/sec): 2268.0 MB/s
</span><span>(...)
</span></code></pre>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/aws-lc
</span><span>$ LD_LIBRARY_PATH=. ./tool/bssl speed -filter AES-256-GCM
</span><span>(...)
</span><span>Did 139000 EVP-AES-256-GCM encrypt (16384 bytes) operations in 1004522us (138374.3 ops/sec): 2267.1 MB/s
</span><span>(...)
</span></code></pre>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl
</span><span>$ LD_LIBRARY_PATH=. ./apps/openssl speed -aead -evp aes-256-gcm
</span><span>(...)
</span><span>Doing AES-256-GCM ops for 3s on 16384 size blocks: 434715 AES-256-GCM ops in 3.00s
</span><span>(...)
</span><span>The 'numbers' are in 1000s of bytes per second processed.
</span><span>type 2 bytes 31 bytes 136 bytes 1024 bytes 8192 bytes 16384 bytes
</span><span>AES-256-GCM 13570.29k 168865.38k 623050.41k 1766296.58k 2320760.83k 2374123.52k
</span></code></pre>
<p>That is 2268, 2267 and 2264 MB/s for BoringSSL, aws-lc and OpenSSL respectively.
Given these projects' shared lineage, it would not be surprising if the implementations
are the same.</p>
Benchmarking rustls 0.23.15 vs OpenSSL 3.3.2 vs BoringSSL on x86_64 (2024-10-18)
https://rustls.dev/perf/2024-10-18-report/<h3 id="system-configuration">System configuration</h3>
<p>We ran the benchmarks on a bare-metal server with the following characteristics:</p>
<ul>
<li>OS: Debian 12 (Bookworm).</li>
<li>C/C++ toolchains: GCC 12.2.0 and Clang 14.0.6.</li>
<li>CPU: <a href="https://www.intel.com/content/www/us/en/products/sku/214806/intel-xeon-e2386g-processor-12m-cache-3-50-ghz/specifications.html">Xeon E-2386G</a> (supporting AVX-512).</li>
<li>Memory: 32GB.</li>
<li>Extra configuration: hyper-threading disabled, dynamic frequency scaling disabled, cpu scaling
governor set to performance for all cores.</li>
</ul>
<h3 id="versions">Versions</h3>
<p>The benchmarking tool used for both OpenSSL and BoringSSL was <a href="https://github.com/ctz/openssl-bench/tree/d5de57d92d483169cabf8ec22c351fe3819ba656">openssl-bench d5de57d9</a>.</p>
<p>This was built from source with its makefile.</p>
<h4 id="boringssl">BoringSSL</h4>
<p>The tested version of BoringSSL is <a href="https://github.com/google/boringssl/tree/76968bb3d5">76968bb3d5</a>, which was the most recent point on master
when we started these measurements.</p>
<p>BoringSSL was built from source with <code>CC=clang CXX=clang++ cmake -DCMAKE_BUILD_TYPE=Release</code>.
clang is used here to <a href="https://issues.chromium.org/issues/42290529">avoid potential performance deficits to GCC</a>.</p>
<h4 id="openssl">OpenSSL</h4>
<p>The tested version of OpenSSL is <a href="https://github.com/openssl/openssl/tree/openssl-3.3.2">3.3.2</a>, which was the latest release at the time of writing.</p>
<p>OpenSSL was built from source with <code>./Configure ; make -j12</code>.</p>
<h4 id="rustls">Rustls</h4>
<p>The tested version of rustls was 0.23.15, which was the latest release at the time of writing.
This was used with aws-lc-rs 1.10.0 / aws-lc-sys 0.22.0.</p>
<p>Additionally the following two commits were included, which affect the benchmark tool but do not affect the core crate:</p>
<ul>
<li><a href="https://github.com/rustls/rustls/commit/13144a0aa391bbec55aa92ee020e88c2bb8c3ea8">13144a0a</a></li>
<li><a href="https://github.com/rustls/rustls/commit/b553880a5f5caf58bbd2c43e4031e8c55d6da486">b553880a</a></li>
</ul>
<h3 id="measurements">Measurements</h3>
<p>BoringSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure BORINGSSL=1
</span></code></pre>
<p>OpenSSL was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/openssl-bench
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make measure
</span></code></pre>
<p>rustls was tested with this command:</p>
<pre data-lang="shell" style="background-color:#2b303b;color:#c0c5ce;" class="language-shell "><code class="language-shell" data-lang="shell"><span>~/bench/rustls
</span><span>$ BENCH_MULTIPLIER=16 setarch -R make -f admin/bench-measure.mk measure
</span></code></pre>
<h2 id="results">Results</h2>
<p>Transfer measurements are in megabytes per second.
Handshake units are handshakes per second.</p>
<table><thead><tr><th></th><th>BoringSSL 76968bb3</th><th>OpenSSL 3.3.2</th><th>rustls 0.23.15</th></tr></thead><tbody>
<tr><td>transfer, 1.2, aes-128-gcm, sending</td><td>5043.04</td><td>6560.79</td><td>8154.27</td></tr>
<tr><td>transfer, 1.2, aes-128-gcm, receiving</td><td>4429.26</td><td>7192.17</td><td>7436.88</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, sending</td><td>4332.5</td><td>5982.18</td><td>7094.13</td></tr>
<tr><td>transfer, 1.3, aes-256-gcm, receiving</td><td>3872.34</td><td>6521.2</td><td>7278.15</td></tr>
<tr><td></td><td>BoringSSL 76968bb3</td><td>OpenSSL 3.3.2</td><td>rustls 0.23.15</td></tr>
<tr><td>full handshakes, 1.2, rsa, client</td><td>5470.01</td><td>3201.92</td><td>8227.61</td></tr>
<tr><td>full handshakes, 1.2, rsa, server</td><td>1449.65</td><td>2159.59</td><td>2829.04</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, client</td><td>3451.51</td><td>2071.74</td><td>4369.39</td></tr>
<tr><td>full handshakes, 1.2, ecdsa, server</td><td>9115.04</td><td>5196.8</td><td>12921.68</td></tr>
<tr><td>full handshakes, 1.3, rsa, client</td><td>4813.91</td><td>2788.76</td><td>6803.93</td></tr>
<tr><td>full handshakes, 1.3, rsa, server</td><td>1386.06</td><td>1913.38</td><td>2544.31</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, client</td><td>3177.49</td><td>1859.77</td><td>3937.7</td></tr>
<tr><td>full handshakes, 1.3, ecdsa, server</td><td>7107.86</td><td>3938.47</td><td>8325.74</td></tr>
<tr><td></td><td>BoringSSL 76968bb3</td><td>OpenSSL 3.3.2</td><td>rustls 0.23.15</td></tr>
<tr><td>resumed handshakes, 1.2, client</td><td>45547.6</td><td>20703.8</td><td>64722.55</td></tr>
<tr><td>resumed handshakes, 1.2, server</td><td>43985.3</td><td>22268.1</td><td>71149.91</td></tr>
<tr><td>resumed handshakes, 1.3, client</td><td>9818.4</td><td>5328.6</td><td>10912.87</td></tr>
<tr><td>resumed handshakes, 1.3, server</td><td>8600.76</td><td>4866.2</td><td>9500.11</td></tr>
</tbody></table>
<p><img src="/2024-10-18-transfer.png" alt="graph of transfer speeds" /></p>
<p><img src="/2024-10-18-full-handshake.png" alt="graph of full handshakes" /></p>
<p><img src="/2024-10-18-resumed-handshake.png" alt="graph of resumed handshakes" /></p>
<h3 id="observations-on-results">Observations on results</h3>
<p>AVX-512 support shows up twice in these results:</p>
<ul>
<li>rustls/aws-lc and OpenSSL's performance advantage in throughput tests is due to use of AVX-512F/VAES.</li>
<li>rustls/aws-lc and OpenSSL's performance advantage in server-side full handshake tests is due to use of AVX-512IFMA-accelerated RSA.</li>
</ul>
<p>This support was contributed to the respective projects by Intel.</p>
<p>TLS1.3 resumption is slower than TLS1.2 resumption because it includes a fresh key exchange.</p>