ECH is enabled in Firefox by default since version 119, and is recommended by Mozilla to be used along with DNS over HTTPS. In September 2023, Chromium version 117 (used in Google Chrome, Microsoft Edge, Samsung Internet, and Opera) enabled it by default, also requiring keys to be deployed in HTTPS resource records in DNS.
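To make "HTTPS resource records in DNS" concrete, here is a minimal sketch of the wire-format question a resolver receives for such a record. It assumes the record type number 65 assigned to HTTPS records in RFC 9460; the function name `build_https_query` and the default query ID are hypothetical, chosen only for illustration. The sketch builds the packet but does not send it.

```python
import struct

HTTPS_RR_TYPE = 65  # HTTPS resource record type (RFC 9460)
CLASS_IN = 1        # Internet class


def build_https_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query packet asking for the HTTPS record of hostname."""
    # Header: ID, flags (only "recursion desired" set), one question,
    # zero answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)

    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b""
    for label in hostname.rstrip(".").encode("ascii").split(b"."):
        if not 0 < len(label) <= 63:
            raise ValueError("DNS labels must be 1 to 63 bytes long")
        qname += bytes([len(label)]) + label
    qname += b"\x00"

    question = qname + struct.pack("!HH", HTTPS_RR_TYPE, CLASS_IN)
    return header + question


packet = build_https_query("example.com")
print(len(packet), packet.hex())
```

Sending these bytes over UDP to a resolver on port 53 and parsing the answer would be the next step; tools such as `host -t https` do both.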
For some reason, nobody’s written about what it takes to crawl a big chunk of the web in a while: the last point of reference I saw was Michael Nielsen’s post from 2012.
This crawl is HTML only (no JavaScript). The 2012 post doesn't call this out explicitly, but it presumably didn't handle JavaScript either:
HTML only. The elephant in the room. Even by 2017 much of the web had come to require JavaScript. But I wanted an apples-to-apples comparison with older web crawls, and in any case, I was doing this as a side project and didn’t have time to add and optimize a bunch of playwright workers.
Profiles showed that parsing was clearly the bottleneck, but I was using the same lxml parsing library that was popular in 2012 (as suggested by Gemini). I eventually figured out that it was because the average web page has gotten a lot bigger: metrics from a test run indicated the P50 uncompressed page size is now 138KB, while the mean is even larger at 242KB - many times larger than Nielsen’s estimated average of 51KB in 2012!
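As a quick check on "many times larger", the ratios can be worked out from the three figures quoted above. This is only arithmetic on the numbers in the passage; the variable names are made up for the example.

```python
# Figures quoted in the passage, in KB.
nielsen_2012_avg_kb = 51
median_kb = 138  # P50 uncompressed page size
mean_kb = 242    # mean uncompressed page size

median_ratio = median_kb / nielsen_2012_avg_kb
mean_ratio = mean_kb / nielsen_2012_avg_kb
skew = mean_kb / median_kb  # mean well above median suggests a long tail of large pages

print(f"median vs 2012: {median_ratio:.1f}x")
print(f"mean vs 2012:   {mean_ratio:.1f}x")
print(f"mean / median:  {skew:.2f}")
```

So the median page is a bit under 3 times the 2012 estimate and the mean is close to 5 times.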
I switched from lxml to selectolax, a much newer library wrapping Lexbor, a modern parser in C designed specifically for HTML5. The page claimed it can be 30 times faster than lxml. It wasn’t 30x overall, but it was a huge boost.
The other one is the HTTPS handshake overhead:
That said, one part of fetching got harder: a LOT more websites use SSL now than a decade ago. This was crystal clear in profiles, with SSL handshake computation showing up as the most expensive function call, taking up a whopping 25% of all CPU time on average, which, given that we weren’t near saturating the network pipes, meant fetching became bottlenecked by the CPU before the network!
One year from now, with the release of Chrome 154 in October 2026, we will change the default settings of Chrome to enable “Always Use Secure Connections”. This means Chrome will ask for the user's permission before the first access to any public site without HTTPS.
Since v3, a Tor onion service address is itself the public key (I mentioned this in "The retirement plan for Onion Service version 2"), so it can hold the full 256-bit ed25519 public key directly:
The most obvious difference between V2 and V3 onion services is the different address format. V3 onion addresses have 56 characters instead of 16 (because they contain a full ed25519 public key, not just the hash of a public key), meaning that migrating from V2 to V3 requires all users to learn/remember/save a new onion address.
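The 56 characters can be accounted for with a short sketch. It follows the v3 address layout as I understand it from Tor's rendezvous specification: the 32-byte public key, a 2-byte SHA3-256 checksum and a 1-byte version are concatenated into 35 bytes and base32-encoded, and 280 bits divided by 5 bits per character gives 56. The function names are hypothetical, and the key below is a placeholder byte string, not a real ed25519 key.

```python
import base64
import hashlib

ONION_VERSION = b"\x03"
CHECKSUM_PREFIX = b".onion checksum"


def v3_onion_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 onion address."""
    if len(pubkey) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    checksum = hashlib.sha3_256(CHECKSUM_PREFIX + pubkey + ONION_VERSION).digest()[:2]
    raw = pubkey + checksum + ONION_VERSION  # 32 + 2 + 1 = 35 bytes
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


def pubkey_from_v3_address(address: str) -> bytes:
    """Recover the public key from a v3 onion address, verifying the checksum."""
    label = address[:-len(".onion")] if address.endswith(".onion") else address
    raw = base64.b32decode(label.upper())
    if len(raw) != 35:
        raise ValueError("not a v3 onion address")
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    expected = hashlib.sha3_256(CHECKSUM_PREFIX + pubkey + version).digest()[:2]
    if version != ONION_VERSION or checksum != expected:
        raise ValueError("bad version or checksum")
    return pubkey


placeholder_key = bytes(range(32))  # placeholder bytes, NOT a real key
address = v3_onion_address(placeholder_key)
print(address, len(address) - len(".onion"))
```

Because the key is recoverable from the address itself, a client can verify the service's identity without any external lookup, which is what the hash-based V2 format could not offer.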
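Still, putting a TLS certificate on an onion service has plenty of benefits. The first one that comes to mind is that browsers expose many APIs only when the page is served over https://: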
Some browser features are available only with HTTPS, like Secure Contexts, Content Security Policy (CSP), Secure cookies, WebAuthn, WebRTC and PaymentRequest.
Allows for the usage of HTTP/2, since some browsers only support it if on HTTPS. In the future, HTTP2 and HTTP3 may only work with TLS, and thus valid certificates.
This time it is the 2025 report, "Debian Bug report logs - #1110370 stardict-plugin: CVE-2025-55014: YouDao plugin sends the user's selection from other apps to Chinese servers".
It's clearly a defensive excuse, as it is extremely unrealistic to expect final users to read all the docs of all the dependencies of a Linux distro. It's the responsibility of the maintainer to read the subset of docs relevant to the package(s) they're contributing, not the user's.
It could be that they were caught with their pants down and posted an ill-thought response, but I'd lean strongly towards malice with such a poor defense, it borders on confession. Clipboards are one of the most critical privacy/security features, you don't ever want to leak them unintentionally.
> It's the responsibility of the maintainer to read the subset of docs relevant to the package(s) they're contributing, not the user's.
I agree a lot with this. You're supposed to trust your distribution's packages. If you can't trust your distro, who can you trust? If you don't, find one you do trust, as that's a viable alternative. If none are trustworthy to you, then the only real option is to become your own package maintainer and have fun with Linux From Scratch.
Using a distribution (Debian in this case) means you have to trust it, and this incident shows that one of its official package maintainers handled a privacy issue report with an attitude that already amounts to malicious behavior.
$ curl -s https://aws.amazon.com/blogs/networking-and-content-delivery/boost-application-performance-amazon-cloudfront-enables-https-record/ | grep -io '[0-9a-z]*\.cloudfront\.net' | sort -u | xargs -n1 host -t https
d1d1et6laiqoh9.cloudfront.net has no HTTPS record
d1fgizr415o1r6.cloudfront.net has no HTTPS record
d1hemuljm71t2j.cloudfront.net has no HTTPS record
d1le29qyzha1u4.cloudfront.net has no HTTPS record
d1oqpvwii7b6rh.cloudfront.net has no HTTPS record
d1vo51ubqkiilx.cloudfront.net has no HTTPS record
d1yyh5dhdgifnx.cloudfront.net has no HTTPS record
d2908q01vomqb2.cloudfront.net has no HTTPS record
d2a6igt6jhaluh.cloudfront.net has no HTTPS record
d2cpw7vd6a2efr.cloudfront.net has no HTTPS record
d36cz9buwru1tt.cloudfront.net has no HTTPS record
d3borx6sfvnesb.cloudfront.net has no HTTPS record
d3ctxlq1ktw2nl.cloudfront.net has no HTTPS record
d3h2ozso0dirfl.cloudfront.net has no HTTPS record
d7umqicpi7263.cloudfront.net has no HTTPS record
dftu77xade0tc.cloudfront.net has no HTTPS record
dgen8gghn3u86.cloudfront.net has no HTTPS record
dk261l6wntthl.cloudfront.net has no HTTPS record
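I wanted to find a CloudFront distribution that actually has an HTTPS record to look at, but I couldn't seem to find any... Instead, I noticed that the length of CloudFront distribution hostnames seems to have changed?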
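The length difference is easy to see by grouping hostnames by the length of their first label. This is a minimal sketch using three of the hostnames from the output above; the helper name `group_by_label_length` is made up for the example.

```python
from collections import defaultdict


def group_by_label_length(hostnames):
    """Group hostnames by the length of their leftmost DNS label."""
    groups = defaultdict(list)
    for name in hostnames:
        label = name.split(".", 1)[0]
        groups[len(label)].append(name)
    return dict(groups)


# Three hostnames taken from the host(1) output above.
sample = [
    "d1d1et6laiqoh9.cloudfront.net",
    "d7umqicpi7263.cloudfront.net",
    "dk261l6wntthl.cloudfront.net",
]

for length, names in sorted(group_by_label_length(sample).items()):
    print(length, names)
```

Feeding it the full `sort -u` output instead of the sample would show how many distributions fall into each length.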
And we negotiated a deal where Microsoft and Netscape would both support the IETF taking over the protocol and standardizing it in an open process, which led to me editing the RFC.
As a part of the horsetrading, we had to make some changes to SSL 3.0 (so it wouldn't look like the IETF was just rubberstamping Netscape's protocol), and we had to rename the protocol (for the same reason). And thus was born TLS 1.0 (which was really SSL 3.1). And of course, now, in retrospect, the whole thing looks silly.