ECH is now on the Standards Track

I saw that ECH has become a Standards Track document: "RFC 9849 TLS Encrypted Client Hello (via)".

Browsers all enabled it by default in the second half of 2023 (Firefox 119 on 2023/10/24, Chromium 117 on 2023/09/12).

ECH is enabled in Firefox by default since version 119, and is recommended by Mozilla to be used along with DNS over HTTPS. In September 2023, Chromium version 117 (used in Google Chrome, Microsoft Edge, Samsung Internet, and Opera) enabled it by default, also requiring keys to be deployed in HTTPS resource records in DNS.

On the server side, support also seems to have landed recently: nginx's "Encrypted Client Hello Comes to NGINX" and Caddy's "Automatic HTTPS".

This counts as infrastructure work, further shrinking what can be detected on the wire. Going from ESNI to ECH has taken some five or six years.

The cost of crawling a billion pages in 2025

An article I saw last week: the author crawled 1B pages on AWS in just 25.5 hours, and after performance tuning the cost came to US$462: "Crawling a billion web pages in just over 24 hours, in 2025 (via)".

The article references a 2012 post, "How to crawl a quarter billion webpages in 40 hours", which took about 39.5 hours and US$580 to crawl 250M pages:

For some reason, nobody’s written about what it takes to crawl a big chunk of the web in a while: the last point of reference I saw was Michael Nielsen’s post from 2012.

This run was HTML only (no JavaScript); the 2012 article didn't specifically mention it, but presumably it ignored JavaScript too:

HTML only. The elephant in the room. Even by 2017 much of the web had come to require JavaScript. But I wanted an apples-to-apples comparison with older web crawls, and in any case, I was doing this as a side project and didn’t have time to add and optimize a bunch of playwright workers.

A few interesting problems come up. One is that parsing is actually very CPU-hungry and becomes one of the bottlenecks, mainly because pages are much bigger now than in 2012, with both the median and the mean far larger:

Profiles showed that parsing was clearly the bottleneck, but I was using the same lxml parsing library that was popular in 2012 (as suggested by Gemini). I eventually figured out that it was because the average web page has gotten a lot bigger: metrics from a test run indicated the P50 uncompressed page size is now 138KB, while the mean is even larger at 242KB - many times larger than Nielsen’s estimated average of 51KB in 2012!

His fix was to switch parsing libraries, from lxml to selectolax, which gave a huge performance boost:

I switched from lxml to selectolax, a much newer library wrapping Lexbor, a modern parser in C++ designed specifically for HTML5. The page claimed it can be 30 times faster than lxml. It wasn’t 30x overall, but it was a huge boost.
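The swap itself only touches the extraction layer. As a rough stdlib-only sketch (using Python's built-in html.parser as a stand-in; this is not the author's selectolax code), the step being optimized is essentially link extraction over each fetched page:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href attributes from <a> tags while streaming through HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str) -> list[str]:
    """Return all outgoing link targets found in an HTML document."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links
```

With selectolax the equivalent is roughly `HTMLParser(html).css("a")` per its documentation; the speedup comes from Lexbor doing the same traversal in C.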

The other one is HTTPS handshake overhead:

That said, one part of fetching got harder: a LOT more websites use SSL now than a decade ago. This was crystal clear in profiles, with SSL handshake computation showing up as the most expensive function call, taking up a whopping 25% of all CPU time on average, which - given that we weren’t near saturating the network pipes, meant fetching became bottlenecked by the CPU before the network!

Chrome's telemetry shows how quickly HTTPS adoption grew; the important milestone here is October 2015, when Let's Encrypt had its intermediate cross-signed by IdenTrust so that existing browsers would trust Let's Encrypt certificates.

So with today's hardware and techniques, acquiring the raw data is no longer much of a problem (it was already feasible back in 2012, even...).

Chrome is switching to HTTPS-only by default too

Per "HTTPS by default (via)", the release one year from now (2026/10) will default to HTTPS only:

One year from now, with the release of Chrome 154 in October 2026, we will change the default settings of Chrome to enable “Always Use Secure Connections”. This means Chrome will ask for the user's permission before the first access to any public site without HTTPS.

Let's Encrypt issued its first certificate on 2015/09/14, so it took roughly eleven years for a mainstream browser to turn on HTTPS-only by default.

httpjail, which MITMs HTTPS connections for you

httpjail is a tool that has been sitting in my tabs for quite a while. It can MITM and intercept HTTPS connections, handling them according to configured rules or the logic of a JavaScript script you hook in.

It currently supports macOS and Linux, each with different limitations.

It looks like it saves you the work of setting up your own CA and isolated environment.

The project is moving fast; between the last time I looked and now, quite a bit of documentation has been added?

Serving a website and an HTTPS proxy on the same port with Caddy

The goal is to use port 443 for both an HTTPS website and an HTTPS proxy (i.e. the browser-to-proxy leg is encrypted), where the HTTPS proxy is Squid. I originally did this with nginx (using ngx_stream_ssl_preread_module's ssl_preread_server_name to determine the hostname), but ran into nginx being unable to decide dynamically whether to enable proxy_protocol: "How to set the proxy_protocol to 'on' in a conditional manner in nginx?". Not sure whether that's a bug or a feature.

The second problem is that Squid's proxy protocol support is enabled via require-proxy-header on http_port, and https_port doesn't support that parameter. In other words, in this setup, if you still want Squid to log the correct IP addresses, nginx has to decrypt the TLS and hand the connection to Squid as plain HTTP proxy traffic, i.e. TLS termination.

The last problem was that the "website vhost" on the nginx side logged the wrong IP address, always 127.0.0.1. That issue sat unsolved for over a year and I gave up on it. Then I happened to notice my Raspberry Pi 3B was running the 32-bit OS, so I figured that while reinstalling with the 64-bit OS I'd switch to Caddy.

On the Caddy side this is currently done with the github.com/mholt/caddy-l4 module, which implements "read the TLS SNI and decide what to do next". I built and installed Caddy with xcaddy; the configuration needed looks like this:

        layer4 {
                :443 {
                        @proxytwhinet tls sni proxy-tw-hinet.gslin.com
                        route @proxytwhinet {
                                tls {
                                        connection_policy {
                                                alpn http/1.1
                                        }
                                }
                                proxy {
                                        proxy_protocol v1
                                        upstream 127.0.0.1:3128
                                }
                        }

                        route {
                                proxy {
                                        proxy_protocol v1
                                        upstream 127.0.0.1:444
                                }
                        }
                }
        }
        servers 127.0.0.1:444 {
                listener_wrappers {
                        proxy_protocol {
                                allow 127.0.0.1/32
                                fallback_policy require
                        }
                        tls
                }
        }

Connections for proxy-tw-hinet.gslin.com are sent to 127.0.0.1:3128; everything else goes to 127.0.0.1:444.

One special bit here is forcing HTTP/1.1 for proxy-tw-hinet.gslin.com. This is because Squid's http_port doesn't speak HTTP/2, and when Firefox connects to the HTTPS proxy and sees Caddy advertise h2 via ALPN, the connection fails; so for this domain I only accept HTTP/1.1.

Next, the website side: 127.0.0.1:444 needs to consume the client IP information carried by the proxy protocol.
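For context, PROXY protocol v1 (what `proxy_protocol v1` above emits) is just a single ASCII line prepended to the TCP stream before the real payload. A minimal sketch of what the receiving side parses (a hypothetical helper, not Caddy or Squid code):

```python
def parse_proxy_v1(header: bytes):
    """Parse a PROXY protocol v1 line, e.g.
    b'PROXY TCP4 203.0.113.7 10.0.0.1 51234 443\r\n',
    and return (src_ip, src_port) of the real client."""
    line, _, _payload = header.partition(b"\r\n")
    parts = line.decode("ascii").split(" ")
    if parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    if parts[1] == "UNKNOWN":
        # Sender couldn't determine the client; no address info follows.
        return None
    _, _proto, src_ip, _dst_ip, src_port, _dst_port = parts
    return src_ip, int(src_port)
```

This is why the 127.0.0.1:444 listener can still log the original client address even though every TCP connection it sees comes from localhost.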

The actual per-site definitions look like this, with unrelated settings removed:

proxy-tw-hinet.gslin.com:444 {
        tls [email protected]
        [...]
}

rent-hinet.gslin.com:444 {
        tls [email protected]
        [...]
}

Then on the Squid side, it's mainly require-proxy-header plus declaring which source IP addresses are considered legitimate:

http_port 3128 require-proxy-header
proxy_protocol_access allow localhost

All that was originally missing was logging the correct IP address; getting there turned into a surprisingly big pile of work, but it's finally all sorted out...

TLS certificates for Tor's .onion (Onion Services)

Saw "Certificates for Onion Services (torproject.org)" on the Hacker News front page, about the need to obtain TLS certificates for Tor's onion services (hidden services): "Certificates for Onion Services".

I wrote about this four years ago in "Letting Tor's .onion support HTTPS"; it doesn't look like there has been much progress since?

Since v3, a Tor onion service's address is itself the public key (mentioned in my post on the Onion Service v2 retirement plan); it directly embeds the 256-bit ed25519 public key:

The most obvious difference between V2 and V3 onion services is the different address format. V3 onion addresses have 56 characters instead of 16 (because they contain a full ed25519 public key, not just the hash of a public key), meaning that migrating from V2 to V3 requires all users to learn/remember/save a new onion address address.
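The derivation is short. A sketch following the encoding described in Tor's rend-spec-v3 (base32 over pubkey, a two-byte checksum, and a version byte; the function name here is mine):

```python
import base64
import hashlib

def onion_v3_address(pubkey: bytes) -> str:
    """Derive a v3 .onion address from a 32-byte ed25519 public key:
    base32(pubkey | checksum | version), per Tor's rend-spec-v3."""
    assert len(pubkey) == 32
    version = b"\x03"
    # Checksum is the first two bytes of SHA3-256(".onion checksum" | pubkey | version).
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    # 35 bytes -> exactly 56 base32 characters, no padding.
    addr = base64.b32encode(pubkey + checksum + version).decode("ascii").lower()
    return addr + ".onion"
```

The 35 encoded bytes come out to exactly the 56 characters mentioned in the quote above, which is why a v3 address can be self-authenticating: the name *is* the key.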

But putting a TLS certificate on top still has plenty of benefits. The first that comes to mind is that many browser APIs are only available over https://:

Some browser features are available only with HTTPS, like Secure Contexts, Content Security Policy (CSP), Secure cookies, WebAuthn, WebRTC and PaymentRequest.

Another is that while HTTP/2 supports a plaintext mode "in the spec", implementations only support it over HTTPS, and HTTP/2 is quite a bit faster than HTTP/1.1:

Allows for the usage of HTTP/2, since some browsers only support it if on HTTPS. In the future, HTTP2 and HTTP3 may only work with TLS, and thus valid certificates.

Both of these would help applications running over Tor a lot; let's see whether this round of discussion pushes things forward...

StarDict by default sends clipboard contents over HTTP (not HTTPS) to servers in China

A story that made the rounds a few days ago: a default installation of StarDict sends clipboard contents over HTTP to servers in China: "StarDict sends X11 clipboard to remote servers (lwn.net)". The article is LWN paid content, so the link is a shared SubscriberLink: "StarDict sends X11 clipboard to remote servers".

The whole mailing list discussion is at "Debian Bug report logs - #1110370 stardict-plugin: CVE-2025-55014: YouDao plugin sends the user's selection from other apps to Chinese servers"; it isn't long, but quite a bit of information comes out of it.

One point that emerges: this isn't the first time for this package:

You can see the same kind of problem happening over and over.

The other problem is the current maintainer, as raised in id=44879832:

It's clearly a defensive excuse, as it is extremely unrealistic to expect final users to read all the docs of all the dependencies of a Linux distro. It's the responsibility of the maintainer to read the subset of docs relevant to the package(s) they're contributing, not the user's.

It could be that they were caught with their pants down and posted an ill-thought response, but I'd lean strongly towards malice with such a poor defense, it borders on confession. Clipboards are one of the most critical privacy/security features, you don't ever want to leak them unintentionally.

id=44889789 describes the trust relationship more clearly:

> It's the responsibility of the maintainer to read the subset of docs relevant to the package(s) they're contributing, not the user's.

I agree a lot with this. You're supposed to trust your distributions packages. If you can't trust your distro, who can you trust? If you don't, find one you do trust, as that's a viable alternative. If none are trustworthy to you, then the only real option is to become your own package maintainer and have fun with Linux From Scratch.

Using a distribution (Debian here) means you have to trust that distribution, and this incident shows that one of its official package maintainers handled a privacy issue report with an attitude that amounts to malicious behavior.

Only now that things have blown up is there a "plan" to split it out in 3.1 (per a reply on 2025/08/09). A week has already passed, and https://packages.debian.org/search?keywords=stardict still shows no 3.1 at all; it will probably take a lot more prodding before it ships.

nginx is getting native ACME HTTP-01 support too

Previously, getting a TLS certificate for nginx via the ACME protocol (e.g. from Let's Encrypt) required a separate program, such as Certbot (the officially recommended one) or Dehydrated (my personal favorite). Now nginx has announced that the web server itself can request a TLS certificate via HTTP-01: "NGINX Introduces Native Support for ACME Protocol".

Before this, the web server that could request certificates directly over HTTP-01 was Caddy. Its early binaries were non-open-source (though the source itself was released under the Apache License 2.0), and by 2019 everything moved to the Apache License 2.0: "Caddy Server going open source...". Plenty of people started trying Caddy after that... though I think I only jumped in this year?

Back to the nginx configuration: the current official example looks like this. It still leaks implementation details (the cache setting), and one part looks odd: I'm not sure what's going on with server_name:

server {
    listen 443 ssl;

    server_name .example.com;

    acme_certificate letsencrypt;

    ssl_certificate       $acme_certificate;
    ssl_certificate_key   $acme_certificate_key;
    ssl_certificate_cache max=2;
}

Overall this is moving in the right direction. Give the settings a few more releases to shake out, and with DNS-01 validation also on the roadmap, wildcard domains will work too; it should get a lot more useful.

CloudFront's HTTPS record support

Earlier this month AWS announced HTTPS record support in CloudFront, claiming it speeds up HTTPS connections: "Boost application performance: Amazon CloudFront enables HTTPS record".

The speedup here mostly comes from HTTP/3: the traditional flow first connects over TCP with HTTP/2 or HTTP/1.1 and only switches to HTTP/3 after seeing the Alt-Svc header.

The main use of HTTPS records right now is letting the browser learn at DNS-query time that HTTP/3 is available, without waiting for the Alt-Svc header. For example, www.google.com has an HTTPS record:

$ host -t https www.google.com
www.google.com has HTTPS record 1 . alpn="h2,h3"
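The `alpn` parameter in that record is what lets a client attempt h3 on the very first connection. A small sketch (hypothetical helper, not a real resolver API) of extracting it from the presentation format shown above:

```python
def parse_alpn(https_record: str) -> list[str]:
    """Extract the alpn protocol list from an HTTPS record in
    presentation format, e.g. '1 . alpn="h2,h3"'."""
    for param in https_record.split():
        if param.startswith("alpn="):
            value = param[len("alpn="):].strip('"')
            return value.split(",")
    return []  # no alpn advertised; fall back to TCP-based discovery
```

A client seeing "h3" in that list can open a QUIC connection immediately instead of upgrading later.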

Whether HTTP/3 is actually faster is still debated, and with firewall-triggered fallback in the mix, calling it a "boost" is a bit of a stretch. Treat it as marketing copy... at least that's how the camp that invented HTTP/3 pitches it.

But none of the CloudFront distributions used by this very AWS blog post seem to have it turned on:

$ curl -s https://aws.amazon.com/blogs/networking-and-content-delivery/boost-application-performance-amazon-cloudfront-enables-https-record/ | grep -io '[0-9a-z]*\.cloudfront\.net' | sort -u | xargs -n1 host -t https
d1d1et6laiqoh9.cloudfront.net has no HTTPS record
d1fgizr415o1r6.cloudfront.net has no HTTPS record
d1hemuljm71t2j.cloudfront.net has no HTTPS record
d1le29qyzha1u4.cloudfront.net has no HTTPS record
d1oqpvwii7b6rh.cloudfront.net has no HTTPS record
d1vo51ubqkiilx.cloudfront.net has no HTTPS record
d1yyh5dhdgifnx.cloudfront.net has no HTTPS record
d2908q01vomqb2.cloudfront.net has no HTTPS record
d2a6igt6jhaluh.cloudfront.net has no HTTPS record
d2cpw7vd6a2efr.cloudfront.net has no HTTPS record
d36cz9buwru1tt.cloudfront.net has no HTTPS record
d3borx6sfvnesb.cloudfront.net has no HTTPS record
d3ctxlq1ktw2nl.cloudfront.net has no HTTPS record
d3h2ozso0dirfl.cloudfront.net has no HTTPS record
d7umqicpi7263.cloudfront.net has no HTTPS record
dftu77xade0tc.cloudfront.net has no HTTPS record
dgen8gghn3u86.cloudfront.net has no HTTPS record
dk261l6wntthl.cloudfront.net has no HTTPS record

I went looking for a CloudFront distribution that does have an HTTPS record and couldn't find one... What I did notice instead is that CloudFront distribution hostnames seem to have changed length?

Also, something felt off while reading: the images in the article use Comic Sans? (Comic Neue?)

Hmm, so much to nitpick...

How SSL came to be named TLS

Saw "Security Standards and Name Changes in the Browser Wars" on Lobsters, a 2014 article explaining how the encryption protocol's name went from SSL to TLS.

Netscape's SSLv2 became very popular but turned out to have quite a few problems, so Microsoft built its own PCT to compete. SSLv3 then fixed many of the issues, and given Netscape's lead at the time, IE supported SSLv3 anyway.

Then the elders, seeing a possible fork, got the key people into a room to reach consensus and let the IETF take the lead:

And we negotiated a deal where Microsoft and Netscape would both support the IETF taking over the protocol and standardizing it in an open process, which led to me editing the RFC.

SSLv3 had already fixed the known problems, but for political reasons they made deliberate small tweaks to SSLv3 and renamed it TLS:

As a part of the horsetrading, we had to make some changes to SSL 3.0 (so it wouldn't look the IETF was just rubberstamping Netscape's protocol), and we had to rename the protocol (for the same reason). And thus was born TLS 1.0 (which was really SSL 3.1). And of course, now, in retrospect, the whole thing looks silly.

That's a story from almost thirty years ago...