The current state of RDAP replacing WHOIS

I came across "The current state of RDAP", which covers how RDAP is replacing WHOIS. One chart in the post tells most of the story.

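With WHOIS officially slated for retirement, and RDAP being much easier to scrape (and possibly carrying more data than WHOIS?), the trend is not much of a surprise.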

What is left to push now is probably the ccTLD side?
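To illustrate why RDAP is easier to scrape than free-form WHOIS text, here is a minimal sketch that pulls a few fields out of an RDAP domain object. The field names (ldhName, status, events, nameservers) follow the RDAP JSON format in RFC 9083; the sample response itself is made up and trimmed for illustration, and summarize_domain is a hypothetical helper.

```python
import json

# Made-up, trimmed RDAP domain response. Field names follow RFC 9083,
# but the values are illustrative only.
SAMPLE_RDAP_RESPONSE = """
{
  "objectClassName": "domain",
  "ldhName": "example.com",
  "status": ["client transfer prohibited"],
  "events": [
    {"eventAction": "registration", "eventDate": "2001-01-01T00:00:00Z"},
    {"eventAction": "expiration", "eventDate": "2031-01-01T00:00:00Z"}
  ],
  "nameservers": [
    {"objectClassName": "nameserver", "ldhName": "ns1.example.net"},
    {"objectClassName": "nameserver", "ldhName": "ns2.example.net"}
  ]
}
"""


def summarize_domain(rdap_json: str) -> dict:
    """Extract a few commonly wanted fields from an RDAP domain object."""
    obj = json.loads(rdap_json)
    # Index events by action so lookups do not depend on array order.
    events = {e["eventAction"]: e["eventDate"] for e in obj.get("events", [])}
    return {
        "domain": obj.get("ldhName"),
        "status": obj.get("status", []),
        "registered": events.get("registration"),
        "expires": events.get("expiration"),
        "nameservers": [ns["ldhName"] for ns in obj.get("nameservers", [])],
    }


summary = summarize_domain(SAMPLE_RDAP_RESPONSE)
print(summary)
```

Because the response is structured JSON, there is no per-registry text format to reverse-engineer, which is the main pain point when scraping WHOIS.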

Netflix's in-house and co-produced content released under a CC license

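I came across "NETFLIX OPEN CONTENT (via)": Netflix has released the original digital source files of seven titles under CC BY 4.0 so that people can use them for research, such as codec development.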

The site has been around for a while. The Internet Archive shows it was already up in 2020: "https://web.archive.org/web/20200718172531/https://opencontent.netflix.com/".

What I actually noticed first is that many of the links go through Google's tracking redirect, like this one:

https://www.google.com/url?q=https%3A%2F%2Fcreativecommons.org%2Flicenses%2Fby%2F4.0%2Flegalcode&sa=D&sntz=1&usg=AOvVaw3DDX6ldzWtAO5wOs5KkByf

Someone on Hacker News explained that this is because the site is hosted on Blogger, where the tracking feature is on by default (id=46432873):

It's because their blog is hosted on blogger.com (yeah, weird decision), which is owned by Google and does that by default.
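The real destination of such a link sits in the q query parameter, percent-encoded. A minimal sketch of unwrapping it with the standard library follows; unwrap_google_redirect is a hypothetical helper name, and it only handles the www.google.com/url form shown above.

```python
from urllib.parse import parse_qs, urlsplit

TRACKED = (
    "https://www.google.com/url?q=https%3A%2F%2Fcreativecommons.org"
    "%2Flicenses%2Fby%2F4.0%2Flegalcode&sa=D&sntz=1"
    "&usg=AOvVaw3DDX6ldzWtAO5wOs5KkByf"
)


def unwrap_google_redirect(url: str) -> str:
    """Return the target of a google.com/url redirect, or the URL unchanged."""
    parts = urlsplit(url)
    if parts.netloc != "www.google.com" or parts.path != "/url":
        return url
    # parse_qs percent-decodes values and returns a list per key.
    targets = parse_qs(parts.query).get("q")
    return targets[0] if targets else url


target = unwrap_google_redirect(TRACKED)
print(target)
```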

The frames from El Fuente (2013) looked familiar, probably because I ran into them through VMAF when I was testing codecs and tuning FFmpeg parameters.

The MongoDB secret leak that surfaced just before Christmas

I saw this in "Mongobleed PoC Exploit Tool Released for MongoDB Flaw that Exposes Sensitive Data". The problem comes from not handling length information correctly when using zlib.

The official ticket is "Make minimally sized buffers for uncompressed Messages", which links to the patches. The relevant change is a single line, for example commit 029d8f99bf1e828b5327946b9c820bf493f466f1 for the 8.x branch:

-    return {output.length()};
+    return {length};

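Unexpectedly, I also saw someone point out that the counterHitDecompress(input.length(), output.length()); call just above should be changed as well, but judging from src/mongo/transport/message_compressor_zlib.cpp on the master branch, that suggestion seems to have been ignored?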
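To make the one-line fix concrete, here is a small Python simulation of the difference between returning the buffer size and returning the number of bytes actually decompressed. This is not MongoDB's code. It assumes, based on the ticket title, that the output buffer is sized from a length the sender claims, and it uses a pre-filled bytearray as a stand-in for uninitialized memory. All names are hypothetical.

```python
import zlib


def decompress_into(compressed: bytes, output: bytearray) -> int:
    """Decompress into a preallocated buffer; return bytes actually written."""
    data = zlib.decompress(compressed)
    if len(data) > len(output):
        raise ValueError("output buffer too small")
    output[: len(data)] = data
    return len(data)


def buggy_message_size(compressed: bytes, output: bytearray) -> int:
    # Analogous to `return {output.length()};`: trusts the buffer size.
    decompress_into(compressed, output)
    return len(output)


def fixed_message_size(compressed: bytes, output: bytearray) -> int:
    # Analogous to `return {length};`: uses what was actually decompressed.
    return decompress_into(compressed, output)


# Stand-in for stale memory left in a reused, uninitialized buffer.
STALE = b"SECRET-TOKEN-FROM-EARLIER-REQUEST"
payload = zlib.compress(b"ping")  # real payload is only 4 bytes

buf = bytearray(STALE)  # buffer sized from the (too large) claimed length
leaked = bytes(buf[: buggy_message_size(payload, buf)])

buf = bytearray(STALE)
safe = bytes(buf[: fixed_message_size(payload, buf)])

print(leaked)
print(safe)
```

In the buggy variant the returned message still carries the stale bytes that follow the real payload, while the fixed variant stops at the 4 bytes that were decompressed.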

Uber starts selling riders' trip data to outside parties

News about Uber selling trip data: "Uber's latest play for ad dollars: turning data about your trips and takeout into insights for marketers (via)". Business Insider may be paywalled, so there is also a backup link.

The data is claimed to be de-identified:

It has partnered with LiveRamp to aggregate users' data without revealing their identities.

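But however de-identified the data is claimed to be, it can be pieced back together. There were two classic cases back in 2006: in the AOL search log release, individuals were identified from the search logs AOL published, and in the Netflix Prize case, individuals were identified by cross-referencing IMDb data.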
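The re-identification pattern in both cases is a linkage attack: join the "anonymous" records with an outside dataset on fields that look harmless by themselves. A toy sketch follows; every record, field name, and the reidentify helper are invented for illustration and do not describe Uber's or LiveRamp's actual data.

```python
from collections import defaultdict

# Invented "de-identified" trip records: no names, but quasi-identifiers remain.
ANON_TRIPS = [
    {"rider_id": "r-001", "home_zip": "94107", "usual_pickup": "08:05", "frequent_dropoff": "clinic"},
    {"rider_id": "r-002", "home_zip": "94107", "usual_pickup": "09:30", "frequent_dropoff": "gym"},
    {"rider_id": "r-003", "home_zip": "94110", "usual_pickup": "08:05", "frequent_dropoff": "office"},
]

# Invented outside data an attacker might already hold.
PUBLIC_PROFILES = [
    {"name": "Alice Example", "zip": "94107", "leaves_home": "08:05"},
    {"name": "Bob Example", "zip": "94110", "leaves_home": "08:05"},
]


def reidentify(trips, profiles):
    """Map pseudonymous rider IDs to names when the join key is unique."""
    by_key = defaultdict(list)
    for p in profiles:
        by_key[(p["zip"], p["leaves_home"])].append(p["name"])
    matches = {}
    for t in trips:
        names = by_key.get((t["home_zip"], t["usual_pickup"]), [])
        if len(names) == 1:  # only an unambiguous match counts
            matches[t["rider_id"]] = names[0]
    return matches


matches = reidentify(ANON_TRIPS, PUBLIC_PROFILES)
print(matches)
```

Once a pseudonymous ID maps to a name, every other field attached to that ID (here, the frequent drop-off) is exposed as well.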

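Once sales start, lawsuits will probably follow. Compared with the legal landscape of 2006, Europe now has GDPR, and the US has CCPA at least in California (where Uber is headquartered). On top of that, this time the data is being used directly for profit, so a court fight should confront these issues more head-on: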

Uber has said its ad business is on track to generate $1.5 billion in revenue this year.

IBM buys Confluent

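I remember seeing a rumor a few days ago that Confluent wanted to sell, and I just saw "IBM to Acquire Confluent (via)". Given IBM's attitude toward open source projects in recent years, this is not exactly good news.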

The biggest problems with the official Apache Kafka release are packaging and documentation. The Confluent distribution reduces both considerably, but its future is now very uncertain.

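There are not many other serious contenders in streaming. NATS is just about passable, and as it happens Jepsen has just published a new analysis of NATS, "NATS 2.12.1", which still shows a gap in maturity.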

Confluent looking to sell itself?

On Lobsters I saw "Exclusive: Data streaming software maker Confluent explores sale, sources say", a report that Confluent wants to sell itself? This is the main company behind Apache Kafka.

It looks like buyers have already shown interest:

The software provider is working with an investment bank on the sale process, which is in its early stages and was instigated after both private equity firms and other technology companies expressed their interest to the company in buying it, the sources said.

The company went public in 2021. Why it wants to sell is unclear.

An interactive visualization of zlib

I came across an interesting site, "flateview (via)". You can feed it arbitrary data, and it explains what each part of the zlib-compressed output means.

I took the first sentence of the Kalafina article on English Wikipedia and pasted it in:

Kalafina (カラフィナ) is a Japanese vocal group formed by composer Yuki Kajiura in 2007, mainly to perform theme songs for the anime The Garden of Sinners, but later expanded to include many other theme songs for other anime shows and films including the Puella Magi Madoka Magica, Black Butler, The Heroic Legend of Arslan and Fate/stay night franchises.

The result shows some interesting things: the Huffman coding information comes first, followed by LZ77 back-references.

If this topic is covered in class, it would make a pretty good learning tool?
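A few of the fields such a tool displays can also be read by hand. The sketch below compresses a shortened excerpt of the sentence with Python's zlib module and decodes the two-byte zlib header (RFC 1950) plus the first three bits of the first DEFLATE block (RFC 1951). It is only a starting point and not how flateview works internally; describe_zlib_stream is a hypothetical helper, and which block type the compressor picks depends on the input and the zlib build.

```python
import zlib

# Shortened ASCII excerpt of the sentence used above.
TEXT = (
    b"Kalafina is a Japanese vocal group formed by composer Yuki Kajiura "
    b"in 2007, mainly to perform theme songs for the anime "
    b"The Garden of Sinners"
)

BTYPE_NAMES = {0: "stored", 1: "fixed Huffman", 2: "dynamic Huffman", 3: "reserved"}


def describe_zlib_stream(data: bytes) -> dict:
    """Compress data and decode the zlib header and first block header."""
    stream = zlib.compress(data)
    cmf, flg = stream[0], stream[1]
    first = stream[2]  # first byte of the DEFLATE payload (no preset dictionary)
    return {
        "method": cmf & 0x0F,                        # 8 means DEFLATE
        "header_ok": (cmf * 256 + flg) % 31 == 0,    # FCHECK rule from RFC 1950
        "bfinal": first & 1,                         # bit 0: last-block flag
        "btype": BTYPE_NAMES[(first >> 1) & 0b11],   # bits 1-2: block type
        "raw_size": len(data),
        "compressed_size": len(stream),
        "stream": stream,
    }


info = describe_zlib_stream(TEXT)
print({k: v for k, v in info.items() if k != "stream"})
```

A "dynamic Huffman" block is the case where the code tables are stored at the front of the block, which matches the Huffman information seen at the start of the visualization.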

Anthropic (the company behind the Claude services) will train on your data by default after 2025/09/28

Anthropic has announced that it will start using your data to train its AI models: "Updates to Consumer Terms and Privacy Policy" (via).

This covers all individual plans, paid or not. By default, your data will be used for training:

These updates apply to users on our Claude Free, Pro, and Max plans, including when they use Claude Code from accounts associated with those plans. They do not apply to services under our Commercial Terms, including Claude for Work, Claude Gov, Claude for Education, or API use, including via third parties such as Amazon Bedrock and Google Cloud’s Vertex AI.

A notice pops up after you log in, and you can opt out there. I have already opted out, so I am using the official screenshot:

Afterwards, the corresponding setting can be found on the Privacy page:

Tesla deliberately withheld data after a crash

I saw this via "Tesla said it didn't have key data in a fatal crash, then a hacker found it (washingtonpost.com)". The original article is paywalled; the full text can be read at "Tesla said it didn’t have key data in a fatal crash. Then a hacker found it."

For a crash in 2019, Tesla said it did not have the data from the time of the crash:

Years after a Tesla driver using Autopilot plowed into a young Florida couple in 2019, crucial electronic data detailing how the fatal wreck unfolded was missing. The information was key for a wrongful death case the survivor and the victim’s family were building against Tesla, but the company said it didn’t have the data.

But it later turned out that the data had been deliberately "unlinked":

Immediately after the wreck at 9:14 p.m. on April 25, 2019, the crucial data detailing how it unfolded was automatically uploaded to the company’s servers and stored in a vast central database, according to court documents. Tesla’s headquarters soon sent an automated message back to the car confirming that it had received the collision snapshot.

Moments later, court records show, the data was just as automatically “unlinked” from the 2019 Tesla Model S at the scene, meaning the local copy was marked for deletion, a standard practice for Teslas in such incidents, according to court testimony.

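A later part of the article is clearer: once a Tesla detects a crash, the relevant data is immediately uploaded to Tesla, and Tesla's central system then tells the vehicle to delete its local record (the unlink mentioned above):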

Inside a Starbucks near the Miami airport, the plaintiffs’ attorneys watched as greentheonly fired up his ThinkPad computer and plugged in a flash drive containing a forensic copy of the Autopilot unit’s contents. Within minutes, he found key data that was marked for deletion — along with confirmation that Tesla had received the collision snapshot within moments of the crash — proving the critical information should have actually been accessible all along.

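After the plaintiffs brought in an outside expert who recovered the data from what they had obtained, Tesla changed its story and said the data was on its servers: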

Then a self-described hacker, enlisted by the plaintiffs to decode the contents of a chip they recovered from the vehicle, found it while sipping a Venti-size hot chocolate at a South Florida Starbucks. Tesla later said in court that it had the data on its own servers all along.

Oh...

Kagi passes 50k subscribers

On Hacker News I saw "Kagi Reaches 50k Users (kagi.com)". This was a target Kagi had set for itself.

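A while ago I tested it against DuckDuckGo and Brave Search, and the quality gap is large (Kagi is much better). That said, quite a few people suspect this is because SEO spammers have not targeted it yet, since it is still a niche service.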

Wikipedia happens to mention that the count was around 43k at the end of March:

Kagi had around 43,403 subscribed members as of March 28, 2025 and 845,200 searches were made that day.

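A little over two months later it has grown to 50k. Counting everyone on the unlimited plan, that is roughly $500k/mo in revenue (there are $5/mo, $10/mo, and $25/mo plans). I recall that at 20k they said revenue was only just enough to cover infrastructure: "Kagi 訂閱數量過兩萬" (Kagi subscriptions pass 20,000).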
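The back-of-the-envelope figure can be bracketed with the three plan prices. This sketch assumes every subscriber is on a single plan, which is certainly not the real mix, and it takes the $500k/mo figure as corresponding to the $10/mo price.

```python
SUBSCRIBERS = 50_000
PREVIOUS_COUNT = 43_403  # Wikipedia figure for March 28, 2025
PLAN_PRICES_USD = {"$5/mo": 5, "$10/mo": 10, "$25/mo": 25}

# Monthly revenue if all subscribers were on one plan (a simplifying assumption).
estimates = {plan: SUBSCRIBERS * price for plan, price in PLAN_PRICES_USD.items()}
growth = SUBSCRIBERS - PREVIOUS_COUNT

for plan, revenue in estimates.items():
    print(f"all on {plan}: ${revenue:,}/mo")
print(f"new subscribers since late March: {growth:,}")
```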

Last week's "Kagi status update: First three years" mentioned that there would be an announcement once they reached 50k:

As of writing this, we are at almost 50,000 customers! You know what that means - there will soon be a Kagi surprise!

So we just wait for news over the next few days?