Hey!

Hi, I’m FDKevin, a software engineer building practical tools and writing online.

You can simply call me FDKevin /ˌfʌk dɔ:g 'kevin/.

Find me on

GitHub

Posts

21 Mar 2026
New Year, New Blog
4 Mar 2026
XTEINK X4
16 Aug 2025
Vibe Coding Guide
23 Dec 2022
Sleepless
17 Oct 2022
Network Setup
18 Feb 2022
A New Blog
17 Nov 2018
Understanding Traditional Chinese Medicine
17 Nov 2018
China Real Estate Economics
17 Nov 2018
Power Supplies And Sound Quality
17 Nov 2018
Which Computer Should You Choose

Notes

TurboQuant
26/03/26, 00:00

We introduce a set of advanced theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines.

From TurboQuant: Redefining AI efficiency with extreme compression by Google Research.

The practical bit is the combination: PolarQuant handles most of the compression, then QJL spends a single residual bit to correct the bias left by the first stage. Google reports lossless or near-lossless KV-cache compression on long-context benchmarks, a memory reduction of at least 6x, and up to an 8x speedup in computing attention logits on H100s.
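
To make the "one residual bit" idea concrete, here is a minimal standard-library sketch. It is not Google's implementation. The coarse stage is a hypothetical stand-in (a plain uniform scalar quantizer, not PolarQuant). The correction follows the general quantized Johnson-Lindenstrauss idea: project the residual r = k - k_hat with a random Gaussian matrix S, keep only sign(S r) and ||r||, and estimate <q, r> as sqrt(pi/2) / m * ||r|| * <S q, sign(S r)>. That estimate is unbiased, so adding it to the coarse score <q, k_hat> removes the quantizer's bias from the attention logit on average. All function names are made up for this example.

```python
import math
import random


def uniform_quantize(x, bits):
    """Hypothetical stand-in for the coarse stage (NOT PolarQuant): snap each
    coordinate to one of 2**bits evenly spaced levels between min(x) and max(x)."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    return [lo + round((v - lo) / step) * step for v in x]


def qjl_encode(residual, proj):
    """Keep one sign bit per row of the Gaussian matrix `proj`, plus ||residual||."""
    signs = [1 if sum(s * r for s, r in zip(row, residual)) >= 0 else -1 for row in proj]
    return signs, math.sqrt(sum(r * r for r in residual))


def qjl_inner_product(query, signs, res_norm, proj):
    """Unbiased estimate of <query, residual>:
    sqrt(pi/2) / m * ||residual|| * sum_i sign_i * <s_i, query>."""
    acc = sum(b * sum(s * q for s, q in zip(row, query)) for b, row in zip(signs, proj))
    return math.sqrt(math.pi / 2) / len(proj) * res_norm * acc


def estimate_dot(query, key, bits, proj):
    """Attention-logit estimate: coarse score plus the 1-bit residual correction.
    Only the key is compressed; the query stays at full precision."""
    k_hat = uniform_quantize(key, bits)
    residual = [k - kh for k, kh in zip(key, k_hat)]
    signs, res_norm = qjl_encode(residual, proj)
    coarse = sum(q * kh for q, kh in zip(query, k_hat))
    return coarse + qjl_inner_product(query, signs, res_norm, proj)


if __name__ == "__main__":
    rng = random.Random(0)
    d = 64
    # m = d projection rows, i.e. one residual bit per dimension
    proj = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(d)]
    q = [rng.gauss(0.0, 1.0) for _ in range(d)]
    k = [rng.gauss(0.0, 1.0) for _ in range(d)]
    exact = sum(a * b for a, b in zip(q, k))
    print(f"exact {exact:.3f}, estimate {estimate_dot(q, k, 2, proj):.3f}")
```

A single call can still be off by a fair amount; the point is that the error averages out across many keys instead of pushing every logit in the same direction.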

It seems graphics card and memory prices might finally come down.

Attention Residuals
19/03/26, 00:00

Residual connections with PreNorm are standard in modern LLMs, yet they accumulate all layer outputs with fixed unit weights. This uniform aggregation causes uncontrolled hidden-state growth with depth, progressively diluting each layer’s contribution.

From Attention Residuals by Kimi Team.

Impressive.
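
The quoted problem is easy to reproduce in a toy simulation. With PreNorm, each block adds its output to the residual stream with weight 1, h_l = h_{l-1} + f_l(LN(h_{l-1})), so after L layers h_L = h_0 + f_1 + ... + f_L. The sketch below assumes, purely for illustration, that every layer output is an independent random unit vector. Under that assumption E||h_L||^2 = L + 1, so the norm grows roughly like sqrt(L + 1) while the newest layer's share shrinks at the same rate. The `softmax_depth_aggregate` function is a hypothetical contrast showing why weights that sum to 1 keep the magnitude bounded. It is not the paper's exact formulation, and all names here are made up.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Toy stand-in for one layer's output: a random direction with norm 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def l2norm(v):
    return math.sqrt(sum(x * x for x in v))


def unit_weight_residual(num_layers, dim, seed=0):
    """Standard residual stream: h_l = h_{l-1} + f_l, every output added with
    weight 1. Returns (||h_l||, newest layer's share ||f_l|| / ||h_l||) per layer."""
    rng = random.Random(seed)
    h = random_unit_vector(dim, rng)  # h_0, the embedding
    stats = []
    for _ in range(num_layers):
        f = random_unit_vector(dim, rng)  # toy output of this layer
        h = [a + b for a, b in zip(h, f)]
        n = l2norm(h)
        stats.append((n, 1.0 / n))  # ||f_l|| == 1, so its share is 1 / ||h_l||
    return stats


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_depth_aggregate(outputs, logits):
    """Hypothetical alternative: mix earlier outputs with softmax weights that sum
    to 1. The result is a convex combination, so its norm cannot exceed the
    largest input norm, however deep the stack gets."""
    weights = softmax(logits)
    dim = len(outputs[0])
    return [sum(w * v[i] for w, v in zip(weights, outputs)) for i in range(dim)]


if __name__ == "__main__":
    for layer, (n, share) in enumerate(unit_weight_residual(64, 256), start=1):
        if layer in (1, 4, 16, 64):
            print(f"layer {layer:2d}: ||h|| = {n:5.2f}, newest layer share = {share:.3f}")
```

Real layer outputs are neither independent nor unit-norm, so this only illustrates the direction of the effect, not its size in an actual model.
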
ThinkingOnLLM
21/02/23, 16:00

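Several recent large machine-learning models seem to show that research at this scale now costs more than existing research projects can bear. The future model may look more like this: big capital first pours massive resources into product development and applications, and research institutions then follow along to dig out the underlying theory.

A Bell Labs for a new era?

Or perhaps the state will concentrate its resources and bring research institutions and companies together to tackle the problem jointly.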

The latest ChatGPT model probably needs about 10 A100s (800 GB of GPU memory) to run, and the graphics cards alone cost hundreds of thousands.
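
The 800 GB figure is simple arithmetic: the A100 comes in 40 GB and 80 GB versions, and 10 x 80 GB = 800 GB. The hypothetical helper below turns a parameter count and bytes per parameter into weight memory and a minimum A100 count, for back-of-the-envelope estimates only. ChatGPT's parameter count has not been published, so the example assumes GPT-3's published 175 billion parameters stored in fp16 (2 bytes each). Real serving also needs memory for the KV cache and activations, which the optional `overhead` factor only crudely stands in for.

```python
import math

# Assumption: the 80 GB A100 variant (a 40 GB variant also exists).
A100_MEMORY_GB = 80


def gpus_needed(params_billions, bytes_per_param, overhead=0.0):
    """Hypothetical back-of-the-envelope helper.

    1e9 parameters at N bytes each is N * 1e9 bytes, i.e. about N GB, so
    params_billions * bytes_per_param approximates weight memory in GB.
    `overhead` is a fractional allowance for KV cache and activations.
    Returns (total_gb, minimum number of A100s).
    """
    weights_gb = params_billions * bytes_per_param
    total_gb = weights_gb * (1.0 + overhead)
    return total_gb, math.ceil(total_gb / A100_MEMORY_GB)


if __name__ == "__main__":
    # The note's figure: 10 cards x 80 GB each.
    print("10 x A100:", 10 * A100_MEMORY_GB, "GB")
    # Assumed example: GPT-3's published 175B parameters in fp16.
    print(gpus_needed(175, 2))
```

With those assumptions, `gpus_needed(175, 2)` returns `(350.0, 5)`: five cards just to hold the weights, before any room for the KV cache.
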

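Hasn't the pharmaceutical industry been in this state for a long time already?

A simplified model of the research market: product R&D cycles grow longer, costs and risks keep rising, and the payback period on investment stretches out. Small companies start to lose the ability to fund R&D,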