I am an incoming computer science PhD student, working with Professor Junchen Jiang, Dr. Yuhan Liu, and Dr. Liangcheng Yu. I am interested in building systems that support real-world machine learning workloads; my current focus is optimizing KV caches for LLM inference.

Education


Selected Publications

For a complete list of publications, please visit my Google Scholar profile.

EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving

Shaoting Feng*, Yuhan Liu*, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang

arXiv

pdf


LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

Yuhan Liu*, Yihua Cheng*, Jiayi Yao*, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Rui Zhang, Kuntai Du, Junchen Jiang

arXiv

pdf | code


DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse

NSDI’26

pdf


AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving

Shaoting Feng*, Hanchen Li*, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang

SOSP workshop BigMem’25

pdf | code | slides


METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang

SOSP’25

pdf | poster


Presentations

Run Multi-Modality Models with LMCache

  • SIGCOMM 2025 Full-day Tutorial: Networking for Stateful LLM Inference [slides] [video], Sep. 2025
Online


GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory

  • APWeb-WAIM 2024 [slides], Aug. 2024
Jinhua, Zhejiang, China


Experience

TensorMesh, Inc. - Engineering Intern

June 2025 - May 2026

  • Impact: Widely used in enterprise settings (e.g., NVIDIA, IBM Cloud), handling >300 TB of KV cache data and 1.28 billion hit tokens weekly.
  • Contributed 74 commits (+11,164 / −3,251 LOC), ranking 5th among all contributors.
  • Developed prefill-decode disaggregation to reduce tail latency, achieving 20× faster KV cache transmission than vLLM.
  • Developed dynamic CPU offloading for jointly managing GPU and CPU memory, achieving 2.29× TTFT improvement over vLLM.
  • Developed multimodal KV cache offloading to accelerate image, video, and audio inference, achieving 5.49× TTFT improvement.

Awards

MPCS Merit-Based Scholarship

Issued by UChicago Pre-Doctoral MS Program · Sep 2024


Dennis C. C. Chan Scholarship

Issued by Shanghai Jiao Tong University · Dec 2023

Awarded to 6 outstanding undergraduate students across the university.


Shanghai Government Scholarship

Issued by Shanghai Municipal Education Commission · Dec 2022

Awarded to the top 0.175% of undergraduate and associate-degree students in Shanghai.