I am an incoming computer science PhD student. I have been working with Professor Junchen Jiang, Dr. Yuhan Liu, and Dr. Liangcheng Yu. I am interested in building systems that support real-world machine learning workloads; my current focus is optimizing KV caches for LLM inference.
Education
Pre-Doc MS in Computer Science
University of Chicago, Sep. 2024 - Dec. 2025
B.E. in Information Engineering
Shanghai Jiao Tong University, Sep. 2020 - Jun. 2024
Class Ranking: Top 5%
Selected Publications
EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
Shaoting Feng*, Yuhan Liu*, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang
arXiv
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference
Yuhan Liu*, Yihua Cheng*, Jiayi Yao*, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Rui Zhang, Kuntai Du, Junchen Jiang
arXiv
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse
NSDI’26
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
Shaoting Feng*, Hanchen Li*, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang
SOSP workshop BigMem’25
METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang
SOSP’25
Presentations
Run Multi-Modality Models with LMCache
OnlineGIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory
- APWeb-WAIM 2024 [slides], Aug. 2024
Experience
TensorMesh, Inc. - Engineering Intern
June 2025 - May 2026
- Impact: widely used in enterprise settings (e.g., NVIDIA, IBM Cloud), serving >300 TB of KV cache data and 1.28 billion hit tokens weekly.
- Contributed 74 commits (+11,164 / -3,251 LOC), ranking 5th in total contributions to the project.
- Developed prefill-decode disaggregation to reduce tail latency, achieving 20× faster KV cache transmission than vLLM.
- Developed dynamic CPU offloading to jointly manage GPU and CPU memory, achieving a 2.29× TTFT improvement over vLLM.
- Developed multimodal KV cache offloading to accelerate image, video, and audio inference, achieving a 5.49× TTFT improvement.
University of Pennsylvania - Research Intern
June 2023 - September 2023
- Advised by Prof. Vincent Liu and Dr. Liangcheng Yu
- Designed a practical fairness metric for network resource allocation.
Awards
MPCS Merit-Based Scholarship
Issued by UChicago Pre-Doctoral MS Program · Sep 2024
Dennis C.C. Chan Scholarship
Issued by Shanghai Jiao Tong University · Dec 2023
Awarded to six outstanding undergraduate students across the university.
Shanghai Government Scholarship
Issued by Shanghai Municipal Education Commission · Dec 2022
Awarded to the top 0.175% of undergraduate and associate degree students in Shanghai.
