I am an incoming computer science PhD student, working with Professor Junchen Jiang, Dr. Yuhan Liu, and Dr. Liangcheng Yu. I am interested in building systems that support real-world machine learning workloads; my current focus is optimizing KV caches for LLM inference.

Education


Selected Publications

For a complete list of publications, please visit my Google Scholar profile.

EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving

Shaoting Feng*, Yuhan Liu*, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang

arXiv

pdf


LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference

Yuhan Liu*, Yihua Cheng*, Jiayi Yao*, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Rui Zhang, Kuntai Du, Junchen Jiang

arXiv

pdf | code


DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving

Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse

NSDI’26

pdf


AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving

Shaoting Feng*, Hanchen Li*, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang

SOSP workshop BigMem’25

pdf | code | slides


METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation

Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang

SOSP’25

pdf | poster


Presentations

Run Multi-Modality Models with LMCache

  • SIGCOMM 2025 Full-day Tutorial: Networking for Stateful LLM Inference [slides] [video], Sep. 2025
Online


GIPUT: Maximizing Photo Coverage Efficiency for UAV Trajectory

  • APWeb-WAIM 2024 [slides], Aug. 2024
Jinhua, Zhejiang, China


Experience

TensorMesh, Inc. - Engineering Intern

June 2025 - May 2026

  • Impact: Widely used in enterprise settings (e.g., NVIDIA, IBM Cloud), handling >300 TB of KV cache data and 1.28 billion hit tokens weekly.
  • Contributed 74 commits (+11,164 / −3,251 LOC), ranking 5th among all contributors.
  • Developed prefill-decode disaggregation to reduce tail latency, achieving 20× faster KV cache transmission than vLLM.
  • Developed dynamic CPU offloading for jointly managing GPU and CPU memory, achieving 2.29× TTFT improvement over vLLM.
  • Developed multimodal KV cache offloading to accelerate image, video, and audio inference, achieving 5.49× TTFT improvement.

Awards

MPCS Merit-Based Scholarship

Issued by UChicago Pre-Doctoral MS Program · Sep 2024


Dennis C. C. Chan Scholarship

Issued by Shanghai Jiao Tong University · Dec 2023

Awarded to 6 outstanding undergraduate students across the university.


Shanghai Government Scholarship

Issued by Shanghai Municipal Education Commission · Dec 2022

Awarded to the top 0.175% of undergraduate and associate-degree students in Shanghai.