Yufan Xiong

About Me

Hi! I'm Yufan Xiong, a research intern at AI4GC Lab, Zhejiang University. I am also fortunate to conduct research on on-device model post-training at Taobao & Tmall Group, Alibaba. I will begin my M.S. at the School of Artificial Intelligence and Data Science (AIDS), University of Science and Technology of China (USTC) in 2026.

My research has centered on efficient inference for large language models and multimodal LLMs, especially KV cache compression and long-context acceleration — making large models faster and more memory-efficient without sacrificing accuracy. I am now moving toward CLI agents. I am open to collaborations on efficient inference and agent research.

Research Directions

CLI Agents — building and improving agents that operate through command-line and computer interfaces to carry out real tasks.
KV Cache Compression — structured eviction and merging strategies that preserve cross-layer and cross-modal information rather than treating tokens independently.
Long-Context Inference — extending usable context length (up to the million-token regime) while controlling memory growth and positional-encoding extrapolation, including on-device deployment.

Selected Papers

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction