Yufan Xiong

AI4GC Lab

CLI Agents · LLM Inference Acceleration · KV Cache Compression · Multimodal Long-Context

About Me

Hi! I'm Yufan Xiong, a research intern at AI4GC Lab, Zhejiang University. I am also fortunate to conduct research on on-device model post-training at Taobao & Tmall Group, Alibaba. I will begin my M.S. at the School of Artificial Intelligence and Data Science (AIDS), University of Science and Technology of China (USTC) in 2026.

My research has centered on efficient inference for large language models and multimodal LLMs, especially KV cache compression and long-context acceleration, with the goal of making large models faster and more memory-efficient without sacrificing accuracy. My focus is now shifting toward CLI agents. I am open to collaborations on efficient inference and agent research.

Research Directions

  • CLI Agents — building and improving agents that operate through command-line and computer interfaces to carry out real tasks.
  • KV Cache Compression — structured eviction and merging strategies that preserve cross-layer and cross-modal information rather than treating tokens independently.
  • Long-Context Inference — extending usable context length (up to the million-token regime) while controlling memory growth and positional-encoding extrapolation, including on-device deployment.

Selected Papers