About Me
Hi! I am Yuhang Liu, a recent Master's graduate of the AI4GC Lab at Zhejiang University, where I was advised by Prof. Shengyu Zhang.
My research interests include multimodal GUI agents, general-purpose agent construction, and agentic RL. I am especially interested in building agents that can reason, learn from interaction, operate software, and complete complex tasks across environments.
I am an incoming Ph.D. student in the Department of Computing at The Hong Kong Polytechnic University.
During AI4GC
During my time at AI4GC Lab, I worked on multimodal GUI agents, focusing on visual interface understanding, action grounding, and reasoning for real software environments.
My early work on InfiGUIAgent studied how to build a generalist GUI agent from raw screenshots. We used two-stage supervised fine-tuning: the first stage builds GUI understanding and grounding, and the second adds hierarchical reasoning and expectation-reflection reasoning. The paper was accepted to EACL 2026 as an oral presentation.
I then explored InfiGUI-R1, which reframes GUI automation as a transition from reactive acting to deliberative reasoning. The work uses spatial reasoning distillation and reinforcement learning signals for sub-goal planning and error recovery, making planning and reflection central parts of GUI-agent training.
My later work on InfiGUI-G1 focused on GUI grounding, especially the semantic-alignment bottleneck that remains after spatial alignment improves. We designed Adaptive Exploration Policy Optimization to encourage broader and more purposeful search over interface elements. The paper was accepted to AAAI 2026 as an oral presentation.
Selected Papers
AAAI2026
Oral
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization
EACL2026
Oral
arXiv2025
Now
Now: Incoming Ph.D. student in the Department of Computing at The Hong Kong Polytechnic University.