---
layout: homepage
---
Hi, I'm Boyu (苟博宇), a Ph.D. student at The Ohio State University. I'm fortunate to be co-advised by Prof. Yu Su and Prof. Huan Sun at the OSU NLP Group.
My research interests lie in Language Agents, with a specific focus on autonomous GUI agents that can use computers or browse the web.
We built the first multimodal web agent with all code open-sourced (SeeAct, Multimodal-Mind2Web). We then proposed purely vision-based GUI agents that operate as humans do, and demonstrated their strength despite their minimal design (UGround). We also built benchmarks to better assess cutting-edge web agents on short- to medium-horizon tasks (Online-Mind2Web) and long-horizon agentic web search tasks (Mind2Web 2), introducing novel LLM-as-a-Judge and rubric-based Agent-as-a-Judge methods.
If you are interested in a research internship or collaboration, or just a short chat, feel free to email me or reach out on LinkedIn/Twitter to discuss ideas.
- [June 2025] Mind2Web 2 is released! A benchmark designed specifically for agentic search systems (e.g., web agents, Deep Research systems) on realistic, long-horizon, and time-varying tasks. (Accepted to NeurIPS 2025)
- [Feb 2025] UGround is accepted to ICLR 2025 as an Oral!
- [Aug 2024] Released UGround, where we propose building vision-only GUI agents under a generic framework, SeeAct-V, together with a new SOTA GUI visual grounding model.
- [Jan 2024] Released SeeAct, a generalist web agent powered by GPT-4V. (Accepted to ICML 2024)
- [Aug 2023] Joined OSU and am honored to work in the OSU NLP Group.
{% include_relative _includes/publications.md %}
- SeeAct: An easy-to-use codebase and Python package designed for autonomous web agents. You can freely test it with any task on any live website with just one click! It is also useful for other agent work as a convenient web browsing implementation.
- GUI Agent Paper List: A comprehensive and up-to-date GUI Agent paper list.
- Lorvex: A personal vibe-coding project: an AI-native todo and scheduling app. I highly recommend giving it a try.
- Reviewer:
  - Conferences & Journals: ICLR'25-'26, NeurIPS'26, UIST'25, ACL Rolling Review, IJCV, ACM IMWUT
  - Workshops: LLMAgents @ ICLR'24, Computer Use Agents & Multi-Agent Systems @ ICML'25, AI Agents @ COLM'25, MTI-LLM @ NeurIPS'25, RSI & LLA (AC) @ ICLR'26
- The Ohio State University, US
  - Ph.D. in Computer Science and Engineering
  - 2023.08 – Present
- ShanghaiTech University, China
  - B.Eng. in Computer Science and Technology
  - 2019.08 – 2023.06
Teaching Assistant at ShanghaiTech University:
- SI152 Numerical Optimization, Fall 2022
  - Instructor: Prof. Hao Wang
  - Theories, algorithms, and convergence analysis of linear and nonlinear optimization.
- CS110 Computer Architecture, Spring 2022
  - Instructors: Prof. Chundong Wang & Prof. Sören Schwertfeger
  - Similar to UC Berkeley CS61C Great Ideas in Computer Architecture. The course homepage is here.
- CLPS1001 Intimacy Psychology, Spring 2023
  - Instructor: Dr. Yaqi Cai