Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Updated Dec 28, 2024 - Jupyter Notebook
An open-source, end-to-end, VLM-based GUI agent.
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use".
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms