Dockerized Computer Use Agents with Production-Ready APIs - MCP Client for Langchain - GCA
Updated Dec 28, 2024 - Python
Ui.Vision Open-Source RPA Software with Computer Vision, OCR, Anthropic Computer Use. Selenium IDE import/export.
Open Source Generative Process Automation (i.e. Generative RPA). AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
A fork of Anthropic Computer Use that you can run on Mac computers to give Claude and other AI models autonomous access to your computer.
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
An open-sourced end-to-end VLM-based GUI Agent
Desktop app powered by Claude’s computer use capability to control your computer
A framework to enable autonomous Android and computer use using any LLM (local or remote)
A general AI agent framework that can be adapted to various tasks and environments.
This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use".
✨ Use natural language to control your browser, powered by LLM and Playwright
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
Try Computer Use on your Mac with a few clicks
Curated resources about automated GUI computer use via LLMs. Highly opinionated; the focus is on quality over quantity.
🧩 Proposal to allow user scripts like "expand comments", "hide popups", "fill out this form", etc. to be reusable across pure browser environments, Puppeteer, Playwright, extensions, AI tools, and many other contexts with minimal adjustment.
Claude Computer Use API with Ubuntu that enables Claude to interact with and automate desktop environments. It allows seamless command execution through VNC or noVNC, enhancing productivity with secure, containerized workflows in GitHub Codespaces.
🤖 LLM-powered computer control through local and Docker environments. Features VNC integration, automated interactions, and a chat interface for natural language system control.
Give a Multi-Modal LLM full access to your Linux computer