40 changes: 35 additions & 5 deletions README.md
@@ -25,11 +25,11 @@ pip install langgraph-cua

## Quickstart

This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent. To use LangGraph CUA, you'll need both OpenAI and Scrapybara API keys.
This project by default uses [Scrapybara](https://scrapybara.com/) for accessing a virtual machine to run the agent, and [OpenRouter](https://openrouter.ai/) for the LLM (using a Grok model). To use LangGraph CUA, you'll need both an OpenRouter API key and a Scrapybara API key.

```bash
export OPENAI_API_KEY=<your_api_key>
export SCRAPYBARA_API_KEY=<your_api_key>
export OPENAI_API_KEY=<your_openrouter_api_key>
export SCRAPYBARA_API_KEY=<your_scrapybara_api_key>
```

Then, create the graph by importing the `create_cua` function from the `langgraph_cua` module.
@@ -87,6 +87,36 @@ The above example will invoke the graph, passing in a request for it to do some

You can find more examples inside the [`examples` directory](./examples/).

## LLM Providers

This library supports multiple LLM providers through OpenAI-compatible APIs:

### OpenAI
The library also works with OpenAI's API directly. Set your OpenAI API key:

```bash
export OPENAI_API_KEY=<your_openai_api_key>
```

### OpenRouter
The library supports [OpenRouter](https://openrouter.ai/), which offers access to a variety of models, including Grok. The current implementation uses OpenRouter by default with the `x-ai/grok-4.1-fast:free` model.

To use OpenRouter, set the following environment variables:

```bash
export OPENAI_API_KEY=<your_openrouter_api_key>
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
```

Or use the dedicated OpenRouter key:

```bash
export OPENROUTER_API_KEY=<your_openrouter_api_key>
```

> [!NOTE]
> The library automatically detects and uses OpenRouter API keys. Unit tests are available to verify OpenRouter integration.
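As a minimal sketch of that detection order (`resolve_api_key` is a hypothetical helper for illustration, not part of the library's public API; the library's actual internals may differ):

```python
import os


def resolve_api_key(env=None):
    """Prefer a dedicated OpenRouter key, then fall back to OPENAI_API_KEY."""
    env = os.environ if env is None else env
    return env.get("OPENROUTER_API_KEY") or env.get("OPENAI_API_KEY")
```

Either variable therefore works; setting `OPENROUTER_API_KEY` simply takes precedence when both are present.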

## How to customize

The `create_cua` function accepts a few configuration parameters. These are the same configuration parameters that the graph accepts, along with `recursion_limit`.
@@ -97,7 +97,7 @@ You can either pass these parameters when calling `create_cua`, or at runtime wh

- `scrapybara_api_key`: The API key to use for Scrapybara. If not provided, it defaults to reading the `SCRAPYBARA_API_KEY` environment variable.
- `timeout_hours`: The number of hours to keep the virtual machine running before it times out.
- `zdr_enabled`: Whether or not Zero Data Retention is enabled in the user's OpenAI account. If `True`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `False`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `False`.
- `zdr_enabled`: Whether or not Zero Data Retention is enabled. If `True`, the agent will not pass the `previous_response_id` to the model, and will always pass it the full message history for each request. If `False`, the agent will pass the `previous_response_id` to the model, and only the latest message in the history will be passed. Default `False`.
- `recursion_limit`: The maximum number of recursive calls the agent can make. Default is 100. This is greater than the standard default of 25 in LangGraph, because computer use agents are expected to take more iterations.
- `auth_state_id`: The ID of the authentication state. If defined, it will be used to authenticate with Scrapybara. Only applies if 'environment' is set to 'web'.
- `environment`: The environment to use. Default is `web`. Options are `web`, `ubuntu`, and `windows`.
@@ -189,7 +219,7 @@ instance.modify_auth(auth_state_id="your_existing_auth_state_id", name="renamed_

## Zero Data Retention (ZDR)

LangGraph CUA supports Zero Data Retention (ZDR) via the `zdr_enabled` configuration parameter. When set to true, the graph will _not_ assume it can use the `previous_message_id`, and _all_ AI & tool messages will be passed to the OpenAI on each request.
LangGraph CUA supports Zero Data Retention (ZDR) via the `zdr_enabled` configuration parameter. When set to true, the graph will _not_ assume it can use the `previous_response_id`, and _all_ AI & tool messages will be passed to the LLM provider (OpenAI or OpenRouter) on each request.
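The branching described above can be sketched as follows (`build_model_input` is a hypothetical helper for illustration; the real request construction happens inside the graph's model-calling node):

```python
def build_model_input(messages, zdr_enabled, previous_response_id=None):
    """Sketch of the ZDR branching described above."""
    if zdr_enabled:
        # ZDR: never rely on server-side state; resend the full history.
        return {"input": list(messages)}
    # Non-ZDR: send only the latest message plus the previous response id.
    return {"input": messages[-1:], "previous_response_id": previous_response_id}
```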

## Development

8 changes: 8 additions & 0 deletions langgraph_cua/langgraph-mcp.code-workspace
@@ -0,0 +1,8 @@
{
"folders": [
{
"path": "../../../../.."
}
],
"settings": {}
}
40 changes: 34 additions & 6 deletions langgraph_cua/nodes/call_model.py
@@ -1,3 +1,5 @@
import json
import os
from typing import Any, Dict, Optional, Union

from langchain_core.messages import AIMessageChunk, SystemMessage
@@ -70,15 +72,40 @@ async def call_model(state: CUAState, config: RunnableConfig) -> Dict[str, Any]:
previous_response_id = messages[-2].response_metadata["id"]

llm = ChatOpenAI(
model="computer-use-preview",
model_kwargs={"truncation": "auto", "previous_response_id": previous_response_id},
model="x-ai/grok-4.1-fast:free",
openai_api_base="https://openrouter.ai/api/v1",
openai_api_key=os.getenv("OPENAI_API_KEY"),
max_tokens=4000,
)

tool = {
"type": "computer_use_preview",
"display_width": DEFAULT_DISPLAY_WIDTH,
"display_height": DEFAULT_DISPLAY_HEIGHT,
"environment": get_openai_env_from_state_env(environment),
"type": "function",
"function": {
"name": "computer_use",
"description": "Perform actions on the computer such as clicking, typing, scrolling, etc.",
"parameters": {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["click", "double_click", "drag", "keypress", "move", "screenshot", "wait", "scroll", "type"],
"description": "The type of action to perform"
},
"x": {"type": "number", "description": "X coordinate for mouse actions"},
"y": {"type": "number", "description": "Y coordinate for mouse actions"},
"text": {"type": "string", "description": "Text to type"},
"button": {"type": "string", "description": "Mouse button (left, right, middle)"},
"keys": {"type": "array", "items": {"type": "string"}, "description": "Keys to press"},
"path": {"type": "array", "items": {"type": "object", "properties": {"x": {"type": "number"}, "y": {"type": "number"}}}, "description": "Path for drag action"},
"scroll_x": {"type": "number", "description": "Horizontal scroll amount"},
"scroll_y": {"type": "number", "description": "Vertical scroll amount"},
"environment": {"type": "string", "description": "Environment type"},
"display_width": {"type": "number", "description": "Display width"},
"display_height": {"type": "number", "description": "Display height"},
},
"required": ["action"]
}
}
}
llm_with_tools = llm.bind_tools([tool])

@@ -100,4 +127,5 @@ async def call_model(state: CUAState, config: RunnableConfig) -> Dict[str, Any]:

return {
"messages": response,
"tool_outputs": response.additional_kwargs.get("tool_calls", []),
}
29 changes: 19 additions & 10 deletions langgraph_cua/nodes/take_computer_action.py
@@ -1,10 +1,10 @@
import json
import time
from typing import Any, Dict, Optional

from langchain_core.messages import AnyMessage, ToolMessage
from langchain_core.runnables import RunnableConfig
from langgraph.config import get_stream_writer
from openai.types.responses.response_computer_tool_call import ResponseComputerToolCall
from scrapybara.types import ComputerResponse, InstanceGetStreamUrlResponse

from ..types import CUAState, get_configuration_with_defaults
@@ -49,14 +49,25 @@ def take_computer_action(state: CUAState, config: RunnableConfig) -> Dict[str, A
"""
message: AnyMessage = state.get("messages", [])[-1]
assert message.type == "ai", "Last message must be an AI message"
tool_outputs = message.additional_kwargs.get("tool_outputs")
tool_calls = message.additional_kwargs.get("tool_calls")

if not is_computer_tool_call(tool_outputs):
if not is_computer_tool_call(tool_calls):
# This should never happen, but include the check for proper type safety.
raise ValueError("Cannot take computer action without a computer call in the last message.")

# Cast tool_outputs as list[ResponseComputerToolCall] since is_computer_tool_call is true
tool_outputs: list[ResponseComputerToolCall] = tool_outputs
# Find the computer use call
computer_call = None
for call in tool_calls:
if call.get("function", {}).get("name") == "computer_use":
computer_call = call
break

if not computer_call:
raise ValueError("No computer use call found")

action = json.loads(computer_call["function"]["arguments"])
call_id = computer_call["id"]

instance_id = state.get("instance_id")
if not instance_id:
@@ -89,13 +100,11 @@ def take_computer_action(state: CUAState, config: RunnableConfig) -> Dict[str, A
writer = get_stream_writer()
writer({"stream_url": stream_url})

output = tool_outputs[-1]
action = output.get("action")
tool_message: Optional[ToolMessage] = None

try:
computer_response: Optional[ComputerResponse] = None
action_type = action.get("type")
action_type = action.get("action")

if action_type == "click":
computer_response = instance.computer(
@@ -152,12 +161,12 @@ def take_computer_action(state: CUAState, config: RunnableConfig) -> Dict[str, A
tool_message = {
"role": "tool",
"content": [output_content],
"tool_call_id": output.get("call_id"),
"tool_call_id": call_id,
"additional_kwargs": {"type": "computer_call_output"},
}
except Exception as e:
print(f"\n\nFailed to execute computer call: {e}\n\n")
print(f"Computer call details: {output}\n\n")
print(f"Computer call details: {computer_call}\n\n")

return {
"messages": tool_message if tool_message else None,
2 changes: 1 addition & 1 deletion langgraph_cua/utils.py
@@ -58,4 +58,4 @@ def is_computer_tool_call(tool_outputs: Any) -> bool:
if not tool_outputs or not isinstance(tool_outputs, list):
return False

return any(output.get("type") == "computer_call" for output in tool_outputs)
return any(call.get("function", {}).get("name") == "computer_use" for call in tool_outputs)
3 changes: 2 additions & 1 deletion pyproject.toml
@@ -15,7 +15,8 @@ dependencies = [
"langgraph>=0.3.17,<0.4.0",
"langchain-core>=0.3.46,<0.4.0",
"scrapybara>=2.4.1,<3.0.0",
"langchain-openai>=0.3.10,<0.4.0"
"langchain-openai>=0.3.10,<0.4.0",
"pdm>=2.26.2",
]

[dependency-groups]
90 changes: 90 additions & 0 deletions tests/unit/test_openrouter.py
@@ -0,0 +1,90 @@
import os
import pytest
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load environment variables
load_dotenv()


def test_openrouter_initialization():
"""Test that ChatOpenAI can be initialized with OpenRouter configuration."""
# Test that we can create a ChatOpenAI instance with OpenRouter settings
llm = ChatOpenAI(
model="x-ai/grok-4.1-fast:free",
openai_api_base="https://openrouter.ai/api/v1",
openai_api_key=os.getenv("OPENAI_API_KEY"),
max_tokens=1000,
)

# Verify the instance was created successfully
assert llm is not None
assert llm.model_name == "x-ai/grok-4.1-fast:free"
assert llm.openai_api_base == "https://openrouter.ai/api/v1"


@pytest.mark.asyncio
async def test_openrouter_basic_call():
"""Test a basic API call to OpenRouter (requires valid API key)."""
# Check for OpenRouter API key (prefer OPENROUTER_API_KEY, fallback to OPENAI_API_KEY if it looks like OpenRouter key)
api_key = os.getenv("OPENROUTER_API_KEY") or os.getenv("OPENAI_API_KEY")
if not api_key or not api_key.startswith("sk-or-v1-"):
pytest.skip("Valid OpenRouter API key not found in OPENROUTER_API_KEY or OPENAI_API_KEY environment variables")

llm = ChatOpenAI(
model="x-ai/grok-4.1-fast:free",
openai_api_base="https://openrouter.ai/api/v1",
openai_api_key=api_key,
max_tokens=100,
)

# Test a simple message
messages = [{"role": "user", "content": "Hello, can you respond with just 'OpenRouter test successful'?"}]

try:
response = await llm.ainvoke(messages)
assert response is not None
assert hasattr(response, 'content')
assert len(response.content) > 0
# Check that the response contains expected text (case insensitive)
assert "openrouter" in response.content.lower() or "successful" in response.content.lower()
except Exception as e:
# Any API error here is a real failure; missing or invalid keys
# should have been filtered out by the skip above.
pytest.fail(f"OpenRouter API call failed: {e}")


def test_openrouter_with_tools():
"""Test that ChatOpenAI can be configured with tools for OpenRouter."""
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
pytest.skip("OPENAI_API_KEY environment variable not set")

llm = ChatOpenAI(
model="x-ai/grok-4.1-fast:free",
openai_api_base="https://openrouter.ai/api/v1",
openai_api_key=api_key,
max_tokens=1000,
)

# Define a simple tool
tool = {
"type": "function",
"function": {
"name": "test_tool",
"description": "A test tool for OpenRouter integration",
"parameters": {
"type": "object",
"properties": {
"message": {"type": "string", "description": "Test message"}
},
"required": ["message"]
}
}
}

# Bind tools
llm_with_tools = llm.bind_tools([tool])

# Verify the instance was created
assert llm_with_tools is not None