diff --git a/docs/01_introduction/quick-start.mdx b/docs/01_introduction/quick-start.mdx
index da166da9..9eed691f 100644
--- a/docs/01_introduction/quick-start.mdx
+++ b/docs/01_introduction/quick-start.mdx
@@ -105,4 +105,5 @@ To see how you can integrate the Apify SDK with popular web scraping libraries,
 - [Selenium](../guides/selenium)
 - [Crawlee](../guides/crawlee)
 - [Scrapy](../guides/scrapy)
+- [Browser Use](../guides/browser-use)
 - [Running webserver](../guides/running-webserver)
diff --git a/docs/03_guides/09_browser_use.mdx b/docs/03_guides/09_browser_use.mdx
new file mode 100644
index 00000000..77529963
--- /dev/null
+++ b/docs/03_guides/09_browser_use.mdx
@@ -0,0 +1,90 @@
+---
+id: browser-use
+title: Browser AI agents with Browser Use
+description: Build an Apify Actor that automates a browser with an LLM agent using the Browser Use library.
+---
+
+import RunnableCodeBlock from '@site/src/components/RunnableCodeBlock';
+
+import BrowserUseExample from '!!raw-loader!roa-loader!./code/09_browser_use.py';
+
+In this guide, you'll learn how to use the [Browser Use](https://browser-use.com/) library to drive a browser with an LLM agent in your Apify Actors.
+
+## Introduction
+
+[Browser Use](https://browser-use.com/) is a Python library that lets an LLM control a real web browser. Instead of writing selectors and navigation steps by hand, you give an agent a natural-language task - such as "find the top post on Hacker News and return its title and URL" - and the agent decides which pages to open, what to click, and what to read until the task is done.
+
+Some of the features that make Browser Use a good fit for Apify Actors:
+
+- **Natural-language tasks** - Describe what you want in plain English; the agent figures out the steps. This is well suited to pages whose structure changes often or is hard to target with fixed selectors.
+- **Model-agnostic** - Browser Use ships wrappers for many providers (`ChatOpenAI`, `ChatAnthropic`, `ChatGoogle`, and more), so you can pick the model that fits your task and budget.
+- **Structured output** - Pass a [Pydantic](https://docs.pydantic.dev/) model as the output schema and the agent returns a validated object instead of free-form text, which maps cleanly onto an Apify dataset.
+- **Real browser via CDP** - The agent drives a real Chromium over the Chrome DevTools Protocol, so JavaScript-heavy pages render just like they would for a human.
+- **First-class async support** - The agent's `run` method is asynchronous, which integrates naturally with the asyncio-based Apify SDK.
+
+Install the `browser-use` package alongside the Apify SDK:
+
+```bash
+pip install browser-use
+```
+
+## Configuring the LLM
+
+Browser Use needs an LLM to drive the agent. You choose a provider wrapper, give it a model name, and supply the provider's API key:
+
+- **`ChatOpenAI`** - OpenAI models such as `gpt-4.1-mini` or `gpt-5-mini`. Reads the key from `OPENAI_API_KEY`, or accepts it via the `api_key` argument.
+- **`ChatAnthropic`** - Anthropic Claude models such as `claude-sonnet-4-5` or `claude-haiku-4-5`. Reads the key from `ANTHROPIC_API_KEY`.
+- **`ChatGoogle`** - Google Gemini models such as `gemini-2.5-flash`. Reads the key from `GOOGLE_API_KEY`.
+
+The example Actor in this guide uses `ChatOpenAI`. To switch providers, change the import and the `llm = ...` line in `run_agent_task`, and make `main` read the matching API key. More capable models generally complete tasks in fewer steps and more reliably, while smaller models are cheaper per step.
+
+Keep the API key out of the Actor input and source code. The example reads it from the `OPENAI_API_KEY` environment variable: on the Apify platform, set it as a [secret environment variable](https://docs.apify.com/platform/actors/development/programming-interface/environment-variables); when running locally, export it in your shell.
+
+## Example Actor
+
+The following Actor runs a Browser Use agent for a single task and stores its structured result in the default dataset. By default it opens [Hacker News](https://news.ycombinator.com) and returns the title and URL of the top five posts, but the task, model, and step limit are all configurable through the Actor input.
+
+The whole Actor fits in a single file. The `Post` and `Posts` models define the output schema, a `run_agent_task` helper holds the Browser Use-specific logic (it builds the LLM, browser, and agent and runs the task), and the `main` coroutine handles the [Actor](https://docs.apify.com/platform/actors) lifecycle, reads the input, sets up [Apify Proxy](https://docs.apify.com/platform/proxy), runs the agent, and stores the result:
+
+<RunnableCodeBlock className="language-python" language="python">
+    {BrowserUseExample}
+</RunnableCodeBlock>
+
+A few things worth pointing out:
+
+- Keeping the agent setup in `run_agent_task` separates the Browser Use-specific code from the Actor's orchestration logic. `main` only decides what to read from the input and what to store.
+- Passing `output_model_schema=Posts` makes the agent return a validated `Posts` instance via `history.structured_output`, so `main` can push each item straight to the dataset. Adapt the task and the `Post`/`Posts` models together to fit your own use case.
+- `enable_signal_handler=False` leaves signal handling to the Actor, which manages the run's lifecycle. Without it, Browser Use would install its own handlers, which could interfere with a clean shutdown.
+- `headless=Actor.configuration.headless` takes the headless setting from the Actor configuration. It is enabled by default, so the browser runs without a visible window, as you want on the platform.
+
+## Using Apify Proxy
+
+Running on the Apify platform gives your agent access to [Apify Proxy](https://docs.apify.com/platform/proxy), which rotates IP addresses to avoid rate limiting and blocking. In the example above, `main` creates a proxy configuration with `Actor.create_proxy_configuration` and passes a fresh proxy URL to `run_agent_task`.
+
+Browser Use expects the proxy as a `ProxySettings` object with separate `server`, `username`, and `password` fields, whereas `ProxyConfiguration.new_url` returns a single URL string (for example `http://user:pass@proxy.apify.com:8000`). The `to_browser_use_proxy` helper splits that URL into the fields Browser Use expects. To select specific proxy groups or a country, pass the relevant arguments (such as `groups` or `country_code`) to `Actor.create_proxy_configuration`. For more details, see the [Proxy management](../concepts/proxy-management) guide.
+
+## Running on the Apify platform
+
+Browser Use drives a real Chromium over CDP, so the Actor needs a browser binary available at runtime. The simplest way to provide one is to build on top of the [Apify Playwright base image](https://hub.docker.com/r/apify/actor-python-playwright), which already ships a browser together with all of its system-level dependencies. Browser Use discovers that browser automatically, so no extra install step is needed in the image.
+
+Disable Browser Use's telemetry and cloud sync inside the Actor by setting the `ANONYMIZED_TELEMETRY=false` and `BROWSER_USE_CLOUD_SYNC=false` environment variables in your Dockerfile.
+
+When running the Actor locally, install the browser once with the `browser-use install` command, which downloads a Chromium build together with its dependencies:
+
+```bash
+browser-use install
+```
+
+Remember to provide the LLM API key in both environments - as a secret environment variable on the platform, and exported in your shell when running locally.
+
+## Conclusion
+
+In this guide, you learned how to use Browser Use in your Apify Actors. You can now drive a real browser with an LLM agent, return its results as a validated Pydantic model, route the browser through Apify Proxy, and run the whole thing on the Apify platform. See the [Actor templates](https://apify.com/templates/categories/python) to get started with your own automation tasks. If you have questions or need assistance, feel free to reach out on our [GitHub](https://github.com/apify/apify-sdk-python) or join our [Discord community](https://discord.com/invite/jyEM2PRvMU). Happy automating!
+
+## Additional resources
+
+- [Browser Use: Official documentation](https://docs.browser-use.com/)
+- [Browser Use: Supported models](https://docs.browser-use.com/customize/supported-models)
+- [Browser Use: Structured output](https://docs.browser-use.com/customize/agent/output-format)
+- [Browser Use: GitHub repository](https://github.com/browser-use/browser-use)
+- [Apify: Proxy management](https://docs.apify.com/platform/proxy)
diff --git a/docs/03_guides/code/09_browser_use.py b/docs/03_guides/code/09_browser_use.py
new file mode 100644
index 00000000..cd16773f
--- /dev/null
+++ b/docs/03_guides/code/09_browser_use.py
@@ -0,0 +1,113 @@
+import asyncio
+import os
+from urllib.parse import urlsplit
+
+from browser_use import Agent, Browser, ChatOpenAI
+from browser_use.browser import ProxySettings
+from pydantic import BaseModel
+
+from apify import Actor
+
+# Default task, aligned with the `Posts` schema below.
+DEFAULT_TASK = (
+    'Open https://news.ycombinator.com and return the title and URL '
+    'of the top 5 posts on the front page.'
+)
+
+
+class Post(BaseModel):
+    """A single item the agent is asked to extract."""
+
+    title: str
+    url: str
+
+
+class Posts(BaseModel):
+    """The structured result returned by the agent."""
+
+    posts: list[Post]
+
+
+def to_browser_use_proxy(proxy_url: str) -> ProxySettings:
+    """Convert an Apify Proxy URL into Browser Use `ProxySettings`."""
+    parts = urlsplit(proxy_url)
+    return ProxySettings(
+        server=f'{parts.scheme}://{parts.hostname}:{parts.port}',
+        username=parts.username,
+        password=parts.password,
+    )
+
+
+async def run_agent_task(
+    task: str,
+    *,
+    model: str,
+    llm_api_key: str,
+    max_steps: int,
+    headless: bool = True,
+    proxy_url: str | None = None,
+) -> Posts | None:
+    """Run a Browser Use agent for one task and return its structured output."""
+    # Configure the LLM. Swap `ChatOpenAI` for another provider if needed.
+    llm = ChatOpenAI(model=model, api_key=llm_api_key)
+
+    # Configure the browser, optionally routed through a proxy.
+    browser = Browser(
+        headless=headless,
+        proxy=to_browser_use_proxy(proxy_url) if proxy_url else None,
+    )
+
+    # `output_model_schema` returns a validated `Posts`; signals stay with the Actor.
+    agent = Agent(
+        task=task,
+        llm=llm,
+        browser=browser,
+        output_model_schema=Posts,
+        enable_signal_handler=False,
+    )
+
+    history = await agent.run(max_steps=max_steps)
+    return history.structured_output
+
+
+async def main() -> None:
+    async with Actor:
+        # Read the Actor input.
+        actor_input = await Actor.get_input() or {}
+        task = actor_input.get('task', DEFAULT_TASK)
+        model = actor_input.get('model', 'gpt-4.1-mini')
+        max_steps = actor_input.get('maxSteps', 25)
+
+        # Read the LLM API key from the environment (set it as a secret on Apify).
+        llm_api_key = os.environ.get('OPENAI_API_KEY')
+        if not llm_api_key:
+            raise RuntimeError('The OPENAI_API_KEY environment variable is not set.')
+
+        # Route the browser through Apify Proxy.
+        proxy_configuration = await Actor.create_proxy_configuration()
+        proxy_url = await proxy_configuration.new_url() if proxy_configuration else None
+
+        Actor.log.info(f'Running the agent (model={model}) for task: {task}')
+
+        result = await run_agent_task(
+            task,
+            model=model,
+            llm_api_key=llm_api_key,
+            max_steps=max_steps,
+            headless=Actor.configuration.headless,
+            proxy_url=proxy_url,
+        )
+
+        if result is None:
+            Actor.log.warning('The agent did not return any structured output.')
+            return
+
+        # Store each extracted item as a dataset row.
+        Actor.log.info(f'The agent returned {len(result.posts)} post(s); storing them.')
+        for post in result.posts:
+            Actor.log.info(f'Storing post: {post.title!r} ({post.url})')
+            await Actor.push_data(post.model_dump())
+
+
+if __name__ == '__main__':
+    asyncio.run(main())
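You can check the URL splitting done by `to_browser_use_proxy` without installing Browser Use. The following is a minimal sketch that uses only the standard library and mirrors the helper's logic. `ProxyParts` and `split_proxy_url` are hypothetical names, `ProxyParts` is only a stand-in for Browser Use's `ProxySettings` (not the real class), and the proxy URL uses made-up credentials:

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class ProxyParts:
    """Hypothetical stand-in for Browser Use's `ProxySettings` (not the real class)."""

    server: str
    username: str | None
    password: str | None


def split_proxy_url(proxy_url: str) -> ProxyParts:
    """Mirror the splitting logic of `to_browser_use_proxy` with the standard library only."""
    parts = urlsplit(proxy_url)
    # The credentials are dropped from `server` and returned as separate fields.
    return ProxyParts(
        server=f'{parts.scheme}://{parts.hostname}:{parts.port}',
        username=parts.username,
        password=parts.password,
    )


# Example with made-up credentials in the shape of an Apify Proxy URL.
settings = split_proxy_url('http://auto:secret123@proxy.apify.com:8000')
print(settings)
# ProxyParts(server='http://proxy.apify.com:8000', username='auto', password='secret123')
```

The sketch inherits two assumptions from the helper. First, the URL must carry an explicit port, which Apify Proxy URLs do (port 8000). Without a port, `parts.port` is `None` and the server string ends in `:None`. Second, `urlsplit` returns the credentials exactly as they appear in the URL and does not percent-decode them.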