Action Browser is a portable agent skill for controlling a real browser through ActionBook. It gives an agent a repeatable workflow for opening pages, clicking, filling forms, reading page state, handling login-dependent sites, and exporting structured web data.
It is designed for tasks that need a real browser context instead of plain HTTP fetching, especially when the user's Chrome cookies, login state, extensions, or existing tabs matter.
- `session` is the ActionBook browser container. It owns browser mode, shared cookies, and the tab set.
- `tab` is one page inside that session. Clicks, fills, snapshots, and exports run against a specific tab.
- In the current extension-mode first pass, treat one Chrome-backed session plus multiple task tabs as the default operating model.
- Treat that operating model as valid only after a second CLI command can still access the same session. A one-shot `browser start` success is not enough.
- For task flows, treat `scripts/actionbook_session.py` as the only supported session/tab lifecycle entrypoint. Keep raw `actionbook browser start/new-tab/list-tabs/close-tab` for diagnostics and low-level debugging only.
- Do not model scheduler concurrency as "one task = one extension session" unless the environment has proved that multiple extension sessions are stable.
Action Browser wraps ActionBook browser automation into a skill-oriented workflow. It provides:
- Browser session bootstrap and health checks
- Stable ref-based page operations through ActionBook snapshots
- Login-aware workflows for sites that need the user's Chrome session
- Long-running workflow tracking and interruption handling
- Site-specific extraction helpers for social, video, and content platforms
- Webpage-to-Markdown extraction using a browser-rendered page
The skill favors observable browser state over fixed sleeps. After each operation it checks URL, title, visible elements, list counts, or extracted payloads before continuing.
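The "observe, don't sleep" idea above can be sketched as a small polling helper. This is a minimal illustration, not the skill's actual code: `wait_until` and the fake title sequence are hypothetical names, and the real check would read URL, title, or element state from a fresh ActionBook snapshot.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 10.0, interval: float = 0.25) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition is observed and False when the
    timeout ceiling is reached. The timeout is a failure limit, not a delay.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical stand-in for "read the page title from the latest snapshot".
titles = iter(["", "", "Example Domain"])
ready = wait_until(lambda: next(titles) == "Example Domain", timeout=2.0, interval=0.01)
```

The helper returns as soon as the expected state appears, so a fast page costs almost no waiting, while a page that never loads fails at the ceiling instead of hanging.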
- Real Browser Control: operate local or Chrome extension browser sessions through ActionBook.
- Extension Mode First: reuse the user's Chrome login state when a task depends on cookies or authenticated pages.
- Session Recovery: reuse healthy sessions, open a new tab when possible, and rebuild only when the session is invalid.
- Stable Interaction Model: use snapshot refs such as `@e3` and `@e7` instead of fragile remembered selectors.
- Long-Running Runs: start crawls and exports through a tracked process wrapper so they can be stopped cleanly.
- Platform Workflows: includes helpers for Xiaohongshu, X, Weibo, Douban, Zhihu, YouTube, Douyin, and Bilibili.
- Markdown Capture: extract rendered webpages into Markdown with metadata.
Clone or copy this repository into your agent's skills directory:
git clone https://github.com/VintLin/action-browser.git <skills-dir>/action-browser

Then invoke the skill by asking your agent to use `浏览器操作` ("browser operation") or `action-browser` for browser tasks.
If you already have a local copy:
mkdir -p <skills-dir>/action-browser
rsync -a ./ <skills-dir>/action-browser/

For a new user, the safe default is to finish the extension-mode setup before using most site adapters in this skill. Many workflows in this repo depend on the user's existing Chrome login state, so installing the skill files alone is not enough.
Run this path in order:
npm install -g @actionbookdev/cli
actionbook --version
actionbook setup --browser extension --non-interactive
cd "<skills-dir>/action-browser"
unzip -o actionbook-extension-v0.5.0.zip

No API key is required for this initialization flow. Do not block setup on `ACTIONBOOK_API_KEY`.
Then install the browser extension in Chrome from the unpacked skill-local directory:
- Open `chrome://extensions/`
- Turn on Developer mode
- Click "Load unpacked"
- Select `<skills-dir>/action-browser/actionbook-extension-v0.5.0`
The agent cannot directly complete the Chrome extension installation inside the user's browser profile. It can prepare the zip, unpack the directory, and tell the user what to click, but the final Load unpacked step must be completed by the user in Chrome.
This skill ships a pinned extension bundle at actionbook-extension-v0.5.0.zip. Use that bundle as the default source for Chrome installation. Do not assume the extension currently bundled inside the installed CLI is compatible with this skill. The pinned extension manifest version is 0.5.0, and that is the version new users should load into Chrome.
The CLI is still required for the actionbook runtime, but the Chrome extension source should come from the skill-local zip above unless you are explicitly debugging a different CLI build.
Verify the setup before using the skill:
actionbook extension status --json
python3 scripts/actionbook_session.py ensure \
--session init-check \
--url "https://example.com" \
--json

Do not start site workflows until `actionbook extension status --json` shows `bridge: listening` and `extension_connected: true`.
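The readiness gate described above could be checked in code roughly as follows. This is a sketch under an assumption: it treats `bridge` and `extension_connected` as top-level keys of the JSON object printed by `actionbook extension status --json`. The actual payload shape may differ, and `extension_ready` is a hypothetical helper name.

```python
import json


def extension_ready(status_json: str) -> bool:
    """Return True only when the bridge is listening and the extension is connected.

    Assumes a flat JSON object with `bridge` and `extension_connected` keys.
    """
    status = json.loads(status_json)
    return status.get("bridge") == "listening" and status.get("extension_connected") is True


# The input would normally be the stdout of: actionbook extension status --json
sample_ok = '{"bridge": "listening", "extension_connected": true}'
sample_waiting = '{"bridge": "listening", "extension_connected": false}'
ok = extension_ready(sample_ok)
waiting = extension_ready(sample_waiting)
```

Requiring both conditions keeps a half-configured setup (bridge up, extension not loaded) from being treated as usable.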
- An agent runtime with local skill support
- Python 3.10+
- Node.js 18+
- Google Chrome for extension-mode workflows
- ActionBook CLI:
npm install -g @actionbookdev/cli

For a first-time setup, run:

actionbook setup

If the task needs the user's logged-in Chrome session, configure extension mode and load the skill-bundled Chrome extension:
actionbook setup --browser extension --non-interactive
cd "<skills-dir>/action-browser"
unzip -o actionbook-extension-v0.5.0.zip

The initialization guide is in `references/initialization.md`.
Ask your agent to use the skill:
Use 浏览器操作 to open https://example.com and read the main content of the page.
The skill will bootstrap an ActionBook session, open or reuse a browser tab, take a snapshot, and operate on the latest page refs.
python3 scripts/actionbook_session.py ensure \
--session task-browser \
--url "https://example.com" \
--json

The script returns a usable `session_id` and `tab_id`. With an explicit `--session`, it first tries to reuse that same session, then opens a new tab in it, and creates the named session only as a fallback. Cross-session adoption is reserved for the default bootstrap session.
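A caller might wrap the `ensure` step along these lines. This is a sketch that assumes the script prints a single JSON object with `session_id` and `tab_id` at the top level. The helper names and the sample values are hypothetical, and only flags shown in this README are used.

```python
import json
from typing import List, Tuple


def build_ensure_cmd(session: str, url: str) -> List[str]:
    """Build the documented `ensure` invocation as an argument list."""
    return [
        "python3", "scripts/actionbook_session.py", "ensure",
        "--session", session,
        "--url", url,
        "--json",
    ]


def parse_ensure_output(stdout: str) -> Tuple[str, str]:
    """Extract (session_id, tab_id), failing loudly if either is missing."""
    payload = json.loads(stdout)
    missing = [key for key in ("session_id", "tab_id") if not payload.get(key)]
    if missing:
        raise ValueError(f"ensure output is missing: {', '.join(missing)}")
    return str(payload["session_id"]), str(payload["tab_id"])


cmd = build_ensure_cmd("task-browser", "https://example.com")
# Hypothetical output; real ids come from running `cmd` with subprocess.run(...).
session_id, tab_id = parse_ensure_output('{"session_id": "task-browser", "tab_id": "t1"}')
```

Failing on a missing id keeps later steps from silently falling back to an implicit current tab.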
Use one session when tasks need the same logged-in browser context, then give each subtask its own explicit tab id:
python3 scripts/actionbook_session.py ensure --session research --url "https://example.com" --json
python3 scripts/actionbook_session.py new-tab --session research --url "https://example.com/a" --json
python3 scripts/actionbook_session.py new-tab --session research --url "https://example.com/b" --json
python3 scripts/actionbook_session.py list-tabs --session research --json

Pass the returned `tab_id` into each workflow with `--tab`. Do not let parallel tasks share one implicit current tab. For scheduler-style fan-out, prefer one stable extension session and one tab lease per task.
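The "one tab lease per task" rule can be illustrated with a small bookkeeping class. This is a hypothetical sketch, not part of the skill: `TabLeases` and the tab ids are made up, and the real ids would come from the `new-tab` output.

```python
from typing import Dict, List


class TabLeases:
    """Track which task owns which tab inside one shared session."""

    def __init__(self, session: str) -> None:
        self.session = session
        self._by_task: Dict[str, str] = {}

    def lease(self, task: str, tab_id: str) -> None:
        """Assign a tab to a task, refusing to share a tab between tasks."""
        owner = next((t for t, tab in self._by_task.items() if tab == tab_id), None)
        if owner is not None and owner != task:
            raise ValueError(f"tab {tab_id} is already leased to {owner}")
        self._by_task[task] = tab_id

    def release(self, task: str) -> None:
        """Drop a task's lease so its tab can be reused or closed."""
        self._by_task.pop(task, None)

    def workflow_args(self, task: str) -> List[str]:
        """Explicit --session/--tab arguments for this task's workflow call."""
        return ["--session", self.session, "--tab", self._by_task[task]]


leases = TabLeases("research")
leases.lease("page-a", "t1")  # hypothetical tab ids
leases.lease("page-b", "t2")
args_a = leases.workflow_args("page-a")
```

Each task then always passes its own `--tab`, so two parallel workflows never act on the same page by accident.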
Use the run wrapper when a workflow may take time or needs a clean stop path:
python3 scripts/actionbook_run.py run \
--id xhs-profile-download \
--cwd "$PWD" \
-- \
python3 scripts/adapters/xiaohongshu_workflow.py profile download \
--session xhs-profile-download \
--profile-url "https://www.xiaohongshu.com/user/profile/..." \
--count all \
--output-dir "$PWD/output/xhs-profile"

Stop it later with:
python3 scripts/actionbook_run.py stop --id xhs-profile-download

- Use `scripts/scheduler.py` for `submit`, `list`, `status`, and `stop`. `reconcile` is reserved for recovery work but is not implemented in the first pass yet.
- First-pass extension scheduling assumes one shared browser session and one leased tab per running task.
- Scheduler-managed tasks open exclusive tabs with `--force-new-tab --no-adopt`.
- The first pass integrates one adapter contract through Taobao.
- Legacy workflow records stay at the output root; scheduler contract files live under `contract/`.
- Unsupported sites still default to direct agent browser work first.
python3 scripts/webpage_markdown.py capture \
--session page-capture \
--url "https://example.com" \
--output-dir "$PWD/output/page"

| Script | Purpose |
|---|---|
| `scripts/actionbook_session.py` | Ensure a usable ActionBook browser session and tab. |
| `tests/e2e/actionbook_e2e_smoke.py` | Run a helper-first unsupported/serial/parallel browser smoke and write a report. |
| `scripts/diagnostics/actionbook_diagnose.py` | Probe extension/session persistence and write a diagnostic report. |
| `scripts/actionbook_run.py` | Run, stop, inspect, and list tracked long-running workflows. |
| `scripts/webpage_markdown.py` | Capture a rendered webpage or local HTML as Markdown. |
| `scripts/adapters/xiaohongshu_workflow.py` | View and download Xiaohongshu notes, search results, profiles, feeds, favorites, and likes. |
| `scripts/adapters/x_workflow.py` | View and download X home, bookmarks, tweets, threads, searches, profiles, and current account posts. |
| `scripts/adapters/weibo_workflow.py` | View and download Weibo posts, profiles, searches, feeds, comments, favorites, and user data. |
| `scripts/adapters/douban_workflow.py` | View Douban search, charts, subjects, photos, marks, and reviews. |
| `scripts/adapters/zhihu_workflow.py` | View Zhihu hot lists, recommendations, searches, questions, answers, collections, and export articles. |
| `scripts/adapters/youtube_workflow.py` | View YouTube search, video metadata, transcripts, comments, channels, playlists, feeds, history, watch later, and subscriptions. |
| `scripts/adapters/douyin_workflow.py` | View Douyin creator pages, videos, collections, activities, hashtags, locations, stats, and public user videos. |
| `scripts/adapters/bilibili_workflow.py` | View Bilibili hot lists, rankings, search, videos, comments, dynamics, history, following, subtitles, and summaries. |
| `scripts/adapters/jd_workflow.py` | View JD product search, item details, reviews, cart, and current account state. |
| `scripts/adapters/taobao_workflow.py` | View Taobao product search, item details, reviews, cart, and current account state. |
| `scripts/adapters/zhipin_workflow.py` | View BOSS 直聘 filters, recommendation lists, keyword search/detail crawls, and chat metadata with DOM fallback for blocked APIs. |

Typical workflow outputs are written to the directory passed with --output-dir. Depending on the workflow, outputs may include:
- `summary.json`
- `metadata.json`
- extracted Markdown files
- structured JSON payloads
- downloaded media assets
- per-item folders for batch exports
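After a run, a quick inventory of the output directory can confirm which of the artifacts listed above were produced. This is an illustrative sketch: `inventory_outputs` is a hypothetical helper, the demo directory layout is invented, and the contents of `summary.json` and `metadata.json` are not inspected because their schema is not described here.

```python
import tempfile
from pathlib import Path
from typing import Dict


def inventory_outputs(output_dir: str) -> Dict[str, object]:
    """Report which common workflow artifacts exist under `output_dir`."""
    root = Path(output_dir)
    return {
        "summary.json": (root / "summary.json").is_file(),
        "metadata.json": (root / "metadata.json").is_file(),
        "markdown_files": sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md")),
        "item_dirs": sorted(p.name for p in root.iterdir() if p.is_dir()),
    }


# Demo against a throwaway directory with a made-up layout.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "summary.json").write_text("{}")
    (demo_root / "note-1").mkdir()
    (demo_root / "note-1" / "content.md").write_text("# hypothetical note")
    report = inventory_outputs(tmp)
```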
Long-running run state is stored outside the project at:
~/.action-browser/runs/
This skill uses progressive disclosure:
| File | Purpose | Loaded When |
|---|---|---|
| `SKILL.md` | Core rules, browser operation flow, waiting strategy, and stop handling. | Always when the skill is invoked. |
| `references/initialization.md` | ActionBook, Node.js, Chrome, CLI, and extension setup. | When the local ActionBook environment is missing or incomplete. |
| `references/status-check.md` | Minimal checks before starting browser work. | When daemon, extension, session, or tab state is uncertain. |
| `references/*.md` | Site-specific workflows and payload expectations. | Only for the matching site task. |
| `scripts/*.py` | Reusable workflow helpers and extraction scripts. | When the task needs automation beyond one-off browser operations. |
| `agents/openai.yaml` | Skill metadata for compatible agent interfaces. | When a tool reads skill display metadata. |

- Use one stable session id per task or task group.
- Confirm the real tab id before interacting with a page.
- For parallel work in one session, allocate one stable tab id per subtask and pass `--tab` explicitly.
- Use the helper for session/tab lifecycle. Raw browser session commands are for diagnostics, not the default task path.
- Take a fresh snapshot after page structure changes.
- Use snapshot refs over remembered selectors.
- Treat timeouts as failure ceilings, not as a waiting strategy.
- Stop for login, CAPTCHA, MFA, and risk-control pages so the user can complete them in the same browser session.
- Track long workflows so interruption can stop the underlying process group.
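The last rule, stopping the whole process group rather than a single process, can be sketched with the Python standard library on POSIX systems. This is a generic illustration of the technique and not a description of how `scripts/actionbook_run.py` is implemented. `start_tracked` and `stop_tracked` are hypothetical names.

```python
import os
import signal
import subprocess
import sys
from typing import List


def start_tracked(cmd: List[str]) -> subprocess.Popen:
    """Start `cmd` as the leader of a new session and process group (POSIX)."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_tracked(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Send SIGTERM to the whole group, escalating to SIGKILL after `grace` seconds."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)
        return proc.wait()


# Stand-in for a long-running workflow: a child that would sleep for a minute.
proc = start_tracked([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_tracked(proc)
```

Signalling the group matters because a workflow may spawn helpers of its own. Terminating only the parent could leave those children running against the browser.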
Action Browser does not read, save, or submit user passwords, cookies, tokens, API keys, or other secrets. Login and risk-control steps remain user-controlled in the browser.
Some target sites may change their DOM, API responses, login flow, or anti-automation behavior. The workflow references document the expected payloads and known recovery steps, but site-specific helpers should be treated as operational scripts that require maintenance.
Built around ActionBook and packaged as a portable agent skill.