
feat: add Linux desktop automation support via AT-SPI2 #356

Merged: thymikee merged 25 commits into main from claude/linux-desktop-automation-Rgtcx on Apr 4, 2026


@thymikee commented Apr 4, 2026

Add Linux as a first-class desktop automation platform using the AT-SPI2 accessibility framework, mirroring the macOS approach with accessibility snapshots.

Architecture

Uses a Python subprocess (atspi-dump.py) with PyGObject bindings to capture the AT-SPI2 accessibility tree. This avoids native compilation issues (node-gtk) and leverages the reference GObject Introspection consumer that's available on all Linux desktops.

New platform files (src/platforms/linux/)

| File | Purpose |
| --- | --- |
| atspi-dump.py | Python AT-SPI2 tree dumper; outputs JSON to stdout |
| atspi-bridge.ts | Node bridge that spawns the Python script and maps output to RawSnapshotNode[] |
| role-map.ts | Maps ~100 AT-SPI2 role names to normalized snapshot types |
| snapshot.ts | Entry point mapping SessionSurface to AT-SPI2 SnapshotSurface |
| input-actions.ts | Input synthesis via xdotool (X11) / ydotool (Wayland) |
| screenshot.ts | Screenshots via scrot/grim with fallback chains |
| app-lifecycle.ts | App open/close, back/home actions |
| linux-env.ts | Display server detection, input tool resolution |
| devices.ts | Local device discovery |
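
As a rough illustration of the bridge step described above (parse the dumper's JSON from stdout and map it to snapshot nodes), here is a minimal sketch. The JSON shape (`nodes`, `error`, `role`, `name`, `rect`) and the fields of `RawSnapshotNode` are assumptions made for this example; the PR does not show the real schema, and the actual atspi-bridge.ts also normalizes roles via role-map.ts, which is omitted here.

```typescript
// Hypothetical node shape; the real RawSnapshotNode in the repo may differ.
interface RawSnapshotNode {
  index: number;
  type: string;
  label?: string;
  rect?: { x: number; y: number; width: number; height: number };
}

// Assumed shape of one entry in the dumper's JSON output (not confirmed by the PR).
interface DumpedNode {
  role: string;
  name: string | null;
  rect: { x: number; y: number; width: number; height: number } | null;
}

function parseDumpOutput(stdout: string): RawSnapshotNode[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    throw new Error('atspi-dump produced invalid JSON');
  }
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('atspi-dump produced an unexpected payload');
  }
  const payload = parsed as { error?: string; nodes?: DumpedNode[] };
  // The script is assumed to report failures as a JSON object with an `error` field.
  if (typeof payload.error === 'string') {
    throw new Error(payload.error);
  }
  const nodes = Array.isArray(payload.nodes) ? payload.nodes : [];
  return nodes.map((dumped, index) => {
    const node: RawSnapshotNode = { index, type: dumped.role };
    // Coerce JSON nulls into absent optional fields.
    if (dumped.name !== null) node.label = dumped.name;
    if (dumped.rect !== null) node.rect = dumped.rect;
    return node;
  });
}
```

Keeping the parsing in a pure function like this makes it testable without spawning Python, which is presumably why the bridge tests can cover JSON parsing and null coercion in isolation.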

Integration changes

  • Extended Platform type with 'linux', SnapshotBackend with 'linux-atspi'
  • Added isDesktopBackend() helper consolidating macOS + Linux treatment
  • Wired Linux into dispatch, snapshot capture, device discovery, interactors
  • Added Linux surface support (app, desktop, frontmost-app)
  • Added 'linux' to replay metadata platforms and CLI --platform flag
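
A minimal sketch of what the `isDesktopBackend()` helper could look like. The backend names `'linux-atspi'` and `'macos-helper'` appear in this PR; `'android'` stands in for the mobile backends and is a placeholder, not the repo's actual union.

```typescript
// 'macos-helper' and 'linux-atspi' are named in the PR; 'android' is a placeholder
// for whatever mobile backends the real SnapshotBackend union contains.
type SnapshotBackend = 'macos-helper' | 'linux-atspi' | 'android';

// Desktop backends skip mobile-only snapshot semantics, so callers can branch once.
function isDesktopBackend(backend: SnapshotBackend): boolean {
  return backend === 'macos-helper' || backend === 'linux-atspi';
}
```

Centralizing the check means a future desktop backend needs only one change here instead of edits at every call site.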

Tests & CI

  • Unit tests for atspi-bridge.ts (9 tests) and role-map.ts (3 tests)
  • Linux CI workflow (.github/workflows/linux.yml): Xvfb + D-Bus + AT-SPI2 registry
  • Smoke test opens gnome-calculator, takes screenshot + snapshot, asserts elements exist
  • Screenshots uploaded as CI artifacts for visual verification

Documentation

  • src/platforms/SNAPSHOT_CONTRACT.md: Cross-platform traversal contract documenting output schema, traversal rules, surface semantics, and normalized role types

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT

Add Linux as a first-class platform using AT-SPI2 accessibility framework
via node-gtk for accessibility tree snapshots. This mirrors the macOS
desktop automation approach using accessibility snapshots.

New files:
- src/platforms/linux/atspi-bridge.ts: Core AT-SPI2 bridge using node-gtk
  with lazy loading, recursive tree traversal (max 1500 nodes, depth 12)
- src/platforms/linux/role-map.ts: AT-SPI2 role normalization (~100 roles
  mapped to existing snapshot type conventions)
- src/platforms/linux/snapshot.ts: Snapshot entry point with surface,
  scope, depth, and interactive-only filtering support
- src/platforms/linux/devices.ts: Local device discovery for Linux
- src/platforms/linux/node-gtk.d.ts: Type declarations for node-gtk
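
The traversal limits mentioned above (max 1500 nodes, depth 12) can be sketched as a bounded depth-first walk. This is an illustration only: `TreeNode`, `TraversalCtx`, and `collectRoles` are hypothetical names, and whether the real depth limit is inclusive is an assumption.

```typescript
// Hypothetical tree shape used only for this sketch.
interface TreeNode {
  role: string;
  children: TreeNode[];
}

// Limits travel in a context object so no module-level globals are needed.
interface TraversalCtx {
  maxNodes: number;
  maxDepth: number;
  visited: number;
}

function collectRoles(node: TreeNode, depth: number, ctx: TraversalCtx, out: string[]): void {
  // Stop when either budget is exhausted; depth is treated as inclusive here (an assumption).
  if (depth > ctx.maxDepth || ctx.visited >= ctx.maxNodes) return;
  ctx.visited += 1;
  out.push(node.role);
  for (const child of node.children) {
    collectRoles(child, depth + 1, ctx, out);
  }
}
```

A later commit in this PR moves the Python dumper to a similar context-dict approach instead of globals.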

Integration:
- Extended Platform type with 'linux', backend union with 'linux-atspi'
- Wired snapshot into dispatch.ts and snapshot-capture.ts
- Added Linux device discovery to dispatch-resolve.ts
- Added stub interactor (input actions deferred to Phase 3)
- Added 'linux' to CLI --platform flag
- node-gtk added as optional dependency (only installs on Linux)

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT

github-actions bot commented Apr 4, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-356/

Built to branch gh-pages at 2026-04-04 10:25 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0d880e00b1



const devices: DeviceInfo[] = [];
try {
  devices.push(...(await listLinuxDevices()));

P1: Preserve Android auto-selection before Linux host device

This new discovery order prepends the synthetic local Linux device ahead of Android candidates when no --platform flag is passed, which changes default targeting on Linux hosts. Because resolveDevice keeps discovery order when multiple booted physical devices are equally valid, a connected Android phone (kind: device) is now commonly displaced by the Linux host entry, so existing commands that relied on implicit Android selection start resolving to Linux instead.
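
One way to address this ordering concern is to make sure the synthetic host entry always sorts after real devices. The sketch below assumes a simplified `DeviceCandidate` shape and a hypothetical `withHostLast` helper; it is not the repo's `DeviceInfo` type or its actual fix.

```typescript
// Simplified stand-in for the repo's device type; field names are assumptions.
interface DeviceCandidate {
  platform: 'android' | 'ios' | 'linux';
  id: string;
}

// Keep relative order within each group, but move the synthetic Linux host entry last
// so a resolver that picks the first valid candidate still prefers a connected phone.
function withHostLast(devices: DeviceCandidate[]): DeviceCandidate[] {
  const real = devices.filter((d) => d.platform !== 'linux');
  const host = devices.filter((d) => d.platform === 'linux');
  return [...real, ...host];
}
```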


Comment on lines +137 to +141
case 'linux':
  return {
    open: () => {
      throw new AppError('UNSUPPORTED_OPERATION', 'open not yet supported on Linux');
    },

P2: Gate unsupported Linux commands in capability checks

The Linux interactor path is mostly stubbed with UNSUPPORTED_OPERATION throws, but Linux was added as a platform without a Linux-specific capability matrix, so command support checks still treat Linux like Android and allow commands such as open, click, and fill to proceed until they fail at runtime here. This creates false-positive support signals and inconsistent behavior across handlers that rely on isCommandSupportedOnDevice for early rejection.


claude added 17 commits April 4, 2026 06:30
- Extract SnapshotBackend type alias to replace repeated string union
  across 5 files (snapshot.ts, snapshot-capture.ts, session-replay-heal.ts,
  interaction.test.ts)
- Remove duplicate scope/interactive/depth filtering from
  linux/snapshot.ts — let the existing buildSnapshotState pipeline handle
  it, same as Android
- Extract isDesktopBackend() helper in snapshot-capture.ts to consolidate
  the "skip mobile semantics" pattern for macos-helper and linux-atspi
- Collapse 17 repetitive throw statements in Linux interactor stubs
  into a linuxStub() factory function

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
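
For illustration, a `linuxStub()` factory along the lines described in this commit might look like the following. The `AppError` class here is a local stand-in with an assumed `(code, message)` constructor, mirroring the call seen in the review snippet; the message format is copied from that snippet.

```typescript
// Local stand-in for the repo's AppError; only the (code, message) shape is assumed.
class AppError extends Error {
  readonly code: string;
  constructor(code: string, message: string) {
    super(message);
    this.code = code;
  }
}

// Returns a function that always throws, so each stubbed action is a one-liner.
function linuxStub(operation: string): () => never {
  return () => {
    throw new AppError('UNSUPPORTED_OPERATION', `${operation} not yet supported on Linux`);
  };
}

// Usage: build an interactor whose actions are all stubs (operation names are examples).
const stubbedInteractor = {
  open: linuxStub('open'),
  tap: linuxStub('tap'),
};
```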
Add xdotool/ydotool input actions (tap, swipe, scroll, type, fill,
right/middle click, long press, double click), screenshot capture via
grim/scrot, and app lifecycle management (open, close, back, home).
Wire Linux interactors with real implementations and fix device
discovery order so Linux doesn't displace Android in auto-selection.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
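
To make the xdotool side of these input actions concrete, here is a small sketch that builds argument lists for `xdotool mousemove` and `xdotool click`. The helper names `moveToArgs` and `clickArgs` are hypothetical, and the PR's real input-actions.ts may assemble its commands differently; the ydotool path is not covered.

```typescript
type MouseButton = 'left' | 'middle' | 'right';

// xdotool numbers buttons 1 (left), 2 (middle), 3 (right).
const XDOTOOL_BUTTON: Record<MouseButton, number> = { left: 1, middle: 2, right: 3 };

// Arguments for `xdotool mousemove <x> <y>`; coordinates are rounded to whole pixels.
function moveToArgs(x: number, y: number): string[] {
  return ['mousemove', String(Math.round(x)), String(Math.round(y))];
}

// Arguments for `xdotool click [--repeat N] <button>`; repeat > 1 covers double click.
function clickArgs(button: MouseButton, repeat = 1): string[] {
  const id = String(XDOTOOL_BUTTON[button]);
  return repeat > 1 ? ['click', '--repeat', String(repeat), id] : ['click', id];
}
```

Building argument arrays separately from process spawning keeps the mapping easy to unit test, for example by mocking the command runner and asserting on the tool and arguments.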
- Extract linux-env.ts with cached display server + input tool detection
  so every action avoids repeated `which` lookups
- Add moveTo/clickButton/sendKey helpers to eliminate repeated
  mousemove boilerplate across 5 mouse actions
- Make scrollLinux respect amount/pixels options instead of hardcoded
  scroll count
- Have backLinux/homeLinux reuse sendKey instead of duplicating tool
  detection

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
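
A possible shape for cached display server detection is sketched below. It relies on the conventional `XDG_SESSION_TYPE`, `WAYLAND_DISPLAY`, and `DISPLAY` environment variables; which signals linux-env.ts actually checks is not shown in the PR, so treat the precedence order and the `memoize` helper as assumptions.

```typescript
type DisplayServer = 'x11' | 'wayland' | 'unknown';
type Env = Record<string, string | undefined>;

// The environment is passed in (rather than read from process.env) to keep this testable.
function detectDisplayServer(env: Env): DisplayServer {
  const session = (env['XDG_SESSION_TYPE'] ?? '').toLowerCase();
  if (session === 'wayland') return 'wayland';
  if (session === 'x11') return 'x11';
  // Fall back to socket/display variables when the session type is unset.
  if (env['WAYLAND_DISPLAY']) return 'wayland';
  if (env['DISPLAY']) return 'x11';
  return 'unknown';
}

// Compute once, then reuse the result on every later call.
function memoize<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: compute() };
    return cached.value;
  };
}
```

Wrapping the detection (and any tool lookup) in `memoize` is one way to get the "detect once per process" behaviour the commit describes.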
Add GitHub Actions workflow that boots a virtual X11 display (Xvfb),
installs AT-SPI2 accessibility tooling and xdotool, opens
gnome-calculator, takes screenshots, and captures an accessibility
snapshot. Screenshots are uploaded as artifacts for visual verification.

Also adds 'linux' to replay script metadata platforms and a
test:replay:linux script to package.json.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
The replay runner requires an active session before any commands can
run. Move the screenshot after the open command that creates the
session.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
- Add gobject-introspection, libcairo2-dev, build-essential for
  node-gtk native compilation
- Split AT-SPI2 registry start into its own step so it picks up
  DBUS_SESSION_BUS_ADDRESS from GITHUB_ENV
- Set GTK_A11Y=atspi, GTK_MODULES=gail:atk-bridge, NO_AT_BRIDGE=0
  to ensure GTK apps expose their accessibility tree on headless CI
- Set GSETTINGS_BACKEND=memory to avoid dconf failures
- Add node-gtk verification step to catch build failures early

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
pnpm install silently skips failed optional dependency builds and
the pnpm cache may not include the native binary. Force a rebuild
after install to ensure the node-gtk .node binding is compiled
against the system GI/cairo headers.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
pnpm rebuild doesn't trigger node-pre-gyp properly for optional deps.
Run node-pre-gyp install --fallback-to-build --update-binary directly
inside the node-gtk package directory to force compilation when no
prebuilt binary exists for the current Node ABI (v127 / Node 22).

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
node-gtk is a native C++ addon that requires compilation against
specific Node ABI versions and GObject Introspection headers. This
proved unreliable on CI (no prebuilt binaries for Node 22 ABI v127,
silent optional dep build failures, pnpm cache staleness).

Replace it with a Python helper script (atspi-dump.py) that uses
PyGObject — the reference GObject Introspection consumer. python3-gi
is trivially installable on any Linux distro with no compilation step.
The Node bridge spawns `python3 atspi-dump.py` and parses JSON output.

- Remove node-gtk from optionalDependencies
- Remove node-gtk.d.ts type stub
- Add atspi-dump.py (~200 lines) doing the same tree traversal
- Rewrite atspi-bridge.ts to use subprocess instead of in-process GI
- Simplify CI workflow: no more native build deps or rebuild steps

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
python3-gi, gir1.2-atspi-2.0, at-spi2-core, and dbus-x11 are already
present on Ubuntu GitHub Actions runners.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
- Allow --surface desktop and --surface frontmost-app on Linux
  (previously only macOS could use --surface)
- Add unit tests for atspi-bridge (9 tests: JSON parsing, role
  normalization, null coercion, error handling, arg forwarding)
- Add unit tests for role-map (3 tests: common roles, case
  normalization, PascalCase fallback)
- Improve .py script path resolution (walk upward instead of
  hardcoded relative paths)
- CI replay test now asserts snapshot contains calculator UI
  nodes via is-exists

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
Document the shared schema, traversal rules, surface semantics, and
normalized role types that all snapshot backends (Swift, Python,
Android) must conform to. This serves as the single source of truth
when adding or modifying platform backends.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
pnpm-lock.yaml still referenced node-gtk after it was removed from
package.json, causing pnpm install --frozen-lockfile to fail in CI.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
- atspi-dump.py: use ctx dict for traversal limits instead of globals,
  fix rect filter (width/height <= 0 should use `or`), add surface validation
- input-actions.ts: make sendKey scancodes required to prevent silent
  no-op on ydotool, fix ydotool longPress/swipe to use click --down/--up
- app-lifecycle.ts: use pkill -x (exact match) instead of pkill -f
- linux-env.ts: emit diagnostic warning when falling back to xdotool on Wayland

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
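
The rect filter fix is easiest to see in code. The Python source is not shown in this PR, so the following is a TypeScript transliteration of the described logic, with a hypothetical `hasVisibleArea` name: a rect is discarded when either dimension is non-positive, which requires `||` (Python `or`) rather than `&&`.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// A rect with zero width OR zero height has no area; with `&&` a 0x10 rect would slip through.
function hasVisibleArea(rect: Rect): boolean {
  return !(rect.width <= 0 || rect.height <= 0);
}
```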
Ubuntu runners may not have at-spi2-core, python3-gi, gir1.2-atspi-2.0,
or dbus-x11 pre-installed. Install them explicitly instead of assuming
they exist. Also make the verify step's tree dump non-fatal since no
apps are running at that point.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
The selector parser tokenizes on whitespace, so `role=push button`
was split into two tokens causing a parse failure. Use single quotes
inside the selector: `role='push button'`.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
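
To illustrate why quoting fixes the parse failure, here is a minimal whitespace tokenizer that treats single-quoted runs as part of one token. It is a sketch, not the repo's selector parser; `tokenizeSelector` is a hypothetical name, and stripping the quotes from the emitted token is an assumption.

```typescript
// Split on whitespace, except inside single quotes; quotes themselves are dropped.
function tokenizeSelector(input: string): string[] {
  const tokens: string[] = [];
  let current = '';
  let inQuote = false;
  for (const ch of input) {
    if (ch === "'") {
      inQuote = !inQuote;
    } else if (!inQuote && /\s/.test(ch)) {
      if (current.length > 0) {
        tokens.push(current);
        current = '';
      }
    } else {
      current += ch;
    }
  }
  if (inQuote) throw new Error('Unterminated quote in selector');
  if (current.length > 0) tokens.push(current);
  return tokens;
}
```

With this approach `role='push button' label=7` yields two tokens, while the unquoted `role=push button` still splits into `role=push` and `button`.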
appName is not a valid selector key. The supported keys are: id, role,
text, label, value, visible, hidden, editable, selected, enabled,
hittable. Simplified to use label and role only.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
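
A small sketch of validating a `key=value` selector term against the supported keys listed in this commit. The key list is taken from the commit message; `parseSelectorTerm` and its error messages are hypothetical, and exact-case matching is an assumption for this point in the history.

```typescript
// Keys as listed in the commit message above.
const SELECTOR_KEYS = [
  'id', 'role', 'text', 'label', 'value', 'visible',
  'hidden', 'editable', 'selected', 'enabled', 'hittable',
] as const;
type SelectorKey = (typeof SELECTOR_KEYS)[number];

function parseSelectorTerm(term: string): { key: SelectorKey; value: string } {
  const eq = term.indexOf('=');
  if (eq <= 0) throw new Error(`Malformed selector term: ${term}`);
  const rawKey = term.slice(0, eq);
  // Reject anything outside the supported set, e.g. appName at this commit.
  const key = SELECTOR_KEYS.find((candidate) => candidate === rawKey);
  if (key === undefined) throw new Error(`Unsupported selector key: ${rawKey}`);
  return { key, value: term.slice(eq + 1) };
}
```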
@thymikee changed the title from "feat: add Linux desktop automation support via AT-SPI2 (Phase 1+2)" to "feat: add Linux desktop automation support via AT-SPI2" on Apr 4, 2026
claude added 7 commits April 4, 2026 09:23
- snapshot.ts: emit diagnostic warning when menubar surface is
  requested on Linux (falls back to desktop silently otherwise)
- SNAPSHOT_CONTRACT.md: fix unmapped role example to use a role
  that isn't actually mapped (was "color chooser" which maps to Dialog)

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
P0: Add explicit Linux capability matrix with 3-way platform routing
(Apple/Linux/Android) in isCommandSupportedOnDevice. Linux now correctly
blocks unsupported commands (clipboard, rotate, scrollIntoView, etc.)
at capability level rather than throwing at runtime. Includes tests.

P0: Expand Linux CI to run typecheck + unit tests before smoke tests.
Add AT-SPI2 registry health probe with fail-fast on missing registry.

P1: Harden atspi-dump.py — arg parsing now produces JSON errors on bad
int values, and a top-level catch wraps unexpected exceptions in JSON.

P1: Add 10s per-action timeout to xdotool/ydotool input commands to
prevent indefinite hangs.

P1: Tighten smoke test selectors to calculator-specific signals
(digit labels) instead of generic role='push button'.

P2: Document Linux surface mapping, supported commands, and known
limitations in SNAPSHOT_CONTRACT.md.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
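
The 3-way platform routing can be pictured roughly as below. This is a sketch under assumptions: `isCommandSupported` is a stand-in for the repo's `isCommandSupportedOnDevice`, the platform names are simplified, and the command lists are an illustrative subset rather than the real matrix.

```typescript
type PlatformName = 'ios' | 'macos' | 'android' | 'linux';
type PlatformFamily = 'apple' | 'linux' | 'android';

// Route each platform to one of three capability families.
function familyOf(platform: PlatformName): PlatformFamily {
  if (platform === 'ios' || platform === 'macos') return 'apple';
  return platform;
}

// Illustrative subset only; the real capability matrix is larger and may differ.
const SUPPORTED_COMMANDS: Record<PlatformFamily, readonly string[]> = {
  apple: ['open', 'click', 'fill', 'snapshot', 'screenshot', 'rotate'],
  android: ['open', 'click', 'fill', 'snapshot', 'screenshot', 'rotate', 'scrollIntoView'],
  linux: ['open', 'click', 'fill', 'snapshot', 'screenshot'],
};

// Early rejection: callers can refuse a command before any interactor runs.
function isCommandSupported(command: string, platform: PlatformName): boolean {
  return SUPPORTED_COMMANDS[familyOf(platform)].indexOf(command) !== -1;
}
```

With an explicit Linux entry, a command such as `rotate` is rejected up front instead of falling through to Android's rules and failing at runtime.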
Linux snapshots were bypassing snapshotInteractiveOnly and snapshotDepth
filtering that macOS-helper gets via shapeDesktopSurfaceSnapshot. Route
Linux through the same function so snapshot -i and --depth flags work.

Renamed shapeMacOsSurfaceSnapshot → shapeDesktopSurfaceSnapshot since
it's now shared between macOS and Linux desktop backends.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
- app-lifecycle.ts: emit diagnostic on fire-and-forget app launch
  failure instead of silently swallowing errors
- linux-env.ts: make xdotool on Wayland a hard error instead of
  a broken fallback (xdotool doesn't work on Wayland)
- atspi-bridge.ts: increase Python subprocess timeout from 15s to
  30s for safety on slow/loaded systems with large a11y trees

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
Selectors:
- Add appname and windowtitle as selector keys for desktop platforms.
  Both macOS and Linux snapshots already populate these fields — now
  they're usable in selector expressions (e.g., "label=OK appname=Calc").
  Keys are case-insensitive.

Clipboard:
- Implement readLinuxClipboard/writeLinuxClipboard using xclip/xsel
  (X11) or wl-copy/wl-paste (Wayland) with descriptive TOOL_MISSING
  errors. Enable clipboard in Linux capability matrix. 7 unit tests.

Input action tests:
- Add 18 unit tests covering xdotool and ydotool code paths: press,
  right/middle click, double click, sendKey, type, scroll, swipe,
  focus, fill. Tests mock runCmd and verify correct tool + args.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
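
As a sketch of the clipboard tool selection described above, the function below picks a read command per display server and fails with a TOOL_MISSING-style error when nothing is installed. `resolveClipboardRead` and the `hasTool` predicate are hypothetical; the flags shown are the standard ones for xclip, xsel, and wl-paste, but the arguments the PR actually passes are not shown.

```typescript
type ClipboardSession = 'x11' | 'wayland';

interface ClipboardCommand {
  tool: string;
  args: string[];
}

// `hasTool` is injected so the lookup (e.g. a cached `which`) can be mocked in tests.
function resolveClipboardRead(
  session: ClipboardSession,
  hasTool: (name: string) => boolean,
): ClipboardCommand {
  if (session === 'wayland') {
    if (hasTool('wl-paste')) return { tool: 'wl-paste', args: ['--no-newline'] };
    throw new Error('TOOL_MISSING: wl-paste not found; install wl-clipboard');
  }
  // On X11, prefer xclip and fall back to xsel.
  if (hasTool('xclip')) return { tool: 'xclip', args: ['-selection', 'clipboard', '-o'] };
  if (hasTool('xsel')) return { tool: 'xsel', args: ['--clipboard', '--output'] };
  throw new Error('TOOL_MISSING: neither xclip nor xsel found');
}
```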
…_info helper

Avoid repeated `which` calls on every screenshot/clipboard operation by
caching the resolved tool on first use, matching the input-action pattern.
Extract duplicated app_name/pid retrieval in atspi-dump.py into get_app_info.

https://claude.ai/code/session_01H9hrmueNF5pcBM8JeX81mT
@thymikee merged commit caf0e83 into main on Apr 4, 2026
15 checks passed
@thymikee deleted the claude/linux-desktop-automation-Rgtcx branch on April 4, 2026 11:14