Skip to content

feat: Extension installation module#2811

Open
csmith49 wants to merge 46 commits intomainfrom
feat/installed-extensions
Open

feat: Extension installation module#2811
csmith49 wants to merge 46 commits intomainfrom
feat/installed-extensions

Conversation

@csmith49
Copy link
Copy Markdown
Collaborator

@csmith49 csmith49 commented Apr 13, 2026

  • A human has tested these changes.

Why

Skills and plugins have repeated but slightly-divergent implementations of installing/loading. This PR attempts to unify both with one mechanism that can be reused for other extensions in the future.

This PR is a follow-up to #2801 and should be merged after.

Summary

  • New openhands.sdk.extensions.installation module that provides the interface for installing/loading extensions
  • Tests for the above
  • Updates to openhands.sdk.plugin.installed and openhands.sdk.skills.installed to use the new machinery while exposing the old interface

How to Test

Unit/integration tests all use temp files to actually stress the saving/loading.

Type

  • Bug fix
  • Feature
  • Refactor
  • Breaking change
  • Docs / chore

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22-slim Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:828ef3e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-828ef3e-python \
  ghcr.io/openhands/agent-server:828ef3e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:828ef3e-golang-amd64
ghcr.io/openhands/agent-server:828ef3e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:828ef3e-golang-arm64
ghcr.io/openhands/agent-server:828ef3e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:828ef3e-java-amd64
ghcr.io/openhands/agent-server:828ef3e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:828ef3e-java-arm64
ghcr.io/openhands/agent-server:828ef3e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:828ef3e-python-amd64
ghcr.io/openhands/agent-server:828ef3e-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-amd64
ghcr.io/openhands/agent-server:828ef3e-python-arm64
ghcr.io/openhands/agent-server:828ef3e-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-slim-arm64
ghcr.io/openhands/agent-server:828ef3e-golang
ghcr.io/openhands/agent-server:828ef3e-java
ghcr.io/openhands/agent-server:828ef3e-python

About Multi-Architecture Support

  • Each variant tag (e.g., 828ef3e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 828ef3e-python-amd64) are also available if needed

csmith49 and others added 30 commits April 10, 2026 09:49
Cover parse_extension_source, get_cache_path, fetch,
fetch_with_resolution, SourceType enum, and repo_path handling
for the new shared extensions module.

Co-authored-by: openhands <openhands@all-hands.dev>
…h.py

- Add class docstring to SourceType enum describing each variant.
- Update parse_extension_source docstring: fix arg description,
  return types, and example function names.
- Fix _apply_subpath docstring example from 'plugin repository'
  to 'extension repository'.
- Fix get_cache_path docstring references from plugin to extension.
- Rename plugin_path variable to ext_path in fetch_with_resolution.

Co-authored-by: openhands <openhands@all-hands.dev>
Both modules now delegate all fetch logic to the shared
extensions.fetch module while preserving their public interfaces:

- plugin/fetch.py: parse_plugin_source, get_cache_path, fetch_plugin,
  fetch_plugin_with_resolution, PluginFetchError, DEFAULT_CACHE_DIR
- skills/fetch.py: fetch_skill, fetch_skill_with_resolution,
  SkillFetchError, DEFAULT_CACHE_DIR

ExtensionFetchError is caught and re-raised as the domain-specific
error type with 'extension' replaced by 'plugin'/'skill' in messages.

Co-authored-by: openhands <openhands@all-hands.dev>
- Remove parse_plugin_source and get_cache_path from plugin/fetch.py;
  callers should use extensions.fetch directly.
- Slim test_plugin_fetch.py from 1038 to 100 lines: keep only
  PluginFetchError wrapping, DEFAULT_CACHE_DIR, and Plugin.fetch() tests.
- Move git infrastructure tests (clone, update, checkout, locking,
  GitHelper errors, get_default_branch) to tests/sdk/git/test_cached_repo.py.
- Update test_installed_plugins.py to import from extensions.fetch.

Co-authored-by: openhands <openhands@all-hands.dev>
The except block was discarding the ExtensionFetchError message and
replacing it with a generic 'Failed to fetch plugin'. Now it carries
the original message through with 'extension' replaced by 'plugin'.

Co-authored-by: openhands <openhands@all-hands.dev>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

Python API breakage checks — ✅ PASSED

Result:PASSED

Action log

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result:PASSED

Action log

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linus-Style Code Review

Taste Rating: 🟡 Acceptable - Good architectural thinking with clean abstraction, but has a critical runtime bug

[CRITICAL ISSUES]

  • Runtime Bug: model_validate_json() called on Python dict instead of JSON string - will fail at runtime
  • Eval Risk: This refactors plugin/skill loading; needs human review + lightweight evals before merge

[ARCHITECTURAL ASSESSMENT]
Good taste here: The generic InstallationManager[T] abstraction is solid. You've eliminated code duplication between plugins and skills by finding the right data structure pattern. Clean separation of concerns with InstallationInterface, InstallationInfo, and MetadataSession.

VERDICT:
Needs rework: Fix the critical bug first, then flag for human maintainer review due to eval risk (affects agent capabilities via plugin/skill loading).

KEY INSIGHT:
The abstraction is right, but the validation bug shows this needs runtime testing beyond unit tests. Run an actual install/load cycle end-to-end before merge.

@csmith49 csmith49 changed the base branch from main to feat/extensions-utils April 13, 2026 15:00
@csmith49 csmith49 changed the title Feat/installed extensions feat: Extension installation module Apr 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 13, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/extensions/installation
   manager.py116595%179, 182, 194, 197, 200
   metadata.py91693%196–198, 202–204
openhands-sdk/openhands/sdk/plugin
   __init__.py16193%87
openhands-sdk/openhands/sdk/skills
   installed.py851285%44, 196–199, 201–202, 206–207, 211, 226–227
TOTAL22353642171% 

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS WITH ISSUES

Refactor successfully unifies plugin and skill installation without breaking core functionality. Minor breaking change in internal API exports.

Phase Result
Environment Setup ✓ Build successful, all dependencies installed
CI & Tests ✓ 17/17 checks passed, 115 new tests added (all passing)
Functional Verification ✓ End-to-end plugin/skill install/uninstall/enable/disable working correctly
Functional Verification

Test Coverage

Created test plugin and skill from local paths, then exercised all core operations:

Plugin Operations (all passed ✓):

1. Installing plugin from local path...
   ✓ Installed: qa-test-plugin v1.0.0
   ✓ Source: /tmp/qa-test-extension/test-plugin
   ✓ Enabled: True

2. Listing installed plugins...
   ✓ Found 1 plugin(s)
     - qa-test-plugin: enabled=True

3. Loading installed plugins...
   ✓ Loaded 1 plugin(s)
     - qa-test-plugin v1.0.0

4. Disabling plugin...
   ✓ Disabled: True
   ✓ Loaded after disable: 0 plugin(s)

5. Enabling plugin...
   ✓ Enabled: True
   ✓ Loaded after enable: 1 plugin(s)

6. Getting plugin info...
   ✓ Name: qa-test-plugin
   ✓ Version: 1.0.0
   ✓ Description: QA test plugin for extensions refactor

7. Updating plugin...
   ✓ Updated: qa-test-plugin

8. Uninstalling plugin...
   ✓ Uninstalled: True
   ✓ Remaining plugins: 0

Skill Operations (all passed ✓):

1. Installing skill from local path...
   ✓ Installed: test-skill
   ✓ Source: /tmp/qa-test-extension/test-skill
   ✓ Enabled: True

2-8. List/Load/Disable/Enable/Get/Update/Uninstall...
   ✓ All operations completed successfully

Test Command:

uv run python /tmp/qa_test_extensions.py

CI Test Results

All 17 CI checks passed:

  • ✓ sdk-tests (115 new unit tests added)
  • ✓ agent-server-tests
  • ✓ tools-tests
  • ✓ cross-tests
  • ✓ build-binary-and-test (ubuntu + macos)
  • ✓ pre-commit (linting, type checking)
  • ✓ Python API compatibility check
  • ✓ REST API compatibility check

New Test Coverage:

  • 81 tests for openhands.sdk.extensions module
  • 16 tests for refactored openhands.sdk.plugin.installed
  • 18 tests for refactored openhands.sdk.skills.installed

Issues Found

🟡 Minor: Breaking change — InstalledPluginsMetadata and InstalledSkillsMetadata removed from public exports. These were internal implementation classes, but their removal is technically a breaking API change. The main user-facing API (all functions like install_plugin, list_installed_plugins, etc.) remains backward compatible.

Base automatically changed from feat/extensions-utils to main April 13, 2026 16:29
@csmith49 csmith49 requested a review from all-hands-bot April 13, 2026 20:32
Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linus-Style Code Review

Taste Rating: 🟡 Acceptable - Good architectural thinking with clean abstraction, but has eval risk that requires human review

[EVAL RISK]

  • This refactors plugin/skill loading; needs human review + lightweight evals before merge

[ARCHITECTURAL ASSESSMENT]
Good taste here: The generic InstallationManager[T] abstraction is solid. You eliminated code duplication between plugins and skills by identifying the common pattern and extracting it. The protocol-based approach is elegant.

Security done right: Extension name validation prevents path-traversal attacks. Self-healing metadata reconciliation is pragmatic.

[TESTING CONCERNS]
🟡 Test code has confusing double-encoding pattern (see inline comments)

🟠 Missing evidence: PR description says "A human has tested these changes" but the box is unchecked. Given this changes extension loading (used in production), should verify end-to-end:

  • Install/uninstall/enable/disable a real plugin
  • Install/uninstall/enable/disable a real skill
  • Verify existing installed extensions still load correctly after upgrade

Unit tests are good but don't prove the refactor works in practice.

VERDICT:
⚠️ Needs human review - Core architecture is sound, but eval risk requires human decision + lightweight evals

KEY INSIGHT:
This is a textbook example of good abstraction - you found the common pattern in two similar implementations and extracted it without adding unnecessary complexity. The self-healing metadata sync is particularly smart. Main concern is ensuring benchmark stability.

"""Write a mock extension manifest to a directory."""
directory.mkdir(parents=True, exist_ok=True)
ext = MockExtension(name=name, version=version, description=description)
with (directory / "extension.json").open("w") as f:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Confusing pattern: This double-encodes JSON:

  1. ext.model_dump_json() returns a JSON string
  2. json.dump(string, f) encodes that string as JSON again
  3. Later json.load(f) decodes it back to a string
  4. model_validate_json(string) finally parses it

It works by accident (encode/decode cancels out) but is unnecessarily convoluted. Use the direct pattern shown in the README:

Suggested change
with (directory / "extension.json").open("w") as f:
with (directory / "extension.json").open("w") as f:
f.write(ext.model_dump_json())

Then in the loader, model_validate_json() makes sense because it's reading a JSON string from disk.

def load_from_dir(extension_dir: Path) -> MockExtension:
extension_path: Path = extension_dir / "extension.json"
with extension_path.open() as f:
return MockExtension.model_validate_json(json.load(f))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion - Simplify: Since the helper writes JSON with model_dump_json() (which returns a string), the loader should read the string directly:

Suggested change
return MockExtension.model_validate_json(json.load(f))
return MockExtension.model_validate_json((extension_dir / "extension.json").read_text())

This matches the pattern shown in the module README and avoids the confusing double-encode/decode dance.

directory: Path,
name: str = "mock-extension",
version: str = "0.0.1",
description: str = "Mock extension",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Same issue as in test_installation_manager.py - this double-encodes JSON. Should be:

Suggested change
description: str = "Mock extension",
with (directory / "extension.json").open("w") as f:
f.write(ext.model_dump_json())

@staticmethod
def load_from_dir(extension_dir: Path) -> MockExtension:
extension_path: Path = extension_dir / "extension.json"
with extension_path.open() as f:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: Same simplification as in test_installation_manager.py:

Suggested change
with extension_path.open() as f:
return MockExtension.model_validate_json((extension_dir / "extension.json").read_text())

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ QA Report: PASS

This PR successfully consolidates duplicated installation logic from plugins and skills into a generic, type-parameterized extension installation module while maintaining full backward compatibility.

Does this PR achieve its stated goal?

Yes. The PR set out to "unify skills and plugins installation/loading with one mechanism that can be reused for other extensions in the future" — and it delivers exactly that:

  1. Duplication eliminated: The old plugin/installed.py (589 lines) and skills/installed.py (586 lines) implementations are replaced with a shared extensions/installation module (654 lines) plus thin API wrappers (141 + 230 lines). Net result: -150 lines and zero duplicated logic.

  2. Both APIs delegate to shared machinery: Code inspection confirms install_plugin(), list_installed_plugins(), install_skill(), and list_installed_skills() all delegate to InstallationManager. Metadata handling, enable/disable, and self-healing directory sync are now unified.

  3. Backward compatibility maintained: All public plugin and skill APIs work exactly as before. Existing code will continue to function without changes.

  4. Generic and extensible: The new module is parameterized by type T and uses an InstallationInterface[T] for loading. Future extension types can reuse this machinery without modification.

Phase Result
Environment Setup ✅ Dependencies installed, 659/659 tests pass
CI & Tests ✅ All checks passing: sdk-tests, tools-tests, agent-server-tests, pre-commit
Functional Verification ✅ Generic module works, plugins work, skills work, delegation confirmed
Functional Verification

Test 1: Generic Installation Module End-to-End

Created a mock extension type and exercised all InstallationManager operations:

Setup: Created MockExtension type with MockInterface loader and temporary install directory.

Operations tested:

1. Installing extension from local path...
   ✓ Installed: mock-ext v1.0.0
   ✓ Source: /tmp/.../mock-ext
   ✓ Enabled: True

2. Listing installed extensions...
   ✓ Found 1 extension(s)

3. Loading installed extensions...
   ✓ Loaded 1 extension(s)

4. Disabling extension...
   ✓ Disabled: True
   ✓ After disable, loaded count: 0

5. Re-enabling extension...
   ✓ Enabled: True
   ✓ After enable, loaded count: 1

6. Getting specific extension info...
   ✓ Found: mock-ext

7. Uninstalling extension...
   ✓ Uninstalled: True
   ✓ After uninstall, count: 0

Result: Generic installation module works correctly for all operations.


Test 2: Plugin API Backward Compatibility

Exercised the public plugin API using the same operations:

1. Installing plugin via old API...
   ✓ Installed: mock-plugin v1.0.0

2. Listing plugins via old API...
   ✓ Found 1 plugin(s)

3. Loading plugins via old API...
   ✓ Loaded 1 plugin(s)

4. Getting plugin via old API...
   ✓ Found: mock-plugin

5. Disabling plugin via old API...
   ✓ Disabled: True

6. Enabling plugin via old API...
   ✓ Enabled: True

7. Uninstalling plugin via old API...
   ✓ Uninstalled: True

Result: Plugin API maintains full backward compatibility.


Test 3: Skill API Backward Compatibility

Exercised the public skill API using the same operations:

1. Installing skill via old API...
   ✓ Installed: mock-skill

2. Listing skills via old API...
   ✓ Found 1 skill(s)

3. Loading skills via old API...
   ✓ Loaded 1 skill(s)

4. Getting skill via old API...
   ✓ Found: mock-skill

5. Disabling skill via old API...
   ✓ Disabled: True

6. Enabling skill via old API...
   ✓ Enabled: True

7. Uninstalling skill via old API...
   ✓ Uninstalled: True

Result: Skill API maintains full backward compatibility.


Test 4: Code Delegation Verification

Used Python's inspect module to verify the old implementations actually delegate:

1. Checking plugin.installed.py:
   ✓ install_plugin delegates to InstallationManager
   ✓ list_installed_plugins delegates to InstallationManager

2. Checking skills.installed.py:
   ✓ install_skill delegates to InstallationManager
   ✓ list_installed_skills delegates to InstallationManager

Result: Duplication is truly eliminated. The old APIs are now thin wrappers.


Test 5: Code Reduction Metrics

Compared line counts before and after:

BEFORE this PR (origin/main):
  plugin/installed.py: 589 lines
  skills/installed.py: 586 lines
  Total: 1175 lines

AFTER this PR (current):
  extensions/installation/*.py: 654 lines (NEW generic framework)
  plugin/installed.py: 141 lines (simplified wrapper)
  skills/installed.py: 230 lines (simplified wrapper)
  Total: 1025 lines

Net change: -150 lines

Duplication eliminated:
  ✓ Metadata handling unified
  ✓ Install/uninstall/enable/disable unified
  ✓ Self-healing directory sync unified

Result: The refactor reduced total code by 150 lines while adding a generic, reusable framework.

Issues Found

None.


Test Evidence: All verification commands and outputs are available in the collapsed section above. The changes work as claimed and maintain full backward compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants