
feat: base machinery for AST-precise extraction via WASM plugins #44

Open
the-wondersmith wants to merge 9 commits into Houseofmvps:main from the-wondersmith:feat/native-ast-wasm-plugins

@the-wondersmith

Summary

This PR adds machinery, gated behind an explicit opt-in (e.g. the --native-ast CLI flag), for expanding codesight's repertoire of languages with AST-precise support via out-of-band (i.e. user-supplied) WASM plugins. The PR is intentionally scoped to the host side of the plugin system: the plugin ABI, discovery, the opt-in CLI/env/config surface, the dispatch wiring, and a reference plugin used for conformance testing. The new machinery is off by default: without explicit opt-in, behavior is byte-identical to today, and the package keeps its zero-runtime-dependency status.

Motivation

codesight's edge is cheap, AST-precise context, but that precision is TypeScript-only today because it's powered by the project's own TS compiler. The other 14 supported languages fall back to regex detection, which is far more likely to miss framework patterns or mislabel routes/models. As a concrete example, during personal testing a Rust project implementing an axum service with a .route("/x", get(h)) was detected as actix and yielded 0 routes using the built-in regex.

Captured Wins

  • Zero cost when unused. Default-off and byte-identical to today; runtime deps stay at 0 (assemblyscript is dev-only; npm pack excludes the fixture).
  • AST accuracy everywhere. With a plugin present, extraction is as precise as the plugin's parser.
    • for example, validating the aforementioned Rust codebase against a real syn-based plugin recovered actix attribute routes, axum route chains, and struct fields that the built-in regex misidentified or missed entirely.
  • Safe by design. Plugins run as a no-imports WASM module — no WASI, no syscalls, no filesystem or network — a far safer way to gain precision than adding heavy deps or executing native helper binaries.
  • Trustworthy in CI. Strict mode plus bidirectional drift detection let codesight users assert native parsing ran where expected, rather than hoping.
  • A reusable extension point. The capability-agnostic host + stable, self-describing ABI lets anyone extend codesight to their language/framework without changes to codesight, and the same mechanism can host non-AST capabilities later.
  • Offloading AST parsing/extraction to plugins:
    • extends AST-grade precision beyond TypeScript without breaking two of codesight's big promises: zero runtime dependencies and no required toolchain/setup
    • makes it easy to expand the roster of supported languages at a rate that's decoupled from maintaining/developing codesight itself
    • for languages that compile to WASM, plugins can reuse the target language's own AST parser instead of reimplementing one
    • the heavy, language-specific parsing lives in a user-supplied binary outside the codesight codebase, while codesight itself stays lean, safe, and zero-dep
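The "no-imports" guarantee is something a host can check mechanically before any plugin code runs. The sketch below uses the standard WebAssembly JS API to reject a module that declares any imports, then instantiates the rest with an empty import object. The function name loadSandboxedPlugin is hypothetical and is not necessarily how src/wasm/plugin-host.ts is structured.

```typescript
// Hypothetical helper: compile plugin bytes and refuse anything that imports.
function loadSandboxedPlugin(bytes: BufferSource): WebAssembly.Instance {
  const module = new WebAssembly.Module(bytes);

  // A no-imports module has no host functions to call: no WASI, no fs, no net.
  const imports = WebAssembly.Module.imports(module);
  if (imports.length > 0) {
    const names = imports.map((i) => `${i.module}.${i.name}`).join(", ");
    throw new Error(`plugin declares imports (${names}); only no-import modules are allowed`);
  }

  // Instantiate with an empty import object; nothing is exposed to the plugin.
  return new WebAssembly.Instance(module, {});
}
```

Checking imports on the compiled Module, rather than relying on instantiation failing, yields an error message that names exactly which host capability the plugin asked for.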

What This PR Includes

  • WASM plugin ABI (src/wasm/plugin-host.ts): per-kind exports parseRoutes / parseSchemas / parseImports (capability detected by export presence) + a contractVersion() export. No manifest, no kind codes. UTF-8 in / JSON out over linear memory; alloc/dealloc/memory; packed i64 return.
  • Native-AST loader (src/ast/native-loader.ts): discovery waterfall (--plugin-dir → ~/.codesight/plugins → $XDG_DATA_HOME/... → install dir), version gating, domain-type adapters that stamp confidence: "native", and strict-mode diagnostics.
  • Dispatch (src/detectors/*, src/core.ts): route/schema sites try the native plugin first, then fall back; native results are counted separately in the scan summary.
  • CLI/env/config: --native-ast[=langs], --native-ast-strict, --plugin-dir; CODESIGHT_NATIVE_AST / CODESIGHT_PLUGIN_DIR; a nativeAst config field (precedence CLI > env > config file, including the no-TS-loader config path).
  • Reference plugin (reference/ast-plugin/): a minimal AssemblyScript, marker-based fixture (committed prebuilt .wasm + checksums) that exercises the ABI end-to-end. Excluded from the npm package by the files allowlist.
  • Docs: the full contract in docs/wasm-plugins.md.
  • CI: a test-suite workflow and a wasm-plugin-abi workflow that rebuilds the reference plugin from source and runs conformance against the fresh build, with a checksum guard against a stale committed binary.
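A minimal sketch of one host-side call under the ABI described above: capability is read from export presence, the UTF-8 input is copied into linear memory via alloc, the export returns a packed i64 (which arrives in JS as a bigint), and the JSON result is decoded and freed. The bit layout of the packed value (pointer in the high 32 bits, length in the low 32) and the rule that the host deallocates the output buffer are assumptions; docs/wasm-plugins.md is the authority. PluginExports, unpackPtrLen, capabilities, and callKind are hypothetical names.

```typescript
type Kind = "parseRoutes" | "parseSchemas" | "parseImports";

// Hypothetical shape of a plugin's exports; kinds are optional (capability by presence).
interface PluginExports {
  memory: WebAssembly.Memory;
  alloc(len: number): number;
  dealloc(ptr: number, len: number): void;
  parseRoutes?: (ptr: number, len: number) => bigint;
  parseSchemas?: (ptr: number, len: number) => bigint;
  parseImports?: (ptr: number, len: number) => bigint;
}

// Assumed layout: high 32 bits = pointer, low 32 bits = byte length.
function unpackPtrLen(packed: bigint): { ptr: number; len: number } {
  const u = BigInt.asUintN(64, packed); // treat a negative i64 as unsigned
  return { ptr: Number(u >> 32n), len: Number(u & 0xffffffffn) };
}

// Which kinds the plugin supports, detected purely by export presence.
function capabilities(exp: PluginExports): Kind[] {
  return (["parseRoutes", "parseSchemas", "parseImports"] as const).filter(
    (k) => typeof exp[k] === "function",
  );
}

function callKind(exp: PluginExports, kind: Kind, source: string): unknown {
  const fn = exp[kind];
  if (typeof fn !== "function") return undefined; // capability absent

  // UTF-8 in: copy the source into plugin memory.
  const input = new TextEncoder().encode(source);
  const inPtr = exp.alloc(input.length);
  new Uint8Array(exp.memory.buffer, inPtr, input.length).set(input);

  const { ptr, len } = unpackPtrLen(fn(inPtr, input.length));

  // JSON out: take a fresh view, since the plugin may have grown (and detached) memory.
  const json = new TextDecoder().decode(new Uint8Array(exp.memory.buffer, ptr, len));
  exp.dealloc(inPtr, input.length);
  exp.dealloc(ptr, len); // assumption: the host frees the output buffer
  return JSON.parse(json);
}
```

Packing pointer and length into a single i64 avoids needing multi-value returns or an out-parameter, which keeps the export signatures identical across kinds.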
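The discovery waterfall and version gating can be sketched as plain functions. The PR elides the XDG subpath as "...", so it is passed in as a parameter here rather than guessed. The <lang>.wasm naming rule and the supported contract version are hypothetical placeholders, and the exists callback stands in for something like fs.existsSync so the logic stays testable.

```typescript
import { join } from "node:path";

interface DiscoveryInputs {
  pluginDir?: string;   // --plugin-dir (or CODESIGHT_PLUGIN_DIR, assumed to share this slot)
  home: string;         // user home directory
  xdgDataHome?: string; // $XDG_DATA_HOME, if set
  xdgSubpath: string;   // hypothetical: the real subpath is elided in the PR text
  installDir: string;   // codesight's install directory
}

// Candidate directories, highest priority first.
function pluginSearchDirs(o: DiscoveryInputs): string[] {
  const dirs: string[] = [];
  if (o.pluginDir) dirs.push(o.pluginDir);
  dirs.push(join(o.home, ".codesight", "plugins"));
  if (o.xdgDataHome) dirs.push(join(o.xdgDataHome, o.xdgSubpath));
  dirs.push(o.installDir);
  return dirs;
}

// First match wins. Assumption: plugins are named `<lang>.wasm`.
function findPlugin(
  lang: string,
  dirs: string[],
  exists: (path: string) => boolean,
): string | undefined {
  for (const dir of dirs) {
    const candidate = join(dir, `${lang}.wasm`);
    if (exists(candidate)) return candidate;
  }
  return undefined;
}

// Version gate via the plugin's contractVersion() export.
// Assumption: exact match against a set of host-supported versions.
const SUPPORTED_CONTRACT_VERSIONS: ReadonlySet<number> = new Set([1]); // hypothetical value

function isCompatible(contractVersion: () => number): boolean {
  return SUPPORTED_CONTRACT_VERSIONS.has(contractVersion());
}
```

An incompatible plugin would be skipped (and, in strict mode, reported) rather than called, so a stale plugin cannot silently produce malformed results.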
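The CLI > env > config precedence reduces to taking the first source that is set. In this sketch the value formats (a bare flag or "all" meaning every supported language, comma-separated names otherwise) and the silent dropping of unknown names are assumptions, not documented behavior; NativeAstSources and resolveNativeAst are hypothetical names.

```typescript
// The language set the PR currently supports for native dispatch.
const NATIVE_LANGS = ["rust", "go", "python"] as const;
type NativeLang = (typeof NATIVE_LANGS)[number];

interface NativeAstSources {
  cli?: string | boolean;    // --native-ast (true) or --native-ast=rust,go
  env?: string;              // CODESIGHT_NATIVE_AST
  config?: string | boolean; // nativeAst field in the config file (shape assumed)
}

function parseLangs(value: string | boolean): NativeLang[] {
  // Assumption: a bare flag or "all" enables every supported language.
  if (value === true || value === "all") return [...NATIVE_LANGS];
  if (value === false) return [];
  return value
    .split(",")
    .map((s) => s.trim().toLowerCase())
    .filter((s): s is NativeLang => (NATIVE_LANGS as readonly string[]).includes(s));
}

// Precedence: CLI > env > config file. Nothing set means the feature stays off.
function resolveNativeAst(src: NativeAstSources): NativeLang[] {
  const chosen = src.cli ?? src.env ?? src.config;
  return chosen === undefined ? [] : parseLangs(chosen);
}
```

Usage: resolveNativeAst({ cli: "rust", env: "go" }) yields ["rust"], because the CLI value shadows the environment.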

The 9 commits are ordered for ease of review: ignore noise → fix a flaky test → host/loader → dispatch → CLI → tests → docs → reference fixture → CI. Each feat commit builds on its own, and every commit can be rolled back atomically without breaking the build.

Scope & limitations (intentional — not bugs)

  • The language set is currently fixed to rust/go/python. Plugins are only consulted at codesight's existing detector dispatch points, so "any user-specified language" is not yet literally true. Generalizing this (declared languageId/extensions + a language-driven pass) is a planned follow-up.
  • Schema gap: native schema extraction is only dispatched where a built-in ORM detector exists (Python SQLAlchemy/Django, Go GORM/Ent). Rust schemas are not dispatched (no built-in Rust ORM detector). Same root cause as above.
  • parseImports is defined in the contract but not dispatched during a scan. Dependency-graph edges must resolve to project-relative file paths, which a per-file plugin can't do without whole-project context; the export is reserved so enabling it later is purely additive. Built-in extraction handles imports today.
  • Strict mode surfaces diagnostics + a non-zero exit on the single-scan path only; monorepo/watch runs don't yet.
  • detectComponents takes the config param for symmetry but has no native component extraction (reserved).
  • The committed reference .wasm is a best-effort convenience copy; CI rebuilds it from source and the checksum guard catches drift/staleness.
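At each existing detector dispatch point, the host tries the native plugin first and falls back to the built-in extractor, stamping confidence: "native" and counting native results separately. A rough sketch of one such site follows. Whether fallback also happens when a plugin returns an empty result, the Route shape, and the non-"native" confidence values are assumptions; detectRoutes and ScanStats are hypothetical names.

```typescript
type Confidence = "native" | "ast" | "regex"; // only "native" is named in the PR
interface Route { method: string; path: string; confidence: Confidence }
interface ScanStats { nativeRoutes: number; fallbackRoutes: number }

function detectRoutes(
  file: string,
  source: string,
  native: ((src: string) => Omit<Route, "confidence">[]) | undefined,
  builtin: (file: string, src: string) => Route[],
  stats: ScanStats,
  onNativeError?: (file: string, err: unknown) => void, // e.g. a strict-mode hook
): Route[] {
  if (native) {
    try {
      // Adapter step: stamp every plugin result as native.
      const routes = native(source).map((r) => ({ ...r, confidence: "native" as const }));
      stats.nativeRoutes += routes.length;
      return routes;
    } catch (err) {
      onNativeError?.(file, err); // then fall through to the built-in path
    }
  }
  const routes = builtin(file, source);
  stats.fallbackRoutes += routes.length;
  return routes;
}
```

Keeping the fallback inside the dispatch site means a missing or broken plugin degrades to today's behavior instead of dropping results.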
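Strict mode's contract (report requested plugins that did not run, then exit non-zero) fits in a few lines. In this sketch, deciding what counts as "ran" is left to the caller, and the diagnostic wording is illustrative; strictModeReport is a hypothetical name, and per the limitation above only the single-scan path would consult it today.

```typescript
interface StrictReport { diagnostics: string[]; exitCode: 0 | 1 }

// requested: languages enabled for native AST; ran: languages whose plugin actually ran.
function strictModeReport(
  requested: readonly string[],
  ran: ReadonlySet<string>,
  strict: boolean,
): StrictReport {
  // Outside strict mode, missing plugins are tolerated silently (fallback applies).
  if (!strict) return { diagnostics: [], exitCode: 0 };

  const diagnostics = requested
    .filter((lang) => !ran.has(lang))
    .map((lang) => `native-ast: ${lang} plugin was requested but did not run`);
  return { diagnostics, exitCode: diagnostics.length > 0 ? 1 : 0 };
}
```

A CI job can then assert that native parsing ran where expected simply by checking the process exit status.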

Testing

  • Adds 20 tests across two new files (suite total now 136, all passing):
    • tests/native-ast.test.ts (10) — config/env/file resolution + precedence, and dispatch/strict-mode behavior via a mocked plugin provider.
    • tests/reference-plugin.test.ts (10) — real-wasm conformance through the actual host (raw ABI per kind + the domain adapter) and contractVersion gating.
    • Also hardens an existing test: tests/monorepo.test.ts is made isolation-safe (unrelated pre-existing flakiness fixed along the way).
  • Both CI workflows verified: tests runs the full suite green; wasm-plugin-abi builds the reference plugin, verifies checksums, and runs the 20-assertion conformance/gating set (10/10 tests passing). The checksum step confirms that asc rebuilds the wasm byte-identically from macOS to Linux.
  • Drift detection confirmed in both directions: a reference-plugin change that violates the ABI fails CI, and an ABI/host change the plugin doesn't match fails CI.
  • Not covered: a full scan() integration test driving a real plugin end-to-end (host-level conformance + mocked dispatch cover the seams; the CLI path was verified manually).
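The stale-binary guard amounts to comparing hashes of the fresh build, the committed .wasm, and the committed checksum. SHA-256 in lowercase hex (as sha256sum prints it) is an assumption about the checksum format, and checkStaleBinary is a hypothetical name; the real CI step may well be a shell one-liner instead.

```typescript
import { createHash } from "node:crypto";

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Returns a list of problems; an empty list means the committed artifacts are fresh.
// committedChecksum is the bare hex digest (assumed already split out of the checksum file).
function checkStaleBinary(
  freshBuild: Uint8Array,
  committedWasm: Uint8Array,
  committedChecksum: string,
): string[] {
  const problems: string[] = [];
  const fresh = sha256Hex(freshBuild);
  if (fresh !== committedChecksum.trim().toLowerCase()) {
    problems.push(`committed checksum is stale; fresh build hashes to ${fresh}`);
  }
  if (sha256Hex(committedWasm) !== fresh) {
    problems.push("committed .wasm differs from a fresh build of the source");
  }
  return problems;
}
```

Because asc output is byte-identical across platforms, an exact hash comparison is sufficient here and no fuzzier equivalence check is needed.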

Planned Follow-ups (post-merge)

  • Language generalization (declared languageId/extensions + a generic language-driven pass) — closes the schema gap and re-enables parseImports dispatch with project context.
  • Surface strict-mode diagnostics in monorepo/watch runs.
  • Full-scan integration test with the reference plugin; Node version matrix in CI.
  • Working plugin implementations for WASM-compatible languages:

Note

Where plugin implementations should live needs to be discussed and decided with the codesight maintainers.

Reviewer notes

  • Start with docs/wasm-plugins.md for the contract, then src/wasm/plugin-host.ts and src/ast/native-loader.ts.
  • reference/ast-plugin/ is a test fixture, not a shipped plugin or a real parser — see its README.md (incl. a "do not copy this as a template" note).
  • Sanity check that the feature is inert by default: npm pack --dry-run excludes all of reference/, and a scan without --native-ast behaves exactly as before.

Untrack generated .windsurfrules fixtures (now gitignored) and clean each
fixture dir before writing, so the idempotent AI-config generators don't skip
and fail on a second run.

Opt-in host for user-supplied WASM AST plugins. Per-kind exports
(parseRoutes/parseSchemas/parseImports, capability by presence) + a
contractVersion() export; no manifest, no kind codes. Inert unless enabled.

Route + schema dispatch tries the native plugin first, falls back to the
existing extractor. Adds 'native' confidence and strict-mode diagnostics.
Imports are intentionally not dispatched (see graph.ts TODO).

--native-ast[=langs], --native-ast-strict, --plugin-dir; CODESIGHT_NATIVE_AST
and CODESIGHT_PLUGIN_DIR; nativeAst config field (incl. no-TS-loader parsing).
Strict mode reports unrun plugins and exits non-zero.

Minimal AssemblyScript marker plugin (committed prebuilt + checksums) exercising
the ABI end-to-end against the real host. Excluded from the npm package by the
files allowlist; assemblyscript added as a devDependency.