Improve sentry init runtime harness with Dirac-inspired patterns #711

@BYK

Summary

Adopt three patterns from the Dirac coding agent to strengthen the sentry init local runtime:

  1. Content hashing (from line-hashing.ts) — FNV-1a file hashing for stale-read detection
  2. Atomic patchsets with rollback (simplified from CheckpointTracker) — in-memory backup before writes
  3. Tree-sitter AST infrastructure (from services/tree-sitter/) — WASM-based code parsing for local framework detection, entry point discovery, and post-patch syntax validation

What We're NOT Picking

  • Hash-anchored line edits: Requires Mastra protocol changes (server must produce hash-referenced diffs). Out of scope.
  • React Ink TUI: We use Stricli + @clack/prompts. Different paradigm.
  • Full eval harness: The Dirac eval system is benchmark-focused, not applicable.

Phase 1: Content Hashing & Staleness Detection (no new deps)

Goal: Detect when files change between read-files and apply-patchset operations.

New file: src/lib/init/content-hash.ts

  • fnv1a(content: string): string — FNV-1a 32-bit hash → 8-char hex (5 lines, from Dirac's line-hashing.ts)
  • FileSnapshot type — Map<string, string> of path→hash
  • createSnapshot(), recordFile(), isStale() helpers
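
The helpers above could be sketched as follows. This is a minimal sketch, not the planned implementation: it assumes hashing over UTF-8 bytes (Dirac's line-hashing.ts may hash differently), and the parameter lists of createSnapshot(), recordFile(), and isStale() are guesses, since the plan only names them.

```typescript
// Standard FNV-1a 32-bit parameters.
const FNV_OFFSET_BASIS = 0x811c9dc5;
const FNV_PRIME = 0x01000193;

/** FNV-1a (32-bit) over the UTF-8 bytes of `content`, as 8 lowercase hex chars. */
function fnv1a(content: string): string {
  const bytes = new TextEncoder().encode(content);
  let hash = FNV_OFFSET_BASIS;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Math.imul keeps the multiplication in 32-bit integer space.
    hash = Math.imul(hash, FNV_PRIME);
  }
  // ">>> 0" reinterprets the value as unsigned before formatting.
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

/** path -> content hash, as described in the plan. */
type FileSnapshot = Map<string, string>;

function createSnapshot(): FileSnapshot {
  return new Map<string, string>();
}

/** Record the hash of `content` for `path` and return it (assumed signature). */
function recordFile(snapshot: FileSnapshot, path: string, content: string): string {
  const digest = fnv1a(content);
  snapshot.set(path, digest);
  return digest;
}

/** True only if `path` was recorded earlier and its content hash now differs. */
function isStale(snapshot: FileSnapshot, path: string, currentContent: string): boolean {
  const recorded = snapshot.get(path);
  return recorded !== undefined && recorded !== fnv1a(currentContent);
}
```

A file that was never recorded is treated as not stale here, because there is no earlier read to compare against; the real helper may choose otherwise.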

Modify: src/lib/init/local-ops.ts

  • Add module-scoped FileSnapshot instance, reset per wizard run
  • In readFiles(): record content hash for each file after reading
  • In applyEdits(): before applying edits, re-read the file and check staleness. If stale, log.warn() but continue (the fuzzy replacer may still succeed)
  • Return contentHashes in the read-files result data so the Mastra workflow can optionally use it later
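
One way the read and pre-edit steps might fit together is sketched below. The function names readFilesWithHashes and readForEdit, the injected read, hash, and warn callbacks, and the files field are all hypothetical; the plan keeps the snapshot module-scoped and calls log.warn() directly, whereas this sketch passes everything in so it can run without the real local-ops.ts.

```typescript
type Hasher = (content: string) => string;
type FileSnapshot = Map<string, string>;

// Hypothetical shape of the read-files result data with the new optional field.
interface ReadFilesData {
  files: Record<string, string>;
  contentHashes: Record<string, string>;
}

/** Read each path, remember its hash in the snapshot, and expose hashes to the caller. */
function readFilesWithHashes(
  paths: string[],
  read: (path: string) => string,
  hash: Hasher,
  snapshot: FileSnapshot,
): ReadFilesData {
  const files: Record<string, string> = {};
  const contentHashes: Record<string, string> = {};
  for (const p of paths) {
    const content = read(p);
    const digest = hash(content);
    snapshot.set(p, digest);
    files[p] = content;
    contentHashes[p] = digest;
  }
  return { files, contentHashes };
}

/** Re-read a file before editing; warn on staleness but return the content regardless. */
function readForEdit(
  path: string,
  read: (path: string) => string,
  hash: Hasher,
  snapshot: FileSnapshot,
  warn: (message: string) => void,
): string {
  const current = read(path);
  const recorded = snapshot.get(path);
  if (recorded !== undefined && recorded !== hash(current)) {
    warn(`${path} changed on disk since it was read; attempting edits anyway`);
  }
  return current;
}
```

Returning the fresh content even when stale matches the warn-and-continue behavior described above: the edits are then applied against what is actually on disk.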

Modify: src/lib/init/types.ts

  • Add optional contentHashes?: Record<string, string> to read-files result shape

New tests

  • test/lib/init/content-hash.test.ts — basic hash consistency
  • test/lib/init/content-hash.property.test.ts — determinism, collision resistance

Phase 2: Atomic Patchsets with Rollback (no new deps)

Goal: If any patch in a patchset fails, restore all previously-applied files.

Modify: src/lib/init/local-ops.ts

Restructure applyPatchset() with in-memory backup:

type FileBackup =
  | { type: 'existed'; path: string; absPath: string; content: string }
  | { type: 'created'; path: string; absPath: string };

  • Before each patch: snapshot the file's current content (or record it as nonexistent)
  • On failure: restore all backups in reverse order (write-back for modify/delete, unlink for create)
  • On success: discard backups
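
The backup-and-restore flow above could look roughly like this. It is a simplified sketch under stated assumptions: the Patch shape, the PatchsetResult name, and applyPatchsetAtomically are hypothetical stand-ins (the real patch type and applyPatchset() signature are not shown here), and it uses synchronous node:fs calls for brevity.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type FileBackup =
  | { type: "existed"; path: string; absPath: string; content: string }
  | { type: "created"; path: string; absPath: string };

// Hypothetical, simplified patch shape for illustration only.
type Patch =
  | { action: "create" | "modify"; path: string; content: string }
  | { action: "delete"; path: string };

interface PatchsetResult {
  ok: boolean;
  error?: string;
  rolledBack?: boolean;
  rollbackErrors?: string[];
}

/** Snapshot the file's current content, or record that it does not exist yet. */
function backupFile(cwd: string, relPath: string): FileBackup {
  const absPath = path.join(cwd, relPath);
  return fs.existsSync(absPath)
    ? { type: "existed", path: relPath, absPath, content: fs.readFileSync(absPath, "utf8") }
    : { type: "created", path: relPath, absPath };
}

function applyOne(cwd: string, patch: Patch): void {
  const absPath = path.join(cwd, patch.path);
  if (patch.action === "delete") {
    fs.unlinkSync(absPath);
    return;
  }
  if (patch.action === "modify" && !fs.existsSync(absPath)) {
    throw new Error(`Cannot modify missing file: ${patch.path}`);
  }
  fs.mkdirSync(path.dirname(absPath), { recursive: true });
  fs.writeFileSync(absPath, patch.content);
}

function applyPatchsetAtomically(cwd: string, patches: Patch[]): PatchsetResult {
  const backups: FileBackup[] = [];
  try {
    for (const patch of patches) {
      backups.push(backupFile(cwd, patch.path));
      applyOne(cwd, patch);
    }
    return { ok: true }; // backups are simply dropped on success
  } catch (err) {
    const rollbackErrors: string[] = [];
    // Restore in reverse order so a file patched twice ends at its original content.
    for (let i = backups.length - 1; i >= 0; i--) {
      const backup = backups[i];
      try {
        if (backup.type === "existed") fs.writeFileSync(backup.absPath, backup.content);
        else fs.rmSync(backup.absPath, { force: true });
      } catch (restoreErr) {
        rollbackErrors.push(`${backup.path}: ${String(restoreErr)}`);
      }
    }
    return { ok: false, error: String(err), rolledBack: rollbackErrors.length === 0, rollbackErrors };
  }
}
```

Directories created for new files are not removed on rollback in this sketch; whether that matters for the wizard is a separate decision.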

Why in-memory, not git stash?

  • git stash has side effects on the user's stash stack
  • The wizard already checks for clean git state in git.ts, so users can run git checkout . as a last-resort recovery
  • Patchsets are small (typically <10 files, <100KB total)

Modify: src/lib/init/types.ts

  • Add rolledBack?: boolean and rollbackErrors?: string[] to LocalOpResult

New test: test/lib/init/local-ops-rollback.test.ts


Phase 3: Tree-sitter AST Infrastructure

Goal: Add WASM-based tree-sitter parsing, lazy-loaded and cached.

New dependency

bun add -d web-tree-sitter

tree-sitter-wasms is NOT added as a dependency. Grammar WASMs are downloaded on demand (~250-810KB each) and cached in ~/.sentry/grammars/. The web-tree-sitter JS runtime (~60KB) is bundled via esbuild. Its WASM runtime (~2.5MB) is also downloaded on demand alongside the grammars.

New file: src/lib/ast/parser.ts

Adapted from Dirac's languageParser.ts:

  • EXTENSION_TO_LANGUAGE map — .js → javascript, .ts → typescript, .tsx → tsx, .py → python, .go → go, .rb → ruby, .php → php, .java → java
  • getParser(language) — lazy init of web-tree-sitter, loads grammar WASM on demand
  • parseFile(content, filePath) — returns Parser.Tree | null
  • isAstSupported(filePath) — extension check
  • Two-level cache: language→Parser cache + grammar→WASM cache (same as Dirac)
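
The extension check and the two-level cache could be sketched without touching web-tree-sitter at all, by injecting the loaders. In the sketch below, languageForFile, createLazyCache, and createParserCache are hypothetical names, and the loadGrammar and buildParser callbacks stand in for the real grammar download and parser construction.

```typescript
// Extension -> language id, as listed in the plan.
const EXTENSION_TO_LANGUAGE: Record<string, string> = {
  ".js": "javascript",
  ".ts": "typescript",
  ".tsx": "tsx",
  ".py": "python",
  ".go": "go",
  ".rb": "ruby",
  ".php": "php",
  ".java": "java",
};

function languageForFile(filePath: string): string | undefined {
  const dot = filePath.lastIndexOf(".");
  if (dot === -1) return undefined;
  return EXTENSION_TO_LANGUAGE[filePath.slice(dot).toLowerCase()];
}

function isAstSupported(filePath: string): boolean {
  return languageForFile(filePath) !== undefined;
}

/** Cache of in-flight or completed loads; concurrent callers share one promise. */
function createLazyCache<T>(load: (key: string) => Promise<T>): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key: string): Promise<T> => {
    let entry = cache.get(key);
    if (!entry) {
      // Drop failed loads so a later call can retry.
      entry = load(key).catch((err) => {
        cache.delete(key);
        throw err;
      });
      cache.set(key, entry);
    }
    return entry;
  };
}

/** Two levels: language -> grammar (expensive load), then language -> parser. */
function createParserCache<G, P>(
  loadGrammar: (language: string) => Promise<G>,
  buildParser: (grammar: G) => Promise<P>,
): (language: string) => Promise<P> {
  const getGrammar = createLazyCache(loadGrammar);
  return createLazyCache(async (language: string) => buildParser(await getGrammar(language)));
}
```

Caching the promise rather than the resolved value means two parseFile() calls that race on the same language trigger only one grammar load.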

New file: src/lib/ast/grammar-loader.ts

  • ensureGrammar(language) — checks ~/.sentry/grammars/<version>/, downloads from CDN if missing
  • ensureRuntime() — same for the tree-sitter.wasm runtime
  • CDN source: unpkg.com or GitHub-hosted (configurable)
  • Downloads use fetch() with timeout, write via Bun.write()
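
A possible shape for ensureGrammar() is sketched below. Several details are assumptions: the options object, the injectable fetcher, the tree-sitter-<language>.wasm file name, and the base URL are not specified in the plan, and the sketch writes with node:fs/promises instead of Bun.write() so it can run outside Bun.

```typescript
import * as fs from "node:fs/promises";
import * as path from "node:path";

// Minimal slice of the fetch API this sketch relies on; the global fetch satisfies it.
type Fetcher = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBufferLike> }>;

interface GrammarLoaderOptions {
  cacheDir: string; // e.g. ~/.sentry/grammars/<version>
  baseUrl: string; // configurable CDN base (hypothetical value supplied by caller)
  timeoutMs?: number;
  fetcher?: Fetcher; // injectable for tests
}

/** Return the cached grammar path, downloading it first if it is missing. */
async function ensureGrammar(language: string, opts: GrammarLoaderOptions): Promise<string> {
  const fileName = `tree-sitter-${language}.wasm`; // assumed naming convention
  const target = path.join(opts.cacheDir, fileName);
  try {
    await fs.access(target);
    return target; // cache hit
  } catch {
    // Not cached yet; fall through to download.
  }

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), opts.timeoutMs ?? 10000);
  try {
    const doFetch: Fetcher = opts.fetcher ?? ((url, init) => fetch(url, init));
    const res = await doFetch(`${opts.baseUrl}/${fileName}`, { signal: controller.signal });
    if (!res.ok) {
      throw new Error(`Grammar download failed (${res.status}) for ${language}`);
    }
    const bytes = new Uint8Array(await res.arrayBuffer());
    await fs.mkdir(opts.cacheDir, { recursive: true });
    // Write to a temp name and rename, so a partial download is never treated as cached.
    const tmp = `${target}.tmp`;
    await fs.writeFile(tmp, bytes);
    await fs.rename(tmp, target);
    return target;
  } finally {
    clearTimeout(timer);
  }
}
```

ensureRuntime() could reuse the same flow with a different file name.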

New file: src/lib/ast/queries.ts

S-expression queries per language, inspired by Dirac's queries/ directory:

type FrameworkSignal = {
  framework: string;      // 'nextjs', 'express', 'django', etc.
  confidence: number;     // 0.0–1.0
  evidence: string;       // human-readable
  file: string;
  line: number;
};

type EntryPoint = {
  file: string;
  line: number;
  kind: 'server-start' | 'app-export' | 'main-function' | 'sdk-init';
  pattern: string;
};

Functions: detectFramework(), findEntryPoints(), findSentryConfig(), generateOutline()
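
As an illustration of the query-to-signal step, the sketch below pairs a JavaScript import query with a function that turns captured import sources into FrameworkSignal values. The query assumes the node and field names of the tree-sitter-javascript grammar and has not been run here; ImportCapture, IMPORT_TO_FRAMEWORK, the confidence values, and signalsFromImports are hypothetical and only show the data flow, not the planned detectFramework().

```typescript
type FrameworkSignal = {
  framework: string;
  confidence: number;
  evidence: string;
  file: string;
  line: number;
};

// S-expression query for ES imports and require() calls (assumed grammar node names).
const JS_IMPORT_QUERY = `
(import_statement source: (string) @import.source)
(call_expression
  function: (identifier) @fn
  arguments: (arguments (string) @require.source)
  (#eq? @fn "require"))
`;

// Simplified view of what running the query could yield per match.
interface ImportCapture {
  source: string; // raw node text, possibly still quoted
  line: number; // 1-based
}

// Hypothetical mapping and confidence values, for illustration only.
const IMPORT_TO_FRAMEWORK: Record<string, { framework: string; confidence: number }> = {
  next: { framework: "nextjs", confidence: 0.9 },
  express: { framework: "express", confidence: 0.8 },
  "@nestjs/core": { framework: "nestjs", confidence: 0.9 },
};

/** "next/router" -> "next", "@nestjs/core/x" -> "@nestjs/core", "./local" -> null. */
function packageName(source: string): string | null {
  if (source.startsWith(".") || source.startsWith("/")) return null;
  const parts = source.split("/");
  return source.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
}

function signalsFromImports(file: string, captures: ImportCapture[]): FrameworkSignal[] {
  const signals: FrameworkSignal[] = [];
  for (const capture of captures) {
    const source = capture.source.replace(/^["']|["']$/g, "");
    const pkg = packageName(source);
    const match = pkg === null ? undefined : IMPORT_TO_FRAMEWORK[pkg];
    if (match) {
      signals.push({
        framework: match.framework,
        confidence: match.confidence,
        evidence: `imports "${source}"`,
        file,
        line: capture.line,
      });
    }
  }
  return signals;
}
```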

New file: src/lib/ast/errors.ts

  • AstError extends CliError — for grammar load failures, parse errors

New file: src/lib/ast/index.ts

Barrel re-export.

Build impact

  • web-tree-sitter JS: ~60KB bundled (esbuild handles it)
  • WASM runtime: NOT bundled, downloaded on demand (~2.5MB, cached)
  • Grammar WASMs: NOT bundled, downloaded on demand (~250-810KB each, cached)
  • Net binary size increase: ~60KB (just the JS loader)

New test: test/lib/ast/parser.test.ts

  • Mock grammar download, parse a JS fixture, verify tree exists
  • Unsupported extension returns null

Phase 4: AST-Based Intelligence for Init

Goal: Use tree-sitter to send richer context to the Mastra workflow.

New file: src/lib/init/ast-context.ts

type ProjectAstContext = {
  frameworks: FrameworkSignal[];
  entryPoints: EntryPoint[];
  existingSentry: SentryConfig[];
  fileOutlines: Record<string, string>;
};

async function buildProjectAstContext(
  cwd: string,
  dirListing: DirEntry[]
): Promise<ProjectAstContext | null>

File selection heuristic (scan a targeted subset, not the whole tree):

  1. Known entry point filenames: src/index.ts, src/app.ts, app.py, manage.py, main.go, pages/_app.tsx, app/layout.tsx, next.config.js, etc.
  2. Files matching *sentry*
  3. Fallback: first 5 source files in dirListing
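
The heuristic above can be read as a small pure function over the directory listing. The sketch below takes plain relative paths rather than DirEntry values (whose shape is not shown here), treats the fallback as applying only when rules 1 and 2 match nothing, and matches *sentry* case-insensitively against the file name; all three are interpretations, and selectFilesForAst is a hypothetical name.

```typescript
// Entry point names listed in the plan (the "etc." is left unfilled).
const KNOWN_ENTRY_POINTS = [
  "src/index.ts",
  "src/app.ts",
  "app.py",
  "manage.py",
  "main.go",
  "pages/_app.tsx",
  "app/layout.tsx",
  "next.config.js",
];

const SOURCE_EXTENSIONS = [".js", ".ts", ".tsx", ".py", ".go", ".rb", ".php", ".java"];
const FALLBACK_LIMIT = 5;

function baseName(filePath: string): string {
  return filePath.slice(filePath.lastIndexOf("/") + 1);
}

function isSourceFile(filePath: string): boolean {
  const lower = filePath.toLowerCase();
  return SOURCE_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

/** Pick a targeted subset of files to parse, preserving listing order. */
function selectFilesForAst(paths: string[]): string[] {
  const known = new Set(KNOWN_ENTRY_POINTS);
  const targeted = paths.filter(
    (p) => known.has(p) || baseName(p).toLowerCase().indexOf("sentry") !== -1,
  );
  if (targeted.length > 0) return targeted;
  // Fallback: the first few source files in listing order.
  return paths.filter(isSourceFile).slice(0, FALLBACK_LIMIT);
}
```

Keeping the selection separate from parsing makes it easy to unit test without any grammar downloads.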

Modify: src/lib/init/wizard-runner.ts

After precomputeDirListing(), before run.startAsync():

let astContext = null;
try {
  const { buildProjectAstContext } = await import("./ast-context.js");
  astContext = await buildProjectAstContext(directory, dirListing);
} catch {
  // AST unavailable — continue without it
}

Pass astContext in inputData to the workflow.

New local-op: parse-files

Register as a new operation type in local-ops.ts for the Mastra workflow to request on-demand AST analysis.

Modify: src/lib/init/types.ts

  • Add ParseFilesPayload type
  • Add to LocalOpPayload union

Phase 5: AST Validation of Applied Patches

Goal: Verify modified files are syntactically valid after patching.

New file: src/lib/init/ast-validation.ts

type ValidationResult = {
  valid: boolean;
  errors: Array<{ line: number; column: number; message: string }>;
};

async function validateSyntax(content: string, filePath: string): Promise<ValidationResult>

Implementation: parse with tree-sitter, walk CST for ERROR/MISSING nodes.
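
The tree walk itself could be sketched against a minimal structural node type. CstNode below is an assumption made so the example runs without a grammar; web-tree-sitter's node objects expose similar information, but the exact property names and whether isMissing is a property or a method depend on the library version. collectSyntaxErrors is a hypothetical helper that validateSyntax() would call after parsing.

```typescript
// Assumed minimal view of a concrete syntax tree node.
interface CstNode {
  type: string;
  isMissing: boolean;
  startPosition: { row: number; column: number }; // zero-based, as in tree-sitter
  children: CstNode[];
}

type ValidationResult = {
  valid: boolean;
  errors: Array<{ line: number; column: number; message: string }>;
};

/** Depth-first walk that reports ERROR and MISSING nodes in document order. */
function collectSyntaxErrors(root: CstNode): ValidationResult {
  const errors: ValidationResult["errors"] = [];
  const stack: CstNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop() as CstNode;
    if (node.type === "ERROR" || node.isMissing) {
      errors.push({
        // Convert zero-based positions to one-based line and column numbers.
        line: node.startPosition.row + 1,
        column: node.startPosition.column + 1,
        message: node.isMissing ? `Missing ${node.type}` : "Unexpected syntax",
      });
      // Do not descend into an ERROR subtree, to avoid duplicate reports.
      if (node.type === "ERROR") continue;
    }
    // Push children in reverse so they are visited left to right.
    for (let i = node.children.length - 1; i >= 0; i--) {
      stack.push(node.children[i]);
    }
  }
  return { valid: errors.length === 0, errors };
}
```

An iterative walk avoids deep recursion on large files.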

Modify: src/lib/init/local-ops.ts

In applySinglePatch() after writing a modify patch:

const result = await validateSyntax(content, patch.path);
if (!result.valid) {
  log.warn(`Syntax issues in ${patch.path}: ${result.errors.length} error(s)`);
}

Advisory only (warn, don't fail). Validation errors are included in the patchset result metadata so the Mastra workflow can decide whether to retry.


Recommended Build Order

Phase 1 (content hashing)    ─── 1 day, no deps, immediate value
    ↓
Phase 2 (atomic rollback)    ─── 1.5 days, no deps, immediate value
    ↓
Phase 3 (tree-sitter infra)  ─── 3-4 days, adds web-tree-sitter
    ↓
Phase 4 (AST intelligence)   ─── 3-4 days, uses Phase 3
    ↓
Phase 5 (AST validation)     ─── 2 days, uses Phase 3

Phases 1-2 are dependency-free quick wins. Phase 3 is the risky foundation. Phases 4-5 build on it.

Total: ~11-13 days


Key Files Modified

File                           Phase    Change
src/lib/init/local-ops.ts      1,2,4,5  Content hashing, atomic rollback, parse-files handler, post-patch syntax validation
src/lib/init/types.ts          1,2,4    New payload types, result fields
src/lib/init/wizard-runner.ts  4        Pass astContext to workflow
package.json                   3        Add web-tree-sitter devDependency

Key Files Created

File                            Phase
src/lib/init/content-hash.ts    1
src/lib/ast/parser.ts           3
src/lib/ast/grammar-loader.ts   3
src/lib/ast/queries.ts          3
src/lib/ast/errors.ts           3
src/lib/ast/index.ts            3
src/lib/init/ast-context.ts     4
src/lib/init/ast-validation.ts  5

Risks

Risk (Level): Mitigation

  • web-tree-sitter WASM loading in a Bun compiled binary (High): Download WASM on demand (not embedded). Test early in Phase 3.
  • Grammar CDN unreliable (Medium): Host grammars on GitHub Releases, or bundle JS/TS grammars as a fallback (~2MB).
  • Tree-sitter adds complexity for limited init-specific value (Medium): Phases 1-2 deliver value without tree-sitter. AST phases are additive, not on the critical path.
  • Mastra workflow ignores astContext initially (Low): Context is advisory, so the server can adopt it incrementally. Local value (validation) is standalone.

Verification

  1. Phase 1: bun test test/lib/init/content-hash — property tests pass
  2. Phase 2: bun test test/lib/init/local-ops-rollback — rollback tests pass
  3. Phase 3: bun test test/lib/ast/parser — JS/TS parsing works
  4. Phase 4: bun test test/lib/init/ast-context — framework detection returns signals
  5. Phase 5: bun test test/lib/init/ast-validation — catches broken syntax
  6. Integration: bun run dev -- init ./test-project --dry-run — wizard completes with AST context
  7. Build: bun run build — binary size increase < 100KB (only JS loader bundled)
  8. Full suite: bun run typecheck && bun run lint && bun test

Labels: enhancement