Skip to content

fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback#42

Merged
AbirAbbas merged 2 commits into
mainfrom
fix/pr-af-severity-vocab-and-hitl-callback
Jun 5, 2026
Merged

fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback#42
AbirAbbas merged 2 commits into
mainfrom
fix/pr-af-severity-vocab-and-hitl-callback

Conversation

@AbirAbbas
Copy link
Copy Markdown
Contributor

Summary

A real review with 3 findings on a PR (Agent-Field/github-buddy#73) found genuine bugs but posted nothing to GitHub. Root cause turned out to be two stacked issues; this PR fixes both.

Root cause

  1. Severity vocabulary mismatch (why nothing posted). PR-AF gates posting behind a HITL approval request (hax-sdk pr-af-review-v1). One finding came back with severity: "high". The hax-sdk template enforces a zod enum {critical, important, suggestion, nitpick}, so the approval-request create returned 422 (Invalid enum value … received 'high'). request_review_approval treats any create failure as REJECT ("never post an unreviewed review") → the whole review was dropped, and the run still "succeeded".
    • It slipped through every guard because PR-AF typed severity as a bare str (the allowed set was only a comment). So the model's "high" was a valid string to PR-AF's pydantic schema and to the SDK harness's schema-validation/retry loop — there was no validation failure to retry on. The enum lived only at the hax-sdk boundary, which has no retry/feedback loop back to the model.
  2. Cross-project callback (why the morning's answered review never resumed). approval_webhook_url() built the callback from AGENTFIELD_SERVER (the internal control-plane.railway.internal address the agent uses to reach the CP). hax-sdk runs in a separate Railway project, so the webhook fetch failed and the response never got back.

Fixes

  • schemas/severity.py — single source of truth. Severity = Annotated[Literal[...], BeforeValidator(normalize_severity)]: the JSON schema handed to the model now advertises the enum, and stray labels are coerced (high→important, medium→suggestion, low→nitpick) instead of failing — deterministic, with no retry storms or dropped findings.
  • Applied to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction; revised_severity normalized at consumption (model_copy bypasses validators); normalized once more in _finding_payload as a last line of defense before the hax create.
  • Reviewer prompts now state severity MUST be exactly one of the four values.
  • Surfaced the silent failures: the HITL "not posting" log now includes the underlying create error, and a total schema-parse failure in review_dimension no longer masquerades as "0 findings".
  • Cross-project callback: approval_webhook_url() now prefers AGENTFIELD_PUBLIC_URL (public, cross-project reachable), falling back to AGENTFIELD_SERVER for single-project/local setups — keeping agent→CP traffic internal.

Test Plan

  • ruff check src/ scripts/ clean (the CI lint gate)
  • pytest — 56 passed (py3.12), incl. new test_severity.py (normalize + model coercion + payload boundary) and approval_webhook_url precedence tests
  • Verified the generated JSON schema for severity now emits enum: [critical, important, suggestion, nitpick]

Companion ops change (not in this PR)

Set AGENTFIELD_PUBLIC_URL=https://<control-plane-public-domain> on the pr-af Railway service so the callback fix takes effect in production. (AGENTFIELD_SERVER stays internal.)

🤖 Generated with Claude Code

AbirAbbas and others added 2 commits June 5, 2026 13:18
…allback

The approval-response webhook is invoked by hax-sdk, which runs in a separate
Railway project. approval_webhook_url() built it from AGENTFIELD_SERVER — the
internal address (control-plane.railway.internal) the agent uses to reach the
CP — so the callback 'fetch failed' cross-project and answered reviews never
got back to resume the run.

Prefer AGENTFIELD_PUBLIC_URL (public, cross-project reachable) and fall back to
AGENTFIELD_SERVER for single-project/local setups, keeping agent->CP traffic
internal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…post

A finding with severity 'high' silently sank an entire review: the hax-sdk
pr-af-review-v1 template enforces a zod enum {critical,important,suggestion,
nitpick}, so the approval-request create 422'd, and the gate treats any create
failure as REJECT -> nothing posted. PR-AF's own models typed severity as a bare
str (the enum was only a comment), so neither the model nor the SDK's
schema-validation/retry loop ever caught 'high' — it was a valid string to them.

- Add schemas/severity.py: the single source of truth. Severity is an
  Annotated[Literal, BeforeValidator] so the JSON schema handed to the model
  advertises the enum AND stray labels are coerced (high->important,
  medium->suggestion, low->nitpick) instead of failing — deterministic, no
  retry storms or dropped findings.
- Apply it to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction;
  normalize revised_severity at consumption (model_copy bypasses validators);
  normalize once more in _finding_payload as a last line of defense.
- Constrain severity to the four values explicitly in the reviewer prompts.
- Surface the failures that hid this: the HITL no-post reason now includes the
  underlying create error, and a total schema-parse failure in review_dimension
  no longer masquerades as '0 findings'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AbirAbbas AbirAbbas merged commit 30a5e1c into main Jun 5, 2026
2 checks passed
@AbirAbbas AbirAbbas deleted the fix/pr-af-severity-vocab-and-hitl-callback branch June 5, 2026 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant