fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback by AbirAbbas · Pull Request #42 · Agent-Field/pr-af

AbirAbbas · 2026-06-05T17:19:07Z

Summary

A real review with 3 findings on a PR (Agent-Field/github-buddy#73) found genuine bugs but posted nothing to GitHub. Root cause turned out to be two stacked issues; this PR fixes both.

Root cause

Severity vocabulary mismatch (why nothing posted). PR-AF gates posting behind a HITL approval request (hax-sdk pr-af-review-v1). One finding came back with severity: "high". The hax-sdk template enforces a zod enum {critical, important, suggestion, nitpick}, so the approval-request create returned 422 (Invalid enum value … received 'high'). request_review_approval treats any create failure as REJECT ("never post an unreviewed review") → the whole review was dropped, and the run still "succeeded".
- It slipped through every guard because PR-AF typed severity as a bare str (the allowed set was only a comment). So the model's "high" was a valid string to PR-AF's pydantic schema and to the SDK harness's schema-validation/retry loop — there was no validation failure to retry on. The enum lived only at the hax-sdk boundary, which has no retry/feedback loop back to the model.
Cross-project callback (why the morning's answered review never resumed). approval_webhook_url() built the callback from AGENTFIELD_SERVER (the internal control-plane.railway.internal address the agent uses to reach the CP). hax-sdk runs in a separate Railway project, so the webhook fetch failed and the response never got back.

Fixes

schemas/severity.py — single source of truth. Severity = Annotated[Literal[...], BeforeValidator(normalize_severity)]: the JSON schema handed to the model now advertises the enum, and stray labels are coerced (high→important, medium→suggestion, low→nitpick) instead of failing — deterministic, with no retry storms or dropped findings.
Applied to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction; revised_severity normalized at consumption (model_copy bypasses validators); normalized once more in _finding_payload as a last line of defense before the hax create.
Reviewer prompts now state severity MUST be exactly one of the four values.
Surfaced the silent failures: the HITL "not posting" log now includes the underlying create error, and a total schema-parse failure in review_dimension no longer masquerades as "0 findings".
Cross-project callback: approval_webhook_url() now prefers AGENTFIELD_PUBLIC_URL (public, cross-project reachable), falling back to AGENTFIELD_SERVER for single-project/local setups — keeping agent→CP traffic internal.

Test Plan

ruff check src/ scripts/ clean (the CI lint gate)
pytest — 56 passed (py3.12), incl. new test_severity.py (normalize + model coercion + payload boundary) and approval_webhook_url precedence tests
Verified the generated JSON schema for severity now emits enum: [critical, important, suggestion, nitpick]

Companion ops change (not in this PR)

Set AGENTFIELD_PUBLIC_URL=https://<control-plane-public-domain> on the pr-af Railway service so the callback fix takes effect in production. (AGENTFIELD_SERVER stays internal.)

🤖 Generated with Claude Code

…allback The approval-response webhook is invoked by hax-sdk, which runs in a separate Railway project. approval_webhook_url() built it from AGENTFIELD_SERVER — the internal address (control-plane.railway.internal) the agent uses to reach the CP — so the callback 'fetch failed' cross-project and answered reviews never got back to resume the run. Prefer AGENTFIELD_PUBLIC_URL (public, cross-project reachable) and fall back to AGENTFIELD_SERVER for single-project/local setups, keeping agent->CP traffic internal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…post A finding with severity 'high' silently sank an entire review: the hax-sdk pr-af-review-v1 template enforces a zod enum {critical,important,suggestion, nitpick}, so the approval-request create 422'd, and the gate treats any create failure as REJECT -> nothing posted. PR-AF's own models typed severity as a bare str (the enum was only a comment), so neither the model nor the SDK's schema-validation/retry loop ever caught 'high' — it was a valid string to them. - Add schemas/severity.py: the single source of truth. Severity is an Annotated[Literal, BeforeValidator] so the JSON schema handed to the model advertises the enum AND stray labels are coerced (high->important, medium->suggestion, low->nitpick) instead of failing — deterministic, no retry storms or dropped findings. - Apply it to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction; normalize revised_severity at consumption (model_copy bypasses validators); normalize once more in _finding_payload as a last line of defense. - Constrain severity to the four values explicitly in the reviewer prompts. - Surface the failures that hid this: the HITL no-post reason now includes the underlying create error, and a total schema-parse failure in review_dimension no longer masquerades as '0 findings'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AbirAbbas and others added 2 commits June 5, 2026 13:18

AbirAbbas merged commit 30a5e1c into main Jun 5, 2026
2 checks passed

AbirAbbas deleted the fix/pr-af-severity-vocab-and-hitl-callback branch June 5, 2026 18:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback#42

fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback#42
AbirAbbas merged 2 commits into
mainfrom
fix/pr-af-severity-vocab-and-hitl-callback

AbirAbbas commented Jun 5, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

AbirAbbas commented Jun 5, 2026

Summary

Root cause

Fixes

Test Plan

Companion ops change (not in this PR)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant