fix(pr-af): enforce canonical severity vocabulary + cross-project HITL callback #42
Merged
…allback The approval-response webhook is invoked by hax-sdk, which runs in a separate Railway project. approval_webhook_url() built it from AGENTFIELD_SERVER — the internal address (control-plane.railway.internal) the agent uses to reach the CP — so the callback 'fetch failed' cross-project and answered reviews never got back to resume the run. Prefer AGENTFIELD_PUBLIC_URL (public, cross-project reachable) and fall back to AGENTFIELD_SERVER for single-project/local setups, keeping agent->CP traffic internal. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
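The precedence described in this commit can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: the `env` parameter, the `WEBHOOK_PATH` value, and the error raised when neither variable is set are assumptions added so the example is self-contained and testable.

```python
import os

# Hypothetical route; the real callback path is not given in the source.
WEBHOOK_PATH = "/webhooks/approval-response"


def approval_webhook_url(env=None):
    """Build the callback URL, preferring the publicly reachable base URL."""
    # Assumption: an injectable mapping, so the precedence can be tested
    # without mutating the real process environment.
    env = os.environ if env is None else env
    # Public URL first (reachable from another Railway project); fall back
    # to the internal server address for single-project or local setups.
    base = env.get("AGENTFIELD_PUBLIC_URL") or env.get("AGENTFIELD_SERVER")
    if not base:
        raise RuntimeError("set AGENTFIELD_PUBLIC_URL or AGENTFIELD_SERVER")
    return base.rstrip("/") + WEBHOOK_PATH
```

Using `or` rather than a default argument to `get` means an empty `AGENTFIELD_PUBLIC_URL` also falls back to `AGENTFIELD_SERVER`, which is usually what a deployment with a blank variable wants.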
…post
A finding with severity 'high' silently sank an entire review: the hax-sdk
pr-af-review-v1 template enforces a zod enum {critical,important,suggestion,
nitpick}, so the approval-request create 422'd, and the gate treats any create
failure as REJECT -> nothing posted. PR-AF's own models typed severity as a bare
str (the enum was only a comment), so neither the model nor the SDK's
schema-validation/retry loop ever caught 'high' — it was a valid string to them.
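The fail-closed gate described above might look something like the following sketch. The names `GateResult` and `open_approval_gate` are hypothetical, as is the injected `create_request` callable; the only behavior taken from the description is that any create failure means nothing is posted, and that the reason should carry the underlying error.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GateResult:
    status: str  # "pending" (awaiting a human) or "rejected"
    reason: str


def open_approval_gate(
    create_request: Callable[[dict], Any], payload: dict
) -> GateResult:
    """Fail closed: if the approval request cannot be created, post nothing."""
    try:
        create_request(payload)
    except Exception as exc:
        # Keep the underlying error in the reason so a 422 from the
        # template's enum check is visible instead of silently swallowed.
        return GateResult("rejected", f"approval request create failed: {exc}")
    return GateResult("pending", "awaiting human review")
```

Failing closed is the right default for a human-in-the-loop gate; the problem in this incident was that the rejection reason did not say why.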
- Add schemas/severity.py: the single source of truth. Severity is an
Annotated[Literal, BeforeValidator] so the JSON schema handed to the model
advertises the enum AND stray labels are coerced (high->important,
medium->suggestion, low->nitpick) instead of failing — deterministic, no
retry storms or dropped findings.
- Apply it to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction;
normalize revised_severity at consumption (model_copy bypasses validators);
normalize once more in _finding_payload as a last line of defense.
- Constrain severity to the four values explicitly in the reviewer prompts.
- Surface the failures that hid this: the HITL no-post reason now includes the
underlying create error, and a total schema-parse failure in review_dimension
no longer masquerades as '0 findings'.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
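As a rough illustration of the severity type this commit describes, the sketch below uses pydantic v2. The alias table follows the mappings listed above; the `title` field, the lowercasing and whitespace stripping, and the choice to let unknown labels still fail validation are assumptions, not details confirmed by the PR.

```python
from typing import Annotated, Literal

from pydantic import BaseModel, BeforeValidator

CanonicalSeverity = Literal["critical", "important", "suggestion", "nitpick"]

# Stray labels named in the commit message, mapped to canonical values.
_ALIASES = {"high": "important", "medium": "suggestion", "low": "nitpick"}


def normalize_severity(value):
    """Coerce known stray labels; leave everything else for Literal to judge."""
    if isinstance(value, str):
        label = value.strip().lower()  # assumption: case/whitespace tolerant
        return _ALIASES.get(label, label)
    return value


# The Literal drives the JSON schema enum; the BeforeValidator runs first.
Severity = Annotated[CanonicalSeverity, BeforeValidator(normalize_severity)]


class ReviewFinding(BaseModel):
    title: str  # hypothetical field, for illustration only
    severity: Severity


finding = ReviewFinding(title="unchecked None", severity="high")
print(finding.severity)  # important

# model_copy(update=...) does not re-run validators, so a revised severity
# has to be normalized by hand at the point where it is consumed.
revised = finding.model_copy(update={"severity": "low"})
print(normalize_severity(revised.severity))  # nitpick
```

In this sketch a label outside both the enum and the alias table (for example "blocker") still raises a `ValidationError`, which leaves the retry loop something concrete to react to.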
Summary
A real review with three findings on a PR (Agent-Field/github-buddy#73) found genuine bugs but posted nothing to GitHub. The root cause turned out to be two stacked issues; this PR fixes both.
Root cause
- …pr-af-review-v1). One finding came back with severity: "high". The hax-sdk template enforces a zod enum {critical, important, suggestion, nitpick}, so the approval-request create returned 422 (Invalid enum value … received 'high'). request_review_approval treats any create failure as REJECT ("never post an unreviewed review") → the whole review was dropped, and the run still "succeeded".
- PR-AF's own models typed severity as a bare str (the allowed set was only a comment). So the model's "high" was a valid string to PR-AF's pydantic schema and to the SDK harness's schema-validation/retry loop: there was no validation failure to retry on. The enum lived only at the hax-sdk boundary, which has no retry/feedback loop back to the model.
- approval_webhook_url() built the callback from AGENTFIELD_SERVER (the internal control-plane.railway.internal address the agent uses to reach the CP). hax-sdk runs in a separate Railway project, so the webhook fetch failed and the response never got back.
Fixes
- schemas/severity.py: single source of truth. Severity = Annotated[Literal[...], BeforeValidator(normalize_severity)]: the JSON schema handed to the model now advertises the enum, and stray labels are coerced (high→important, medium→suggestion, low→nitpick) instead of failing. This is deterministic, with no retry storms or dropped findings.
- Applied to ReviewFinding, ScoredFinding, _CompoundFinding, CrossRefInteraction; revised_severity normalized at consumption (model_copy bypasses validators); normalized once more in _finding_payload as a last line of defense before the hax create.
- Failures surfaced: the HITL no-post reason now includes the underlying create error, and a total schema-parse failure in review_dimension no longer masquerades as "0 findings".
- approval_webhook_url() now prefers AGENTFIELD_PUBLIC_URL (public, cross-project reachable), falling back to AGENTFIELD_SERVER for single-project/local setups, keeping agent→CP traffic internal.
Test Plan
- ruff check src/ scripts/ clean (the CI lint gate)
- pytest: 56 passed (py3.12), incl. new test_severity.py (normalize + model coercion + payload boundary) and approval_webhook_url precedence tests
- The JSON schema for severity now emits enum: [critical, important, suggestion, nitpick]
Companion ops change (not in this PR)
Set AGENTFIELD_PUBLIC_URL=https://<control-plane-public-domain> on the pr-af Railway service so the callback fix takes effect in production. (AGENTFIELD_SERVER stays internal.)
🤖 Generated with Claude Code