
feat(providers/anthropic): surface refusal stop_reason as content-filter#41

Draft
jscottmiller wants to merge 2 commits into coder_2_33 from scott/anthropic-refusal-content-filter

@jscottmiller


Summary

Map Anthropic's refusal stop reason to a content-filter finish reason and
preserve the block details, so downstream callers can tell a safety block apart
from a normal stop and surface why a turn produced no content.

Anthropic's real-time safety classifiers end a response with
stop_reason: "refusal" and a stop_details object (category,
explanation), and emit no content. Previously mapFinishReason fell through
to FinishReasonUnknown and the details were dropped.
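A minimal sketch of the mapping change, assuming a string-backed finish-reason type. Only the "refusal" -> content-filter case and the fall-through to an unknown value come from this PR; the `FinishReason` type, its string values other than "content-filter", and the non-refusal cases are hypothetical stand-ins for the real fantasy types.

```go
package main

// FinishReason is a hypothetical stand-in for the provider-neutral finish
// reason type. Only the "content-filter" value is taken from this PR.
type FinishReason string

const (
	FinishReasonStop          FinishReason = "stop"
	FinishReasonLength        FinishReason = "length"
	FinishReasonToolCalls     FinishReason = "tool-calls"
	FinishReasonContentFilter FinishReason = "content-filter"
	FinishReasonUnknown       FinishReason = "unknown"
)

// mapFinishReason translates an Anthropic stop_reason string. The "refusal"
// case reflects this PR; the other cases are assumed for illustration.
func mapFinishReason(stopReason string) FinishReason {
	switch stopReason {
	case "end_turn", "stop_sequence":
		return FinishReasonStop
	case "max_tokens":
		return FinishReasonLength
	case "tool_use":
		return FinishReasonToolCalls
	case "refusal":
		// Safety-classifier block: before this change it hit the default case.
		return FinishReasonContentFilter
	default:
		return FinishReasonUnknown
	}
}
```

Keeping an explicit default means any stop reason the code does not yet know about still degrades to the unknown value rather than being misreported.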

Changes

  • mapFinishReason: "refusal" -> FinishReasonContentFilter.
  • Parse stop_details from the message_delta raw JSON (the Anthropic SDK does
    not model it) and attach it to the finish stream part via a new
    RefusalMetadata{Category, Explanation} provider-metadata type
    (registered, marshal/unmarshal, GetRefusalMetadata getter).
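Because the pinned SDK has no typed field, the details have to come out of the event's raw JSON. The sketch below decodes only the fields it needs with `encoding/json`. It assumes that `stop_details` sits next to `stop_reason` inside the event's `delta` object and that its keys are `category` and `explanation`; the `messageDeltaEvent` struct and the JSON tags on `RefusalMetadata` are hypothetical, and the real `parseAnthropicRefusal` may differ in signature.

```go
package main

import "encoding/json"

// RefusalMetadata mirrors the type named in this PR; the JSON tags are assumed.
type RefusalMetadata struct {
	Category    string `json:"category"`
	Explanation string `json:"explanation"`
}

// messageDeltaEvent is a hypothetical partial model of a message_delta event.
// It assumes stop_details is nested under "delta" next to stop_reason.
// Any other keys in the payload are ignored by json.Unmarshal.
type messageDeltaEvent struct {
	Type  string `json:"type"`
	Delta struct {
		StopReason  string           `json:"stop_reason"`
		StopDetails *RefusalMetadata `json:"stop_details"`
	} `json:"delta"`
}

// parseAnthropicRefusal extracts refusal details from the raw JSON of a
// message_delta event. It reports false for invalid JSON, for non-refusal
// stops, and for refusals that carry no stop_details.
func parseAnthropicRefusal(raw []byte) (RefusalMetadata, bool) {
	var ev messageDeltaEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return RefusalMetadata{}, false
	}
	if ev.Type != "message_delta" || ev.Delta.StopReason != "refusal" || ev.Delta.StopDetails == nil {
		return RefusalMetadata{}, false
	}
	return *ev.Delta.StopDetails, true
}
```

Returning a boolean instead of an error keeps the stream handler simple: a missing or malformed `stop_details` just means no refusal metadata is attached, while the finish reason is still mapped from `stop_reason`.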

Testing

  • go test ./providers/anthropic/ (new refusal_test.go covers the finish-reason
    mapping, stop_details parsing, and metadata round-trip).
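The metadata round-trip the tests cover can be sketched as follows, assuming provider metadata is a map keyed by provider name. `ProviderMetadata`, the `"anthropic"` key, and `roundTrip` are hypothetical stand-ins, and the type-registration step the PR mentions is omitted here.

```go
package main

import "encoding/json"

// RefusalMetadata mirrors the type named in this PR; the JSON tags are assumed.
type RefusalMetadata struct {
	Category    string `json:"category"`
	Explanation string `json:"explanation"`
}

// ProviderMetadata is a hypothetical stand-in: provider name -> metadata value.
type ProviderMetadata map[string]any

// providerName is an assumed key for Anthropic entries.
const providerName = "anthropic"

// GetRefusalMetadata returns the refusal details attached to a finish part,
// reporting false when none are present or the entry has a different type.
func GetRefusalMetadata(pm ProviderMetadata) (*RefusalMetadata, bool) {
	v, ok := pm[providerName]
	if !ok {
		return nil, false
	}
	md, ok := v.(*RefusalMetadata)
	return md, ok
}

// roundTrip marshals the metadata to JSON and decodes it back into a typed
// value, which is the property a round-trip test checks.
func roundTrip(md *RefusalMetadata) (*RefusalMetadata, error) {
	data, err := json.Marshal(md)
	if err != nil {
		return nil, err
	}
	var out RefusalMetadata
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return &out, nil
}
```

A checked type assertion in the getter lets callers ask for refusal details on every finish part without first checking whether the turn was actually blocked.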

Notes

Consumed by the coder/coder PR that adds "Response blocked" messaging to chat.


Coder Agents generated.

Map Anthropic's "refusal" stop_reason to FinishReasonContentFilter and
capture stop_details (category, explanation) into the finish part's
ProviderMetadata via a new RefusalMetadata type, so callers can surface why
a turn produced no content.

Coder Agents generated.

Author

Verified the claim in the message_delta comment ("The Anthropic SDK does not model stop_details"): true for the SDK we pin, with one nuance.

  • Our dependency chain (charmbracelet/anthropic-sdk-go, replaced by the coder/anthropic-sdk-go fork) is based on upstream v1.26.0. There is no stop_details/StopDetails anywhere in the module, though it does model the refusal stop-reason enum.
  • Upstream added structured stop_details in v1.29.0, and v1.46.1 fixed the beta accumulator dropping stop_details from message_delta events.

So the raw-JSON parsing is the right call today. Suggested follow-ups, not blocking:

  1. Reword the comment to "the SDK fork we pin (v1.26.0 base) does not model stop_details" so it does not read as a claim about current upstream.
  2. When the fork is rebased onto >= v1.29.0 (ideally >= v1.46.1), replace parseAnthropicRefusal with the typed field.

Coder Agents generated (on behalf of @jscottmiller).

Author

Merge-order note relative to #40 (the main -> coder_2_33 sync):

  • A test merge (git merge-tree) of the two branches shows this PR adds exactly one conflicted file to the sync, providers/anthropic/anthropic.go; provider_options.go and the new test auto-merge. #40 (chore: merge main into coder_2_33) already conflicts with current coder_2_33 in two OpenAI files (it was cut before #39 "fix(providers/anthropic): send thinking display" merged), so it needs a refresh regardless.
  • Resolution is mechanical either way: upstream main keeps the same structure at all three insertion points (mapFinishReason switch, stream event switch, finish-part ProviderMetadata), and contains no refusal/stop_details handling of its own.
  • #40 does not bump the coder/anthropic-sdk-go pin (still on the v1.26.0 base), so the raw-JSON stop_details parsing here remains necessary after the sync.

Since the refusal mapping is not Coder-specific, it is also a candidate to send upstream to charmbracelet/fantasy rather than carrying it as a fork patch long-term.


Coder Agents generated (on behalf of @jscottmiller).

Author

/coder-agents-review
