Skip to content

User errors in alarms are marked as retryCountsAgainstLimit#6443

Open
jqmmes wants to merge 4 commits intomainfrom
joaquim/alarm-jsg-errors-should-be-user
Open

User errors in alarms are marked as retryCountsAgainstLimit#6443
jqmmes wants to merge 4 commits intomainfrom
joaquim/alarm-jsg-errors-should-be-user

Conversation

@jqmmes
Copy link
Copy Markdown
Contributor

@jqmmes jqmmes commented Mar 27, 2026

This PR adds a isExceptionJsgError to check if an exception was generated by a user, marking it correctly as retryCountsAgainstLimit in runAlarm.

Copy link
Copy Markdown
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR adds isExceptionJsgError to detect user-generated jsg.Error exceptions in alarm error descriptions, ensuring they count against retry limits.

Findings (by severity):

  1. [MEDIUM] The header comment claims the function strips "remote." prefixes, but the implementation only strips "remote exception: " — not the "remote." prefix that annotateBroken() can produce. The comment should match the actual behavior, or (better) the function should handle "remote." too.
  2. [MEDIUM] Consider reusing tunneledErrorType() instead of re-implementing prefix stripping — it already handles "remote.", "expected ", and "broken." prefixes correctly, and would be more robust against future format changes.
  3. [LOW] No unit tests for isExceptionJsgError. The function does non-trivial string parsing with a loop; tests for the key cases (plain jsg.Error, with broken.* prefix, with remote exception: prefix, negative cases like jsg-internal.Error, jsg.TypeError, and malformed broken. without delimiter) would increase confidence.

This review was generated by an AI assistant and may contain inaccuracies.

@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 27, 2026

I'm Bonk, and I've done a quick review of your PR.

This PR adds isExceptionJsgError to detect user-generated jsg.Error exceptions in alarm error descriptions, ensuring they count against retry limits.

Findings posted as inline review comments (by severity):

  1. [MEDIUM] The header comment in exception.h:160 says the function strips "remote." prefixes, but the implementation only strips "remote exception: " -- not the "remote." prefix that annotateBroken() can produce. Suggested a comment fix.

  2. [MEDIUM] The hand-rolled prefix-stripping loop in exception.c++:174-184 could be replaced by reusing tunneledErrorType(), which already handles "remote.", "expected ", and "broken." prefixes correctly. Suggested a one-liner replacement.

  3. [LOW] No unit tests for isExceptionJsgError -- the function does non-trivial string parsing and would benefit from test coverage.

github run

@jqmmes jqmmes force-pushed the joaquim/alarm-jsg-errors-should-be-user branch from e376767 to e60199e Compare March 27, 2026 17:45
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 27, 2026

The code is consistent. Both error-handling paths (the exception catch at line 540 and the output lock failure at line 609) use the same pattern with isJsgError.

Let me verify one more subtle thing — the exception-test.c++ test needs to be in the BUILD.bazel deps correctly. The glob at line 296-310 picks up *-test.c++ excluding specific files, and exception-test.c++ is not in the exclude list. The deps are [":jsg"]. Let me check that exception.h is reachable through :jsg:

Looking at line 18-19 of BUILD.bazel, :jsg depends on :exception. The :exception target (line 215-223) has exception.c++ as srcs and exception.h as hdrs. The test #include "exception.h" will work via the transitive dep.

However, there's an issue: the test only uses symbols from :exception (just isExceptionJsgError and isExceptionFromInputGateBroken), but the glob-based rule links it against :jsg which pulls in V8 and everything else. This is wasteful but functional — and it's the existing pattern for the glob rule, so it's fine. Not worth flagging.

Let me now check if there are any remaining issues I should flag. The comment at line 176 says "starts with 'Error'" but the code checks "Error:"_kj — this is a minor inconsistency in the comment but not a bug. Actually, the comment is fine: "starts with 'Error'" is informal, and the code correctly checks "Error:" to distinguish from "TypeError:", "RangeError:", etc.

Let me think about whether this change needs a compat flag or autogate. This is a behavioral change in how alarm retries are classified — user-thrown jsg.Error exceptions will now count against retry limits when they previously didn't. However:

  1. This is a bug fix — user errors should have been counting against limits already.
  2. The change is in server-side retry behavior, not in the JavaScript API surface.
  3. The effect is that alarms with user errors stop retrying sooner (a good thing — prevents infinite retry loops for user bugs).

This doesn't need a compat flag because it's fixing incorrect server-side behavior, not changing the JS API contract. It also doesn't seem risky enough to need an autogate since it's narrowly scoped to jsg.Error specifically.

All my previous review findings have been addressed. The implementation is clean, uses tunneledErrorType() as I suggested, has comprehensive tests, and the logic is consistent across both error paths. There are no new actionable issues.

LGTM

github run

@jqmmes jqmmes force-pushed the joaquim/alarm-jsg-errors-should-be-user branch 2 times, most recently from f19ad88 to d5ba916 Compare March 27, 2026 18:41
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 27, 2026

Everything looks consistent and correct. The implementation is clean, the test coverage is comprehensive, and the previous review findings have been addressed. The approach of not using tunneledErrorType() is actually the correct design choice given the need for an allowlist of broken prefixes.

LGTM

github run

jqmmes added 4 commits March 27, 2026 18:55
These jsg errors are thrown from the internal system. They should be jsg-internal
Adds a isExceptionJsgError to check if an exception was generated by a user
Checks if an exception is generated from user code, and classifies it correctly.
@jqmmes jqmmes force-pushed the joaquim/alarm-jsg-errors-should-be-user branch from d5ba916 to 22fb2f7 Compare March 27, 2026 18:55
@jqmmes jqmmes marked this pull request as ready for review March 27, 2026 18:58
@jqmmes jqmmes requested review from a team as code owners March 27, 2026 18:58
@ask-bonk
Copy link
Copy Markdown
Contributor

ask-bonk bot commented Mar 27, 2026

The implementation is clean and correct. The prefix stripping order is correct: first "remote exception: " (KJ RPC tunneling), then "remote." (annotateBroken), then one allowlisted broken.* prefix.

One subtle observation: isExceptionFromInputGateBroken (line 168-171) does NOT strip "remote." prefixes - it only strips "remote exception: ". This means remote.broken.inputGateBroken; jsg.Error: would NOT be caught by isExceptionFromInputGateBroken but WOULD be caught by isExceptionJsgError (since it strips "remote." then sees broken. which isn't in the allowlist, so it doesn't match). Wait - actually no. After stripping "remote.", we'd have broken.inputGateBroken; jsg.Error: which is NOT in the allowlist, so isExceptionJsgError returns false. Good - the test at line 46-48 confirms this.

But wait - there's actually an edge case here. If someone has remote.broken.outputGateBroken; jsg.Error:, isExceptionJsgError strips "remote." to get broken.outputGateBroken; jsg.Error: which matches. But isExceptionFromInputGateBroken can't handle remote.broken.inputGateBroken; ... - the test at line 116-120 documents this gap. However this is a pre-existing issue with isExceptionFromInputGateBroken, not introduced by this PR.

I've completed my analysis. The PR is well-structured, addresses the previous review feedback, has comprehensive tests, and the behavioral changes are correct. The jsg.Error to jsg-internal.Error changes in actor-cache.c++ and worker.c++ are proper defensive hardening. There are no actionable issues.

LGTM

github run

// Returns true if the exception is a jsg.Error in one of these specific forms (after stripping
// any "remote." prefixes): plain jsg.Error, or jsg.Error inside a broken.outputGateBroken,
// broken.exceededCpu, or broken.exceededMemory annotation. Other broken prefixes are excluded.
bool isExceptionJsgError(kj::StringPtr description);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually any different than isTunneledException() in practice?

Even if it is, please extend tunneledErrorType() to cover what you want, and have this function call it. Currently you've implemented entirely new, separate parsing code, duplicating the code that already exists -- let's not do that.

break;
}
}
return description.startsWith("jsg.Error:"_kj);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only catches errors of type Error, not e.g. TypeError, RangeError, etc.

Per my other comment, please don't write new parsing code -- reuse the existing parsing code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants