revert: remove all Sentry error suppression (12 PRs)#2959
Conversation
…tinyhumansai#2850 tinyhumansai#2899 tinyhumansai#2906 tinyhumansai#2915 tinyhumansai#2923 tinyhumansai#2924 tinyhumansai#2928 tinyhumansai#2930 tinyhumansai#2934 tinyhumansai#2937 tinyhumansai#2939 These PRs suppressed/demoted errors instead of fixing root causes. Errors will now properly fire Sentry events again so they can be tracked and fixed with proper root-cause solutions.
📝 WalkthroughWalkthroughThis PR removes several observability error classification pathways and Sentry suppression filters. Core changes: ChangesError classification pathway removal and Sentry filter simplification
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/inference/ops.rs (1)
182-182: ⚡ Quick winAdd correlation fields to the error log.
Include
workloadandprovideron this error event to keep failure logs queryable with the same context as the start event.As per coding guidelines: “Prefix debug logs with stable, grep-friendly identifiers … and include correlation fields such as request IDs, method names, or entity IDs.”Suggested diff
- Err(err) => error!(error = %err, "{LOG_PREFIX} test_provider_model:error"), + Err(err) => error!( + workload, + provider, + error = %err, + "{LOG_PREFIX} test_provider_model:error" + ),🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/ops.rs` at line 182, The error log at the Err branch currently only logs the error; update the error! invocation in the Err(err) arm to include the correlation fields workload and provider (e.g. error!(workload = %workload, provider = %provider, error = %err, "{LOG_PREFIX} test_provider_model:error")), keeping the existing LOG_PREFIX message and error formatting; ensure workload and provider are in scope or passed into the function containing this Err(err) match arm (refer to the Err branch where error!(...) is called).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/provider/config_rejection.rs`:
- Around line 170-192: Run the project's formatter and commit the changes so CI
passes: execute pnpm format (which runs rustfmt/prettier) and ensure cargo fmt
has formatted Rust files; then add and commit the updated file containing the
string entries "thinking mode must be passed back" and "requires a subscription,
upgrade for access" so pnpm format:check and cargo fmt --check succeed in CI.
---
Nitpick comments:
In `@src/openhuman/inference/ops.rs`:
- Line 182: The error log at the Err branch currently only logs the error;
update the error! invocation in the Err(err) arm to include the correlation
fields workload and provider (e.g. error!(workload = %workload, provider =
%provider, error = %err, "{LOG_PREFIX} test_provider_model:error")), keeping the
existing LOG_PREFIX message and error formatting; ensure workload and provider
are in scope or passed into the function containing this Err(err) match arm
(refer to the Err branch where error!(...) is called).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 07dd0604-9948-4940-a388-c4c8b3baedc0
📒 Files selected for processing (8)
src/core/observability.rssrc/main.rssrc/openhuman/inference/ops.rssrc/openhuman/inference/ops_tests.rssrc/openhuman/inference/provider/config_rejection.rssrc/openhuman/inference/provider/ops.rssrc/openhuman/memory_store/factories.rssrc/openhuman/subconscious/store.rs
💤 Files with no reviewable changes (2)
- src/openhuman/inference/ops_tests.rs
- src/main.rs
| // thinking-mode model (DeepSeek-R1 / Moonshot K2-thinking on | ||
| // `provider=cloud` custom_openai) rejects a follow-up turn that | ||
| // doesn't echo the prior assistant's `reasoning_content` field. | ||
| // Body shape (backtick-quoted JSON literal in the upstream body): | ||
| // `{"error":{"message":"The `reasoning_content` in the thinking | ||
| // mode must be passed back to the API.",...}}`. The | ||
| // provider-contract gap is on our side, but until the thinking- | ||
| // mode round-tripping ships in the inference layer, every affected | ||
| // turn fires a fresh Sentry event — and the UI already surfaces | ||
| // the actionable error to the user. Anchor on the unique | ||
| // `thinking mode must be passed back` substring so the match | ||
| // doesn't depend on the upstream's backtick-quoting around | ||
| // `reasoning_content` (some provider versions ship without them). | ||
| "thinking mode must be passed back", | ||
| // TAURI-RUST-4XK (~649 events) — Ollama Cloud subscription gate. | ||
| // Body: `{"error":"this model requires a subscription, upgrade for | ||
| // access: https://ollama.com/upgrade (ref: <uuid>)"}` on a 403 | ||
| // Forbidden from `compatible::OpenAiCompatibleProvider` with | ||
| // `name = "ollama"`. User-state: the model picked in Settings is | ||
| // a paid-tier Ollama Cloud model the user's account doesn't | ||
| // cover. The UI surfaces an actionable upgrade link in the | ||
| // remediation message itself. | ||
| "requires a subscription, upgrade for access", |
There was a problem hiding this comment.
Run formatter to unblock CI.
cargo fmt --check and pnpm format:check are failing for this file, so this PR is currently blocked. Please run pnpm format and commit the formatter output.
As per coding guidelines **/*.{ts,tsx,rs,json,js}: “Run pnpm format … before committing; pnpm format:check … must pass in CI.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/inference/provider/config_rejection.rs` around lines 170 - 192,
Run the project's formatter and commit the changes so CI passes: execute pnpm
format (which runs rustfmt/prettier) and ensure cargo fmt has formatted Rust
files; then add and commit the updated file containing the string entries
"thinking mode must be passed back" and "requires a subscription, upgrade for
access" so pnpm format:check and cargo fmt --check succeed in CI.
# Conflicts: # src/core/observability.rs # src/openhuman/inference/provider/ops.rs
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus the revert itself reads clean — it's a symmetric removal of the suppression branches plus their before_send call sites in main.rs, and the inline tests were updated to match the reverted classifier behavior. The intent (resume firing these to Sentry and fix root causes separately) is clear. A couple of things to sort before I can approve:
CI is red, so I'm holding the full approve:
PR Submission Checklistis failing only because the two Test plan checkboxes are still- [ ]. Oncecargo check/ no-compile-errors are confirmed, tick them and that check goes green.E2E (Playwright) web lane 2/4and3/4are failing. This is a Rust-only change with no frontend diff, so these look unrelated/flaky — but they need to be green (or re-run) before merge, not waved through.
[major] Out-of-scope change in src/openhuman/inference/provider/ops.rs
The stated scope is reverting Sentry suppression across the 12 listed PRs, but list_models also drops the OpenAI Codex OAuth routing path — resolve_openai_codex_routing, the OAuth client_version query param, openai_codex_user_agent, and the OPENAI_CODEX_ACCOUNT_HEADER — replacing it with a plain endpoint + bare Bearer request. That helper (resolve_openai_codex_routing in openai_codex.rs) still exists on this branch, so it's now orphaned for this path, and Codex OAuth users would lose the account header + OAuth user-agent on model listing (likely 401s). This has nothing to do with error suppression — was it intentionally bundled in, or did the revert sweep up an unrelated commit? If it's not part of the suppression revert, please split it out.
Everything else looks like a faithful revert. Fix CI + confirm that Codex routing removal is intentional and I'll do the final pass and approve.
# Conflicts: # src/core/observability_tests.rs
…ider_model error log (addresses @coderabbitai on src/openhuman/inference/ops.rs:182) The Err branch logged only the error, while the :start event logs workload and provider. Add the same correlation fields to the :error event so failure logs stay queryable with the same context. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus heads up — CI isn't green yet (Rust Core Coverage, Frontend Coverage, and E2E web lane 1/4 are re-running/failing), so I'll hold a full approval until those settle. Thanks for adding the workload/provider correlation fields to the error log in the latest commit — good catch keeping the failure path queryable.
One thing I want to flag before this lands, separate from CI:
[major] src/openhuman/inference/provider/ops.rs — list_models: this revert also strips the OpenAI Codex OAuth routing (resolve_openai_codex_routing, the codex client_version query param, the codex user-agent / originator / account headers, and merge_openai_codex_model_hints). That code came from #2748 (Codex OAuth provider support), not from any of the 12 Sentry-suppression PRs this is meant to revert. The chat path in factory.rs still routes Codex through the codex backend, so after this PR model listing and chat disagree: list_models would hit the configured endpoint without the codex client_version/account header and skip the codex model hints, which breaks model discovery for Codex OAuth providers. Recommend dropping that hunk so the revert stays scoped to Sentry suppression — happy to re-check once it's split out.
Once CI is green and that's sorted I'll come back and approve.
…everted classifier The merge brought in main's `inference_provider_factory_and_classifiers_cover_user_state_edges` coverage test, which asserted `"unknown parameter: tools"` classifies as a provider-config rejection. PR tinyhumansai#2959 reverts that suppression (the pattern was removed from `config_rejection.rs`), so the shape now fires to Sentry again. Move the string out of the positive-assertion loop into a negative assertion documenting the reverted behavior. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus heads up — CI is still red, and at least one failure is caused by the revert itself, not flakiness, so I'll hold off on approving until it's sorted.
Rust Core Coverage fails in provider_ops_leftovers_cover_model_listing_error_shapes_and_auth_styles (tests/inference_compatible_admin_leftovers_raw_coverage_e2e.rs:345):
404 models unsupported: "provider returned 404: no model list"
Removing the 404 -> empty-list short-circuit in list_configured_models_from_config (src/openhuman/inference/provider/ops.rs) means a /models 404 now returns an Err instead of an empty { models: [], unsupported: true } payload. That's the intended behavior change for the suppression revert, but this pre-existing e2e test still asserts the old "empty list" contract and wasn't updated in this PR, so it hard-fails. Please update that test to expect the new error shape so the suite matches the revert.
Frontend Coverage (Vitest) and E2E web lane 1/4 are also red. Those look unrelated to a Rust-only diff, so a re-run or rebase on main should clear them — ping me if they persist.
The revert itself still reads clean and symmetric otherwise. Once the full suite is green I'll come back and approve.
…p only the intended tinyhumansai#2939/tinyhumansai#2899 reverts The PR branch predated tinyhumansai#2748 (Codex OAuth provider support), so merging current main auto-resolved provider/ops.rs toward the stale PR side — silently dropping tinyhumansai#2748's OAuth routing, the `data`/`models` envelope fallback, and `model_info_from_catalog_item`. tinyhumansai#2748 is NOT in the revert list, so this was an unintended regression. Restore main's provider/ops.rs (and its extracted ops_tests.rs) and re-apply ONLY the author's two intended Sentry-suppression reverts: - tinyhumansai#2939: list_models 404 no longer returns a synthetic `{unsupported: true}` success — it surfaces as an error so the failure fires to Sentry. - tinyhumansai#2899: api_error no longer suppresses rate-limit-body Sentry reports on non-429 responses. Their two suppression-specific test blocks are removed to match. Co-Authored-By: Claude <noreply@anthropic.com>
…ehavior Main's raw-coverage e2e tests asserted the tinyhumansai#2939 404→`unsupported:true` suppression. With that branch reverted, a 404 from /models now surfaces as an error ("provider returned 404: …"), so update the three affected assertions (direct calls + the inference_list_models RPC path) to expect the error. Co-Authored-By: Claude <noreply@anthropic.com>
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus heads up — CI is still red, so I'll hold a full approve until it's green. The latest commit correctly updated the 404 /models assertions, nice. But there's one more pre-existing test left asserting the old contract:
tests/inference_provider_admin_round22_raw_coverage_e2e.rs:254—provider_admin_model_listing_covers_openrouter_validation_and_local_synthesisnow fails:assertion failed: object_error.contains("nested provider failure"). The/object-error/modelsmock returns HTTP 200 with an error-shaped body ({"error":{"message":"nested provider failure"}}), and this test expects that to surface as an error. The revert dropped the logic that detected an embedded error in a 200 model-listing response, solist_configured_models("object-error")no longer returns anErr. Same class of leftover as the 404 tests you just fixed — please update (or remove) this assertion to match the reverted behavior so Rust Core Coverage goes green.
The other two red checks look unrelated to this Rust-only diff: Frontend Coverage (Vitest) fails on a notification-settings UI test, and E2E web lane 1/4 is failing on ECONNREFUSED to the mock backend (infra flake). A rebase + re-run should clear those. Once CI is fully green I'll come back and approve.
# Conflicts: # src/openhuman/inference/provider/ops_tests.rs
…app_data_sqlite_busy_errors The test function was missing its #[test] attribute, making it dead code that would never run. Introduced when the revert inlined tests from the external observability_tests.rs file.
|
@sanil-23 Addressed the outstanding items: Merge conflict resolved — Missing Re: Quality suite: |
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus thanks for the updates here.
The previous concern is resolved: the list_models 404 contract change is now reflected in the e2e coverage test (provider_ops_leftovers_cover_model_listing_error_shapes_and_auth_styles), which correctly asserts that a 404 from the models endpoint surfaces as a real error instead of a synthetic empty-list success. That's the right call — the whole point of the revert is to let these failures fire rather than be masked. The #2748 Codex OAuth model-listing path is preserved through the merge, and restoring the missing #[test] on classifies_whatsapp_data_sqlite_busy_errors means that case actually runs again. Good catch on that last one.
Code-wise this looks clean: the revert is symmetric and the tests now match the reverted behavior.
The only thing holding an approval is CI. E2E (Playwright / web lane 1/4) is currently red and the Rust Core / Frontend coverage jobs are still running. The web-lane failure looks unrelated to this Rust-only diff (likely flaky), so a rerun may well clear it — but let's get all required checks green first. Once they are, ping me and I'll approve.
sanil-23
left a comment
There was a problem hiding this comment.
@graycyrus the revert itself reads cleanly and the merge-conflict handling around #2748 (preserving the Codex OAuth model listing while keeping only the intended #2939/#2899 reverts) is well reasoned. A couple of things to sort before I can do a final approval pass:
CI is red. Two checks are failing:
Rust Core Coveragefails ontests/tools_approval_channels_raw_coverage_e2e.rs:1689(assertion failed: unknown_toolkit.output().contains("gmail_pro")). That file isn't touched by this PR and the assertion has nothing to do with Sentry suppression, so this looks like a flaky/unrelated failure rather than a regression from the revert. Please re-run it to confirm.E2E (Playwright / web lane 1/4)is also red while lanes 2-4 pass — same story, likely flaky. A re-run should settle both.
One correctness thing worth double-checking in config_rejection.rs. The revert dropped several patterns from is_provider_config_rejection_message, but a few of the removed patterns are substrings of patterns that were kept, so the suppression behaviour may be unchanged for those shapes:
- removed
"in the thinking mode must be passed back"but kept"thinking mode must be passed back"— the kept (shorter) substring still matches the same bodies. - removed
"requires a subscription"but kept"requires a subscription, upgrade for access"— bodies carrying the full phrase still match and stay suppressed.
If those two classes were meant to keep firing to Sentry, this revert doesn't actually un-suppress them. If they were intentionally left in scope (added by a PR not on the revert list), ignore this — but it's worth a sentence in the PR description either way so the next person doesn't assume they're back on. (The earlier CodeRabbit cargo fmt comment looks resolved — Rust Quality (fmt, clippy) is green now.)
Once CI is green I'll come back and approve. Let me know if you want a hand chasing down the flaky tests.
sanil-23
left a comment
There was a problem hiding this comment.
Thanks for unwinding the suppression layers — restoring real Sentry signal for genuine errors is the right call, and the bulk of this diff is exactly that: pure classification/before_send removals that let throttle, auth, and config-rejection events fire again. Tests moved cleanly and CI is green.
One change crosses from "restore Sentry noise" into "change runtime behavior," and I'd like it split out or reconsidered before merge:
[major] src/openhuman/inference/provider/ops.rs — list_configured_models_from_config
The removed block didn't just suppress a Sentry event; it returned a graceful result:
if status == NOT_FOUND {
return Ok(RpcOutcome::new(json!({ "models": [], "unsupported": true }), ...));
}
With it gone, a 404 from /models now falls straight through to the generic !status.is_success() path and returns Err(...). For OpenAI-compatible providers that legitimately have no /models endpoint (DeepSeek, Moonshot, Kimi, custom proxies — the cases the comment called out), model listing will now surface a hard error to the caller instead of an empty list. Any frontend reading the unsupported: true flag also loses it.
That's a user-facing regression, not restored telemetry. Could you keep the 404 graceful-degradation branch (you can still drop the Sentry suppression around it) and revert only the classification piece? If the intent is that a follow-up restores this behavior, it'd be safer to land that first.
Everything else (main.rs before_send filters, observability classifiers, config_rejection phrases, inference ops) is a clean, scoped revert — no concerns there.
…inyhumansai#3401) MiniMax was registered with endpoint https://api.minimax.io/anthropic + AuthStyle::Anthropic (tinyhumansai#3062). But OpenHuman has no Anthropic Messages provider — AuthStyle::Anthropic only changes the auth header; chat still builds {endpoint}/chat/completions and list_models builds {endpoint}/models. That base only serves MiniMax's Anthropic Messages API, so both /anthropic/chat/completions and /anthropic/models returned 404 — the model-listing 404 was Sentry TAURI-RUST-8X3 (236 events / 30 users). MiniMax exposes a full OpenAI-compatible surface at /v1 (verified: /v1/models and /v1/chat/completions both respond 401, i.e. exist). Point the catalog entry (Rust + frontend mirror) there with Bearer auth so both model-listing and chat resolve. No error-handling change — a genuine 404 from any provider still surfaces, per tinyhumansai#2959. Closes tinyhumansai#3401
…inyhumansai#3401) MiniMax was registered with endpoint https://api.minimax.io/anthropic + AuthStyle::Anthropic (tinyhumansai#3062). But OpenHuman has no Anthropic Messages provider — AuthStyle::Anthropic only changes the auth header; chat still builds {endpoint}/chat/completions and list_models builds {endpoint}/models. That base only serves MiniMax's Anthropic Messages API, so both /anthropic/chat/completions and /anthropic/models returned 404 — the model-listing 404 was Sentry TAURI-RUST-8X3 (236 events / 30 users). MiniMax exposes a full OpenAI-compatible surface at /v1 (verified: /v1/models and /v1/chat/completions both respond 401, i.e. exist). Point the catalog entry (Rust + frontend mirror) there with Bearer auth so both model-listing and chat resolve. No error-handling change — a genuine 404 from any provider still surfaces, per tinyhumansai#2959. Closes tinyhumansai#3401
Summary
Reverts all Sentry error suppression code from 12 PRs that hid bugs instead of fixing root causes.
Affected PRs: #2813, #2850, #2899, #2906, #2915, #2923, #2924, #2928, #2930, #2934, #2937, #2939
Errors will resume firing Sentry events. Each will get a proper root-cause fix in separate PRs.
Why
Suppressing errors hides bugs. The correct approach is to fix the root cause — add backoff, refresh tokens, handle gracefully — not silence the symptom.
Test plan
cargo checkpassesSummary by CodeRabbit
Bug Fixes
Chores