Handle partial remote installed plugin failures#25208
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2eebdf7f24
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Ok(scope_plugins) => { | ||
| any_scope_succeeded = true; | ||
| installed_plugins.extend(scope_plugins); | ||
| } |
There was a problem hiding this comment.
Keep bundle sync consistent with partial installed fetches
When plugin/installed runs with one scope endpoint failing and the other returning an installed remote plugin that is not already cached, this new partial-success path returns that plugin to the client, but the bundle sync started by the same request still uses tokio::try_join! in remote_installed_plugin_sync.rs and aborts on the failing scope before installing any bundle. That leaves the UI/API reporting the plugin as installed while it is not present in the local plugin cache/effective plugins until a later all-scopes-successful sync, so either the bundle sync needs the same per-scope tolerance or this fetch should not publish plugins that cannot be synced in that scenario.
Useful? React with 👍 / 👎.
| .collect::<Result<Vec<_>, _>>() | ||
| }) { | ||
| Ok(scope_plugins) => { | ||
| any_scope_succeeded = true; |
There was a problem hiding this comment.
Don't let hidden scopes mask visible-scope failures
When only one remote installed scope is enabled for the client (for example remote_plugin = true and plugin_sharing = false), this marks the whole fetch successful if the disabled/hidden workspace scope succeeds, even when the visible global scope failed; load_remote_installed_plugins then filters the successful workspace result out via visible_scopes and returns no remote marketplaces instead of surfacing the global fetch failure. The same happens in reverse for workspace-only configurations, so partial success should be evaluated against the scopes that can actually be shown or the fetch should not include disabled scopes.
Useful? React with 👍 / 👎.
| Ok(scope_plugins) => { | ||
| any_scope_succeeded = true; | ||
| installed_plugins.extend(scope_plugins); | ||
| } |
There was a problem hiding this comment.
Preserve cached plugins for the failed scope
On a transient failure for one installed-plugin scope after the cache has already been populated, this partial result contains only the successful scope and is still returned as Ok; both the synchronous plugin/installed path and the background refresh write that vector back into remote_installed_plugins_cache, which removes every previously cached plugin from the failed scope and triggers an effective-plugin change. In a two-scope setup, a temporary 500 from the workspace endpoint therefore disables all workspace-installed plugins even though their last known state was available, so the failed scope needs to retain its cached entries or avoid replacing the full cache with a partial snapshot.
Useful? React with 👍 / 👎.
| match result.and_then(|(_scope, plugins)| { | ||
| plugins | ||
| .into_iter() | ||
| .map(|plugin| remote_installed_plugin_to_cache_entry(&plugin)) | ||
| .collect::<Result<Vec<_>, _>>() |
There was a problem hiding this comment.
Don't treat one malformed plugin as a scope failure
If one installed plugin in a scope cannot be converted (for example a workspace item is missing required discoverability), this collect::<Result<Vec<_>, _>>() turns the entire successful scope response into an error; when the other scope succeeds, the error is only logged and every otherwise-valid plugin from the malformed plugin's scope is silently dropped. That hides backend/data-shape problems and can make unrelated installed plugins disappear from plugin/installed, so conversion failures should either surface as request failures or be isolated to the bad plugin rather than discarding the whole scope.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
this is existing behavior, if one plugin fail, we treat all fail, i can make a change to make this isolated.
|
@codex review, make sure you only focus on the change not existing or unrelated behaviors |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 956fcdb93c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ] { | ||
| match result { | ||
| Ok((_scope, plugins)) => { | ||
| any_scope_succeeded = true; |
There was a problem hiding this comment.
Preserve cached plugins for failed scopes
When one scope succeeds and the other scope returns an error, this marks the whole fetch as successful and returns only the plugins from the successful scope. Both build_and_cache_remote_installed_plugin_marketplaces and the background refresh path write that returned vector back to remote_installed_plugins_cache wholesale, so with both remote scopes enabled a transient 500/timeout for WORKSPACE after GLOBAL succeeds will drop all previously cached workspace-installed plugins and fire the effective-plugins-changed callback, effectively disabling those plugins until a later successful refresh. The partial response should not overwrite the failed scope's cached entries unless the fetch actually proved they are gone.
Useful? React with 👍 / 👎.
| .flat_map(|(_scope, plugins)| plugins) | ||
| .map(|plugin| remote_installed_plugin_to_cache_entry(&plugin)) | ||
| .collect::<Result<Vec<_>, _>>()?; | ||
| let (global, workspace) = tokio::join!(global, workspace); |
There was a problem hiding this comment.
Bound per-scope installed fetches
Switching from try_join! to join! means a quick failure from one scope no longer completes the request while the other in-flight scope is still pending. The default remote-plugin reqwest client does not set an application timeout, so during a partial backend outage where (for example) GLOBAL returns 500 but WORKSPACE stalls, plugin/installed can now block until the transport/OS gives up instead of returning the error promptly as it did before; add a per-scope timeout or other cancellation path before waiting for both results.
Useful? React with 👍 / 👎.
| Err(err) => { | ||
| tracing::warn!( | ||
| scope = scope.api_value(), | ||
| plugin_id = %plugin.plugin.id, | ||
| plugin_name = %plugin.plugin.name, | ||
| error = %err, | ||
| "skipping malformed remote installed plugin" | ||
| ); |
There was a problem hiding this comment.
Preserve cached plugins skipped as malformed
When a successful scope contains one malformed installed-plugin entry, this branch drops only that entry and still returns Ok for the shortened list. The manager then writes that shortened vector back to remote_installed_plugins_cache, so a transient bad payload for a single previously cached plugin, such as a workspace plugin missing discoverability, makes that plugin disappear from effective plugins until a later clean refresh instead of preserving the last known-good entry.
Useful? React with 👍 / 👎.
Summary
plugin/installedremote scope results when the other scope fails.Test Plan
just fmtcargo test -p codex-core-pluginscargo test -p codex-app-server --test all plugin_installed