Skip to content

Handle partial remote installed plugin failures#25208

Open
xli-oai wants to merge 3 commits into
mainfrom
xli-codex/plugin-installed-partial-remote-results
Open

Handle partial remote installed plugin failures#25208
xli-oai wants to merge 3 commits into
mainfrom
xli-codex/plugin-installed-partial-remote-results

Conversation

@xli-oai
Copy link
Copy Markdown
Contributor

@xli-oai xli-oai commented May 30, 2026

Summary

  • Keep successful plugin/installed remote scope results when the other scope fails.
  • Log per-scope remote installed plugin fetch failures and only return an error if all scopes fail.
  • Add app-server regression coverage for both global-success/workspace-failure and workspace-success/global-failure paths.

Test Plan

  • just fmt
  • cargo test -p codex-core-plugins
  • cargo test -p codex-app-server --test all plugin_installed

@xli-oai xli-oai added the oai PRs contributed by OpenAI employees label May 30, 2026 — with ChatGPT Codex Connector
@xli-oai xli-oai requested a review from xl-openai May 30, 2026 03:35
@xli-oai xli-oai marked this pull request as ready for review May 30, 2026 03:36
Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2eebdf7f24

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/core-plugins/src/remote.rs Outdated
Comment on lines +707 to +710
Ok(scope_plugins) => {
any_scope_succeeded = true;
installed_plugins.extend(scope_plugins);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep bundle sync consistent with partial installed fetches

When plugin/installed runs with one scope endpoint failing and the other returning an installed remote plugin that is not already cached, this new partial-success path returns that plugin to the client, but the bundle sync started by the same request still uses tokio::try_join! in remote_installed_plugin_sync.rs and aborts on the failing scope before installing any bundle. That leaves the UI/API reporting the plugin as installed while it is not present in the local plugin cache/effective plugins until a later all-scopes-successful sync, so either the bundle sync needs the same per-scope tolerance or this fetch should not publish plugins that cannot be synced in that scenario.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expected

.collect::<Result<Vec<_>, _>>()
}) {
Ok(scope_plugins) => {
any_scope_succeeded = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't let hidden scopes mask visible-scope failures

When only one remote installed scope is enabled for the client (for example remote_plugin = true and plugin_sharing = false), this marks the whole fetch successful if the disabled/hidden workspace scope succeeds, even when the visible global scope failed; load_remote_installed_plugins then filters the successful workspace result out via visible_scopes and returns no remote marketplaces instead of surfacing the global fetch failure. The same happens in reverse for workspace-only configurations, so partial success should be evaluated against the scopes that can actually be shown or the fetch should not include disabled scopes.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is expected

Comment thread codex-rs/core-plugins/src/remote.rs Outdated
Comment on lines +707 to +710
Ok(scope_plugins) => {
any_scope_succeeded = true;
installed_plugins.extend(scope_plugins);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve cached plugins for the failed scope

On a transient failure for one installed-plugin scope after the cache has already been populated, this partial result contains only the successful scope and is still returned as Ok; both the synchronous plugin/installed path and the background refresh write that vector back into remote_installed_plugins_cache, which removes every previously cached plugin from the failed scope and triggers an effective-plugin change. In a two-scope setup, a temporary 500 from the workspace endpoint therefore disables all workspace-installed plugins even though their last known state was available, so the failed scope needs to retain its cached entries or avoid replacing the full cache with a partial snapshot.

Useful? React with 👍 / 👎.

Comment thread codex-rs/core-plugins/src/remote.rs Outdated
Comment on lines +701 to +705
match result.and_then(|(_scope, plugins)| {
plugins
.into_iter()
.map(|plugin| remote_installed_plugin_to_cache_entry(&plugin))
.collect::<Result<Vec<_>, _>>()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Don't treat one malformed plugin as a scope failure

If one installed plugin in a scope cannot be converted (for example a workspace item is missing required discoverability), this collect::<Result<Vec<_>, _>>() turns the entire successful scope response into an error; when the other scope succeeds, the error is only logged and every otherwise-valid plugin from the malformed plugin's scope is silently dropped. That hides backend/data-shape problems and can make unrelated installed plugins disappear from plugin/installed, so conversion failures should either surface as request failures or be isolated to the bad plugin rather than discarding the whole scope.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is existing behavior, if one plugin fail, we treat all fail, i can make a change to make this isolated.

@xli-oai
Copy link
Copy Markdown
Contributor Author

xli-oai commented May 30, 2026

@codex review, make sure you only focus on the change not existing or unrelated behaviors

Copy link
Copy Markdown
Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 956fcdb93c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

] {
match result {
Ok((_scope, plugins)) => {
any_scope_succeeded = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve cached plugins for failed scopes

When one scope succeeds and the other scope returns an error, this marks the whole fetch as successful and returns only the plugins from the successful scope. Both build_and_cache_remote_installed_plugin_marketplaces and the background refresh path write that returned vector back to remote_installed_plugins_cache wholesale, so with both remote scopes enabled a transient 500/timeout for WORKSPACE after GLOBAL succeeds will drop all previously cached workspace-installed plugins and fire the effective-plugins-changed callback, effectively disabling those plugins until a later successful refresh. The partial response should not overwrite the failed scope's cached entries unless the fetch actually proved they are gone.

Useful? React with 👍 / 👎.

.flat_map(|(_scope, plugins)| plugins)
.map(|plugin| remote_installed_plugin_to_cache_entry(&plugin))
.collect::<Result<Vec<_>, _>>()?;
let (global, workspace) = tokio::join!(global, workspace);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Bound per-scope installed fetches

Switching from try_join! to join! means a quick failure from one scope no longer completes the request while the other in-flight scope is still pending. The default remote-plugin reqwest client does not set an application timeout, so during a partial backend outage where (for example) GLOBAL returns 500 but WORKSPACE stalls, plugin/installed can now block until the transport/OS gives up instead of returning the error promptly as it did before; add a per-scope timeout or other cancellation path before waiting for both results.

Useful? React with 👍 / 👎.

Comment on lines +707 to +714
Err(err) => {
tracing::warn!(
scope = scope.api_value(),
plugin_id = %plugin.plugin.id,
plugin_name = %plugin.plugin.name,
error = %err,
"skipping malformed remote installed plugin"
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve cached plugins skipped as malformed

When a successful scope contains one malformed installed-plugin entry, this branch drops only that entry and still returns Ok for the shortened list. The manager then writes that shortened vector back to remote_installed_plugins_cache, so a transient bad payload for a single previously cached plugin, such as a workspace plugin missing discoverability, makes that plugin disappear from effective plugins until a later clean refresh instead of preserving the last known-good entry.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

oai PRs contributed by OpenAI employees

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants