feat(plugin): Added plugin-ready check endpoints and optimized local plugin startup logic #600
NieRonghua wants to merge 2 commits into langgenius:main
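- Add a /ready/check endpoint that reports plugin startup readiness, for use as a Kubernetes readiness probe
- Introduce a configurable maximum plugin retry count (default 15) so the retry policy can be customized
- Rework local plugin monitoring so that readiness is judged separately for initial plugins and runtime plugins
- Fix the duplicate declaration of the trace package dependency in go.mod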
Summary of Changes
This pull request addresses critical race conditions and stability issues in the plugin daemon's readiness probe within Kubernetes environments. By introducing a sophisticated "Initial Plugin Set Locking Strategy," the system now accurately reports readiness only after all initial plugins have completed their startup attempts, preventing traffic from being routed to unready pods. Furthermore, it significantly improves startup performance through configurable retry limits and provides detailed observability into plugin states, ensuring that runtime plugin changes do not destabilize the pod's readiness.
@Yeuoly please review this PR
Code Review
This pull request introduces a new readiness check mechanism to solve a race condition during plugin startup. However, a critical bug in the local plugin monitoring loop will cause the daemon to panic and crash when it encounters a new plugin, leading to a Denial of Service (DoS). Additionally, the new readiness endpoint exposes detailed plugin metadata publicly, which should be restricted or minimized to prevent information leakage. My review also identified a design issue with global state and opportunities to improve code clarity and adhere to Go idioms. Addressing these issues will significantly improve the robustness, security, and maintainability of the code.
…set in readiness check
Convert global initialPlugins variable to ControlPanel member field to avoid concurrent access conflicts. Simplify return value structure of isInitialPluginsReady method to improve code readability. Remove unnecessary detail field from readiness check response and simplify controller function signature.
Description
Problem
In Kubernetes environments, the plugin daemon exhibits a race condition during startup that causes service disruption:
Additionally, when users install new plugins at runtime after the pod becomes Ready, the readiness probe returns 503, causing K8s to remove the pod from Service endpoints and interrupt traffic flow.
Root Causes Addressed
Solution: Initial Plugin Set Locking Strategy
Implements an intelligent readiness mechanism that:
Key principle: Once a pod is Ready, it will NEVER become NotReady due to runtime plugin additions.
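The key principle can be illustrated with a minimal Go sketch. The names `lockInitialPlugins` and `isInitialPluginsReady` are taken from this PR; the `readinessTracker` type, the `pluginState` values, and `setState` are hypothetical stand-ins and are not the daemon's actual code. The sketch assumes that a plugin whose retries are exhausted still counts as having completed its startup attempt.

```go
package main

import (
	"fmt"
	"sync"
)

// pluginState is a hypothetical per-plugin status used only in this sketch.
type pluginState int

const (
	statePending pluginState = iota // startup not finished yet
	stateRunning                    // plugin started successfully
	stateFailed                     // retries exhausted, startup attempt is over
)

// readinessTracker is an assumed stand-in for the ControlPanel fields.
type readinessTracker struct {
	mu         sync.RWMutex
	locked     bool
	initialSet map[string]struct{}
	states     map[string]pluginState
}

func newReadinessTracker() *readinessTracker {
	return &readinessTracker{
		initialSet: map[string]struct{}{},
		states:     map[string]pluginState{},
	}
}

// lockInitialPlugins records the plugins found at first startup.
// Every later call is a no-op, so runtime installs never join the set.
func (t *readinessTracker) lockInitialPlugins(ids []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.locked {
		return
	}
	for _, id := range ids {
		t.initialSet[id] = struct{}{}
	}
	t.locked = true
}

// setState stores the latest state of any plugin, initial or runtime.
func (t *readinessTracker) setState(id string, s pluginState) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.states[id] = s
}

// isInitialPluginsReady is true once the set is locked and no initial
// plugin is still pending. Plugins outside the set are ignored.
func (t *readinessTracker) isInitialPluginsReady() bool {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if !t.locked {
		return false
	}
	for id := range t.initialSet {
		if t.states[id] == statePending {
			return false
		}
	}
	return true
}

func main() {
	tr := newReadinessTracker()
	tr.lockInitialPlugins([]string{"a", "b"})
	tr.setState("a", stateRunning)
	fmt.Println(tr.isInitialPluginsReady()) // false: "b" is still pending
	tr.setState("b", stateFailed)
	fmt.Println(tr.isInitialPluginsReady()) // true: both attempts finished
	tr.setState("c", statePending)          // plugin installed at runtime
	fmt.Println(tr.isInitialPluginsReady()) // true: "c" is not in the locked set
}
```

Because the set is frozen on the first call, a plugin installed later can be pending for any length of time without affecting the result.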
Changes Made
Code Implementation
- `LocalReadinessSnapshot` structure separating initial/runtime plugin states
- `initialPluginSet` locking mechanism (thread-safe with `sync.RWMutex`)
- `isInitialPluginsReady()` function for atomic readiness determination
- `lockInitialPlugins()` one-time locking at first startup
- `getInitialPluginSet()` atomic read-only access
Documentation Updates
Configuration
- `PLUGIN_LOCAL_MAX_RETRY_COUNT` (default: 5, was hardcoded 15)
Performance Impact
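A capped retry count bounds how long a failing plugin can hold up startup. The following is a minimal sketch of that idea, not the daemon's implementation: `maxRetryCount`, `startWithRetry`, and `errStartFailed` are hypothetical names, and only the `PLUGIN_LOCAL_MAX_RETRY_COUNT` variable comes from this PR. The fallback is passed in as a parameter, and the value 5 used in `main` is taken from the configuration note above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// errStartFailed is a placeholder error for this sketch.
var errStartFailed = errors.New("plugin failed to start")

// maxRetryCount parses the raw setting and returns fallback when the
// value is empty, not a number, or less than 1.
func maxRetryCount(raw string, fallback int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < 1 {
		return fallback
	}
	return n
}

// startWithRetry calls start until it succeeds or maxRetries attempts
// have been made. It returns the number of attempts and the last error.
func startWithRetry(maxRetries int, start func() error) (int, error) {
	if maxRetries < 1 {
		return 0, errors.New("maxRetries must be at least 1")
	}
	var err error
	for attempt := 1; attempt <= maxRetries; attempt++ {
		if err = start(); err == nil {
			return attempt, nil
		}
	}
	return maxRetries, err
}

func main() {
	limit := maxRetryCount(os.Getenv("PLUGIN_LOCAL_MAX_RETRY_COUNT"), 5)
	attempts, err := startWithRetry(limit, func() error { return errStartFailed })
	fmt.Println(attempts, err)
}
```

The sketch retries immediately; a real launcher would presumably wait between attempts, so the worst-case startup delay scales with the retry limit.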
API Response Format
New `/ready/check` endpoint returns:
- `InitialPluginsReady` and `RuntimePluginsLoading` fields
Backward Compatibility
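To illustrate how a readiness probe can be added beside the existing health probe without changing it, here is a minimal sketch using only the Go standard library rather than the daemon's own router. The `readinessSnapshot` type, the JSON key names, the integer type of `RuntimePluginsLoading`, and the handler names are all assumptions; only the two paths and the two field names come from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readinessSnapshot carries the two fields named in the PR.
// The JSON keys and the int type are assumptions for this sketch.
type readinessSnapshot struct {
	InitialPluginsReady   bool `json:"initial_plugins_ready"`
	RuntimePluginsLoading int  `json:"runtime_plugins_loading"`
}

// readinessStatus maps a snapshot to an HTTP status code. Only the
// initial plugin set decides readiness; runtime plugins are informational.
func readinessStatus(s readinessSnapshot) int {
	if s.InitialPluginsReady {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// readyCheckHandler serves the snapshot returned by the given function.
func readyCheckHandler(snapshot func() readinessSnapshot) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		s := snapshot()
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(readinessStatus(s))
		_ = json.NewEncoder(w).Encode(s)
	}
}

// healthCheckHandler stands in for the existing, unchanged liveness endpoint.
func healthCheckHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/health/check", healthCheckHandler)
	mux.Handle("/ready/check", readyCheckHandler(func() readinessSnapshot {
		// Two runtime plugins are still loading, yet the pod stays ready.
		return readinessSnapshot{InitialPluginsReady: true, RuntimePluginsLoading: 2}
	}))

	// Exercise the route in-process instead of opening a port.
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/ready/check", nil))
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the status decision in a small pure function makes it easy to check that a non-zero `RuntimePluginsLoading` never produces a 503.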
✅ Fully backward compatible
- `/health/check` endpoint unchanged
Changes
- `internal/core/control_panel/readiness.go`: Initial plugin set locking mechanism
- `README.md`: Updated Health Endpoints documentation
- `TECHNICAL_PLAN.md`: Complete technical specification
- `COMMUNITY_ISSUE.md`: Issue template with scenarios
- `INITIAL_PLUGIN_SET_LOCKING_STRATEGY.md`: Implementation guide
Type of Change
Essential Checklist
Testing
Bug Fix (if applicable)
Fixes #598
Additional Information
#598