-
Notifications
You must be signed in to change notification settings - Fork 277
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Self Checks
To make sure we get to you in time, please check the following :)
- I have searched for existing issues search for existing issues, including closed ones.
- I confirm that I am using English to submit this report (我已阅读并同意 Language Policy).
- [FOR CHINESE USERS] 请务必使用英文提交 Issue,否则会被关闭。谢谢!:)
- "Please do not modify this template :) and fill in all the required fields."
Versions
- dify-plugin-daemon Version: 0.5.1
- dify-api Version: 1.11.1
Describe the bug
In Kubernetes environments, the plugin daemon has a race condition during startup:
- HTTP service starts immediately (0s) → readiness probe gets 200 ✅
- Plugins start asynchronously in background (1-600+ seconds) ⏳
- K8s Gateway detects ready status (~5s) → starts forwarding traffic 🚨
- Plugins still initializing or startup fails (~30-600s) → requests return errors ❌
Root Causes
| # | Issue | Impact |
|---|---|---|
| 1 | No plugin state awareness | readiness probe only checks HTTP service, not plugin readiness |
| 2 | Time desynchronization | HTTP fast (1-5s), plugins slow (30-600s), async startup |
| 3 | Poor observability | Cannot see plugin startup progress or failure reasons |
| 4 | Hardcoded retry limit | 15 retries fixed in code, startup takes 600+ seconds |
Current Behavior
0s : HTTP server running ✅
readiness probe → 200 OK (but plugins not ready!)
~5s : K8s Gateway traffic begins flowing
Requests fail if plugins still loading
~30-600s: Plugins finally ready (or failed)
Too late! Traffic already flowing
Business Constraints
- Private cloud deployment - no manual intervention possible
- Partial availability acceptable - service should start even if some plugins fail
- Fast startup required - K8s gateway needs sufficient time to detect true ready state
To Reproduce
Steps to reproduce the behavior in Kubernetes:
- Deploy dify-plugin-daemon to Kubernetes with current readiness probe pointing to
/health/check - Observe initial readiness probe response time (returns 200 immediately)
- Wait for plugins to fully load (10-600+ seconds depending on plugin count/type)
- Monitor traffic during this window:
- Traffic arrives before plugins fully loaded
- Requests fail with "plugin not ready" or timeout errors
- Check logs to see retry attempts are hardcoded at 15 times
Expected behavior
- ✅ readiness probe returns 503 while plugins are still starting
- ✅ readiness probe returns 200 only after all plugin startup attempts complete (success or failure after max retries)
- ✅ Configurable retry limit instead of hardcoded 15
- ✅ Visible plugin startup state and error information
- ✅ Startup time reduced from 600s to ~225s (63% improvement)
Screenshots
The Kubernetes pod health check has passed, and the pod has started accepting API requests forwarded by the gateway.

However, the plugin in local mode is still starting up, causing the plugin to be called.
Additional context
Dify is deployed via Kubernetes.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working