BUG: Plugin Daemon Signals Readiness Before Plugins Complete Startup in Kubernetes Deployments #598

@NieRonghua

Description

Self Checks

To make sure we get to you in time, please check the following :)

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please submit issues in English, or they will be closed. Thank you! :)
  • "Please do not modify this template :) and fill in all the required fields."

Versions

  1. dify-plugin-daemon Version: 0.5.1
  2. dify-api Version: 1.11.1

Describe the bug

In Kubernetes environments, the plugin daemon has a race condition during startup:

  1. HTTP service starts immediately (0s) → readiness probe gets 200 ✅
  2. Plugins start asynchronously in background (1-600+ seconds) ⏳
  3. K8s Gateway detects ready status (~5s) → starts forwarding traffic 🚨
  4. Plugins still initializing or startup fails (~30-600s) → requests return errors ❌

Root Causes

| # | Issue | Impact |
|---|-------|--------|
| 1 | No plugin state awareness | Readiness probe only checks the HTTP service, not plugin readiness |
| 2 | Time desynchronization | HTTP starts fast (1-5s); plugins start slowly (30-600s) and asynchronously |
| 3 | Poor observability | Cannot see plugin startup progress or failure reasons |
| 4 | Hardcoded retry limit | 15 retries is fixed in code, while startup can take 600+ seconds |

Current Behavior

0s      : HTTP server running ✅
         readiness probe → 200 OK (but plugins not ready!)

~5s     : K8s Gateway traffic begins flowing
         Requests fail if plugins still loading

~30-600s: Plugins finally ready (or failed)
         Too late! Traffic already flowing
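
The fix the timeline implies can be sketched as gating the readiness endpoint on a startup-complete flag. This is a minimal illustration, not the daemon's actual code: the names `startupDone`, `readinessStatus`, and the `/health/ready` path are assumptions.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// startupDone flips to true once every plugin has either started
// successfully or exhausted its retries. Hypothetical name; the
// daemon's real state tracking will differ.
var startupDone atomic.Bool

// readinessStatus returns the HTTP status the readiness probe should see:
// 503 while plugin startup attempts are still in flight, 200 afterwards.
func readinessStatus() int {
	if !startupDone.Load() {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// readinessHandler exposes readinessStatus over HTTP for the probe.
func readinessHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(readinessStatus())
}

func main() {
	http.HandleFunc("/health/ready", readinessHandler)
	// http.ListenAndServe(":5003", nil) // serving omitted so the sketch runs standalone
}
```

With this gate, the K8s Gateway would only start forwarding traffic after the flag flips, instead of at the ~5s mark in the timeline above.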

Business Constraints

  • Private cloud deployment - no manual intervention possible
  • Partial availability acceptable - service should start even if some plugins fail
  • Fast startup required - K8s gateway needs sufficient time to detect true ready state

To Reproduce

Steps to reproduce the behavior in Kubernetes:

  1. Deploy dify-plugin-daemon to Kubernetes with current readiness probe pointing to /health/check
  2. Observe initial readiness probe response time (returns 200 immediately)
  3. Wait for plugins to fully load (10-600+ seconds depending on plugin count/type)
  4. Monitor traffic during this window:
    • Traffic arrives before plugins fully loaded
    • Requests fail with "plugin not ready" or timeout errors
  5. Check logs to see retry attempts are hardcoded at 15 times

Expected behavior

  • ✅ readiness probe returns 503 while plugins are still starting
  • ✅ readiness probe returns 200 only after all plugin startup attempts complete (success or failure after max retries)
  • ✅ Configurable retry limit instead of hardcoded 15
  • ✅ Visible plugin startup state and error information
  • ✅ Startup time reduced from 600s to ~225s (63% improvement)

Screenshots
The Kubernetes pod health check has passed, and the pod has started accepting API requests forwarded by the gateway.

[Screenshot: pod readiness probe passing]

However, plugins running in local mode are still starting up, so requests that invoke them fail.

[Screenshot: plugin invocation failing during startup]

Additional context
Dify is deployed via Kubernetes.

Labels: bug (Something isn't working)