test: retry live RBAC per-server access #5482
Open
lucarlig wants to merge 3 commits into
Signed-off-by: lucarlig <luca.carlig@ibm.com>
The dataplane publisher interval was a hardcoded 60s module constant, so runtime-created users and servers stayed invisible to the dataplane for up to a full minute. Expose it as DATAPLANE_PUBLISHER_INTERVAL_SECONDS (default 60, unchanged) and derive the UserConfig key TTL as interval + 10, preserving the current 60/70 relationship. Functional test stacks can now run a short interval while load benchmarks keep the default. Signed-off-by: lucarlig <luca.carlig@ibm.com>
A 5 x 100ms retry window cannot cover the dataplane publisher snapshot cycle (default 60s), so the allow-path tests still failed on the split stack. Switch the helper to a deadline-based wait that covers one full publish interval plus slack by default (75s), tunable via MCP_E2E_PUBLISHER_SYNC_DEADLINE for stacks that run a short DATAPLANE_PUBLISHER_INTERVAL_SECONDS. Signed-off-by: lucarlig <luca.carlig@ibm.com>
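The numbers in the commit messages above can be checked with a short sketch. It is illustrative only: `user_config_ttl` and `retry_coverage` are hypothetical names, not functions from the repository, and the coverage figure assumes convergence latency is uniformly distributed over one publish interval, as the Root Cause section of this PR describes.

```python
def user_config_ttl(interval_seconds: int, slack_seconds: int = 10) -> int:
    """Derive the UserConfig key TTL as publish interval plus fixed slack.

    Hypothetical helper; the PR only states the relationship "interval + 10".
    """
    if interval_seconds < 1:  # mirrors the ge=1 constraint on the new setting
        raise ValueError("interval must be at least 1 second")
    return interval_seconds + slack_seconds


def retry_coverage(window_seconds: float, interval_seconds: float) -> float:
    """Fraction of convergence latencies that a retry window can absorb.

    Assumes latency is uniform over [0, interval_seconds].
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return min(1.0, max(0.0, window_seconds) / interval_seconds)


if __name__ == "__main__":
    print(user_config_ttl(60))  # default interval keeps the 60/70 relationship
    for window in (0.5, 30, 75):  # 5 x 100ms, hand-widened 30s, new default
        print(f"{window}s window covers {retry_coverage(window, 60):.1%}")
```

Under that assumption a roughly 0.5s window covers under 1% of cases, a 30s window covers half, and only a window of at least one full interval covers every case, which is consistent with the failures reported in this PR.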
Verified the updated approach against a live split control-plane/dataplane stack (
One note for anyone running the split stack: with a short publish interval the dataplane's per-subject config cache ( |
Bug-fix PR
Summary
The live RBAC per-server MCP allow-path tests fail when they exercise a freshly-created user/server before ContextForge has published dataplane config to Redis. This PR fixes that at both ends:
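- the publisher interval is now configurable (DATAPLANE_PUBLISHER_INTERVAL_SECONDS, default 60s, unchanged), so functional test stacks can run a short interval;
- the allow-path test helper waits until a deadline instead of using a fixed retry count (tunable via MCP_E2E_PUBLISHER_SYNC_DEADLINE).
Reproduction Steps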
Run the live RBAC MCP transport tests against the split control-plane/dataplane stack. The failures were seen in:
- test_public_token_accesses_public_server
- test_team_member_accesses_team_server
Both tests create fixtures, then immediately call the per-server MCP endpoint.
Root Cause
The dataplane publisher is a periodic full-snapshot loop with a hardcoded 60s interval (REDIS_PUBLISHER_TIME in mcpgateway/services/dataplane_publisher.py); there is no event-driven publish. A user/server created between snapshots is invisible to the dataplane until the next cycle, so convergence latency is uniform 0-60s. A short fixed retry (the previous revision of this PR used 5 x 100ms) cannot cover that window: verified against a live split stack, the tests still failed 3 runs out of 3, and a hand-widened 30s retry still missed when the fixture ran just after a publish cycle.
Fix Description
- mcpgateway/config.py: new dataplane_publisher_interval_seconds setting (default 60, ge=1).
- mcpgateway/services/dataplane_publisher.py: interval read from settings; UserConfig key TTL derived as interval + 10, preserving the existing 60/70 relationship at the default.
- .env.example: document the new variable.
- tests/live_gateway/mcp/test_mcp_rbac_transport.py: allow-path helper now retries until a deadline (default 75s = one default publish interval + slack; 1s between attempts) instead of a fixed attempt count, and re-raises the original client error on timeout. MCP_E2E_PUBLISHER_SYNC_DEADLINE lets stacks with a short publisher interval keep the wait proportionally short.
Default-config deployments are behaviorally unchanged.
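The deadline-based wait described for the test helper could look roughly like the sketch below. This is not the helper from the PR: `call_until_deadline` and `sync_deadline_seconds` are hypothetical names, the environment variable is assumed to hold a number of seconds, and retrying on any `Exception` is a placeholder for whatever client error the real test catches.

```python
import os
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")

# One default 60s publish interval plus slack, as stated in the PR.
DEFAULT_DEADLINE_SECONDS = 75.0


def sync_deadline_seconds() -> float:
    """Read the deadline override (assumed to be seconds), else the default."""
    raw = os.environ.get("MCP_E2E_PUBLISHER_SYNC_DEADLINE")
    return float(raw) if raw else DEFAULT_DEADLINE_SECONDS


def call_until_deadline(
    call: Callable[[], T],
    deadline_seconds: float,
    poll_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Retry `call` until it succeeds or the deadline passes.

    On timeout the last error from `call` is re-raised unchanged, so the
    test report shows the original client failure instead of a generic
    timeout.
    """
    deadline = time.monotonic() + deadline_seconds
    while True:
        try:
            return call()
        except retry_on:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise  # bare raise keeps the original exception and traceback
            # Never sleep past the deadline.
            time.sleep(min(poll_seconds, remaining))
```

A test would wrap the per-server MCP call, for example `call_until_deadline(do_request, sync_deadline_seconds())`, where `do_request` is whatever callable issues the request. Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments during a long wait.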
Reviewability
triage
Verification
- uv run ruff check mcpgateway/config.py mcpgateway/services/dataplane_publisher.py tests/live_gateway/mcp/test_mcp_rbac_transport.py
- uv run ruff format --check (same files)
- uv run pytest tests/unit/mcpgateway/services/test_dataplane_publisher.py
- uv run pytest tests/live_gateway/mcp/test_mcp_rbac_transport.py::TestMcpPerServerEndpoint --collect-only -q
- DATAPLANE_PUBLISHER_INTERVAL_SECONDS=2
MCP Compliance (if relevant)
Checklist
make black isort pre-commit)