Skip to content

fix(tracing) [1/2]: re-enable HTTP keepalive in SGPAsyncTracingProcessor via lazy per-loop client#341

Closed
mohammadatallah-scale wants to merge 1 commit into
mainfrom
mohammad/ove-2a-keepalive-async-client
Closed

fix(tracing) [1/2]: re-enable HTTP keepalive in SGPAsyncTracingProcessor via lazy per-loop client#341
mohammadatallah-scale wants to merge 1 commit into
mainfrom
mohammad/ove-2a-keepalive-async-client

Conversation

@mohammadatallah-scale
Copy link
Copy Markdown

Part 1 of 2 for OVE-2 (https://linear.app/scale-epd/issue/OVE-2). Followed by a separate PR adding batched flushing on top of this branch.

Bug

SGPAsyncTracingProcessor previously built AsyncSGPClient in __init__ with httpx.Limits(max_keepalive_connections=0) to dodge "bound to a different event loop" errors that surface under sync-ACP / per-request event loops. Disabling keepalive paid a fresh TCP + TLS handshake on every span event.

Fix

Move client construction out of __init__ and lazy-init it on the running loop on first use. Track the loop the processor-owned client was constructed on (_client_owned_at_loop); recreate the client when that loop is no longer current. With the client always bound to the loop using it, httpx defaults can stay, so connection pooling and keepalive work normally.

Behavior preserved

  • Per-event upsert_batch(items=[one]) is unchanged. Batching is intentionally not part of this PR so it can be reviewed independently in part 2.
  • Externally injected clients (processor.sgp_async_client = X) are never replaced; _client_owned_at_loop stays None for those.
  • The 6 existing memory-leak / span-lifecycle tests pass unchanged.

New tests

  • test_client_constructed_without_disabling_keepaliveAsyncSGPClient is no longer constructed with a custom http_client that disables keepalive.
  • test_owned_client_recreated_after_loop_swap — owned client is replaced when the running loop changes.
  • test_injected_client_preserved — externally assigned client is never auto-replaced.

Test plan

  • pytest tests/lib/core/tracing/processors/test_sgp_tracing_processor.py (9/9 pass)
  • pytest tests/lib/core/tracing/ (16/16 pass)
  • ruff check clean
  • ruff format --check clean
  • python -c "import agentex" succeeds
  • CI passes

Related

…a lazy per-loop client

The previous implementation built AsyncSGPClient with
`httpx.Limits(max_keepalive_connections=0)` to dodge "bound to a
different event loop" errors that surfaced under sync-ACP / per-request
event loops. Disabling keepalive paid a fresh TCP+TLS handshake on
every span event.

This change moves client construction out of __init__ and lazy-inits it
on the running event loop on first use, with automatic recreation when
the loop changes (tracked via `_client_owned_at_loop`). With the
client always bound to the loop using it, httpx defaults can be left
alone, so connection pooling and keepalive work normally.

Behavior preserved:
- Per-event upsert_batch(items=[one]) is unchanged in this PR. Batching
  is intentionally left for a follow-up so it can be reviewed
  independently.
- Externally injected clients (`processor.sgp_async_client = X`) are
  always respected and never replaced; `_client_owned_at_loop` stays
  None for those.
- Tests for memory-leak / span-lifecycle behavior pass unchanged.

New tests:
- test_client_constructed_without_disabling_keepalive — verifies
  AsyncSGPClient is no longer constructed with a custom http_client
  that disables keepalive.
- test_owned_client_recreated_after_loop_swap — verifies that an
  owned client is replaced when the running loop changes.
- test_injected_client_preserved — verifies that an externally
  assigned client is never auto-replaced.

This is part 1 of the OVE-2 split. Part 2 (batching) builds on this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant