
Emit scheduler.executor_events_duration per executor #68152

Open
dkranchii wants to merge 1 commit into apache:main from dkranchii:metrics-process-executor-events-timer

@dkranchii
Contributor

Summary

Wrap _process_executor_events() in a per-executor stats.timer named scheduler.executor_events_duration, tagged with the executor's class name. Multi-executor deployments can then attribute the event-processing cost of each loop iteration to the executor that incurred it, rather than seeing it only as part of the aggregate scheduler.scheduler_loop_duration.
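A minimal sketch of the wrapping pattern that runs outside Airflow. `RecordingStats`, `process_executor_events`, and the two empty executor classes are hypothetical stand-ins. They are not Airflow's stats API or the real `SchedulerJobRunner._process_executor_events()`. Only the metric name and the `tags={"executor": type(executor).__name__}` shape come from this PR.

```python
import time
from contextlib import contextmanager


class RecordingStats:
    """Hypothetical stand-in for a stats object: records (name, tags, seconds)."""

    def __init__(self):
        self.samples = []

    @contextmanager
    def timer(self, name, tags=None):
        start = time.perf_counter()
        try:
            yield
        finally:
            # This stub records on exit, including when the wrapped block raises.
            self.samples.append((name, dict(tags or {}), time.perf_counter() - start))


class LocalExecutor:  # hypothetical executor class
    pass


class CeleryExecutor:  # hypothetical executor class
    pass


def process_executor_events(executor):
    """Hypothetical stand-in for the scheduler's per-executor event processing."""
    return 0


def run_loop_iteration(stats, executors):
    for executor in executors:
        # One timer per executor, tagged with the executor's class name.
        with stats.timer(
            "scheduler.executor_events_duration",
            tags={"executor": type(executor).__name__},
        ):
            process_executor_events(executor)


stats = RecordingStats()
run_loop_iteration(stats, [LocalExecutor(), CeleryExecutor()])
print([(name, tags["executor"]) for name, tags, _ in stats.samples])
# [('scheduler.executor_events_duration', 'LocalExecutor'), ('scheduler.executor_events_duration', 'CeleryExecutor')]
```

Each executor gets its own `with` block, so a slow executor's duration lands in its own tagged sample. It is not folded into a single aggregate.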

This mirrors the precedent set by #66808, which added scheduler.executor_heartbeat_duration for executor.heartbeat(). With both timers reported side by side, operators can tell which stage of the scheduler loop (heartbeat or event processing) a given executor is slowing down.
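The sketch below illustrates how the paired timers help localize a slowdown. It is a hypothetical operator-side aggregation over exported timer samples. The sample values are made up for illustration and are not measurements. `slowest_stage` is not part of Airflow, and in practice this grouping would usually be done in the metrics backend.

```python
from collections import defaultdict
from statistics import mean

# Illustrative (metric, executor tag, seconds) samples; not real measurements.
SAMPLES = [
    ("scheduler.executor_heartbeat_duration", "LocalExecutor", 0.02),
    ("scheduler.executor_events_duration", "LocalExecutor", 0.01),
    ("scheduler.executor_heartbeat_duration", "CeleryExecutor", 0.03),
    ("scheduler.executor_events_duration", "CeleryExecutor", 0.40),
    ("scheduler.executor_events_duration", "CeleryExecutor", 0.50),
]


def slowest_stage(samples):
    """Return ((executor, metric), mean_seconds) for the slowest executor/stage pair."""
    buckets = defaultdict(list)
    for metric, executor, seconds in samples:
        buckets[(executor, metric)].append(seconds)
    means = {key: mean(values) for key, values in buckets.items()}
    worst = max(means, key=means.get)
    return worst, means[worst]


# With these illustrative values, CeleryExecutor's event processing stands out.
print(slowest_stage(SAMPLES))
```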

Why

Today, when the scheduler loop runs long in a multi-executor deployment, operators can see the aggregate via scheduler.scheduler_loop_duration but cannot tell which executor's event processing is to blame. A per-executor timer around _process_executor_events provides the same granular signal that the heartbeat timer added for executor.heartbeat(). The change is additive, the runtime change is confined to a single file, and the cost is negligible when metrics are disabled.
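The low-overhead claim depends on the timer doing essentially nothing when metrics are off. The sketch below shows that idea with a null-object backend built on `contextlib.nullcontext`. `NullStats` and `process_events` are hypothetical, and this does not reproduce Airflow's actual disabled-metrics implementation.

```python
from contextlib import nullcontext


class NullStats:
    """Hypothetical metrics-disabled backend: every timer is a no-op context manager."""

    def timer(self, name, tags=None):
        # nullcontext() does nothing on enter or exit, so the wrapped block runs unchanged.
        return nullcontext()


def process_events(events):
    """Hypothetical stand-in for the wrapped work; returns how many events it handled."""
    return len(events)


def timed_process(stats, events):
    with stats.timer("scheduler.executor_events_duration", tags={"executor": "LocalExecutor"}):
        return process_events(events)


print(timed_process(NullStats(), ["a", "b", "c"]))  # 3
```

The wrapped call returns the same result with or without the timer, which is the "no behavioral impact" property in practice.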

Changes

  • airflow-core/src/airflow/jobs/scheduler_job_runner.py — wrap the per-executor _process_executor_events() call in stats.timer("scheduler.executor_events_duration", tags={"executor": type(executor).__name__}).
  • shared/observability/src/airflow_shared/observability/metrics/metrics_template.yaml — declare the new timer in the metrics registry.
  • airflow-core/tests/unit/jobs/test_scheduler_job.py — add test_process_executor_events_emits_timer, mirroring the existing test_executor_heartbeat_emits_timer structure.

Test plan

  • New unit test test_process_executor_events_emits_timer asserts the timer is emitted once per executor with the expected tag.
  • Existing test_executor_heartbeat_emits_timer still passes (sibling test, same loop).
  • ruff format / ruff check clean.
  • prek run --from-ref main --stage pre-commit and --stage manual clean (excluding host-only mypy/breeze hooks; CI will run them).
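The sketch below approximates what the new test checks, using `unittest.mock.MagicMock` as the stats object. `process_all_executor_events`, `test_events_timer_sketch`, and the executor classes are hypothetical stand-ins. The PR's actual `test_process_executor_events_emits_timer` exercises the real scheduler loop and may be structured differently.

```python
from unittest import mock


class LocalExecutor:  # hypothetical executor class
    pass


class CeleryExecutor:  # hypothetical executor class
    pass


def process_all_executor_events(stats, executors, process_events):
    """Hypothetical loop body: time each executor's event processing separately."""
    for executor in executors:
        with stats.timer(
            "scheduler.executor_events_duration",
            tags={"executor": type(executor).__name__},
        ):
            process_events(executor)


def test_events_timer_sketch():
    stats = mock.MagicMock()  # MagicMock supports the context-manager protocol
    process_events = mock.MagicMock()
    executors = [LocalExecutor(), CeleryExecutor()]

    process_all_executor_events(stats, executors, process_events)

    # Exactly one timer per executor, each tagged with that executor's class name.
    assert stats.timer.call_args_list == [
        mock.call("scheduler.executor_events_duration", tags={"executor": "LocalExecutor"}),
        mock.call("scheduler.executor_events_duration", tags={"executor": "CeleryExecutor"}),
    ]
    # The wrapped work still ran once per executor.
    assert process_events.call_count == len(executors)


test_events_timer_sketch()
```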

Notes

  • Additive change — no behavioral impact when metrics are disabled.
  • No newsfragment, per CLAUDE.md: a new optional metric is not a major or breaking change.
  • No config or schema changes.

Was generative AI tooling used to co-author this PR?
  • Yes — Cursor

@boring-cyborg bot added the area:Scheduler label ("including HA (high availability) scheduler") on Jun 7, 2026