fix(google): extract Cloud Logging labels from AF3 log path when task_instance is missing in supervisor context #68246
Open
goingforstudying-ctrl wants to merge 1 commit into
In Airflow 3 the supervisor process runs REMOTE_TASK_LOG handlers, but record.task_instance is never set in supervisor context (it is a task-subprocess concept). When task_instance is missing, the proc() closure shipped log entries with empty labels, making Cloud Logging entries unsearchable by dag_id / task_id.

Parse dag_id, task_id, and try_number from the structured AF3 log path (dag_id=<x>/run_id=<x>/task_id=<x>/attempt=<N>.log) instead. This requires zero DB access and works regardless of whether the handler runs in a task subprocess or the supervisor.

relates to apache#68240
Ran into this while debugging a Cloud Logging setup on GKE: log entries were landing in Stackdriver with empty labels, making it impossible to filter by dag_id or task_id.

Turns out the StackdriverRemoteLogIO.processors proc() closure reads record.task_instance to populate labels, but in AF3's supervisor model the REMOTE_TASK_LOG handler runs in the supervisor process, where that attribute is never set. So every log entry from the supervisor just gets empty labels.

This grabs dag_id, task_id, and try_number from the log path instead. AF3's log path template is dag_id=<x>/run_id=<x>/task_id=<x>/attempt=<N>.log, so all four fields are already in the path with zero DB access needed.

The fallback only kicks in when task_instance is genuinely missing, so the task-subprocess code path (where task_instance is available) is untouched.

Not sure if run_id should also be turned into a label here. I left it out for now since the existing label set doesn't include it and the read-side filtering (bug 2) will need its own fix anyway. Happy to add it if maintainers think it belongs.

relates to #68240
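
For anyone skimming without opening the diff, here is a rough, standalone sketch of the fallback idea described above. It is not the code in this PR: the helper names (labels_from_log_path, build_labels), the regex, and the label keys are all hypothetical, and it assumes the log path follows exactly the dag_id=<x>/run_id=<x>/task_id=<x>/attempt=<N>.log layout quoted above. Any path with a different shape simply yields empty labels.

```python
import re
from typing import Any, Optional

# Hypothetical pattern for the AF3 log path layout quoted in this PR:
#   dag_id=<x>/run_id=<x>/task_id=<x>/attempt=<N>.log
_AF3_LOG_PATH_RE = re.compile(
    r"dag_id=(?P<dag_id>[^/]+)"
    r"/run_id=(?P<run_id>[^/]+)"
    r"/task_id=(?P<task_id>[^/]+)"
    r"/attempt=(?P<try_number>\d+)\.log$"
)


def labels_from_log_path(path: str) -> dict[str, str]:
    """Return dag_id / task_id / try_number parsed from the path, or {} if it does not match."""
    match = _AF3_LOG_PATH_RE.search(path)
    if match is None:
        return {}
    # run_id is captured by the pattern but deliberately left out of the labels,
    # mirroring the open question in the description.
    return {
        "dag_id": match["dag_id"],
        "task_id": match["task_id"],
        "try_number": match["try_number"],
    }


def build_labels(task_instance: Optional[Any], log_path: str) -> dict[str, str]:
    """Prefer task_instance when it is set; fall back to the path otherwise."""
    if task_instance is not None:
        # Task-subprocess case: behaviour is unchanged, labels come from the TI.
        return {
            "dag_id": task_instance.dag_id,
            "task_id": task_instance.task_id,
            "try_number": str(task_instance.try_number),
        }
    # Supervisor case: no task_instance on the record, so parse the path instead.
    return labels_from_log_path(log_path)
```

The point of keeping the parse in its own small function is that it needs nothing but the path string, so it behaves the same in the supervisor and in a task subprocess.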