[airflow] Add lint rule to show error for removed context variables in airflow #15144

Open
wants to merge 1 commit into
base: main
from

sunank200
@sunank200 sunank200 commented Dec 26, 2024

Summary

Airflow 3.0 removes the following deprecated Airflow context variables:

conf
execution_date
next_ds
next_ds_nodash
next_execution_date
prev_ds
prev_ds_nodash
prev_execution_date
prev_execution_date_success
tomorrow_ds
yesterday_ds
yesterday_ds_nodash

They were deprecated in 2.x, but their removal causes incompatibilities that we want to detect.

related: #44409, #41641

Test Plan

A test fixture is included in the PR.

Contributor

github-actions bot commented Dec 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

from airflow.decorators import task

@task
def print_config(**context):
Contributor

There are some more things we should check

  1. Named keyword arguments (e.g. def print_config(execution_date))
  2. Getting the context with get_current_context instead of function arguments.
  3. context in an operator’s execute.

These can be added in separate PRs instead after this is merged.

Author

I have added logic for the other ways of accessing context values as well. They are covered by the tests.

Author

For named keyword arguments (e.g. def print_config(execution_date)), I can create a separate PR.

if id.as_str() == "context" {
    if let Some(key) = extract_name_from_slice(slice) {
        const REMOVED_CONTEXT_KEYS: [&str; 11] = [
            "execution_date",

"conf" has also been removed

Contributor

It's tracked in apache/airflow#45212, but yep, we could do it here as well

Contributor

If we want to add conf here, it would be awesome if we could include triggering_dataset_events / triggering_asset_events. If not, I can make a separate PR

Author

I have added the conf removal check. Including triggering_dataset_events / triggering_asset_events can be part of another PR.

Contributor

If conf has been added, could you please update the description as well? Thanks!

@task
def print_config(**context):
    # This should not throw an error as logical_date is part of airflow context.
    logical_date = context["logical_date"]
Contributor

@sunank200 and I discussed this earlier. What we're trying to check is whether there's a variable named context in a function (most commonly seen in TaskFlow and the Python operator) and whether it is accessed like a dict with the keys we want to check. I think it's unlikely that users have something like this outside of the Airflow context, but I would like to know whether there are any concerns.

@MichaReiser @uranusjr
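To make the name-based check described above concrete, here is a minimal sketch using Python's standard `ast` module. It is not the Rust implementation from this PR; the function name `find_removed_keys_by_name` and the `EXAMPLE` source are hypothetical, and it assumes Python 3.9+, where a subscript's `slice` is the key expression itself.

```python
import ast

# Keys listed in the PR summary as removed in Airflow 3.0.
REMOVED_CONTEXT_KEYS = {
    "conf", "execution_date", "next_ds", "next_ds_nodash",
    "next_execution_date", "prev_ds", "prev_ds_nodash",
    "prev_execution_date", "prev_execution_date_success",
    "tomorrow_ds", "yesterday_ds", "yesterday_ds_nodash",
}


def find_removed_keys_by_name(source: str) -> list[tuple[int, str]]:
    """Return (line, key) for every `context["<removed key>"]` subscript.

    Hypothetical helper: it matches purely on the variable name `context`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == "context"
            # Python 3.9+: the slice is the key expression itself.
            and isinstance(node.slice, ast.Constant)
            and isinstance(node.slice.value, str)
            and node.slice.value in REMOVED_CONTEXT_KEYS
        ):
            hits.append((node.lineno, node.slice.value))
    return sorted(hits)


EXAMPLE = '''
@task
def print_config(**context):
    logical_date = context["logical_date"]
    execution_date = context["execution_date"]

def unrelated():
    context = {"conf": 1}
    return context["conf"]
'''

print(find_removed_keys_by_name(EXAMPLE))
```

The sketch reports both the access inside the task and the one in `unrelated`, which is the kind of false positive a purely name-based match can produce.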

Author

I have added logic for the other ways of accessing context values as well. They are covered by the tests.

Contributor

It’s probably better to detect

  1. Arguments of a function decorated with @task (either ** or simple named arguments). (As a follow-up, any functions called by such a function)
  2. The execute function of a BaseOperator subclass (As a follow-up, any functions called by execute)
  3. The dict returned by get_current_context.

This should be better than detecting by variable name.
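The first item in the list above could look roughly like the following sketch, again written against Python's `ast` module rather than Ruff's Rust internals. The helper names `is_task_decorator` and `check_task_functions` are hypothetical, the decorator is matched by the bare name `task` without resolving its import, and the `execute` and `get_current_context` cases are not covered.

```python
import ast

# Keys listed in the PR summary as removed in Airflow 3.0.
REMOVED_CONTEXT_KEYS = {
    "conf", "execution_date", "next_ds", "next_ds_nodash",
    "next_execution_date", "prev_ds", "prev_ds_nodash",
    "prev_execution_date", "prev_execution_date_success",
    "tomorrow_ds", "yesterday_ds", "yesterday_ds_nodash",
}


def is_task_decorator(dec: ast.expr) -> bool:
    """True for `@task` and `@task(...)`; the import origin is NOT verified."""
    if isinstance(dec, ast.Call):
        dec = dec.func
    return isinstance(dec, ast.Name) and dec.id == "task"


def check_task_functions(source: str) -> list[tuple[str, str]]:
    """Return (function name, removed key) pairs for @task-decorated functions."""
    hits = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        if not any(is_task_decorator(d) for d in func.decorator_list):
            continue
        # Case 1: a removed key requested as a simple named parameter.
        params = func.args.posonlyargs + func.args.args + func.args.kwonlyargs
        for arg in params:
            if arg.arg in REMOVED_CONTEXT_KEYS:
                hits.append((func.name, arg.arg))
        # Case 2: a removed key read from the ** parameter, whatever it is called.
        if func.args.kwarg is not None:
            kwarg_name = func.args.kwarg.arg
            for node in ast.walk(func):
                if (
                    isinstance(node, ast.Subscript)
                    and isinstance(node.value, ast.Name)
                    and node.value.id == kwarg_name
                    and isinstance(node.slice, ast.Constant)
                    and node.slice.value in REMOVED_CONTEXT_KEYS
                ):
                    hits.append((func.name, node.slice.value))
    return sorted(hits)


EXAMPLE = '''
from airflow.decorators import task

@task
def named(execution_date, other):
    return execution_date

@task
def splat(**kw):
    return kw["next_ds"], kw["logical_date"]

def plain(**context):
    return context["conf"]
'''

print(check_task_functions(EXAMPLE))
```

Because the match is tied to the decorated function's own parameters, `plain` is ignored even though its parameter is named `context`, and `splat` is caught even though its parameter is not.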

Contributor

What about python_callable?

Contributor

I don’t think python_callable takes the context though? It only accepts things you provide in self.op_args and self.op_kwargs.

Member

I agree that it'll be useful to guard this check by first verifying that the parameter is coming from a function which is decorated with a @task.

I think this can be done as a pre-check for context variables by using the checker.semantic().current_statements() method to traverse up the AST to find the function definition node and checking whether the function has a @task decorator that originates from the airflow module.

/// Returns an [`Iterator`] over the current statement hierarchy, from the current [`Stmt`]
/// through to any parents.
pub fn current_statements(&self) -> impl Iterator<Item = &'a Stmt> + '_ {
    let id = self.node_id.expect("No current node");
    self.nodes
        .ancestor_ids(id)
        .filter_map(move |id| self.nodes[id].as_statement())
}
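The pre-check suggested here (walk up from the expression to the enclosing function definition, then confirm that its decorator comes from Airflow) can be illustrated in Python. This is only an analogy to `current_statements()`: Python's `ast` has no parent links, so the sketch builds its own parent map. All helper names are hypothetical, and import resolution is limited to `from airflow.decorators import task`, with or without an alias.

```python
import ast


def parent_map(tree: ast.AST) -> dict[ast.AST, ast.AST]:
    """Map every node to its parent, since `ast` nodes carry no parent link."""
    return {
        child: parent
        for parent in ast.walk(tree)
        for child in ast.iter_child_nodes(parent)
    }


def airflow_task_names(tree: ast.AST) -> set[str]:
    """Local names bound to `task` imported from `airflow.decorators`."""
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module == "airflow.decorators":
            for alias in node.names:
                if alias.name == "task":
                    names.add(alias.asname or alias.name)
    return names


def enclosing_task_function(node, parents, task_names):
    """Walk up the ancestors; return the nearest function if it is an Airflow task."""
    while node in parents:
        node = parents[node]
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                target = dec.func if isinstance(dec, ast.Call) else dec
                if isinstance(target, ast.Name) and target.id in task_names:
                    return node
            return None  # The nearest function is not a task; stop here.
    return None


def find_guarded_subscripts(source: str, keys: set[str]) -> list[tuple[int, str]]:
    """Report `<name>["<key>"]` only inside functions decorated with Airflow's task."""
    tree = ast.parse(source)
    parents = parent_map(tree)
    task_names = airflow_task_names(tree)
    hits = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Subscript)
            and isinstance(node.slice, ast.Constant)
            and node.slice.value in keys
            and enclosing_task_function(node, parents, task_names) is not None
        ):
            hits.append((node.lineno, node.slice.value))
    return sorted(hits)


EXAMPLE = '''
from airflow.decorators import task as airflow_task

def task(f):
    return f

@airflow_task
def real(**context):
    return context["execution_date"]

@task
def fake(**context):
    return context["execution_date"]
'''

print(find_guarded_subscripts(EXAMPLE, {"execution_date"}))
```

Only the access inside `real` is reported, because the locally defined `task` decorator on `fake` does not originate from the Airflow import.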

Contributor

> I don’t think python_callable takes the context though? It only accepts things you provide in self.op_args and self.op_kwargs.

I thought we could still get it in the python_callable? https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonoperator

Contributor

Hmm OK I didn’t even realise you can do that… yeah in that case it’s probably a good idea to also detect python_callable arguments.
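One way to pick up the `python_callable` case is to first collect the functions handed to the operator, so that their parameters can then be checked like those of a task function. The sketch below uses Python's `ast` module; `python_callable_names` is a hypothetical helper, the operator is matched by the name `PythonOperator` without resolving its import, and only callables passed as plain names are handled.

```python
import ast


def python_callable_names(source: str) -> set[str]:
    """Names passed as `python_callable=` to a call whose callee is named PythonOperator."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        callee = node.func
        # Accept both `PythonOperator(...)` and `module.PythonOperator(...)`.
        if isinstance(callee, ast.Name):
            callee_name = callee.id
        else:
            callee_name = getattr(callee, "attr", None)
        if callee_name != "PythonOperator":
            continue
        for kw in node.keywords:
            # Lambdas and attribute references are skipped in this sketch.
            if kw.arg == "python_callable" and isinstance(kw.value, ast.Name):
                names.add(kw.value.id)
    return names


EXAMPLE = '''
from airflow.operators.python import PythonOperator

def report(ds, **kwargs):
    return kwargs["execution_date"]

def helper():
    pass

run = PythonOperator(task_id="report", python_callable=report)
'''

print(python_callable_names(EXAMPLE))
```

The returned names identify which function definitions should be inspected for removed context keys; `helper` is left out because it is never passed to the operator.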

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 5c96f89 to 5103ef7 Compare December 27, 2024 04:52
@sunank200 sunank200 requested review from Lee-W and uranusjr December 27, 2024 04:53

if let Expr::Subscript(ExprSubscript { value, slice, .. }) = expr {
    if let Expr::Name(ExprName { id, .. }) = &**value {
        if id.as_str() == "context" {
Member

I think we'd also need to check where the context variable originates from. Otherwise, this will raise a violation on every variable that's named "context" and uses a similar access pattern.

Member

i.e., we need to make sure that the context variable is defined as a parameter of a function that's decorated with @task.

@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Dec 30, 2024
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch 2 times, most recently from 6da5dd9 to d580a4b Compare January 2, 2025 08:00
add lint rule to show error for removed context variables in airflow
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from d580a4b to c0a34d3 Compare January 2, 2025 08:03
pub(crate) fn removed_context_variable(checker: &mut Checker, expr: &Expr) {
    const REMOVED_CONTEXT_KEYS: [&str; 12] = [
        "conf",
        "execution_date",

For execution_date there is actually a replacement - in the docs: https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html#deprecated-variables - can you add this?

(Same for next_execution_date, prev_execution_date)
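A diagnostic that names a replacement could be driven by a small lookup table, as in the Python sketch below. Both `REPLACEMENTS` and `removal_message` are hypothetical names. Only the `execution_date` to `logical_date` pair is filled in; the replacements for `next_execution_date` and `prev_execution_date` are not stated in this thread, so they are left empty here and would need to be taken from the linked documentation.

```python
from typing import Optional

# Hypothetical table: a removed key mapped to its documented replacement.
# None means "no replacement filled in by this sketch", not "no replacement exists".
REPLACEMENTS: dict[str, Optional[str]] = {
    "execution_date": "logical_date",
    "next_execution_date": None,
    "prev_execution_date": None,
}


def removal_message(key: str) -> str:
    """Build a diagnostic message, adding a suggestion when one is known."""
    base = f"`{key}` is removed in Airflow 3.0"
    replacement = REPLACEMENTS.get(key)
    if replacement:
        return f"{base}; use `{replacement}` instead"
    return base


print(removal_message("execution_date"))
print(removal_message("conf"))
```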

Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
6 participants