[Docs] [Tests] Example for using dag defaults when declaring multiple dags within the same file #295

Open
cmarteepants opened this issue Nov 20, 2024 · 0 comments
Labels
enhancement New feature or request priority/medium

Comments

@cmarteepants
Collaborator

We should document that DAG defaults are supported when declaring multiple DAGs in the same file. With this approach, users can both extend the defaults and override them. It is even possible to generate identical DAGs that differ only in their DAG ID. We should also expand our test cases to cover these scenarios.

For example, if we define dags like this:

default:
  catchup: false
  default_args:
    start_date: "2024-01-01"
  schedule_interval: "0 0 * * *"
  tasks:
    extract:
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: extract_helper
    load:
      dependencies:
      - transform
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: load_helper
    transform:
      dependencies:
      - extract
      op_kwargs:
        ds_nodash: '{{ds_nodash}}'
      operator: airflow.operators.python.PythonOperator
      python_callable_file: /usr/local/airflow/include/etl_helpers.py
      python_callable_name: transform_helper


business_analytics:
  schedule_interval: "@daily"
  tasks:
    load:
      op_kwargs:
        database_name: BA
        table_name: inventory

data_science:
  tasks:
    load:
      op_kwargs:
        database_name: DS
        table_name: daily_sales

machine_learning:
  tasks:
    load:
      op_kwargs:
        database_name: ML
        table_name: training_data

They will be generated like this:

[Image: the generated DAGs]

Note that the schedule for the business_analytics DAG uses the override value, "@daily", rather than the default "0 0 * * *".
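The extend-and-override behaviour described above can be sketched as a recursive dictionary merge. This is only an illustration of the semantics, not the library's actual implementation: `deep_merge` and `expand_dags` are hypothetical helper names, and the config is a trimmed-down Python dict version of the YAML above.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where `override` wins and nested dicts merge recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: extend the default instead of replacing it.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists: the per-DAG value replaces the default.
            merged[key] = deepcopy(value)
    return merged


def expand_dags(config):
    """Apply the `default` block to every other top-level entry (hypothetical helper)."""
    defaults = config.get("default", {})
    return {
        dag_id: deep_merge(defaults, dag_config or {})
        for dag_id, dag_config in config.items()
        if dag_id != "default"
    }


# Trimmed-down version of the YAML example, written as a plain dict.
config = {
    "default": {
        "catchup": False,
        "schedule_interval": "0 0 * * *",
        "tasks": {
            "load": {
                "operator": "airflow.operators.python.PythonOperator",
                "dependencies": ["transform"],
            },
        },
    },
    "business_analytics": {
        "schedule_interval": "@daily",
        "tasks": {"load": {"op_kwargs": {"database_name": "BA", "table_name": "inventory"}}},
    },
    "data_science": {
        "tasks": {"load": {"op_kwargs": {"database_name": "DS", "table_name": "daily_sales"}}},
    },
}

dags = expand_dags(config)
print(dags["business_analytics"]["schedule_interval"])  # override value
print(dags["data_science"]["schedule_interval"])  # inherited default
print(dags["data_science"]["tasks"]["load"])  # default task extended with op_kwargs
```

Merging recursively, rather than replacing whole top-level keys, is what lets a DAG add `op_kwargs` to the `load` task without having to restate its operator and dependencies.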

I'm not sure how realistic this scenario is, but generating the same DAG under different IDs is also supported, like this:

...
default:
  ...
business_analytics: {}

data_science: {}

machine_learning: {}
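A test for this scenario could assert that every generated DAG carries the same configuration and differs only in its ID. The sketch below shows the shape of such a check on plain dicts. It does not call the library, and `defaults` stands in for whatever the elided `default:` block contains.

```python
from copy import deepcopy

# Placeholder for the elided `default:` block; the values are illustrative only.
defaults = {
    "catchup": False,
    "schedule_interval": "0 0 * * *",
    "tasks": {"extract": {"operator": "airflow.operators.python.PythonOperator"}},
}

# Each DAG is declared with an empty mapping, so nothing overrides the defaults.
dag_ids = ["business_analytics", "data_science", "machine_learning"]
dags = {dag_id: deepcopy(defaults) for dag_id in dag_ids}

# Same configuration under three different IDs ...
assert all(dag_config == defaults for dag_config in dags.values())

# ... but each DAG owns an independent copy, so changing one leaves the others intact.
dags["data_science"]["tasks"]["extract"]["retries"] = 3
assert "retries" not in dags["business_analytics"]["tasks"]["extract"]
assert "retries" not in defaults["tasks"]["extract"]
```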

Let's update our README and make sure we have test cases that cover this.
