[Feature] customization hook for `load_yaml_dags` #299

wearpants · 2024-11-24T23:11:17Z

Description

Currently, file finding/reading is tightly coupled with DAG building in `load_yaml_dags`. I propose looser coupling, or a hook that allows customizing the DAG via Python code instead of translating the YAML directly.

`DagFactory.__init__` can take either a YAML file path or a Python dictionary. However, `load_yaml_dags` only passes file paths to `DagFactory`. It'd be nice to have a way to hook into it, pre-process the YAML file, and pass a modified dict to `DagFactory`.

Basically, I see 3 parts here:

1. given a list of directories and a recursive flag, find some files
2. process those files (maybe in context with each other, e.g. [Feature] Shared defaults for load_yaml_dags #297) and generate dicts
3. pass the dicts to `DagFactory` to build DAGs
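
As a rough illustration of the three parts above, here is a minimal sketch in which each step is a separate function and the hook sits between parsing and building. All function names (`find_yaml_files`, `load_configs`, `build_dags`, `load_dags`) and the `preprocess` parameter are hypothetical and not part of dag-factory; the parser and the builder are passed in as callables so the sketch does not depend on any particular library.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Optional

Config = Dict[str, Any]
Configs = Dict[Path, Config]


def find_yaml_files(directories: Iterable[str], recursive: bool = True) -> List[Path]:
    """Part 1: collect *.yml / *.yaml files from the given directories."""
    found: List[Path] = []
    for directory in directories:
        root = Path(directory)
        walk = root.rglob if recursive else root.glob
        for pattern in ("*.yml", "*.yaml"):
            found.extend(walk(pattern))
    return sorted(found)


def load_configs(
    files: Iterable[Path],
    parse: Callable[[str], Config],
    preprocess: Optional[Callable[[Configs], Configs]] = None,
) -> Configs:
    """Part 2: parse every file, then let the hook rewrite the dicts.

    The hook receives all configs at once, so it can process files
    in context with each other (for example, to share defaults).
    """
    configs = {path: parse(path.read_text()) for path in files}
    return preprocess(configs) if preprocess is not None else configs


def build_dags(configs: Configs, build: Callable[[Config], None]) -> None:
    """Part 3: hand each (possibly modified) dict to the DAG builder."""
    for config in configs.values():
        build(config)


def load_dags(
    directories: Iterable[str],
    parse: Callable[[str], Config],
    build: Callable[[Config], None],
    preprocess: Optional[Callable[[Configs], Configs]] = None,
    recursive: bool = True,
) -> None:
    """Compose the three parts; each one stays usable on its own."""
    files = find_yaml_files(directories, recursive=recursive)
    build_dags(load_configs(files, parse, preprocess), build)
```

In a real DAG file, `parse` could be `yaml.safe_load`, and `build` could wrap the existing dict-accepting constructor, something like `lambda cfg: DagFactory(config=cfg).generate_dags(globals())`, assuming those are the keyword and method names in the installed version.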

Use case/motivation

I'd like to be able to tweak the "yaml DAG DSL" a bit for my application, instead of directly translating to Airflow DAG semantics. The goal is to make it easier and less verbose for non-technical users, and to add certain features on top of base Airflow semantics without writing new operators and requiring users to understand them.

Basically, I only allow BashOperator (I may eventually swap to DockerOperator), and I treat dag-factory as analogous to a Makefile. I'd like to minimize the boilerplate required, as well as add additional task parameters (to specify which Bash runtime environment to use, i.e., setting PATH/PYTHONPATH, etc.).
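
To make the use case concrete, here is a hypothetical pre-processing hook that expands a terse, Makefile-like task spec into the more verbose form dag-factory expects. The shorthand keys (`cmd`, `needs`, `runtime`), the `RUNTIMES` table, and its paths are invented for illustration; only `operator`, `bash_command`, `dependencies`, and BashOperator's `env` parameter come from the existing dag-factory and Airflow conventions.

```python
from copy import deepcopy
from typing import Any, Dict

BASH_OPERATOR = "airflow.operators.bash.BashOperator"

# Hypothetical named runtime environments; the names and paths are made up.
RUNTIMES: Dict[str, Dict[str, str]] = {
    "py311": {"PATH": "/opt/py311/bin:/usr/bin:/bin", "PYTHONPATH": "/srv/app"},
}


def expand_tasks(config: Dict[str, Any]) -> Dict[str, Any]:
    """Expand shorthand tasks into full BashOperator task dicts.

    A task may be a bare command string, or a dict with the hypothetical
    keys `cmd`, `needs` (upstream task names), and `runtime`.
    The input dict is left untouched.
    """
    expanded = deepcopy(config)
    for dag in expanded.values():
        if "tasks" not in dag:  # e.g. a `default` block without tasks
            continue
        tasks = {}
        for name, task in dag["tasks"].items():
            if isinstance(task, str):
                task = {"cmd": task}
            full: Dict[str, Any] = {
                "operator": BASH_OPERATOR,
                "bash_command": task["cmd"],
            }
            if "needs" in task:
                full["dependencies"] = list(task["needs"])
            if "runtime" in task:
                # Passed through to BashOperator as its `env` argument.
                full["env"] = dict(RUNTIMES[task["runtime"]])
            tasks[name] = full
        dag["tasks"] = tasks
    return expanded
```

With a hook like this, a user would only write something like `load: {cmd: "python load.py", needs: [extract], runtime: py311}` and never see the operator path or the environment variables.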

Related issues

#289, #290, #297

Are you willing to submit a PR?

Yes, I am willing to submit a PR!

tatiana · 2024-11-27T10:40:46Z

Hi, @wearpants. You have some great ideas for improving the DAG factory. Would you like to have a call to discuss them next week? Thirty minutes may be good enough to brainstorm.

wearpants · 2024-11-27T16:21:24Z

@tatiana happy to, you can schedule here https://calendly.com/pete-fein/30min

wearpants · 2024-12-14T04:48:26Z

To follow up from our call: the kind of DSL I am imagining would allow me to abstract away/hide Airflow-specific constructs. dag-factory feels like it could be the basis for a generic DAG specification portable between different orchestrators, or even transpiled to a Makefile or casey/just… For my purposes, the less my semi-technical users need to understand Airflow, the better.

wearpants added the enhancement (New feature or request) and triage-needed labels · Nov 24, 2024
