[Feature] customization hook for load_yaml_dags #299

Open
1 task done
wearpants opened this issue Nov 24, 2024 · 3 comments
Labels
enhancement New feature or request triage-needed

Comments

@wearpants

wearpants commented Nov 24, 2024

Description

Currently, finding and reading files is tightly coupled with DAG building in load_yaml_dags. I propose loosening that coupling, or adding a hook, so the DAG can be customized via Python code instead of being translated directly from the YAML.

DagFactory.__init__ can take either a YAML file path or a Python dictionary. However, load_yaml_dags only passes file paths to DagFactory. It'd be nice to have a way to hook into it, pre-process each YAML file, and pass a modified dict to DagFactory.

Basically, I see 3 parts here:

  1. given a list of directories and a recursive flag, find the YAML files
  2. process those files (possibly in context with each other, i.e. #297, "Shared defaults for load_yaml_dags") and generate dicts
  3. pass the dicts to DagFactory to build DAGs
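
To make the three parts concrete, here is a minimal sketch of how they could be separated. It is only an illustration under assumptions: every function name below (`find_yaml_files`, `read_yaml`, `load_configs`, `build_dags`) and the `preprocess` hook signature are hypothetical and not part of dag-factory today.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, Iterator

Config = dict[str, Any]


def find_yaml_files(folders: Iterable[str], recursive: bool = True) -> Iterator[Path]:
    """Step 1: yield every .yml/.yaml file under the given folders."""
    for folder in folders:
        root = Path(folder)
        candidates = root.rglob("*") if recursive else root.glob("*")
        for path in sorted(candidates):
            if path.is_file() and path.suffix in {".yml", ".yaml"}:
                yield path


def read_yaml(path: Path) -> Config:
    """Default reader. Needs PyYAML, imported lazily so the rest stays stdlib-only."""
    import yaml

    with path.open() as handle:
        return yaml.safe_load(handle) or {}


def load_configs(
    paths: Iterable[Path],
    preprocess: Callable[[Config, list[Config]], Config] = lambda cfg, _all: cfg,
    read: Callable[[Path], Config] = read_yaml,
) -> list[Config]:
    """Step 2: read every file first, then let the (hypothetical) hook
    rewrite each config with all the other configs in view."""
    raw = [read(path) for path in paths]
    return [preprocess(cfg, raw) for cfg in raw]


def build_dags(configs: Iterable[Config], build: Callable[[Config], None]) -> None:
    """Step 3: hand each finished dict to a builder callable."""
    for cfg in configs:
        build(cfg)
```

In a DAG file, `build` would presumably wrap the dict-accepting `DagFactory` constructor mentioned above, e.g. something like `lambda cfg: DagFactory(config=cfg).generate_dags(globals())`; the exact keyword argument should be checked against the installed dag-factory version.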

Use case/motivation

I'd like to be able to tweak the "yaml DAG DSL" a bit for my application instead of translating directly to Airflow DAG semantics. The goal is to make it easier and less verbose for non-technical users, and to add certain features on top of base Airflow semantics without writing new operators and requiring users to understand them.

Basically, I only allow BashOperator (and may eventually swap to DockerOperator), and I treat dag-factory as analogous to a Makefile. I'd like to minimize the boilerplate required and add extra task parameters, e.g. to specify which Bash runtime environment to use (setting PATH/PYTHONPATH, etc.).
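
As a rough illustration of this kind of pre-processing, the sketch below expands a compact, Makefile-like task spec into the more verbose dict shape that dag-factory configs use for BashOperator tasks. The compact keys (`run`, `after`, `runtime`), the `RUNTIMES` table, and the function names are all hypothetical, invented for this example only.

```python
from typing import Any, Union

BASH_OPERATOR = "airflow.operators.bash.BashOperator"

# Hypothetical named runtimes; a real deployment would define its own.
RUNTIMES: dict[str, dict[str, str]] = {
    "default": {"PATH": "/usr/local/bin:/usr/bin:/bin"},
    "analytics": {
        "PATH": "/opt/analytics/bin:/usr/bin:/bin",
        "PYTHONPATH": "/opt/analytics/lib",
    },
}


def expand_task(spec: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Turn a compact task spec into a BashOperator task dict."""
    if isinstance(spec, str):
        # A bare string is shorthand for {"run": "<command>"}.
        spec = {"run": spec}
    task: dict[str, Any] = {
        "operator": BASH_OPERATOR,
        "bash_command": spec["run"],
        # Copy so that later edits to one task never leak into RUNTIMES.
        "env": dict(RUNTIMES[spec.get("runtime", "default")]),
    }
    if spec.get("after"):
        task["dependencies"] = list(spec["after"])
    return task


def expand_dag(compact: dict[str, Any]) -> dict[str, Any]:
    """Expand every task of every DAG; other DAG-level keys pass through unchanged."""
    return {
        dag_name: {
            **{key: value for key, value in dag.items() if key != "tasks"},
            "tasks": {name: expand_task(spec) for name, spec in dag["tasks"].items()},
        }
        for dag_name, dag in compact.items()
    }
```

With a transform like this, a user could write `report: {run: "python report.py", after: [extract], runtime: analytics}` in YAML and never see the operator path or the environment variables.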

Related issues

#289, #290, #297

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!

wearpants added the enhancement and triage-needed labels Nov 24, 2024
@tatiana
Collaborator

tatiana commented Nov 27, 2024

Hi, @wearpants. You have some great ideas for improving the DAG factory. Would you like to have a call to discuss them next week? Thirty minutes may be good enough to brainstorm.

@wearpants
Author

@tatiana happy to, you can schedule here https://calendly.com/pete-fein/30min

@wearpants
Author

wearpants commented Dec 14, 2024

To follow up from our call: the kind of DSL I am imagining would let me abstract away/hide Airflow-specific constructs. dag-factory feels like it could be the basis for a generic DAG specification that is portable between different orchestrators, or even transpiled to a Makefile or casey/just… For my purposes, the less my semi-technical users need to understand Airflow, the better.
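
As a toy illustration of the transpiling idea, a task dict with `bash_command` and `dependencies` keys maps fairly directly onto Makefile rules. This is only a sketch: `to_makefile` is a hypothetical helper, and it ignores scheduling, retries, and everything else an orchestrator adds.

```python
from typing import Any


def to_makefile(tasks: dict[str, dict[str, Any]]) -> str:
    """Render tasks as phony Makefile targets: one rule per task,
    with the task's dependencies as the rule's prerequisites."""
    lines = [".PHONY: " + " ".join(tasks)]
    for name, task in tasks.items():
        deps = " ".join(task.get("dependencies", []))
        lines.append("")
        # rstrip drops the trailing space when a task has no dependencies.
        lines.append(f"{name}: {deps}".rstrip())
        # Make requires recipe lines to start with a tab.
        lines.append("\t" + task["bash_command"])
    return "\n".join(lines) + "\n"
```

Running `make report` on the generated file would then execute `extract` first if `report` lists it as a dependency, which is the same ordering the DAG expresses.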
