-
My general comment is that, IMHO, what we should come up with here is something conceptually very similar to the OTEL and OpenLineage approach :). "Airflow as a platform".
One comment about the usage of the "provider" name. I think we do not need a separate "User Management Provider Package" - at least not in the sense of having a separate provider package. I think the "common" part of the user management could (and should) easily be part of "airflow.user_management" or a similar package in Airflow Core. Simply put - this common part should be the same for everyone, no matter which implementation they choose. This should really be just an API that exposes very little of the actual code - just the integration with the core of Airflow, tapping in where it is needed. No UI, no DB model (except for DAG/Task metadata and linking them to groups/users), no management CLI, just some code that the "concrete" implementation of the "User management" might tap into.
Some more comments:
While there might be a need for "multi-authentication-authorization", only one should be supported at a time. For simplicity - and similarity with Secret Backends - if you need more versatility, you could implement your own authentication backend that will "facade" other backends. It is usually very complex to implement such a "facade" in a generic way, but having users implement it for their specific case will usually work much better. Regarding the authentication/authorisation flow - nothing extraordinary. I think the big difference vs. the current FAB authentication is that there should be a way to map the user to resource access, and it should be generic enough to be implemented by both FAB and others, and to allow for a wide variety in the way this mapping can be performed. It looks like a simple action + resource id should indeed be universal enough to map to the current FAB as well as to more complex cases - with user groups etc. That's all for now :)
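To illustrate the "facade" idea, here is a rough sketch of what a user-written facade over two backends could look like. Everything in it is made up for illustration: AuthBackend, StaticBackend and FacadeBackend are not existing Airflow classes, and routing by the domain part of the user name is just one possible rule a user might pick for their own case.

```python
from typing import Dict, Iterable, Protocol, Tuple


class AuthBackend(Protocol):
    """Hypothetical minimal backend contract: action + resource id + user."""

    def is_authorized(self, action: str, resource_id: str, user: str) -> bool: ...


class StaticBackend:
    """Toy backend holding an explicit set of (user, action, resource_id) grants."""

    def __init__(self, allowed: Iterable[Tuple[str, str, str]]) -> None:
        self._allowed = set(allowed)

    def is_authorized(self, action: str, resource_id: str, user: str) -> bool:
        return (user, action, resource_id) in self._allowed


class FacadeBackend:
    """User-written facade: picks one backend per request, here by user domain."""

    def __init__(self, routes: Dict[str, AuthBackend], default: AuthBackend) -> None:
        self._routes = routes
        self._default = default

    def is_authorized(self, action: str, resource_id: str, user: str) -> bool:
        # "alice@corp.example" -> "corp.example"; names without "@" use the default.
        domain = user.rpartition("@")[2]
        backend = self._routes.get(domain, self._default)
        return backend.is_authorized(action, resource_id, user)


corp = StaticBackend({("alice@corp.example", "can_read", "dag_a")})
local = StaticBackend({("bob", "can_edit", "dag_b")})
facade = FacadeBackend({"corp.example": corp}, default=local)
```

Airflow would still see a single configured backend (the facade); the routing rule stays entirely in user code.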
-
I think OpenLineage is a bit of a different story, because it is far more complex and has far more "common" code that will handle extractors and some intrinsic internal behaviours, for example parsing SQL in the common code (@mobuchowski @julienledem ?). It is very likely that bugs will be found and fixed and new features will be added to the OpenLineage common code, so having a separate provider package, as is being implemented in #29940, allows for a faster release cycle and for adding new features even to already released Airflow versions. So splitting OpenLineage into a part of the core that is a minimal "integration" API with Airflow and a separate provider that implements a lot of the common code used by it makes perfect sense. In contrast, the user management integration will be very simple (as you mentioned, just a few methods), and even if we decide to change it, releasing those changes in the next Airflow version is perfectly fine, so I see no need to extract the common code into a separate package.
Yeah!
Very good question. I think FAB should really act as legacy implementation and KeyCloak might be our reference implementation. There are two reasons for that.
Also, the KeyCloak implementation does not have to be done by the AWS team. I am quite sure that once the interface is set up and a task is created to implement the KeyCloak integration on top of it, someone will volunteer to do it (and I know a PMC member who has done it in the past in a less "pluggable" way, and I can ask for mentor support there ;) ).
Absolutely. There will probably be other providers that will not have Operators/Hooks but only Auth. No problem with it whatsoever.
-
Thank you @shubham22 for answering while I was away. As explained, I think the word "provider" was misused here. What I basically intended was to create all the base classes in core Airflow, with any provider-specific implementation going in its respective provider package. I totally agree about the confusion and will update this. I was thinking of creating a new API and deprecating this one. Does this sound good to you, @potiuk?
-
AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-56+Extensible+user+management
-
TL;DR. This discussion is a follow-up to a discussion started on the dev mailing list regarding multi-tenancy. It was suggested that, instead of adding new features to the user management part of Airflow (such as tenants), we extract this part from core Airflow and move it to something new: the user management provider. This discussion compiles my thoughts. I apologize in advance for the length, but I could not find a way to make it shorter. Also, this discussion/document is not complete and some sections are missing, but I would rather get early feedback than wait until the end. Please feel free to give feedback, to correct me if something is wrong, or to ask questions if something is not clear.
Current situation
Today, user management is part of core Airflow. Users, roles and permissions are stored in the Airflow metastore and managed through Flask-AppBuilder (FAB). Any additional feature in the user management part of Airflow means modifying core Airflow and, more importantly, verifying that it fits everyone's needs, from individuals to teams within enterprises.
Proposal
The proposal is to extract the whole user management part of Airflow out of core Airflow and introduce the user management provider (UMP). The goal of the UMP is to manage all features and resources related to users, roles and permissions. This way you could simply choose between a very minimalist/simple UMP and a more advanced one with a notion of groups/tenants.
Everything under the FAB security manager is extracted out from core Airflow and handled by the UMP.
The base user management provider is an interface each UMP needs to inherit from. This interface defines the common API of a UMP and is the only integration point with core Airflow. In other words, any action related to user management is done through classes inheriting from this interface.
Since it is impossible to forecast what feature/view each UMP is going to offer, the “Security” tab in the nav bar will be configured by each UMP.
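To give an idea of the shape such an interface could take, here is a minimal sketch. The class and method names (BaseUserManagementProvider, get_url_login, get_security_menu_items, AllowAllProvider) are placeholders I made up for illustration and are not part of this proposal yet; only is_authorized is taken from the Authorization API section of this document.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Sequence, Tuple

# An (action, resource) pair, e.g. ("can_read", "Variables").
Permission = Tuple[str, str]


class BaseUserManagementProvider(ABC):
    """Hypothetical base interface; the only integration point with core Airflow."""

    @abstractmethod
    def get_url_login(self) -> str:
        """Return the URL unauthenticated users are redirected to."""

    @abstractmethod
    def is_authorized(
        self, permissions: Sequence[Permission], resource_id: Optional[str] = None
    ) -> bool:
        """Return True if the current user holds all the given permissions."""

    def get_security_menu_items(self) -> List[str]:
        """Entries this UMP adds under the "Security" tab; none by default."""
        return []


class AllowAllProvider(BaseUserManagementProvider):
    """Toy UMP for illustration only: every user may do everything."""

    def get_url_login(self) -> str:
        return "/login"

    def is_authorized(
        self, permissions: Sequence[Permission], resource_id: Optional[str] = None
    ) -> bool:
        return True
```

Core Airflow would only ever call methods declared on the base class, so swapping one UMP for another would not require changes in core.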
UMPs are “pluggable”, meaning you can swap them based on your installation needs. Airflow can only have one UMP configured at a time; this is set by the user_management_provider option in the [core] section of the configuration file (https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html).
Examples
Minimalist FAB provider (backward compatible)
The goal of the FAB provider is to offer a backward-compatible experience to users. Put simply, it moves the FAB security manager out of core Airflow and into the FAB provider. All the different pages are still served through the web server. The “Security” tab is configured to be as it is today. End users should see no difference between before UMP and after UMP when using this provider.
Provider using KeyCloak
Common API
All UMPs have a common API defined in the base user management provider class. You can find in the table below the common API needed from all UMPs.
Authentication flow
The authentication flow allows a user to log in to Airflow. The flow follows the OAuth 2.0 protocol.
To simplify the example diagrams below, we assume the user is not logged in and that authentication on the server side succeeds.
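As a reference point, here is a minimal sketch of the two client-side steps of the OAuth 2.0 authorization code grant: building the redirect to the authorization endpoint, and validating the callback. The function names, the example URLs and the default "openid" scope are assumptions for illustration; exchanging the code for a token is left out.

```python
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(
    authorize_endpoint: str, client_id: str, redirect_uri: str, scope: str = "openid"
) -> Tuple[str, str]:
    """Return (url, state); the state must be kept in the user's session."""
    state = secrets.token_urlsafe(16)  # random value to protect against CSRF
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": scope,
            "state": state,
        }
    )
    return f"{authorize_endpoint}?{query}", state


def validate_callback(callback_url: str, expected_state: str) -> str:
    """Return the authorization code if the state matches, else raise ValueError."""
    params = parse_qs(urlparse(callback_url).query)
    returned_state = params.get("state", [""])[0]
    if not secrets.compare_digest(returned_state, expected_state):
        raise ValueError("state mismatch")
    return params["code"][0]


# Hypothetical endpoints, for illustration only.
url, state = build_authorization_url(
    "https://idp.example.com/authorize",
    client_id="airflow",
    redirect_uri="https://airflow.example.com/callback",
)
```

The returned code would then be exchanged for an access token on the server side, which is the step assumed to succeed in this section.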
FAB provider
FAB provider is different from the other providers. Instead of delegating the login experience to an external service, it includes and defines the login page within the provider. The page is still served through the web server. The goal is to have the login page as it is today.
KeyCloak provider
Authorization API
The is_authorized API is the API each UMP needs to implement to check whether the current user has permission to perform a specific action. Here are some examples of usage:
is_authorized([(permissions.ACTION_CAN_READ, permissions.RESOURCE_VARIABLE)])
is_authorized([(permissions.ACTION_CAN_READ, permissions.RESOURCE_DAG)], "dag_id")
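To show how the two call shapes could be handled, here is a toy, in-memory sketch. It is not the proposed implementation: the grants are passed in explicitly (a real UMP would look up the current user), the constants are local stand-ins whose string values are assumptions, and the rule that a grant without a resource id covers all resources of that type is also an assumption.

```python
from typing import Optional, Sequence, Set, Tuple

# Local stand-ins for the permissions constants; the string values are assumptions.
ACTION_CAN_READ = "can_read"
RESOURCE_VARIABLE = "Variables"
RESOURCE_DAG = "DAGs"

# A grant is (action, resource, resource_id); resource_id None means "all of them".
Grant = Tuple[str, str, Optional[str]]


def is_authorized(
    grants: Set[Grant],
    permissions: Sequence[Tuple[str, str]],
    resource_id: Optional[str] = None,
) -> bool:
    """Return True only if every requested (action, resource) pair is granted."""
    for action, resource in permissions:
        granted_for_all = (action, resource, None) in grants
        granted_for_one = (
            resource_id is not None and (action, resource, resource_id) in grants
        )
        if not (granted_for_all or granted_for_one):
            return False
    return True


# Example user: may read all variables, but only one specific DAG.
grants = {
    (ACTION_CAN_READ, RESOURCE_VARIABLE, None),
    (ACTION_CAN_READ, RESOURCE_DAG, "dag_a"),
}
```

With these grants, reading variables and reading "dag_a" are allowed, while reading any other DAG (or all DAGs) is denied.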
In order to understand how this API is implemented in different UMPs, let’s take the use case of “User clicks on Variables in the Admin menu”.
FAB provider
The is_authorized API in the FAB provider checks if the current user has the specified permissions. The implementation is really close to check_authorization (https://github.com/apache/airflow/blob/main/airflow/www/security.py#L708) in the security manager.
KeyCloak provider
The KeyCloak provider leverages KeyCloak (https://www.keycloak.org/). The whole user management part is delegated to KeyCloak, and admins have to configure roles and permissions in KeyCloak directly. When logging in, users are issued an access token, which is stored in the metastore. KeyCloak uses this access token to determine whether the current user has permission to access a given resource.
TODO: explain and reference the keycloak API to test authz (https://www.keycloak.org/docs/latest/authorization_services/#_service_obtaining_permissions)
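Until this TODO is filled in, here is a tentative sketch of what the authorization check could look like, based on my reading of the KeyCloak documentation linked above: a POST to the realm's token endpoint with the UMA grant type, asking for a plain yes/no decision. The function name, the way Airflow resources and actions map to KeyCloak resources and scopes, and all example values are assumptions; the endpoint path can also differ between KeyCloak versions. The sketch only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Grant type used by KeyCloak authorization services for permission requests.
UMA_GRANT_TYPE = "urn:ietf:params:oauth:grant-type:uma-ticket"


def build_decision_request(
    base_url: str,
    realm: str,
    client_id: str,
    access_token: str,
    resource: str,
    scope: str,
) -> Request:
    """Build (but do not send) a request asking KeyCloak for an authz decision."""
    body = urlencode(
        {
            "grant_type": UMA_GRANT_TYPE,
            "audience": client_id,  # the client acting as resource server
            "permission": f"{resource}#{scope}",  # e.g. "Variables#read"
            "response_mode": "decision",  # ask for a yes/no answer only
        }
    ).encode()
    return Request(
        f"{base_url}/realms/{realm}/protocol/openid-connect/token",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_decision_request(
    "https://keycloak.example.com",
    realm="airflow",
    client_id="airflow-web",
    access_token="user-access-token",
    resource="Variables",
    scope="read",
)
```

An is_authorized implementation in the KeyCloak provider would send one such request per (action, resource) pair and treat a positive decision as "allowed"; the exact response format should be checked against the KeyCloak version in use.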
Additional providers
Here are some examples of additional providers which could be offered in Airflow.
TODO: to be completed
IAM provider
Work in progress
Authentication flow
Authorization API
Appendix