Bring Your Own Component Registry Types #2224
Replies: 4 comments 9 replies
-
Thanks for starting this, @kiersten-stokes. I think I now understand what is meant by Component Registry Types. One of the goals of making things extensible is that folks don't need to modify our framework, and I feel like we should probably do something similar here. Most of the above is still applicable; it's just that we'd likely make the readers entry-point based and, just as the ComponentRegistry schema dynamically determines its ...

I also suspect that some reader implementations will require metadata that is not currently reflected in the component registry schema. If users were to require their own schemas, it seems like they could provide them via a SchemasProvider in the ComponentRegistry schemaspace. This could get a bit hairy, though, since these schemas would need to derive from the existing schema or have a way to reference it in theirs.

Hmm - we could instead build something where the ComponentRegistry Schemaspace instance "validates" its schemas (beyond what the SchemaManager does already - i.e., ensure they are applicable to the schemaspace). This could be the way we enforce that a BYO component registry schema has the requisite categories, runtime, location_type and paths properties defined w/o having to derive from or reference the existing schema.

We'd also need to make the registry instance available to the reader, but it looks like we have the registry instances in our hands when we go to get the reader, so this could be passed to the reader's constructor, for example.

Just some thoughts. Looking forward to seeing this all come about.
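To make the "validates its schemas" idea a bit more concrete, here is a minimal sketch of the kind of check a schemaspace could run on a BYO schema. Everything in it is an assumption for illustration: the function names are hypothetical (not Elyra APIs), and it assumes the instance fields live under `properties.metadata.properties` in the schema.

```python
# Hypothetical sketch: none of these names are existing Elyra APIs.
REQUIRED_PROPERTIES = ("categories", "runtime", "location_type", "paths")


def missing_registry_properties(schema: dict) -> list:
    """Return the required metadata properties that a BYO schema does not define."""
    # Assumed layout: instance fields are declared under properties.metadata.properties.
    metadata_props = (
        schema.get("properties", {})
        .get("metadata", {})
        .get("properties", {})
    )
    return [name for name in REQUIRED_PROPERTIES if name not in metadata_props]


def validate_registry_schema(schema: dict) -> None:
    """Raise ValueError if the schema lacks any property the registry relies on."""
    missing = missing_registry_properties(schema)
    if missing:
        name = schema.get("name", "<unnamed>")
        raise ValueError(f"Schema '{name}' is missing: {', '.join(missing)}")


# A made-up BYO schema that forgot to declare "paths".
byo_schema = {
    "name": "my-catalog",
    "properties": {
        "metadata": {
            "properties": {
                "categories": {"type": "array"},
                "runtime": {"type": "string"},
                "location_type": {"type": "string", "enum": ["MyCatalog"]},
            }
        }
    },
}

print(missing_registry_properties(byo_schema))  # ['paths']
```

A check like this only looks for the presence of the four properties, so the BYO schema would not have to derive from or reference the built-in one.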
-
Based on observations I made while building a first prototype for the Machine Learning Exchange, a few things stood out. I essentially created a sub-class of
-
I got a little lost on today's call and have some clarifying questions around the terminology registry/catalog... If I understand correctly, currently, a user has the option to create a
When a
The goal is to allow a user to create their own
-
Components feel like packages and package management. I know BYO catalog_type isn't something that most package managers have to deal with, but it might be helpful to look at other package managers as a metaphor. I think Carthage is a good example of a decentralized package manager that is popular in iOS development. They have the concept of origins. Origins include
This seems similar to what we are trying to do, but they delegate the resolving of the scheme (

This might be a separate discussion, but there are some other package-manager-y things we might want to think about, like versioning or a distinction between global and project components. I know versioning might not be feasible, but I think project components would be useful. It would be nice to add a component to a pipeline and not have it added to the global palette. It would also be nice to share pipelines with people without them having to manually add all the components that I used in the pipeline.
-
Related issue: #2220
In order to BYO registry location type, the following 3 things must be implemented:

1. Add a `ComponentReader` subclass to the `component.py` file
   - Subclass `UrlComponentReader`, `FilenameComponentReader`, or `ComponentReader`, depending on where the component definition will ultimately be read from (remote URL, local file system, proprietary catalog)
   - Implement the `get_absolute_locations` function
     - For example, `get_absolute_locations()` could receive a list of links to GitHub repos and parse each of those links to get the direct links to all files in that repo (this assumes that all the files in that repo are component definitions)
       - `paths = ['repo1_link', 'repo2_link', ...]`
       - `absolute_paths = ['repo1_component1_link', 'repo1_component2_link', 'repo2_component1_link', ...]`
   - Define a `location_type` variable
   - Define a `resource_type` that is defined as the `location_type` of its superclass (you can use the example here exactly as-is)
2. Add the new reader to `get_reader` (`component_registry.py`) like so: `[YourType]ComponentReader.location_type: [YourType]ComponentReader(file_types)`
   - `component_registry.py`
3. Add the `location_type` variable for your new reader to the component-registry schema (`elyra/metadata/schemas/component-registry.json`)
   - Add it to the `location_type` metadata field enum: `"enum": ["URL", "Filename", "Directory", "YourType"]`
   - The enum value must match the `location_type` variable value once converted to lowercase (in this example, `location_type = "yourtype"`)
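The three steps above can be sketched roughly as follows. This is a self-contained illustration under stated assumptions, not Elyra's implementation: the `ComponentReader` and `UrlComponentReader` classes here are simplified stand-ins, `GithubRepoComponentReader` and the `list_repo_files` callable are hypothetical, and the repo listing is faked so the example runs without network access.

```python
from typing import Callable, Dict, List


class ComponentReader:
    """Simplified stand-in for the base reader (NOT Elyra's actual class)."""
    location_type: str = ""
    resource_type: str = ""

    def __init__(self, file_types: List[str]):
        self.file_types = file_types

    def get_absolute_locations(self, paths: List[str]) -> List[str]:
        raise NotImplementedError


class UrlComponentReader(ComponentReader):
    """Stand-in for a reader whose definitions are fetched from remote URLs."""
    location_type = "url"
    resource_type = "url"

    def get_absolute_locations(self, paths: List[str]) -> List[str]:
        return list(paths)


class GithubRepoComponentReader(UrlComponentReader):
    """Hypothetical BYO reader: each path is a repo link, each file a component."""
    location_type = "githubrepo"
    # Step 1: resource_type is the location_type of the superclass.
    resource_type = UrlComponentReader.location_type

    def __init__(self, file_types: List[str], list_repo_files: Callable[[str], List[str]]):
        super().__init__(file_types)
        # Hypothetical helper that returns direct links to every file in a repo.
        self._list_repo_files = list_repo_files

    def get_absolute_locations(self, paths: List[str]) -> List[str]:
        # Assumes every file in each repo is a component definition.
        absolute_paths: List[str] = []
        for repo_link in paths:
            absolute_paths.extend(self._list_repo_files(repo_link))
        return absolute_paths


def get_reader(location_type: str, file_types: List[str],
               list_repo_files: Callable[[str], List[str]]) -> ComponentReader:
    # Step 2: map each reader's location_type to a reader instance.
    readers: Dict[str, ComponentReader] = {
        UrlComponentReader.location_type: UrlComponentReader(file_types),
        GithubRepoComponentReader.location_type:
            GithubRepoComponentReader(file_types, list_repo_files),
    }
    # Step 3: the schema enum value ("GithubRepo") must match once lowercased.
    return readers[location_type.lower()]


def fake_list_repo_files(repo_link: str) -> List[str]:
    """Fake listing so the sketch runs offline."""
    return [f"{repo_link}/component1.yaml", f"{repo_link}/component2.yaml"]


reader = get_reader("GithubRepo", [".yaml"], fake_list_repo_files)
print(reader.get_absolute_locations(["repo1_link", "repo2_link"]))
```

Injecting the listing function keeps the reader testable; a real reader would presumably call whatever API the catalog exposes.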
In order to process components from these types of registries, both KFP and Airflow rely on some aspect of the value of the `location` attribute on the `Component` object (the value of which is currently determined by the `ComponentReader` class's `resource_type` attribute). The following must be taken into consideration for the existing built-in processors:

- KFP uses `kfp.components.load_component()` to construct the factory function that ultimately will be executed to create a KFP `ContainerOp` object
  - It accepts `url`, `filename`, or `text` values as parameters (see here)
  - Which parameter is used is determined by the `ComponentReader` class's `resource_type` (which then becomes the `location_type` of the `Component` object)
  - We don't currently use the `text='component definition...'` option, but it would be easy enough to implement with minor tweaks by either saving the component definition on the `Component` object or re-parsing a location and its corresponding definition on an as-needed basis
- Airflow imports the operator with `from airflow.operators.{{operation.module_name}} import {{ operation.class_name }}` (where the operator class name is parsed from the `Component` object `id`)
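As a rough illustration of the processor side, the sketch below shows how a component's `location_type` could select the keyword argument passed to `kfp.components.load_component()`, and how the Airflow import line could be built. The `Component` dataclass, its `definition` field, and both helper functions are hypothetical stand-ins; the real template is Jinja-based, and an f-string is used here only to keep the example dependency-free.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Component:
    """Simplified stand-in for the Component object (NOT Elyra's actual class)."""
    id: str
    location_type: str                # "url", "filename", or "text"
    location: str
    definition: Optional[str] = None  # assumed field: definition saved on the object


def load_component_kwargs(component: Component) -> Dict[str, str]:
    """Build the single keyword argument for kfp.components.load_component()."""
    if component.location_type in ("url", "filename"):
        return {component.location_type: component.location}
    if component.location_type == "text":
        # One of the two options discussed: keep the definition on the Component.
        if component.definition is None:
            raise ValueError(f"Component '{component.id}' has no saved definition")
        return {"text": component.definition}
    raise ValueError(f"Unsupported location_type: {component.location_type}")


def airflow_import_statement(module_name: str, class_name: str) -> str:
    """Plain-string equivalent of the Jinja import line used for Airflow."""
    return f"from airflow.operators.{module_name} import {class_name}"


url_component = Component(
    id="example", location_type="url", location="https://example.com/component.yaml"
)
print(load_component_kwargs(url_component))
print(airflow_import_statement("bash_operator", "BashOperator"))

# With the KFP SDK installed, the kwargs would be used roughly like this:
# factory = kfp.components.load_component(**load_component_kwargs(url_component))
```

A BYO reader whose `resource_type` is none of `url`, `filename`, or `text` would hit the final `ValueError`, which is the constraint the KFP bullet describes.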