Is your feature request related to a problem? Please describe.
We have been incorporating Eodag into our EO pipeline for search and discovery as intended. However, it is really frustrating to configure Eodag to search and download data from a specific provider. We do not want it to try to be super smart, never fail, and fall back to different unconfigured providers, as this makes it really hard to debug things like credential issues at one provider. At the moment the only way to configure Eodag is with a yml file or env vars. Neither is suited to dynamic, programmatic use; both suit a CLI / server more.
Importantly, this must be async safe. Downloading large data is naturally an async operation, so the solution cannot rely on single hardcoded conf file names or env variables.
According to the docs we can use env variables to point to custom configs. Not all of these are exposed to the dag init, and not all of them are even used, so the docs are not accurate. The providers_cfg_file var, for example, does not work.
Describe the solution you'd like
I would like to search and download data from only creodias in one coroutine and from only geodes in another
I would like to be able to simply instantiate EODataAccessGateway with all and full configuration dictionaries - all the configurations mentioned in the docs here should be exposed to the API and not overridden again
dag = EODataAccessGateway(
    user_provider_conf={},
    providers_dict={},
    product_types_dict={},
    locations_dict={},
)
Ideally only the user provider conf would be needed - as far as I understand, this is the one with the dynamic variables like auth, while the providers dict represents the external interface. It would be fine to keep a default external interface config, but it must be possible to remove providers from the configuration entirely (not just by deprioritizing them, which causes unexpected behavior).
Describe alternatives you've considered
We have attempted to use priorities for this, but setting the preferred priority has not been reliable (PEPS being a default search provider while also being deprecated is a problem). A system that silently falls back to an unconfigured provider is a bad system.
We tried to prune unconfigured providers with "_prune_providers_list", but that doesn't actually clean up the list, since most providers allow some unauthorized access?
We have tried methods like add_provider and update_providers_config, or simply accessing the providers and deleting them (dag.providers = []). But these changes never stick and Eodag always finds a way to behave unexpectedly.
We set env variables and init the dag in a synchronous method, then use the dag in async methods. This is only safe as long as the search / download etc. methods never check these variables again, which is not guaranteed, since Eodag often seems to want to reconfigure itself by calling load_default_config etc. At the moment we preconfigure yml files per provider and instantiate the class with this provider in a sync method (async dag init fails - ticket), and then later download asynchronously from the dag, hoping that these methods do not attempt to access the environment again.
Additional context
We really thought that Eodag would allow dynamic config but the more we work with the package the more it appears that this has not been fully implemented yet. I guess it makes sense since this page is really limited and does not achieve the full dynamic config promised.
I think this is a really important functionality for the library. At the moment I guess the server and CLI work well enough for data discovery by data scientists, but to operationalize Eodag into full programmatic Python pipelines it needs to be fully configurable in Python without yml files or env variables.
I don't think it will be too hard to do, but it will require some cleanup of methods that are randomly called that depend on these env variables / yml files.
Hello @andretheronsa and many thanks for your feedback. Documentation needs to be updated to make some things more clear, and we should also find some ways to make the configuration easier (thanks for your inputs 👍).
I would like to search and download data from only creodias in one coroutine and from only geodes in another
This can be performed using the provider parameter:
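A minimal sketch of what this could look like from two coroutines. It assumes `dag` is an `EODataAccessGateway` instance, that `search()` is a blocking call accepting a `provider` keyword as described above, and that sharing one gateway between worker threads is acceptable (not confirmed here). The helper names `search_on` and `search_both` are hypothetical.

```python
import asyncio


async def search_on(dag, provider, **criteria):
    """Search a single provider without blocking the event loop.

    dag.search() is assumed to be blocking, so it runs in a worker thread.
    """
    return await asyncio.to_thread(dag.search, provider=provider, **criteria)


async def search_both(dag, **criteria):
    # One coroutine per provider; each search is pinned with provider=...
    creodias, geodes = await asyncio.gather(
        search_on(dag, "creodias", **criteria),
        search_on(dag, "geodes", **criteria),
    )
    return {"creodias": creodias, "geodes": geodes}
```

Usage would be along the lines of `asyncio.run(search_both(dag, productType="S2_MSI_L1C", start="2024-01-01", end="2024-01-31"))`, where the product type and dates are placeholders.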
I would like to be able to simply instantiate EODataAccessGateway with all and full configuration dictionaries
If you want to update provider configuration dynamically you can use update_providers_config method which accepts input configuration as yaml or dict
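As a rough illustration of the dict form, the sketch below builds a credentials-only override and passes it to `update_providers_config`. The keyword name `dict_conf` and the `auth` / `credentials` / `username` / `password` layout are assumptions based on the user configuration file format; some providers expect other credential keys, so check the docs for the provider in question. The helper names are hypothetical.

```python
def credentials_conf(provider, username, password):
    """Build a per-provider override that only carries credentials."""
    return {
        provider: {
            "auth": {
                "credentials": {"username": username, "password": password}
            }
        }
    }


def apply_credentials(dag, provider, username, password):
    # Assumption: the dict input is accepted through a dict_conf keyword.
    conf = credentials_conf(provider, username, password)
    dag.update_providers_config(dict_conf=conf)
    return conf
```

Keeping the dict construction separate from the call makes it easy to log or unit-test the override without touching env variables or yml files.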
We tried to prune unconfigured providers with "_prune_providers_list" but that doesn't actually clean up the list since most providers allow some unauthorized access?
_prune_providers_list only impacts providers that need auth for search
Indeed - I missed the fact that the provider passed to search then propagates to dag.download(search_result). This works exactly as intended.
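For pipelines that want to fail loudly rather than trust the propagation, a small guard can check the provider before downloading. This is only a sketch: it assumes `search()` returns a list-like result (older releases may return a `(results, count)` tuple instead), that each product exposes a `provider` attribute, and that `dag.download(product)` returns the local path. `download_first` is a hypothetical helper.

```python
def download_first(dag, provider, **criteria):
    """Search one provider and download the first hit, or return None."""
    results = dag.search(provider=provider, **criteria)
    if len(results) == 0:
        return None
    product = results[0]
    # Refuse to continue if the product came from a different provider.
    if product.provider != provider:
        raise RuntimeError(
            f"expected provider {provider!r}, got {product.provider!r}"
        )
    return dag.download(product)
```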
I was attempting to force / limit the eventual outcome to one provider by clearing all provider configs except the one I am interested in, but that was not possible. Using search(provider=...) solves this need. I do still think it would be beneficial if the full config (providers, user config, product types and locations) were accepted in the dag init, or at least if there were safe public methods to properly replace the internal configs.
Indeed - the issue is the lack of any other safe prune method based on other criteria, even just pruning by name. Although for my case 1. solves this too.
I'll close the request - unless you think 2. warrants a feature request.