
Full dynamic configuration during EODataAccessGateway init #1522

Closed
andretheronsa opened this issue Feb 18, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@andretheronsa

Is your feature request related to a problem? Please describe.
We have been incorporating Eodag into our EO pipeline for search and discovery, as intended. However, it is really frustrating to configure Eodag to search and download data from one specific provider. We do not want it to try to be clever, never fail, and just fall back to other, unconfigured providers, because this makes it really hard to debug problems such as credential issues at one provider. At the moment the only ways to configure Eodag are a YAML file or environment variables. Neither is suited to dynamic programming; both suit a CLI or server better.

Importantly, this must be async safe: downloading large data is naturally an async operation, so the solution cannot rely on a single hardcoded config file name or on environment variables.
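To make the async-safety concern concrete: `os.environ` is shared by the whole process, so if each coroutine sets `EODAG_CFG_FILE` before building its gateway, a coroutine that yields in between can end up reading another coroutine's value. The sketch below uses only the standard library; `build_gateway` is a hypothetical stand-in for constructing `EODataAccessGateway` that simply reports the config path it sees.

```python
import asyncio
import os


def build_gateway() -> str:
    # Hypothetical stand-in for EODataAccessGateway(): it just reports
    # which config file the process environment points at right now.
    return os.environ["EODAG_CFG_FILE"]


async def configure_and_build(provider: str) -> tuple[str, str]:
    # Each coroutine points the shared environment at its own config file...
    os.environ["EODAG_CFG_FILE"] = f"user_conf_{provider}.yml"
    # ...but any await here lets another coroutine overwrite the variable.
    await asyncio.sleep(0)
    return provider, build_gateway()


async def main() -> list[tuple[str, str]]:
    return await asyncio.gather(
        configure_and_build("creodias"),
        configure_and_build("geodes"),
    )


if __name__ == "__main__":
    for provider, seen in asyncio.run(main()):
        print(f"{provider}: gateway built with {seen}")
```

Both coroutines report `user_conf_geodes.yml`, so the creodias gateway would have been built from the wrong file. This is why per-instance configuration passed in Python is safer than process-wide environment variables.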

According to the docs, we can use environment variables to point to custom configs. However, not all of them are exposed in the gateway's init, and not all of them are even used, so the docs are not accurate. The providers_cfg_file variable, for example, does not work.

Describe the solution you'd like

  1. I would like to search and download data from only creodias in one coroutine and from only geodes in another.
  2. I would like to be able to simply instantiate EODataAccessGateway with complete configuration dictionaries. All the configurations mentioned in the docs here should be exposed to the API and not overridden again:
dag = EODataAccessGateway(  # proposed signature, not the current eodag API
    user_provider_conf={},
    providers_dict={},
    product_types_dict={},
    locations_dict={},
)
  3. Ideally only the user provider conf would be needed: as far as I understand, this is the one holding the dynamic values such as auth, while the providers dict describes the external interface. It would be fine to keep a default external interface config, but it must be possible to remove providers from the configuration entirely (not just deprioritize them, which causes unexpected behavior).
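The pruning asked for in point 3 amounts to keeping the requested providers' entries in a providers mapping and dropping the rest. A minimal sketch over plain dicts, assuming the configuration is keyed by provider name as in eodag's YAML files; `keep_only_providers` is a hypothetical helper, not an eodag function, and the sample entries are placeholders.

```python
from copy import deepcopy
from typing import Any, Iterable


def keep_only_providers(
    providers_conf: dict[str, Any], wanted: Iterable[str]
) -> dict[str, Any]:
    """Return a copy of providers_conf restricted to the wanted provider names.

    Hypothetical helper: raises instead of silently falling back when a
    requested provider is not configured at all.
    """
    wanted = list(wanted)
    missing = [name for name in wanted if name not in providers_conf]
    if missing:
        raise KeyError(f"providers not configured: {missing}")
    # Deep copy so later edits cannot leak back into the shared default config.
    return {name: deepcopy(providers_conf[name]) for name in wanted}


if __name__ == "__main__":
    # Placeholder entries; real eodag provider configs hold many more keys.
    defaults = {
        "peps": {"priority": 1},
        "creodias": {"priority": 0},
        "geodes": {"priority": 0},
    }
    print(keep_only_providers(defaults, ["creodias"]))
```

Failing with `KeyError` on an unknown name is a deliberate choice here: it surfaces a misconfiguration immediately instead of letting another provider answer.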

Describe alternatives you've considered

  1. We have attempted to use priorities for this, but setting a preferred provider has not been reliable (PEPS being a default search provider while PEPS is deprecated is a problem). A system that silently falls back to an unconfigured provider is a bad system.
  2. We tried to prune unconfigured providers with "_prune_providers_list", but that doesn't actually clean up the list, presumably because most providers allow some unauthenticated access.
  3. We have tried methods like add_provider and update_providers_config, or simply accessing the providers and deleting them (dag.providers = []). But these changes never stick, and Eodag always finds a way to behave unexpectedly.
  4. Set environment variables and init the dag in a synchronous method, then use the dag in async methods. This is only safe as long as the search / download etc. methods never check these variables again, which is not guaranteed, since Eodag often seems to want to reconfigure itself by calling load_default_config etc. At the moment we preconfigure YAML files per provider, instantiate the class with this provider in a sync method (async dag init fails; separate ticket), and later download asynchronously from the dag, hoping that these methods do not access the environment again.
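import os
from pathlib import Path

from eodag import EODataAccessGateway

DIR = Path(__file__).parent  # assumed location of the per-provider YAML files

class Download:
    def __init__(self, provider_name: str):
        self.provider_name = provider_name
        # Point eodag at this provider's config files before building the gateway
        os.environ["EODAG_CFG_FILE"] = str(DIR / "eodag" / f"user_conf_{self.provider_name}.yml")
        os.environ["EODAG_PROVIDERS_CFG_FILE"] = str(DIR / "eodag" / f"providers_{self.provider_name}.yml")
        os.environ["EODAG_EXT_PRODUCT_TYPES_CFG_FILE"] = str(DIR / "eodag" / "product_types.yml")
        self.dag = EODataAccessGateway()

    async def download(self):
        # search() and download() are blocking calls, even inside this coroutine
        search_result = self.dag.search()
        self.dag.download(search_result[0])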
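Whichever way the gateway ends up configured, a cheap guard against the silent fallback criticised above is to check which provider the results actually came from before downloading. A sketch assuming each search result item exposes a `provider` attribute, as eodag's `EOProduct` does; `assert_single_provider` and `UnexpectedProviderError` are hypothetical names, and the demo uses `SimpleNamespace` stand-ins rather than real products.

```python
from types import SimpleNamespace
from typing import Any, Iterable


class UnexpectedProviderError(RuntimeError):
    """Raised when results come from a provider other than the one requested."""


def assert_single_provider(products: Iterable[Any], expected: str) -> list[Any]:
    """Return the products as a list, failing loudly on any foreign provider."""
    products = list(products)
    foreign = sorted({p.provider for p in products if p.provider != expected})
    if foreign:
        raise UnexpectedProviderError(
            f"expected only {expected!r}, got results from {foreign}"
        )
    return products


if __name__ == "__main__":
    fake_results = [SimpleNamespace(provider="creodias", id="A")]
    print(len(assert_single_provider(fake_results, "creodias")))
```

In the `download` method above, such a check would sit between the `search` and `download` calls, turning a silent fallback into an explicit error.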

Additional context
We really thought that Eodag would allow dynamic config but the more we work with the package the more it appears that this has not been fully implemented yet. I guess it makes sense since this page is really limited and does not achieve the full dynamic config promised.

I think this is a really important functionality for the library. At the moment, I guess the server and CLI work well enough for data discovery by data scientists, but to operationalize Eodag in fully programmatic Python pipelines, it needs to be fully configurable in Python, without YAML files or environment variables.

I don't think it will be too hard to do, but it will require some cleanup of methods that are randomly called that depend on these env variables / yml files.

andretheronsa added the enhancement (New feature or request) label on Feb 18, 2025
@sbrunato
Collaborator

Hello @andretheronsa and many thanks for your feedback. Documentation needs to be updated to make some things more clear, and we should also find some ways to make the configuration easier (thanks for your inputs 👍).

  1. I would like to search and download data from only creodias in one coroutine and from only geodes in another

This can be performed using the provider parameter:

results = dag.search(provider="creodias", **other_criteria)

See API User guide / search (not very visible in the documentation...)
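Because `search` and `download` are ordinary blocking calls, one way to combine the `provider` parameter with the per-coroutine requirement from the issue is to push each call onto a worker thread with `asyncio.to_thread` (Python 3.9+). The sketch takes the gateway as a parameter so it can be exercised with a fake; `FakeGateway` is illustrative only and the `productType` value is a placeholder. Whether a single `EODataAccessGateway` can safely be shared across threads is not established in this thread, so one gateway per coroutine may be the safer choice.

```python
import asyncio
from typing import Any, Optional


async def search_and_download(gateway: Any, provider: str, **criteria: Any) -> Optional[Any]:
    """Search one named provider and download its first result, off the event loop."""
    # search() is blocking, so run it in a worker thread.
    results = await asyncio.to_thread(gateway.search, provider=provider, **criteria)
    if len(results) == 0:
        return None
    # The product remembers its provider, so download() takes no provider argument.
    return await asyncio.to_thread(gateway.download, results[0])


class FakeGateway:
    """Illustrative stand-in exposing the same two method names as the gateway."""

    def search(self, provider: str, **criteria: Any) -> list[str]:
        return [f"{provider}-product"]

    def download(self, product: str) -> str:
        return f"/tmp/{product}.zip"


async def main() -> list[Optional[Any]]:
    gateway = FakeGateway()
    # creodias and geodes are queried from separate coroutines, concurrently.
    return await asyncio.gather(
        search_and_download(gateway, "creodias", productType="S2_MSI_L1C"),
        search_and_download(gateway, "geodes", productType="S2_MSI_L1C"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With the fake, this prints one download path per provider; with a real gateway, each coroutine only ever talks to the provider it named.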


  2. I would like to be able to simply instantiate EODataAccessGateway with all and full configuration dictionaries

If you want to update provider configuration dynamically, you can use the update_providers_config method, which accepts input configuration as YAML or as a dict.
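A sketch of building credentials in Python and handing them to `update_providers_config` as a dict rather than a YAML file. It assumes the method takes the dict through a `dict_conf` keyword and that credentials nest under `auth` and then `credentials`, as in eodag's user configuration file; both should be checked against the installed eodag version. `credentials_conf` and `apply_credentials` are hypothetical helpers, and `RecordingGateway` only stands in for the real gateway.

```python
from typing import Any


def credentials_conf(provider: str, username: str, password: str) -> dict[str, Any]:
    """Build a per-provider override; the auth/credentials nesting is an assumption."""
    return {provider: {"auth": {"credentials": {"username": username, "password": password}}}}


def apply_credentials(gateway: Any, provider: str, username: str, password: str) -> None:
    # Assumed keyword: update_providers_config(dict_conf=...) accepting a plain dict.
    gateway.update_providers_config(dict_conf=credentials_conf(provider, username, password))


class RecordingGateway:
    """Stand-in that records what would have been passed to the real gateway."""

    def __init__(self) -> None:
        self.calls: list[Any] = []

    def update_providers_config(self, yaml_conf: Any = None, dict_conf: Any = None) -> None:
        self.calls.append(dict_conf)


if __name__ == "__main__":
    gw = RecordingGateway()
    # Values would normally come from a secrets store, not literals.
    apply_credentials(gw, "creodias", "me@example.com", "not-a-real-password")
    print(gw.calls[0])
```

Keeping credentials in an in-memory dict per gateway avoids both the shared environment and per-provider YAML files on disk.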


  2. We tried to prune unconfigured providers with "_prune_providers_list" but that doesn't actually clean up the list since most providers allow some unauthorized access?

_prune_providers_list only impacts providers that need auth for search.

@andretheronsa
Author

Thanks for the quick response @sbrunato!

  1. Indeed, I missed the fact that the provider passed to search then propagates to dag.download(search_result). This works exactly as intended.
  2. I was attempting to force the eventual outcome onto one provider by removing all provider configs except the one I am interested in, but that was not possible. Using search(provider=...) solves this need. I still think it would be beneficial if the full config (providers, user config, product types and locations) were accepted in the dag's __init__, or at least if there were safe public methods to properly replace the internal configs.
  3. Indeed, the issue is the lack of any other safe prune method, based on other criteria or even just by provider name. For my case, though, 1. solves this too.

I'll close the request, unless you think 2. warrants a feature request.
