Add "OpenEO-Identifier" header to synchronous processing response #533

Open: wants to merge 1 commit into base branch draft

soxofaan (Member) commented May 2, 2024

For accounting purposes, we want detailed tracking of processing costs. For batch jobs, it's trivial to use the batch job id to associate processing costs. For synchronous requests we use internal "request ids", but those are not exposed to the end user in a standard way.

This PR adds an "OpenEO-Identifier" header to the synchronous processing response, much like the one returned by the "create batch job" endpoint.
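
A minimal client-side sketch of what this would enable, assuming the header is optional and that the response headers are available as a plain mapping. The helper name `extract_openeo_identifier` and the example id value are hypothetical, not part of the openEO API or any client library:

```python
from typing import Mapping, Optional

# Header name as proposed in this PR.
HEADER_NAME = "OpenEO-Identifier"


def extract_openeo_identifier(headers: Mapping[str, str]) -> Optional[str]:
    """Return the identifier from response headers, or None if it is absent.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    wanted = HEADER_NAME.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            # Treat an empty header value the same as a missing header.
            return value.strip() or None
    return None


# Hypothetical headers of a synchronous processing response.
sync_headers = {"Content-Type": "image/tiff", "openeo-identifier": "r-abc123"}
request_id = extract_openeo_identifier(sync_headers)
```

Because the header would be optional, a client has to handle `None` and cannot rely on the id for anything beyond correlation, for example when quoting it in a support or billing question.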

m-mohr (Member) left a comment

I'm not sure about this. Synchronous processing was meant to be somewhat stateless, but providing an identifier here would somewhat imply state. The other thing is that in the API the identifier would have no further purpose right now, so I'm somewhat hesitant to add it to the spec quite yet.

You can still implement it if you want, but I'm not convinced yet that this should be part of the core specification (especially as I'm currently trying to slim down the core spec). If there were broader demand, I'd reconsider it.

soxofaan (Member Author) commented Oct 8, 2024

Purely at the API surface, synchronous processing is indeed stateless, but in reality, on a real back-end, you are consuming real resources, which involves accounting or credit spending. You are probably triggering logging/events at the back-end, which need some kind of correlation id anyway. And with concepts like export_workspace you could even trigger non-ephemeral storage of your result.

I understand you want to slim down the core spec, but this would be a simple, optional response header that just aligns sync and batch jobs.
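
A rough back-end sketch of the correlation-id idea, assuming the internal request id is simply reused as the header value. `handle_sync_request`, `SyncResponse`, the `r-` prefix, and the credit value are all hypothetical placeholders, not taken from any real openEO back-end:

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SyncResponse:
    """Hypothetical stand-in for an HTTP response object."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


def handle_sync_request(
    process_graph: dict, usage_log: List[Tuple[str, float]]
) -> SyncResponse:
    # One id per request, shared by accounting and the response header.
    request_id = f"r-{uuid.uuid4().hex}"

    # Placeholder for the actual processing and cost calculation.
    result, credits = b"<result bytes>", 1.5

    # Accounting entry keyed by the same id the user will see.
    usage_log.append((request_id, credits))

    return SyncResponse(body=result, headers={"OpenEO-Identifier": request_id})
```

The API stays stateless in this sketch: the id is never accepted as input by any endpoint, it only lets a user and the back-end refer to the same accounting or log entry.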

m-mohr (Member) commented Oct 12, 2024

From there, it's not far to adding GET /results/{OpenEO-Identifier} to retrieve results (again) from sync processing (and GET /results/{OpenEO-Identifier}/logs etc.).

On the other hand, OGC is currently discussing adding synchronous processing to the /jobs endpoints (see opengeospatial/ogcapi-processes#437).

So I'm really hesitant to end up with /results and /jobs doing very similar things once alignment happens...

(And lastly, there seems to be something like Content-ID defined for HTTP somewhere, which feels preferable to our proprietary OpenEO-Identifier in the long term [2.0+].)

soxofaan (Member Author) commented

I'm not sure I get what you mean. On the one hand you say it makes sense to align synchronous and batch more (or at least that there are forces that want to align things), but then you don't want this alignment?

I don't think alignment efforts (like this minor request) will lead to ugly redundant definitions in the openEO API, as there is enough structural difference between sync and batch.

soxofaan (Member Author) commented

> there seems to be something like Content-ID defined for HTTP somewhere, which feels preferable to our proprietary OpenEO-Identifier in the long term [2.0+]

That's an interesting pointer indeed.

m-mohr (Member) commented Oct 14, 2024

> On the one hand you say it makes sense to align synchronous and batch more

I think I didn't say that.

> or at least that there are forces that want to align things

That's right, but on a more general level. There is an effort to align OGC API - Processes job handling and openEO job handling, but that's pretty messy and the outcome is unknown at this point.

What I'm trying to say: there are some unknowns right now that make it a little difficult to judge this. We might add it and then change it again shortly after, which would not be ideal. And generally I'm a bit concerned that sync jobs and batch jobs move so close together that we end up with two similar parallel instances. (OGC API - Processes, for example, seems to want to run synchronous processes via /jobs.)

m-mohr (Member) commented Nov 7, 2024

I'd like to migrate to something standardized such as Content-Identifier (need to check the RFC for the name). At the same time, I'd like to deprecate OpenEO-Identifier. With the migration, I'm happy to add it to /results as well. Then we would just follow HTTP/REST semantics. Thoughts?

soxofaan (Member Author) commented Nov 7, 2024

I'm all for integrating with wider standards.
However, from what I've (superficially) found about Content-Identifier/Content-ID, I'm still a bit confused: it seems to be something email-centric, and some sources even seem to claim that the id should look like an email address. I hope I just went down the wrong rabbit hole here.

m-mohr (Member) commented Nov 7, 2024

Yeah, I still need to get confirmation from OGC how/why this was chosen. I also only find e-mail related RFCs.
