-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Add FailureLocation support to SageMaker Runtime #7905
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @gael-ft! Thank you for submitting a PR for this.
My main concern with having a new endpoint is that it involves a breaking change for all current users, who are using the regular /endpoint-results
endpoint.
How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not? Something like:
expected_results = {
"account_id": "123456789012", # This is the default - can be omitted
"region": "us-east-1", # This is the default - can be omitted
"results": [
{
"Body": "first body",
"ContentType": "text/xml",
"InvokedProductionVariant": "prod",
"CustomAttributes": "my_attr",
"_async_failure": True, # False by default - ignored for sync results
},
# other results as required
],
}
As far as I can see, that would give 3 advantages:
- Existing users will not have to change anything
- All users can easily swap between sync and async results (or even use both)
- And it simplifies the required implementation
Hi @bblommers, thanks for reviewing my proposal !
I first thought I could do something similar. {
"Body": "first body",
"ContentType": "text/xml",
"InvokedProductionVariant": "prod",
"CustomAttributes": "my_attr",
}, Follows SageMaker real-time inference response 👍 As described above, I can consider my async inference output to be: {
"bbox": "..."
"text": "..."
} And I am not expecting (in real world or moto) any To be honest, I considered as well doing some kind of condition to say:
But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference. Given that statement above, I thought it was cleaner to have a second endpoint, to avoid having to set fields (because we expect them to be present in current Moto code) that will finally be completly discarded. (Talking about |
Ah, right, I understand - thank you for the explanation @gael-ft.
Me neither, based on your explanation - but if there's one thing I've learned as a maintainer, is that people will always find a way to So, second proposal: we add the async endpoint, but leave the current implementation as a fallback. People can/should use the proper async endpoint as it's supposed to be used, but people who somehow got the sync-endpoint to work can also still use it. As a very rough draft:
Later on we can always phase out the old/existing implementation. Does that work for you? |
Sounds good to me ! |
…ponses # Conflicts: # moto/moto_api/_internal/urls.py
Hello @bblommers, Added fallback to "sync" results queue, as discussed, and tested accordingly. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #7905 +/- ##
=======================================
Coverage 94.45% 94.45%
=======================================
Files 1141 1141
Lines 98184 98203 +19
=======================================
+ Hits 92737 92762 +25
+ Misses 5447 5441 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great - thank you for implementing this @gael-ft!
This is now part of moto >= 5.0.14.dev66 |
Not sure if that's a feature or a bugfix actually.
SageMaker
invoke_endpoint_async
is supposed to return docs:This makes code hard to test when we assume this field to be returned by AWS endpoint, and making it optional because cannot be properly mocked in tests does not look like a good direction.
Doing so, I added a specific endpoint as well to Moto backend.
Idea is that we were not really able to customize the (async) inference output that is stored in S3.
Previously, the output schema was forced to something like:
That's the expected format for the sync inference response, but should not be forced on async endpoint IMHO.
For example, if I expect my endpoint to produce and output like:
Well we have to hack the test flow to make it work