Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Add FailureLocation support to SageMaker Runtime #7905

Merged
merged 5 commits into from
Sep 7, 2024

Conversation

gael-ft
Copy link
Contributor

@gael-ft gael-ft commented Jul 30, 2024

Not sure if that's a feature or a bugfix actually.

SageMaker invoke_endpoint_async is supposed to return docs:

  • InferenceId
  • OutputLocation
  • FailureLocation <-- this one is currently missing in Moto implementation

This makes code hard to test when we assume this field to be returned by AWS endpoint, and making it optional because cannot be properly mocked in tests does not look like a good direction.


Doing so, I added a specific endpoint as well to Moto backend.
Idea is that we were not really able to customize the (async) inference output that is stored in S3.
Previously, the output schema was forced to something like:

{
    "Body": body,
    "ContentType": _type,
    "InvokedProductionVariant": variant,
    "CustomAttributes": attrs,
}

That's the expected format for the sync inference response, but should not be forced on async endpoint IMHO.

For example, if I expect my endpoint to produce and output like:

{
  "bbox": "..."
  "text": "..."
}

Well we have to hack the test flow to make it work

Copy link
Collaborator

@bblommers bblommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gael-ft! Thank you for submitting a PR for this.

My main concern with having a new endpoint is that it involves a breaking change for all current users, who are using the regular /endpoint-results endpoint.

How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not? Something like:

expected_results = {
    "account_id": "123456789012",  # This is the default - can be omitted
    "region": "us-east-1",  # This is the default - can be omitted
    "results": [
        {
             "Body": "first body",
             "ContentType": "text/xml",
             "InvokedProductionVariant": "prod",
             "CustomAttributes": "my_attr",
            "_async_failure": True,  # False by default - ignored for sync results
         },
        # other results as required
    ],
}

As far as I can see, that would give 3 advantages:

  1. Existing users will not have to change anything
  2. All users can easily swap between sync and async results (or even use both)
  3. And it simplifies the required implementation

@gael-ft
Copy link
Contributor Author

gael-ft commented Aug 24, 2024

Hi @bblommers, thanks for reviewing my proposal !

How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not?

I first thought I could do something similar.
But then the problem is the "result" payload:

{
  "Body": "first body",
  "ContentType": "text/xml",
  "InvokedProductionVariant": "prod",
  "CustomAttributes": "my_attr",
},

Follows SageMaker real-time inference response 👍
But with SageMaker async inference, the "result" payload that will be stored in S3 has no specific schema.
So users basically push what they want/need.

As described above, I can consider my async inference output to be:

{
  "bbox": "..."
  "text": "..."
}

And I am not expecting (in real world or moto) any Body, ContentType, ... fields.


To be honest, I considered as well doing some kind of condition to say:

if async inference, then only upload Body field, and discard everything else

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference.
If they do, they must anyway (here is the PR to avoid it) override the S3 file right after so that it fits what they expect as result from their inference.

Given that statement above, I thought it was cleaner to have a second endpoint, to avoid having to set fields (because we expect them to be present in current Moto code) that will finally be completly discarded. (Talking about ContentType, CustomAttributes, ...)

@bblommers
Copy link
Collaborator

Ah, right, I understand - thank you for the explanation @gael-ft.

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference.

Me neither, based on your explanation - but if there's one thing I've learned as a maintainer, is that people will always find a way to abuse Moto in a way that I've never thought about. 🙂

So, second proposal: we add the async endpoint, but leave the current implementation as a fallback. People can/should use the proper async endpoint as it's supposed to be used, but people who somehow got the sync-endpoint to work can also still use it.

As a very rough draft:

if self.async_results_queue:
    # new implementation
    ...
elif self.results_queue:
    # existing implementation
    ...

Later on we can always phase out the old/existing implementation.

Does that work for you?

@gael-ft
Copy link
Contributor Author

gael-ft commented Aug 25, 2024

Sounds good to me !
I'll try to have a look at it in the coming days.

@gael-ft
Copy link
Contributor Author

gael-ft commented Sep 6, 2024

Hello @bblommers,

Added fallback to "sync" results queue, as discussed, and tested accordingly.

Copy link

codecov bot commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.45%. Comparing base (9da032b) to head (4efcbb9).
Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
moto/sagemakerruntime/models.py 84.61% 2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7905   +/-   ##
=======================================
  Coverage   94.45%   94.45%           
=======================================
  Files        1141     1141           
  Lines       98184    98203   +19     
=======================================
+ Hits        92737    92762   +25     
+ Misses       5447     5441    -6     
Flag Coverage Δ
servertests 28.95% <6.89%> (-0.01%) ⬇️
unittests 94.43% <93.10%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@bblommers bblommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thank you for implementing this @gael-ft!

@bblommers bblommers added this to the 5.0.14 milestone Sep 7, 2024
@bblommers bblommers merged commit 17c2fa9 into getmoto:master Sep 7, 2024
51 checks passed
Copy link
Contributor

github-actions bot commented Sep 7, 2024

This is now part of moto >= 5.0.14.dev66

@gael-ft gael-ft deleted the sagemaker-async-responses branch September 7, 2024 13:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants