Feature: Add FailureLocation support to SageMaker Runtime #7905

gael-ft · 2024-07-30T12:17:37Z

Not sure if that's a feature or a bugfix actually.

SageMaker invoke_endpoint_async is supposed to return docs:

InferenceId
OutputLocation
FailureLocation <-- this one is currently missing in Moto implementation

This makes code hard to test when we assume this field to be returned by AWS endpoint, and making it optional because cannot be properly mocked in tests does not look like a good direction.

Doing so, I added a specific endpoint as well to Moto backend.
Idea is that we were not really able to customize the (async) inference output that is stored in S3.
Previously, the output schema was forced to something like:

{
    "Body": body,
    "ContentType": _type,
    "InvokedProductionVariant": variant,
    "CustomAttributes": attrs,
}

That's the expected format for the sync inference response, but should not be forced on async endpoint IMHO.

For example, if I expect my endpoint to produce and output like:

{
  "bbox": "..."
  "text": "..."
}

Well we have to hack the test flow to make it work

bblommers

Hi @gael-ft! Thank you for submitting a PR for this.

My main concern with having a new endpoint is that it involves a breaking change for all current users, who are using the regular /endpoint-results endpoint.

How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not? Something like:

expected_results = {
    "account_id": "123456789012",  # This is the default - can be omitted
    "region": "us-east-1",  # This is the default - can be omitted
    "results": [
        {
             "Body": "first body",
             "ContentType": "text/xml",
             "InvokedProductionVariant": "prod",
             "CustomAttributes": "my_attr",
            "_async_failure": True,  # False by default - ignored for sync results
         },
        # other results as required
    ],
}

As far as I can see, that would give 3 advantages:

Existing users will not have to change anything
All users can easily swap between sync and async results (or even use both)
And it simplifies the required implementation

gael-ft · 2024-08-24T13:54:32Z

Hi @bblommers, thanks for reviewing my proposal !

How would you feel about keeping the existing endpoint, but adding another field to the supplied results to indicate whether a result is a failure or not?

I first thought I could do something similar.
But then the problem is the "result" payload:

{
  "Body": "first body",
  "ContentType": "text/xml",
  "InvokedProductionVariant": "prod",
  "CustomAttributes": "my_attr",
},

Follows SageMaker real-time inference response 👍
But with SageMaker async inference, the "result" payload that will be stored in S3 has no specific schema.
So users basically push what they want/need.

As described above, I can consider my async inference output to be:

{
  "bbox": "..."
  "text": "..."
}

And I am not expecting (in real world or moto) any Body, ContentType, ... fields.

To be honest, I considered as well doing some kind of condition to say:

if async inference, then only upload Body field, and discard everything else

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference.
If they do, they must anyway (here is the PR to avoid it) override the S3 file right after so that it fits what they expect as result from their inference.

Given that statement above, I thought it was cleaner to have a second endpoint, to avoid having to set fields (because we expect them to be present in current Moto code) that will finally be completly discarded. (Talking about ContentType, CustomAttributes, ...)

bblommers · 2024-08-25T10:00:57Z

Ah, right, I understand - thank you for the explanation @gael-ft.

But then I just realized that I couldn't imagine any current existing test using this endpoint to validate async inference.

Me neither, based on your explanation - but if there's one thing I've learned as a maintainer, is that people will always find a way to abuse Moto in a way that I've never thought about. 🙂

So, second proposal: we add the async endpoint, but leave the current implementation as a fallback. People can/should use the proper async endpoint as it's supposed to be used, but people who somehow got the sync-endpoint to work can also still use it.

As a very rough draft:

if self.async_results_queue:
    # new implementation
    ...
elif self.results_queue:
    # existing implementation
    ...

Later on we can always phase out the old/existing implementation.

Does that work for you?

gael-ft · 2024-08-25T11:15:27Z

Sounds good to me !
I'll try to have a look at it in the coming days.

…ponses # Conflicts: # moto/moto_api/_internal/urls.py

gael-ft · 2024-09-06T12:18:28Z

Hello @bblommers,

Added fallback to "sync" results queue, as discussed, and tested accordingly.

codecov · 2024-09-07T10:44:06Z

Codecov Report

Attention: Patch coverage is 93.10345% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.45%. Comparing base (9da032b) to head (4efcbb9).
Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
moto/sagemakerruntime/models.py	84.61%	2 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #7905   +/-   ##
=======================================
  Coverage   94.45%   94.45%           
=======================================
  Files        1141     1141           
  Lines       98184    98203   +19     
=======================================
+ Hits        92737    92762   +25     
+ Misses       5447     5441    -6

Flag	Coverage Δ
servertests	`28.95% <6.89%> (-0.01%)`	⬇️
unittests	`94.43% <93.10%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bblommers

Great - thank you for implementing this @gael-ft!

github-actions · 2024-09-07T11:22:33Z

This is now part of moto >= 5.0.14.dev66

Add FailureLocation support to SageMaker Runtime

b9d4af6

bblommers reviewed Aug 8, 2024

View reviewed changes

bblommers mentioned this pull request Aug 9, 2024

Sagemaker runtime: InvokeEndpointAsync does not provide FailureLocation parameter #7956

Closed

gael-ft added 2 commits September 6, 2024 14:11

Fallback to sync queue for backward compat

6c75cf0

Merge remote-tracking branch 'origin/master' into sagemaker-async-res…

a84c6bb

…ponses # Conflicts: # moto/moto_api/_internal/urls.py

bblommers added 2 commits September 7, 2024 09:54

Linting

77ba838

Linting

4efcbb9

bblommers approved these changes Sep 7, 2024

View reviewed changes

bblommers added this to the 5.0.14 milestone Sep 7, 2024

bblommers merged commit 17c2fa9 into getmoto:master Sep 7, 2024
51 checks passed

gael-ft deleted the sagemaker-async-responses branch September 7, 2024 13:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: Add FailureLocation support to SageMaker Runtime #7905

Feature: Add FailureLocation support to SageMaker Runtime #7905

gael-ft commented Jul 30, 2024 •

edited

Loading

bblommers left a comment

gael-ft commented Aug 24, 2024

bblommers commented Aug 25, 2024

gael-ft commented Aug 25, 2024

gael-ft commented Sep 6, 2024

codecov bot commented Sep 7, 2024 •

edited

Loading

bblommers left a comment

github-actions bot commented Sep 7, 2024

Feature: Add FailureLocation support to SageMaker Runtime #7905

Feature: Add FailureLocation support to SageMaker Runtime #7905

Conversation

gael-ft commented Jul 30, 2024 • edited Loading

bblommers left a comment

Choose a reason for hiding this comment

gael-ft commented Aug 24, 2024

bblommers commented Aug 25, 2024

gael-ft commented Aug 25, 2024

gael-ft commented Sep 6, 2024

codecov bot commented Sep 7, 2024 • edited Loading

Codecov Report

bblommers left a comment

Choose a reason for hiding this comment

github-actions bot commented Sep 7, 2024

gael-ft commented Jul 30, 2024 •

edited

Loading

codecov bot commented Sep 7, 2024 •

edited

Loading