Allow timeout during trained model download process #129003

dan-rubinstein · 2025-06-05T17:45:18Z

Description

We currently allow users to provide a timeout when creating an inference endpoint and when performing an inference request. When a user creates an endpoint that requires a trained model deployment to be started, or sends an inference request to a default endpoint that has no trained model deployment started, we download the model before starting the deployment if it has not been downloaded previously. During this download we do not currently time out when the user's requested timeout is exceeded; instead, we download the model fully and only then time out while starting the model deployment. This change fixes that poor experience by allowing the request to time out during the model download. If the timeout occurs, the model download and the trained model deployment start should still complete in the background, so the user does not have to take any further action for the process to finish.
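The behavior described above (fail the request once the timeout elapses, but let the download and deployment finish in the background) can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `TimeoutDuringDownloadSketch`, `SketchTimeoutException`, `downloadThenDeploy`, and `awaitWithTimeout` are hypothetical names, and the download is simulated with a sleep. It only assumes the standard `java.util.concurrent` API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Elasticsearch codebase.
final class TimeoutDuringDownloadSketch {

    // Stand-in for the timeout exception surfaced to the caller.
    static final class SketchTimeoutException extends RuntimeException {
        SketchTimeoutException(String message) {
            super(message);
        }
    }

    // Starts "download, then deploy" on a background thread and returns its future.
    static CompletableFuture<String> downloadThenDeploy(long downloadMillis) {
        return CompletableFuture
            .supplyAsync(() -> {
                sleep(downloadMillis);           // simulated model download
                return "downloaded";
            })
            .thenApply(state -> "deployed");     // simulated deployment start
    }

    // Waits at most timeoutMillis. On timeout the caller gets an exception,
    // but the future is not cancelled, so the background work keeps running.
    static String awaitWithTimeout(CompletableFuture<String> work, long timeoutMillis) {
        try {
            return work.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new SketchTimeoutException(
                "Timed out after " + timeoutMillis + " ms; download and deployment continue in the background");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> work = downloadThenDeploy(500);
        try {
            awaitWithTimeout(work, 50);          // request timeout shorter than the download
        } catch (SketchTimeoutException e) {
            System.out.println("request failed: " + e.getMessage());
        }
        // The same future still completes after the request has already failed.
        System.out.println("background result: " + work.join());
    }
}
```

The key design choice in the sketch is that the timeout applies only to the caller's wait, not to the work itself: the future is never cancelled, which mirrors the requirement that the user need not retry for the download and deployment to complete.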

Testing

Tested locally that creating an ElasticsearchInternalService endpoint with a small timeout (1 second) throws the ModelDeploymentTimeoutException and completes the download and deployment start asynchronously.
Tested that calling inference on a default endpoint with no model downloaded and no trained model deployment started behaves the same way as the test above.
Should we have some QA tests or IT tests for this?
- Discussed with Wei and he will be working on QA tests for this as part of this issue

elasticsearchmachine · 2025-06-05T17:45:44Z

Hi @dan-rubinstein, I've created a changelog YAML for you.

dan-rubinstein · 2025-06-06T15:39:17Z

@elasticmachine merge upstream

elasticsearchmachine · 2025-06-06T17:27:11Z

Pinging @elastic/ml-core (Team:ML)

dan-rubinstein · 2025-07-02T13:35:17Z

@elasticmachine merge upstream

* Allow timeout during trained model download process
* Update docs/changelog/129003.yaml
* Update timeout message

---------

Co-authored-by: Elastic Machine <[email protected]>

Allow timeout during trained model download process

e4a7481

dan-rubinstein added >bug :ml Team:ML v8.19.0 v9.1.0 labels Jun 5, 2025

Update docs/changelog/129003.yaml

1af3137

Merge branch 'main' into timeout-during-model-download

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

2edafbf

dan-rubinstein marked this pull request as ready for review June 6, 2025 17:26

jonathan-buttner approved these changes Jun 17, 2025

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

dan-rubinstein removed the v8.19.0 label Jul 2, 2025

elasticmachine and others added 2 commits July 2, 2025 15:35

Merge branch 'main' into timeout-during-model-download

208c1d9

Update timeout message

bf2d6c5

dan-rubinstein requested a review from jonathan-buttner July 2, 2025 14:32

jonathan-buttner approved these changes Jul 2, 2025

dan-rubinstein merged commit 136442d into elastic:main Jul 2, 2025
32 checks passed
