I'll throw in a bit more context so this can work as a good-first-issue.
Hydro-serving can shadow data across multiple model variants in a serving application.
For example, a 5% canary test can look like this:
Application ‘A’
|
| - Variant 1: model ‘a’ version 1. weight=95
| - Variant 2: model ‘a’ version 2. weight=5
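
Concretely, the weights mean that roughly 95% of responses come from version 1 and 5% from version 2. Here is a minimal sketch of such a weighted draw (the variant names and dict layout are made up for illustration, not Hydro-serving's actual data model):

```python
import random
from collections import Counter

# Hypothetical representation of the two variants above.
variants = [
    {"name": "model 'a' version 1", "weight": 95},
    {"name": "model 'a' version 2", "weight": 5},
]

def pick_variant(variants):
    """Pick one variant at random, proportionally to its weight."""
    return random.choices(variants, weights=[v["weight"] for v in variants], k=1)[0]

# Over many requests the traffic split approaches 95/5.
print(Counter(pick_variant(variants)["name"] for _ in range(10_000)))
```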
How shadowing is done:
Whenever a serving application endpoint receives a request, it shadows the received data to all model variants for processing.
Only after all model variants have produced their outputs do we randomly choose the output of one of them, according to the weights associated with each model variant.
Thus, we shadow incoming data to all model variants but return the output of only a single one.
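
A rough sketch of this current flow, assuming each variant call is an async task (the helper names, simulated latencies, and dict layout below are hypothetical, not Hydro-serving's code):

```python
import asyncio
import random

async def run_variant(name, latency_s):
    # Stand-in for sending the shadowed request to one model variant.
    await asyncio.sleep(latency_s)
    return f"output of {name}"

async def serve_request_current(variants):
    # Shadow the request to every variant and wait for ALL of them to finish.
    outputs = await asyncio.gather(
        *(run_variant(v["name"], v["latency_s"]) for v in variants)
    )
    # Only then pick one output at random, proportionally to the weights.
    return random.choices(outputs, weights=[v["weight"] for v in variants], k=1)[0]

variants = [
    {"name": "model 'a' version 1", "weight": 95, "latency_s": 0.01},
    {"name": "model 'a' version 2", "weight": 5,  "latency_s": 0.20},
]
# The response always takes ~0.20s here, because gather waits for the slowest variant.
print(asyncio.run(serve_request_current(variants)))
```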
Since we wait for all model variants to finish their computations, the reported latency is wrong: it is the maximum latency across all model variants.
To improve throughput and measure latency correctly for each model variant, we need to stop waiting for all model variants to produce their outputs and instead choose which model's output will be returned before the outputs are computed.
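
A sketch of the proposed flow, under the same hypothetical setup as above: the winner is chosen up front, the response goes out as soon as the winner finishes, and the remaining shadow tasks complete in the background so their outputs and per-variant latencies can still be recorded.

```python
import asyncio
import random

async def run_variant(name, latency_s):
    # Stand-in for sending the shadowed request to one model variant.
    await asyncio.sleep(latency_s)
    return f"output of {name}"

async def serve_request_proposed(variants):
    # 1. Decide which variant's output will be returned BEFORE anything runs.
    winner = random.choices(variants, weights=[v["weight"] for v in variants], k=1)[0]
    # 2. Still shadow the request to every variant.
    tasks = {
        v["name"]: asyncio.create_task(run_variant(v["name"], v["latency_s"]))
        for v in variants
    }
    # 3. Await only the winner: request latency is now the winner's latency,
    #    not the maximum over all variants.
    result = await tasks[winner["name"]]
    print("responded with:", result)  # the response can go out at this point
    # 4. The other shadow tasks keep running; we await them here only so this
    #    stand-alone demo exits cleanly.
    await asyncio.gather(*tasks.values())
    return result

variants = [
    {"name": "model 'a' version 1", "weight": 95, "latency_s": 0.01},
    {"name": "model 'a' version 2", "weight": 5,  "latency_s": 0.20},
]
asyncio.run(serve_request_proposed(variants))
```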
Improve A/B execution by choosing the return value BEFORE execution happens.