Currently, at the end of the test duration, the main process waits for the user processes to finish all active requests. This behavior can produce misleading results when the load test concurrency exceeds the maximum batch size the runtime can handle for a given model. In such cases, server-side throughput appears lower because of the time spent finishing the last few pending requests, during which server-side resources are not fully utilized.
Some potential solutions:
- In the streaming case, user processes can check whether the test is over between each token (see the sketch after this list).
- The main process can communicate the expected end time to the user processes, and the user processes can set a timeout on their HTTP requests based on that end time.
- Keep the existing test behavior and filter out requests that finished after the test end time in the results-processing code (also sketched below).
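A minimal sketch of the first and third options, assuming a hypothetical worker that consumes tokens from an in-flight streaming response and a simple result record; the actual client and result types in the harness will differ:

```python
import time
from dataclasses import dataclass


@dataclass
class RequestResult:
    start_time: float
    end_time: float
    num_tokens: int


def stream_request(token_iterator, test_end_time):
    """Consume a streaming response, stopping once the test deadline passes.

    `token_iterator` is a hypothetical iterable of tokens from an in-flight
    streaming response; how it is obtained depends on the HTTP client in use.
    """
    start = time.monotonic()
    tokens = 0
    for _ in token_iterator:
        tokens += 1
        # Option 1: check between tokens whether the test window has closed.
        if time.monotonic() >= test_end_time:
            break
    return RequestResult(start, time.monotonic(), tokens)


def filter_results(results, test_end_time):
    """Option 3: drop requests that completed after the test window closed."""
    return [r for r in results if r.end_time <= test_end_time]
```

Either approach keeps the reported throughput tied to the fixed test window; the post-hoc filter is the least invasive since it only touches the results-processing code, while the per-token check also frees server-side capacity sooner.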