Currently, at the end of the test duration, the main process waits for the user processes to finish all active requests. This behavior can produce misleading results when the load test concurrency exceeds the maximum batch size the runtime can handle for a given model. In such cases, server-side throughput appears lower because of the time spent finishing the last few pending requests, during which server-side resources are not fully utilized.
Some potential solutions:
- In the streaming case, user processes can check whether the test is over between each token (see the sketch after this list).
- The main process can communicate the expected end time to the user processes, and the user processes can set a timeout on their HTTP requests based on that end time.
- Keep the existing test behavior and filter out requests that finished after the test end time in the results-processing code (also sketched below).
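A minimal sketch of the first and third options, assuming a hypothetical worker that consumes tokens from an in-flight streaming response and a simple result record; the actual client and result types in the harness will differ:

```python
import time
from dataclasses import dataclass


@dataclass
class RequestResult:
    start_time: float
    end_time: float
    num_tokens: int


def stream_request(token_iterator, test_end_time):
    """Consume a streaming response, stopping once the test deadline passes.

    `token_iterator` is a hypothetical iterable of tokens from an in-flight
    streaming response; how it is obtained depends on the HTTP client in use.
    """
    start = time.monotonic()
    tokens = 0
    for _ in token_iterator:
        tokens += 1
        # Option 1: check between tokens whether the test window has closed.
        if time.monotonic() >= test_end_time:
            break
    return RequestResult(start, time.monotonic(), tokens)


def filter_results(results, test_end_time):
    """Option 3: drop requests that completed after the test window closed."""
    return [r for r in results if r.end_time <= test_end_time]
```

Either approach keeps the reported throughput tied to the fixed test window; the post-hoc filter is the least invasive since it only touches the results-processing code, while the per-token check also frees server-side capacity sooner.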