Cool down when hitting rate limit with online processors #256
Conversation
Test
Observations: Something more responsive, as discussed in #233, like exponential retries that intelligently adapt to find the correct rate limit, would be better. Right now we are relying on the local rate limit being correctly set. This PR helps mitigate failures when we hit rate limits randomly in patches (e.g. a connection limit isn't actually an RPM rate limit, so longer requests cause our estimate of RPM to be too large and we hit rate limits; this is also potentially better solved by #253).
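As a rough illustration of the exponential-retry idea above, here is a minimal sketch. All names (`RateLimitError`, `call_with_backoff`) are hypothetical and not part of curator's actual API; the injectable `sleep` is just for testability.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the 429-style error an online processor surfaces."""


def call_with_backoff(request_fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry request_fn with exponential backoff plus jitter on rate-limit errors.

    Instead of assuming the locally configured rate limit is correct, each
    consecutive failure doubles the wait, so the client adapts toward whatever
    rate the server actually allows.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Exponential delay with a little jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) * (1.0 + 0.1 * random.random())
            sleep(delay)
```

In a real processor the retry would wrap the per-request API call inside the async main loop, and could also feed the observed failures back into the RPM estimate.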
LGTM
Fixes #252
We had cooldown before for openai:
curator/src/bespokelabs/curator/request_processor/openai_online_request_processor.py
Lines 456 to 473 in 40aa7df
However, the way the main loop runs has changed, so we need to adapt the cooldown a bit.
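The old OpenAI cooldown logic linked above isn't reproduced here, but the shape of such a mechanism can be sketched as follows. This is an illustrative sketch only; the class and method names (`StatusTracker`, `record_rate_limit_error`, `seconds_to_sleep`) are assumptions, not curator's actual implementation.

```python
import time


class StatusTracker:
    """Sketch: after a rate-limit error, the main loop pauses new request
    dispatch for a fixed cooldown window instead of immediately retrying."""

    def __init__(self, seconds_to_pause_after_rate_limit=15.0):
        self.seconds_to_pause = seconds_to_pause_after_rate_limit
        self.time_of_last_rate_limit_error = float("-inf")

    def record_rate_limit_error(self, now=None):
        """Called by a worker when the API returns a rate-limit error."""
        self.time_of_last_rate_limit_error = time.time() if now is None else now

    def seconds_to_sleep(self, now=None):
        """How much of the cooldown window remains; 0 if no cooldown is needed.

        The main loop checks this before dispatching each new request and
        sleeps for the returned duration.
        """
        now = time.time() if now is None else now
        remaining = self.seconds_to_pause - (now - self.time_of_last_rate_limit_error)
        return max(0.0, remaining)
```

The key design point is that the cooldown state lives in a shared tracker rather than in each request coroutine, so a single rate-limit error throttles the whole loop, which matches how the old OpenAI-specific code behaved.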