-
@eyurtsev
How to do it by langserve? Thanks in advances |
Beta Was this translation helpful? Give feedback.
Answered by
eyurtsev
Jun 26, 2024
Replies: 1 comment
-
Easiest way is to prepend a runnable lambda to your runnable. That runnable lambda should be a passthrough that implements a rate limiting algorithm. The easiest way to do this is using a token bucket algorithm. We haven't built in support into the framework for this yet. In the meanwhile you can adapt the code here: https://github.com/langchain-ai/langchain-benchmarks/blob/main/langchain_benchmarks/rate_limiting.py#L90 |
Beta Was this translation helpful? Give feedback.
0 replies
Answer selected by
kientv
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Easiest way is to prepend a runnable lambda to your runnable. That runnable lambda should be a passthrough that implements a rate limiting algorithm. The easiest way to do this is using a token bucket algorithm. We haven't built in support into the framework for this yet. In the meanwhile you can adapt the code here: https://github.com/langchain-ai/langchain-benchmarks/blob/main/langchain_benchmarks/rate_limiting.py#L90