Add rate limits for LLMs and Embedding Models #520
Conversation
I tried this with default settings turned on. Was using our tier 1 project, just three medium-size PDFs in the index.
…ehints to return properties, add basal tokens to completion
…through, update default rate limits
…rate_limit classes
# RATE_CONFIG keys are tuples, corresponding to a namespace and primary key.
# Anything defined with the MATCH_ALL variable will match all non-matched requests for that namespace.
# For the "get" namespace, all primary key URLs are parsed down to the domain level.
# For example, if you're making a GET request to "https://google.com", "google.com" will get
# its own limit, and it will use the ("get", MATCH_ALL) configuration for its limits.
# machine_id is a unique identifier for the machine making the request; it's used to limit the
# rate of requests per machine. If the primary_key is in the NO_MACHINE_ID_EXTENSIONS list, then
# the dynamic IP of the machine will be used to limit the rate of requests; otherwise the
# user-input machine_id will be used.
Wdyt of moving this directly above RATE_CONFIG? It's nice to keep docs next to their usage
Co-authored-by: James Braza <[email protected]>
…de to not use json, and add warning for users
…n-determinism in crossref
LiteLLM's rate limits weren't suitable for PaperQA because we wanted rate limits that could span models. This PR adds them, with both an in-memory rate limiter and a Redis-based one for rate limiting across processes.
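For intuition, here's a hedged sketch of the two backends using the `limits` library; whether this PR actually builds on `limits` internally is an assumption, but the in-memory-versus-Redis split works the same way:

```python
from limits import RateLimitItemPerSecond
from limits.storage import MemoryStorage, RedisStorage
from limits.strategies import MovingWindowRateLimiter

# In-memory storage confines limits to one process; Redis shares them
# across processes (the Redis URI here is a placeholder).
storage = MemoryStorage()  # or: RedisStorage("redis://localhost:6379")
limiter = MovingWindowRateLimiter(storage)
twenty_per_second = RateLimitItemPerSecond(20)

# hit() consumes from the window and reports whether the request may proceed.
allowed = limiter.hit(twenty_per_second, "machine-1")
```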
The implementation adds a new decorator, `rate_limited`, to the `LiteLLMModel` class across all 4 inference methods. This decorator checks rate limits before (with prompt tokens) and after (with completion tokens) inference. If token counts aren't known (like when using the `*_iter` methods), it estimates them as character count divided by the `CHARACTERS_PER_TOKEN` constant (4). It's technically possible, with low rate limits that don't correspond to a max_token cutoff, for the completion tokens to exceed your maximum allowable tokens in your window of time (say your limit is 20 tokens per second and 100 tokens come back). In this case the rate limiter will wait it out so that your amortized rate falls back to 20 tokens per second.
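As a concrete sketch of that fallback estimate (the function name is hypothetical; only the constant's value of 4 comes from the description above):

```python
CHARACTERS_PER_TOKEN = 4  # estimate used when exact token counts are unavailable

def estimate_token_count(text: str) -> float:
    # Rough token estimate for streaming (*_iter) responses, where the
    # provider's token usage isn't known up front.
    return len(text) / CHARACTERS_PER_TOKEN
```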
The configuration is similar for the `LiteLLMModel` and the `LiteLLMEmbeddingModel`, where you give the `config` attribute a key for rate limits, like this:
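A hedged sketch of what that might look like (the model name, the import path, and the "30000 per 1 minute" rate-string format, which follows the `limits` library's notation, are assumptions):

```python
from paperqa.llms import LiteLLMModel  # import path is an assumption

llm = LiteLLMModel(
    name="gpt-4o-2024-08-06",
    config={
        # Rate limits keyed by model name; string format assumed.
        "rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"},
    },
)
```

or, going through the higher-level `Settings` object instead (also a sketch, assuming `Settings` forwards `llm_config` to the underlying model):

```python
from paperqa import Settings

settings = Settings(
    llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
    summary_llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
)
```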
and for the embedding model:
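A matching sketch for the embedding side (the import path, model name, and the exact shape of the embedding `rate_limit` value, flat string versus per-model mapping, are assumptions):

```python
from paperqa.llms import LiteLLMEmbeddingModel  # import path is an assumption

embedding = LiteLLMEmbeddingModel(
    name="text-embedding-3-small",
    config={"rate_limit": "30000 per 1 minute"},  # flat rate string (assumed shape)
)
```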