
Add rate limits for LLMs and Embedding Models #520

Merged: 27 commits, Oct 4, 2024

Commits
5d2fa3a
added some rate limit checks into llm wrapper class
mskarlin Sep 23, 2024
f98394b
added rate limit tests
mskarlin Oct 2, 2024
6eb4a2c
update type annotations
mskarlin Oct 2, 2024
dfc9e2d
merge with main branch
mskarlin Oct 2, 2024
1bd1776
remove CHARACTERS_PER_TOKEN as a classvar
mskarlin Oct 2, 2024
83b9f68
need to await achat in _achat
mskarlin Oct 2, 2024
cbc4454
remove un-needed fixtures
mskarlin Oct 2, 2024
d6d5c43
merge with main
mskarlin Oct 2, 2024
0c0936c
rename CHARACTERS_PER_TOKEN->CHARACTERS_PER_TOKEN_ASSUMPTION, add typ…
mskarlin Oct 3, 2024
8331f9b
refurb update to remove defaults
mskarlin Oct 3, 2024
6b60dc8
add passthrough from settings into llm rate limits, add test for pass…
mskarlin Oct 3, 2024
d7c77ed
add rate limits to readme and refurb non-default fix
mskarlin Oct 3, 2024
d5a85ee
Merge branch 'main' into rate-limits
mskarlin Oct 3, 2024
e1c24f5
add google style docstring to rate limiter
mskarlin Oct 3, 2024
ff1b7ea
make imports reference the package
mskarlin Oct 3, 2024
1efd7aa
modify readme, rename timeout->acquire_timeout, remove defaults from …
mskarlin Oct 3, 2024
e2c240f
Merge branch 'main' into rate-limits
mskarlin Oct 3, 2024
d3785b4
Update README.md
mskarlin Oct 3, 2024
5230302
Update README.md
mskarlin Oct 3, 2024
033c033
update tests to support default use of json, change debug and fast mo…
mskarlin Oct 4, 2024
3452018
re-recorded vcr cassette with new settings + added docs for dockey no…
mskarlin Oct 4, 2024
b62ae7b
merged with main
mskarlin Oct 4, 2024
45fe0f8
move debug.json settings under parsing heading
mskarlin Oct 4, 2024
9d271e3
Merge branch 'main' into rate-limits
mskarlin Oct 4, 2024
5f8948e
ensure test_gather_evidence_rejects_empty_docs uses the paper stub di…
mskarlin Oct 4, 2024
b7286d9
Merge branch 'rate-limits' of github.com:whitead/paper-qa into rate-l…
mskarlin Oct 4, 2024
546e6d2
ensure test_gather_evidence_rejects_empty_docs uses the paper stub di…
mskarlin Oct 4, 2024
34 changes: 34 additions & 0 deletions README.md
@@ -20,6 +20,7 @@ question answering, summarization, and contradiction detection.
- [Installation](#installation)
- [CLI Usage](#cli-usage)
- [Bundled Settings](#bundled-settings)
- [Rate Limits](#rate-limits)
- [Library Usage](#library-usage)
- [`ask` manually](#ask-manually)
- [Adding Documents Manually](#adding-documents-manually)
@@ -250,6 +251,39 @@ Inside [`paperqa/configs`](paperqa/configs) we bundle known useful settings:
| wikicrow | Setting to emulate the Wikipedia article writing used in our WikiCrow publication. |
| contracrow | Setting to find contradictions in papers; your query should be a claim that needs to be flagged as a contradiction (or not). |
| debug | Setting useful solely for debugging, not intended for any real application. |
| tier1_limits | Settings that match OpenAI rate limits for each tier; use `tier<1-5>_limits` to specify the tier. |

### Rate Limits

If you are hitting rate limits, say on the OpenAI Tier 1 plan, you can add them to PaperQA2.
For each OpenAI tier, a pre-built setting exists to limit usage.

```bash
pqa --settings 'tier1_limits' ask 'Are there nm scale features in thermoelectric materials?'
```

This will limit your system to the rate limits defined in [tier1_limits](paperqa/configs/tier1_limits.json),
and slow down your queries to accommodate them.

You can also specify them manually with any rate limit string that matches the specification in the [limits](https://limits.readthedocs.io/en/stable/quickstart.html#rate-limit-string-notation) module:

```bash
pqa --summary_llm_config '{"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}}' ask 'Are there nm scale features in thermoelectric materials?'
```
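The rate-limit string notation can be illustrated with a small stdlib-only parser. This is a sketch: the helper name `parse_rate_limit` and the seconds mapping are illustrative only, and the real parsing is done by the `limits` package, not this code:

```python
import re

# Window lengths for the granularities accepted by limits-style
# rate-limit strings; this mapping is part of the sketch, not paperqa.
GRANULARITY_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate_limit(spec: str) -> tuple[int, int]:
    """Return (count, window_seconds) for a "<count> per <N> <unit>" string."""
    match = re.fullmatch(
        r"(\d+)\s+per\s+(\d+)\s+(second|minute|hour|day)s?", spec.strip()
    )
    if match is None:
        raise ValueError(f"Unrecognized rate limit string: {spec!r}")
    count, multiple, unit = match.groups()
    return int(count), int(multiple) * GRANULARITY_SECONDS[unit]


print(parse_rate_limit("30000 per 1 minute"))  # (30000, 60)
```

So `"30000 per 1 minute"` reads as 30,000 tokens allowed in any 60-second window, which matches the OpenAI TPM figures used throughout the bundled tier configs.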

Or by adding into a `Settings` object, if calling imperatively:

```python
from paperqa import Settings, ask

answer = ask(
    "What manufacturing challenges are unique to bispecific antibodies?",
    settings=Settings(
        llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
        summary_llm_config={"rate_limit": {"gpt-4o-2024-08-06": "30000 per 1 minute"}},
    ),
)
```

## Library Usage

3 changes: 3 additions & 0 deletions paperqa/configs/debug.json
@@ -12,5 +12,8 @@
  "parsing": {
    "use_doc_details": false,
    "defer_embedding": true
  },
  "prompts": {
    "use_json": false
  }
}
6 changes: 6 additions & 0 deletions paperqa/configs/fast.json
@@ -9,5 +9,11 @@
  },
  "parsing": {
    "use_doc_details": false
  },
  "prompts": {
    "use_json": false
  },
  "agent": {
    "agent_type": "fake"
  }
}
6 changes: 0 additions & 6 deletions paperqa/configs/high_quality.json
@@ -8,11 +8,5 @@
    "use_doc_details": true,
    "chunk_size": 7000,
    "overlap": 250
  },
  "prompts": {
    "use_json": true
  },
  "agent": {
    "agent_type": "ToolSelector"
  }
}
53 changes: 53 additions & 0 deletions paperqa/configs/tier1_limits.json
@@ -0,0 +1,53 @@
{
  "answer": {
    "evidence_k": 5,
    "evidence_detailed_citations": false,
    "evidence_summary_length": "25 to 50 words",
    "answer_max_sources": 3,
    "answer_length": "50 to 100 words",
    "max_concurrent_requests": 5
  },
  "parsing": {
    "use_doc_details": false
  },
  "prompts": {
    "use_json": true
  },
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "30000 per 1 minute",
      "gpt-4o-2024-08-06": "30000 per 1 minute",
      "gpt-4o-2024-05-13": "30000 per 1 minute",
      "gpt-4o-mini": "200000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "200000 per 1 minute",
      "gpt-4-turbo": "30000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "30000 per 1 minute",
      "gpt-4-0613": "10000 per 1 minute",
      "gpt-4-0314": "10000 per 1 minute",
      "gpt-4": "10000 per 1 minute",
      "gpt-3.5-turbo-0125": "200000 per 1 minute",
      "gpt-3.5-turbo": "200000 per 1 minute",
      "gpt-3.5-turbo-1106": "200000 per 1 minute"
    }
  },
  "summary_llm_config": {
    "rate_limit": {
      "gpt-4o": "30000 per 1 minute",
      "gpt-4o-2024-08-06": "30000 per 1 minute",
      "gpt-4o-2024-05-13": "30000 per 1 minute",
      "gpt-4o-mini": "200000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "200000 per 1 minute",
      "gpt-4-turbo": "30000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "30000 per 1 minute",
      "gpt-4-0613": "10000 per 1 minute",
      "gpt-4-0314": "10000 per 1 minute",
      "gpt-4": "10000 per 1 minute",
      "gpt-3.5-turbo-0125": "200000 per 1 minute",
      "gpt-3.5-turbo": "200000 per 1 minute",
      "gpt-3.5-turbo-1106": "200000 per 1 minute"
    }
  },
  "embedding_config": {
    "rate_limit": "1000000 per 1 minute"
  }
}
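The per-model mapping in tier configs like the one above can be read with plain stdlib tooling. This sketch inlines a subset of the file for illustration rather than loading paperqa's actual config path:

```python
import json

# Subset of tier1_limits.json from the diff above, inlined for illustration.
tier1 = json.loads("""
{
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "30000 per 1 minute",
      "gpt-4o-mini": "200000 per 1 minute"
    }
  },
  "embedding_config": {"rate_limit": "1000000 per 1 minute"}
}
""")

# LLM limits are keyed per model name; the embedding limit is a single string.
llm_limits = tier1["llm_config"]["rate_limit"]
print(llm_limits["gpt-4o"])                     # 30000 per 1 minute
print(tier1["embedding_config"]["rate_limit"])  # 1000000 per 1 minute
```

Note the structural difference: `llm_config` and `summary_llm_config` map model names to rate-limit strings, while `embedding_config` takes one rate-limit string for all embedding traffic.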
52 changes: 52 additions & 0 deletions paperqa/configs/tier2_limits.json
@@ -0,0 +1,52 @@
{
  "answer": {
    "evidence_k": 8,
    "answer_max_sources": 3,
    "max_concurrent_requests": 8
  },
  "parsing": {
    "use_doc_details": true,
    "chunk_size": 7000,
    "overlap": 250
  },
  "prompts": {
    "use_json": true
  },
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "450000 per 1 minute",
      "gpt-4o-2024-08-06": "450000 per 1 minute",
      "gpt-4o-2024-05-13": "450000 per 1 minute",
      "gpt-4o-mini": "2000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "2000000 per 1 minute",
      "gpt-4-turbo": "450000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "450000 per 1 minute",
      "gpt-4-0613": "40000 per 1 minute",
      "gpt-4-0314": "40000 per 1 minute",
      "gpt-4": "40000 per 1 minute",
      "gpt-3.5-turbo-0125": "2000000 per 1 minute",
      "gpt-3.5-turbo": "2000000 per 1 minute",
      "gpt-3.5-turbo-1106": "2000000 per 1 minute"
    }
  },
  "summary_llm_config": {
    "rate_limit": {
      "gpt-4o": "450000 per 1 minute",
      "gpt-4o-2024-08-06": "450000 per 1 minute",
      "gpt-4o-2024-05-13": "450000 per 1 minute",
      "gpt-4o-mini": "2000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "2000000 per 1 minute",
      "gpt-4-turbo": "450000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "450000 per 1 minute",
      "gpt-4-0613": "40000 per 1 minute",
      "gpt-4-0314": "40000 per 1 minute",
      "gpt-4": "40000 per 1 minute",
      "gpt-3.5-turbo-0125": "2000000 per 1 minute",
      "gpt-3.5-turbo": "2000000 per 1 minute",
      "gpt-3.5-turbo-1106": "2000000 per 1 minute"
    }
  },
  "embedding_config": {
    "rate_limit": "1000000 per 1 minute"
  }
}
52 changes: 52 additions & 0 deletions paperqa/configs/tier3_limits.json
@@ -0,0 +1,52 @@
{
  "answer": {
    "evidence_k": 8,
    "answer_max_sources": 3,
    "max_concurrent_requests": 8
  },
  "parsing": {
    "use_doc_details": true,
    "chunk_size": 7000,
    "overlap": 250
  },
  "prompts": {
    "use_json": true
  },
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "800000 per 1 minute",
      "gpt-4o-2024-08-06": "800000 per 1 minute",
      "gpt-4o-2024-05-13": "800000 per 1 minute",
      "gpt-4o-mini": "4000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "4000000 per 1 minute",
      "gpt-4-turbo": "600000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "600000 per 1 minute",
      "gpt-4-0613": "80000 per 1 minute",
      "gpt-4-0314": "80000 per 1 minute",
      "gpt-4": "80000 per 1 minute",
      "gpt-3.5-turbo-0125": "4000000 per 1 minute",
      "gpt-3.5-turbo": "4000000 per 1 minute",
      "gpt-3.5-turbo-1106": "4000000 per 1 minute"
    }
  },
  "summary_llm_config": {
    "rate_limit": {
      "gpt-4o": "800000 per 1 minute",
      "gpt-4o-2024-08-06": "800000 per 1 minute",
      "gpt-4o-2024-05-13": "800000 per 1 minute",
      "gpt-4o-mini": "4000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "4000000 per 1 minute",
      "gpt-4-turbo": "600000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "600000 per 1 minute",
      "gpt-4-0613": "80000 per 1 minute",
      "gpt-4-0314": "80000 per 1 minute",
      "gpt-4": "80000 per 1 minute",
      "gpt-3.5-turbo-0125": "4000000 per 1 minute",
      "gpt-3.5-turbo": "4000000 per 1 minute",
      "gpt-3.5-turbo-1106": "4000000 per 1 minute"
    }
  },
  "embedding_config": {
    "rate_limit": "5000000 per 1 minute"
  }
}
52 changes: 52 additions & 0 deletions paperqa/configs/tier4_limits.json
@@ -0,0 +1,52 @@
{
  "answer": {
    "evidence_k": 10,
    "answer_max_sources": 5,
    "max_concurrent_requests": 8
  },
  "parsing": {
    "use_doc_details": true,
    "chunk_size": 7000,
    "overlap": 250
  },
  "prompts": {
    "use_json": true
  },
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "2000000 per 1 minute",
      "gpt-4o-2024-08-06": "2000000 per 1 minute",
      "gpt-4o-2024-05-13": "2000000 per 1 minute",
      "gpt-4o-mini": "10000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "10000000 per 1 minute",
      "gpt-4-turbo": "800000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "800000 per 1 minute",
      "gpt-4-0613": "300000 per 1 minute",
      "gpt-4-0314": "300000 per 1 minute",
      "gpt-4": "300000 per 1 minute",
      "gpt-3.5-turbo-0125": "10000000 per 1 minute",
      "gpt-3.5-turbo": "10000000 per 1 minute",
      "gpt-3.5-turbo-1106": "10000000 per 1 minute"
    }
  },
  "summary_llm_config": {
    "rate_limit": {
      "gpt-4o": "2000000 per 1 minute",
      "gpt-4o-2024-08-06": "2000000 per 1 minute",
      "gpt-4o-2024-05-13": "2000000 per 1 minute",
      "gpt-4o-mini": "10000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "10000000 per 1 minute",
      "gpt-4-turbo": "800000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "800000 per 1 minute",
      "gpt-4-0613": "300000 per 1 minute",
      "gpt-4-0314": "300000 per 1 minute",
      "gpt-4": "300000 per 1 minute",
      "gpt-3.5-turbo-0125": "10000000 per 1 minute",
      "gpt-3.5-turbo": "10000000 per 1 minute",
      "gpt-3.5-turbo-1106": "10000000 per 1 minute"
    }
  },
  "embedding_config": {
    "rate_limit": "5000000 per 1 minute"
  }
}
52 changes: 52 additions & 0 deletions paperqa/configs/tier5_limits.json
@@ -0,0 +1,52 @@
{
  "answer": {
    "evidence_k": 15,
    "answer_max_sources": 5,
    "max_concurrent_requests": 8
  },
  "parsing": {
    "use_doc_details": true,
    "chunk_size": 7000,
    "overlap": 250
  },
  "prompts": {
    "use_json": true
  },
  "llm_config": {
    "rate_limit": {
      "gpt-4o": "30000000 per 1 minute",
      "gpt-4o-2024-08-06": "30000000 per 1 minute",
      "gpt-4o-2024-05-13": "30000000 per 1 minute",
      "gpt-4o-mini": "150000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "150000000 per 1 minute",
      "gpt-4-turbo": "2000000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "2000000 per 1 minute",
      "gpt-4-0613": "1000000 per 1 minute",
      "gpt-4-0314": "1000000 per 1 minute",
      "gpt-4": "1000000 per 1 minute",
      "gpt-3.5-turbo-0125": "50000000 per 1 minute",
      "gpt-3.5-turbo": "50000000 per 1 minute",
      "gpt-3.5-turbo-1106": "50000000 per 1 minute"
    }
  },
  "summary_llm_config": {
    "rate_limit": {
      "gpt-4o": "30000000 per 1 minute",
      "gpt-4o-2024-08-06": "30000000 per 1 minute",
      "gpt-4o-2024-05-13": "30000000 per 1 minute",
      "gpt-4o-mini": "150000000 per 1 minute",
      "gpt-4o-mini-2024-07-18": "150000000 per 1 minute",
      "gpt-4-turbo": "2000000 per 1 minute",
      "gpt-4-turbo-2024-04-09": "2000000 per 1 minute",
      "gpt-4-0613": "1000000 per 1 minute",
      "gpt-4-0314": "1000000 per 1 minute",
      "gpt-4": "1000000 per 1 minute",
      "gpt-3.5-turbo-0125": "50000000 per 1 minute",
      "gpt-3.5-turbo": "50000000 per 1 minute",
      "gpt-3.5-turbo-1106": "50000000 per 1 minute"
    }
  },
  "embedding_config": {
    "rate_limit": "10000000 per 1 minute"
  }
}
11 changes: 9 additions & 2 deletions paperqa/core.py
@@ -28,8 +28,15 @@ def replace_newlines(match: re.Match) -> str:
    # https://regex101.com/r/VFcDmB/1
    pattern = r'"(?:[^"\\]|\\.)*"'
    text = re.sub(pattern, replace_newlines, text)

    return json.loads(text)
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(
            "Failed to parse JSON. Your model may not "
            "be capable of supporting JSON output. Try "
            "a different model or with "
            "`Settings(prompts={'use_json': False})`"
        ) from e
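Read in isolation, the guarded parse in this hunk behaves as sketched below. The function name `parse_llm_json` is a stand-in for this sketch; only the regex, the newline replacement, and the error message come from the diff:

```python
import json
import re


def parse_llm_json(text: str) -> dict:
    # Escape raw newlines that LLMs sometimes emit inside JSON string
    # literals, then parse; mirrors the flow shown in the hunk above.
    def replace_newlines(match: re.Match) -> str:
        return match.group(0).replace("\n", "\\n")

    text = re.sub(r'"(?:[^"\\]|\\.)*"', replace_newlines, text)
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        raise ValueError(
            "Failed to parse JSON. Your model may not be capable of "
            "supporting JSON output. Try a different model or with "
            "`Settings(prompts={'use_json': False})`"
        ) from e


# A literal newline inside a JSON string would normally break json.loads;
# the regex pass escapes it first, so this parses cleanly.
print(parse_llm_json('{"summary": "line one\nline two"}'))
```

The character class `[^"\\]` matches newlines (unlike `.`), so each quoted string is captured whole and its raw newlines escaped before `json.loads` runs.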


async def map_fxn_summary(