OnlineRequestProcessor V2 Megathread #204

Open
1 of 4 tasks
CharlieJCJ opened this issue Dec 4, 2024 · 1 comment

CharlieJCJ changed the title from "OnlineRequestProcessor V2" to "OnlineRequestProcessor V2 Megathread" on Dec 4, 2024
RyanMarten (Contributor) commented Dec 17, 2024

Other things I have been thinking about:

  • context manager
  • retries
  • generation config

These are all variables we want control over, and it would be good to have nice, clean interfaces and abstractions instead of hundreds of arguments to the LLM class.
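As a rough sketch of what that could look like, the variables discussed below could be grouped into a few small config objects. All of the class and field names here are hypothetical, not the current curator API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Type

from pydantic import BaseModel


@dataclass
class GenerationConfig:
    """Arguments forwarded to the completions API (these feed the cache hash)."""
    model_name: str
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    presence_penalty: Optional[float] = None
    frequency_penalty: Optional[float] = None
    # catch-all for provider-specific params (e.g. Gemini safety settings, OpenAI strict)
    extra_body: dict = field(default_factory=dict)


@dataclass
class BackendConfig:
    """Operational knobs users shouldn't have to think about (excluded from the hash)."""
    backend: str = "openai"
    batch: bool = False
    batch_size: int = 1_000
    batch_check_interval: int = 60
    max_retries: int = 10
    max_requests_per_minute: Optional[int] = None
    max_tokens_per_minute: Optional[int] = None
    require_all_responses: bool = True


class LLM:
    """Hypothetical constructor: three map-defining arguments plus two config objects."""

    def __init__(
        self,
        prompt_func: Callable,
        parse_func: Optional[Callable] = None,
        response_format: Optional[Type[BaseModel]] = None,
        generation_config: Optional[GenerationConfig] = None,
        backend_config: Optional[BackendConfig] = None,
    ):
        self.prompt_func = prompt_func
        self.parse_func = parse_func
        self.response_format = response_format
        self.generation_config = generation_config
        self.backend_config = backend_config or BackendConfig()
```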

Let's break these variables down by category. Curator's abstraction is to create a dataset from the input dataset, the mapping, and the relevant generation config. The user doesn't really care how these get turned into requests and how those get turned into responses; they just want that dataset created accurately and as quickly as possible.

Ones that don't exist yet are labeled in bold.

LLM map definition (these should change the hash)

  • prompt_func
  • parse_func
  • response_format

curator configuration (things users shouldn't have to care or know about - these shouldn't change the hash)

  • batch (this one does change the hash because we don't have resuming from online to batch, or vice versa, implemented)
  • batch_size
  • batch_check_interval
  • delete_successful_batch_files
  • delete_failed_batch_files
  • max_retries
  • require_all_responses (this one is also special in that we don't really expect it to change between re-runs)
  • backend
  • max_requests_per_minute
  • max_tokens_per_minute
  • rate limit logic
  • retry logic

generation configuration (arguments that go to the API - these should also change the hash; a sketch of how they could feed the hash follows this list)

  • model_name
  • response_format
  • temperature
  • top_p
  • presence_penalty
  • frequency_penalty
  • anything else you can possibly pass in the JSON body of a request to a completions API
  • things like Gemini safety filters
  • things like OpenAI strict: true
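Under that split, the cache key would be derived from the map definition and the generation config only, plus the batch flag for the reason noted above. A minimal sketch, assuming the hypothetical config objects from the earlier snippet and a pydantic response_format:

```python
import hashlib
import inspect
import json


def compute_fingerprint(llm) -> str:
    """Hypothetical cache key: map definition + generation config (+ batch flag)."""
    parts = [
        inspect.getsource(llm.prompt_func),
        inspect.getsource(llm.parse_func) if llm.parse_func else "",
        json.dumps(llm.response_format.model_json_schema(), sort_keys=True)
        if llm.response_format
        else "",
        json.dumps(vars(llm.generation_config), sort_keys=True, default=str),
        # batch currently has to change the hash because resuming from online to
        # batch (or vice versa) isn't supported; nothing else from BackendConfig
        # is included.
        str(llm.backend_config.batch),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```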

More on retry logic: which types of exceptions to retry on, and the retry behavior (exponential backoff, etc.). LiteLLM has exception classes we can use that cover failures in the request --> response path. For the raw HTTP processor (OpenAI online), we could maybe read responses into OpenAI / LiteLLM objects to get the same exceptions there. But there are also Curator exception classes to create (e.g. for which finish reasons are acceptable). More thinking in #261. It would be good to have the Curator exceptions also be used when resubmitting batches, as discussed in #226.
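A minimal sketch of the shape this could take, assuming tenacity for the backoff policy and a hypothetical InvalidFinishReasonError Curator exception; the particular selection of LiteLLM exception classes below is only illustrative:

```python
import litellm
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class InvalidFinishReasonError(Exception):
    """Hypothetical Curator exception: the request succeeded, but the finish reason is not acceptable."""


# LiteLLM maps provider errors onto OpenAI-style exception classes,
# so the retry policy can be provider-agnostic.
RETRIABLE_EXCEPTIONS = (
    litellm.exceptions.RateLimitError,
    litellm.exceptions.Timeout,
    litellm.exceptions.APIConnectionError,
    litellm.exceptions.ServiceUnavailableError,
    InvalidFinishReasonError,
)


@retry(
    retry=retry_if_exception_type(RETRIABLE_EXCEPTIONS),
    wait=wait_exponential(multiplier=1, max=60),  # exponential backoff, capped at 60s
    stop=stop_after_attempt(10),
)
async def process_request(request: dict):
    response = await litellm.acompletion(**request)
    finish_reason = response.choices[0].finish_reason
    if finish_reason not in ("stop", "tool_calls"):  # e.g. length, content_filter
        raise InvalidFinishReasonError(f"finish_reason={finish_reason}")
    return response
```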

More on rate limit logic: allowing for automatic rate limit detection, as discussed in #233, and also allowing for other types of rate limits, like connection count, as discussed in #253.
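For automatic detection, one option is to issue a tiny probe request up front and read the account's limits from the response headers. A sketch, assuming the OpenAI backend and its standard x-ratelimit-* headers (the probe model name is just a placeholder):

```python
import aiohttp


async def detect_openai_rate_limits(api_key: str, model: str = "gpt-4o-mini") -> tuple[int, int]:
    """Read requests-per-minute and tokens-per-minute limits from response headers."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": "hi"}],
                "max_tokens": 1,
            },
        ) as resp:
            rpm = int(resp.headers.get("x-ratelimit-limit-requests", 0))
            tpm = int(resp.headers.get("x-ratelimit-limit-tokens", 0))
            return rpm, tpm
```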

More on the context manager: proposed in #250, reverted in #254.
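For reference, the surface would look something like the sketch below; it only shows the context-manager shape, and the actual resource handling (HTTP session, temporary request/response files) is elided:

```python
class LLM:
    """Sketch of the context-manager surface only; constructor as in the earlier snippet."""

    def __enter__(self):
        self._open_resources()  # e.g. HTTP session, temp request/response files
        return self

    def __exit__(self, exc_type, exc, tb):
        self._close_resources()  # always runs, even if generation raised
        return False  # don't swallow exceptions

    def _open_resources(self) -> None:
        pass  # placeholder

    def _close_resources(self) -> None:
        pass  # placeholder


# usage:
# with LLM(prompt_func=my_prompt_func, generation_config=my_generation_config) as llm:
#     dataset = llm(input_dataset)
```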
