OnlineRequestProcessor V2 Megathread #204

Open
1 of 4 tasks
CharlieJCJ opened this issue Dec 4, 2024 · 1 comment

CharlieJCJ changed the title from "OnlineRequestProcessor V2" to "OnlineRequestProcessor V2 Megathread" on Dec 4, 2024
RyanMarten (Contributor) commented Dec 17, 2024

Other things I have been thinking about:

  • context manager
  • retries
  • generation config

These are all variables we want control over, and it would be good to have nice, clean interfaces and abstractions instead of hundreds of arguments to the LLM class.
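As a rough sketch of what that could look like, the variables discussed below could be grouped into a few small config objects. All of the class and field names here are hypothetical, not the current curator API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Type

from pydantic import BaseModel


@dataclass
class GenerationConfig:
    """Arguments forwarded to the completions API (these feed the cache hash)."""
    model_name: str
    temperature: Optional[float] = None
    top_p: Optional[float] = None
    presence_penalty: Optional[float] = None
    frequency_penalty: Optional[float] = None
    # catch-all for provider-specific params (e.g. Gemini safety settings, OpenAI strict)
    extra_body: dict = field(default_factory=dict)


@dataclass
class BackendConfig:
    """Operational knobs users shouldn't have to think about (excluded from the hash)."""
    backend: str = "openai"
    batch: bool = False
    batch_size: int = 1_000
    batch_check_interval: int = 60
    max_retries: int = 10
    max_requests_per_minute: Optional[int] = None
    max_tokens_per_minute: Optional[int] = None
    require_all_responses: bool = True


class LLM:
    """Hypothetical constructor: three map-defining arguments plus two config objects."""

    def __init__(
        self,
        prompt_func: Callable,
        parse_func: Optional[Callable] = None,
        response_format: Optional[Type[BaseModel]] = None,
        generation_config: Optional[GenerationConfig] = None,
        backend_config: Optional[BackendConfig] = None,
    ):
        self.prompt_func = prompt_func
        self.parse_func = parse_func
        self.response_format = response_format
        self.generation_config = generation_config
        self.backend_config = backend_config or BackendConfig()
```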

Let's break these variables down by category. Curator's abstraction is to create a dataset from the input dataset, the mapping, and the relevant generation config. The user doesn't really care how these get turned into requests and how those get turned into responses; they just want that dataset created accurately and as quickly as possible.

Ones that don't exist yet are labeled in bold.

LLM map definition (these should change the hash)

  • prompt_func
  • parse_func
  • response_format

curator configuration (things users shouldn't have to care or know about - these shouldn't change the hash)

  • batch (this one does change the hash because we don't have resuming from online to batch, or vice versa, implemented)
  • batch_size
  • batch_check_interval
  • delete_successful_batch_files
  • delete_failed_batch_files
  • max_retries
  • require_all_responses (this one is also special in that we don't really expect it to change between re-runs)
  • backend
  • max_requests_per_minute
  • max_tokens_per_minute
  • rate limit logic
  • retry logic

generation configuration (arguments that go to the API - these should also change the hash; a sketch of how they could feed the hash follows this list)

  • model_name
  • response_format
  • temperature
  • top_p
  • presence_penalty
  • frequency_penalty
  • anything else you can possibly pass in the JSON body of a request to a completions API
  • things like Gemini safety filters
  • things like OpenAI strict: true
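Under that split, the cache key would be derived from the map definition and the generation config only, plus the batch flag for the reason noted above. A minimal sketch, assuming the hypothetical config objects from the earlier snippet and a pydantic response_format:

```python
import hashlib
import inspect
import json


def compute_fingerprint(llm) -> str:
    """Hypothetical cache key: map definition + generation config (+ batch flag)."""
    parts = [
        inspect.getsource(llm.prompt_func),
        inspect.getsource(llm.parse_func) if llm.parse_func else "",
        json.dumps(llm.response_format.model_json_schema(), sort_keys=True)
        if llm.response_format
        else "",
        json.dumps(vars(llm.generation_config), sort_keys=True, default=str),
        # batch currently has to change the hash because resuming from online to
        # batch (or vice versa) isn't supported; nothing else from BackendConfig
        # is included.
        str(llm.backend_config.batch),
    ]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```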

More on retry logic: which types of exceptions to retry on, and the retry behavior (exponential backoff, etc.). LiteLLM has exception classes we can use that cover failures in the request --> response path. For the raw HTTP processor (OpenAI online), we could maybe read responses into OpenAI / LiteLLM objects to get the same exceptions there. But there are also Curator exception classes to create (e.g. for which finish reasons are acceptable). More thinking in #261. It would be good to have the Curator exceptions also be used when resubmitting batches, as discussed in #226.
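A minimal sketch of the shape this could take, assuming tenacity for the backoff policy and a hypothetical InvalidFinishReasonError Curator exception; the particular selection of LiteLLM exception classes below is only illustrative:

```python
import litellm
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class InvalidFinishReasonError(Exception):
    """Hypothetical Curator exception: the request succeeded, but the finish reason is not acceptable."""


# LiteLLM maps provider errors onto OpenAI-style exception classes,
# so the retry policy can be provider-agnostic.
RETRIABLE_EXCEPTIONS = (
    litellm.exceptions.RateLimitError,
    litellm.exceptions.Timeout,
    litellm.exceptions.APIConnectionError,
    litellm.exceptions.ServiceUnavailableError,
    InvalidFinishReasonError,
)


@retry(
    retry=retry_if_exception_type(RETRIABLE_EXCEPTIONS),
    wait=wait_exponential(multiplier=1, max=60),  # exponential backoff, capped at 60s
    stop=stop_after_attempt(10),
)
async def process_request(request: dict):
    response = await litellm.acompletion(**request)
    finish_reason = response.choices[0].finish_reason
    if finish_reason not in ("stop", "tool_calls"):  # e.g. length, content_filter
        raise InvalidFinishReasonError(f"finish_reason={finish_reason}")
    return response
```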

More on rate limit logic: allowing for automatic rate limit detection, as discussed in #233, and also allowing for other types of rate limits, like connection count, as discussed in #253.
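For automatic detection, one option is to issue a tiny probe request up front and read the account's limits from the response headers. A sketch, assuming the OpenAI backend and its standard x-ratelimit-* headers (the probe model name is just a placeholder):

```python
import aiohttp


async def detect_openai_rate_limits(api_key: str, model: str = "gpt-4o-mini") -> tuple[int, int]:
    """Read requests-per-minute and tokens-per-minute limits from response headers."""
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": model,
                "messages": [{"role": "user", "content": "hi"}],
                "max_tokens": 1,
            },
        ) as resp:
            rpm = int(resp.headers.get("x-ratelimit-limit-requests", 0))
            tpm = int(resp.headers.get("x-ratelimit-limit-tokens", 0))
            return rpm, tpm
```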

More on the context manager: proposed in #250, reverted in #254.
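For reference, the surface would look something like the sketch below; it only shows the context-manager shape, and the actual resource handling (HTTP session, temporary request/response files) is elided:

```python
class LLM:
    """Sketch of the context-manager surface only; constructor as in the earlier snippet."""

    def __enter__(self):
        self._open_resources()  # e.g. HTTP session, temp request/response files
        return self

    def __exit__(self, exc_type, exc, tb):
        self._close_resources()  # always runs, even if generation raised
        return False  # don't swallow exceptions

    def _open_resources(self) -> None:
        pass  # placeholder

    def _close_resources(self) -> None:
        pass  # placeholder


# usage:
# with LLM(prompt_func=my_prompt_func, generation_config=my_generation_config) as llm:
#     dataset = llm(input_dataset)
```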
