
Add more model support with liteLLM #74

Closed
RyanMarten opened this issue Nov 12, 2024 · 10 comments · Fixed by #141
RyanMarten (Contributor) opened the issue:

Have a default litellm OnlineRequestProcessor that acts as a catch-all for any provider we haven't optimized or implemented ourselves.

RyanMarten commented Nov 12, 2024:

#62 (comment)

https://docs.litellm.ai/docs/completion/batching#send-multiple-completion-calls-to-1-model
LiteLLM also supports messages as a list of lists, sending multiple completion calls to one model in a single request.
We should benchmark this against running a threadpool over async completions with litellm.
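A minimal harness for that benchmark could look like the sketch below. `fake_completion` is a stand-in for `litellm.completion` / `litellm.acompletion` (the real calls need API keys and network access, so a sleeping stub mimics latency here); the timing comparison is the point, not the stub.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def fake_completion(messages):
    # Stand-in for litellm.completion(model=..., messages=messages);
    # sleeps briefly to mimic network latency.
    time.sleep(0.01)
    return {"choices": [{"message": {"content": f"reply to {messages[-1]['content']}"}}]}

def run_threadpool(batches, max_workers=8):
    # Threadpool over independent completion calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_completion, batches))

async def run_async(batches):
    # asyncio.to_thread as a stand-in for awaiting litellm.acompletion.
    return await asyncio.gather(*(asyncio.to_thread(fake_completion, b) for b in batches))

if __name__ == "__main__":
    batches = [[{"role": "user", "content": f"q{i}"}] for i in range(20)]
    t0 = time.perf_counter()
    pool_results = run_threadpool(batches)
    t1 = time.perf_counter()
    async_results = asyncio.run(run_async(batches))
    t2 = time.perf_counter()
    print(f"threadpool: {t1 - t0:.3f}s, async: {t2 - t1:.3f}s, n={len(pool_results)}")
```

Swapping the stub for real litellm calls would give the actual numbers; the list-of-lists batching path from the docs above would be a third arm of the benchmark.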

RyanMarten commented:

Also, we'll want to think about whether we want to / can accommodate the instructor + litellm integration:
https://docs.litellm.ai/docs/completion/input
for structured output.
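As a sketch of what structured output via a tool schema could look like, here is a hypothetical `Recipe` tool definition (the field names mirror the example response dump later in this thread; the exact request shape litellm expects may differ) plus a small parser for the model's tool-call arguments:

```python
import json

# Hypothetical tool schema for structured output; field names mirror the
# Recipe example shown later in this thread and are illustrative only.
recipe_tool = {
    "name": "Recipe",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Title of the recipe"},
            "ingredients": {"type": "array", "items": {"type": "string"}},
            "instructions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["ingredients", "instructions", "title"],
    },
}

def parse_tool_arguments(arguments_json: str, tool: dict) -> dict:
    # Parse the JSON arguments string from a tool call and check that
    # all fields the schema marks as required are present.
    data = json.loads(arguments_json)
    missing = [k for k in tool["input_schema"]["required"] if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

A library like instructor would generate this schema from a Pydantic model instead of writing it by hand.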

RyanMarten commented Nov 12, 2024:

Even outside of a full LiteLLMRequestProcessor, we can use their nice token counter
https://docs.litellm.ai/docs/completion/token_usage#3-token_counter

for rate limiting,

and
https://docs.litellm.ai/docs/completion/token_usage#6-completion_cost
https://docs.litellm.ai/docs/completion/token_usage#8-model_cost
https://docs.litellm.ai/docs/completion/token_usage#9-register_model
for tracking cost,

in OpenAIOnline/BatchRequestProcessor.
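A sliding-window tokens-per-minute limiter built on top of such a counter might look like this sketch. The `count_tokens` callable is pluggable so something like `litellm.token_counter(model=..., messages=...)` could be dropped in; a stub counter is used here so the example runs offline, and the class name is ours, not litellm's.

```python
import time

class TokenRateLimiter:
    """Sliding-window tokens-per-minute limiter (sketch).

    count_tokens is pluggable: pass a wrapper around litellm's token
    counter in production; a stub works for offline testing.
    """

    def __init__(self, tokens_per_minute, count_tokens, clock=time.monotonic):
        self.tpm = tokens_per_minute
        self.count_tokens = count_tokens
        self.clock = clock
        self.events = []  # (timestamp, tokens) pairs inside the window

    def _used(self, now):
        # Drop events older than 60s, then sum remaining token usage.
        self.events = [(t, n) for t, n in self.events if now - t < 60.0]
        return sum(n for _, n in self.events)

    def try_acquire(self, messages):
        # Record the request and return True if it fits in the window,
        # otherwise return False so the caller can back off.
        need = self.count_tokens(messages)
        now = self.clock()
        if self._used(now) + need > self.tpm:
            return False
        self.events.append((now, need))
        return True
```

The completion_cost/model_cost hooks would slot in similarly: accumulate `completion_cost(response)` per request to track spend.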

RyanMarten commented Nov 12, 2024:

We should think about whether we want to route more things toward our OpenAIOnlineBatchRequestProcessor and make it more general, covering anything with the same request and response format (example: vllm,
https://github.com/bespokelabsai/curator/pull/78/files),

or whether we instead want to default heavily to litellm.

LiteLLM runs into a max of 50 requests per second. Trung opened an issue here:
BerriAI/litellm#6592

RyanMarten commented:

https://docs.litellm.ai/docs/routing
This would be very useful for getting tokens from a bunch of providers at the same time.

CharlieJCJ self-assigned this Nov 13, 2024
CharlieJCJ (Contributor) commented Nov 14, 2024

CharlieJCJ commented Nov 19, 2024:

For documentation purposes, here's the native way to get response cost from litellm, using hidden_params:
[screenshot: reading the response cost via hidden_params]

CharlieJCJ commented Nov 19, 2024:

And an example data schema from the response object:
ModelResponse(id='chatcmpl-a1084780-9c3f-4121-8c75-963e7b76854d', created=1731976691, model='claude-3-haiku-20240307', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content=None, role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=0, function=Function(arguments='{"title": "Brazilian Feijoada", "ingredients": ["black beans", "pork shoulder", "smoked sausage", "bacon", "onion", "garlic", "bay leaves", "orange slices", "cilantro"], "instructions": ["1. Soak the black beans overnight.", "2. In a large pot, cook the pork shoulder, smoked sausage, and bacon until browned.", "3. Add the onion and garlic and cook until softened.", "4. Drain and rinse the black beans, then add them to the pot along with the bay leaves.", "5. Cover with water and simmer for 2-3 hours, until the beans are very soft.", "6. Season with salt and pepper to taste.", "7. Serve the feijoada hot, garnished with orange slices and chopped cilantro."]}', name='Recipe'), id='toolu_01MhCsMGVuA3uNZFrsDmiVWB', type='function')], function_call=None))], usage=CompletionUsage(completion_tokens=249, prompt_tokens=538, total_tokens=787, completion_tokens_details=None, prompt_tokens_details=None))
And this is resp._hidden_params:
{'additional_headers': {'x-ratelimit-limit-requests': '4000', 'x-ratelimit-remaining-requests': '3999', 'x-ratelimit-limit-tokens': '400000', 'x-ratelimit-remaining-tokens': '400000', 'llm_provider-date': 'Tue, 19 Nov 2024 00:38:51 GMT', 'llm_provider-content-type': 'application/json', 'llm_provider-transfer-encoding': 'chunked', 'llm_provider-connection': 'keep-alive', 'llm_provider-anthropic-ratelimit-requests-limit': '4000', 'llm_provider-anthropic-ratelimit-requests-remaining': '3999', 'llm_provider-anthropic-ratelimit-requests-reset': '2024-11-19T00:38:48Z', 'llm_provider-anthropic-ratelimit-tokens-limit': '400000', 'llm_provider-anthropic-ratelimit-tokens-remaining': '400000', 'llm_provider-anthropic-ratelimit-tokens-reset': '2024-11-19T00:38:51Z', 'llm_provider-request-id': 'req_01EajWVBZtXQ5B8wKAVkLmgo', 'llm_provider-via': '1.1 google', 'llm_provider-cf-cache-status': 'DYNAMIC', 'llm_provider-x-robots-tag': 'none', 'llm_provider-server': 'cloudflare', 'llm_provider-cf-ray': '8e4c23bb2dbe29d0-ORD', 'llm_provider-content-encoding': 'gzip', 'llm_provider-x-ratelimit-limit-requests': '4000', 'llm_provider-x-ratelimit-remaining-requests': '3999', 'llm_provider-x-ratelimit-limit-tokens': '400000', 'llm_provider-x-ratelimit-remaining-tokens': '400000'}, 'optional_params': {'tools': [{'name': 'Recipe', 'input_schema': {'properties': {'title': {'description': 'Title of the recipe', 'title': 'Title', 'type': 'string'}, 'ingredients': {'description': 'List of ingredients needed', 'items': {'type': 'string'}, 'title': 'Ingredients', 'type': 'array'}, 'instructions': {'description': 'Step by step cooking instructions', 'items': {'type': 'string'}, 'title': 'Instructions', 'type': 'array'}}, 'required': ['ingredients', 'instructions', 'title'], 'type': 'object'}, 'description': 'Correctly extractedRecipewith all the required parameters with correct types'}], 'tool_choice': {'type': 'tool', 'name': 'Recipe'}}, 'model_id': None, 'api_base': None, 'response_cost': 
0.0005682500000000001}
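A small helper to pull the cost and remaining rate-limit headroom out of a dict shaped like the `_hidden_params` dump above; the key and header names follow that dump, and the function name is ours:

```python
def extract_cost_and_limits(hidden_params: dict) -> dict:
    # Pull response cost and remaining rate-limit headroom out of a
    # _hidden_params dict shaped like the dump above. Header names
    # follow that dump; missing headers default to 0.
    headers = hidden_params.get("additional_headers", {})
    return {
        "response_cost": hidden_params.get("response_cost"),
        "remaining_requests": int(headers.get("x-ratelimit-remaining-requests", 0)),
        "remaining_tokens": int(headers.get("x-ratelimit-remaining-tokens", 0)),
    }
```

This is the kind of per-response bookkeeping the cost-tracking and rate-limiting comments above would feed on.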
