
Add more model support with liteLLM #74

Closed
RyanMarten opened this issue Nov 12, 2024 · 10 comments · Fixed by #141
RyanMarten (Contributor) opened the issue:

Have a default litellm OnlineRequestProcessor that acts as a catch-all for any provider we haven't optimized or implemented ourselves.

RyanMarten commented Nov 12, 2024:

#62 (comment)

https://docs.litellm.ai/docs/completion/batching#send-multiple-completion-calls-to-1-model
LiteLLM also supports messages as a list of lists, sending multiple completion calls to one model in a single request.
We should benchmark this against running a threadpool over async completions with litellm.
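A minimal harness for that benchmark could look like the sketch below. `fake_completion` is a stand-in for `litellm.completion` / `litellm.acompletion` (the real calls need API keys and network access, so a sleeping stub mimics latency here); the timing comparison is the point, not the stub.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def fake_completion(messages):
    # Stand-in for litellm.completion(model=..., messages=messages);
    # sleeps briefly to mimic network latency.
    time.sleep(0.01)
    return {"choices": [{"message": {"content": f"reply to {messages[-1]['content']}"}}]}

def run_threadpool(batches, max_workers=8):
    # Threadpool over independent completion calls.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_completion, batches))

async def run_async(batches):
    # asyncio.to_thread as a stand-in for awaiting litellm.acompletion.
    return await asyncio.gather(*(asyncio.to_thread(fake_completion, b) for b in batches))

if __name__ == "__main__":
    batches = [[{"role": "user", "content": f"q{i}"}] for i in range(20)]
    t0 = time.perf_counter()
    pool_results = run_threadpool(batches)
    t1 = time.perf_counter()
    async_results = asyncio.run(run_async(batches))
    t2 = time.perf_counter()
    print(f"threadpool: {t1 - t0:.3f}s, async: {t2 - t1:.3f}s, n={len(pool_results)}")
```

Swapping the stub for real litellm calls would give the actual numbers; the list-of-lists batching path from the docs above would be a third arm of the benchmark.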

RyanMarten commented:

Also, we'll want to think about whether we want to / can accommodate the instructor + litellm integration:
https://docs.litellm.ai/docs/completion/input
for structured output.
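As a sketch of what structured output via a tool schema could look like, here is a hypothetical `Recipe` tool definition (the field names mirror the example response dump later in this thread; the exact request shape litellm expects may differ) plus a small parser for the model's tool-call arguments:

```python
import json

# Hypothetical tool schema for structured output; field names mirror the
# Recipe example shown later in this thread and are illustrative only.
recipe_tool = {
    "name": "Recipe",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Title of the recipe"},
            "ingredients": {"type": "array", "items": {"type": "string"}},
            "instructions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["ingredients", "instructions", "title"],
    },
}

def parse_tool_arguments(arguments_json: str, tool: dict) -> dict:
    # Parse the JSON arguments string from a tool call and check that
    # all fields the schema marks as required are present.
    data = json.loads(arguments_json)
    missing = [k for k in tool["input_schema"]["required"] if k not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

A library like instructor would generate this schema from a Pydantic model instead of writing it by hand.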

RyanMarten commented Nov 12, 2024:

Even outside of a full LiteLLMRequestProcessor, we can use their nice token counter
https://docs.litellm.ai/docs/completion/token_usage#3-token_counter

for rate limiting,

and
https://docs.litellm.ai/docs/completion/token_usage#6-completion_cost
https://docs.litellm.ai/docs/completion/token_usage#8-model_cost
https://docs.litellm.ai/docs/completion/token_usage#9-register_model
for tracking cost,

in OpenAIOnline/BatchRequestProcessor.
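A sliding-window tokens-per-minute limiter built on top of such a counter might look like this sketch. The `count_tokens` callable is pluggable so something like `litellm.token_counter(model=..., messages=...)` could be dropped in; a stub counter is used here so the example runs offline, and the class name is ours, not litellm's.

```python
import time

class TokenRateLimiter:
    """Sliding-window tokens-per-minute limiter (sketch).

    count_tokens is pluggable: pass a wrapper around litellm's token
    counter in production; a stub works for offline testing.
    """

    def __init__(self, tokens_per_minute, count_tokens, clock=time.monotonic):
        self.tpm = tokens_per_minute
        self.count_tokens = count_tokens
        self.clock = clock
        self.events = []  # (timestamp, tokens) pairs inside the window

    def _used(self, now):
        # Drop events older than 60s, then sum remaining token usage.
        self.events = [(t, n) for t, n in self.events if now - t < 60.0]
        return sum(n for _, n in self.events)

    def try_acquire(self, messages):
        # Record the request and return True if it fits in the window,
        # otherwise return False so the caller can back off.
        need = self.count_tokens(messages)
        now = self.clock()
        if self._used(now) + need > self.tpm:
            return False
        self.events.append((now, need))
        return True
```

The completion_cost/model_cost hooks would slot in similarly: accumulate `completion_cost(response)` per request to track spend.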

RyanMarten commented Nov 12, 2024:

We should think about whether we want to route more things toward our OpenAIOnlineBatchRequestProcessor and make it more general, covering anything with the same request and response format (example: vllm,
https://github.com/bespokelabsai/curator/pull/78/files),

or whether we instead want to default heavily to litellm.

LiteLLM runs into a max of 50 requests per second. Trung opened an issue here:
BerriAI/litellm#6592

RyanMarten commented:

https://docs.litellm.ai/docs/routing
This would be very useful for getting tokens from a bunch of providers at the same time.

CharlieJCJ self-assigned this Nov 13, 2024
CharlieJCJ (Contributor) commented Nov 14, 2024

CharlieJCJ commented Nov 19, 2024:

For documentation purposes, here's the native way to get response cost from litellm, using hidden_params:
[screenshot: reading the response cost via hidden_params]

CharlieJCJ commented Nov 19, 2024:

And an example data schema from the response object:
ModelResponse(id='chatcmpl-a1084780-9c3f-4121-8c75-963e7b76854d', created=1731976691, model='claude-3-haiku-20240307', object='chat.completion', system_fingerprint=None, choices=[Choices(finish_reason='tool_calls', index=0, message=Message(content=None, role='assistant', tool_calls=[ChatCompletionMessageToolCall(index=0, function=Function(arguments='{"title": "Brazilian Feijoada", "ingredients": ["black beans", "pork shoulder", "smoked sausage", "bacon", "onion", "garlic", "bay leaves", "orange slices", "cilantro"], "instructions": ["1. Soak the black beans overnight.", "2. In a large pot, cook the pork shoulder, smoked sausage, and bacon until browned.", "3. Add the onion and garlic and cook until softened.", "4. Drain and rinse the black beans, then add them to the pot along with the bay leaves.", "5. Cover with water and simmer for 2-3 hours, until the beans are very soft.", "6. Season with salt and pepper to taste.", "7. Serve the feijoada hot, garnished with orange slices and chopped cilantro."]}', name='Recipe'), id='toolu_01MhCsMGVuA3uNZFrsDmiVWB', type='function')], function_call=None))], usage=CompletionUsage(completion_tokens=249, prompt_tokens=538, total_tokens=787, completion_tokens_details=None, prompt_tokens_details=None))
And this is resp._hidden_params:
{'additional_headers': {'x-ratelimit-limit-requests': '4000', 'x-ratelimit-remaining-requests': '3999', 'x-ratelimit-limit-tokens': '400000', 'x-ratelimit-remaining-tokens': '400000', 'llm_provider-date': 'Tue, 19 Nov 2024 00:38:51 GMT', 'llm_provider-content-type': 'application/json', 'llm_provider-transfer-encoding': 'chunked', 'llm_provider-connection': 'keep-alive', 'llm_provider-anthropic-ratelimit-requests-limit': '4000', 'llm_provider-anthropic-ratelimit-requests-remaining': '3999', 'llm_provider-anthropic-ratelimit-requests-reset': '2024-11-19T00:38:48Z', 'llm_provider-anthropic-ratelimit-tokens-limit': '400000', 'llm_provider-anthropic-ratelimit-tokens-remaining': '400000', 'llm_provider-anthropic-ratelimit-tokens-reset': '2024-11-19T00:38:51Z', 'llm_provider-request-id': 'req_01EajWVBZtXQ5B8wKAVkLmgo', 'llm_provider-via': '1.1 google', 'llm_provider-cf-cache-status': 'DYNAMIC', 'llm_provider-x-robots-tag': 'none', 'llm_provider-server': 'cloudflare', 'llm_provider-cf-ray': '8e4c23bb2dbe29d0-ORD', 'llm_provider-content-encoding': 'gzip', 'llm_provider-x-ratelimit-limit-requests': '4000', 'llm_provider-x-ratelimit-remaining-requests': '3999', 'llm_provider-x-ratelimit-limit-tokens': '400000', 'llm_provider-x-ratelimit-remaining-tokens': '400000'}, 'optional_params': {'tools': [{'name': 'Recipe', 'input_schema': {'properties': {'title': {'description': 'Title of the recipe', 'title': 'Title', 'type': 'string'}, 'ingredients': {'description': 'List of ingredients needed', 'items': {'type': 'string'}, 'title': 'Ingredients', 'type': 'array'}, 'instructions': {'description': 'Step by step cooking instructions', 'items': {'type': 'string'}, 'title': 'Instructions', 'type': 'array'}}, 'required': ['ingredients', 'instructions', 'title'], 'type': 'object'}, 'description': 'Correctly extractedRecipewith all the required parameters with correct types'}], 'tool_choice': {'type': 'tool', 'name': 'Recipe'}}, 'model_id': None, 'api_base': None, 'response_cost': 
0.0005682500000000001}
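A small helper to pull the cost and remaining rate-limit headroom out of a dict shaped like the `_hidden_params` dump above; the key and header names follow that dump, and the function name is ours:

```python
def extract_cost_and_limits(hidden_params: dict) -> dict:
    # Pull response cost and remaining rate-limit headroom out of a
    # _hidden_params dict shaped like the dump above. Header names
    # follow that dump; missing headers default to 0.
    headers = hidden_params.get("additional_headers", {})
    return {
        "response_cost": hidden_params.get("response_cost"),
        "remaining_requests": int(headers.get("x-ratelimit-remaining-requests", 0)),
        "remaining_tokens": int(headers.get("x-ratelimit-remaining-tokens", 0)),
    }
```

This is the kind of per-response bookkeeping the cost-tracking and rate-limiting comments above would feed on.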
