
Add LiteLLM+instructor (for structured output) backend for curator #141

Merged · 74 commits into dev · Dec 4, 2024

Conversation

@CharlieJCJ (Contributor) commented Nov 18, 2024

Closes: #74
Closes #179
Closes #164

Changes:

  • Added two example scripts illustrating litellm usage in plain prompting and structured output modes:
    /examples/litellm_recipe_prompting.py
    /examples/litellm_recipe_structured_output.py (Note: OpenAI and Anthropic API keys must be set in the environment.)
  • Added a backend parameter to Prompter (code link); it currently defaults to OpenAI. See the usage sketch after the tested-models list below.
  • Integrated with instructor for structured output support across a wide range of models. For models that can't use litellm + instructor for structured output, a try/except block runs before dataset generation to check whether instructor works on a simple example (code link); see the first sketch after this list.
  • Added time and cost logging (cost via litellm.completion_cost, when the model's pricing is in the community-maintained mapping here).
  • Added estimate_total_tokens, which includes estimate_output_tokens; the latter is derived from get_max_tokens, the maximum output tokens of the specified model (code link). See the second sketch after this list.
  • Uses the same async retry strategy as the OpenAI Online Request Processor.
  • Reads litellm rate limits from the hidden params dict, using the x-ratelimit-limit-requests and x-ratelimit-limit-tokens headers for rpm and tpm.
  • litellm refactoring base online request processor #188
    • Includes the breakdown of the new abstract class, and instructions for any new subclass of OnlineRequestProcessor.
  • Implemented a robust check_structured_output_support for the OpenAI online request processor.
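
Below is a minimal sketch of the pre-generation structured output check described above, assuming instructor's from_litellm wrapper; the TestResponse model and function shape are illustrative, not the PR's actual code:

```python
# Illustrative sketch of the pre-generation structured output probe; not the
# PR's actual implementation. Requires the `litellm` and `instructor` packages.
import instructor
import litellm
from pydantic import BaseModel


class TestResponse(BaseModel):
    # Hypothetical minimal schema, used only to probe structured output support.
    message: str


def check_structured_output_support(model: str) -> bool:
    """Return True if litellm + instructor yields structured output for `model`."""
    client = instructor.from_litellm(litellm.completion)
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Say 'hi'."}],
            response_model=TestResponse,
        )
        return isinstance(response, TestResponse)
    except Exception:
        # Any failure means this model/provider combination does not support
        # structured output through litellm + instructor.
        return False
```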
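
A second sketch covers the token estimation, cost logging, and rate-limit discovery described above. litellm.get_max_tokens and litellm.completion_cost are real litellm utilities; the "additional_headers" key inside the hidden params dict is an assumption about litellm's response metadata layout:

```python
import litellm


def estimate_output_tokens(model: str) -> int:
    # get_max_tokens returns the model's maximum output tokens when the model
    # is present in litellm's community-maintained model map.
    return litellm.get_max_tokens(model) or 0


def estimate_total_tokens(model: str, input_tokens: int) -> int:
    # Total = actual input tokens + worst-case output tokens for the model.
    return input_tokens + estimate_output_tokens(model)


response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

# Cost logging works only if the model's pricing is in the mapping.
cost = litellm.completion_cost(completion_response=response)

# Rate limits come from the hidden params dict; the "additional_headers" key
# is an assumption about where litellm surfaces raw response headers.
headers = getattr(response, "_hidden_params", {}).get("additional_headers", {})
rpm = headers.get("x-ratelimit-limit-requests")
tpm = headers.get("x-ratelimit-limit-tokens")
print(f"cost=${cost:.6f} rpm={rpm} tpm={tpm}")
```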

Future Works:

  • Support a baseline structured output strategy for models from inference platforms whose structured output litellm does not support. Right now, the user is told directly that the current model doesn't support structured output. Note that litellm's structured output model/provider coverage isn't good (link); this needs additional research.
  • More performance optimization for auto rate limiting / retry strategies; needs battle testing, experiments, and comparisons.

Example curator-viewer view: [screenshot]

Tested on the following models; all work with litellm + instructor structured output.

"claude-3-5-sonnet-20240620", # https://docs.litellm.ai/docs/providers/anthropic # anthropic has a different hidden param tokens structure. 
"claude-3-5-haiku-20241022",
"claude-3-haiku-20240307",
"claude-3-opus-20240229",
"claude-3-sonnet-20240229",
"gpt-4o-mini", # https://docs.litellm.ai/docs/providers/openai
"gpt-4o-2024-08-06",
"gpt-4-0125-preview",
"gpt-3.5-turbo-1106",
"gemini/gemini-1.5-flash", # https://docs.litellm.ai/docs/providers/gemini; https://ai.google.dev/gemini-api/docs/models # 20-30 iter/s
"gemini/gemini-1.5-pro", # 20-30 iter/s
"together_ai/meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", # https://docs.together.ai/docs/serverless-models
"together_ai/meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
"together_ai/mistralai/Mixtral-8x7B-Instruct-v0.1",

Note that the following models do not support structured output (i.e. response_format in Prompter):

# "together_ai/nvidia/Llama-3.1-Nemotron-70B-Instruct-HF", # instructor not supported
# "deepinfra/nvidia/Llama-3.1-Nemotron-70B-Instruct" # instructor not supported

@CharlieJCJ CharlieJCJ changed the title Curator 28 add a lite llm backend for curator Add LiteLLM backend for curator Nov 18, 2024
@CharlieJCJ CharlieJCJ changed the base branch from main to dev November 18, 2024 08:05
@CharlieJCJ CharlieJCJ changed the title Add LiteLLM backend for curator Add LiteLLM+instructor (for structured output) backend for curator Nov 18, 2024
@CharlieJCJ (Contributor Author):

Works for Claude model_name="claude-3-opus-20240229"

@CharlieJCJ (Contributor Author): [two screenshots]

@CharlieJCJ (Contributor Author) commented Nov 20, 2024:

After #149 is merged (resolving #145), do a performance comparison of the litellm vs. OpenAI request processors.

@CharlieJCJ (Contributor Author) commented Nov 21, 2024:

#159 has been merged; costs are now logged appropriately. litellm also supports cost logging now.

@CharlieJCJ (Contributor Author) commented Nov 21, 2024:

TODO

@CharlieJCJ (Contributor Author):

Need to add a better default timeout.


@CharlieJCJ (Contributor Author):

Requesting review/approval: @RyanMarten @vutrung96

@RyanMarten (Contributor) left a review:

Small changes - let me know when they are addressed and I'll do another review


@vutrung96 (Contributor) left a review:

LGTM!

@CharlieJCJ merged commit 860b6b9 into dev on Dec 4, 2024 · 2 checks passed