🎅 I WISH LITELLM HAD... #361

Open
krrishdholakia opened this issue Sep 13, 2023 · 238 comments

@krrishdholakia
Contributor

krrishdholakia commented Sep 13, 2023

This is a ticket to track a wishlist of items you wish LiteLLM had.

COMMENT BELOW 👇

With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs

Respond with ❤️ to any request you would also like to see

P.S.: Come say hi 👋 on the Discord

@krrishdholakia krrishdholakia pinned this issue Sep 13, 2023
@krrishdholakia
Contributor Author

[LiteLLM Client] Add new models via UI

Thinking aloud: it seems intuitive that you'd be able to add new models / remap completion calls to different models via the UI. Unsure what the real problem is, though.

@krrishdholakia
Contributor Author

User / API Access Management

Different users have access to different models. It'd be helpful if there was a way to leverage the BudgetManager to gate access. E.g. GPT-4 is expensive: I don't want to expose it to my free users, but I do want my paid users to be able to use it.
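
A minimal sketch of the gating idea, assuming a simple tier-to-model allowlist. The names `ALLOWED_MODELS`, `can_use_model`, and `resolve_model` are hypothetical and not part of LiteLLM, and this does not use the BudgetManager itself:

```python
# Hypothetical allowlist mapping a user tier to the models it may call.
# None of these names are LiteLLM APIs; they only illustrate the request.
ALLOWED_MODELS = {
    "free": {"gpt-3.5-turbo"},
    "paid": {"gpt-3.5-turbo", "gpt-4"},
}


def can_use_model(user_tier: str, model: str) -> bool:
    """Return True if users on `user_tier` may call `model`."""
    return model in ALLOWED_MODELS.get(user_tier, set())


def resolve_model(user_tier: str, requested: str, fallback: str = "gpt-3.5-turbo") -> str:
    """Serve the requested model if allowed, otherwise a cheaper fallback."""
    return requested if can_use_model(user_tier, requested) else fallback


print(resolve_model("free", "gpt-4"))
print(resolve_model("paid", "gpt-4"))
```

A real implementation would probably reject the call or raise instead of silently downgrading; the fallback here is only one possible policy.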

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 13, 2023

cc: @yujonglee @WilliamEspegren @zakhar-kogan @ishaan-jaff @PhucTranThanh feel free to add any requests / ideas here.

@ishaan-jaff
Contributor

ishaan-jaff commented Sep 13, 2023

[Spend Dashboard] View analytics for spend per llm and per user

  • This allows me to see which LLMs are my most expensive and which users are using litellm heavily
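
A rough sketch of the aggregation such a dashboard would need, assuming usage is logged as (user, model, cost) records. The records and the helper `spend_by` are made up for illustration and are not LiteLLM features:

```python
from collections import defaultdict

# Hypothetical usage log: (user, model, cost in USD). Values are invented.
records = [
    ("alice", "gpt-4", 0.5),
    ("bob", "gpt-3.5-turbo", 0.25),
    ("alice", "gpt-3.5-turbo", 0.25),
    ("alice", "gpt-4", 1.0),
]


def spend_by(records, key_index):
    """Sum cost grouped by the field at `key_index`, highest spend first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key_index]] += record[2]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


per_user = spend_by(records, 0)   # spend grouped by user
per_model = spend_by(records, 1)  # spend grouped by model
print(per_user)
print(per_model)
```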

@ishaan-jaff
Contributor

Auto select the best LLM for a given task

If it's a simple task like responding to "hello", litellm should auto-select a cheaper but faster LLM like j2-light
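
One very crude way to sketch this, assuming prompt length is a usable proxy for task difficulty (it often is not). The function `pick_model`, the word threshold, and the choice of gpt-4 as the stronger model are assumptions for illustration only:

```python
def pick_model(
    prompt: str,
    cheap_model: str = "j2-light",
    strong_model: str = "gpt-4",
    max_simple_words: int = 8,
) -> str:
    """Route short prompts to the cheap model and everything else to the strong one."""
    is_simple = len(prompt.split()) <= max_simple_words
    return cheap_model if is_simple else strong_model


print(pick_model("hello"))
print(pick_model("Refactor this module to use async IO and explain every change you make"))
```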

@Pipboyguy

Integration with NLP Cloud

@krrishdholakia
Contributor Author

That's awesome @Pipboyguy - dm'ing on linkedin to learn more!

@krrishdholakia krrishdholakia changed the title LiteLLM Wishlist 🎅 I WISH LITELLM ADDED... Sep 14, 2023
@krrishdholakia krrishdholakia changed the title 🎅 I WISH LITELLM ADDED... 🎅 I WISH LITELLM HAD... Sep 14, 2023
@krrishdholakia
Contributor Author

krrishdholakia commented Sep 14, 2023

@ishaan-jaff check out this truncate param in the cohere api

This looks super interesting. Similar to your token trimmer. If the prompt exceeds context window, trim in a particular manner.
Screenshot 2023-09-14 at 10 54 50 AM

I would maybe only run trimming on user/assistant messages and not touch the system prompt (works for RAG scenarios as well).
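
A minimal sketch of that trimming rule, assuming word count as a stand-in for real token counting. `count_tokens` and `trim_messages_keep_system` are hypothetical helpers, not the existing token trimmer or Cohere's truncate behavior:

```python
def count_tokens(message):
    """Rough token proxy (assumption): number of whitespace-separated words."""
    return len(message["content"].split())


def trim_messages_keep_system(messages, max_tokens):
    """Drop the oldest user/assistant turns until the conversation fits.

    System messages are always kept and are placed first in the result.
    """
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept = []
    # Walk from newest to oldest so the most recent turns survive.
    for message in reversed(others):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + kept[::-1]


conversation = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "first question here"},
    {"role": "assistant", "content": "first answer here"},
    {"role": "user", "content": "second question"},
]
trimmed = trim_messages_keep_system(conversation, max_tokens=8)
print([m["content"] for m in trimmed])
```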

@haseeb-heaven
Contributor

Option to use Inference API so we can use any model from Hugging Face 🤗

@krrishdholakia
Contributor Author

krrishdholakia commented Sep 17, 2023

@haseeb-heaven you can already do this -

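# Calls are routed to the Hugging Face Inference API at:
#   completion_url = f"https://api-inference.huggingface.co/models/{model}"
from litellm import completion

# The "huggingface/" prefix selects the provider; "gpt2" is the model ID.
response = completion(
    model="huggingface/gpt2",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(response)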

@haseeb-heaven
Contributor

@haseeb-heaven you can already do this -

completion_url = f"https://api-inference.huggingface.co/models/{model}"

from litellm import completion 
response = completion(model="huggingface/gpt2", messages=[{"role": "user", "content": "Hey, how's it going?"}])
print(response) 

Wow, great, thanks, it's working. Nice feature!

@smig23

smig23 commented Sep 18, 2023

Support for inferencing using models hosted on Petals swarms (https://github.com/bigscience-workshop/petals), both public and private.

@ishaan-jaff
Contributor

@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests

@shauryr
Contributor

shauryr commented Sep 18, 2023

finetuning wrapper for openai, huggingface etc.

@krrishdholakia
Contributor Author

@shauryr i created an issue to track this - feel free to add any missing details here

@smig23

smig23 commented Sep 18, 2023

@smig23 what are you trying to use petals for ? We found it to be quite unstable and it would not consistently pass our tests

Specifically for my aims, I'm running a private swarm as an experiment, with a view to implementing it within a private organization that has idle GPU resources, but they're distributed. The initial target would be inferencing, and if litellm were able to be the abstraction layer, it would allow flexibility to go in another direction with hosting in the future.

@ranjancse26

I wish litellm had direct support for fine-tuning models. Based on the blog post below, I understand that in order to fine-tune, one needs a specific understanding of the LLM provider and then has to follow their instructions or library for fine-tuning the model. Why not have LiteLLM do all the abstraction and handle the fine-tuning aspects as well?

https://docs.litellm.ai/docs/tutorials/finetuned_chat_gpt
https://platform.openai.com/docs/guides/fine-tuning/preparing-your-dataset

@ranjancse26

I wish LiteLLM had support for open-source embeddings like sentence-transformers, hkunlp/instructor-large, etc.

Sorry, based on the documentation below, it seems there's only support for OpenAI embeddings.

https://docs.litellm.ai/docs/embedding/supported_embedding

@ranjancse26

I wish LiteLLM had an integration with the Cerebrium platform. Please check the link below for the prebuilt models.

https://docs.cerebrium.ai/cerebrium/prebuilt-models

@ishaan-jaff
Contributor

@ranjancse26 what models on cerebrium do you want to use with LiteLLM ?

@ranjancse26

@ishaan-jaff Cerebrium has a lot of pre-built models. The focus should be on consuming the open-source models first, e.g. Llama 2, GPT4All, Falcon, FlanT5, etc. I am mentioning this as a first step. However, it's a good idea to have LiteLLM take care of the internal communication with the custom-built models too, based in turn on the API that Cerebrium exposes.

image

@ishaan-jaff
Contributor

@smig23 We've added support for petals to LiteLLM https://docs.litellm.ai/docs/providers/petals

@ranjancse26

I wish LiteLLM had built-in support for the majority of provider operations rather than targeting text generation alone. Consider Cohere as an example: the endpoint below allows users to have conversations with a Large Language Model (LLM) from Cohere.

https://docs.cohere.com/reference/post_chat

@ranjancse26

I wish LiteLLM had a ton of support and examples for users developing apps with the RAG pattern. It's kind of mandatory to follow the standard best practices, and we all wish to have the same support.

@ranjancse26

I wish LiteLLM had use-case-driven examples for beginners. Keeping day-to-day use cases in mind, it's a good idea to come up with a great sample which covers the following aspects.

  • Text classification
  • Text summarization
  • Text translation
  • Text generation
  • Code generation

@ranjancse26

I wish LiteLLM supported various known or popular vector DBs. Here are a couple of them to begin with.

  • Pinecone
  • Qdrant
  • Weaviate
  • Milvus
  • DuckDB
  • Sqlite

@ranjancse26

ranjancse26 commented Sep 21, 2023

I wish LiteLLM had built-in support for web scraping or for getting real-time data using a known provider like serpapi. It would help users build custom AI models or integrate with LLMs for retrieval-augmented generation.

https://serpapi.com/blog/llms-vs-serpapi/#serpapi-google-local-results-parser
https://colab.research.google.com/drive/1Q9VvVzjZJja7_y2Ls8qBkE_NApbLiqly?usp=sharing

@pazevedo-hyland

Embedding models on langchain. (Currently only Chat Interface exists)

@ivanbelenky

I wish it had no dependencies apart from httpx and pydantic and that the arrows coming out of the hype train not intersect with each other

image

@dym-ok

dym-ok commented Dec 11, 2024

I wish this beautiful library supported Bedrock Inference Profiles.

We use them to attribute costs.

@abourget

I wish it had an abstraction to submit traces to its different logging backends like langfuse and friends,
I wish it was a receptor of OpenTelemetry data, and would repackage and forward to its backends.
Does that exist?

@brooksc

brooksc commented Dec 14, 2024

Have you thought about adding a "meta model" option where a user could specify:

  1. Here are all the "services" I have access to - e.g. openai, anthropic, aws, ollama, etc.
  2. I want a model that can do coding well, vision, classification, tool using, etc.
  3. I want to prioritize a model based on cost, speed, quality or multiple criteria in this order...

And litellm with everything it knows would just pick the best available model.
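
The three inputs above could be sketched roughly like this, assuming a catalog with per-model capability tags and relative scores. `ModelInfo`, `CATALOG`, `select_model`, the model names, and all the numbers are invented for illustration; this is not an existing LiteLLM feature:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    provider: str
    capabilities: frozenset
    cost: float     # relative cost, lower is better
    speed: float    # relative speed, higher is better
    quality: float  # relative quality, higher is better


# Made-up catalog; real data would come from pricing and benchmark sources.
CATALOG = [
    ModelInfo("model-a", "openai", frozenset({"coding", "vision"}), 3.0, 2.0, 9.0),
    ModelInfo("model-b", "anthropic", frozenset({"coding"}), 2.0, 3.0, 8.0),
    ModelInfo("model-c", "ollama", frozenset({"coding"}), 0.0, 1.0, 5.0),
]


def select_model(catalog, providers, required, priorities):
    """Pick the best model among accessible providers with the required capabilities.

    `priorities` is an ordered list drawn from "cost", "speed", "quality";
    earlier entries win, later ones break ties.
    """
    candidates = [
        m for m in catalog
        if m.provider in providers and set(required) <= m.capabilities
    ]
    if not candidates:
        return None

    def sort_key(model):
        # Lower cost is better; speed and quality are negated so higher wins.
        return tuple(
            model.cost if p == "cost" else -getattr(model, p) for p in priorities
        )

    return min(candidates, key=sort_key)


best = select_model(CATALOG, {"openai", "anthropic", "ollama"}, {"coding"}, ["cost", "quality"])
print(best.name)
```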

I see you have a json file with pricing and model capabilities.

I didn't see that anything like this exists, nor did Gemini research find anything. https://g.co/gemini/share/e704a93c8938

This would require collecting data on all the benchmarks, e.g. how well each model did on coding benchmarks vs. others, to make a selection. You have the data on cost. I didn't check if you have tokens per second.

There is probably some memory required, e.g. to validate that project X works on each of the models, given the variations in model execution. But once you do a "benchmark" pass to validate functionality against various tests, it becomes a preferred model selection when considering the algorithm for which one to pick.

I'm asking about this because it feels like one of the major taxes of setting up a new project or 3rd party/oss project is figuring out which model to use, how to optimize for cost, etc. Sometimes I have a more powerful machine on my local network with ollama that I want to use when it's available; other times I want to use a cloud service or my local ollama.

I want that to all happen automagically... e.g. use AI to select the AI model

@brooksc

brooksc commented Dec 14, 2024

Another suggestion.

I'm lazy, I don't want to read all of your docs to figure out the answer to what I want. I want to ask ChatGPT, Claude, Gemini, etc. to get the answer for me. Thing is, they aren't very good at browsing your website yet.

One suggestion is to create a serialized version of the docs in a /llms.txt like https://llmstxt.org/ so I can just feed it this URL. Hopefully they eventually get smart enough to look for this if it exists.

For now I'll use https://uithub.com/BerriAI/litellm/tree/main/docs/my-website/docs?accept=text/html&maxTokens=50000&ext=md but this isn't well known and it may not contain what you want to prioritize in the index.

Ideally you'd also have links on your site off to "Ask ChatGPT about these docs" with an input box which then opens

https://chat.openai.com/?q=https%3A%2F%2Fdocs.litellm.ai%2Fllms.txt+yourquery&model=gpt-4o

sort of like the old google site search... hopefully we don't have to do that too long.
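
A small sketch of how such a link could be built from an input box, assuming the `q` and `model` query parameters behave as in the URL above. The helper `ask_chatgpt_url` is hypothetical, and `https://docs.litellm.ai/llms.txt` is the proposed file location, not something known to exist:

```python
from urllib.parse import quote_plus


def ask_chatgpt_url(query, docs_url="https://docs.litellm.ai/llms.txt", model="gpt-4o"):
    """Build a prefilled chat link that points the assistant at the docs file."""
    # quote_plus percent-encodes ":" and "/" and turns spaces into "+".
    encoded = quote_plus(f"{docs_url} {query}")
    return f"https://chat.openai.com/?q={encoded}&model={model}"


print(ask_chatgpt_url("yourquery"))
```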

Something this would also enable is "I'm using litellm... analyze my code and look at llms.txt and see what other features I should consider leveraging"

@d4g

d4g commented Dec 18, 2024

I wish I could enable citations for perplexity on litellm via the config.yaml so I would get citations in open-webui.
#6662

@krrishdholakia
Contributor Author

@d4g we already return the perplexity citations. If there's a Param needed just add it under 'litellm_params'

@d4g

d4g commented Dec 18, 2024

Where and how? In the yaml?

@krrishdholakia
Contributor Author

just checked perplexity doc. no param needed, it should be returned automatically (see the 200 status code response) - https://docs.perplexity.ai/api-reference/chat-completions

For any provider-specific param, see here - https://docs.litellm.ai/docs/completion/provider_specific_params#proxy-usage

@githubuser16384

I wish there was vision support for LLM providers that provide vision support through their official documentation. Case in point- Groq. Reference: https://console.groq.com/docs/vision

@krrishdholakia
Contributor Author

@githubuser16384 litellm already supports vision on all models - https://docs.litellm.ai/docs/completion/vision

Created a ticket to add an example on groq docs for this.
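
For reference, a sketch of what an OpenAI-style vision request payload looks like. The helper `build_vision_messages` and the image URL are made up for illustration, and the commented-out call uses a placeholder model name, so check the vision docs linked above for the exact usage:

```python
def build_vision_messages(prompt, image_url):
    """Build an OpenAI-format user message with one text part and one image part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


messages = build_vision_messages("What is in this image?", "https://example.com/cat.png")
print(messages)

# Assumed call shape; the model name below is a placeholder, not a real model ID:
# from litellm import completion
# response = completion(model="groq/<vision-model>", messages=messages)
```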

@FireballDWF

ui chat should render the output in markdown

@FireballDWF

Admin UI chat should add the model used either directly before or after the "Assistant" label so that it's clear which model provided the given assistant output.

@ishaan-jaff
Contributor

@FireballDWF - can you leave your additional feedback here #7440 ?

@databill86

  1. Better support and documentation for ollama models (for example, the latest models such as https://ollama.com/library/qwen2.5, mistral-nemo, ...). Some results with these models are not the same when using litellm vs. when using ollama directly. I tried to use the exact same params, but I'm not sure. Maybe it's due to the prompt templates.

  2. Better support and documentation for vllm. Basically the same as ollama: the documentation is not clear on how to use prompt formatting with the proxy or openai sdk, or on which of the latest models are supported.

@RoryMB

RoryMB commented Jan 2, 2025

ollama_chat vision support. (https://ollama.com/blog/llama3.2-vision)

Also, the litellm ollama docs say you recommend ollama_chat over ollama, which I strongly agree with, but then most of the examples (e.g. the main docs page) don't follow that recommendation.

@vishnu-dev

Most companies disable API-key-based access as they deem it not that secure. Instead, role-based access control (RBAC) is enabled.
[REQUEST] I wish litellm had the ability to use an Azure AD token provider for Azure OpenAI rather than just API keys.

@krrishdholakia
Contributor Author

@vishnu-dev this is already supported - https://docs.litellm.ai/docs/providers/azure#authentication

@anmolbhatia05

@krrishdholakia does litellm support making chat completion calls to finetuned mistral ai codestral models ?

@krrishdholakia
Contributor Author

@anmolbhatia05 is it any different to this? https://docs.litellm.ai/docs/providers/codestral

if it's on vertex then it's here - https://docs.litellm.ai/docs/providers/vertex#mistral-api

@Abdelrahman1993

Support for Amazon titan image generator

@Jflick58

I would love for LiteLLM to offer a MCP proxy add-on.

As someone working on AI for a large enterprise, with multiple UI experiences for LLM workflows, I have begun coalescing toward an architectural model of LiteLLM > MCP (or a home-built method for hosting tool servers independent of the LLM or a particular frontend) > Continue.dev, custom chat UI, narrow-focus systems, LLM integrations to legacy enterprise systems, etc. I think LiteLLM could capitalize on the middle space given your lead in the LLM abstraction space.

There are solutions such as this: https://github.com/acehoss/mcp-gateway but I would love to be able to leverage the key management and rate management capabilities that LiteLLM already has.

I would be open to wrenching on a PR if this is something that y'all feel would fit in the product vision of LiteLLM.

@krrishdholakia
Contributor Author

Hi @Jflick58 a PR here with testing is welcome. i don't understand MCP well enough - but open to exploring how we can help!
