
Using LiteLLM in Gradio App? #631

Answered by krrishdholakia
ZQ-Dev8 asked this question in Q&A
def inference(message, history):
    try:
        # history arrives as [[user, assistant], ...]; flatten it into one prompt string
        flattened_history = [item for sublist in history for item in sublist]
        full_message = " ".join(flattened_history + [message])
        messages = [{"role": "user", "content": full_message}]
        partial_message = ""
        for chunk in litellm.completion(
            model=<my-model>,
            messages=messages,
            max_tokens=512,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.18,
            stream=True):
            # each streamed chunk is a response object, not a string;
            # the new text lives in the delta and may be None
            partial_message += chunk.choices[0].delta.content or ""
            yield partial_message
    except Exception as e:
        # Print the exception to the console for debugging
        print("Exception encountered:", str(e))
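The non-obvious part of the snippet above is the streaming loop: each chunk from `litellm.completion(..., stream=True)` is a response object whose new text sits at `chunk.choices[0].delta.content` and can be `None` between tokens. Here is a minimal sketch of that accumulation logic, using stand-in dicts in place of live API chunks so it runs without a key (the dict shape mirrors the streamed response, but the helper name is illustrative, not part of litellm):

```python
def accumulate_deltas(chunks):
    """Yield the running concatenation of delta contents, skipping None deltas."""
    partial = ""
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"].get("content") or ""
        partial += delta
        yield partial

# stand-in chunks mimicking the streamed response shape
fake_stream = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": None}}]},  # empty keep-alive delta
    {"choices": [{"delta": {"content": "lo"}}]},
]

print(list(accumulate_deltas(fake_stream)))  # → ['Hel', 'Hel', 'Hello']
```

Because `inference` yields the growing `partial_message` string, wiring it into Gradio is just `gr.ChatInterface(inference).launch()` — `ChatInterface` accepts a generator function and re-renders the reply on each yield.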


Answer selected by ZQ-Dev8