
Detect finish_reason that is not stop #261

Open
RyanMarten opened this issue Dec 15, 2024 · 8 comments

@RyanMarten
Contributor

Gemini returns None messages when content_filter is the finish reason.

Other finish reasons, like length (hitting the maximum token limit), should also be detected and handled. The default behavior should not be to silently ignore them.

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

Also related: retry logic based on the error encountered.

E.g., do we want to retry on content_filter? On hitting the max-token limit instead of a stop token? Etc.

Retry (by throwing an exception) on a None message return.

For retrying we can use https://tenacity.readthedocs.io/en/latest/
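A minimal sketch of what that could look like with tenacity (the exception class and the retry policy here are illustrative, not existing curator code):

from litellm import acompletion
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class EmptyResponseError(Exception):
    """Raised when the model returns a None/empty message."""

# Illustrative policy: retry up to 5 times with exponential backoff
# whenever the response message comes back as None.
@retry(
    retry=retry_if_exception_type(EmptyResponseError),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, max=30),
)
async def request_with_retry(messages):
    response = await acompletion(model="gemini/gemini-1.5-flash", messages=messages)
    if response.choices[0].message.content is None:
        raise EmptyResponseError("model returned a None message")
    return response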

@RyanMarten
Contributor Author

litellm exposes the finish reason like this:
https://docs.litellm.ai/docs/completion/output
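Per those docs, litellm normalizes every provider's response to the OpenAI schema, so the finish reason should be readable the same way regardless of backend:

from litellm import completion

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=[{"role": "user", "content": "hello"}],
)
# same accessor regardless of the underlying provider
print(response.choices[0].finish_reason)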

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

litellm provides a mapping from all the provider-specific finish reasons to the standard OpenAI set:

https://github.com/BerriAI/litellm/blob/fd583e715ec1b183cade08ac29c1cbe73e5c94f6/litellm/litellm_core_utils/core_helpers.py#L18-L54
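Roughly, the helper behaves like this (an illustrative re-implementation of a few entries from the linked mapping, not the litellm import itself; the real function covers more providers):

def map_finish_reason(finish_reason: str) -> str:
    # provider-specific values -> standard OpenAI finish reasons
    mapping = {
        "STOP": "stop",                  # Vertex AI / Gemini
        "MAX_TOKENS": "length",          # Vertex AI / Gemini
        "SAFETY": "content_filter",      # Vertex AI / Gemini
        "RECITATION": "content_filter",  # Vertex AI / Gemini
        "max_tokens": "length",          # Anthropic
        "stop_sequence": "stop",         # Anthropic
    }
    return mapping.get(finish_reason, finish_reason)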

Do they process all output with this? Yes, it seems so, based on running the script below on the dataset of refusals and comparing the results against the same requests made with Google's genai Python SDK.

from litellm import acompletion
import asyncio

async def make_requests():
    # Send 10 identical requests concurrently with all safety filters
    # relaxed, then count how many still come back content-filtered.
    tasks = []
    for i in range(10):
        task = acompletion(
            model="gemini/gemini-1.5-flash",
            messages=[{"role": "user", "content": "Can you provide an AI-based solution to create a calculator class in Objective-C that can add two numbers?"}],
            safety_settings=[
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
            ],
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    filtered = 0
    for response in responses:
        # litellm normalizes Gemini's finish reasons to the OpenAI set
        if response.choices[0].finish_reason == "content_filter":
            filtered += 1

    print(f"Content filtered {filtered} out of {len(responses)} requests")

asyncio.run(make_requests())

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

So what was actually going on is that the finish reason was RECITATION, which litellm maps to content_filter (in the function I linked above).
https://github.com/BerriAI/litellm/blob/fd583e715ec1b183cade08ac29c1cbe73e5c94f6/litellm/litellm_core_utils/core_helpers.py#L42

You can see RECITATION as the finish_reason if you use the Google genai Python SDK instead of litellm / curator:

import asyncio
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import google.generativeai as genai

async def make_requests():
    model = genai.GenerativeModel(model_name='gemini-1.5-flash')
    tasks = []
    for i in range(10):
        task = model.generate_content_async(
            "Can you provide an AI-based solution to create a calculator class in Objective-C that can add two numbers?",
            safety_settings={
                HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
            },
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    filtered = 0
    for response in responses:
        print(response)
        # finish_reason 4 is RECITATION in the raw FinishReason enum
        if response._result.candidates[0].finish_reason == 4:
            filtered += 1

    print(f"Content filtered {filtered} out of {len(responses)} requests")

# Run the async function
asyncio.run(make_requests())

@RyanMarten
Contributor Author

RyanMarten commented Dec 16, 2024

Short term, I don't think there is actually any solution to turn off the RECITATION content filter.

Others are also seeing this filter trigger randomly on prompts that seem innocuous:

They claim to have fixed this:

It looks like this issue was primarily caused by the language preprocessor classifying certain prompts (especially those with many numbers or mixed languages) as unsupported, leading to the "recitation blocking"

But I'm still seeing the recitation blocking behave poorly.

This issue lists out some other discussions / resources

The official docs (https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/GenerateContentResponse#FinishReason) currently say:

RECITATION: The token generation stopped because of potential recitation.

More discussion:

People are suggesting asking for > 1 completion for every request, but that is unnecessarily expensive.

We will just do a bunch of retries until we (hopefully) get a response that is not content filtered.
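Something like this (a hypothetical sketch; the function name and max_retries are made up for illustration):

import asyncio
from litellm import acompletion

async def request_until_unfiltered(messages, max_retries=10):
    # Re-issue the request until the finish reason is not content_filter,
    # giving up after max_retries attempts.
    for attempt in range(max_retries):
        response = await acompletion(model="gemini/gemini-1.5-flash", messages=messages)
        if response.choices[0].finish_reason != "content_filter":
            return response
    raise RuntimeError(f"still content-filtered after {max_retries} attempts")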

@RyanMarten
Contributor Author

It might be a good idea to add this as part of the generic_response object, have both the litellm and OpenAI backends write to it, and handle it in the base request processor.
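For illustration, the shared object could look something like this (field and function names are hypothetical, not the actual curator types):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class GenericResponse:
    response_message: Optional[Any]
    finish_reason: Optional[str]  # normalized to the standard OpenAI set

def check_finish_reason(response: GenericResponse) -> None:
    # The base request processor raises so retry machinery can kick in
    if response.finish_reason != "stop":
        raise ValueError(f"unexpected finish_reason: {response.finish_reason}")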

@RyanMarten
Contributor Author

RyanMarten commented Dec 17, 2024

OpenAI finish reason options:
https://github.com/openai/openai-python/blob/e94d98e9bf97a5d2d02d79d58f2abdbab26ff2bd/src/openai/types/chat/chat_completion.py#L23

finish_reason: Literal["stop", "length", "tool_calls", "content_filter", "function_call"]

@RyanMarten
Contributor Author

Since Anthropic's response_format returns finish reason tool_calls, we can't just retry whenever it is != 'stop'.
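So the check needs an allowlist of acceptable finish reasons rather than a single value; a sketch (names illustrative):

# Both are successful completions: structured output via tool use
# legitimately finishes with "tool_calls".
ACCEPTABLE_FINISH_REASONS = {"stop", "tool_calls"}

def should_retry(finish_reason: str) -> bool:
    return finish_reason not in ACCEPTABLE_FINISH_REASONS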

Fixed in

Also opened a PR into litellm to update the function to the latest types and make it clearer.
