
Detect finish_reason that is not stop #261

Open
RyanMarten opened this issue Dec 15, 2024 · 8 comments

@RyanMarten
Contributor

Gemini returns None messages when content_filter is the finish reason.

Other finish reasons, like length (hitting the maximum token limit), should also be detected and handled. The default behavior should not be to silently ignore them.

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

Also related: retry logic based on the error encountered.

E.g., do we want to retry on content_filter? On hitting the max-token limit instead of a stop token? Etc.

Retry (by throwing an exception) on a None message return.

For retrying we can use https://tenacity.readthedocs.io/en/latest/
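A minimal sketch of what that could look like with tenacity (the exception class and the retry policy here are illustrative, not existing curator code):

from litellm import acompletion
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class EmptyResponseError(Exception):
    """Raised when the model returns a None/empty message."""

# Illustrative policy: retry up to 5 times with exponential backoff
# whenever the response message comes back as None.
@retry(
    retry=retry_if_exception_type(EmptyResponseError),
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, max=30),
)
async def request_with_retry(messages):
    response = await acompletion(model="gemini/gemini-1.5-flash", messages=messages)
    if response.choices[0].message.content is None:
        raise EmptyResponseError("model returned a None message")
    return response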

@RyanMarten
Contributor Author

litellm exposes the finish reason like this:
https://docs.litellm.ai/docs/completion/output
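Per those docs, litellm normalizes every provider's response to the OpenAI schema, so the finish reason should be readable the same way regardless of backend:

from litellm import completion

response = completion(
    model="gemini/gemini-1.5-flash",
    messages=[{"role": "user", "content": "hello"}],
)
# same accessor regardless of the underlying provider
print(response.choices[0].finish_reason)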

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

litellm provides a mapping from all the provider-specific finish reasons to the standard OpenAI set:

https://github.com/BerriAI/litellm/blob/fd583e715ec1b183cade08ac29c1cbe73e5c94f6/litellm/litellm_core_utils/core_helpers.py#L18-L54
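Roughly, the helper behaves like this (an illustrative re-implementation of a few entries from the linked mapping, not the litellm import itself; the real function covers more providers):

def map_finish_reason(finish_reason: str) -> str:
    # provider-specific values -> standard OpenAI finish reasons
    mapping = {
        "STOP": "stop",                  # Vertex AI / Gemini
        "MAX_TOKENS": "length",          # Vertex AI / Gemini
        "SAFETY": "content_filter",      # Vertex AI / Gemini
        "RECITATION": "content_filter",  # Vertex AI / Gemini
        "max_tokens": "length",          # Anthropic
        "stop_sequence": "stop",         # Anthropic
    }
    return mapping.get(finish_reason, finish_reason)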

Do they process all output with this? Yes, it seems so, based on running the script below on the dataset of refusals and comparing the results against the same requests made with Google's genai Python SDK.

from litellm import acompletion
import asyncio

async def make_requests():
    # Send 10 identical requests concurrently with all safety filters
    # relaxed, then count how many still come back content-filtered.
    tasks = []
    for i in range(10):
        task = acompletion(
            model="gemini/gemini-1.5-flash",
            messages=[{"role": "user", "content": "Can you provide an AI-based solution to create a calculator class in Objective-C that can add two numbers?"}],
            safety_settings=[
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
            ],
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    filtered = 0
    for response in responses:
        # litellm normalizes Gemini's finish reasons to the OpenAI set
        if response.choices[0].finish_reason == "content_filter":
            filtered += 1

    print(f"Content filtered {filtered} out of {len(responses)} requests")

asyncio.run(make_requests())

@RyanMarten
Contributor Author

RyanMarten commented Dec 15, 2024

So what was actually going on is that the finish reason was RECITATION, which litellm maps to content_filter (in the function I linked above).
https://github.com/BerriAI/litellm/blob/fd583e715ec1b183cade08ac29c1cbe73e5c94f6/litellm/litellm_core_utils/core_helpers.py#L42

You can see RECITATION as the finish_reason if you use the Google genai Python SDK instead of litellm / curator:

import asyncio
from google.generativeai.types import HarmCategory, HarmBlockThreshold
import google.generativeai as genai

async def make_requests():
    model = genai.GenerativeModel(model_name='gemini-1.5-flash')
    tasks = []
    for i in range(10):
        task = model.generate_content_async(
            "Can you provide an AI-based solution to create a calculator class in Objective-C that can add two numbers?",
            safety_settings={
                HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
                HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
            },
        )
        tasks.append(task)

    responses = await asyncio.gather(*tasks)
    filtered = 0
    for response in responses:
        print(response)
        # finish_reason 4 is RECITATION in the raw FinishReason enum
        if response._result.candidates[0].finish_reason == 4:
            filtered += 1

    print(f"Content filtered {filtered} out of {len(responses)} requests")

# Run the async function
asyncio.run(make_requests())

@RyanMarten
Contributor Author

RyanMarten commented Dec 16, 2024

Short term, I don't think there is actually any solution to turn off the RECITATION content filter.

Others are also seeing this filter trigger randomly on prompts that seem innocuous:

They claim to have fixed this:

It looks like this issue was primarily caused by the language preprocessor classifying certain prompts (especially those with many numbers or mixed languages) as unsupported, leading to the "recitation blocking"

But I'm still seeing the recitation blocking behave poorly.

This issue lists out some other discussions / resources

The official docs (https://cloud.google.com/vertex-ai/docs/reference/rest/v1beta1/GenerateContentResponse#FinishReason) currently say:

RECITATION: The token generation stopped because of potential recitation.

More discussion:

People are suggesting asking for > 1 completion for every request, but that is unnecessarily expensive.

We will just do a bunch of retries until we (hopefully) get a response that is not content filtered.
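Something like this (a hypothetical sketch; the function name and max_retries are made up for illustration):

import asyncio
from litellm import acompletion

async def request_until_unfiltered(messages, max_retries=10):
    # Re-issue the request until the finish reason is not content_filter,
    # giving up after max_retries attempts.
    for attempt in range(max_retries):
        response = await acompletion(model="gemini/gemini-1.5-flash", messages=messages)
        if response.choices[0].finish_reason != "content_filter":
            return response
    raise RuntimeError(f"still content-filtered after {max_retries} attempts")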

@RyanMarten
Contributor Author

It might be a good idea to add this as part of the generic_response object, have both the litellm and OpenAI backends write to it, and handle it in the base request processor.
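For illustration, the shared object could look something like this (field and function names are hypothetical, not the actual curator types):

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class GenericResponse:
    response_message: Optional[Any]
    finish_reason: Optional[str]  # normalized to the standard OpenAI set

def check_finish_reason(response: GenericResponse) -> None:
    # The base request processor raises so retry machinery can kick in
    if response.finish_reason != "stop":
        raise ValueError(f"unexpected finish_reason: {response.finish_reason}")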

@RyanMarten
Contributor Author

RyanMarten commented Dec 17, 2024

OpenAI finish reason options:
https://github.com/openai/openai-python/blob/e94d98e9bf97a5d2d02d79d58f2abdbab26ff2bd/src/openai/types/chat/chat_completion.py#L23

finish_reason: Literal["stop", "length", "tool_calls", "content_filter", "function_call"]

@RyanMarten
Contributor Author

Since Anthropic's response_format returns finish reason tool_calls, we can't just retry whenever it is != 'stop'.
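So the check needs an allowlist of acceptable finish reasons rather than a single value; a sketch (names illustrative):

# Both are successful completions: structured output via tool use
# legitimately finishes with "tool_calls".
ACCEPTABLE_FINISH_REASONS = {"stop", "tool_calls"}

def should_retry(finish_reason: str) -> bool:
    return finish_reason not in ACCEPTABLE_FINISH_REASONS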

Fixed in

Also opened a PR into litellm to update the function to the latest types and make it clearer.
