Retry on response format failure #266

Merged
merged 2 commits into dev from ryanm/retry-on-response-format-failure on Dec 17, 2024

Conversation

RyanMarten
Contributor

Fixes #86

@RyanMarten force-pushed the ryanm/retry-on-response-format-failure branch from a3990f7 to 25bc880 on December 16, 2024 23:18
@RyanMarten
Contributor Author

RyanMarten commented Dec 16, 2024

As discussed in #86, using strict: true could help. This argument makes sense to expose through the context manager that Mahesh was working on.
It is only available for some models (a subset of OpenAI models). Should we verify support ourselves, with nice errors about the unsupported param? LiteLLM does that. Or should we have more sophisticated retry logic (based on tenacity, as Mahesh suggested and noted in #261) that doesn't retry when the response code says the request shape is not valid?

https://github.com/openai/openai-python/blob/89e5a010b22136f0f16eff5ec33105f4d3e273eb/src/openai/lib/_pydantic.py#L50
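For reference, a minimal sketch of what opting into strict schemas looks like with the raw OpenAI client (illustrative model class and prompt; assumes openai-python >= 1.40, where the to_strict_json_schema helper linked above converts the Pydantic model into a strict JSON schema):

from openai import OpenAI
from pydantic import BaseModel

class Verdict(BaseModel):
    supported: bool
    reason: str

client = OpenAI()

# .parse() converts the Pydantic model into a strict JSON schema
# (all fields required, additionalProperties: false) before sending.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Does gpt-4o-mini support strict schemas?"}],
    response_format=Verdict,
)
print(completion.choices[0].message.parsed)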

@RyanMarten
Contributor Author

Looking for examples for a minimal reproduction. @neginraoof can maybe provide one.

In https://rwilinski.ai/posts/benchmarking-llms-for-structured-json-generation/, gpt-4o-mini gets 100% on all attempts.

@RyanMarten
Contributor Author

Had Claude help come up with an example that the model struggles to produce a valid response for:

from bespokelabs.curator import LLM
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List, Dict, Optional
from enum import Enum
from datetime import datetime

# Complex dataset with tricky prompts
dataset = Dataset.from_dict({
    "prompt": [
        """Analyze this fictional company's performance and provide a detailed breakdown:
        TechCorp Industries faced challenges in Q3 2023. Their cloud division struggled 
        while IoT solutions showed promise. Customer satisfaction varied by region."""
    ]
})

class SentimentLevel(str, Enum):
    VERY_NEGATIVE = "very_negative"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"
    VERY_POSITIVE = "very_positive"

class MetricScore(BaseModel):
    value: float
    confidence_interval: List[float] = Field(min_items=2, max_items=2)
    reliability_score: float = Field(ge=0, le=1)
    last_updated: datetime

class DepartmentAnalysis(BaseModel):
    department_name: str
    revenue_change: float
    key_metrics: Dict[str, MetricScore]
    risk_factors: List[Dict[str, float]]
    sentiment: SentimentLevel
    regional_breakdown: Dict[str, Dict[str, float]]
    future_projections: List[Dict[str, Optional[float]]]

class CompanyAnalysis(BaseModel):
    company_overview: Dict[str, str]
    financial_health: float = Field(ge=-1, le=1)
    departments: List[DepartmentAnalysis]
    market_position: Dict[str, List[float]]
    competitor_comparison: Dict[str, Dict[str, float]]
    risk_assessment: List[Dict[str, Optional[float]]]
    timestamp: datetime
    confidence_score: float = Field(ge=0, le=1)

prompter = LLM(
    prompt_func=lambda row: row["prompt"],
    model_name="gpt-4o-mini",
    response_format=CompanyAnalysis,
)

dataset = prompter(dataset)
print(dataset.to_pandas())
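For the tenacity-based retry logic floated above, a minimal sketch (names are illustrative, not this PR's code): retry transient failures, but give up immediately when the API reports that the request shape itself is invalid, since resending the same payload cannot succeed.

import openai
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def _is_retryable(exc: BaseException) -> bool:
    # A 400 BadRequestError means the request shape is invalid;
    # retrying the identical payload will fail the same way.
    return not isinstance(exc, openai.BadRequestError)

@retry(
    retry=retry_if_exception(_is_retryable),
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=10),
    reraise=True,
)
def request_with_retries(client, **kwargs):
    return client.chat.completions.create(**kwargs)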

@RyanMarten requested a review from madiator on December 17, 2024 00:23
Comment on lines 475 to 478
# Allows us to retry on responses that don't match the response format
self.convert_response_to_response_format(
generic_response.response_message, self.prompt_formatter.response_format
)
Contributor

It returns a dictionary, we don't need it?

self, response_message: str | dict, response_format: Optional[BaseModel]
) -> Optional[dict | str]:
"""
Converts a response message to a specified Pydantic model format.
Contributor

A bit lost on the motivation here.

Contributor Author

We use this function anyway; I just moved it out so we don't duplicate code.

We need to convert the string that the model returns into the correct response format.
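A plausible sketch of that conversion (an assumed implementation, not necessarily the PR's exact code); if validation raises, the response is treated as a failure and the request is retried:

import json
from typing import Optional

from pydantic import BaseModel

def convert_response_to_response_format(
    response_message: str | dict, response_format: Optional[type[BaseModel]]
) -> Optional[dict | str]:
    # No response format requested: pass the raw message through.
    if response_format is None:
        return response_message
    # Models return JSON as a string; parse it into a dict first.
    if isinstance(response_message, str):
        response_message = json.loads(response_message)
    # Raises pydantic.ValidationError when the payload doesn't match
    # the schema, which is what triggers the retry upstream.
    return response_format.model_validate(response_message).model_dump()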

Contributor

Ah sorry.
Also, does it make sense to put this in prompt_formatter?
So self.prompt_formatter.convert_response_to_response_format(response_message).

@RyanMarten merged commit c99978f into dev on Dec 17, 2024
2 checks passed
@RyanMarten deleted the ryanm/retry-on-response-format-failure branch on December 17, 2024 04:06