Retry on response format failure #266

Merged
merged 2 commits into dev from ryanm/retry-on-response-format-failure on Dec 17, 2024

Conversation

RyanMarten
Contributor

Fixes #86

@RyanMarten force-pushed the ryanm/retry-on-response-format-failure branch from a3990f7 to 25bc880 on December 16, 2024 23:18
@RyanMarten
Contributor Author

RyanMarten commented Dec 16, 2024

As discussed in #86, using strict: true could help. This argument makes sense to expose through the context manager that Mahesh was working on.
It is only available for some models (a subset of OpenAI models). Should we verify support ourselves, with nice errors about the unsupported param? LiteLLM does that. Or should we have more sophisticated retry logic (based on tenacity, as Mahesh suggested and noted in #261) that doesn't retry when the response code says the request shape is not valid?

https://github.com/openai/openai-python/blob/89e5a010b22136f0f16eff5ec33105f4d3e273eb/src/openai/lib/_pydantic.py#L50
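For reference, a minimal sketch of what opting into strict schemas looks like with the raw OpenAI client (illustrative model class and prompt; assumes openai-python >= 1.40, where the to_strict_json_schema helper linked above converts the Pydantic model into a strict JSON schema):

from openai import OpenAI
from pydantic import BaseModel

class Verdict(BaseModel):
    supported: bool
    reason: str

client = OpenAI()

# .parse() converts the Pydantic model into a strict JSON schema
# (all fields required, additionalProperties: false) before sending.
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Does gpt-4o-mini support strict schemas?"}],
    response_format=Verdict,
)
print(completion.choices[0].message.parsed)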

@RyanMarten
Contributor Author

Looking for examples for a minimal reproduction. @neginraoof can maybe provide one.

In https://rwilinski.ai/posts/benchmarking-llms-for-structured-json-generation/, gpt-4o-mini gets 100% on all attempts.

@RyanMarten
Contributor Author

Had Claude help come up with an example that the model struggles to produce a valid response for:

from bespokelabs.curator import LLM
from datasets import Dataset
from pydantic import BaseModel, Field
from typing import List, Dict, Optional
from enum import Enum
from datetime import datetime

# Complex dataset with tricky prompts
dataset = Dataset.from_dict({
    "prompt": [
        """Analyze this fictional company's performance and provide a detailed breakdown:
        TechCorp Industries faced challenges in Q3 2023. Their cloud division struggled 
        while IoT solutions showed promise. Customer satisfaction varied by region."""
    ]
})

class SentimentLevel(str, Enum):
    VERY_NEGATIVE = "very_negative"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"
    POSITIVE = "positive"
    VERY_POSITIVE = "very_positive"

class MetricScore(BaseModel):
    value: float
    confidence_interval: List[float] = Field(min_items=2, max_items=2)
    reliability_score: float = Field(ge=0, le=1)
    last_updated: datetime

class DepartmentAnalysis(BaseModel):
    department_name: str
    revenue_change: float
    key_metrics: Dict[str, MetricScore]
    risk_factors: List[Dict[str, float]]
    sentiment: SentimentLevel
    regional_breakdown: Dict[str, Dict[str, float]]
    future_projections: List[Dict[str, Optional[float]]]

class CompanyAnalysis(BaseModel):
    company_overview: Dict[str, str]
    financial_health: float = Field(ge=-1, le=1)
    departments: List[DepartmentAnalysis]
    market_position: Dict[str, List[float]]
    competitor_comparison: Dict[str, Dict[str, float]]
    risk_assessment: List[Dict[str, Optional[float]]]
    timestamp: datetime
    confidence_score: float = Field(ge=0, le=1)

prompter = LLM(
    prompt_func=lambda row: row["prompt"],
    model_name="gpt-4o-mini",
    response_format=CompanyAnalysis,
)

dataset = prompter(dataset)
print(dataset.to_pandas())
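For the tenacity-based retry logic floated above, a minimal sketch (names are illustrative, not this PR's code): retry transient failures, but give up immediately when the API reports that the request shape itself is invalid, since resending the same payload cannot succeed.

import openai
from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def _is_retryable(exc: BaseException) -> bool:
    # A 400 BadRequestError means the request shape is invalid;
    # retrying the identical payload will fail the same way.
    return not isinstance(exc, openai.BadRequestError)

@retry(
    retry=retry_if_exception(_is_retryable),
    stop=stop_after_attempt(3),
    wait=wait_exponential(min=1, max=10),
    reraise=True,
)
def request_with_retries(client, **kwargs):
    return client.chat.completions.create(**kwargs)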

@RyanMarten requested a review from madiator on December 17, 2024 00:23
Comment on lines 475 to 478
# Allows us to retry on responses that don't match the response format
self.convert_response_to_response_format(
generic_response.response_message, self.prompt_formatter.response_format
)
Contributor

It returns a dictionary, we don't need it?

self, response_message: str | dict, response_format: Optional[BaseModel]
) -> Optional[dict | str]:
"""
Converts a response message to a specified Pydantic model format.
Contributor

A bit lost on the motivation here.

Contributor Author

We use this function anyway; I just moved it out so we don't duplicate code.

We need to convert the string that the model returns into the correct response format.
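A plausible sketch of that conversion (an assumed implementation, not necessarily the PR's exact code); if validation raises, the response is treated as a failure and the request is retried:

import json
from typing import Optional

from pydantic import BaseModel

def convert_response_to_response_format(
    response_message: str | dict, response_format: Optional[type[BaseModel]]
) -> Optional[dict | str]:
    # No response format requested: pass the raw message through.
    if response_format is None:
        return response_message
    # Models return JSON as a string; parse it into a dict first.
    if isinstance(response_message, str):
        response_message = json.loads(response_message)
    # Raises pydantic.ValidationError when the payload doesn't match
    # the schema, which is what triggers the retry upstream.
    return response_format.model_validate(response_message).model_dump()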

Contributor

Ah sorry.
Also, does it make sense to put this in prompt_formatter?
So self.prompt_formatter.convert_response_to_response_format(response_message).

@RyanMarten merged commit c99978f into dev on Dec 17, 2024
2 checks passed
@RyanMarten deleted the ryanm/retry-on-response-format-failure branch on December 17, 2024 04:06