openai[patch]: fix dropping response headers while streaming / Azure #31580
Conversation
CodSpeed Walltime Performance Report: merging #31580 will not alter performance.

CodSpeed Instrumentation Performance Report: merging #31580 will not alter performance.
Force-pushed from b0f6b9a to 1c1dbc4.
@baskaryan tagging for review
Can you clarify what this does with a test? I don't see any update to response metadata here:
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4.1-mini",
    include_response_headers=True,
    stream_usage=True,
)
full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk
full.response_metadata
```
```python
{'headers': {'date': 'Fri, 20 Jun 2025 23:18:14 GMT',
  'content-type': 'text/event-stream; charset=utf-8',
  ...},
 'finish_reason': 'stop',
 'model_name': 'gpt-4.1-mini-2025-04-14',
 'system_fingerprint': '...',
 'service_tier': 'default'}
```
An example with `AzureChatOpenAI`:

```python
from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    **azure_kwargs,
    model_name="gpt-4o-mini",
    streaming=True,
    stream_options={"include_usage": True},
    include_response_headers=True,
)
full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk
print(full.response_metadata)
```

Without the changes in this PR, the headers are missing from `full.response_metadata`. With the changes, they are included.
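A regression test along these lines could cover the streaming case (a minimal sketch; the test name and live-API setup are assumptions, not the repo's actual test suite):

```python
from langchain_openai import ChatOpenAI


def test_stream_includes_response_headers() -> None:
    # Live-API test: requires OPENAI_API_KEY in the environment.
    llm = ChatOpenAI(
        model="gpt-4.1-mini",
        include_response_headers=True,
        stream_usage=True,
    )
    full = None
    for chunk in llm.stream("hello"):
        full = chunk if full is None else full + chunk
    # Without the fix, the headers were dropped from the aggregated
    # response_metadata while streaming.
    assert full is not None
    assert "headers" in full.response_metadata
```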
With `include_response_headers=True`, the response headers are set as `generation_info` in the first chunk. `_convert_chunk_to_generation_chunk` drops `generation_info` because it returns early. This PR makes it so that the info is preserved.
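For illustration, a minimal sketch of the early-return pattern described above, with plain dicts standing in for the real chunk types (this is a simplification, not the actual `_convert_chunk_to_generation_chunk` implementation):

```python
def convert_chunk(chunk: dict, generation_info: dict | None):
    """Simplified stand-in for _convert_chunk_to_generation_chunk."""
    if not chunk.get("choices"):
        # Buggy behavior: returning None here silently drops
        # generation_info, which carries the response headers attached
        # to the first chunk.
        # Fixed behavior: emit an empty chunk that still carries it.
        if generation_info:
            return {"text": "", "generation_info": generation_info}
        return None
    choice = chunk["choices"][0]
    return {
        "text": choice.get("delta", {}).get("content", ""),
        "generation_info": generation_info or {},
    }
```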