openai[patch]: fix dropping response headers while streaming / Azure #31580

Merged
1 commit merged into langchain-ai:master from joshy-deshaw:usage-metadata-fix on Jun 23, 2025

Conversation

@joshy-deshaw (Contributor)

  • Currently, when streaming with OpenAI with include_response_headers=True, the response headers are set as generation_info on the first chunk.
  • However, _convert_chunk_to_generation_chunk returns early and drops that generation_info. This PR makes it so that the info is preserved; see the sketch after this list.
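
For context, a minimal sketch of the failure mode and the fix (simplified and hypothetical; convert_chunk only approximates the shape of _convert_chunk_to_generation_chunk, it is not the actual implementation):

from langchain_core.messages import AIMessageChunk
from langchain_core.outputs import ChatGenerationChunk

def convert_chunk(chunk: dict, base_generation_info: dict) -> ChatGenerationChunk:
    # base_generation_info carries e.g. {"headers": ...} captured from the
    # HTTP response before the first streamed chunk is converted.
    choices = chunk.get("choices", [])
    if not choices:
        # Azure's first streamed chunk can contain no choices. Returning
        # early here WITHOUT generation_info is what dropped the headers.
        return ChatGenerationChunk(
            message=AIMessageChunk(content=""),
            generation_info=base_generation_info or None,  # the fix: keep it
        )
    return ChatGenerationChunk(
        message=AIMessageChunk(content=choices[0]["delta"].get("content", "")),
        generation_info=base_generation_info or None,
    )

Once the early return also carries generation_info, aggregating chunks (full = full + chunk) merges the headers into response_metadata on the combined message.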


vercel bot commented Jun 12, 2025

1 Skipped Deployment
langchain: ⬜️ Ignored (updated Jun 12, 2025 1:19pm UTC)

@dosubot (bot) added labels size:XS (This PR changes 0-9 lines, ignoring generated files) and 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature) on Jun 12, 2025

codspeed-hq bot commented Jun 12, 2025

CodSpeed Walltime Performance Report

Merging #31580 will not alter performance

Comparing joshy-deshaw:usage-metadata-fix (1c1dbc4) with master (446a9d5)

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched benchmarks


codspeed-hq bot commented Jun 12, 2025

CodSpeed Instrumentation Performance Report

Merging #31580 will not alter performance

Comparing joshy-deshaw:usage-metadata-fix (1c1dbc4) with master (446a9d5)

Summary

✅ 13 untouched benchmarks

@joshy-deshaw (Contributor, Author)

@baskaryan tagging for review

@ccurme (Collaborator) left a comment

Can you clarify what this does with a test? I don't see any update to response metadata here:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4.1-mini",
    include_response_headers=True,
    stream_usage=True,
)

full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk

full.response_metadata
{'headers': {'date': 'Fri, 20 Jun 2025 23:18:14 GMT',
  'content-type': 'text/event-stream; charset=utf-8',
  ...},
 'finish_reason': 'stop',
 'model_name': 'gpt-4.1-mini-2025-04-14',
 'system_fingerprint': '...',
 'service_tier': 'default'}

@ccurme self-assigned this Jun 20, 2025
@joshy-deshaw (Contributor, Author) commented Jun 23, 2025

An example with AzureChatOpenAI is below, though it seems like you're already getting these headers with ChatOpenAI.

from langchain_openai import AzureChatOpenAI

llm = AzureChatOpenAI(
    **azure_kwargs,
    model_name="gpt-4o-mini",
    streaming=True,
    stream_options={"include_usage": True},
    include_response_headers=True,
)
full = None
for chunk in llm.stream("hello"):
    full = chunk if full is None else full + chunk

print(full.response_metadata)

Without the changes in this PR:

{'finish_reason': 'stop', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': '...'}

With the changes:

{
  "headers": {
    "transfer-encoding": "chunked",
    "content-type": "text/event-stream; charset=utf-8",
    "apim-request-id": "...",
    "strict-transport-security": "...",
    "x-content-type-options": "nosniff",
    "x-ms-region": "...",
    "x-ratelimit-remaining-requests": "19999",
    "x-ratelimit-limit-requests": "20000",
    "x-ratelimit-remaining-tokens": "1999998",
    "x-ratelimit-limit-tokens": "2000000",
    "azureml-model-session": "...",
    "cmp-upstream-response-duration": "126",
    "x-accel-buffering": "no",
    "x-ms-rai-invoked": "true",
    "x-request-id": "...",
    "x-ms-client-request-id": "Not-Set",
    "x-ms-deployment-name": "...",
    "date": "Mon, 23 Jun 2025 05:33:39 GMT"
  },
  "finish_reason": "stop",
  "model_name": "gpt-4o-mini-2024-07-18",
  "system_fingerprint": "..."
}
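
A minimal sketch of how this before/after behaviour could be pinned as a regression test (hypothetical; assumes valid Azure credentials supplied via azure_kwargs and network access):

from langchain_openai import AzureChatOpenAI

def test_azure_stream_preserves_response_headers(azure_kwargs: dict) -> None:
    llm = AzureChatOpenAI(
        **azure_kwargs,
        model_name="gpt-4o-mini",
        streaming=True,
        include_response_headers=True,
    )
    full = None
    for chunk in llm.stream("hello"):
        full = chunk if full is None else full + chunk
    # Before this PR the assertion fails: "headers" is missing entirely.
    assert "headers" in full.response_metadata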

@dosubot (bot) added the lgtm label (PR looks good. Use to confirm that a PR is ready for merging.) on Jun 23, 2025
@ccurme changed the title from "openai: don't drop generation_info while streaming" to "openai[patch]: fix dropping response headers while streaming / Azure" on Jun 23, 2025
@ccurme merged commit 8a0782c into langchain-ai:master on Jun 23, 2025
57 checks passed