Performance issues with Vertex Gemini in createDataStreamResponse #5053

npredey · 2025-03-03T18:52:18Z

I am using the AI SDK to generate a chat response like this:

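```ts
import { createDataStreamResponse, streamText } from "ai";
// Inside the route handler; myProvider, systemPrompt, selectedChatModel and
// messages are defined elsewhere in the app (not shown in the post).
return createDataStreamResponse({
  execute: (dataStream) => {
    const result = streamText({
      model: myProvider.languageModel(selectedChatModel),
      system: systemPrompt({ selectedChatModel }),
      messages,
      maxSteps: 5,
      // ... (remaining options omitted in the original post)
    });
    // ... (rest of execute omitted in the original post)
  },
});
```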

However, even for a simple "hello", the Gemini model takes anywhere from 6 to 12 seconds to respond, while the same request in the GCP Console takes < 30 ms. I understand there may be some latency, but is there anything I can do to improve this performance? Other models I've tested locally don't have this problem.

Thank you!
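
One way to see where the 6 to 12 seconds go is to time the first chunk separately from the full stream. A minimal sketch, assuming the `streamText` result exposes `textStream` as an async iterable of text deltas (as in AI SDK 4.x); `measureStream` is a hypothetical helper, not part of the SDK.

```typescript
// Hypothetical helper (not part of the AI SDK): times how long a text stream
// takes to deliver its first chunk and to finish.
interface StreamTiming {
  timeToFirstChunkMs: number | null; // null if the stream yielded nothing
  totalMs: number;
  chunks: number;
  text: string;
}

async function measureStream(
  // A factory, so the clock starts before the request is issued.
  makeStream: () => AsyncIterable<string>,
): Promise<StreamTiming> {
  const start = performance.now();
  const stream = makeStream();
  let timeToFirstChunkMs: number | null = null;
  let chunks = 0;
  let text = "";
  for await (const chunk of stream) {
    if (timeToFirstChunkMs === null) {
      // Everything before any output: connection setup, auth, queueing, first token.
      timeToFirstChunkMs = performance.now() - start;
    }
    chunks += 1;
    text += chunk;
  }
  return { timeToFirstChunkMs, totalMs: performance.now() - start, chunks, text };
}
```

A possible call, assuming the `ai` and `@ai-sdk/google-vertex` packages: `await measureStream(() => streamText({ model: vertex('gemini-2.0-flash-001'), prompt: 'hello' }).textStream)`. If most of the delay is spent before the first chunk, the time is going to something that happens before generation rather than to token streaming.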

lgrammel · 2025-03-04T08:22:59Z

Maintainer

@npredey Does this happen for a specific Gemini model or for all Gemini models? Is it the same when you use curl/fetch directly against the Gemini API?
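
For a direct baseline, a minimal sketch that times one non-streaming `generateContent` call against the Gemini API REST endpoint. It assumes Node 18+ (global `fetch`), the `v1beta` endpoint, and an API key passed in via the `x-goog-api-key` header; `timeGeminiRequest` and `FetchLike` are hypothetical names, and the `fetchImpl` parameter only exists so a stub can be injected.

```typescript
// Minimal structural type for the parts of fetch used here.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; text(): Promise<string> }>;

// Hypothetical helper: times a single generateContent request to the Gemini API.
async function timeGeminiRequest(
  model: string,
  prompt: string,
  apiKey: string,
  fetchImpl: FetchLike = fetch,
): Promise<{ status: number; ms: number; body: string }> {
  const url = `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
  const start = performance.now();
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({ contents: [{ role: "user", parts: [{ text: prompt }] }] }),
  });
  const body = await res.text(); // include the body download in the measured time
  return { status: res.status, ms: performance.now() - start, body };
}
```

Running it a few times with the same model and prompt as the SDK call gives numbers that can be compared side by side.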

npredey Mar 4, 2025
Author

Hi Lars, thank you for the response. It happens for almost all the Gemini models I've used, even the newest one, gemini-2.0-flash-001. For some reason, gemini-1.5-pro-002 is sometimes faster, which is surprising.

I thought it was my code, but I reproduced the same slowness with the example Vertex template in this repository: https://github.com/vercel/ai/tree/main/examples/next-google-vertex. When I change the Vertex prompt from "tell me a story" to "hello", it takes anywhere from 6 to 12 seconds to return a simple response.

When I call the Gemini API directly with curl/fetch, the response comes back in about 1 s, including network latency, compared with < 30 ms in the GCP console.

lgrammel Mar 4, 2025
Maintainer

Google Vertex != Gemini; those are different services. Still not convinced it's the AI SDK code; that's pretty unlikely.
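
To make the distinction concrete: the same Gemini model is served from two different REST endpoints, and the AI SDK ships a separate provider for each (`@ai-sdk/google` for the Gemini API, `@ai-sdk/google-vertex` for Vertex AI). The sketch below only builds the two URLs; the helper names are hypothetical, the `v1beta`/`v1` versions and the regional host pattern are assumptions based on Google's public REST docs, and any project ID is a placeholder. A curl/fetch baseline is probably only directly comparable to the Vertex path if it targets the second URL.

```typescript
// Gemini API (Google AI Studio): authenticated with an API key.
function geminiApiUrl(model: string): string {
  return `https://generativelanguage.googleapis.com/v1beta/models/${model}:generateContent`;
}

// Vertex AI: authenticated with Google Cloud credentials and scoped to a
// project and region. Assumes a regional location such as "us-central1";
// the "global" location uses a different host and is not handled here.
function vertexUrl(project: string, location: string, model: string): string {
  return (
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/${model}:generateContent`
  );
}
```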

npredey Mar 4, 2025
Author

Yes, I know they're different services. I'm referring to the Gemini models called through Google Vertex. This video shows what I mean:

Google.Vertex.Performance.mp4

It takes 10 seconds, then 3 seconds, then 9 seconds. For reference, when I call the Google Vertex Python SDK locally, the result comes back right away.
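
Since single runs vary this much, comparing the AI SDK path and the Python SDK path over a handful of sequential runs each, summarized as min/median/max, may make the difference clearer. A minimal sketch; `sampleLatencies` and `summarize` are hypothetical helpers.

```typescript
// Hypothetical helpers: run an async operation several times in a row and
// summarize the wall-clock durations in milliseconds.
async function sampleLatencies(
  runs: number,
  op: () => Promise<unknown>,
): Promise<number[]> {
  const durations: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await op(); // sequential, so requests don't compete with each other
    durations.push(performance.now() - start);
  }
  return durations;
}

function summarize(durations: number[]): { min: number; median: number; max: number } {
  if (durations.length === 0) throw new Error("no samples");
  const sorted = [...durations].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { min: sorted[0], median, max: sorted[sorted.length - 1] };
}
```

For the three timings above, `summarize([10, 3, 9])` gives `{ min: 3, median: 9, max: 10 }`.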

If the issue lies in components outside Vertex Gemini, so be it. I raised what I've seen to check whether something in the AI SDK's implementation differs from how it handles other models.
