
Gateway error when processing non-200 model response #722

Open
mkrueger12 opened this issue Nov 1, 2024 · 3 comments
Labels: bug (Something isn't working), triage


mkrueger12 commented Nov 1, 2024

What Happened?

Issue Description

The gateway fails to parse error messages returned by Claude models on GCP Vertex AI through the streaming endpoint, and throws a SyntaxError instead of forwarding the error to the client. The issue occurs when the model returns a non-200 response.

Environment

  • Gateway Endpoint: /v1/chat/completions
  • Model: anthropic.claude-3-5-sonnet@20240620
  • Provider: GCP Vertex AI
  • Gateway is locally hosted

Configuration

{
  "strategy": {
    "mode": "loadbalance"
  },
  "targets": [
    {
      "provider": "vertex-ai",
      "vertex_project_id": "dev",
      "vertex_service_account_json": "SA",
      "vertex_region": "us-east5",
      "weight": 1
    }
  ]
}

Error Output

Gateway logs:

event: error
2024-11-01 11:43:41 ^
2024-11-01 11:43:41 
2024-11-01 11:43:41 SyntaxError: Unexpected token 'e', "event: err"... is not valid JSON
2024-11-01 11:43:41     at JSON.parse (<anonymous>)
2024-11-01 11:43:41     at Xt (file:///app/build/start-server.js:2:71350)
2024-11-01 11:43:41     at file:///app/build/start-server.js:2:146756
2024-11-01 11:43:41     at async file:///app/build/start-server.js:2:146361

Application logs:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Root Cause Analysis

  • The error occurs when the model returns a non-200 response
  • Verified by running parallel requests directly against the GCP API

Additional Notes

  • This issue reproduces with other configurations, including fallback mode
  • The error is not specific to the provided configuration

What Should Have Happened?

The gateway should return an appropriate error response to the application.

Calling the GCP API directly returns:

529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}
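As a rough illustration of what an "appropriate error response" could look like, the sketch below maps the raw 529 body above onto an OpenAI-style error object. The function name `toOpenAIStyleError` and the output shape are assumptions for illustration only, not the gateway's actual API or error format.

```typescript
// Anthropic-style error body, as returned by Vertex AI in the 529 response above.
interface AnthropicErrorBody {
  type: "error";
  error: { type: string; message: string; details?: unknown };
}

// Hypothetical OpenAI-style error envelope the gateway could return to the client.
interface OpenAIStyleError {
  error: { message: string; type: string; param: null; code: string | null };
}

// Hypothetical helper: convert an upstream status and raw body into the envelope.
function toOpenAIStyleError(status: number, rawBody: string): OpenAIStyleError {
  let message = `Upstream returned HTTP ${status}`;
  let type = "upstream_error";
  try {
    const body = JSON.parse(rawBody) as Partial<AnthropicErrorBody>;
    if (body.type === "error" && body.error) {
      message = body.error.message;
      type = body.error.type;
    }
  } catch {
    // Body was not JSON (or not an object); keep the generic message.
  }
  return { error: { message, type, param: null, code: String(status) } };
}

const converted = toOpenAIStyleError(
  529,
  '{"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}'
);
console.log(JSON.stringify(converted));
```

Keeping the upstream `type` (here `overloaded_error`) lets a client or a fallback strategy distinguish a transient overload from other failures.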

Relevant Code Snippet

No response

Your Twitter/LinkedIn

https://www.linkedin.com/in/maxkrueger1/

mkrueger12 added the bug label Nov 1, 2024
github-actions bot added the triage label Nov 1, 2024
mkrueger12 (Author) commented:

2024-11-02 14:21:05 Your AI Gateway is now running on http://localhost:8787 🚀
2024-11-02 14:21:07 chunk event: error
2024-11-02 14:21:07 data: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}           }
2024-11-02 14:21:07 undefined:1
2024-11-02 14:21:07 event: error
2024-11-02 14:21:07 ^
2024-11-02 14:21:07 
2024-11-02 14:21:07 SyntaxError: Unexpected token 'e', "event: err"... is not valid JSON
2024-11-02 14:21:07     at JSON.parse (<anonymous>)
2024-11-02 14:21:07     at Xt (file:///app/build/start-server.js:2:71373)
2024-11-02 14:21:07     at file:///app/build/start-server.js:2:146779
2024-11-02 14:21:07     at async file:///app/build/start-server.js:2:146384
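The failure in the log above can be reproduced in isolation. The sketch below is a simplified copy of the prefix stripping done in `VertexAnthropicChatCompleteStreamChunkTransform` (the helper name `stripKnownPrefixes` is hypothetical): only known event names are removed, so an `event: error` frame reaches `JSON.parse` with its prefix intact, and the `^data: ` replacement never matches because it is anchored to the start of the string.

```typescript
// Simplified copy of the prefix stripping in VertexAnthropicChatCompleteStreamChunkTransform:
// only the listed event names are removed before JSON.parse.
function stripKnownPrefixes(responseChunk: string): string {
  let chunk = responseChunk.trim();
  chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, "");
  chunk = chunk.replace(/^event: content_block_start[\r\n]*/, "");
  chunk = chunk.replace(/^event: message_delta[\r\n]*/, "");
  chunk = chunk.replace(/^event: message_start[\r\n]*/, "");
  chunk = chunk.replace(/^data: /, "");
  return chunk.trim();
}

// The error frame from the log above (whitespace inside the JSON condensed).
const errorFrame =
  'event: error\ndata: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}}';

const remaining = stripKnownPrefixes(errorFrame);
try {
  JSON.parse(remaining);
} catch (e) {
  // Same failure mode as the gateway log: the string still starts with "event: error".
  console.log((e as Error).name, remaining.startsWith("event: error"));
}
```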

mkrueger12 (Author) commented Nov 2, 2024

Here is the file where the error is occurring: src/providers/google-vertex-ai/chatComplete.ts

This method does not have error handling:

export const VertexAnthropicChatCompleteStreamChunkTransform: (
  response: string,
  fallbackId: string,
  streamState: Record<string, boolean>
) => string | undefined = (responseChunk, fallbackId, streamState) => {
  let chunk = responseChunk.trim();
  
  if (
    chunk.startsWith('event: ping') ||
    chunk.startsWith('event: content_block_stop') ||
    chunk.startsWith('event: vertex_event')
  ) {
    return;
  }

  if (chunk.startsWith('event: message_stop')) {
    return 'data: [DONE]\n\n';
  }

  chunk = chunk.replace(/^event: content_block_delta[\r\n]*/, '');
  chunk = chunk.replace(/^event: content_block_start[\r\n]*/, '');
  chunk = chunk.replace(/^event: message_delta[\r\n]*/, '');
  chunk = chunk.replace(/^event: message_start[\r\n]*/, '');
  chunk = chunk.replace(/^data: /, '');
  chunk = chunk.trim();

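  // JSON.parse throws a SyntaxError here when the chunk still starts with a
  // prefix that none of the replacements above removed, such as "event: error".
  const parsedChunk: AnthropicChatCompleteStreamResponse = JSON.parse(chunk);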

  if (
    parsedChunk.type === 'content_block_start' &&
    parsedChunk.content_block?.type === 'text'
  ) {
    streamState.containsChainOfThoughtMessage = true;
    return;
  }

  if (parsedChunk.type === 'message_start' && parsedChunk.message?.usage) {
    return (
      `data: ${JSON.stringify({
        id: fallbackId,
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: parsedChunk.message?.usage,
        provider: GOOGLE_VERTEX_AI,
        choices: [
          {
            delta: {
              content: '',
            },
            index: 0,
            logprobs: null,
            finish_reason: null,
          },
        ],
        usage: {
          prompt_tokens: parsedChunk.message?.usage?.input_tokens,
        },
      })}` + '\n\n'
    );
  }

  if (parsedChunk.type === 'message_delta' && parsedChunk.usage) {
    return (
      `data: ${JSON.stringify({
        id: fallbackId,
        object: 'chat.completion.chunk',
        created: Math.floor(Date.now() / 1000),
        model: '',
        provider: GOOGLE_VERTEX_AI,
        choices: [
          {
            index: 0,
            delta: {},
            finish_reason: parsedChunk.delta?.stop_reason,
          },
        ],
        usage: {
          completion_tokens: parsedChunk.usage?.output_tokens,
        },
      })}` + '\n\n'
    );
  }

  const toolCalls = [];
  const isToolBlockStart: boolean =
    parsedChunk.type === 'content_block_start' &&
    !!parsedChunk.content_block?.id;
  const isToolBlockDelta: boolean =
    parsedChunk.type === 'content_block_delta' &&
    !!parsedChunk.delta.partial_json;
  const toolIndex: number = streamState.containsChainOfThoughtMessage
    ? parsedChunk.index - 1
    : parsedChunk.index;

  if (isToolBlockStart && parsedChunk.content_block) {
    toolCalls.push({
      index: toolIndex,
      id: parsedChunk.content_block.id,
      type: 'function',
      function: {
        name: parsedChunk.content_block.name,
        arguments: '',
      },
    });
  } else if (isToolBlockDelta) {
    toolCalls.push({
      index: toolIndex,
      function: {
        arguments: parsedChunk.delta.partial_json,
      },
    });
  }

  return (
    `data: ${JSON.stringify({
      id: fallbackId,
      object: 'chat.completion.chunk',
      created: Math.floor(Date.now() / 1000),
      model: '',
      provider: GOOGLE_VERTEX_AI,
      choices: [
        {
          delta: {
            content: parsedChunk.delta?.text,
            tool_calls: toolCalls.length ? toolCalls : undefined,
          },
          index: 0,
          logprobs: null,
          finish_reason: parsedChunk.delta?.stop_reason ?? null,
        },
      ],
    })}` + '\n\n'
  );
};
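One possible shape for the missing error handling is a check at the top of the transform, before `JSON.parse`, that intercepts `event: error` frames. The sketch below is hypothetical: the helper name `transformErrorChunk`, the OpenAI-style error payload, and the choice to end the stream with `data: [DONE]` are all assumptions, not the gateway's actual behavior or the eventual fix.

```typescript
// Hypothetical helper: turn an Anthropic-style `event: error` SSE frame into an
// OpenAI-style error chunk instead of letting JSON.parse throw.
// Returns undefined when the frame is not an error event, so the caller can
// fall through to the existing transform logic.
function transformErrorChunk(responseChunk: string): string | undefined {
  const chunk = responseChunk.trim();
  if (!chunk.startsWith("event: error")) {
    return undefined;
  }

  // Take the payload of the `data:` line, if there is one.
  const dataLine = chunk
    .split(/\r?\n/)
    .find((line) => line.startsWith("data:"));
  const payload = dataLine ? dataLine.slice("data:".length).trim() : "";

  let message = "Unknown streaming error from provider";
  let type = "provider_error";
  try {
    const parsed = JSON.parse(payload);
    message = parsed?.error?.message ?? message;
    type = parsed?.error?.type ?? type;
  } catch {
    // Payload missing or not JSON: keep the generic error.
  }

  // Emit the error as one SSE data chunk, then terminate the stream.
  return (
    `data: ${JSON.stringify({ error: { message, type, param: null, code: null } })}\n\n` +
    "data: [DONE]\n\n"
  );
}

const frame =
  'event: error\ndata: {"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"}}';
console.log(transformErrorChunk(frame));
```

Returning `undefined` for non-error frames keeps the existing code path unchanged; a caller would run this check first and only fall through to the normal parsing when it returns nothing.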

narengogi (Collaborator) commented:

> Here is the file where the error is occurring: src/providers/google-vertex-ai/chatComplete.ts
>
> This method does not have error handling:

Thanks for reporting this, @mkrueger12, and for the detailed description!
I'll fix this.

Usually no provider returns an error inside a stream chunk, so there is no error handling here, but it's Google; they always have to do something unusual with their API standards. xD
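A more general alternative to stripping a fixed list of event prefixes is to parse each SSE frame's `event:` and `data:` fields explicitly and dispatch on the event name, so an unexpected event such as `error` is routed deliberately instead of being passed to `JSON.parse`. The sketch below is a minimal frame parser modeled on the server-sent events field rules; the names `SseFrame` and `parseSseFrame` are hypothetical and not part of the gateway.

```typescript
// Minimal parser for one SSE frame (the text between blank lines), following the
// server-sent events field rules: "field: value", one optional leading space after
// the colon, multiple data lines joined with "\n". Comment lines (starting with ":")
// and the id/retry fields are ignored; only LF and CRLF line endings are handled.
interface SseFrame {
  event: string; // defaults to "message" when no event field is present
  data: string;
}

function parseSseFrame(frame: string): SseFrame {
  let event = "message";
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line === "" || line.startsWith(":")) continue;
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "event") event = value;
    else if (field === "data") dataLines.push(value);
  }
  return { event, data: dataLines.join("\n") };
}

const sse = parseSseFrame(
  'event: error\ndata: {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}'
);
// Dispatch on the event name instead of assuming every frame carries a normal delta.
if (sse.event === "error") {
  const body = JSON.parse(sse.data);
  console.log(`upstream error: ${body.error.type} (${body.error.message})`);
}
```

With this structure, a new or unexpected event type falls into an explicit branch (or is ignored) rather than surfacing as a `SyntaxError`.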
