Bolt + Flask + Kubernetes inevitably starts throwing WebSocketConnectionClosedException #445

Open
@IanWhalen

Running a bolt app using socket mode inside Flask inside kubernetes works initially but eventually always loses the connection and falls back to a WebSocketConnectionClosedException error.

Given that auto_reconnect_enabled defaults to True, I would expect any failures to just result in the app reconnecting.

I'm opening this as a question because I highly doubt it's an actual bug; more likely it's something I need to do differently or better in my own app code.

Reproducible in:

The slack_bolt version

slack-bolt = "1.6.0"
slack-sdk = "3.8.0"
websocket-client = "1.1.0"

Python runtime version

python3.7

OS info

Problem is seen running in a container.

Steps to reproduce:

I've tried to emulate the pattern in #255 for running Bolt + Flask, so a simplified version of my app looks like this:

# ./app.py
from flask import Flask
from slack_app.slack_service import slack

# Open the Socket Mode connection without blocking the Flask process
slack.connect()

app = Flask(__name__)

# ./slack_app/slack_service.py
import logging

from slack_bolt import App
from slack_bolt.error import BoltUnhandledRequestError
from slack_bolt.adapter.socket_mode.websocket_client import SocketModeHandler

logger = logging.getLogger(__name__)

# get_slack_tokens_from_env() is our own helper (definition omitted here)
SLACK_APP_TOKEN, SLACK_BOT_TOKEN = get_slack_tokens_from_env()

app = App(
    token=SLACK_BOT_TOKEN,
    raise_error_for_unhandled_request=True,
)
slack = SocketModeHandler(app, SLACK_APP_TOKEN)

@app.error
def handle_errors(error):
    # Ignore events we have no listener for; log everything else
    if isinstance(error, BoltUnhandledRequestError):
        pass
    else:
        logger.error(error)

I doubt the BoltUnhandledRequestError is causing this but included it in my example code just in case.

Maybe of note: I'm using websocket_client based on the suggestion in slackapi/python-slack-sdk#1024. We were seeing the same BlockingIOError logs.

Also maybe of note: in #255 you suggest using two threads for gunicorn, and we are currently running with just:

gunicorn app:app --workers=1 --bind=0.0.0.0:8080 --timeout=3600
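
For comparison, the two-thread setup suggested in #255 could be written as a gunicorn config file. This is only a sketch: the file name gunicorn.conf.py and the threads value are assumptions taken from that suggestion, and the remaining values simply mirror the command above.

```python
# gunicorn.conf.py (hypothetical file; values other than `threads`
# mirror the command line shown above)

# A single worker process, so only one Socket Mode connection is opened.
workers = 1

# Assumption based on the suggestion in #255: two threads per worker.
threads = 2

# Same bind address and timeout as the current command.
bind = "0.0.0.0:8080"
timeout = 3600
```

With this file in place, the app could be started with `gunicorn -c gunicorn.conf.py app:app`.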

Lastly, I am unable to repro this problem locally; I only see it inside our Kubernetes cluster. Unfortunately I'm not savvy enough to know how to debug whether the k8s infra is causing my problem, although in parallel with filing this issue I am working with the people who maintain that infra to investigate from that side.

Expected result:

My slack connection doesn't die.

Actual result:

My app connects fine initially, but after some period of time disconnects from slack and the logs quickly degenerate into the following error every 5 seconds:

on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.)
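
One workaround that could be tried while the root cause is unknown is a small watchdog thread that polls the connection and forces a reconnect when it is down. The sketch below is a guess at such a helper, not something from bolt itself: `start_watchdog` is a hypothetical name, and the usage comment assumes the handler's underlying client exposes `is_connected()` and `connect_to_new_endpoint()` as slack_sdk's Socket Mode clients do.

```python
import logging
import threading
import time

logger = logging.getLogger(__name__)


def start_watchdog(is_connected, reconnect, interval=10.0):
    """Poll is_connected() every `interval` seconds and call reconnect()
    whenever it reports False. Returns an Event that stops the loop."""
    stop_event = threading.Event()

    def _run():
        # Event.wait() returns False on timeout and True once the event is set,
        # so the loop ends as soon as stop_event.set() is called.
        while not stop_event.wait(interval):
            try:
                if not is_connected():
                    logger.warning("Socket Mode connection is down; reconnecting")
                    reconnect()
            except Exception:
                # Keep the watchdog alive even if a reconnect attempt fails.
                logger.exception("Reconnect attempt failed")

    threading.Thread(target=_run, daemon=True).start()
    return stop_event


# Hypothetical usage with the handler from slack_service.py (assumed API):
# start_watchdog(slack.client.is_connected, slack.client.connect_to_new_endpoint)
```

Taking the two callables as arguments keeps the helper independent of any particular client class, so it can be exercised with simple stand-ins before wiring it to the real handler.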
