Description
Running a Bolt app in Socket Mode inside Flask inside Kubernetes works initially, but it eventually always loses the connection and falls back to a WebSocketConnectionClosedException error.
Given that auto_reconnect_enabled defaults to True, I would expect any failures to just result in the app reconnecting.
I'm opening this as a question because I'm highly doubtful it's an actual bug; more likely it's something I need to do differently or better in my own app code.
Reproducible in:
The slack_bolt version
slack-bolt = "1.6.0"
slack-sdk = "3.8.0"
websocket-client = "1.1.0"
Python runtime version
python3.7
OS info
Problem is seen running in a container.
Steps to reproduce:
I've tried to emulate the pattern in #255 for running bolt + slack, so a simplified version of my app looks like this:
# ./app.py
from flask import Flask
from slack_app.slack_service import slack
slack.connect()
app = Flask(__name__)
# ./slack_app/slack_service.py
import logging

from slack_bolt import App
from slack_bolt.error import BoltUnhandledRequestError
from slack_bolt.adapter.socket_mode.websocket_client import SocketModeHandler

logger = logging.getLogger(__name__)

# get_slack_tokens_from_env is not shown in this simplified example
SLACK_APP_TOKEN, SLACK_BOT_TOKEN = get_slack_tokens_from_env()
app = App(
    token=SLACK_BOT_TOKEN,
    raise_error_for_unhandled_request=True,
)
slack = SocketModeHandler(app, SLACK_APP_TOKEN)

@app.error
def handle_errors(error):
    # Ignore unhandled-request errors; log everything else
    if isinstance(error, BoltUnhandledRequestError):
        pass
    else:
        logger.error(error)
I doubt the BoltUnhandledRequestError is causing this but included it in my example code just in case.
Maybe of note is that I'm using websocket_client based on the suggestion in slackapi/python-slack-sdk#1024. We were seeing the same BlockingIOError logs.
Also maybe of note: in #255 you suggest using two threads for gunicorn, and we are currently just running with:
gunicorn app:app --workers=1 --bind=0.0.0.0:8080 --timeout=3600
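For comparison, here is a minimal sketch of what the two-thread setup mentioned in #255 might look like as a gunicorn config file. The file name gunicorn.conf.py and the choice of exactly two threads are assumptions for illustration; the other values mirror the command above. I have not confirmed that this changes the disconnect behavior.

```python
# gunicorn.conf.py (hypothetical file; load with: gunicorn app:app -c gunicorn.conf.py)

bind = "0.0.0.0:8080"  # same as --bind=0.0.0.0:8080
workers = 1            # same as --workers=1
threads = 2            # assumption: the two-thread setup referred to in #255
timeout = 3600         # same as --timeout=3600
```

With threads set above 1, gunicorn switches from the default sync worker to its threaded (gthread) worker.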
Lastly of note is that I am unable to repro this problem locally; I'm only seeing it inside our Kubernetes cluster. Unfortunately I'm not savvy enough to know how to debug whether the k8s infra is causing my problem (although, while filing this issue, I am also working with the people who maintain that infra to investigate from that side).
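One thing that might help narrow this down from the app side is timestamped DEBUG logging for the libraries involved, so the time between connect and disconnect can be compared against any idle timeouts in the cluster. The sketch below uses only the standard logging module; the logger names are assumptions based on each library logging under its package name, and enable_connection_debug_logging is a hypothetical helper.

```python
import logging


def enable_connection_debug_logging() -> None:
    """Emit timestamped logs and raise the connection-related loggers to DEBUG."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )
    # Assumed logger names: slack_bolt and slack_sdk log under their module
    # paths, and websocket-client logs under "websocket".
    for name in ("slack_bolt", "slack_sdk", "websocket"):
        logging.getLogger(name).setLevel(logging.DEBUG)


enable_connection_debug_logging()
```

Calling this before slack.connect() in app.py should capture the whole connection lifecycle.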
Expected result:
My slack connection doesn't die.
Actual result:
My app connects fine initially, but after some period of time disconnects from slack and the logs quickly degenerate into the following error every 5 seconds:
on_error invoked (error: WebSocketConnectionClosedException, message: Connection to remote host was lost.)
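As a possible stopgap while the root cause is unknown, here is a minimal sketch of a watchdog thread that polls the connection and forces a reconnect when it looks dead. It is a generic workaround under stated assumptions, not how the SDK reconnects internally: start_watchdog, is_connected and reconnect are hypothetical names, and the two callables have to be supplied by the app.

```python
import logging
import threading
from typing import Callable

logger = logging.getLogger(__name__)


def start_watchdog(
    is_connected: Callable[[], bool],
    reconnect: Callable[[], None],
    interval_seconds: float = 30.0,
) -> threading.Event:
    """Poll is_connected() in a daemon thread and call reconnect() when it is False.

    Returns an Event; set it to stop the watchdog.
    """
    stop = threading.Event()

    def run() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            try:
                if not is_connected():
                    logger.warning("Connection looks dead; reconnecting")
                    reconnect()
            except Exception:
                # Keep the watchdog alive even if a check or reconnect fails.
                logger.exception("Watchdog check failed; will retry")

    threading.Thread(target=run, name="socket-mode-watchdog", daemon=True).start()
    return stop
```

In my app this would be called after slack.connect(), passing something like slack.client.is_connected and slack.client.connect_to_new_endpoint. That assumes the installed slack-sdk version exposes those two methods on the handler's client, which I would need to verify.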