Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: random slack code cleanup #5307

Merged
merged 4 commits into from
Nov 29, 2024
Merged

chore: random slack code cleanup #5307

merged 4 commits into from
Nov 29, 2024

Conversation

joeyorlando
Copy link
Contributor

@joeyorlando joeyorlando commented Nov 28, 2024

What this PR does

Related to #5287

Few random "clean-ups", type improvements, etc.

Additionally, fixes a change made in #5292; we should wait to read from slack_message.channel.slack_id, until we've performed the data-migration mentioned in that PR (in the mean-time we should continue to use slack_message._channel_id).

Checklist

  • Unit, integration, and e2e (if applicable) tests updated
  • Documentation added (or pr:no public docs PR label added if not required)
  • Added the relevant release notes label (see labels prefixed w/ release:). These labels dictate how your PR will
    show up in the autogenerated release notes.

@joeyorlando joeyorlando added pr:no public docs Added to a PR that does not require public documentation updates release:patch PR will be added to "Other Changes" section of release notes labels Nov 28, 2024
@joeyorlando joeyorlando requested a review from a team as a code owner November 28, 2024 15:42
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the name of this function confusing (given the contexts in which it was invoked) + it's only doing an equality check, so I don't think we need a separate function just for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a random change, but we discussed it here (since it's only a one-liner change + this library isn't used, decided to just squeeze it in this PR)

Comment on lines +129 to -76
self._post_alert_group_to_slack(
slack_team_identity=slack_team_identity,
alert_group=alert_group,
alert=alert,
attachments=alert_group.render_slack_attachments(),
slack_channel=slack_channel,
blocks=alert_group.render_slack_blocks(),
)
self._send_first_alert(alert, slack_channel)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved logic from _send_first_alert here (it was only invoked once, here, and was just passing through to self._post_alert_group_to_slack as can be seen here)

Comment on lines 53 to 102
class IncomingAlertStep(scenario_step.ScenarioStep):
def process_signal(self, alert: Alert) -> None:
"""
Here lay dragons. Below is some added context as to why we have the explicit
`AlertGroup.objects.filter(...).update(...)` calls.

For example:
```
num_updated_rows = AlertGroup.objects.filter(
pk=alert.group.pk, slack_message_sent=False,
).update(slack_message_sent=True)

if num_updated_rows == 1:
# post new message
else:
# update existing message
```

This piece of code guarantees that when 2 alerts are created concurrently we don't end up posting a
message twice. we rely on the fact that the `UPDATE` SQL statements are atomic. this is not the same as:

```
if not alert_group.slack_message_sent:
# post new message
alert_group.slack_message_sent = True
else:
# update existing message
```

This would end up posting multiple Slack messages for a single alert group (classic race condition). And then
all kinds of unexpected behaviours would arise, because some parts of the codebase assume there can only be 1
`SlackMessage` per `AlertGroup`.
"""

alert_group = alert.group

if not alert_group:
# this case should hypothetically never happen, it's mostly to appease mypy with the
# fact that alert.group can "technically" be None
logger.warning(
f"Skip IncomingAlertStep.process_signal because alert {alert.pk} doesn't actually "
"have an alert group associated with it"
)
return

alert_group_pk = alert_group.pk
alert_receive_channel = alert_group.channel
should_skip_escalation_in_slack = alert_group.skip_escalation_in_slack
slack_team_identity = self.slack_team_identity
slack_team_identity_pk = slack_team_identity.pk
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious on others thoughts around the naming change here? Personally AlertShootingStep (🔫) was not super clear to me on what it did.

Also, main changes here are just extracting things out to variables + added some more detailed comments about why things are they way they are here (thanks to @Ferril and @vadimkerr for this conversation ❤️)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes IncomingAlertStep sounds much better IMO

Comment on lines -658 to -660
class ReadEditPostmortemStep(ResolutionNoteModalStep):
# Left for backward compatibility with slack messages created before postmortems -> resolution note change
pass
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably okay to drop this? I don't see any references to this over the last 6 weeks (logs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_start_disable_maintenance_task isn't needed.. these tests still pass without it, deleting this fixture

Comment on lines -541 to +599
with patch("apps.alerts.tasks.notify_user.compare_escalations", return_value=True):
# send notification for 2 active alert groups
send_bundled_notification(notification_bundle.id)
assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
assert "perform bundled notification for alert groups with ids:" in caplog.text
# check bundle_uuid was set, notification for resolved alert group was deleted
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 2
assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

# send notification for 1 active alert group
notification_bundle.notifications.update(bundle_uuid=None)
alert_group_2.resolve()
send_bundled_notification(notification_bundle.id)
assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
assert (
f"there is only one alert group in bundled notification, perform regular notification. "
f"alert_group {alert_group_1.id}"
) in caplog.text
# check bundle_uuid was set
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 1
# cleanup notifications
notification_bundle.notifications.all().delete()

# send notification for 0 active alert group
notification_bundle.append_notification(alert_group_1, notification_policy)
alert_group_1.resolve()
send_bundled_notification(notification_bundle.id)
assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text
# check all notifications were deleted
assert notification_bundle.notifications.all().count() == 0

# send notification for 2 active alert groups
send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_3.id} is not active, skip notification" in caplog.text
assert "perform bundled notification for alert groups with ids:" in caplog.text

# check bundle_uuid was set, notification for resolved alert group was deleted
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 2
assert not notification_bundle.notifications.filter(alert_group=alert_group_3).exists()

# send notification for 1 active alert group
notification_bundle.notifications.update(bundle_uuid=None)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_2.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_2.id} is not active, skip notification" in caplog.text
assert (
f"there is only one alert group in bundled notification, perform regular notification. "
f"alert_group {alert_group_1.id}"
) in caplog.text

# check bundle_uuid was set
assert notification_bundle.notifications.filter(bundle_uuid__isnull=True).count() == 0
assert notification_bundle.notifications.all().count() == 1

# cleanup notifications
notification_bundle.notifications.all().delete()

# send notification for 0 active alert group
notification_bundle.append_notification(alert_group_1, notification_policy)

# since we're calling send_bundled_notification several times within this test, we need to reset task_id
# because it gets set to None after the first call
notification_bundle.notification_task_id = task_id
notification_bundle.save()

alert_group_1.resolve()

send_bundled_notification.apply((notification_bundle.id,), task_id=task_id)

assert f"alert_group {alert_group_1.id} is not active, skip notification" in caplog.text
assert f"no alert groups to notify about or notification is not allowed for user {user.id}" in caplog.text

# check all notifications were deleted
assert notification_bundle.notifications.all().count() == 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because I removed the compare_escalations function, I need to reset the notification_task_id value after each invocation of send_bundled_notification (alternative here would be to move these different test cases out to separate functions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_start_disable_maintenance_task not used

Comment on lines +153 to +170
# send debug mode notice
text = "Escalations are silenced due to Debug mode"
self._slack_client.chat_postMessage(
channel=slack_channel_id,
text=text,
attachments=[],
thread_ts=alert_group.slack_message.slack_id,
mrkdwn=True,
blocks=[
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": text,
},
},
],
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same thing here, moved _send_debug_mode_notice inline here because it's only called once (here) and is simply making a call to self._slack_client.chat_postMessage

Comment on lines +172 to -90
if not alert_group.is_maintenance_incident and not should_skip_escalation_in_slack:
send_message_to_thread_if_bot_not_in_channel.apply_async(
(alert_group_pk, slack_team_identity_pk, slack_channel_id),
countdown=1, # delay for message so that the log report is published first
)

if alert.group.is_maintenance_incident:
# not sending log report message for maintenance incident
pass
else:
# check if alert group was posted to slack before posting message to thread
if not alert.group.skip_escalation_in_slack:
self._send_message_to_thread_if_bot_not_in_channel(alert.group, slack_channel)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplified nested conditional logic here + moved self._send_message_to_thread_if_bot_not_in_channel inline (it was just making a call to send_message_to_thread_if_bot_not_in_channel.apply_async)

Comment on lines -862 to +926
self._slack_client.reactions_remove(channel=message.slack_channel_id, name="memo", timestamp=message.ts)
self._slack_client.reactions_remove(
channel=message.slack_channel_slack_id, name="memo", timestamp=message.ts
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a typo.

ResolutionNoteSlackMessage has a slack_channel_slack_id property which points to the Slack ID of the associated channel (ex. C123FB4X; slack_channel_id will actually point to the primary key ID of the associated slack_channel database record; ex. 14)

Comment on lines -873 to +937
self._slack_client.chat_delete(channel=message.slack_channel_id, ts=message.ts)
self._slack_client.chat_delete(channel=message.slack_channel_slack_id, ts=message.ts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a typo.

ResolutionNoteSlackMessage has a slack_channel_slack_id property which points to the Slack ID of the associated channel (ex. C123FB4X; slack_channel_id will actually point to the primary key ID of the associated slack_channel database record; ex. 14)

@joeyorlando joeyorlando requested a review from Ferril November 28, 2024 17:01
Copy link
Member

@vstpme vstpme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

def process_signal(self, alert: Alert) -> None:
"""
🐉 Here lay dragons 🐉
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐲 👍

Copy link
Member

@Ferril Ferril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@joeyorlando joeyorlando added this pull request to the merge queue Nov 29, 2024
Merged via the queue into dev with commit 5227ee3 Nov 29, 2024
26 checks passed
@joeyorlando joeyorlando deleted the jorlando/slack-cleanup branch November 29, 2024 13:45
joeyorlando added a commit that referenced this pull request Dec 2, 2024
…5315)

# What this PR does

Inspired by [this
discussion](#5307 (comment)).
_tldr;_ ensures that if any of our tests try making an external network
call, they will fail.

Setup an example test:

```python
def test_external_network_call():
    import requests

    response = requests.get('https://www.example.com')
    assert response.status_code == 200
```

and it worked (failed; [example CI test
run](https://github.com/grafana/oncall/actions/runs/12106416991/job/33752144727?pr=5315#step:6:389))
as expected:

```bash
__________________________ test_external_network_call __________________________
    def test_external_network_call():
        import requests
    
>       response = requests.get('https://www.example.com')
requests   = <module 'requests' from '/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/__init__.py'>
apps/test_joey.py:4: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/api.py:73: in get
    return request("get", url, params=params, **kwargs)
        kwargs     = {}
        params     = None
        url        = 'https://www.example.com'
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/api.py:59: in request
    return session.request(method=method, url=url, **kwargs)
        kwargs     = {'params': None}
        method     = 'get'
        session    = <requests.sessions.Session object at 0x7f10ebaada90>
        url        = 'https://www.example.com'
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/sessions.py:589: in request
    resp = self.send(prep, **send_kwargs)
        allow_redirects = True
        auth       = None
        cert       = None
        cookies    = None
        data       = None
        files      = None
        headers    = None
        hooks      = None
        json       = None
        method     = 'get'
        params     = None
        prep       = <PreparedRequest [GET]>
        proxies    = {}
        req        = <Request [GET]>
        self       = <requests.sessions.Session object at 0x7f10ebaada90>
        send_kwargs = {'allow_redirects': True, 'cert': None, 'proxies': OrderedDict(), 'stream': False, ...}
        settings   = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'verify': True}
        stream     = None
        timeout    = None
        url        = 'https://www.example.com'
        verify     = None
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/sessions.py:703: in send
    r = adapter.send(request, **kwargs)
        adapter    = <requests.adapters.HTTPAdapter object at 0x7f10ebaada30>
        allow_redirects = True
        hooks      = {'response': []}
        kwargs     = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'timeout': None, ...}
        request    = <PreparedRequest [GET]>
        self       = <requests.sessions.Session object at 0x7f10ebaada90>
        start      = 1733064371.649901
        stream     = False
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/requests/adapters.py:667: in send
    resp = conn.urlopen(
        cert       = None
        chunked    = False
        conn       = <urllib3.connectionpool.HTTPSConnectionPool object at 0x7f10ebaadd30>
        proxies    = OrderedDict()
        request    = <PreparedRequest [GET]>
        self       = <requests.adapters.HTTPAdapter object at 0x7f10ebaada30>
        stream     = False
        timeout    = Timeout(connect=None, read=None, total=None)
        url        = '/'
        verify     = True
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/connectionpool.py:715: in urlopen
    httplib_response = self._make_request(
        assert_same_host = False
        body       = None
        body_pos   = None
        chunked    = False
        clean_exit = False
        conn       = None
        destination_scheme = None
        err        = None
        headers    = {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
        http_tunnel_required = False
        is_new_proxy_conn = False
        method     = 'GET'
        parsed_url = Url(scheme=None, auth=None, host=None, port=None, path='/', query=None, fragment=None)
        pool_timeout = None
        redirect   = False
        release_conn = False
        release_this_conn = True
        response_kw = {'decode_content': False, 'preload_content': False}
        retries    = Retry(total=0, connect=None, read=False, redirect=None, status=None)
        self       = <urllib3.connectionpool.HTTPSConnectionPool object at 0x7f10ebaadd30>
        timeout    = Timeout(connect=None, read=None, total=None)
        timeout_obj = Timeout(connect=None, read=None, total=None)
        url        = '/'
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/connectionpool.py:404: in _make_request
    self._validate_conn(conn)
        chunked    = False
        conn       = <urllib3.connection.HTTPSConnection object at 0x7f10ebaadd60>
        httplib_request_kw = {'body': None, 'headers': {'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}}
        method     = 'GET'
        self       = <urllib3.connectionpool.HTTPSConnectionPool object at 0x7f10ebaadd30>
        timeout    = Timeout(connect=None, read=None, total=None)
        timeout_obj = Timeout(connect=None, read=None, total=None)
        url        = '/'
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/connectionpool.py:1060: in _validate_conn
    conn.connect()
        __class__  = <class 'urllib3.connectionpool.HTTPSConnectionPool'>
        conn       = <urllib3.connection.HTTPSConnection object at 0x7f10ebaadd60>
        self       = <urllib3.connectionpool.HTTPSConnectionPool object at 0x7f10ebaadd30>
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/connection.py:363: in connect
    self.sock = conn = self._new_conn()
        self       = <urllib3.connection.HTTPSConnection object at 0x7f10ebaadd60>
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/connection.py:174: in _new_conn
    conn = connection.create_connection(
        extra_kw   = {'socket_options': [(6, 1, 1)]}
        self       = <urllib3.connection.HTTPSConnection object at 0x7f10ebaadd60>
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/urllib3/util/connection.py:85: in create_connection
    sock.connect(sa)
        address    = ('www.example.com', 443)
        af         = <AddressFamily.AF_INET: 2>
        canonname  = ''
        err        = None
        family     = <AddressFamily.AF_UNSPEC: 0>
        host       = 'www.example.com'
        port       = 443
        proto      = 6
        res        = (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('93.184.215.14', 443))
        sa         = ('93.184.215.14', 443)
        sock       = <socket.socket fd=12, family=2, type=1, proto=6, laddr=('0.0.0.0', 0)>
        socket_options = [(6, 1, 1)]
        socktype   = <SocketKind.SOCK_STREAM: 1>
        source_address = None
        timeout    = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
inst = <socket.socket fd=12, family=2, type=1, proto=6, laddr=('0.0.0.0', 0)>
args = (('93.184.215.14', 443),), host = '93.184.215.14'
    def guarded_connect(inst, *args):
        host = host_from_connect_args(args)
        if host in allowed_ip_hosts_and_hostnames or (
            _is_unix_socket(inst.family) and allow_unix_socket
        ):
            return _true_connect(inst, *args)
    
>       raise SocketConnectBlockedError(allowed_list, host)
E       pytest_socket.SocketConnectBlockedError: A test tried to use socket.socket.connect() with host "93.184.215.14" (allowed: "calendar.google.com (142.251.167.100,142.251.167.101,142.251.167.102,142.251.167.113,142.251.167.138,142.251.167.139,2607:f8b0:4004:c09::65,2607:f8b0:4004:c09::66,2607:f8b0:4004:c09::71,2607:f8b0:4004:c09::8b),localhost (127.0.0.1,::1),oncall-dev-mariadb ()").
allow_unix_socket = False
allowed_ip_hosts_and_hostnames = {'127.0.0.1', '142.251.167.100', '142.251.167.101', '142.251.167.102', '142.251.167.113', '142.251.167.138', ...}
allowed_list = ['calendar.google.com (142.251.167.100,142.251.167.101,142.251.167.102,142.251.167.113,142.251.167.138,142.251.167.139...8b0:4004:c09::66,2607:f8b0:4004:c09::71,2607:f8b0:4004:c09::8b)', 'localhost (127.0.0.1,::1)', 'oncall-dev-mariadb ()']
args       = (('93.184.215.14', 443),)
host       = '93.184.215.14'
inst       = <socket.socket fd=12, family=2, type=1, proto=6, laddr=('0.0.0.0', 0)>
/opt/hostedtoolcache/Python/3.12.3/x64/lib/python3.12/site-packages/pytest_socket.py:252: SocketConnectBlockedError
```

## Checklist

- [x] Unit, integration, and e2e (if applicable) tests updated
- [x] Documentation added (or `pr:no public docs` PR label added if not
required)
- [x] Added the relevant release notes label (see labels prefixed w/
`release:`). These labels dictate how your PR will
    show up in the autogenerated release notes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr:no public docs Added to a PR that does not require public documentation updates release:patch PR will be added to "Other Changes" section of release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants