Maybe Logger metadata should affect grouping #837

Open
ruslandoga opened this issue Dec 17, 2024 · 7 comments
Comments

@ruslandoga
Contributor

ruslandoga commented Dec 17, 2024

Right now logged errors seem to be grouped by message only.

iex> Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 1")}})
iex> Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 2")}})
# 16:06:06.217 [warning] Event dropped due to being a duplicate of a previously-captured event.
:ok

Would it make sense to consider a hash of :sentry metadata as well?

@whatyouhide
Collaborator

Mmmm what do we do for errors, do you know? I haven't looked and won't have time til the end of the day.

@solnic
Collaborator

solnic commented Dec 19, 2024

@whatyouhide this is what we do:

  # Used to compare events for deduplication. See "Sentry.Dedupe".
  @doc false
  @spec hash(t()) :: non_neg_integer()
  def hash(%__MODULE__{} = event) do
    :erlang.phash2([
      event.exception,
      event.message,
      event.level,
      event.fingerprint
    ])
  end

Adding metadata to this sounds like a good idea to me.
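For illustration, a minimal sketch of what folding metadata into the hash could look like. This is not the SDK's code: `DedupeSketch` is a hypothetical module, events are plain maps rather than the real event struct, and `:extra` is assumed to be where the `:sentry` Logger metadata ends up.

```elixir
defmodule DedupeSketch do
  # Hypothetical stand-in for the hash/1 shown above. Events are plain
  # maps here, not the SDK's event struct.
  def hash(event) when is_map(event) do
    :erlang.phash2([
      Map.get(event, :exception),
      Map.get(event, :message),
      Map.get(event, :level),
      Map.get(event, :fingerprint),
      # Proposed addition (assumption): metadata takes part in the hash.
      Map.get(event, :extra, %{})
    ])
  end
end

a = %{message: "error from logs", level: :error, extra: %{response: "info 1"}}
b = %{a | extra: %{response: "info 2"}}

# Identical events keep the same hash, while a change in :extra now
# feeds a different term into phash2.
IO.inspect(DedupeSketch.hash(a) == DedupeSketch.hash(a))
IO.inspect(DedupeSketch.hash(a) == DedupeSketch.hash(b))
```

With the current four-field hash, `a` and `b` would be treated as the same event, which is the behaviour reported at the top of this issue.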

@whatyouhide
Collaborator

I’m not sure because metadata might contain fleeting data that gets updated on every call (imagine a timestamp) but that shouldn't affect deduplication. What do other SDKs do?
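One possible way to address the fleeting-data concern is to hash only an allowlist of metadata keys. The sketch below is an assumption, not an existing SDK option: `MetadataAllowlistSketch`, the `@dedupe_keys` list, and the `:logged_at` key are all made up for illustration.

```elixir
defmodule MetadataAllowlistSketch do
  # Hypothetical setting: only these metadata keys affect deduplication.
  @dedupe_keys [:response, :status]

  # Hashes message, level and the allowlisted part of the metadata.
  # Keys outside the allowlist (e.g. a timestamp) are ignored.
  def hash(message, level, extra, keys \\ @dedupe_keys) do
    :erlang.phash2([message, level, Map.take(extra, keys)])
  end
end

first =
  MetadataAllowlistSketch.hash("error from logs", :error, %{
    response: "info 1",
    logged_at: 1_734_451_566
  })

second =
  MetadataAllowlistSketch.hash("error from logs", :error, %{
    response: "info 1",
    logged_at: 1_734_451_999
  })

# Only :logged_at differs and it is not allowlisted, so both calls hash
# the same term.
IO.inspect(first == second)
```

The trade-off is that someone has to choose the keys up front; a denylist of known-volatile keys via `Map.drop/2` would be the mirror-image design.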

@solnic
Collaborator

solnic commented Dec 19, 2024

> I’m not sure because metadata might contain fleeting data that gets updated on every call (imagine a timestamp) but that shouldn't affect deduplication. What do other SDKs do?

I'm not sure - @sl0thentr0py do you happen to know?

@whatyouhide
Collaborator

@solnic you could probably grep through source code of at least the Python and Ruby SDKs because I remember finding this in there pretty easily.

@solnic
Collaborator

solnic commented Dec 19, 2024

@whatyouhide I did look for it in Ruby and couldn't find it 😅 I'll give it another shot hah

@ruslandoga
Contributor Author

ruslandoga commented Dec 19, 2024

Maybe:

  • allow disabling deduplication for loggers (Sentry sinks can do rate-limiting per grouping if needed)
  • deduplicate on at least three (or so) of the available fields (these can be thought of as degrees of freedom in this scenario), e.g. if for a Logger.error only two of these are present, level (which is always error? so it's not really a degree of freedom) and message, we can pick an additional optional field, e.g. metadata from the call, and we can use a probabilistic view of the metadata fields
Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 1")}})
Logger.error("error from logs", sentry: %{extra: %{response: Bamboo.ApiError.build_api_error("info 1")}})

# Mr Deduplicator: hm, I have seen your hash before, you shall not pass!
# Log event: but that's because I don't have enough entropy!
# Mr Deduplicator: then how about we look into your metadata?
# Log event: but my metadata is super high entropy!
# Mr Deduplicator: then just take a few random fields and be done with it! Garhgh!
# ... etc
  • replace exact hash matching with SimHash/MinHash and some similarity measure
  • we just make this issue a note in documentation and explain how to set a custom fingerprint on Logger.error calls
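
As a rough illustration of the similarity-measure option in the list above, here is a sketch that compares two flat metadata maps with exact Jaccard similarity, which is the quantity MinHash approximates. It is not MinHash or SimHash itself, and `SimilaritySketch`, the `0.8` threshold, and the flat-map assumption are all hypothetical.

```elixir
defmodule SimilaritySketch do
  # Treats a flat metadata map as a set of {key, value} pairs.
  defp features(metadata), do: MapSet.new(metadata)

  # Jaccard similarity: |A ∩ B| / |A ∪ B|, taken as 1.0 for two empty maps.
  def jaccard(a, b) do
    fa = features(a)
    fb = features(b)
    union = MapSet.size(MapSet.union(fa, fb))

    if union == 0 do
      1.0
    else
      MapSet.size(MapSet.intersection(fa, fb)) / union
    end
  end

  # Hypothetical rule: events at or above the threshold count as duplicates.
  def duplicate?(a, b, threshold \\ 0.8), do: jaccard(a, b) >= threshold
end

same = %{response: "info 1", status: 500}
other = %{response: "info 2", status: 500}

IO.inspect(SimilaritySketch.duplicate?(same, same))
IO.inspect(SimilaritySketch.jaccard(same, other))
```

With only a handful of fields, a single differing value moves the score a lot, so the threshold would need tuning against real metadata.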
