Top n unhandled exceptions and (if available) code file + line number #568

Open
aaronsteers opened this issue Feb 17, 2023 · 3 comments

@aaronsteers

aaronsteers commented Feb 17, 2023

@pnadolny13 - Related to knowing how to prioritize this item

and this general theme:

are you able to generate a stack-ranked report of the highest frequency exceptions raised by Meltano in telemetry?

I guess it's a two part question:

  1. Do we capture the exception data in telemetry to see how often specific exceptions or errors occur?
    • Line numbers and code files are nice to have, but not strictly necessary. Even without line numbers, we know where "Broken Pipe" errors are coming from.
  2. If yes to the above, do you have time to create the report? Any estimate of level of effort to pull the data?

If yes to 1 but no to 2, then perhaps someone on the engineering team could assist in pulling the data.

Thanks in advance!

@pnadolny13

@aaronsteers it sounds like this might be similar to #492. We do get exception data, but it's not in a useful form as of today, so I can't easily pull this. I'd need to build some dbt models to parse these exceptions and line numbers.

If we wanted an ad hoc query approach, I can see that most broken pipes come from these frames at the lowest level of the traceback:

| count | file | line |
|---|---|---|
| 11947 | lib/python3.9/asyncio/streams.py | 197 |
| 3934 | lib/python3.8/asyncio/streams.py | 197 |
| 494 | lib/python3.9/asyncio/unix_events.py | 687 |
| 385 | lib/python3.9/asyncio/streams.py | 359 |
| 367 | lib/python3.10/asyncio/streams.py | 344 |
| 293 | .../unix_events.py | 687 |
| 286 | lib/python3.10/asyncio/streams.py | 178 |
| 175 | lib/python3.8/asyncio/unix_events.py | 687 |
| 174 | .../unix_events.py | 698 |
| 162 | lib/python3.8/asyncio/unix_events.py | 665 |

Is that what you're looking for? With a bit more work I could probably detect the last file/line where the package was meltano rather than a dependency.
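A rough sketch of that follow-up in Python, assuming each traceback is a list of objects with `file` and `line_number` keys, and assuming Meltano's own frames can be recognized by the path fragment `site-packages/meltano/`. The helper name `last_meltano_frame` and the marker constant are hypothetical, not part of any existing model:

```python
from typing import Dict, List, Optional

# Assumption: frames from the meltano package contain this path fragment.
MELTANO_MARKER = "site-packages/meltano/"


def last_meltano_frame(traceback: List[Dict]) -> Optional[Dict]:
    """Return the deepest frame located in the meltano package, or None."""
    # Walk from the innermost frame outwards and stop at the first meltano file.
    for frame in reversed(traceback):
        if MELTANO_MARKER in frame.get("file", ""):
            return frame
    return None


# Illustrative traceback: meltano frames interleaved with asyncio frames.
example = [
    {"file": "lib/python3.9/site-packages/meltano/core/logging/utils.py", "line_number": 188},
    {"file": "lib/python3.9/asyncio/streams.py", "line_number": 359},
    {"file": "lib/python3.9/site-packages/meltano/core/logging/utils.py", "line_number": 185},
    {"file": "lib/python3.9/asyncio/streams.py", "line_number": 387},
    {"file": "lib/python3.9/asyncio/streams.py", "line_number": 197},
]

print(last_meltano_frame(example))
```

Grouping on this frame instead of the final one would attribute each failure to the Meltano call site rather than to the standard library line that raised it.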

@aaronsteers

@pnadolny13 - The above might be sufficient, actually... it looks like great info on top n customer-impacting Meltano failures.

Can you provide context for the above? For example, can you confirm whether these are raw counts of the lowest-level frames in the stack traces? Can you tell what the time period is, for instance all time vs. 3 months? And how can we iterate on the above? Is there a SQL query or report that generated the output?

@pnadolny13

@aaronsteers cool, yeah, that was just an example to see if that's what you were looking for. That is for BrokenPipeError only, for all time. An example of what the exception traceback looks like in the warehouse is:

[
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 155
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 243
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 280
  },
  {
    "file": "lib/python3.9/site-packages/meltano/cli/elt.py",
    "line_number": 339
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
    "line_number": 219
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/runner/singer.py",
    "line_number": 147
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 219
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 188
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 359
  },
  {
    "file": "lib/python3.9/site-packages/meltano/core/logging/utils.py",
    "line_number": 185
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 387
  },
  {
    "file": "lib/python3.9/asyncio/streams.py",
    "line_number": 197
  }
]

I'm parsing it to get the last element in the array, then counting distinct context_uuid values (the execution ID) per file and line number.

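-- Count distinct executions per final (deepest) traceback frame for one exception type.
WITH base AS (
    SELECT
        context_uuid,  -- execution ID
        parse_json(exception):traceback::STRING AS traceback,
        -- Slicing from index -1 yields a one-element array holding the last frame.
        array_slice(parse_json(exception):traceback::ARRAY, -1, 9999999)[0] AS final_traceback,
        final_traceback:file AS file_name,
        final_traceback:line_number AS line_number
    FROM PREP.workspace.unstruct_event_flattened
    WHERE parse_json(exception):type::STRING = 'BrokenPipeError'
)
SELECT
    count(DISTINCT context_uuid) AS execution_count,
    file_name,
    line_number
FROM base
GROUP BY file_name, line_number
ORDER BY execution_count DESC;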
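For checking the same logic outside the warehouse, here is a minimal Python sketch of the aggregation. It assumes the rows have been exported as `(context_uuid, exception_json)` pairs, where the JSON carries `type` and `traceback` keys as in the query; the function name `count_final_frames` and the sample rows are made up for illustration:

```python
import json
from collections import defaultdict


def count_final_frames(rows, exception_type="BrokenPipeError"):
    """Count distinct executions per final traceback frame, highest count first."""
    executions = defaultdict(set)
    for context_uuid, exception_json in rows:
        exception = json.loads(exception_json)
        if exception.get("type") != exception_type:
            continue
        traceback = exception.get("traceback") or []
        if not traceback:
            # Unlike the SQL version, rows without any frames are skipped here.
            continue
        final = traceback[-1]  # deepest frame, same as the last array element
        executions[(final["file"], final["line_number"])].add(context_uuid)
    counts = [(len(uuids), file, line) for (file, line), uuids in executions.items()]
    return sorted(counts, key=lambda row: row[0], reverse=True)


def _event(exc_type, frames):
    # Build an exception payload with illustrative (file, line_number) frames.
    return json.dumps({
        "type": exc_type,
        "traceback": [{"file": f, "line_number": n} for f, n in frames],
    })


sample_rows = [
    ("a", _event("BrokenPipeError", [("elt.py", 155), ("streams.py", 197)])),
    ("a", _event("BrokenPipeError", [("elt.py", 155), ("streams.py", 197)])),
    ("b", _event("BrokenPipeError", [("streams.py", 197)])),
    ("c", _event("BrokenPipeError", [("unix_events.py", 687)])),
    ("d", _event("ValueError", [("elt.py", 155)])),
]

for count, file_name, line_number in count_final_frames(sample_rows):
    print(count, file_name, line_number)
```

Storing a set of execution IDs per frame mirrors `count(DISTINCT context_uuid)`: repeated events from one execution are counted once.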
