Add profile events support #455

Open

wants to merge 3 commits into master
Conversation


@kozzztik commented Sep 18, 2024

Adds support for profile events on the native protocol. These events carry many different counters, such as network timings, locks, and memory usage, which can be very helpful for debugging and monitoring queries.
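For context, a minimal sketch of how these events might be consumed once merged, assuming they end up exposed as a dict on client.last_query (the stats attribute name follows the discussion below and is not final API):

from clickhouse_driver import Client

client = Client('localhost')
client.execute('SELECT number FROM system.numbers LIMIT 100000')

# Hypothetical accessor: profile events as a name -> value dict
stats = client.last_query.stats or {}
print(stats.get('SelectedRows'), stats.get('NetworkSendBytes'))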

Not sure whether the docs need updating.

Checklist:

  • Add tests that demonstrate the correct behavior of the change. Tests should fail without the change.
  • Add or update relevant docs, in the docs folder and in code.
  • Ensure PR doesn't contain untouched code reformatting: spaces, etc.
  • Run flake8 and fix issues.
  • Run pytest and make sure no tests fail. See https://clickhouse-driver.readthedocs.io/en/latest/development.html.

@@ -142,3 +143,14 @@ def store_progress(self, progress):

    def store_elapsed(self, elapsed):
        self.elapsed = elapsed

    def store_profile_events(self, packet):
        data = QueryResult([packet]).get_result()
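The hunk is truncated here; a hedged sketch of how such a handler could fold each ProfileEvents block into a name -> value dict (the column layout and the stats attribute are assumptions, not this PR's actual code):

def store_profile_events(self, packet):
    # Each ProfileEvents packet carries a small data block; rows are assumed
    # to look like (host_name, current_time, thread_id, type, name, value).
    data = QueryResult([packet]).get_result()
    if self.stats is None:  # assumes QueryInfo.__init__ sets self.stats = None
        self.stats = {}
    for row in data:
        name, value = row[-2], row[-1]  # assumed column positions
        self.stats[name] = self.stats.get(name, 0) + value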
Member commented:
Currently we have 'static' attributes that store statistics (see https://clickhouse-driver.readthedocs.io/en/latest/features.html#query-execution-statistics): client.last_query.progress.total_rows, client.last_query.progress.total_bytes, etc.

I'd prefer to store statistics the same way if possible: client.last_query.stats.select_query, client.last_query.stats.selected_rows.
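For reference, a short example of the existing interface (attribute names per the linked docs):

from clickhouse_driver import Client

client = Client('localhost')
client.execute('SELECT * FROM system.numbers LIMIT 10')

print(client.last_query.elapsed)              # wall time in seconds
print(client.last_query.progress.rows)        # rows processed so far
print(client.last_query.progress.total_rows)  # total rows, when reported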

Author (@kozzztik) replied:
I use the contents of this data for analyzing queries, and the metrics can be very different: they vary with the query type and even with the queried table engine, and the server version probably has an effect too (I only use v23 at the moment). So, in the end, it is not a stable list of metrics, and the number of options is too big (maybe >100 eventually). I found that ~20 of them are the most common and most interesting for my queries. I use a pydantic model to get them:

import pydantic

# Assumed imports for the annotations below: CursorResult comes from
# SQLAlchemy, QueryInfo from clickhouse_driver (my stack; adjust as needed).
from sqlalchemy.engine import CursorResult
from clickhouse_driver.client import QueryInfo


class ClickhouseStats(pydantic.BaseModel):
    elapsed: int = pydantic.Field(alias="elapsed")
    is_insert: int | None = pydantic.Field(alias="InsertQuery", default=None)
    read_bytes: int | None = pydantic.Field(alias="ReadCompressedBytes", default=None)
    write_bytes: int | None = pydantic.Field(alias="WriteBufferFromFileDescriptorWriteBytes", default=None)
    network_recv_bytes: int | None = pydantic.Field(alias="NetworkReceiveBytes", default=None)
    network_recv_time: int | None = pydantic.Field(alias="NetworkReceiveElapsedMicroseconds", default=None)
    network_send_bytes: int | None = pydantic.Field(alias="NetworkSendBytes", default=None)
    network_send_time: int | None = pydantic.Field(alias="NetworkSendElapsedMicroseconds", default=None)
    memory_usage: int | None = pydantic.Field(alias="MemoryTrackerUsage", default=None)
    memory_peak: int | None = pydantic.Field(alias="MemoryTrackerPeakUsage", default=None)
    file_open: int | None = pydantic.Field(alias="FileOpen", default=None)
    function_execute: int | None = pydantic.Field(alias="FunctionExecute", default=None)
    write_time: int | None = pydantic.Field(alias="DiskWriteElapsedMicroseconds", default=None)
    insert_rows: int | None = pydantic.Field(alias="InsertedRows", default=None)
    insert_bytes: int | None = pydantic.Field(alias="InsertedBytes", default=None)
    select_rows: int | None = pydantic.Field(alias="SelectedRows", default=None)
    select_bytes: int | None = pydantic.Field(alias="SelectedBytes", default=None)
    insert_parts: int | None = pydantic.Field(alias="InsertedCompactParts", default=None)
    real_time: int | None = pydantic.Field(alias="RealTimeMicroseconds", default=None)
    system_time: int | None = pydantic.Field(alias="SystemTimeMicroseconds", default=None)

    def __init__(self, result: CursorResult | None = None,
                 query_info: QueryInfo | None = None):
        if query_info is None:
            query_info = result.context.query_info
        # last_query.elapsed is in seconds; convert to milliseconds
        super().__init__(elapsed=int(query_info.elapsed * 1000),
                         **(query_info.stats or {}))
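For illustration, the model might be used like this (hypothetical; assumes last_query.stats is the dict this PR populates):

client.execute('SELECT * FROM some_table LIMIT 100')
stats = ClickhouseStats(query_info=client.last_query)
print(stats.select_rows, stats.memory_peak)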

I can add them here (without pydantic), but wouldn't that be too much?

Member replied:
Yep. It's too much. Dict will be fine.
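With a plain dict, each consumer can pull just the counters its server version actually emits, e.g.:

stats = client.last_query.stats or {}        # attribute name per this thread
peak = stats.get('MemoryTrackerPeakUsage')   # None if the server didn't report it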

Development

Successfully merging this pull request may close these issues:

  • Query execution statistics add memory_usage