
Tight polling loop in execute() / fetch() causes 100% CPU on long-running queries #614

Description

@fcfangcc

Expected behavior

When executing a long-running query (e.g., ALTER TABLE ... EXECUTE optimize), the Python client should wait for Trino to finish with low resource consumption — ideally using long-polling (maxWait parameter on next_uri) or a configurable backoff interval between polls. The JDBC driver already supports this.

Actual behavior

The Python client enters a tight polling loop with zero delay between successive fetch() calls, saturating a full CPU core for the entire duration of the query. This happens even though the client has no real work to do — it is simply waiting for Trino to finish.

Steps To Reproduce

  1. Start a Trino server with an Iceberg table that has many partitions.
  2. Run a long-running DDL via the Python client:
    from trino.dbapi import connect
    conn = connect(host="...", port=8080, user="...", catalog="iceberg", schema="...")
    cur = conn.cursor()
    cur.execute("ALTER TABLE \"iceberg\".\"db\".\"large_table\" EXECUTE optimize(file_size_threshold => '256MB')")
    cur.fetchall()
  3. Observe the Python process CPU usage during execution — it stays at ~100% of a single core until the query completes.
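The observation in step 3 can also be quantified from inside the process by comparing CPU time with wall-clock time around the blocking call. The following is a minimal sketch using only the standard library; cpu_utilization is a hypothetical helper introduced here for illustration and is not part of the Trino client.

```python
import time


def cpu_utilization(fn):
    """Run fn() and return (cpu_seconds / wall_seconds, fn's result).

    time.process_time() counts user + system CPU time of the whole
    process, so a ratio near 1.0 means roughly one core was kept busy.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    ratio = cpu_used / wall_used if wall_used > 0 else 0.0
    return ratio, result
```

Wrapping the statement from step 2, for example cpu_utilization(lambda: cur.execute(sql)), should give a ratio close to 1.0 if the loop is busy-polling, and a ratio close to 0 if the client is mostly blocked waiting on the server.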

Root Cause

There are two tight polling loops in trino/client.py that call fetch() with no delay between iterations:

TrinoQuery.execute() — blocks until at least one row arrives or the query finishes

while not self.finished and not self.cancelled and len(self._result.rows) == 0:
    self._result.rows += self.fetch()   # no sleep, no backoff

TrinoResult.__iter__() — iterates until the query finishes and all rows are consumed

while not self._query.finished or self._rows is not None:
    next_rows = self._query.fetch() if not self._query.finished else None
    ...
    self._rows = next_rows               # no sleep between fetches

Each fetch() issues an HTTP GET to next_uri. For a long-running query, Trino responds almost instantly with a new next_uri pointing to the same in-progress state. The Python client immediately issues another request — no backoff, no minimum interval — burning CPU in a sub-millisecond HTTP request loop for the entire duration of the query.

DDL statements like ALTER TABLE EXECUTE optimize return zero data rows until completion. The execute() loop condition len(self._result.rows) == 0 stays true for minutes, so the tight loop runs uninterrupted.
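One possible mitigation is to sleep between polls that return no rows, with a capped exponential backoff. The sketch below is not the client's implementation: poll_with_backoff, its parameters, and the default intervals are assumptions chosen for illustration, and fetch / is_finished stand in for TrinoQuery.fetch() and TrinoQuery.finished.

```python
import time


def poll_with_backoff(fetch, is_finished, initial_delay=0.05,
                      max_delay=1.0, factor=2.0, sleep=time.sleep):
    """Poll fetch() until it yields rows or the query finishes.

    After every poll that returns no rows, sleep for `delay` seconds and
    grow the delay by `factor`, capped at `max_delay`. All defaults are
    illustrative assumptions, not values used by the Trino client.
    """
    delay = initial_delay
    while not is_finished():
        rows = fetch()
        if rows:
            return rows          # data arrived: hand it back immediately
        if is_finished():
            break                # query completed with no rows (e.g. DDL)
        sleep(delay)             # idle poll: back off before retrying
        delay = min(delay * factor, max_delay)
    return []
```

With these defaults a statement that returns no rows for several minutes would settle at about one request per second instead of a continuous request loop, while a query that produces rows quickly pays at most the small initial delay.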

The underlying issue is that next_uri in the Trino protocol serves a dual purpose: (a) an ACK that advances query processing, and (b) status polling. The client treats both identically, with no delay. The protocol itself supports maxWait on next_uri to enable long-polling, but the Python client does not use it.
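As a rough illustration of the long-polling option, the client could append maxWait to next_uri before issuing the GET. The sketch below only shows the URL handling: with_max_wait is a hypothetical helper, the "1s" duration string is an assumed value format, and the example URL is made up. How the server bounds or interprets the value is not covered here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_wait(next_uri, max_wait="1s"):
    """Return next_uri with a maxWait query parameter set.

    Existing query parameters are preserved; any existing maxWait is
    replaced. The "1s" default is an assumption for illustration.
    """
    parts = urlsplit(next_uri)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "maxWait"]
    query.append(("maxWait", max_wait))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical next_uri, for illustration only.
example = with_max_wait("http://localhost:8080/v1/statement/executing/q1/tok/1")
```

If the server holds the request open until there is new state or the wait expires, each poll blocks in the HTTP call instead of returning immediately, so no client-side sleep is needed for the waiting case.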

Log output

No response

Operating System

PRETTY_NAME="Ubuntu 24.04 LTS"

Trino Python client version

0.337.0

Trino Server version

479

Python version

3.13

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
