Iceberg DataLakeCatalog queries no longer emit ParquetMetaDataCacheHits in system.query_log #1277

@CarlosFelipeOR

Describe the bug

Starting from Antalya 25.3.3.20183, Iceberg queries executed via DataLakeCatalog (REST catalog) no longer emit the ParquetMetaDataCacheHits ProfileEvent in system.query_log, even when Parquet metadata cache is explicitly enabled and the same query is executed repeatedly (warm runs).

This is a regression: ParquetMetaDataCacheHits was emitted correctly in the earlier Antalya build 25.3.3.20143, and it is still missing in Antalya 25.8.12.20747.

As a result, Parquet metadata caching cannot be observed or validated via ProfileEvents for Iceberg database engine queries, breaking cache validation and violating the documented SRS behavior.


To Reproduce

Option A — Reproduce via automation (recommended)

The issue can be reproduced using the existing Iceberg cache regression test.
The test will pause immediately after the failure, allowing live investigation of the ClickHouse instance.

python3 -u iceberg/regression.py \
  --only "/iceberg/iceberg cache/rest catalog/iceberg database engine/cache/*" \
  --local \
  --clickhouse docker://altinity/clickhouse-server:25.8.12.20747.altinityantalya \
  --clickhouse-version 25.8.12.20747 \
  --pause-on-fail "/iceberg/iceberg cache/rest catalog/iceberg database engine/cache/"

Option B — Manual reproduction

Prerequisites

  • ClickHouse Antalya build >= 25.3.3.20183
  • S3-compatible object storage (e.g. MinIO)
  • Iceberg REST catalog reachable from ClickHouse (may require authentication)
  • An existing Iceberg table in the catalog/warehouse (created via Iceberg tooling; must include a date_col column used in the query)
  • ClickHouse has access to:
    • the REST catalog endpoint (including required auth), and
    • the object storage warehouse location (credentials + endpoint)

Steps

  1. Enable Iceberg database engine:
SET allow_experimental_database_iceberg = 1;
  2. Create an Iceberg catalog database using REST catalog:
CREATE DATABASE iceberg_db
ENGINE = DataLakeCatalog('http://ice-rest-catalog:5000', 'admin', 'password')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://minio:9000/warehouse',
    warehouse = 's3://bucket1/',
    auth_header = 'Authorization: <...>'; -- if your REST catalog requires it
  3. Enable Parquet metadata cache:
SET input_format_parquet_use_metadata_cache = 1;
  4. Execute the same query once (cold run) and then repeat it multiple times (warm runs):
SELECT *
FROM iceberg_db.`<namespace>.<table>`
WHERE date_col > '3030-01-01'
SETTINGS log_comment = 'repro_parquet_metadata_cache'
FORMAT TabSeparated;

(repeat the query ~20–100 times)

  5. Inspect ProfileEvents:
SELECT
    count() AS total_rows,
    sum(mapContains(ProfileEvents, 'ParquetMetaDataCacheHits')) AS rows_with_hits_key,
    sum(ProfileEvents['ParquetMetaDataCacheHits']) AS hits_sum
FROM system.query_log
WHERE type = 'QueryFinish'
  AND log_comment = 'repro_parquet_metadata_cache';
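
The cold and warm runs from the steps above can be scripted. The following is only a minimal sketch, assuming clickhouse-client is on PATH and already configured to reach the server. The helper name run_warm_queries, the CLICKHOUSE_CLIENT override, and the default run count are illustrative and not part of the regression suite. Because each clickhouse-client invocation is a separate session, the sketch passes input_format_parquet_use_metadata_cache in the query's SETTINGS clause instead of a session-level SET.

```shell
#!/bin/sh
# Sketch only. run_warm_queries and its arguments are hypothetical helpers,
# not part of the regression suite.
# Assumption: CLICKHOUSE_CLIENT names a single command (no extra arguments).
CLICKHOUSE_CLIENT="${CLICKHOUSE_CLIENT:-clickhouse-client}"

# Usage: run_warm_queries '<namespace>.<table>' [runs]
run_warm_queries() {
    table="$1"
    runs="${2:-20}"
    i=1
    while [ "$i" -le "$runs" ]; do
        # The metadata cache setting is passed per query because each
        # clickhouse-client invocation starts a new session.
        "$CLICKHOUSE_CLIENT" --query "
            SELECT *
            FROM iceberg_db.\`${table}\`
            WHERE date_col > '3030-01-01'
            SETTINGS log_comment = 'repro_parquet_metadata_cache',
                     input_format_parquet_use_metadata_cache = 1
            FORMAT TabSeparated" > /dev/null || return 1
        i=$((i + 1))
    done
    # system.query_log is flushed periodically; force a flush so the
    # inspection query sees every run.
    "$CLICKHOUSE_CLIENT" --query "SYSTEM FLUSH LOGS"
}
```

After the function returns, the inspection query from the last step can be run as written; the first iteration is the cold run and the remaining ones are warm runs.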

Expected behavior

When input_format_parquet_use_metadata_cache = 1 is enabled and the same Iceberg query is executed repeatedly:

  • Parquet metadata should be cached
  • ParquetMetaDataCacheHits must be emitted in system.query_log → ProfileEvents
  • The counter should be > 0 for warm runs

This matches the documented SRS requirement:

ClickHouse SHALL track parquet metadata cache performance metrics in system.query_log via ParquetMetaDataCacheHits.
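
The expectation can be turned into a pass/fail check over the three columns returned by the inspection query (total_rows, rows_with_hits_key, hits_sum). This is a sketch under stated assumptions: check_cache_hits is a hypothetical helper, and the "at least two runs" threshold is an illustrative choice meaning one cold run plus one warm run, not a value taken from the SRS.

```shell
#!/bin/sh
# Sketch only: check_cache_hits is a hypothetical helper.
# Arguments: total_rows rows_with_hits_key hits_sum
# (the three columns of the inspection query, in that order).
#
# Possible usage, relying on word splitting of the tab-separated result row
# (INSPECTION_SQL is a placeholder for the inspection query text):
#   check_cache_hits $(clickhouse-client --query "$INSPECTION_SQL")
check_cache_hits() {
    total_rows="$1"
    rows_with_hits_key="$2"
    hits_sum="$3"

    # With fewer than two finished queries there is no warm run to judge.
    if [ "$total_rows" -lt 2 ]; then
        echo "INCONCLUSIVE: need at least one cold and one warm run"
        return 2
    fi

    # Expected behavior: the key is present and the counter is > 0.
    if [ "$rows_with_hits_key" -gt 0 ] && [ "$hits_sum" -gt 0 ]; then
        echo "PASS: ParquetMetaDataCacheHits emitted (hits_sum=$hits_sum)"
        return 0
    fi

    echo "FAIL: ParquetMetaDataCacheHits missing after $total_rows runs"
    return 1
}
```

On an affected build the inspection query reports zero for both rows_with_hits_key and hits_sum, so the helper would take the FAIL branch.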


Actual behavior

In Antalya builds >= 25.3.3.20183:

  • ParquetMetaDataCacheHits is not emitted
  • The key does not appear in ProfileEvents, even after many warm runs
  • The same query path emitted this event correctly in earlier builds

Key information

  • Project Antalya build version:
    • ✅ PASS: 25.3.3.20143.altinityantalya (Jun 13, 2025)
    • ❌ FAIL: 25.3.3.20183.altinityantalya (Jul 10, 2025)
    • ❌ FAIL: 25.8.12.20747.altinityantalya
  • Cloud provider: N/A (local repro)
  • Object storage: MinIO (S3-compatible)
  • Iceberg catalog: REST catalog
  • Iceberg access method: DataLakeCatalog database engine

Release notes of the first failing build:
https://github.com/Altinity/ClickHouse/releases/tag/v25.3.3.20183.altinityantalya


Additional context

  • Historical test data and re-runs against multiple versions confirm this is not intermittent.
  • Other Iceberg-related cache ProfileEvents (e.g. IcebergMetadataFilesCacheHits) may still be emitted.
  • The issue seems specific to Parquet metadata cache ProfileEvents for Iceberg database engine queries.
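
To see which cache-related ProfileEvents are still recorded for the tagged queries, the keys of the ProfileEvents map can be listed per event. The following is a rough sketch, not a verified diagnostic: cache_events_query is a hypothetical helper that only prints SQL, and the '%Cache%' name filter is an assumption about how the relevant events are named.

```shell
#!/bin/sh
# Sketch only: cache_events_query is a hypothetical helper that prints SQL.
# Usage: cache_events_query [log_comment] | clickhouse-client
cache_events_query() {
    comment="${1:-repro_parquet_metadata_cache}"
    cat <<SQL
SELECT
    event,
    count() AS queries_with_event,
    sum(value) AS total
FROM system.query_log
ARRAY JOIN
    mapKeys(ProfileEvents) AS event,
    mapValues(ProfileEvents) AS value
WHERE type = 'QueryFinish'
  AND log_comment = '${comment}'
  AND event LIKE '%Cache%'
GROUP BY event
ORDER BY event
FORMAT TabSeparatedWithNames
SQL
}
```

If the observation above holds, the output on an affected build would be expected to list events such as IcebergMetadataFilesCacheHits but no ParquetMetaDataCacheHits row.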
