September 08, 2025: Weekly Status Update in Gluten #10652
GlutenPerfBot started this conversation in General
This weekly update is generated by LLMs. You're welcome to join us on GitHub for in-depth discussions.
Overall Activity Summary
The past 7 days saw 38 merged PRs and 29 open PRs across the Velox, ClickHouse, Flink, and build/infra areas. The Velox backend dominated, with daily version bumps, shuffle-read optimizations, and newly enabled functions. Flink activity surged (7 PRs) around Nexmark benchmark support, while Iceberg/Delta Lake features and GPU/cuDF connectors are gaining momentum. The community is preparing for the Gluten 1.5.0 release with documentation clean-ups and CI improvements.
Key Ongoing Projects
VeloxResizeBatches, cutting deserialize overhead in sort-based shuffle.
--add-opens options to MAVEN_OPTS for Java 17 compatibility #10572).
Priority Items
Notable Discussions
Emerging Trends
identifyBatchType (Avoid repeated calls to identifiyBatchType #10649), StrictRule simplification ([GLUTEN-10559] Simplify StrictRule and remove unnecessary DummyLeafExec #10553), and hash-table build configs ([GLUTEN-10660][VL] Adding configuration for hash table build #10634) all target driver-side CPU.
Good First Issues
MakeYMInterval – pure CH backend, no native changes.
date_from_unix_date for ClickHouse – follow existing date function pattern.
split_part string function in ClickHouse – straightforward string splitting logic.
SparkPartitionID in ClickHouse backend – reuse Spark’s partition ID.
MapZipWith for ClickHouse – entry-level map function, good for learning CH UDF framework.
All CH good-first issues need basic C++ and ClickHouse function registration knowledge; unit tests and documentation updates are expected.
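For contributors picking up the date_from_unix_date or split_part items, a small reference model can help when deriving unit-test cases. The Python below is an illustration only, not Gluten or ClickHouse code. The helper names are hypothetical and simply mirror the Spark SQL function names. The edge-case handling (1-based and negative part numbers, an empty string for out-of-range parts, an error for part number 0, NULL propagation) is assumed from the Spark SQL function documentation and should be checked against Spark itself before being relied on.

```python
from datetime import date, timedelta
from typing import Optional

# Unix dates count whole days from this day.
EPOCH = date(1970, 1, 1)


def date_from_unix_date(days: Optional[int]) -> Optional[date]:
    """Reference model: days since 1970-01-01 -> calendar date (None models SQL NULL)."""
    if days is None:
        return None
    return EPOCH + timedelta(days=days)


def split_part(s: Optional[str], delimiter: Optional[str],
               part_num: Optional[int]) -> Optional[str]:
    """Reference model of the assumed Spark semantics (None models SQL NULL)."""
    if s is None or delimiter is None or part_num is None:
        return None
    if part_num == 0:
        # Assumption: Spark reports an error for index 0.
        raise ValueError("part_num must not be 0")
    # Assumption: an empty delimiter means the string is not split at all.
    parts = [s] if delimiter == "" else s.split(delimiter)
    # Positive indexes count from the start, negative ones from the end.
    index = part_num - 1 if part_num > 0 else len(parts) + part_num
    if 0 <= index < len(parts):
        return parts[index]
    # Assumption: out-of-range indexes yield an empty string.
    return ""
```

A native implementation in the ClickHouse backend would be written in C++; the model above is only meant as an oracle for expected values such as `split_part("11.12.13", ".", 3)` or `date_from_unix_date(1)`.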