[GLUTEN-6887][VL] Daily Update Velox Version (2026_04_01) #11860

Open
GlutenPerfBot wants to merge 14 commits into apache:main from GlutenPerfBot:tagging-2026_04_01

GlutenPerfBot commented Apr 1, 2026

Upstream Velox's New Commits:

24e6ab97b by Chengcheng Jin, fix(cudf): Fix complex data type name in format conversion and add tests(Part1) (16818)
d92b90029 by Natasha Sehgal, refactor: Propagate CastRule cost through canCoerce (16821)
361a42252 by Rui Mo, fix(fuzzer): Reduce Spark aggregate fuzzer test pressure (16964)
2c2fe2ab7 by root, fix: Ignore string column statistics for parquet-mr versions before 1.8.2 (16744)
7faf27a86 by Chengcheng Jin, feat(cudf): Add the log to show detailed fallback message (16900)
e603315e5 by Chang chen, feat(parquet): Add type widening support for INT and Decimal types with configurable narrowing (16611)
1e1674dd8 by Rajeev Singh, docs: Add blog post for Adaptive per-function CPU tracking (16945)
0c6b89d61 by Masha Basmanova, fix(build): Guard fuzzer examples subdirectory with VELOX_BUILD_TESTING (16992)
8d6355d8d by Pratik Pugalia, build: Improve build impact comment layout (16971)
44d561990 by Masha Basmanova, refactor: Add ConnectorRegistry class with tryGet and unregisterAll (16977)
793f13f16 by Rajeev Singh, feat(expr-eval):Adaptive per-function CPU sampling for Velox expression evaluation (16646)
1a4dc7a5a by Pratik Pugalia, fix: Off-by-one boundary bug in make_timestamp validation (16944)
7f2c75c26 by Pratik Pugalia, Fix incorrect substr length in Tokenizer::matchUnquotedSubscript (16972)
22b90045e by Masha Basmanova, docs: Add truncate markers to blog posts for cleaner listing page (16975)

velox_branch: https://github.com/IBM/velox/commits/dft-2026_04_01

Related issue: #6887

Signed-off-by: glutenperfbot <glutenperfbot@glutenproject-internal.com>
baibaichen and others added 5 commits April 1, 2026 16:26
…olumns

When Gluten creates HiveTableHandle, it was passing all columns (including
partition columns) as dataColumns. This caused Velox's convertType() to
validate partition column types against the Parquet file's physical types,
failing when they differ (e.g., LongType in file vs IntegerType from
partition inference).

Fix: build dataColumns excluding partition columns (ColumnType::kPartitionKey).
Partition column values come from the partition path, not from the file.
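
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Gluten or Velox code: the Column struct, the two-value ColumnType enum, and buildDataColumns are hypothetical stand-ins, and only the name kPartitionKey is taken from the commit message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the
// Velox or Gluten types.
enum class ColumnType { kRegular, kPartitionKey };

struct Column {
  std::string name;
  std::string type;       // logical type name, e.g. "INTEGER"
  ColumnType columnType;  // where the column's values come from
};

// Returns only the columns that are physically stored in the data file.
// Partition columns are skipped because their values come from the
// partition path, so their types should not be checked against the file.
std::vector<Column> buildDataColumns(const std::vector<Column>& allColumns) {
  std::vector<Column> dataColumns;
  dataColumns.reserve(allColumns.size());
  for (const auto& column : allColumns) {
    if (column.columnType != ColumnType::kPartitionKey) {
      dataColumns.push_back(column);
    }
  }
  return dataColumns;
}
```

With a table of one regular column and one partition column, only the regular column would be handed to the file-type validation, so a partition type inferred as IntegerType can no longer clash with a LongType stored in the file.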

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

With OAP INT narrowing commit replaced by upstream Velox PR #15173:
- Remove 2 excludes now passing: LongType->IntegerType, LongType->DateType
- Add 2 excludes for new failures: IntegerType->ShortType (OAP removed)

Excludes: 63 (net unchanged: -2, +2). Test results: 21 pass / 63 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This suite tests the READ path only. Disable native writer so Spark's
writer produces correct V2 encodings (DELTA_BINARY_PACKED/DELTA_BYTE_ARRAY).
- Remove 10 excludes for decimal widening tests now passing

Remaining 38 excludes:
- 34: Velox native reader rejects incompatible decimal conversions
  regardless of reader config (no parquet-mr fallback)
- 4: Velox does not support DELTA_BYTE_ARRAY encoding

Test results: 46 pass / 38 ignored.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Velox native reader always behaves like Spark's vectorized reader,
so tests that rely on parquet-mr behavior (vectorized=false) fail.
Instead of just excluding these 33 tests, add testGluten overrides
with expectError=true to verify Velox correctly rejects incompatible
conversions.

- 16 unsupported INT->Decimal conversions
- 6 decimal precision narrowing cases
- 11 decimal precision+scale narrowing/mixed cases

VeloxTestSettings: 38 excludes (parent tests) + 33 testGluten overrides
Test results: 79 pass / 38 ignored (33 excluded parent + 5 truly excluded)
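
To make the "precision narrowing" and "precision+scale narrowing/mixed" categories above concrete, here is a rough sketch of a lossless-widening check for decimal reads. It is an assumption modeled on the rule commonly used for vectorized Parquet decimal reads (scale may only grow, and precision must grow at least as much as scale); it is not taken from Velox or from this PR, and DecimalType and isDecimalWidening are hypothetical names.

```cpp
#include <cassert>

// Hypothetical helper type for illustration; not a Velox or Gluten API.
struct DecimalType {
  int precision;  // total number of digits
  int scale;      // digits after the decimal point
};

// Returns true when every value stored as `file` can be read as
// `requested` without losing digits. Assumed rule: the scale must not
// shrink, and the precision must grow by at least as much as the scale
// so the number of integer digits does not shrink either.
bool isDecimalWidening(DecimalType file, DecimalType requested) {
  assert(file.precision >= file.scale);
  assert(requested.precision >= requested.scale);
  const int scaleIncrease = requested.scale - file.scale;
  const int precisionIncrease = requested.precision - file.precision;
  return scaleIncrease >= 0 && precisionIncrease >= scaleIncrease;
}
```

Under this assumed rule, reading decimal(5,2) as decimal(7,2) or decimal(7,4) counts as widening, while decimal(7,2) as decimal(5,2) (precision narrowing) and decimal(5,2) as decimal(6,4) (mixed) do not, which matches the kind of case the overrides expect to fail.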
Signed-off-by: Yuan <yuanzhou@apache.org>
github-actions bot added the CORE (works for Gluten Core) label Apr 1, 2026
Signed-off-by: Yuan <yuanzhou@apache.org>

github-actions bot commented Apr 1, 2026

Run Gluten Clickhouse CI on x86

github-actions bot commented Apr 1, 2026

Run Gluten Clickhouse CI on x86

zhouyuan added 2 commits April 1, 2026 21:23
Signed-off-by: Yuan <yuanzhou@apache.org>
Signed-off-by: Yuan <yuanzhou@apache.org>
zhouyuan force-pushed the tagging-2026_04_01 branch from f02193f to 82d1a35 April 1, 2026 21:16

github-actions bot commented Apr 1, 2026

Run Gluten Clickhouse CI on x86

"/test-data/parquet-thrift-compat.snappy.parquet"

testGluten("Read Parquet file generated by parquet-thrift") {
// TODO: https://github.com/apache/gluten/issues/11865

@baibaichen this seems to be due to a missing fix from an old OAP patch: https://github.com/IBM/velox/pull/35/changes

zhouyuan commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

Signed-off-by: Yuan <yuanzhou@apache.org>

github-actions bot commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

The test data on the ClickHouse side has not been updated, so revert to using the old query

Signed-off-by: Yuan <yuanzhou@apache.org>

github-actions bot commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

protected val tablesPath: String = UTSystemParameters.tpcdsDecimalDataPath + "/"
protected val db_name: String = "tpcdsdb"
// TODO: fix to use the new DS queries https://github.com/apache/gluten/issues/11871
protected val tpcdsQueries: String =

zhouyuan commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

github-actions bot commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

zhouyuan commented Apr 2, 2026

Run Gluten Clickhouse CI on x86

github-actions bot commented Apr 2, 2026

Run Gluten Clickhouse CI on x86
