feat(blob): Create blobs in Spark SQL by the-other-tim-brown · Pull Request #18347 · apache/hudi

the-other-tim-brown · 2026-03-19T01:36:20Z

Describe the issue this Pull Request addresses

Summary and Changelog

Adds support for creating a Blob field in Spark SQL.

Impact

Allows users to declare blob fields when defining their tables with Spark SQL.

Risk Level

low

Documentation Update

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

codecov-commenter · 2026-03-19T08:59:03Z

Codecov Report

❌ Patch coverage is 82.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.67%. Comparing base (a179555) to head (49d09f8).
⚠️ Report is 6 commits behind head on master.

Files with missing lines	Patch %	Lines
...l/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala	85.71%	0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark3_4ExtendedSqlAstBuilder.scala	85.71%	0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala	85.71%	0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark4_0ExtendedSqlAstBuilder.scala	85.71%	0 Missing and 2 partials ⚠️
...k/sql/parser/HoodieSpark3_3ExtendedSqlParser.scala	50.00%	0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark3_4ExtendedSqlParser.scala	50.00%	0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark3_5ExtendedSqlParser.scala	50.00%	0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark4_0ExtendedSqlParser.scala	50.00%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #18347      +/-   ##
============================================
- Coverage     69.26%   68.67%   -0.59%     
- Complexity    27117    27585     +468     
============================================
  Files          2391     2423      +32     
  Lines        129572   132473    +2901     
  Branches      15366    15974     +608     
============================================
+ Hits          89746    90974    +1228     
- Misses        32969    34290    +1321     
- Partials       6857     7209     +352

Flag	Coverage Δ
common-and-other-modules	`44.35% <13.63%> (-0.03%)`	⬇️
hadoop-mr-java-client	`45.17% <0.00%> (+0.01%)`	⬆️
spark-client-hadoop-common	`48.33% <0.00%> (-0.01%)`	⬇️
spark-java-tests	`48.78% <2.85%> (+1.30%)`	⬆️
spark-scala-tests	`45.38% <81.42%> (-0.15%)`	⬇️
utilities	`38.61% <0.00%> (-0.11%)`	⬇️

Files with missing lines	Coverage Δ
...va/org/apache/hudi/common/schema/HoodieSchema.java	`81.39% <100.00%> (+0.02%)`	⬆️
...quet/avro/AvroSchemaConverterWithTimestampNTZ.java	`75.57% <ø> (ø)`
...in/scala/org/apache/spark/sql/types/BlobType.scala	`100.00% <100.00%> (ø)`
...k/sql/parser/HoodieSpark3_3ExtendedSqlParser.scala	`67.64% <50.00%> (+0.48%)`	⬆️
...k/sql/parser/HoodieSpark3_4ExtendedSqlParser.scala	`67.64% <50.00%> (+0.48%)`	⬆️
...k/sql/parser/HoodieSpark3_5ExtendedSqlParser.scala	`67.64% <50.00%> (+0.48%)`	⬆️
...k/sql/parser/HoodieSpark4_0ExtendedSqlParser.scala	`60.00% <50.00%> (ø)`
...l/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala	`19.11% <85.71%> (+5.63%)`	⬆️
...l/parser/HoodieSpark3_4ExtendedSqlAstBuilder.scala	`18.84% <85.71%> (+5.56%)`	⬆️
...l/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala	`19.47% <85.71%> (+5.94%)`	⬆️
... and 1 more

... and 80 files with indirect coverage changes

rahil-c · 2026-03-19T16:52:31Z

@yihua @voonhous @balaji-varadarajan-ai, please take a look as well if you can; I'll try to do an initial pass.

rahil-c · 2026-03-20T12:34:40Z

...-spark3.3.x/src/main/scala/org/apache/spark/sql/parser/HoodieSpark3_3ExtendedSqlParser.scala

      normalized.contains("show indexes") ||
-      normalized.contains("refresh index")
+      normalized.contains("refresh index") ||
+      normalized.contains(" blob")


Same concern as expressed in #18098 (comment), although I understand now that this is not related to read_blob.

However, I am trying to understand whether this would also match statements unrelated to creating a blob column, for example:

-- A table with a column named blob_path, NOT a BLOB type column
CREATE TABLE t (id BIGINT, blob_path STRING)

If this is not a real concern, let me know; I just wanted to bring it up.
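To make the concern concrete, here is a minimal, self-contained sketch. It assumes `normalized` is simply the lower-cased statement text (the diff does not show how it is built), and `wholeWordMatch` is a hypothetical alternative for illustration, not something proposed in this PR.

```scala
import java.util.Locale

object BlobKeywordCheck {
  // Mirrors the substring test from the diff; lower-casing is an assumption
  // about how `normalized` is produced.
  def substringMatch(sql: String): Boolean =
    sql.toLowerCase(Locale.ROOT).contains(" blob")

  // Hypothetical tighter check: require "blob" as a whole word, so that
  // identifiers such as blob_path no longer match.
  private val blobWord = "\\bblob\\b".r

  def wholeWordMatch(sql: String): Boolean =
    blobWord.findFirstIn(sql.toLowerCase(Locale.ROOT)).isDefined
}
```

With the statement above, `substringMatch` returns true because of the space before `blob_path`, while `wholeWordMatch` returns false. A column literally named `blob` would still match the whole-word check, so this narrows the false positives rather than removing them.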

rahil-c · 2026-03-20T12:39:23Z

...-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/types/BlobType.scala

+   * @return StructType with blob structure
+   */
+  def apply(): DataType = {
+    HoodieSparkSchemaConverters.toSqlType(HoodieSchema.createBlob())._1


[nit] During SQL parsing, visitPrimitiveDataType would always be invoked and would call this later. I think we might want to cache the call at L40, maybe something like this?

object BlobType {
  val dataType: DataType = HoodieSparkSchemaConverters.toSqlType(HoodieSchema.createBlob())._1
  def apply(): DataType = dataType
}
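The effect of the suggested caching can be illustrated without Spark or Hudi on the classpath. In the sketch below, `convert` is a hypothetical stand-in for the schema conversion, and the counter exists only to show how often it runs; none of these names come from the PR.

```scala
object CachingSketch {
  // Counts how many times the stand-in conversion is executed.
  var conversions: Int = 0

  // Hypothetical stand-in for the schema conversion done in BlobType.apply().
  private def convert(): String = {
    conversions += 1
    "blob-struct"
  }

  // Shape of the code in the diff: every apply() call converts again.
  object Uncached {
    def apply(): String = convert()
  }

  // Shape of the suggestion: convert once when the object is initialized,
  // then return the stored value on every apply() call.
  object Cached {
    val dataType: String = convert()
    def apply(): String = dataType
  }
}
```

Calling `CachingSketch.Uncached()` three times runs the conversion three times, while any number of calls to `CachingSketch.Cached()` runs it once. Whether this matters in practice depends on how costly the real conversion is, which this thread does not measure.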

rahil-c · 2026-03-20T12:55:57Z

Aside from this comment: #18347 (comment), the PR looks good to me!
@voonhous @yihua @balaji-varadarajan-ai, could a committer take a pass?

hudi-bot · 2026-03-20T15:47:54Z

CI report:

09b71f3 Azure: FAILURE
49d09f8 Azure: PENDING

the-other-tim-brown added 3 commits March 18, 2026 21:35

add in code and test for handling blob type in SQL table creation

09b71f3

remove spacing changes

117448d

add basic testing, cleanup constants in HoodieSchema

49d09f8

the-other-tim-brown mentioned this pull request Mar 19, 2026

feat(blob): Create and Read Blobs in Spark SQL #18098

Open

3 tasks

rahil-c requested review from rahil-c, voonhous and yihua March 19, 2026 01:59

github-actions bot added the size:L PR with lines of changes in (300, 1000] label Mar 19, 2026

rahil-c reviewed Mar 20, 2026

rahil-c requested a review from bvaradar March 20, 2026 14:06

the-other-tim-brown wants to merge 3 commits into apache:master from the-other-tim-brown:spark-blob-type

4 participants
