feat(blob): Create blobs in Spark SQL#18347

Open
the-other-tim-brown wants to merge 3 commits into apache:master from the-other-tim-brown:spark-blob-type

@the-other-tim-brown
Contributor

@the-other-tim-brown commented Mar 19, 2026

Describe the issue this Pull Request addresses

Summary and Changelog

Adds support for creating a Blob field in Spark SQL.

Impact

Allows users to use the blob type when defining their tables with Spark SQL.

Risk Level

low

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@rahil-c requested review from rahil-c, voonhous and yihua March 19, 2026 01:59
@github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) Mar 19, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 82.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.67%. Comparing base (a179555) to head (49d09f8).
⚠️ Report is 6 commits behind head on master.

Files with missing lines                                Patch %   Lines
...l/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala   85.71%    0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark3_4ExtendedSqlAstBuilder.scala   85.71%    0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala   85.71%    0 Missing and 2 partials ⚠️
...l/parser/HoodieSpark4_0ExtendedSqlAstBuilder.scala   85.71%    0 Missing and 2 partials ⚠️
...k/sql/parser/HoodieSpark3_3ExtendedSqlParser.scala   50.00%    0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark3_4ExtendedSqlParser.scala   50.00%    0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark3_5ExtendedSqlParser.scala   50.00%    0 Missing and 1 partial ⚠️
...k/sql/parser/HoodieSpark4_0ExtendedSqlParser.scala   50.00%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18347      +/-   ##
============================================
- Coverage     69.26%   68.67%   -0.59%     
- Complexity    27117    27585     +468     
============================================
  Files          2391     2423      +32     
  Lines        129572   132473    +2901     
  Branches      15366    15974     +608     
============================================
+ Hits          89746    90974    +1228     
- Misses        32969    34290    +1321     
- Partials       6857     7209     +352     
Flag                         Coverage Δ
common-and-other-modules     44.35% <13.63%> (-0.03%) ⬇️
hadoop-mr-java-client        45.17% <0.00%> (+0.01%) ⬆️
spark-client-hadoop-common   48.33% <0.00%> (-0.01%) ⬇️
spark-java-tests             48.78% <2.85%> (+1.30%) ⬆️
spark-scala-tests            45.38% <81.42%> (-0.15%) ⬇️
utilities                    38.61% <0.00%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown.

Files with missing lines                                Coverage Δ
...va/org/apache/hudi/common/schema/HoodieSchema.java   81.39% <100.00%> (+0.02%) ⬆️
...quet/avro/AvroSchemaConverterWithTimestampNTZ.java   75.57% <ø> (ø)
...in/scala/org/apache/spark/sql/types/BlobType.scala   100.00% <100.00%> (ø)
...k/sql/parser/HoodieSpark3_3ExtendedSqlParser.scala   67.64% <50.00%> (+0.48%) ⬆️
...k/sql/parser/HoodieSpark3_4ExtendedSqlParser.scala   67.64% <50.00%> (+0.48%) ⬆️
...k/sql/parser/HoodieSpark3_5ExtendedSqlParser.scala   67.64% <50.00%> (+0.48%) ⬆️
...k/sql/parser/HoodieSpark4_0ExtendedSqlParser.scala   60.00% <50.00%> (ø)
...l/parser/HoodieSpark3_3ExtendedSqlAstBuilder.scala   19.11% <85.71%> (+5.63%) ⬆️
...l/parser/HoodieSpark3_4ExtendedSqlAstBuilder.scala   18.84% <85.71%> (+5.56%) ⬆️
...l/parser/HoodieSpark3_5ExtendedSqlAstBuilder.scala   19.47% <85.71%> (+5.94%) ⬆️
... and 1 more

... and 80 files with indirect coverage changes

@rahil-c
Collaborator

rahil-c commented Mar 19, 2026

@yihua @voonhous @balaji-varadarajan-ai, could you take a look as well? I'll do an initial pass.

  normalized.contains("show indexes") ||
- normalized.contains("refresh index")
+ normalized.contains("refresh index") ||
+ normalized.contains(" blob")
Collaborator

Same concern as expressed in #18098 (comment), although I now understand that this is not related to read_blob.

However, I'm trying to understand whether this would also match statements unrelated to creating a blob column, for example:

-- A table with a column named blob_path — NOT a BLOB type column
CREATE TABLE t (id BIGINT, blob_path STRING)

If this is not a real concern, let me know; I just wanted to bring it up.
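
To make the concern concrete, here is a small self-contained Scala sketch. It compares the `normalized.contains(" blob")` substring test from the diff with an assumed whole-word regex alternative; it lowercases the input itself because `normalized` appears to be a lowercased statement. `BlobDetection`, `substringMatch`, and `keywordMatch` are hypothetical names for illustration, not code from the PR.

```scala
// Hypothetical helpers illustrating the matching concern; not code from the PR.
object BlobDetection {
  // Mirrors the substring test in the diff: any " blob" in the
  // lowercased statement is treated as a BLOB column definition.
  def substringMatch(sql: String): Boolean =
    sql.toLowerCase.contains(" blob")

  // Assumed alternative: require "blob" as a whole word, case-insensitively.
  // `_` counts as a word character, so identifiers like blob_path do not match.
  private val blobKeyword = "(?i)\\bblob\\b".r

  def keywordMatch(sql: String): Boolean =
    blobKeyword.findFirstIn(sql).isDefined
}
```

With the DDL from the comment, `substringMatch` returns `true` and `keywordMatch` returns `false`. A whole-word match would still fire on a column literally named `blob`, or on the word inside a string literal or comment, so it narrows the false positives rather than removing them entirely.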

* @return StructType with blob structure
*/
def apply(): DataType = {
HoodieSparkSchemaConverters.toSqlType(HoodieSchema.createBlob())._1
Collaborator

[nit] During SQL parsing, visitPrimitiveDataType is always invoked and calls this method, so the schema conversion is repeated every time. I think we might want to cache the result of the call at L40, maybe something like this?

object BlobType {
  val dataType: DataType = HoodieSparkSchemaConverters.toSqlType(HoodieSchema.createBlob())._1
  def apply(): DataType = dataType
}
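
The effect of the suggested caching can be shown with a self-contained sketch. Since the Hudi and Spark classes are not available here, it substitutes a hypothetical `Conversion.blobStruct()` that returns a placeholder `String` and counts how often it runs; `UncachedBlobType` and `CachedBlobType` are illustrative names only. A `val` inside a Scala `object` is evaluated once, when the object is first accessed, so the conversion cost is paid a single time.

```scala
// Hypothetical stand-in for
// HoodieSparkSchemaConverters.toSqlType(HoodieSchema.createBlob())._1:
// returns a placeholder and counts how many times the conversion runs.
object Conversion {
  var calls = 0
  def blobStruct(): String = {
    calls += 1
    "blob-struct"
  }
}

// Current shape: every apply() call repeats the conversion.
object UncachedBlobType {
  def apply(): String = Conversion.blobStruct()
}

// Suggested shape: the val is computed once, when the object is first
// initialized, and apply() returns the cached value afterwards.
object CachedBlobType {
  val dataType: String = Conversion.blobStruct()
  def apply(): String = dataType
}
```

Calling `UncachedBlobType()` three times runs the conversion three times, while three calls to `CachedBlobType()` run it once. A `lazy val` would additionally defer the work if other members of the object are accessed before `apply()`.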

@rahil-c
Collaborator

rahil-c commented Mar 20, 2026

Apart from this comment: #18347 (comment), the PR looks good to me!
@voonhous @yihua @balaji-varadarajan-ai, tagging you so a committer can take a pass.

@rahil-c requested a review from bvaradar March 20, 2026 14:06
@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

Labels

size:L PR with lines of changes in (300, 1000]

4 participants