Add Limited Read Support for Deletion Vectors on Databricks 14.3 [databricks] #12238
base: branch-25.04
Conversation
Force-pushed from 483b309 to 599498e
Signed-off-by: Raza Jafri <[email protected]>
Force-pushed from 599498e to 221abdb
I didn't look at the code in depth. Just did a quick once over.
* limitations under the License.
*/

package com.databricks.sql.transaction.tahoe.rapids
So is the plan to add this back in later? Or is this really gone gone?
@delta_lake
@ignore_order
@pytest.mark.skipif(not supports_delta_lake_deletion_vectors(), \
    reason="Deletion vectors new in Delta Lake 2.4 / Apache Spark 3.4")
This is not 100% accurate.
https://docs.delta.io/latest/delta-deletion-vectors.html
Do we want to distinguish between the different operators being supported? I am fine with 2.4 being the base for support, because scanning deletion vectors with nothing that can write them is not really a supported feature we can test.
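As a rough illustration of what such a version gate could look like, here is a minimal sketch. supports_delta_lake_deletion_vectors() is the helper name used in the test above, but this body, the private helper, and the delta-spark package lookup are assumptions for illustration, not the project's actual implementation:

from importlib.metadata import version

def _delta_lake_version():
    # Assumption: the installed delta-spark package version, e.g. "2.4.0",
    # reflects the Delta Lake version under test.
    return tuple(int(p) for p in version("delta-spark").split(".")[:2])

def supports_delta_lake_deletion_vectors():
    # Deletion vectors first appeared in Delta Lake 2.4 / Apache Spark 3.4,
    # so older versions cannot produce tables that exercise this test.
    return _delta_lake_version() >= (2, 4)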
This PR adds read support for deletion vectors on Databricks 14.3 when spark.rapids.sql.format.parquet.reader.type is set to PERFILE. On Databricks 14.3, spark.rapids.sql.format.parquet.reader.type defaults to PERFILE if the user doesn't set a preferred value.
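To make the setup concrete, here is a minimal sketch of reading such a table with the per-file reader pinned explicitly. The table path is a placeholder; only the config key and the PERFILE value come from the PR description:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Explicitly request the per-file Parquet reader; per this PR, Databricks
    # 14.3 already defaults to PERFILE when the user sets no preference.
    .config("spark.rapids.sql.format.parquet.reader.type", "PERFILE")
    .getOrCreate()
)

# Placeholder path: any Delta table containing deletion vectors.
df = spark.read.format("delta").load("/path/to/delta_table_with_dvs")
df.show()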