Fix non-nullable under nullable struct write #11781

Merged
revans2 merged 5 commits into NVIDIA:branch-25.02 on Dec 17, 2024

revans2
Collaborator

@revans2 revans2 commented Nov 26, 2024

This fixes #11762

But we need to be careful and coordinate this with #11763 because I think we are going to disable boolean writes for all of ORC.

Signed-off-by: Robert (Bobby) Evans <[email protected]>
@revans2
Collaborator Author

revans2 commented Nov 26, 2024

build

jihoonson
jihoonson previously approved these changes Nov 26, 2024
Collaborator

@jihoonson jihoonson left a comment

LGTM

@revans2
Collaborator Author

revans2 commented Dec 2, 2024

build

@revans2
Collaborator Author

revans2 commented Dec 2, 2024

@jlowe please take another look

jlowe
jlowe previously approved these changes Dec 2, 2024
@jlowe
Member

jlowe commented Dec 11, 2024

build

@jlowe
Member

jlowe commented Dec 12, 2024

Test failures in CI

[2024-12-11T22:26:54.226Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip_nullable_struct[native-Boolean(not_null)][DATAGEN_SEED=1733949546, TZ=UTC] - pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.command.DataWritingCommandExec
[2024-12-11T22:26:54.226Z] Execute InsertIntoHadoopFsRelationCommand file:/tmp/pyspark_tests/premerge-ci-2-jenkins-rapids-premerge-github-10369-qjm3n-vdg90-gw1-2497-46613182/ORC_DATA/GPU, false, ORC, [path=/tmp/pyspark_tests//premerge-ci-2-jenkins-rapids-premerge-github-10369-qjm3n-vdg90-gw1-2497-46613182//ORC_DATA/GPU], ErrorIfExists, [a]
[2024-12-11T22:26:54.226Z] +- Scan ExistingRDD[a#161900]
[2024-12-11T22:26:54.226Z] FAILED ../../src/main/python/orc_write_test.py::test_write_round_trip_nullable_struct[hive-Boolean(not_null)][DATAGEN_SEED=1733949546, TZ=UTC] - pyspark.sql.utils.IllegalArgumentException: Part of the plan is not columnar class org.apache.spark.sql.execution.command.DataWritingCommandExec
[2024-12-11T22:26:54.226Z] Execute InsertIntoHadoopFsRelationCommand file:/tmp/pyspark_tests/premerge-ci-2-jenkins-rapids-premerge-github-10369-qjm3n-vdg90-gw1-2497-86215200/ORC_DATA/GPU, false, ORC, [path=/tmp/pyspark_tests//premerge-ci-2-jenkins-rapids-premerge-github-10369-qjm3n-vdg90-gw1-2497-86215200//ORC_DATA/GPU], ErrorIfExists, [a]
[2024-12-11T22:26:54.226Z] +- Scan ExistingRDD[a#162002]

I suspect this is because booleans are in the datagen, and we recently changed ORC writes of booleans to fall back to the CPU.

@revans2
Collaborator Author

revans2 commented Dec 13, 2024

build

@revans2
Collaborator Author

revans2 commented Dec 13, 2024

@jlowe please take another look

@revans2
Collaborator Author

revans2 commented Dec 13, 2024

build

@revans2 revans2 merged commit 9465328 into NVIDIA:branch-25.02 Dec 17, 2024
50 checks passed
@revans2 revans2 deleted the nullable_structs_writes branch December 17, 2024 17:54
@sameerz sameerz added the bug Something isn't working label Jan 2, 2025
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

[BUG] Non-nullable bools in a nullable struct fails
4 participants