Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27995: Fix inconsistent behavior of LOAD DATA command for partitoned and non-partitioned tables #5000

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

shivjha30
Copy link
Contributor

@shivjha30 shivjha30 commented Jan 11, 2024

What changes were proposed in this pull request?

Earlier, the code flows for partitioned and non-partitioned tables were different. The partitioned tables skipped constraints checks before submitting the job for execution. This Pull request ensures that both, the partitioned and non-partitioned tables go through the constraints validations in applyConstraintsAndGetFiles function.

Why are the changes needed?

For partitioned tables, while executing LOAD DATA/ LOAD DATA LOCAL commands, the check for file existence is not executed on HiveServer2, and this in turn throws java.io.FileNotFoundException during Runtime once the job is launched.
This PR prevents such cases at compile time.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

The test cases already exist. The error messages prompted back to the user are now consistent if the file is not found at HiveServer2

Load Data Error (Non Partitioned Tables)
Load Data Error

File Not Found Exception (Partitioned Tables)
Load

Fixed: For partitioned tables
Load (1)

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to add a test, there are NegativeCLIDrivers which can be used.

There are bunch of return statements below the current line & the place where it is there right now, So, if that returned was called earlier, this applyConstraintsAndGetFiles(fromURI, ts.tableHandle); must not have been called, but now you would be calling that, right?. So, that is some additional cost just for error message?

@shivjha30
Copy link
Contributor Author

@ayushtkn Thanks for the review, i have added the test case as mentioned above.please review once

@shivjha30
Copy link
Contributor Author

There is a test failure in Jenkins, i ran the test in in local and that test is passing which can be viewed in the below screenshot:
image

Copy link

sonarcloud bot commented May 3, 2024

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link

github-actions bot commented Jul 4, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@github-actions github-actions bot added the stale label Jul 4, 2024
@chinnaraolalam
Copy link
Contributor

Overall changes looks good to me, But locally when i run newly added test case it falling

[ERROR] TestNegativeLlapCliDriver.testCliDriver:59 Client Execution succeeded but contained differences (error code = 1) after executing load_data_partition.q 15,21c15 < FAILED: SemanticException Line 2:17 Invalid path ''hdfs://### HDFS PATH ###'': No files matching path hdfs://### HDFS PATH ### < FAILED: AssertionError java.lang.AssertionError: Client Execution succeeded but contained differences (error code = 1) after executing load_data_partition.q < 15c15 < < FAILED: SemanticException Line 2:17 Invalid path ''hdfs://### HDFS PATH ###'': No files matching path hdfs://### HDFS PATH ###

@github-actions github-actions bot removed the stale label Jul 12, 2024
Copy link

sonarcloud bot commented Jul 12, 2024

POSTHOOK: type: CREATETABLE
POSTHOOK: Output: database:default
POSTHOOK: Output: default@validate_load_data
#### A masked pattern was here ####
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here there is no error, I think it is failing before the expected scenario.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants