Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27847: Casting NUMERIC types to TIMESTAMP is prohibited #4851

Closed
wants to merge 3 commits into from

Conversation

basapuram-kumar
Copy link

What changes were proposed in this pull request?

As per this jira HIVE-27847 , this fix will prevent query failures for Casting NUMERIC types to TIMESTAMP.

Why are the changes needed?

To avoid query failures on master branch, 4.0.0-alpha-1 & 2 branches.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Reproducible steps are added in the jira HIVE-27847, after applying this fix, the failed query gets succeded.

Copy link

@aturoczy aturoczy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please create a test around it? The ticket is well described what it is about. But would be great to have test(s) to check what are the new results. Depend of the input

Copy link

sonarcloud bot commented Nov 11, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

warning The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
Read more here

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@okumin
Copy link
Contributor

okumin commented May 14, 2024

@basapuram-kumar @deniskuzZ Hi, we are also hitting this problem, and I found the issue happens when both of the following conditions are satisfied.

  • GenericUDFTimestamp is evaluated at runtime where no hive-site.xml is located, which typically means TezChild
  • It is vectorized. VectorSelectOperator is unlikely to have the issue

I guess the current implementation partially works but it could not be perfect as intToTimestampInSeconds can be still wrongly configured. Could you please try to fix the issue in the correct way? We can take it over if you prefer it.

This is the minimal reproduction in my environment.

CREATE TABLE test AS SELECT 0 AS ts;

set hive.fetch.task.conversion=none;
set hive.strict.timestamp.conversion=false;
set hive.vectorized.execution.enabled=false;

SELECT CAST(ts AS TIMESTAMP) FROM test;
Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:351)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:292)
	... 16 more
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting NUMERIC types to TIMESTAMP is prohibited (hive.strict.timestamp.conversion)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDFTimestamp.initialize(GenericUDFTimestamp.java:91)
	at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
	at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:184)
	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1073)
	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1099)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:74)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:360)

@okumin
Copy link
Contributor

okumin commented May 15, 2024

Also, GenericUDFTimestamp.java could not be the only file that accesses runtime configurations. We should review all UDFs accessing HiveConf.

intToTimestampInSeconds = SessionState.get() != null ? SessionState.get().getConf()
.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf()
.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS);

if (strict) {
SessionState ss = SessionState.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use more convenient names like sessionState here.

@@ -0,0 +1,12 @@
set hive.strict.timestamp.conversion=false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can not see the q.out file for this test in this commit.

Copy link
Contributor

@zratkai zratkai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my comments!

if (strict) {
SessionState ss = SessionState.get();

if (ss != null && ss.getConf().getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root cause is SessionState is not available at runtime. So, we need to inject the proper config from anywhere else.
For example, we implement GenericUDF#configure for some UDFs in case that SessionState is unavailable.

@Override
public void configure(MapredContext context) {
if (context != null) {
formatter = InstantFormatter.ofConfiguration(context.getJobConf());
String timeZoneStr = HiveConf.getVar(context.getJobConf(), HiveConf.ConfVars.HIVE_LOCAL_TIME_ZONE);
timeZone = TimestampTZUtil.parseTimeZone(timeZoneStr);
}
}

@Override
public void configure(MapredContext context) {
if (context != null) {
if(HiveConf.getBoolVar(context.getJobConf(), HiveConf.ConfVars.HIVE_USE_GOOGLE_REGEX_ENGINE)){
this.useGoogleRegexEngine=true;
}
}


@Override
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
checkArgsSize(arguments, 1, 1);
checkArgPrimitive(arguments, 0);
checkArgGroups(arguments, 0, tsInputTypes, STRING_GROUP, DATE_GROUP, NUMERIC_GROUP, VOID_GROUP, BOOLEAN_GROUP);

strict = SessionState.get() != null ? SessionState.get().getConf()
.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION) : new HiveConf()
.getBoolVar(ConfVars.HIVE_STRICT_TIMESTAMP_CONVERSION);
intToTimestampInSeconds = SessionState.get() != null ? SessionState.get().getConf()
.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS) : new HiveConf()
.getBoolVar(ConfVars.HIVE_INT_TIMESTAMP_CONVERSION_IN_SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this also needs to be correctly configured.

@dengzhhu653
Copy link
Member

Hello @basapuram-kumar, thank you for the contribution, could you please address the suggested comments if have cycles? Thank you in advance!

@dengzhhu653
Copy link
Member

Would anyone be willing to take over this PR since we have no heard from basapuram-kumar? cc @okumin @zratkai

@okumin
Copy link
Contributor

okumin commented Jun 3, 2024

So, let me take a glance.

@okumin
Copy link
Contributor

okumin commented Jun 5, 2024

@dengzhhu653 I created a new PR because I have no permission to directly update @basapuram-kumar 's branch.
#5278

@dengzhhu653 dengzhhu653 closed this Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants