Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28337: TestMetaStoreUtils fails for invalid timestamps #5309

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

KiranVelumuri
Copy link
Contributor

What changes were proposed in this pull request?

Timestamp represented as LocalDateTime is changed to ZonedDateTime to take time-zone into account while creating the timestamp to avoid creating invalid timestamps.

Why are the changes needed?

https://issues.apache.org/jira/browse/HIVE-28337

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

mvn test -Dtest=TestMetaStoreUtils

Before the patch:
with Random(seed: 23) in line 65(default case)
image

with Random(seed: 24) in line 65
image

After the patch:
image

@wecharyu
Copy link
Contributor

Nice catch~ This failed test was also found in #5004 and we can revert kgyrtkirk/hive-dev-box#16 once this patch is merged.
@zabetak could you also take a look?

Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of the test is to ensure that conversions from/to string do not alter the value no matter the timezone because the DATE/TIMESTAMP data types should be timezone agnostic.

Changing LocalDateTime to ZonedDateTime defeats the purpose of the tests and actually hides a bug that affects the MetaStoreUtils APIs.

I left a similar comment under the JIRA ticket.

@KiranVelumuri
Copy link
Contributor Author

The 2 test cases mentioned in https://issues.apache.org/jira/browse/HIVE-28337 are failing for me locally in my machine with Java version 1.8.0_331. I tried running these test cases in #5004, but the DST related one was successful. I was investigating as to why this was happening, and now I get it seeing the previous comments that this is occurring due to JDK bug.

@KiranVelumuri
Copy link
Contributor Author

The intention of the test is to ensure that conversions from/to string do not alter the value no matter the timezone because the DATE/TIMESTAMP data types should be timezone agnostic.

Changing LocalDateTime to ZonedDateTime defeats the purpose of the tests and actually hides a bug that affects the MetaStoreUtils APIs.

I left a similar comment under the JIRA ticket.

Yes, I agree with you regarding the intention and purpose of the test. I am looking into the issue and would re-work this PR taking the MetaStoreUtils APIs into account.

@KiranVelumuri
Copy link
Contributor Author

I have updated the tests and the MetaStoreUtils APIs to always have Timestamp created in UTC timezone, since the timestamps during the daylight savings transition could not be represented in the local time zone.

However, there are 2 known issues for the timestamps mentioned below but it is the expected behaviour from Java's Timestamp:

  1. Timestamps with year 0000 would be converted to year 0001.
  2. Timestamps in the range [1582-10-05 00:00:00, 1582-10-14 23:59:59] (both inclusive) would have 10 days added due to the shift from Julian calendar to Gregorian calendar. (The next day after 1582-10-04 is 1582-10-15)

References:
[1] https://www.ibm.com/docs/en/db2/11.5?topic=dttmddtija-date-time-timestamp-values-that-can-cause-problems-in-jdbc-sqlj-applications#imjcc_r0053436__title__5
[2] https://strasbourg.eleusal.com/en/what-happened-between-5-and-14-october-1582/

Copy link

sonarcloud bot commented Jul 1, 2024

Copy link
Contributor

@zabetak zabetak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that due to bugs the tests may fail in certain environments.

If the main motivation of this PR is to avoid the failures when this happens it seems more appropriate to add conditions for detecting skipping the specific test with appropriate bug reference.

If the intention is to fix the problems revealed by the tests then more work is needed since the proposed changes cannot land as they are.

@@ -123,7 +123,13 @@ public static String normalizeDate(String date) {
* @return Timestamp in string format.
*/
public static String convertTimestampToString(Timestamp timestamp) {
return TIMESTAMP_FORMATTER.format(timestamp.toLocalDateTime());
TimeZone defaultTimeZone = TimeZone.getDefault();
TimeZone.setDefault(TimeZone.getTimeZone("UTC"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeZone.setDefault changes the zone for the whole application. In Hive, many things are happening in parallel so this is not something that should be done in utility APIs. It can unpredictable side effects and hard to diagnose race conditions.

}

@Parameterized.Parameters(name = "zoneId={0}, timestamp={1}")
public static Collection<Object[]> generateZoneTimestampPairs() {
List<Object[]> params = new ArrayList<>();
long minDate = LocalDate.of(0, 1, 1).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
long minDate = LocalDate.of(1, 1, 1).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the data type specification of DATE 0000-­01-­01 is a valid and supported date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants