Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28190: Fix heartbeatLockMaterializationRebuild #5186

Conversation

InvisibleProgrammer
Copy link
Contributor

@InvisibleProgrammer InvisibleProgrammer commented Apr 9, 2024

There is a bug in heartbeatLockMaterializationRebuild that is trivial to fix:
https://github.com/apache/hive/blob/5e78ce0e1f1577c4b68f9149430c1d6ad26fd91d/stand[…]/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java

        "UPDATE \"MATERIALIZATION_REBUILD_LOCKS\"" +
            " SET \"MRL_LAST_HEARTBEAT\" = " + Instant.now().toEpochMilli() +
            " WHERE \"MRL_TXN_ID\" = " + txnId +
            " AND \"MRL_DB_NAME\" = ?" +
            " AND \"MRL_TBL_NAME\" = ?",
        new MapSqlParameterSource()
            .addValue("now", Instant.now().toEpochMilli())
            .addValue("txnId", txnId)
            .addValue("dbName", dbName)
            .addValue("tableNane", tableName),

As you can see, the paremterized query expects 2 parameter and it gets 4. It cannot possibly run at all.

The reason why it haven't discovered yet is it is a heartbeating function that runs in a background - so it can be only spotted by checking the logs:

org.springframework.dao.InvalidDataAccessApiUsageException: SQL [UPDATE "MATERIALIZATION_REBUILD_LOCKS" SET "MRL_LAST_HEARTBEAT" = 1712571919559 WHERE "MRL_TXN_ID" = 2297 AND "MRL_DB_NAME" = ? AND "MRL_TBL_NAME" = ?]: given 2 parameters but expected 0 

The fix fixes an other issue as well, in https://github.com/apache/hive/blob/881d38fb4d7af3ed4fa55b71598b5057080f8829/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/jdbc/functions/HeartbeatTxnRangeFunction.java#L85

    if (updateCnt == numTxnsToHeartbeat) {
      //fast pass worked, i.e. all txns we were asked to heartbeat were Open as expected
      context.rollbackToSavepoint(savePoint);

It says if we were able to update all the transactions in the given range successfully, just kindly roll it back instead of committing the changes.

What changes were proposed in this pull request?

Fix those two issues

Why are the changes needed?

Because it is wrong

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Firstly, it was checked manually, by modifying an existing qtest materialized_view_create_rewrite_6.q and checking the logs, like this:

set hive.txn.timeout=1;
alter materialized view mat1 rebuild;

It makes the heartbeat times out immediately, right after the transaction starts.

At the end solution, I added the following tests:

  • In TestTxnHandler, added a new test, testGetMaterializationInvalidationInfoWithValidReaderWriteIdListWhen to test out the jdbc parameter issue.
  • Created a new test class, ql/src/test/org/apache/hadoop/hive/ql/materializedview/TestHeartbeatTxnRangeFunction.java to test out the commit issue in HeartbeatTxnRangeFunction

@InvisibleProgrammer
Copy link
Contributor Author

@kasakrisz , what do you think, what should be the expected behaviour when the heartbeat fails:

  • should it cancel the query and Hive give an exception, so that the user will know about the issue (I would prefer that way)
  • or we can still ignore it. In that case, only a pure luck can help spotting that kind of issues.

Copy link
Contributor

@kasakrisz kasakrisz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kasakrisz
Copy link
Contributor

@kasakrisz , what do you think, what should be the expected behaviour when the heartbeat fails:

  • should it cancel the query and Hive give an exception, so that the user will know about the issue (I would prefer that way)
  • or we can still ignore it. In that case, only a pure luck can help spotting that kind of issues.

IMHO It should work like in case of other DML statements (INSERT SELECT, INSERT OVERWRITE SELECT) and it can be a separate jira/pr

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't merge without the tests, see TestTxnHandler#testHeartbeatLock

@InvisibleProgrammer InvisibleProgrammer force-pushed the HIVE-28190_Fix_heartbeatLockMaterializationRebuild branch from 4107ccd to c8a9a0c Compare April 18, 2024 12:19
Copy link

sonarcloud bot commented Apr 19, 2024

Quality Gate Passed Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@deniskuzZ deniskuzZ merged commit 1a969f6 into apache:master Apr 22, 2024
6 checks passed
@InvisibleProgrammer
Copy link
Contributor Author

@deniskuzZ , thank you for your help.

@InvisibleProgrammer InvisibleProgrammer deleted the HIVE-28190_Fix_heartbeatLockMaterializationRebuild branch May 6, 2024 12:02
dengzhhu653 pushed a commit to dengzhhu653/hive that referenced this pull request Jun 13, 2024
…i, reviewed by Attila Turoczy, Denys Kuzmenko, Krisztian Kasa, Zoltan Ratkai)

Closes apache#5186
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants