Consider the following case, with heartbeatIntervalInMs = 60 * 1000 and numTolerableHeartbeatMisses = 10, so that maxAllowableHeartbeatIntervalInMs = 600 * 1000 (10 minutes):
00:00, the write application starts.
00:01, the 1st heartbeat is sent successfully.
00:02, the HDFS network (or some other network component) becomes abnormal, and sending the heartbeat fails.
00:03-00:10, every heartbeat attempt fails.
00:11, the heartbeat has expired because currentTime[00:11] - lastHeartbeatTime[00:01] >= maxAllowableHeartbeatIntervalInMs; according to the code logic, lastHeartbeatTime will never be updated.
10:00, the write application has been running for 10 hours, executing all of its logic.
10:00, the write application starts to commit via BaseHoodieWriteClient::commitStats, finds that the heartbeat has expired, and fails the application by throwing an exception.
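To make the arithmetic concrete, here is a small Java simulation of the timeline above. It is a hypothetical sketch, not Hudi code: `HeartbeatTimeline`, `isExpired` and `firstExpiredMinute` are illustrative names, times are counted in minutes rather than milliseconds, and it assumes the expiry rule quoted at 00:11 (`currentTime - lastHeartbeatTime >= maxAllowableHeartbeatIntervalInMs`).

```java
/**
 * Hypothetical simulation of the heartbeat timeline (not Hudi code).
 * Times are minutes since the writer started; the heartbeat counts as expired
 * once now - lastHeartbeat >= heartbeatInterval * numTolerableHeartbeatMisses.
 */
public class HeartbeatTimeline {

    static final long HEARTBEAT_INTERVAL_MIN = 1;   // heartbeatIntervalInMs = 60 * 1000
    static final int NUM_TOLERABLE_MISSES = 10;     // numTolerableHeartbeatMisses = 10
    // maxAllowableHeartbeatIntervalInMs = 600 * 1000, i.e. 10 minutes
    static final long MAX_ALLOWABLE_MIN = HEARTBEAT_INTERVAL_MIN * NUM_TOLERABLE_MISSES;

    /** Expiry check, assuming the rule described in the timeline. */
    static boolean isExpired(long lastHeartbeatMin, long nowMin) {
        return nowMin - lastHeartbeatMin >= MAX_ALLOWABLE_MIN;
    }

    /** First minute (checked once per heartbeat interval) at which the heartbeat is expired, or -1. */
    static long firstExpiredMinute(long lastSuccessMin, long horizonMin) {
        for (long now = lastSuccessMin; now <= horizonMin; now += HEARTBEAT_INTERVAL_MIN) {
            if (isExpired(lastSuccessMin, now)) {
                return now;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long lastHeartbeat = 1;     // 00:01, last successful heartbeat
        long commitTime = 10 * 60;  // 10:00, commitStats runs
        long expiredAt = firstExpiredMinute(lastHeartbeat, commitTime);
        System.out.println("expired at minute " + expiredAt);
        System.out.println("commit sees expired heartbeat: " + isExpired(lastHeartbeat, commitTime));
        System.out.println("wasted minutes: " + (commitTime - expiredAt));
    }
}
```

Running `main` prints `expired at minute 11`, `commit sees expired heartbeat: true` and `wasted minutes: 589`: the failure is detectable almost ten hours before `commitStats` reports it.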
So we spent 10 hours running an application that we already knew at 00:11 could not succeed.
Should we support fail-fast to avoid this unnecessary resource consumption?
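One possible shape for such a fail-fast option is sketched below in Java. It is a hypothetical `FailFastHeartbeatMonitor`, not an existing Hudi API: the `BooleanSupplier` heartbeat sender, the injectable clock and the `onExpired` callback (which in a real writer might cancel running jobs) are all assumptions. The key difference from the behavior described above is that expiry is evaluated on every heartbeat attempt, so the writer is notified at 00:11 instead of at commit time.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/**
 * Hypothetical fail-fast heartbeat monitor (not an existing Hudi class).
 * Each tick tries to send a heartbeat; once no heartbeat has succeeded for
 * maxAllowableHeartbeatIntervalInMs, it marks itself expired and runs
 * onExpired exactly once, instead of waiting for commit time to notice.
 */
public class FailFastHeartbeatMonitor implements AutoCloseable {
    private final long heartbeatIntervalInMs;
    private final long maxAllowableHeartbeatIntervalInMs;
    private final BooleanSupplier sendHeartbeat;  // assumed contract: true on success
    private final LongSupplier clock;             // current time in ms (injectable for tests)
    private final Runnable onExpired;             // e.g. cancel jobs or interrupt the writer
    private final AtomicBoolean expired = new AtomicBoolean(false);
    private volatile long lastHeartbeatTimeMs;
    private ScheduledExecutorService scheduler;

    public FailFastHeartbeatMonitor(long heartbeatIntervalInMs, int numTolerableHeartbeatMisses,
                                    BooleanSupplier sendHeartbeat, LongSupplier clock, Runnable onExpired) {
        this.heartbeatIntervalInMs = heartbeatIntervalInMs;
        this.maxAllowableHeartbeatIntervalInMs = heartbeatIntervalInMs * numTolerableHeartbeatMisses;
        this.sendHeartbeat = sendHeartbeat;
        this.clock = clock;
        this.onExpired = onExpired;
        this.lastHeartbeatTimeMs = clock.getAsLong();
    }

    /** One heartbeat attempt; driven by the scheduler, or called directly in tests. */
    public void tick() {
        if (expired.get()) {
            return;  // already failed; nothing more to do
        }
        long now = clock.getAsLong();
        boolean ok;
        try {
            ok = sendHeartbeat.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false;  // treat exceptions (e.g. HDFS errors) as a missed heartbeat
        }
        if (ok) {
            lastHeartbeatTimeMs = now;
        } else if (now - lastHeartbeatTimeMs >= maxAllowableHeartbeatIntervalInMs
                && expired.compareAndSet(false, true)) {
            onExpired.run();  // fail fast: notify the writer immediately
        }
    }

    /** Writers can call this between stages to abort early instead of at commit. */
    public void checkNotExpired() {
        if (expired.get()) {
            throw new IllegalStateException(
                "Heartbeat expired; last successful heartbeat at " + lastHeartbeatTimeMs + " ms");
        }
    }

    public boolean isExpired() {
        return expired.get();
    }

    /** Starts periodic heartbeats on a single background thread. */
    public void start() {
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::tick, heartbeatIntervalInMs, heartbeatIntervalInMs,
            TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

Injecting the clock keeps `tick()` deterministic and easy to test; `checkNotExpired()` gives writers a cheap check to call between stages when interrupting work from the heartbeat thread is undesirable.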