
[SUPPORT] Metadata compaction periodically fails/hangs #12261

Open
liiang-huang opened this issue Nov 15, 2024 · 3 comments
Labels: metadata (metadata table), priority:critical (production down; pipelines stalled; Need help asap.)



liiang-huang commented Nov 15, 2024

Describe the problem you faced

Hi Hudi community, I have an AWS Glue job that ingests data into a Hudi MOR table. However, the job periodically fails at the stage shown below:
[Screenshots of the failing stage not included]

Could you help investigate this? I have gone through this issue, but it does not seem to be the same problem. I tried deleting the requested/inflight deltacommit and increasing resources, but the errors persisted. Thanks!

Environment Description

  • Hudi version : 0.13.1

  • Spark version : 3.1

  • Storage (HDFS/S3/GCS..) : S3


Stacktrace

Exception in User Class: jp.ne.paypay.daas.data.exceptions.JobFatalError : Streaming batch load failed with error: Could not compact s3://pay2-datalake-prod-standard/datasets/bronze/payment-accounting-db1-20241010-aurora-prod/payment_accounting/sub_payments_accounting-1761348391


Job aborted due to stage failure: Task 169 in stage 87.0 failed 4 times, most recent failure: Lost task 169.3 in stage 87.0 (TID 21675) (10.12.56.40 executor 13): ExecutorLostFailure (executor 13 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 508519 ms



liiang-huang changed the title from "[SUPPORT] Metadata compaction periodically failure/hang" to "[SUPPORT] Metadata compaction periodically fails/hangs" on Nov 15, 2024
ad1happy2go (Collaborator) commented

@liiang-huang Can you collect more stats from the metadata table? I see executors getting lost.
You can open the Spark UI, go to the Executors page, and check the reason for the executor loss.
How many files do you see under the metadata directory? Is column stats (colstats) or the record-level index (RLI) enabled? Please share the Hudi configs.
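A per-partition file count answers the "how many files" question more usefully than a single total. A minimal sketch, assuming the table's `.hoodie/metadata` folder has first been copied locally (for example with `aws s3 sync`); `count_metadata_files` is a hypothetical helper, and which partitions appear (such as `files` or `column_stats`) depends on the indexes enabled for the table:

```python
import os
from collections import Counter


def count_metadata_files(metadata_dir: str) -> Counter:
    """Count files per metadata-table partition (hypothetical helper).

    `metadata_dir` is assumed to be a local copy of <table>/.hoodie/metadata.
    Hidden top-level entries (such as the metadata table's own .hoodie
    timeline folder) and loose top-level files are skipped; every file
    inside a partition directory is counted, including hidden marker files.
    """
    counts: Counter = Counter()
    for partition in sorted(os.listdir(metadata_dir)):
        path = os.path.join(metadata_dir, partition)
        if partition.startswith(".") or not os.path.isdir(path):
            continue
        # Walk nested directories too, in case a partition has subfolders.
        for _root, _dirs, files in os.walk(path):
            counts[partition] += len(files)
    return counts
```

Calling `count_metadata_files("/tmp/metadata")` on the synced copy returns a `Counter` keyed by partition name, which can be pasted into the thread alongside the configs.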

ad1happy2go added the metadata and priority:critical labels on Nov 15, 2024
github-project-automation (bot) moved this to ⏳ Awaiting Triage in Hudi Issue Support on Nov 15, 2024
liiang-huang (Author) commented Nov 18, 2024

@ad1happy2go Yes, the reason is:

Executor heartbeat timed out after 636587 ms

There are 229 objects in the .hoodie/metadata/.hoodie folder, and there is a column_stats directory in the metadata folder. Let me know what else I should look for!
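The objects in `.hoodie/metadata/.hoodie` are the metadata table's timeline, so summarizing them shows whether compaction is keeping up. A minimal sketch, assuming Hudi's `<instant time>.<action>[.requested|.inflight]` file naming (with a bare `<instant time>.inflight` for an inflight commit) and assuming that on the merge-on-read metadata table a completed `commit` marks a finished compaction. `parse_timeline` and `summarize` are hypothetical helpers, not Hudi APIs. The delta-commit count since the last compaction can be compared with the value of `hoodie.metadata.compact.max.delta.commits` the job uses, and `pending` lists instants that never completed, such as a leftover requested or inflight delta commit:

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Optional

# Assumed naming: <instant time>.<action>[.requested|.inflight]. A name with
# no state suffix is treated as a completed instant.
INSTANT_RE = re.compile(r"^(\d+)\.([a-z]+)(?:\.(requested|inflight))?$")


class Instant(NamedTuple):
    ts: str      # instant time, compared as a string
    action: str  # deltacommit, commit, compaction, clean, rollback, ...
    state: str   # "requested", "inflight" or "completed"


def parse_timeline(names: Iterable[str]) -> List[Instant]:
    """Turn timeline file names into Instants (hypothetical helper)."""
    instants = []
    for name in names:
        m = INSTANT_RE.match(name)
        if not m:
            continue  # hoodie.properties, archived/, .aux, ...
        ts, action, state = m.group(1), m.group(2), m.group(3)
        if state is None and action in ("requested", "inflight"):
            # Assumed: a bare "<ts>.inflight" file is an inflight "commit".
            action, state = "commit", action
        instants.append(Instant(ts, action, state or "completed"))
    return sorted(instants)


def summarize(instants: List[Instant]) -> Dict[str, object]:
    """Count completed actions, delta commits since the last completed
    compaction, and instants that never completed (hypothetical helper)."""
    completed = [i for i in instants if i.state == "completed"]
    done_ts = {i.ts for i in completed}
    # Assumption: a completed "commit" here is a finished compaction.
    commits = [i.ts for i in completed if i.action == "commit"]
    last_compaction: Optional[str] = max(commits) if commits else None
    since = sum(
        1 for i in completed
        if i.action == "deltacommit"
        and (last_compaction is None or i.ts > last_compaction)
    )
    # Requested/inflight files stay after completion, so only instants with
    # no completed file at the same instant time count as pending.
    pending = sorted({(i.ts, i.action) for i in instants
                      if i.state != "completed" and i.ts not in done_ts})
    return {
        "completed": Counter(i.action for i in completed),
        "last_compaction": last_compaction,
        "deltacommits_since_compaction": since,
        "pending": pending,
    }
```

On a locally synced copy, `summarize(parse_timeline(os.listdir("/tmp/metadata/.hoodie")))` gives a compact view that could accompany the timeline requested in the thread.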

@rangareddy

Hi @liiang-huang

Could you please share your Hudi writer configuration and Spark configuration? If possible, please also provide the timeline so we can check on our end.
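To answer the configuration request, the `hoodie.*` writer options and `spark.*` settings can be gathered into one sorted list that is easy to paste into a comment. A minimal sketch, assuming the writer options are available as a Python dict and the Spark settings as `(key, value)` pairs (in PySpark, `spark.sparkContext.getConf().getAll()` returns such pairs); `config_report` and the `SENSITIVE` pattern are hypothetical, and the pattern should be adjusted so credentials are masked before posting:

```python
import re
from typing import Dict, Iterable, List, Tuple

# Hypothetical pattern for keys that may hold credentials; adjust as needed.
SENSITIVE = re.compile(r"(password|secret|token|access\.key|credential)", re.IGNORECASE)


def config_report(hudi_opts: Dict[str, str],
                  spark_conf: Iterable[Tuple[str, str]]) -> List[str]:
    """Return sorted "key=value" lines for hoodie.* and spark.* settings,
    masking values whose keys look like credentials (hypothetical helper)."""
    lines = []
    merged = list(hudi_opts.items()) + list(spark_conf)
    for key, value in sorted(merged):
        if not key.startswith(("hoodie.", "spark.")):
            continue  # e.g. a plain "path" option
        shown = "<redacted>" if SENSITIVE.search(key) else value
        lines.append(f"{key}={shown}")
    return lines
```

Printing `"\n".join(config_report(opts, spark.sparkContext.getConf().getAll()))` produces a block that can be shared as-is once the masked keys have been reviewed.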
