[SUPPORT] Hudi CloudWatchReporter error after 0.15.0 upgrade #12182

Open
kirillklimenko opened this issue Oct 30, 2024 · 3 comments


Description

After upgrading our Amazon EMR cluster from version 7.2.0 to 7.3.0 (release notes), and Apache Hudi from 0.14.1 to 0.15.0 within the EMR bundle (release notes), we noticed that Hudi metrics stopped reporting to CloudWatch. The error observed is as follows:

ERROR ScheduledReporter: Exception thrown from CloudWatchReporter#report. Exception was suppressed.
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.stageMetricDatum(CloudWatchReporter.java:281) ~[hudi-aws-bundle-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.lambda$processGauge$2(CloudWatchReporter.java:250) ~[hudi-aws-bundle-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.processGauge(CloudWatchReporter.java:250) ~[hudi-aws-bundle-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:189) ~[hudi-aws-bundle-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-spark3-bundle_2.12-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at org.apache.hudi.com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177) ~[hudi-spark3-bundle_2.12-0.15.0-amzn-0.jar:0.15.0-amzn-0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]

Environment Description

  • Amazon EMR Version: 7.3.0
  • Hudi Version: 0.15.0
  • Spark Version: 3.5.1
  • Previous EMR Version (working): 7.2.0 with Hudi 0.14.1
  • AWS CloudWatch Setup: Default settings for Hudi metrics reporting

Steps to Reproduce

  1. Upgrade an Amazon EMR cluster from version 7.2.0 to 7.3.0.
  2. Enable Hudi metrics reporting to CloudWatch for the MOR table.
{
    "hoodie.metrics.on": True,
    "hoodie.metrics.reporter.type": "CLOUDWATCH",
    "hoodie.metrics.cloudwatch.namespace": "ULH",
}
  3. Write to the MOR table and watch the logs for CloudWatchReporter errors.

Observed Behavior

After the upgrade, Hudi stopped sending metrics to CloudWatch. An ArrayIndexOutOfBoundsException is thrown in CloudWatchReporter.stageMetricDatum during each reporting interval and is suppressed by ScheduledReporter, so the write itself does not fail.

Expected Behavior

Metrics should be reported to CloudWatch without errors.

Additional Context

This error suggests an issue with how the CloudWatchReporter processes or formats metrics data for CloudWatch, potentially related to an array handling bug in the stageMetricDatum method. This issue only appeared after upgrading to Hudi 0.15.0, included in EMR 7.3.0.

Could you confirm if this is a known issue or if there is a workaround? Any insight or suggested fixes would be appreciated.

Full Hudi config for insert:

_TABLE_OPTIONS = {
    "hoodie.database.name": "ulh",
    "hoodie.table.name": "bronze",
    "hoodie.index.type": "BUCKET",
    "hoodie.index.bucket.engine": "CONSISTENT_HASHING",
    "hoodie.bucket.index.num.buckets": 32,
    "hoodie.enable.data.skipping": True,
    "hoodie.datasource.query.type": "read_optimized",
}

_WRITE_OPTIONS = {
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.recordkey.field": "sha512",
    "hoodie.datasource.write.partitionpath.field": "year,month,day,data_origin",
    "hoodie.datasource.write.precombine.field": "updated_dt",
    "hoodie.datasource.write.hive_style_partitioning": True,
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
}

_METADATA_OPTIONS = {
    "hoodie.metadata.enable": True,
    "hoodie.parquet.compression.codec": "zstd",
    "hoodie.storage.layout.partitioner.class": "org.apache.hudi.table.action.commit.SparkBucketIndexPartitioner",
}

_METRICS_OPTIONS = {
    "hoodie.metrics.on": True,
    "hoodie.metrics.reporter.type": "CLOUDWATCH",
    "hoodie.metrics.cloudwatch.namespace": "ULH",
}

INSERT_OPTIONS = {
    **_TABLE_OPTIONS,
    **_METADATA_OPTIONS,
    **_METRICS_OPTIONS,
    **_WRITE_OPTIONS,
    "hoodie.datasource.write.operation": "insert",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload",
}
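
For anyone trying to reproduce this, here is a minimal sketch of how options like these are typically passed to the Spark writer. The helper name `write_hudi` and the `base_path` argument are illustrative and not taken from our actual job; it assumes `df` is a PySpark DataFrame and that the Hudi bundle is on the classpath.

```python
from typing import Any, Dict


def write_hudi(df: Any, base_path: str, options: Dict[str, Any]) -> None:
    """Append a DataFrame to the Hudi table at base_path (hypothetical helper)."""
    # DataFrameWriter.options takes keyword arguments, so the dict is unpacked.
    # PySpark converts Python values such as True and 32 to strings itself.
    df.write.format("hudi").options(**options).mode("append").save(base_path)
```

With metrics enabled as above, the reporter error should show up in the driver log once the scheduled reporter fires after a write.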

kirillklimenko commented Nov 19, 2024

Update: I get the same error after replacing the AWS Hudi JARs on EMR with the open-source ones:

24/11/19 10:05:16 ERROR ScheduledReporter: Exception thrown from CloudWatchReporter#report. Exception was suppressed.
java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.stageMetricDatum(CloudWatchReporter.java:281) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.lambda$processGauge$2(CloudWatchReporter.java:250) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at java.util.Optional.ifPresent(Optional.java:178) ~[?:?]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.processGauge(CloudWatchReporter.java:250) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:189) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at org.apache.hudi.com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:237) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at org.apache.hudi.com.codahale.metrics.ScheduledReporter.lambda$start$0(ScheduledReporter.java:177) ~[hudi-aws-bundle-0.15.0.jar:0.15.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]


hryz commented Dec 4, 2024

I have the same issue and can add more details.
The issue happens here.
stageMetricDatum expects metric names to contain a dot. The metric table_service_execution_status has no dot in its name, so splitting it yields a one-element array, and reading the second element throws ArrayIndexOutOfBoundsException.
My guess is that the value of this metric is of type Number, so it passes the condition here.
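
To make the failure mode concrete, here is a rough Python sketch of it. It assumes the reporter splits the name on the first dot into a table prefix and a metric name; I have not copied the Hudi code, and all function names below are hypothetical. The guarded variant is only one possible way to handle dot-less names, not a proposed patch. The name bronze.commit.duration is a made-up example; table_service_execution_status is the metric from this thread.

```python
from typing import Iterable, List, Tuple


def split_metric_name(full_name: str) -> Tuple[str, str]:
    """Mimic the assumed split: '<table>.<metric>' -> (table, metric)."""
    parts = full_name.split(".", 1)
    # With no dot in the name, parts has one element and parts[1] raises
    # IndexError, the Python analogue of ArrayIndexOutOfBoundsException.
    return parts[0], parts[1]


def split_metric_name_safe(full_name: str, default_table: str = "unknown") -> Tuple[str, str]:
    """Guarded variant: fall back to a placeholder table for dot-less names."""
    parts = full_name.split(".", 1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return default_table, full_name


def names_without_dot(names: Iterable[str]) -> List[str]:
    """Return the metric names that would trip the unguarded split."""
    return [name for name in names if "." not in name]


if __name__ == "__main__":
    names = ["bronze.commit.duration", "table_service_execution_status"]
    print(names_without_dot(names))
    print(split_metric_name_safe("table_service_execution_status"))
```

Calling split_metric_name on the dot-less name fails the same way on every reporting interval, which matches the repeated error in the logs.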


chatnord commented Dec 5, 2024

Same issue for us :/

Projects
Status: Awaiting Triage