
TEZ-4547: Add Tez AM JobID to the JobConf #339

Merged: 8 commits, Aug 5, 2024

Conversation

VenkatSNarayanan
Contributor

Some committers require a job-wide UUID to function correctly. Adding the AM JobID to the JobConf will allow applications to pass that to the committers that need it.

@@ -417,6 +418,7 @@ protected List<Event> initializeBase() throws IOException, InterruptedException
.createMockTaskAttemptID(getContext().getApplicationId().getClusterTimestamp(),
getContext().getTaskVertexIndex(), getContext().getApplicationId().getId(),
getContext().getTaskIndex(), getContext().getTaskAttemptNumber(), isMapperOutput);
jobConf.set(MRJobConfig.MR_PARENT_JOB_ID, new JobID(String.valueOf(getContext().getApplicationId().getClusterTimestamp()), getContext().getApplicationId().getId()).toString());
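The hunk above builds the parent job ID from the YARN ApplicationId's cluster timestamp and sequence number. A minimal sketch of the resulting string (plain Java, no Hadoop dependency; the format mirrors how Hadoop's JobID renders, i.e. job_&lt;jtIdentifier&gt;_&lt;NNNN&gt; with the numeric part zero-padded to four digits):

```java
// Sketch (no Hadoop dependency) of the job-ID string the hunk above produces.
// The jtIdentifier is the ApplicationId's cluster timestamp and NNNN is the
// application's sequence number, zero-padded to four digits.
public class ParentJobIdSketch {
    static String parentJobId(long clusterTimestamp, int appId) {
        return String.format("job_%d_%04d", clusterTimestamp, appId);
    }

    public static void main(String[] args) {
        // e.g. an application started at cluster timestamp 1437886552540, id 1
        System.out.println(parentJobId(1437886552540L, 1));
    }
}
```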
Contributor

Move this to the line above TaskAttemptID taskAttemptId =.

Comment on lines 151 to 152

assertNotEquals(parentJobID,invalidJobID);
assertNotEquals(output.jobConf.get(org.apache.hadoop.mapred.JobContext.TASK_ATTEMPT_ID),parentJobID);

Contributor

Fix code formatting: add a space after each comma.

@shameersss1
Contributor

If my understanding is correct, Hive/Pig would use the value from mapreduce.parent.job.id to set the correct committer UUID, right?

@@ -119,6 +119,7 @@ public void abortOutput(VertexStatus.State finalState) throws IOException {
|| jobConf.getBoolean("mapred.mapper.new-api", false)) {
newApiCommitter = true;
}
jobConf.set(MRJobConfig.MR_PARENT_JOB_ID, new org.apache.hadoop.mapred.JobID(String.valueOf(getContext().getApplicationId().getClusterTimestamp()), getContext().getApplicationId().getId()).toString());
Contributor

Can we move the new JobID(String.valueOf(getContext().getApplicationId().getClusterTimestamp()), getContext().getApplicationId().getId()).toString() expression into a method and reuse it in MROutput.java as well?

@VenkatSNarayanan
Contributor, Author

If my understanding is correct, Hive/Pig would use the value from mapreduce.parent.job.id to set the correct committer UUID, right?

Yes, that was the plan. The property name was chosen arbitrarily just so I could put the PR up; any suggestions for a better one are welcome.

This commit also adds the DAG identifier to the job UUID
to ensure that multiple jobs within the same session will
be assigned different UUIDs.

@VenkatSNarayanan
Contributor, Author

@shameersss1 We could actually just set fs.s3a.committer.uuid directly instead of the indirection through the other setting.

Switch UUID property name to the
one required by S3A committers.

@shameersss1
Contributor

LGTM +1

@shameersss1
Contributor

@abstractdog - Could you please review the same?

Refactors the implementation to reuse Tez's
DAGID type instead of hand-rolling our own.

@VenkatSNarayanan
Contributor, Author

@abstractdog @shameersss1 Is there anything else needed?


@abstractdog
Contributor

Left minor comments on this, @VenkatSNarayanan; other than that, this looks good to me.

@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 25m 52s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+0 🆗 mvndep 6m 14s Maven dependency ordering for branch
+1 💚 mvninstall 12m 57s master passed
+1 💚 compile 1m 56s master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1
+1 💚 compile 1m 44s master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
+1 💚 checkstyle 1m 58s master passed
+1 💚 javadoc 1m 44s master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1
+1 💚 javadoc 1m 29s master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
+0 🆗 spotbugs 1m 20s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 3m 47s master passed
_ Patch Compile Tests _
+0 🆗 mvndep 0m 10s Maven dependency ordering for patch
+1 💚 mvninstall 1m 9s the patch passed
+1 💚 compile 1m 17s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1
+1 💚 javac 1m 17s the patch passed
+1 💚 compile 1m 7s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
+1 💚 javac 1m 7s the patch passed
-0 ⚠️ checkstyle 0m 13s tez-api: The patch generated 1 new + 16 unchanged - 0 fixed = 17 total (was 16)
-0 ⚠️ checkstyle 0m 19s tez-mapreduce: The patch generated 3 new + 368 unchanged - 0 fixed = 371 total (was 368)
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 52s the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1
+1 💚 javadoc 0m 52s the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
+1 💚 findbugs 3m 5s the patch passed
_ Other Tests _
+1 💚 unit 2m 17s tez-api in the patch passed.
+1 💚 unit 1m 23s tez-mapreduce in the patch passed.
+1 💚 unit 5m 0s tez-dag in the patch passed.
+1 💚 asflicense 0m 34s The patch does not generate ASF License warnings.
78m 8s
Subsystem Report/Notes
Docker ClientAPI=1.46 ServerAPI=1.46 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/8/artifact/out/Dockerfile
GITHUB PR #339
JIRA Issue TEZ-4547
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux e2f1dda150af 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 19b2351
Default Java Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/8/artifact/out/diff-checkstyle-tez-api.txt
checkstyle https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/8/artifact/out/diff-checkstyle-tez-mapreduce.txt
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/8/testReport/
Max. process+thread count 423 (vs. ulimit of 5500)
modules C: tez-api tez-mapreduce tez-dag U: .
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/8/console
versions git=2.34.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

@abstractdog abstractdog self-requested a review June 28, 2024 04:05
@abstractdog
Contributor

one more thing @VenkatSNarayanan , please address checkstyle comments where applicable, thanks!

@tez-yetus

💔 -1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 0m 20s Docker failed to build yetus/tez:86b11997b.
Subsystem Report/Notes
GITHUB PR #339
JIRA Issue TEZ-4547
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-339/9/console
versions git=2.34.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

/**
* Used by committers to set a job-wide UUID.
*/
public static final String JOB_COMMITTER_UUID = "job.committer.uuid";
Contributor

This is not the setting used by the S3 committer, right? How will it work?

Contributor, Author

There is a corresponding change I have in my Hadoop code where it will consult this property, similar to how it consults the property Spark sets for this purpose.
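For context, a hedged sketch of the lookup order this comment describes. The Hadoop-side change is not public in this thread, so the fallback to job.committer.uuid is an assumption based on the comment; the Spark key is the one AbstractS3ACommitter already consults:

```java
import java.util.Map;

// Sketch of a committer-side UUID lookup: prefer the Spark-set property,
// then fall back to the job.committer.uuid key this PR sets. The fallback
// is an assumption based on the comment above, not merged Hadoop code.
public class UuidLookupSketch {
    static final String SPARK_WRITE_UUID = "spark.sql.sources.writeJobUUID";
    static final String JOB_COMMITTER_UUID = "job.committer.uuid";

    static String resolveJobUuid(Map<String, String> conf) {
        String uuid = conf.get(SPARK_WRITE_UUID);
        return (uuid != null && !uuid.isEmpty()) ? uuid : conf.get(JOB_COMMITTER_UUID);
    }

    public static void main(String[] args) {
        System.out.println(resolveJobUuid(
            Map.of(JOB_COMMITTER_UUID, "dag_1437886552540_0001_1")));
    }
}
```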

Contributor

So you can confirm this will work with job.committer.uuid, right? Can you link that point in the Hadoop code for later reference?

Contributor

@VenkatSNarayanan May I ask whether the Hadoop S3 committer can work with Hive+Tez after this change? IMO, the s3/magic committer can avoid some operations like rename on S3, which can speed up and improve Hive jobs.

Contributor, Author

@abstractdog I haven't publicly posted the Hadoop PR yet, but the change I have is to check for this property around here: https://github.com/apache/hadoop/blob/51cb858cc8c23d873d4adfc21de5f2c1c22d346f/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/AbstractS3ACommitter.java#L1372, similar to how the Spark property is checked. I have already tested these changes together alongside my Hive implementation.

@zhangbutao There are some corresponding changes to Hadoop and Hive, which I have, that also need to be merged. Once all three PRs (Tez, Hadoop, and Hive) have been merged, the magic committer will be usable with Hive.

@abstractdog
Contributor
Jul 16, 2024

Should this go into Tez 0.10.4? If so, it would be good to have it in 1-2 weeks. Just FYI, regarding planning the Hadoop change.

Contributor, Author

0.10.4 would be ideal. In that case, let me loop in the Hadoop folks to see if they have any strong opinions about this.

Contributor

@VenkatSNarayanan https://issues.apache.org/jira/browse/HIVE-16295 - I found an old ticket about integrating the s3a committer, and it seems that supporting this needs a lot of Hive code change. I am not sure whether you have done a similar change in Hive to support the MagicS3GuardCommitter. Anyway, I think it is very good to support this committer in Hive & Tez. Looking forward to your further work. Thanks.

Contributor

https://issues.apache.org/jira/browse/HADOOP-19091 - I just saw your Hadoop ticket, and the Hive change patch is there too. Maybe you should create a PR against the latest Hive master branch once you have done the preparatory work. :)

Contributor, Author

There haven't been any objections from the Hadoop folks; I think it should be safe to go ahead with the patch as it is, @abstractdog.

@steveloughran left a comment

Commented. All s3a committers save a JSON _SUCCESS file (parser in hadoop-aws for older Hadoop releases, in hadoop-mapreduce more recently). You can verify the job ID end to end with this.

@@ -78,6 +79,7 @@ public void initialize() throws IOException {
jobConf.getCredentials().mergeAll(UserGroupInformation.getCurrentUser().getCredentials());
jobConf.setInt(MRJobConfig.APPLICATION_ATTEMPT_ID,
getContext().getDAGAttemptNumber());
jobConf.set(MRJobConfig.JOB_COMMITTER_UUID, Utils.getDAGID(getContext()));

Is this unique across all jobs which may be writing to a table, even from other processes?

Contributor, Author

Yes. This ID is unique to a DAG + attempt number - so if we have some other job, it'll have a different application ID component, while if an attempt fails and the DAG retries, the attempt number will be different.
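A minimal sketch (plain Java, no Tez dependency) of why the DAG-derived value distinguishes jobs: Tez DAG IDs render roughly as dag_&lt;clusterTimestamp&gt;_&lt;NNNN&gt;_&lt;dagIndex&gt;, so separate applications differ in the ApplicationId part and successive DAGs in one session differ in the trailing index. The exact rendering here is illustrative:

```java
public class DagIdSketch {
    // Illustrative rendering of a Tez DAG ID: application cluster timestamp,
    // zero-padded application sequence number, then the per-session DAG index.
    static String dagId(long clusterTimestamp, int appId, int dagIndex) {
        return String.format("dag_%d_%04d_%d", clusterTimestamp, appId, dagIndex);
    }

    public static void main(String[] args) {
        // Two DAGs in the same session get distinct IDs via the DAG index.
        System.out.println(dagId(1437886552540L, 1, 1));
        System.out.println(dagId(1437886552540L, 1, 2));
    }
}
```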

@steveloughran

OK. If you look into the _SUCCESS JSON from an s3a or the manifest committer, the job ID is one of the root attributes, as is its source.

There's a Java definition of this in org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestSuccessData in recent hadoop-mapreduce binaries.
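The end-to-end check Steve suggests could be sketched like this: read the JSON _SUCCESS marker and pull out its top-level jobId attribute to compare against the configured UUID. A real test would deserialize ManifestSuccessData; the regex here only keeps the sketch dependency-free, and the field name jobId is taken from that class:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuccessFileCheck {
    // Pull the top-level "jobId" string out of a _SUCCESS JSON body.
    static String extractJobId(String successJson) {
        Matcher m = Pattern.compile("\"jobId\"\\s*:\\s*\"([^\"]+)\"")
                           .matcher(successJson);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"committer\":\"magic\","
                    + "\"jobId\":\"dag_1437886552540_0001_1\"}";
        System.out.println(extractJobId(json));
    }
}
```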

@abstractdog
Contributor

Guys, @steveloughran, @VenkatSNarayanan: please let me know whether this PR is fine to be merged to Tez (from Hadoop's point of view) - I'm about to start the release process of 0.10.4 soon. The latest comment is that there are no objections, so I'm assuming we're fine with the current name of this config property.

@abstractdog
Contributor

FYI: I'm about to merge this tomorrow to have this in Tez 0.10.4.

@abstractdog abstractdog merged commit 563b494 into apache:master Aug 5, 2024
4 checks passed
7 participants