@arunreddyav commented Nov 4, 2025

Description of PR

JIRA: HDFS-17849. Fixes a NameNode (NN) crash during expired delegation token cleanup. The crash occurs after the Kerberos auth rules are updated to a new realm configuration and no longer match principals from the old realm.

How was this patch tested?

I tested this change on Hadoop 3.4.1 by replacing the hadoop-common JARs.
The new warning appears in the NameNode logs as shown below, and the NameNode starts up successfully.

2025-10-30 14:49:54,582 WARN  delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:removeExpiredToken(776)) - Ignoring the exception in removeTokenForOwnerStats to remove expired delegation tokens from cache and proceeding to remove
java.lang.IllegalArgumentException: Illegal principal name spark/<hostname>@<old_realm>: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
        at org.apache.hadoop.security.User.<init>(User.java:51)
        at org.apache.hadoop.security.User.<init>(User.java:43)
        at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1458)
        at org.apache.hadoop.security.UserGroupInformation.createRemoteUser(UserGroupInformation.java:1441)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier.getUser(AbstractDelegationTokenIdentifier.java:80)
        at org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier.getUser(DelegationTokenIdentifier.java:81)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.getTokenRealOwner(AbstractDelegationTokenSecretManager.java:923)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeTokenForOwnerStats(AbstractDelegationTokenSecretManager.java:945)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.removeExpiredToken(AbstractDelegationTokenSecretManager.java:774)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.access$400(AbstractDelegationTokenSecretManager.java:71)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:850)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: No rules applied to spark/<hostname>@<old_realm>
        at org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:429)
        at org.apache.hadoop.security.User.<init>(User.java:48)
        ... 11 more

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

…delegation tokens from cache and proceeding to remove
@arunreddyav changed the title from "Ignoring the exception in removeTokenForOwnerStats to remove expired …" to "HDFS-17849 : Fix for Namenode crashed while cleaning up Expired Delegation tokens of older realm" on Nov 4, 2025
  expiredTokens.add(entry.getKey());
- removeTokenForOwnerStats(entry.getKey());
+ try {
+   removeTokenForOwnerStats(entry.getKey());
Contributor

Thanks @arunreddyav for your report and contribution. I am a little concerned that the token could leak when an exception is thrown here. I think the smoother approach is to configure hadoop.security.auth_to_local when changing the realm. What do you think? Thanks again.

Author

  • The token will not be leaked, because I catch the exception and the token is still cleaned up in logExpireTokens(expiredTokens).
  • Including the older rules under hadoop.security.auth_to_local is a possible approach; however, the customer prefers not to include the older rules for security reasons (for example, when moved to a more secure zone, old keytabs should not be allowed).

Contributor

  1. Got it, that makes sense to me. However, tokenOwnerStats would not be cleaned up in this case; this is a minor nit.
  2. 'when moved to more secure zone old keytabs should not be allowed' - I think this should be resolved on the KDC side, not in Hadoop.

Author

Thanks for the review, @Hexiaoqiao. Our customers create a cluster and run sample job checks with a local Kerberos setup. Once the initial setup is done, they configure LDAP/Active Directory through Ambari. Once the AD realm is configured, they cannot keep the old realm-based auth rules, as doing so goes against their security requirements.
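
The trade-off discussed in this thread can be sketched in isolation. The following is a minimal, self-contained illustration of the catch-and-continue pattern, not the actual patch and not Hadoop code. The class name, `track`, `resolveOwner`, the two maps, and the `@NEW.REALM` suffix are all hypothetical stand-ins: `resolveOwner` only mimics a principal-to-short-name lookup that fails for a realm with no matching rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for an expired-token cleanup pass.
public class ExpiredTokenCleanupSketch {

  // principal -> expiry time in millis (stand-in for the live token cache)
  final Map<String, Long> currentTokens = new LinkedHashMap<>();
  // owner short name -> live token count (stand-in for per-owner stats)
  final Map<String, Integer> ownerStats = new HashMap<>();

  // Assumption: only principals in NEW.REALM can be mapped to a short name.
  // Anything else fails, similar to the exception in the stack trace above.
  static String resolveOwner(String principal) {
    if (!principal.endsWith("@NEW.REALM")) {
      throw new IllegalArgumentException("No rules applied to " + principal);
    }
    return principal.split("[/@]")[0];
  }

  // Records a token together with the owner name resolved when it was issued.
  void track(String principal, String owner, long expiryMillis) {
    currentTokens.put(principal, expiryMillis);
    ownerStats.merge(owner, 1, Integer::sum);
  }

  // Decrements the owner's count; throws if the owner cannot be resolved.
  void removeOwnerStats(String principal) {
    String owner = resolveOwner(principal);
    ownerStats.computeIfPresent(owner, (k, v) -> v > 1 ? v - 1 : null);
  }

  // Removes every expired token and returns the removed principals.
  List<String> removeExpired(long nowMillis) {
    List<String> expired = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = currentTokens.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (entry.getValue() < nowMillis) {
        expired.add(entry.getKey());
        try {
          removeOwnerStats(entry.getKey());
        } catch (IllegalArgumentException e) {
          // Log and continue: one unresolvable principal must not abort the pass.
          System.err.println("Ignoring stats cleanup failure: " + e.getMessage());
        }
        it.remove();
      }
    }
    return expired;
  }

  public static void main(String[] args) {
    ExpiredTokenCleanupSketch s = new ExpiredTokenCleanupSketch();
    s.track("spark/host@OLD.REALM", "spark", 10L);
    s.track("hdfs/host@NEW.REALM", "hdfs", 10L);
    s.track("yarn/host@NEW.REALM", "yarn", 1000L);

    List<String> removed = s.removeExpired(100L);
    System.out.println("removed=" + removed);
    System.out.println("remaining=" + s.currentTokens.keySet());
    System.out.println("ownerStats=" + s.ownerStats);
  }
}
```

In this sketch both expired tokens are removed from the cache even though the old-realm principal cannot be resolved, so the cleanup pass completes. The count for the old-realm owner is never decremented, which is the stale-stats nit raised in the review.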

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| --- | --- | --- | --- | --- |
| +0 🆗 | reexec | 21m 38s | | Docker mode activated. |
| _ Prechecks _ | | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| _ trunk Compile Tests _ | | | | |
| +1 💚 | mvninstall | 41m 1s | | trunk passed |
| +1 💚 | compile | 17m 17s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 17m 20s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | checkstyle | 1m 1s | | trunk passed |
| +1 💚 | mvnsite | 1m 52s | | trunk passed |
| +1 💚 | javadoc | 1m 20s | | trunk passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 16s | | trunk passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| -1 ❌ | spotbugs | 3m 10s | /branch-spotbugs-hadoop-common-project_hadoop-common-warnings.html | hadoop-common-project/hadoop-common in trunk has 448 extant spotbugs warnings. |
| +1 💚 | shadedclient | 34m 13s | | branch has no errors when building and testing our client artifacts. |
| _ Patch Compile Tests _ | | | | |
| +1 💚 | mvninstall | 1m 10s | | the patch passed |
| +1 💚 | compile | 16m 9s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 16m 9s | | the patch passed |
| +1 💚 | compile | 17m 21s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 17m 21s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 58s | /results-checkstyle-hadoop-common-project_hadoop-common.txt | hadoop-common-project/hadoop-common: The patch generated 1 new + 23 unchanged - 0 fixed = 24 total (was 23) |
| +1 💚 | mvnsite | 1m 50s | | the patch passed |
| +1 💚 | javadoc | 1m 18s | | the patch passed with JDK Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 15s | | the patch passed with JDK Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| +1 💚 | spotbugs | 3m 23s | | the patch passed |
| +1 💚 | shadedclient | 33m 13s | | patch has no errors when building and testing our client artifacts. |
| _ Other Tests _ | | | | |
| -1 ❌ | unit | 22m 43s | /patch-unit-hadoop-common-project_hadoop-common.txt | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | 240m 57s | | |

| Reason | Tests |
| --- | --- |
| Failed junit tests | hadoop.security.ssl.TestDelegatingSSLSocketFactory |

| Subsystem | Report/Notes |
| --- | --- |
| Docker | ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8054/1/artifact/out/Dockerfile |
| GITHUB PR | #8054 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 8e4466ebe5a7 5.15.0-156-generic #166-Ubuntu SMP Sat Aug 9 00:02:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 6005bdd |
| Default Java | Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.7+6-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.15+6-Ubuntu-0ubuntu120.04 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8054/1/testReport/ |
| Max. process+thread count | 3149 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8054/1/console |
| versions | git=2.25.1 maven=3.9.11 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
