HDFS-15413. add dfs.client.read.striped.datanode.max.attempts to fix read ecfile timeout #5829
|
💔 -1 overall
This message was automatically generated. |
|
cc @zhangshuyan0 Would you mind taking a look? |
We should add tests that reproduce these issues, perhaps by setting a lower socket timeout.
They should cover scenarios where:
- A connection to the DN containing a data block is established.
- A connection to the DN containing a parity block is established.
- There are missing/lost nodes in the pipeline.
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
Outdated
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
Outdated
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
Outdated
|
Please also check the checkstyle and blanks issues reported by Yetus. Thanks. @Neilxzn |
I fixed the checkstyle issues and added a unit test. Please review it again. Thanks. |
|
💔 -1 overall
This message was automatically generated. |
|
@ayushtkn @zhangshuyan0 looks like the remaining failing checks are unrelated, and the feedback was addressed. Any chance for another look? |
ayushtkn
left a comment
I am not sure this test reproduces the issue: I reverted the changes in StripeReader and ran the test, and it still passed.
Once that is sorted out, we should add a test where one DN is dead: the same test, but with a DN killed.
...t/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java
...t/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSStripedInputStreamWithTimeout.java
Outdated
|
Hi @Neilxzn Any progress here? Thanks. |
|
Hi @Neilxzn , any chance you have time to finish this up? |
Sorry for the late reply. I have been busy with other things recently. I will try to submit a new unit test tomorrow. |
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
|
I can pass the unit test hadoop.hdfs.TestDFSStripedInputStreamWithTimeout in my local development environment, but it fails on GitHub Jenkins. |
|
@Neilxzn I tried it and it fails locally. To reproduce: |
Thank you. I will check it again soon. |
|
Hi @Neilxzn, any progress here? Thanks. This PR is still necessary; we are seeing similar problems in our environment. |
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
Outdated
|
💔 -1 overall
This message was automatically generated. |
|
please fix checkstyle, thanks~ |
Should we suppress this checkstyle warning? Or are there any better suggestions? |
|
🎊 +1 overall
This message was automatically generated. |
|
I believe we've started encountering this issue as well; it would be great to get this in. |
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
|
Hi, @Neilxzn . Thanks for reporting this problem. Can we push it forward? |
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/StripeReader.java
|
Rebase it. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |

Description of PR
https://issues.apache.org/jira/browse/HDFS-15413
This is a working patch for HDFS-15413. It adds the dfs.client.read.striped.datanode.max.attempts config so that users can adjust the number of DN retries, which addresses DataNode timeouts when reading EC files.
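For readers unfamiliar with this kind of setting, here is a minimal, self-contained sketch of what a "max attempts" knob typically bounds: a read is retried on timeout until the attempt limit is reached. This is not the code from the patch; `BoundedRetrySketch`, `readWithRetries`, and the `Callable`-based read are hypothetical names chosen for illustration, and the actual change in StripeReader.java may be structured differently.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical illustration only; not taken from StripeReader.java.
class BoundedRetrySketch {

    // Runs readOnce up to maxAttempts times, retrying only when it times out.
    // maxAttempts plays the role that a setting such as
    // dfs.client.read.striped.datanode.max.attempts is assumed to play.
    static <T> T readWithRetries(Callable<T> readOnce, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return readOnce.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated datanode read that times out twice, then succeeds.
        Callable<String> flakyRead = () -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "ok";
        };
        System.out.println(readWithRetries(flakyRead, 3) + " after " + calls[0] + " calls");
    }
}
```

With a limit of 3 the simulated read succeeds on the last attempt; with a limit of 2 the final timeout would propagate to the caller, which is the trade-off a user would be tuning.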
How was this patch tested?
No tests were added; the patch was only tested in our cluster.
For code changes:
add dfs.client.read.striped.datanode.max.attempts config