HADOOP-18296. Memory fragmentation in ChecksumFileSystem readVectored() #7732

Open: wants to merge 6 commits into branch-3.4 from s3/HADOOP-18296-readbuffer-leak-branch-3.4

Conversation

steveloughran (Contributor)

Description of PR

  • Lets you turn off checksumming in the local FS (but not HDFS!) with the option file.verify-checksum.
  • Includes a copy of Parquet's TrackingByteBufferAllocator, modified for Hadoop APIs and renamed
    TrackingByteBufferPool; not yet used in tests.
  • Adds a new capability "fs.capability.vectoredio.sliced" for filesystems to declare that they slice buffers; see the sketch after this list.
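
A sketch of how these two pieces could be used together, assuming the option is read when the filesystem is initialized; the path and values here are illustrative, not taken from the PR:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifyChecksumSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Disable client-side checksum verification for file:// URIs only;
    // HDFS checksumming is unaffected by this option.
    conf.setBoolean("file.verify-checksum", false);
    FileSystem localFs = FileSystem.getLocal(conf);
    // Probe whether vectored reads through this path may return sliced buffers.
    Path path = new Path("/tmp/data.bin");  // illustrative path
    boolean maySlice =
        localFs.hasPathCapability(path, "fs.capability.vectoredio.sliced");
    System.out.println("buffers may be sliced: " + maySlice);
  }
}
```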

How was this patch tested?

No tests yet.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

steveloughran marked this pull request as draft June 9, 2025 18:16
steveloughran requested a review from Copilot June 9, 2025 18:16
@Copilot Copilot AI left a comment

Pull Request Overview

This PR addresses memory fragmentation issues in the vectored IO implementation within ChecksumFileSystem by introducing configurable checksum verification, enhanced logging, and improved buffer management. Key changes include:

  • Adding a configurable option (file.verify-checksum) and corresponding accessor in LocalFileSystem and ChecksumFileSystem.
  • Introducing a new TrackingByteBufferPool for tracking buffer usage.
  • Updating vectored IO logic and related documentation, including buffer slicing and new capability definitions.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

| File | Description |
| ---- | ----------- |
| TestViewFileSystemDelegation.java | Added an `@Override` annotation to the `getVerifyChecksum()` method. |
| fsdatainputstream.md | Updated and expanded documentation sections covering direct buffer reads and buffer slicing. |
| Sizes.java | Introduced a new constant `S_0` representing 0 bytes. |
| TrackingByteBufferPool.java | Added a new buffer pool implementation to track allocations and releases. |
| VectoredReadUtils.java | Added a new `hasVectorIOCapability()` method that supports vectored IO capability lookup. |
| StreamCapabilities.java | Added a new constant `VECTOREDIO_BUFFERS_SLICED` and updated interface documentation. |
| RawLocalFileSystem.java | Updated capability handling and logging for vectored read operations. |
| LocalFileSystemConfigKeys.java | Introduced the `LOCAL_FS_VERIFY_CHECKSUM` config key with documentation. |
| LocalFileSystem.java | Integrated the checksum verification setting during initialization, with logging. |
| ChecksumFileSystem.java | Added a `getVerifyChecksum()` method; updated vectored read logic and capability handling. |
Comments suppressed due to low confidence (4)

hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md:692

  • [nitpick] The sentence in the Buffer Slicing section appears to have an extraneous comma and could be rephrased for clarity. Consider revising it to clearly state the conditions under which buffer slicing is applied.
those with checksums,  

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/Sizes.java:35

  • [nitpick] The constant name 'S_0' may be unclear at first glance. Consider renaming it to 'ZERO_BYTES' for improved readability.
public static final int S_0 = 0;

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ChecksumFileSystem.java:1186

  • Ensure that returning getVerifyChecksum() for the VECTOREDIO_BUFFERS_SLICED capability correctly reflects the intended semantics for when buffers are sliced. It may be beneficial to explicitly document this behavior or consider whether a separate flag is more appropriate.
case StreamCapabilities.VECTOREDIO_BUFFERS_SLICED:

hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/VectoredReadUtils.java:488

  • Consider expanding the hasVectorIOCapability() method to either support the 'fs.capability.vectoredio.sliced' capability or document that it intentionally only handles 'in:readvectored'.
switch (capability.toLowerCase(Locale.ENGLISH)) {

steveloughran force-pushed the s3/HADOOP-18296-readbuffer-leak-branch-3.4 branch from a19a65f to 86d6bac June 9, 2025 18:53
@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| ---- | --------- | ------- | ------- | ------- |
| +0 🆗 | reexec | 0m 40s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | _ branch-3.4 Compile Tests _ | | | |
| +1 💚 | mvninstall | 40m 9s | | branch-3.4 passed |
| +1 💚 | compile | 19m 6s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 16m 52s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | checkstyle | 1m 16s | | branch-3.4 passed |
| +1 💚 | mvnsite | 1m 36s | | branch-3.4 passed |
| +1 💚 | javadoc | 1m 8s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 51s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 2m 32s | | branch-3.4 passed |
| +1 💚 | shadedclient | 37m 6s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 0m 56s | | the patch passed |
| +1 💚 | compile | 17m 47s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 17m 47s | | the patch passed |
| +1 💚 | compile | 16m 15s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 16m 15s | | the patch passed |
| -1 ❌ | blanks | 0m 0s | /blanks-eol.txt | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply |
| -0 ⚠️ | checkstyle | 1m 8s | /results-checkstyle-hadoop-common-project_hadoop-common.txt | hadoop-common-project/hadoop-common: The patch generated 10 new + 71 unchanged - 0 fixed = 81 total (was 71) |
| +1 💚 | mvnsite | 1m 34s | | the patch passed |
| +1 💚 | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 50s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 2m 43s | | the patch passed |
| +1 💚 | shadedclient | 38m 23s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 19m 24s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 5s | | The patch does not generate ASF License warnings. |
| | | 223m 45s | | |
| Subsystem | Report/Notes |
| --------- | ------------ |
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/1/artifact/out/Dockerfile |
| GITHUB PR | #7732 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux a0b236b49d33 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.4 / a19a65f |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/1/testReport/ |
| Max. process+thread count | 2093 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/1/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| ---- | --------- | ------- | ------- | ------- |
| +0 🆗 | reexec | 19m 10s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 🆗 | markdownlint | 0m 0s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | _ branch-3.4 Compile Tests _ | | | |
| +1 💚 | mvninstall | 39m 7s | | branch-3.4 passed |
| +1 💚 | compile | 19m 28s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 20m 30s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | checkstyle | 1m 26s | | branch-3.4 passed |
| +1 💚 | mvnsite | 1m 54s | | branch-3.4 passed |
| +1 💚 | javadoc | 1m 27s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 54s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 3m 4s | | branch-3.4 passed |
| +1 💚 | shadedclient | 42m 58s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +1 💚 | mvninstall | 0m 56s | | the patch passed |
| +1 💚 | compile | 18m 20s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 18m 20s | | the patch passed |
| +1 💚 | compile | 17m 52s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 17m 52s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 1m 16s | /results-checkstyle-hadoop-common-project_hadoop-common.txt | hadoop-common-project/hadoop-common: The patch generated 10 new + 71 unchanged - 0 fixed = 81 total (was 71) |
| +1 💚 | mvnsite | 1m 36s | | the patch passed |
| +1 💚 | javadoc | 1m 9s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 0m 50s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 2m 43s | | the patch passed |
| +1 💚 | shadedclient | 39m 43s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 19m 35s | | hadoop-common in the patch passed. |
| +1 💚 | asflicense | 1m 4s | | The patch does not generate ASF License warnings. |
| | | 256m 12s | | |
| Subsystem | Report/Notes |
| --------- | ------------ |
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/2/artifact/out/Dockerfile |
| GITHUB PR | #7732 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets markdownlint |
| uname | Linux eae6368fc6d0 5.15.0-139-generic #149-Ubuntu SMP Fri Apr 11 22:06:13 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.4 / 86d6bac |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/2/testReport/ |
| Max. process+thread count | 1263 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@steveloughran (Contributor, Author)

Trying to put this together. No attempt to fix the underlying issue yet, just:

  • an explicit ability to turn it off in the local FS
  • a capability to declare when slicing still takes place (which does nothing for older releases)

Wire up TrackingByteBufferPool to the vector read tests, with the test suite AbstractContractVectoredReadTest adding a test testBufferSlicing() to generate conditions which may trigger slicing.

Only filesystems which declare that they may slice buffers are permitted to return buffers to the pool which aren't known about, or to close the pool with outstanding entries.

So: no fix, just a public declaration of the behavior, and a test to verify that no other streams are doing it. A sketch of the tracking-pool usage follows.
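A minimal sketch of that usage, assuming only the APIs visible in this PR (wrap(), the ByteBufferPool getBuffer()/putBuffer() contract, and close() failing on outstanding buffers); everything else is illustrative:

```java
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.impl.TrackingByteBufferPool;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class TrackingPoolSketch {
  public static void main(String[] args) throws Exception {
    // Wrap a real pool: each allocation and release is recorded per buffer.
    TrackingByteBufferPool pool =
        TrackingByteBufferPool.wrap(new ElasticByteBufferPool());
    ByteBuffer buf = pool.getBuffer(false, 1024);  // tracked allocation
    // ... hand buf to a vectored read and consume the result ...
    pool.putBuffer(buf);                           // tracked release
    pool.close();  // expected to fail if any buffer is still outstanding
  }
}
```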
@steveloughran steveloughran force-pushed the s3/HADOOP-18296-readbuffer-leak-branch-3.4 branch from 1096112 to 995fcc4 Compare June 10, 2025 16:47
@hadoop-yetus

This comment was marked as outdated.

@hadoop-yetus

This comment was marked as outdated.

@steveloughran (Contributor, Author)

./hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/impl/TrackingByteBufferPool.java:101:  public static class LeakDetectorHeapByteBufferPoolException extends RuntimeException {: Class LeakDetectorHeapByteBufferPoolException should be declared as final. [FinalClass]
./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java:643:    TrackingByteBufferPool pool = TrackingByteBufferPool.wrap(getPool());:28: 'pool' hides a field. [HiddenField]
./hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/contract/AbstractContractVectoredReadTest.java:679:      LOG.info("Slicing is enabled; we saw leaked buffers: {} after {} releases of unknown bufferfs",: Line is longer than 100 characters (found 101). [LineLength]

* <p>
* {@value}.
*/
public static final String LOCAL_FS_VERIFY_CHECKSUM = "file.verify-checksum";
@steveloughran (Contributor, Author)

rename to fs.file.checksum.verify and declare it in core-default.xml to make it more visible

steveloughran marked this pull request as ready for review June 11, 2025 15:29
@hadoop-yetus

This comment was marked as outdated.

* The key maps by the object id of the buffer, and refers to either a common stack trace
* or one dynamically created for each allocation.
*/
private final Map<Key, ByteBufferAllocationStacktraceException> allocated =
Contributor:

I think this can be replaced with java.util.IdentityHashMap, which will make the Key class redundant:

Suggested change:
- private final Map<Key, ByteBufferAllocationStacktraceException> allocated =
+ private final Map<ByteBuffer, ByteBufferAllocationStacktraceException> allocated = new IdentityHashMap<>();
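Why identity semantics matter here (an illustrative sketch, not PR code): ByteBuffer equality and hashing are content- and position-based, so a plain HashMap key would silently change as the buffer is filled, while IdentityHashMap keys on reference identity.

```java
import java.nio.ByteBuffer;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityKeySketch {
  public static void main(String[] args) {
    Map<ByteBuffer, String> tracked = new IdentityHashMap<>();
    ByteBuffer buf = ByteBuffer.allocate(8);
    tracked.put(buf, "allocation site");
    buf.putInt(42);  // mutates content and position, so buf.hashCode() changes
    // Identity lookup is unaffected by the mutation:
    System.out.println(tracked.containsKey(buf));  // prints "true"
  }
}
```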

@steveloughran (Contributor, Author)

makes sense; I just lifted the parquet code

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
| ---- | --------- | ------- | ------- | ------- |
| +0 🆗 | reexec | 12m 20s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 1s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 🆗 | xmllint | 0m 1s | | xmllint was not available. |
| +0 🆗 | markdownlint | 0m 1s | | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
| | _ branch-3.4 Compile Tests _ | | | |
| +0 🆗 | mvndep | 1m 44s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 36m 15s | | branch-3.4 passed |
| +1 💚 | compile | 17m 45s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | compile | 16m 8s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | checkstyle | 4m 32s | | branch-3.4 passed |
| +1 💚 | mvnsite | 2m 33s | | branch-3.4 passed |
| +1 💚 | javadoc | 1m 51s | | branch-3.4 passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 45s | | branch-3.4 passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 3m 56s | | branch-3.4 passed |
| +1 💚 | shadedclient | 35m 56s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +0 🆗 | mvndep | 0m 32s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 28s | | the patch passed |
| +1 💚 | compile | 17m 59s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javac | 17m 59s | | the patch passed |
| +1 💚 | compile | 16m 48s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | javac | 16m 48s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 4m 29s | /results-checkstyle-root.txt | root: The patch generated 1 new + 72 unchanged - 0 fixed = 73 total (was 72) |
| +1 💚 | mvnsite | 2m 30s | | the patch passed |
| +1 💚 | javadoc | 1m 52s | | the patch passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 |
| +1 💚 | javadoc | 1m 43s | | the patch passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| +1 💚 | spotbugs | 4m 12s | | the patch passed |
| +1 💚 | shadedclient | 36m 6s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 💚 | unit | 19m 30s | | hadoop-common in the patch passed. |
| +1 💚 | unit | 3m 12s | | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 7s | | The patch does not generate ASF License warnings. |
| | | 251m 9s | | |
| Subsystem | Report/Notes |
| --------- | ------------ |
| Docker | ClientAPI=1.50 ServerAPI=1.50 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/6/artifact/out/Dockerfile |
| GITHUB PR | #7732 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint markdownlint |
| uname | Linux 9cb8ad193468 5.15.0-136-generic #147-Ubuntu SMP Sat Mar 15 15:53:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | branch-3.4 / de78242 |
| Default Java | Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/6/testReport/ |
| Max. process+thread count | 3148 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7732/6/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.

@@ -400,7 +398,9 @@ private void initiateRead() {
for(int i = 0; i < ranges.size(); ++i) {
FileRange range = ranges.get(i);
buffers[i] = allocateRelease.getBuffer(false, range.getLength());
channel.read(buffers[i], range.getOffset(), i, this);
final ByteBuffer buffer = buffers[i];
Contributor:

Any specific reason for this refactoring and making it final?

into the file formats themselves, or the underlying storage system.
Even when verification is enabled, files without associated checksum files
.$FILENAME.crc are never be verified.
When checksum verification is disabled, vector reads of date will always returne
Contributor:

typo: returne

The local filesystem will not slice buffers if the checksum file
of `filename + ".crc"` is not found. This is not declared in the
filesystem `hasPathCapability(filename, "fs.capability.vectoredio.sliced")`
call, as no checks for the checksum file are made then.ddddddddddddddddd
Contributor:

typo: then.ddddddddddddddddd

@iwasakims (Member)

LGTM overall pending comments about typos.

The expected message was emitted on AbstractContractVectoredReadTest.

2025-06-18 22:19:40,235 [JUnit-testBufferSlicing[Buffer type : array]] INFO  contract.AbstractContractVectoredReadTest (AbstractContractVectoredReadTest.java:testBufferSlicing(679)) - Slicing is enabled; we saw leaked buffers: 16 after 8 releases of unknown buffers

The modified ITestS3AContractVectoredRead passed against the Tokyo region with no messages mentioning sliced/leaked buffers.

$ mvn verify -Dtest=x -Dit.test=ITestS3AContractVectoredRead
...
[INFO] -------------------------------------------------------
[INFO]  T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead
[INFO] Tests run: 60, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 38.202 s - in org.apache.hadoop.fs.contract.s3a.ITestS3AContractVectoredRead

@iwasakims (Member)

The failure of TestCommonConfigurationFields looks to be caused by the property added to core-default.xml.

[ERROR] testCompareXmlAgainstConfigurationClass(org.apache.hadoop.conf.TestCommonConfigurationFields)  Time elapsed: 0.27 s  <<< FAILURE!
java.lang.AssertionError: core-default.xml has 1 properties missing in  class org.apache.hadoop.fs.CommonConfigurationKeys  class org.apache.hadoop.fs.CommonConfigurationKeysPublic  class org.apache.hadoop.fs.local.LocalConfigKeys  class org.apache.hadoop.fs.ftp.FtpConfigKeys  class org.apache.hadoop.ha.SshFenceByTcpPort  class org.apache.hadoop.security.LdapGroupsMapping  class org.apache.hadoop.ha.ZKFailoverController  class org.apache.hadoop.security.ssl.SSLFactory  class org.apache.hadoop.security.CompositeGroupsMapping  class org.apache.hadoop.io.erasurecode.CodecUtil  class org.apache.hadoop.security.RuleBasedLdapGroupsMapping Entries:   fs.file.checksum.verify expected:<0> but was:<1>
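
The usual remedy (a sketch only; the final placement and constant names are assumptions, not this PR's code) is to declare the new property in one of the key classes that the test compares against core-default.xml:

```java
// Hypothetical addition to a scanned key class such as
// org.apache.hadoop.fs.local.LocalConfigKeys; names are illustrative.
public class ConfigKeyAdditionSketch {
  /** Property added to core-default.xml by this PR. */
  public static final String LOCAL_FS_VERIFY_CHECKSUM_KEY =
      "fs.file.checksum.verify";
  /** Default mirrors the current always-verify behaviour. */
  public static final boolean LOCAL_FS_VERIFY_CHECKSUM_DEFAULT = true;
}
```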

* Configuring this class to log at DEBUG will trigger their collection.
* @see ByteBufferAllocationStacktraceException
* <p>
* Adapted from Parquet class {@code org.apache.parquet.bytes.TrackingByteBufferAllocator}.
Contributor:

I suggest mentioning which version of Parquet you adapted this from. In the future, that will help us understand if our copy needs to pick up additional bug fixes.

reading data may not be detected during the read operation.
Use with care in production.)

Filesystem instances which spit buffersduring vector read operations
Contributor:

Typo: "...which split buffers during..."

@@ -42,5 +42,6 @@ public class LocalFileSystemConfigKeys extends CommonConfigurationKeys {
public static final String LOCAL_FS_CLIENT_WRITE_PACKET_SIZE_KEY =
"file.client-write-packet-size";
public static final int LOCAL_FS_CLIENT_WRITE_PACKET_SIZE_DEFAULT = 64*1024;

Contributor:

Unnecessary change?

return true;
default:
return false;
return hasVectorIOCapability(capability);
Contributor:

Does this actually change the logic? It seems like hasVectorIOCapability checks StreamCapabilities.VECTOREDIO, just like the old code did.
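
For reference, a sketch of the delegation pattern being questioned (assumed shape only; the real hunk may keep different cases in the stream's own switch):

```java
import java.util.Locale;

import org.apache.hadoop.fs.StreamCapabilities;
import org.apache.hadoop.fs.VectoredReadUtils;

final class CapabilitySketch {
  // The stream answers its own capabilities, then falls through to the
  // shared helper, which recognises "in:readvectored" and nothing else,
  // so for that constant the behaviour matches the old inline case.
  static boolean hasCapability(String capability) {
    switch (capability.toLowerCase(Locale.ENGLISH)) {
      case StreamCapabilities.IOSTATISTICS:
        return true;
      default:
        return VectoredReadUtils.hasVectorIOCapability(capability);
    }
  }
}
```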

* <p>
* Adapted from Parquet class {@code org.apache.parquet.bytes.TrackingByteBufferAllocator}.
*/
public final class TrackingByteBufferPool implements ByteBufferPool, AutoCloseable {
Contributor:

Should we put visibility annotations on this?

Alternatively, if it's only ever intended for use in tests, should it go into the test path instead of main?
